Question.41 You are planning to migrate your current on-premises Apache Hadoop deployment to the cloud. You need to ensure that the deployment is as fault-tolerant and cost-effective as possible for long-running batch jobs. You want to use a managed service. What should you do?
(A) Deploy a Cloud Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
(B) Deploy a Cloud Dataproc cluster. Use an SSD persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
(C) Install Hadoop and Spark on a 10-node Compute Engine instance group with standard instances. Install the Cloud Storage connector, and store the data in Cloud Storage. Change references in scripts from hdfs:// to gs://
(D) Install Hadoop and Spark on a 10-node Compute Engine instance group with preemptible instances. Store data in HDFS. Change references in scripts from hdfs:// to gs://
Answer is (A) Deploy a Cloud Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs://
Cloud Dataproc is the managed service for running Hadoop in the cloud, and standard (HDD) persistent disks are the more cost-effective choice; SSDs add cost without benefiting long-running batch jobs. Storing the data in Cloud Storage (gs://) instead of HDFS keeps it durable even when preemptible workers are reclaimed, which makes using 50% preemptible workers a safe way to cut cost.
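As a minimal sketch of the script change the answer describes, a PySpark batch job only needs its input and output URIs switched from hdfs:// to gs:// once the data lives in Cloud Storage (the bucket and paths below are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("batch-wordcount").getOrCreate()

# Before the migration the job read from the cluster-local HDFS:
#   lines = spark.read.text("hdfs:///data/logs/2023/")
# After moving the data to Cloud Storage, only the URI scheme changes;
# the Cloud Storage connector is preinstalled on Dataproc clusters.
lines = spark.read.text("gs://example-bucket/data/logs/2023/")  # hypothetical bucket

counts = (lines.rdd
          .flatMap(lambda row: row.value.split())
          .map(lambda word: (word, 1))
          .reduceByKey(lambda a, b: a + b))

counts.toDF(["word", "count"]).write.parquet("gs://example-bucket/output/wordcount/")
```

Because the data no longer lives on the cluster's own disks, losing a preemptible worker only costs some recomputation, not data.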
Question.42 You need to choose a database for a new project that has the following requirements:
– Fully managed
– Able to automatically scale up
– Transactionally consistent
– Able to scale up to 6 TB
– Able to be queried using SQL
Which database do you choose?
(A) Cloud SQL
(B) Cloud Bigtable
(C) Cloud Spanner
(D) Cloud Datastore
Answer is (A) Cloud SQL
The requirement is to scale up (vertically), which Cloud SQL supports; horizontal scaling (scaling out) is not possible in Cloud SQL, but it is not required here. Cloud SQL is fully managed, transactionally consistent, queried with standard SQL, and comfortably handles 6 TB of data.
Automatic storage increase (from the Cloud SQL documentation): "If you enable this setting, Cloud SQL checks your available storage every 30 seconds. If the available storage falls below a threshold size, Cloud SQL automatically adds additional storage capacity. If the available storage repeatedly falls below the threshold size, Cloud SQL continues to add storage until it reaches the maximum of 30 TB."
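If you want to enable this setting programmatically, a hedged sketch using the Cloud SQL Admin API through google-api-python-client is below. The project and instance names are hypothetical, and the settings field names (storageAutoResize, storageAutoResizeLimit) are assumptions based on that API's Settings resource:

```python
from googleapiclient import discovery

# Build a client for the Cloud SQL Admin API using application-default credentials.
sqladmin = discovery.build("sqladmin", "v1beta4")

# Patch only the storage settings: turn on automatic storage increase and cap
# growth at 500 GB (assumed field names; "0" would mean no limit, up to 30 TB).
body = {
    "settings": {
        "storageAutoResize": True,
        "storageAutoResizeLimit": "500",
    }
}

response = sqladmin.instances().patch(
    project="my-project",      # hypothetical project ID
    instance="my-instance",    # hypothetical instance name
    body=body,
).execute()
print(response["operationType"], response["status"])
```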
Question.43 What are two of the benefits of using denormalized data structures in BigQuery?
(A) Reduces the amount of data processed, reduces the amount of storage required
(B) Increases query speed, makes queries simpler
(C) Reduces the amount of storage required, increases query speed
(D) Reduces the amount of data processed, increases query speed
Answer is (B) Increases query speed, makes queries simpler
Cannot be A or C because:
“Denormalized schemas aren’t storage-optimal, but BigQuery’s low cost of storage addresses concerns about storage inefficiency.”
Cannot be D because the amount of data processed is the same.
As for why queries become "simpler", the documentation doesn't state it directly, but it is hinted at: "Expressing records by using nested and repeated fields simplifies data load using JSON or Avro files" and "Expressing records using nested and repeated structures can provide a more natural representation of the underlying data."
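To make the nested/repeated idea concrete, here is a small sketch using the google-cloud-bigquery Python client to define a denormalized table in which each order row embeds its line items as a repeated RECORD. The project, dataset, table, and field names are made up for illustration:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Denormalized schema: line items are nested inside each order row instead of
# living in a separate table that would have to be joined at query time.
schema = [
    bigquery.SchemaField("order_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("customer", "STRING"),
    bigquery.SchemaField(
        "line_items", "RECORD", mode="REPEATED",
        fields=[
            bigquery.SchemaField("sku", "STRING"),
            bigquery.SchemaField("quantity", "INTEGER"),
            bigquery.SchemaField("price", "NUMERIC"),
        ],
    ),
]

table = bigquery.Table("my-project.sales.orders", schema=schema)  # hypothetical IDs
client.create_table(table)

# Queries read the nested data with UNNEST instead of a join:
query = """
    SELECT order_id, item.sku, item.quantity
    FROM `my-project.sales.orders`, UNNEST(line_items) AS item
"""
for row in client.query(query).result():
    print(row.order_id, row.sku, row.quantity)
```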
Reference:
https://cloud.google.com/solutions/bigquery-data-warehouse
Question.44 Which of the following are examples of hyperparameters? (Select 2 answers.)
(A) Number of hidden layers
(B) Number of nodes in each hidden layer
(C) Biases
(D) Weights
Answers are:
(A) Number of hidden layers
(B) Number of nodes in each hidden layer
Hyperparameters are configuration variables set before training begins and are not changed by the training process itself, whereas weights and biases are parameters the model learns during training.
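A small sketch of that distinction, using Keras as one possible framework: the number of hidden layers and the number of nodes per layer are fixed choices made before training, while the weights and biases inside each Dense layer are what training adjusts. All values below are illustrative.

```python
import tensorflow as tf

# Hyperparameters: chosen before training and tuned from outside the model.
num_hidden_layers = 3      # answer (A)
nodes_per_layer = 64       # answer (B)
learning_rate = 0.001

model = tf.keras.Sequential([tf.keras.Input(shape=(20,))])
for _ in range(num_hidden_layers):
    model.add(tf.keras.layers.Dense(nodes_per_layer, activation="relu"))
model.add(tf.keras.layers.Dense(1))

# Parameters: the weights and biases inside each Dense layer (options C and D)
# are initialized here and then learned by the optimizer during model.fit().
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate), loss="mse")
print(model.count_params(), "trainable weights and biases")
```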
Reference:
https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview
Question.45 Which of the following are feature engineering techniques? (Select 2 answers)
(A) Hidden feature layers
(B) Feature prioritization
(C) Crossed feature columns
(D) Bucketization of a continuous feature
Answers are:
(C) Crossed feature columns
(D) Bucketization of a continuous feature
Selecting and crafting the right set of feature columns is key to learning an effective model. Bucketization is a process of dividing the entire range of a continuous feature into a set of consecutive bins/buckets, and then converting the original numerical feature into a bucket ID (as a categorical feature) depending on which bucket that value falls into.
Using each base feature column separately may not be enough to explain the data. To learn the differences between different feature combinations, we can add crossed feature columns to the model.
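As an illustrative sketch of both techniques using TensorFlow's (now legacy) feature-column API, with hypothetical "age" and "education" input features:

```python
import tensorflow as tf

# Bucketization: turn the continuous "age" feature into a categorical feature
# by assigning each value to one of several consecutive age ranges.
age = tf.feature_column.numeric_column("age")
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 35, 45, 55, 65])

# Crossed feature column: combine two base features so the model can learn
# patterns specific to each (age bucket, education) combination.
education = tf.feature_column.categorical_column_with_vocabulary_list(
    "education", ["HS", "Bachelors", "Masters", "Doctorate"])
age_x_education = tf.feature_column.crossed_column(
    [age_buckets, education], hash_bucket_size=1000)

# The bucketized column can be fed to a model directly; the crossed column is
# wrapped in an indicator column so it can be used as a model input as well.
feature_columns = [
    age_buckets,
    tf.feature_column.indicator_column(age_x_education),
]
```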
Reference:
https://cloud.google.com/solutions/machine-learning/ml-on-structured-data-model-2