Question.16 Your organization has been collecting and analyzing data in Google BigQuery for 6 months. The majority of the data analyzed is placed in a time-partitioned table named events_partitioned. To reduce the cost of queries, your organization created a view called events, which queries only the last 14 days of data. The view is described in legacy SQL. Next month, existing applications will be connecting to BigQuery to read the events data via an ODBC connection. You need to ensure the applications can connect. Which two actions should you take? (Choose two.) (A) Create a new view over events using standard SQL (B) Create a new partitioned table using a standard SQL query (C) Create a new view over events_partitioned using standard SQL (D) Create a service account for the ODBC connection to use for authentication (E) Create a Google Cloud Identity and Access Management (Cloud IAM) role for the ODBC connection and shared “events”
The answers are (C) Create a new view over events_partitioned using standard SQL, and (D) Create a service account for the ODBC connection to use for authentication.
C: A standard SQL query cannot reference a view defined using legacy SQL syntax, so the new standard SQL view must be created directly over events_partitioned rather than over the existing legacy events view.
D: The ODBC connection needs a service account for authentication, which is then granted a standard BigQuery role.
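As a minimal sketch of answer C using the google-cloud-bigquery Python client: the project, dataset, and view names below are assumptions, and the filter assumes events_partitioned is an ingestion-time partitioned table (hence _PARTITIONTIME).

```python
from google.cloud import bigquery

# Assumed project and dataset names; adjust to your environment.
client = bigquery.Client(project="my-project")

view = bigquery.Table("my-project.analytics.events_v2")
view.view_use_legacy_sql = False  # define the view in standard SQL
# Standard SQL view over the partitioned table itself (not over the
# legacy SQL `events` view), keeping the existing 14-day window.
view.view_query = """
    SELECT *
    FROM `my-project.analytics.events_partitioned`
    WHERE _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)
"""
client.create_table(view)
```

Applications connecting over ODBC can then query this view with standard SQL, authenticating with the service account from answer D.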
Question.17 Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They want to run the model on Apache Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly. How should you optimize the cluster for cost? (A) Migrate the workload to Google Cloud Dataflow (B) Use pre-emptible virtual machines (VMs) for the cluster (C) Use a higher-memory node so that the job runs faster (D) Use SSDs on the worker nodes so that the job can run faster
Answer is (B) Use pre-emptible virtual machines (VMs) for the cluster
Hadoop/Spark jobs run on Dataproc, and preemptible VMs cost up to roughly 80% less than standard VMs, making them the most direct way to reduce the cost of a short, weekly batch job.
Reference:
https://cloud.google.com/dataproc/docs/concepts/compute/preemptible-vms
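As a hedged sketch of answer B with the google-cloud-dataproc Python client: the project, region, cluster name, and machine types are assumptions, and the 15 nodes are split as 1 master, 2 primary workers, and 12 secondary workers (secondary workers are preemptible by default).

```python
from google.cloud import dataproc_v1

project_id = "my-project"  # assumed
region = "us-central1"     # assumed

cluster_client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": "weekly-spark-scoring",  # assumed name
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        # Dataproc requires non-preemptible primary workers (minimum of 2).
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
        # Secondary workers are preemptible by default, so the bulk of the
        # 15-node cluster runs on the cheaper VM class.
        "secondary_worker_config": {"num_instances": 12},
    },
}

operation = cluster_client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
operation.result()  # wait for the cluster to be provisioned
```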
Question.18 Your infrastructure includes a set of YouTube channels. You have been tasked with creating a process for sending the YouTube channel data to Google Cloud for analysis. You want to design a solution that allows your world-wide marketing teams to perform ANSI SQL and other types of analysis on up-to-date YouTube channels log data. How should you set up the log data transfer into Google Cloud? (A) Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination. (B) Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Regional bucket as a final destination. (C) Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination. (D) Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Regional storage bucket as a final destination.
Answer is (A) Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.
The destination is Cloud Storage, and a Multi-Regional bucket best serves world-wide marketing teams, so A is the best option available.
Even though BigQuery Data Transfer Service supports Google application sources such as Google Ads, Campaign Manager, Google Ad Manager, and YouTube, it does not support any destination other than a BigQuery dataset, so it cannot land the files in Cloud Storage.
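A sketch of answer A with the google-cloud-storage-transfer Python client, assuming the offsite backup files sit in an S3 bucket; every name, date, and credential below is a placeholder.

```python
from google.cloud import storage_transfer

project_id = "my-project"                   # assumed
source_bucket = "offsite-backups"           # assumed S3 bucket holding the log files
sink_bucket = "youtube-logs-multiregional"  # assumed Multi-Regional Cloud Storage bucket

client = storage_transfer.StorageTransferServiceClient()

transfer_job_request = storage_transfer.CreateTransferJobRequest(
    {
        "transfer_job": {
            "project_id": project_id,
            "description": "YouTube channel log backups to Cloud Storage",
            "status": storage_transfer.TransferJob.Status.ENABLED,
            # Recurs daily from the (placeholder) start date so the data stays up to date.
            "schedule": {"schedule_start_date": {"year": 2024, "month": 1, "day": 1}},
            "transfer_spec": {
                "aws_s3_data_source": {
                    "bucket_name": source_bucket,
                    "aws_access_key": {
                        "access_key_id": "AWS_ACCESS_KEY_ID",          # placeholder
                        "secret_access_key": "AWS_SECRET_ACCESS_KEY",  # placeholder
                    },
                },
                "gcs_data_sink": {"bucket_name": sink_bucket},
            },
        }
    }
)

client.create_transfer_job(transfer_job_request)
```

Once the files land in the Multi-Regional bucket, they can be queried with ANSI SQL via BigQuery external tables, as the next two questions illustrate.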
Question.19 You are designing storage for very large text files for a data pipeline on Google Cloud. You want to support ANSI SQL queries. You also want to support compression and parallel load from the input locations using Google recommended practices. What should you do? (A) Transform text files to compressed Avro using Cloud Dataflow. Use BigQuery for storage and query. (B) Transform text files to compressed Avro using Cloud Dataflow. Use Cloud Storage and BigQuery permanent linked tables for query. (C) Compress text files to gzip using the Grid Computing Tools. Use BigQuery for storage and query. (D) Compress text files to gzip using the Grid Computing Tools. Use Cloud Storage, and then import into Cloud Bigtable for query.
Answer is (B) Transform text files to compressed Avro using Cloud Dataflow. Use Cloud Storage and BigQuery permanent linked tables for query.
A and B are correct, but B is the best answer
The advantage of creating external tables is that they are fast to create (you skip importing the data), and no additional monthly storage charges accrue to your account, since you only pay for the data kept in the data lake, which is comparatively cheaper than storing it in BigQuery.
A: Importing the data into BigQuery takes more time than creating external tables over it, and BigQuery storage adds costs that can exceed those of Cloud Storage.
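To illustrate answer B, a sketch with the google-cloud-bigquery Python client that links the Dataflow-produced Avro files in Cloud Storage as a permanent external table; the project, dataset, and bucket paths are assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # assumed project

# External (federated) configuration pointing at the Avro files that the
# Dataflow pipeline wrote to Cloud Storage; Avro block compression is
# handled transparently by BigQuery.
external_config = bigquery.ExternalConfig("AVRO")
external_config.source_uris = ["gs://my-pipeline-bucket/events/*.avro"]  # assumed path

# Permanent external table: it lives in a dataset, so dataset-level access
# controls and ANSI SQL queries apply, while the data itself stays in GCS.
table = bigquery.Table("my-project.analytics.events_external")
table.external_data_configuration = external_config
client.create_table(table)
```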
Question.20 You are designing storage for 20 TB of text files as part of deploying a data pipeline on Google Cloud. Your input data is in CSV format. You want to minimize the cost of querying aggregate values for multiple users who will query the data in Cloud Storage with multiple engines. Which storage service and schema design should you use? (A) Use Cloud Bigtable for storage. Install the HBase shell on a Compute Engine instance to query the Cloud Bigtable data. (B) Use Cloud Bigtable for storage. Link as permanent tables in BigQuery for query. (C) Use Cloud Storage for storage. Link as permanent tables in BigQuery for query. (D) Use Cloud Storage for storage. Link as temporary tables in BigQuery for query.
Answer is (C) Use Cloud Storage for storage. Link as permanent tables in BigQuery for query.
BigQuery can access data in external sources, known as federated sources. Instead of first loading data into BigQuery, you can create a reference to an external source. External sources can be Cloud Bigtable, Cloud Storage, and Google Drive.
When accessing external data, you can create either permanent or temporary external tables. Permanent tables are those that are created in a dataset and linked to an external source. Dataset-level access controls can be applied to these tables. When you are using a temporary table, a table is created in a special dataset and will be available for approximately 24 hours. Temporary tables are useful for one-time operations, such as loading data into a data warehouse.
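A sketch of the permanent-versus-temporary distinction with the google-cloud-bigquery Python client; the project, dataset, CSV paths, and the presence of a header row are assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # assumed project

external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://my-data-bucket/events/*.csv"]  # assumed path
external_config.autodetect = True
external_config.options.skip_leading_rows = 1  # assumes a header row

# Permanent external table (answer C): created once in a dataset and shared
# with multiple users via dataset-level access controls; the CSVs stay in
# Cloud Storage, where other engines can also read them.
table = bigquery.Table("my-project.analytics.events_csv")
table.external_data_configuration = external_config
client.create_table(table)

# Temporary external table: defined only for the lifetime of a single query
# job, useful for one-off operations and ad hoc checks.
job_config = bigquery.QueryJobConfig(
    table_definitions={"events_temp": external_config}
)
query = "SELECT COUNT(*) AS row_count FROM events_temp"
rows = client.query(query, job_config=job_config).result()
```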