Question.36 You are building a new data pipeline to share data between two different types of applications: job generators and job runners. Your solution must scale to accommodate increases in usage and must accommodate the addition of new applications without negatively affecting the performance of existing ones. What should you do?
(A) Create an API using App Engine to receive and send messages to the applications
(B) Use a Cloud Pub/Sub topic to publish jobs, and use subscriptions to execute them
(C) Create a table on Cloud SQL, and insert and delete rows with the job information
(D) Create a table on Cloud Spanner, and insert and delete rows with the job information
Answer is (B) Use a Cloud Pub/Sub topic to publish jobs, and use subscriptions to execute them
Cloud Pub/Sub decouples the job generators from the job runners: generators publish jobs to a topic, and each runner application consumes them through its own subscription. Pub/Sub scales automatically with load, and new applications can be added as additional subscribers without affecting existing ones.
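For illustration, here is a minimal Python sketch of this pattern using the google-cloud-pubsub client library; the project, topic, and subscription names are assumptions, not part of the question.

```python
import json
from google.cloud import pubsub_v1

# Assumed identifiers for illustration only.
PROJECT_ID = "my-project"
TOPIC_ID = "jobs"
SUBSCRIPTION_ID = "job-runners"

# Job generator: publish a job message to the shared topic.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)
future = publisher.publish(
    topic_path, json.dumps({"job_id": 42, "task": "transcode"}).encode("utf-8")
)
print("Published message ID:", future.result())

# Job runner: receive jobs from its own subscription and execute them.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

def callback(message):
    job = json.loads(message.data)
    # ... execute the job here ...
    message.ack()  # acknowledge so the job is not redelivered

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
```

Because each consuming application gets its own subscription, new job runners can be added later without changing the publishers or the existing subscribers.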
Question.37 You need to move 2 PB of historical data from an on-premises storage appliance to Cloud Storage within six months, and your outbound network capacity is constrained to 20 Mb/sec. How should you migrate this data to Cloud Storage?
(A) Use Transfer Appliance to copy the data to Cloud Storage
(B) Use gsutil cp -J to compress the content being uploaded to Cloud Storage
(C) Create a private URL for the historical data, and then use Storage Transfer Service to copy the data to Cloud Storage
(D) Use trickle or ionice along with gsutil cp to limit the amount of bandwidth gsutil utilizes to less than 20 Mb/sec so it does not interfere with the production traffic
Answer is (A) Use Transfer Appliance to copy the data to Cloud Storage
This is a huge amount of data over a low-bandwidth link; Transfer Appliance is the recommended option for moving datasets larger than roughly 100 TB, or whenever an online transfer would take too long. At 20 Mb/sec, 2 PB would take on the order of 25 years to upload, so no network-based option can meet the six-month deadline.
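A quick back-of-the-envelope calculation (plain Python) shows why an online transfer cannot work here:

```python
# How long would 2 PB take over a 20 Mb/sec link?
data_bits = 2 * 10**15 * 8        # 2 PB (decimal petabytes) in bits
link_bps = 20 * 10**6             # 20 Mb/sec outbound capacity
seconds = data_bits / link_bps
years = seconds / (86400 * 365)
print(f"~{years:.0f} years")      # roughly 25 years, far beyond six months
```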
Question.38 You receive data files in CSV format monthly from a third party. You need to cleanse this data, but every third month the schema of the files changes. Your requirements for implementing these transformations include:
– Executing the transformations on a schedule
– Enabling non-developer analysts to modify transformations
– Providing a graphical tool for designing transformations
What should you do?
(A) Use Cloud Dataprep to build and maintain the transformation recipes, and execute them on a scheduled basis
(B) Load each month’s CSV data into BigQuery, and write a SQL query to transform the data to a standard schema. Merge the transformed tables together with a SQL query
(C) Help the analysts write a Cloud Dataflow pipeline in Python to perform the transformation. The Python code should be stored in a revision control system and modified as the incoming data’s schema changes
(D) Use Apache Spark on Cloud Dataproc to infer the schema of the CSV file before creating a Dataframe. Then implement the transformations in Spark SQL before writing the data out to Cloud Storage and loading into BigQuery
Answer is (A) Use Cloud Dataprep to build and maintain the transformation recipes, and execute them on a scheduled basis
Cloud Dataprep provides a graphical interface for building and maintaining transformation recipes, is designed for non-developer analysts, and supports scheduled execution, so it satisfies all three requirements.
Question.39 You work for a shipping company that has distribution centers where packages move on delivery lines to route them properly. The company wants to add cameras to the delivery lines to detect and track any visual damage to the packages in transit. You need to create a way to automate the detection of damaged packages and flag them for human review in real time while the packages are in transit. Which solution should you choose?
(A) Use BigQuery machine learning to be able to train the model at scale, so you can analyze the packages in batches.
(B) Train an AutoML model on your corpus of images, and build an API around that model to integrate with the package tracking applications.
(C) Use the Cloud Vision API to detect for damage, and raise an alert through Cloud Functions. Integrate the package tracking applications with this function.
(D) Use TensorFlow to create a model that is trained on your corpus of images. Create a Python notebook in Cloud Datalab that uses this model so you can analyze for damaged packages.
Answer is (B) Train an AutoML model on your corpus of images, and build an API around that model to integrate with the package tracking applications.
For this scenario, where you need to automate the detection of damaged packages in real time while they are in transit, the most suitable solution among the provided options would be B.
Here’s why this option is the most appropriate:
Real-Time Analysis: AutoML provides the capability to train a custom model specifically tailored to recognize patterns of damage in packages. This model can process images in real-time, which is essential in your scenario.
Integration with Existing Systems: By building an API around the AutoML model, you can seamlessly integrate this solution with your existing package tracking applications. This ensures that the system can flag damaged packages for human review efficiently.
Customization and Accuracy: Since the model is trained on your specific corpus of images, it can be more accurate in detecting damages relevant to your use case compared to pre-trained models.
Let’s briefly consider why the other options are less suitable:
A. Use BigQuery machine learning: BigQuery is great for handling large-scale data analytics but is not optimized for real-time image processing or complex image recognition tasks like damage detection on packages.
C. Use the Cloud Vision API: While the Cloud Vision API is powerful for general image analysis, it might not be as effective for the specific task of detecting damage on packages, which requires a more customized approach.
D. Use TensorFlow in Cloud Datalab: While this is a viable option for creating a custom model, it might be more complex and time-consuming compared to using AutoML. Additionally, setting up a real-time analysis system through a Python notebook might not be as straightforward as an API integration.
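As a rough illustration of the integration step, the sketch below scores a camera frame against a deployed AutoML image classification model from Python; the project ID, location, model ID, label name, and image path are placeholders, and the exact client surface may differ between library versions.

```python
from google.cloud import automl_v1

# Placeholder identifiers for illustration only.
PROJECT_ID = "my-project"
LOCATION = "us-central1"
MODEL_ID = "ICN1234567890123456789"

client = automl_v1.PredictionServiceClient()
model_name = f"projects/{PROJECT_ID}/locations/{LOCATION}/models/{MODEL_ID}"

# Read one frame captured from the delivery-line camera.
with open("package_frame.jpg", "rb") as f:
    payload = automl_v1.ExamplePayload(image=automl_v1.Image(image_bytes=f.read()))

response = client.predict(
    name=model_name,
    payload=payload,
    params={"score_threshold": "0.5"},  # only return labels above this confidence
)

for result in response.payload:
    # Assumed label name "damaged"; flag confident detections for human review.
    if result.display_name == "damaged" and result.classification.score > 0.8:
        print("Flag package for review, score:", result.classification.score)
```

The package tracking applications would call an API wrapping this prediction logic for each frame and route flagged packages into a human review queue.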
Question.40 You operate an IoT pipeline built around Apache Kafka that normally receives around 5000 messages per second. You want to use Google Cloud Platform to create an alert as soon as the moving average over 1 hour drops below 4000 messages per second. What should you do?
(A) Consume the stream of data in Cloud Dataflow using Kafka IO. Set a sliding time window of 1 hour every 5 minutes. Compute the average when the window closes, and send an alert if the average is less than 4000 messages.
(B) Consume the stream of data in Cloud Dataflow using Kafka IO. Set a fixed time window of 1 hour. Compute the average when the window closes, and send an alert if the average is less than 4000 messages.
(C) Use Kafka Connect to link your Kafka message queue to Cloud Pub/Sub. Use a Cloud Dataflow template to write your messages from Cloud Pub/Sub to Cloud Bigtable. Use Cloud Scheduler to run a script every hour that counts the number of rows created in Cloud Bigtable in the last hour. If that number falls below 4000, send an alert.
(D) Use Kafka Connect to link your Kafka message queue to Cloud Pub/Sub. Use a Cloud Dataflow template to write your messages from Cloud Pub/Sub to BigQuery. Use Cloud Scheduler to run a script every five minutes that counts the number of rows created in BigQuery in the last hour. If that number falls below 4000, send an alert.
Answer is (A) Consume the stream of data in Cloud Dataflow using Kafka IO. Set a sliding time window of 1 hour every 5 minutes. Compute the average when the window closes, and send an alert if the average is less than 4000 messages.
Dataflow's Kafka IO connector can consume the stream directly, regardless of where the Kafka cluster runs (on-premises, Google Cloud, or another cloud).
A sliding window of 1 hour that advances every 5 minutes produces a true moving average, so an alert can fire within minutes of the rate dropping below 4000 messages per second; a sketch of this pipeline follows.
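Below is a minimal Apache Beam (Python SDK) sketch of option A; the Kafka broker address, topic name, and alerting hook are assumptions, and a production pipeline would use proper event timestamps and a real notification channel (for example Cloud Monitoring or Pub/Sub) instead of a print statement.

```python
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import SlidingWindows
from apache_beam.transforms.combiners import CountCombineFn

def alert_if_low(avg_per_sec):
    # Hypothetical alerting hook; replace with your real notification channel.
    if avg_per_sec < 4000:
        print(f"ALERT: moving average dropped to {avg_per_sec:.0f} msg/sec")

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromKafka" >> ReadFromKafka(
            consumer_config={"bootstrap.servers": "kafka-broker:9092"},  # assumed broker
            topics=["iot-events"],                                       # assumed topic
        )
        | "SlidingWindow1hEvery5m" >> beam.WindowInto(SlidingWindows(size=3600, period=300))
        | "CountMessages" >> beam.CombineGlobally(CountCombineFn()).without_defaults()
        | "ToMessagesPerSecond" >> beam.Map(lambda count: count / 3600.0)
        | "MaybeAlert" >> beam.Map(alert_if_low)
    )
```

Each 1-hour window closes every 5 minutes, so the moving average is re-evaluated twelve times per hour rather than once, which is what lets the alert fire "as soon as" the rate drops.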