Question.16 You are designing an Azure Databricks interactive cluster. The cluster will be used infrequently and will be configured for auto-termination. You need to ensure that the cluster configuration is retained indefinitely after the cluster is terminated. The solution must minimize costs. What should you do? A. Pin the cluster. B. Create an Azure runbook that starts the cluster every 90 days. C. Terminate the cluster manually when processing completes. D. Clone the cluster after it is terminated.
Answer: A
Explanation:
Azure Databricks retains cluster configuration information for up to 70 all-purpose clusters terminated in the last 30 days and
up to 30 job clusters recently terminated by the job scheduler. To keep an all-purpose cluster configuration even after it has
been terminated for more than 30 days, an administrator can pin the cluster to the cluster list.
Reference:
https://docs.microsoft.com/en-us/azure/databricks/clusters/
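For reference, a cluster can also be pinned programmatically. The sketch below is a minimal example assuming the Databricks Clusters REST API pin endpoint (POST /api/2.0/clusters/pin); the workspace URL, personal access token, and cluster ID are placeholders rather than values from the question, and pinning from the workspace UI achieves the same result.

```python
# Minimal sketch: pin an existing all-purpose cluster so its configuration is retained
# after termination. Host, token, and cluster ID below are placeholders.
import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
DATABRICKS_TOKEN = "<personal-access-token>"                            # placeholder PAT
CLUSTER_ID = "<cluster-id>"                                             # placeholder cluster ID

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/pin",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json={"cluster_id": CLUSTER_ID},
)
resp.raise_for_status()
print("Pinned cluster", CLUSTER_ID)
```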
Question.17 You are monitoring an Azure Stream Analytics job. The Backlogged Input Events count has been 20 for the last hour. You need to reduce the Backlogged Input Events count. What should you do? A. Drop late arriving events from the job. B. Add an Azure Storage account to the job. C. Increase the streaming units for the job. D. Stop the job.
Answer: C
Explanation:
General symptoms of the job hitting system resource limits include the following: if the backlogged event metric keeps increasing,
it is an indicator that the system resource is constrained (either because of output sink throttling or high CPU).
Note: Backlogged Input Events: Number of input events that are backlogged. A non-zero value for this metric implies that
your job isn’t able to keep up with the number of incoming events. If this value is slowly increasing or consistently non-zero,
you should scale out your job: adjust Streaming Units.
Reference: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-scale-jobs
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-monitoring
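As an illustration of option C, streaming units can be raised on the job's transformation through Azure Resource Manager. The sketch below is a hedged example, not the only way to scale: the resource names, the default transformation name, the api-version, and the target SU count are assumptions and placeholders, not values from the question.

```python
# Minimal sketch: scale out a Stream Analytics job by raising streamingUnits on its
# transformation via the ARM REST API. All names, the api-version, and the SU count
# are placeholders/assumptions.
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
JOB_NAME = "<stream-analytics-job>"
TRANSFORMATION = "Transformation"   # assumed default transformation name
API_VERSION = "2020-03-01"          # assumed Stream Analytics api-version

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.StreamAnalytics"
    f"/streamingjobs/{JOB_NAME}/transformations/{TRANSFORMATION}"
)

resp = requests.patch(
    url,
    params={"api-version": API_VERSION},
    headers={"Authorization": f"Bearer {token}"},
    json={"properties": {"streamingUnits": 12}},  # example: scale out to 12 SUs
)
resp.raise_for_status()
print(resp.json()["properties"]["streamingUnits"])
```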
Question.18 You have an Azure data factory. You need to examine the pipeline failures from the last 60 days. What should you use? A. the Activity log blade for the Data Factory resource B. the Monitor & Manage app in Data Factory C. the Resource health blade for the Data Factory resource D. Azure Monitor
Answer: D
Explanation:
Data Factory stores pipeline-run data for only 45 days. Use Azure Monitor if you want to keep that data for a longer time.
Reference: https://docs.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor
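Once diagnostic settings route pipeline-run logs to a Log Analytics workspace, Azure Monitor keeps them beyond the 45-day limit and they can be queried for the full 60-day window. The sketch below is a minimal example assuming resource-specific logging (the ADFPipelineRun table); the workspace ID is a placeholder and the queried column names are assumptions.

```python
# Minimal sketch: query Azure Monitor logs for Data Factory pipeline failures from the
# last 60 days. Assumes diagnostics are sent to Log Analytics in resource-specific mode.
import requests
from azure.identity import DefaultAzureCredential

WORKSPACE_ID = "<log-analytics-workspace-id>"  # placeholder
KQL = """
ADFPipelineRun
| where TimeGenerated > ago(60d)
| where Status == 'Failed'
| project TimeGenerated, PipelineName, RunId, Status
"""

token = DefaultAzureCredential().get_token("https://api.loganalytics.io/.default").token
resp = requests.post(
    f"https://api.loganalytics.io/v1/workspaces/{WORKSPACE_ID}/query",
    headers={"Authorization": f"Bearer {token}"},
    json={"query": KQL},
)
resp.raise_for_status()
for table in resp.json()["tables"]:
    for row in table["rows"]:
        print(row)
```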
Question.19 You create an Azure Databricks cluster and specify an additional library to install. When you attempt to load the library to a notebook, the library is not found. You need to identify the cause of the issue. What should you review? A. notebook logs B. cluster event logs C. global init scripts logs D. workspace logs
Answer: C
Explanation:
Cluster-scoped Init Scripts: Init scripts are shell scripts that run during the startup of each cluster node before the Spark
driver or worker JVM starts. Databricks customers use init scripts for various purposes such as installing custom libraries,
launching background processes, or applying enterprise security policies.
Logs for Cluster-scoped init scripts are now more consistent with Cluster Log Delivery and can be found in the same root
folder as driver and executor logs for the cluster.
Reference: https://databricks.com/blog/2018/08/30/introducing-cluster-scoped-init-scripts.html
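If cluster log delivery is configured, the init script output can be inspected directly from a notebook. The sketch below is a minimal example for a Databricks notebook; the dbfs:/cluster-logs destination and the init_scripts subfolder layout follow the blog post referenced above and are assumed placeholders, not values from the question.

```python
# Minimal sketch (Databricks notebook): list the init script logs delivered next to the
# driver and executor logs. The dbfs:/cluster-logs destination is an assumed example value.
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")  # current cluster ID
log_root = f"dbfs:/cluster-logs/{cluster_id}/init_scripts/"                 # assumed log destination

# Each node writes stdout/stderr for every init script; inspect these to see why a
# library install failed.
for entry in dbutils.fs.ls(log_root):
    print(entry.path)
```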
Question.20 You have an Azure Synapse Analytics dedicated SQL pool that contains a large fact table. The table contains 50 columns and 5 billion rows and is a heap. Most queries against the table aggregate values from approximately 100 million rows and return only two columns. You discover that the queries against the fact table are very slow. Which type of index should you add to provide the fastest query times? A. nonclustered columnstore B. clustered columnstore C. nonclustered D. clustered
Answer: B
Explanation:
Clustered columnstore indexes are one of the most efficient ways you can store your data in dedicated SQL pool.
Columnstore tables won’t benefit a query unless the table has more than 60 million rows.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool
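A common way to get from the existing heap to a clustered columnstore table in a dedicated SQL pool is CTAS followed by a rename. The sketch below is a minimal, hypothetical example: the connection string, table names, and distribution column are placeholders, not objects from the question.

```python
# Minimal sketch: rebuild a heap fact table as a clustered columnstore table with CTAS,
# then swap names. All object names and the connection string are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;DATABASE=<dedicated-sql-pool>;"
    "UID=<user>;PWD=<password>"
)
conn.autocommit = True  # run each DDL statement outside an explicit transaction

ctas = """
CREATE TABLE dbo.FactSales_cci
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH(SaleKey)
)
AS
SELECT * FROM dbo.FactSales;
"""
cur = conn.cursor()
cur.execute(ctas)

# Swap the new columnstore table in place of the old heap.
cur.execute("RENAME OBJECT dbo.FactSales TO FactSales_heap;")
cur.execute("RENAME OBJECT dbo.FactSales_cci TO FactSales;")
```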