Question.31 A company is using Amazon SageMaker to create ML models. The company’s data scientists need fine-grained control of the ML workflows that they orchestrate. The data scientists also need the ability to visualize SageMaker jobs and workflows as a directed acyclic graph (DAG). The data scientists must keep a running history of model discovery experiments and must establish model governance for auditing and compliance verifications. Which solution will meet these requirements?
(A) Use AWS CodePipeline and its integration with SageMaker Studio to manage the entire ML workflows. Use SageMaker ML Lineage Tracking for the running history of experiments and for auditing and compliance verifications.
(B) Use AWS CodePipeline and its integration with SageMaker Experiments to manage the entire ML workflows. Use SageMaker Experiments for the running history of experiments and for auditing and compliance verifications.
(C) Use SageMaker Pipelines and its integration with SageMaker Studio to manage the entire ML workflows. Use SageMaker ML Lineage Tracking for the running history of experiments and for auditing and compliance verifications.
(D) Use SageMaker Pipelines and its integration with SageMaker Experiments to manage the entire ML workflows. Use SageMaker Experiments for the running history of experiments and for auditing and compliance verifications.
Correct Answer: C
SageMaker Pipelines provides a directed acyclic graph (DAG) view for managing and visualizing ML workflows with fine-grained control. It integrates seamlessly with SageMaker Studio, offering an intuitive interface for workflow orchestration.
SageMaker ML Lineage Tracking keeps a running history of experiments and tracks the lineage of datasets, models, and training jobs. This feature supports model governance, auditing, and compliance verification requirements.
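As a rough illustration only, the sketch below uses the SageMaker Python SDK to define a one-step pipeline whose DAG is rendered in SageMaker Studio; the role ARN, image URI, bucket path, and all names are placeholders, not values from the scenario.

```python
# Minimal sketch with the SageMaker Python SDK; every ARN, image URI,
# S3 path, and name below is a placeholder.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",  # placeholder
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

train_step = TrainingStep(
    name="TrainModel",
    estimator=estimator,
    inputs={"train": TrainingInput("s3://my-bucket/train/")},  # placeholder
)

# The pipeline definition is visualized as a DAG in SageMaker Studio, and
# each execution automatically records lineage entities (datasets, training
# jobs, models) that ML Lineage Tracking can later surface for audits.
pipeline = Pipeline(
    name="model-retraining-pipeline",
    steps=[train_step],
    sagemaker_session=session,
)
pipeline.upsert(role_arn=role)
pipeline.start()
```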
Question.32 A company is using an Amazon Redshift database as its single data source. Some of the data is sensitive. A data scientist needs to use some of the sensitive data from the database. An ML engineer must give the data scientist access to the data without transforming the source data and without storing anonymized data in the database. Which solution will meet these requirements with the LEAST implementation effort?
(A) Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time.
(B) Create a materialized view with masking logic on top of the database. Grant the necessary read permissions to the data scientist.
(C) Unload the Amazon Redshift data to Amazon S3. Use Amazon Athena to create schema-on-read with masking logic. Share the view with the data scientist.
(D) Unload the Amazon Redshift data to Amazon S3. Create an AWS Glue job to anonymize the data. Share the dataset with the data scientist.
Correct Answer: A
Dynamic data masking allows you to control how sensitive data is presented to users at query time, without modifying or storing transformed versions of the source data. Amazon Redshift supports dynamic data masking, which can be implemented with minimal effort. This solution ensures that the data scientist can access the required information while sensitive data remains protected, meeting the requirements efficiently and with the least implementation effort.
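For context, a minimal sketch of what such a policy could look like is shown below, with the SQL statements submitted through the Redshift Data API via boto3; the cluster, database, user, table, column, and database role names are all hypothetical.

```python
# Minimal sketch: create and attach a Redshift dynamic data masking policy.
# "my-cluster", "dev", "admin", the customers.ssn column, and
# data_scientist_role (an existing Redshift database role) are hypothetical.
import boto3

client = boto3.client("redshift-data")

statements = [
    # Mask the sensitive column at query time; the stored data is unchanged.
    """
    CREATE MASKING POLICY mask_ssn_full
    WITH (ssn VARCHAR(11))
    USING ('XXX-XX-XXXX'::TEXT);
    """,
    # Apply the policy only for the data scientist's database role.
    """
    ATTACH MASKING POLICY mask_ssn_full
    ON customers(ssn)
    TO ROLE data_scientist_role;
    """,
]

for sql in statements:
    client.execute_statement(
        ClusterIdentifier="my-cluster",  # hypothetical
        Database="dev",                  # hypothetical
        DbUser="admin",                  # hypothetical
        Sql=sql,
    )
```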
Question.33 An advertising company uses AWS Lake Formation to manage a data lake. The data lake contains structured data and unstructured data. The company’s ML engineers are assigned to specific advertisement campaigns. The ML engineers must interact with the data through Amazon Athena and by browsing the data directly in an Amazon S3 bucket. The ML engineers must have access to only the resources that are specific to their assigned advertisement campaigns. Which solution will meet these requirements in the MOST operationally efficient way?
(A) Configure IAM policies on an AWS Glue Data Catalog to restrict access to Athena based on the ML engineers’ campaigns.
(B) Store users and campaign information in an Amazon DynamoDB table. Configure DynamoDB Streams to invoke an AWS Lambda function to update S3 bucket policies.
(C) Use Lake Formation to authorize AWS Glue to access the S3 bucket. Configure Lake Formation tags to map ML engineers to their campaigns.
(D) Configure S3 bucket policies to restrict access to the S3 bucket based on the ML engineers’ campaigns.
Correct Answer: C
AWS Lake Formation provides fine-grained access control and simplifies data governance for data lakes. By configuring Lake Formation tags to map ML engineers to their specific campaigns, you can restrict access to both structured and unstructured data in the data lake. This method is operationally efficient, as it centralizes access control management within Lake Formation and ensures consistency across Amazon Athena and S3 bucket access without requiring manual updates to policies or DynamoDB-based custom logic.
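A rough sketch of tag-based access control with boto3 follows; the tag key "campaign", its values, the database and table names, the IAM role ARN, and the specific grants are illustrative assumptions, not details from the scenario.

```python
# Minimal sketch of LF-Tag based access control with boto3, assuming a
# data lake administrator context; all names, values, and ARNs are
# illustrative placeholders.
import boto3

lf = boto3.client("lakeformation")

# Define an LF-Tag that models advertisement campaigns.
lf.create_lf_tag(TagKey="campaign", TagValues=["spring_sale", "holiday_push"])

# Tag a Data Catalog table that belongs to one campaign.
lf.add_lf_tags_to_resource(
    Resource={"Table": {"DatabaseName": "ad_data", "Name": "spring_sale_clicks"}},
    LFTags=[{"TagKey": "campaign", "TagValues": ["spring_sale"]}],
)

# Grant an ML engineer's role access to everything tagged with their campaign.
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/MLEngineerSpringSale"
    },
    Resource={
        "LFTagPolicy": {
            "ResourceType": "TABLE",
            "Expression": [{"TagKey": "campaign", "TagValues": ["spring_sale"]}],
        }
    },
    Permissions=["SELECT", "DESCRIBE"],
)
```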
Question.34 A company regularly receives new training data from the vendor of an ML model. The vendor delivers cleaned and prepared data to the company’s Amazon S3 bucket every 3-4 days. The company has an Amazon SageMaker pipeline to retrain the model. An ML engineer needs to implement a solution to run the pipeline when new data is uploaded to the S3 bucket. Which solution will meet these requirements with the LEAST operational effort?
(A) Create an S3 Lifecycle rule to transfer the data to the SageMaker training instance and to initiate training.
(B) Create an AWS Lambda function that scans the S3 bucket. Program the Lambda function to initiate the pipeline when new data is uploaded.
(C) Create an Amazon EventBridge rule that has an event pattern that matches the S3 upload. Configure the pipeline as the target of the rule.
(D) Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the pipeline when new data is uploaded.
Correct Answer: C
Using Amazon EventBridge with an event pattern that matches S3 upload events provides an automated, low-effort solution. When new data is uploaded to the S3 bucket, the EventBridge rule triggers the SageMaker pipeline. This approach minimizes operational overhead by eliminating the need for custom scripts or external orchestration tools while seamlessly integrating with the existing S3 and SageMaker setup.
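The sketch below shows one way this could be wired up with boto3, assuming S3 event notifications to EventBridge are enabled on the bucket; the bucket name, rule name, pipeline ARN, and IAM role ARN are placeholders.

```python
# Minimal sketch: an EventBridge rule that starts a SageMaker pipeline
# whenever an object lands in the vendor's bucket. All names and ARNs
# are placeholders; the bucket must have EventBridge notifications enabled.
import json
import boto3

events = boto3.client("events")

# Match "Object Created" events for the vendor's bucket.
events.put_rule(
    Name="start-retraining-on-upload",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": ["vendor-training-data-bucket"]}},
    }),
    State="ENABLED",
)

# Point the rule directly at the SageMaker pipeline (no Lambda needed).
events.put_targets(
    Rule="start-retraining-on-upload",
    Targets=[
        {
            "Id": "retraining-pipeline",
            "Arn": "arn:aws:sagemaker:us-east-1:123456789012:pipeline/retraining-pipeline",
            "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeSageMakerRole",
        }
    ],
)
```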
Question.35 An ML engineer normalized training data by using min-max normalization in AWS Glue DataBrew. The ML engineer must normalize the production inference data in the same way as the training data before passing the production inference data to the model for predictions. Which solution will meet this requirement?
(A) Apply statistics from a well-known dataset to normalize the production samples.
(B) Keep the min-max normalization statistics from the training set. Use these values to normalize the production samples.
(C) Calculate a new set of min-max normalization statistics from a batch of production samples. Use these values to normalize all the production samples.
(D) Calculate a new set of min-max normalization statistics from each production sample. Use these values to normalize all the production samples.
Correct Answer: B
To ensure consistency between training and inference, the min-max normalization statistics (min and max values) calculated during training must be retained and applied to normalize production inference data. Using the same statistics ensures that the model receives data in the same scale and distribution as it did during training, avoiding discrepancies that could degrade model performance. Calculating new statistics from production data would lead to inconsistent normalization and affect predictions.
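A minimal NumPy sketch of the idea, with illustrative arrays standing in for the training and production data (the statistics here mirror what DataBrew would have computed during preparation):

```python
# Minimal sketch: min-max statistics are computed once from the training
# set and reused, unchanged, on production inference samples.
import numpy as np

train = np.array([[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]])  # illustrative

# Statistics learned from the training set only.
train_min = train.min(axis=0)
train_max = train.max(axis=0)

def min_max_normalize(x, x_min, x_max):
    """Scale features to [0, 1] using fixed training-set statistics."""
    return (x - x_min) / (x_max - x_min)

# Production samples are scaled with the *training* statistics,
# never with statistics recomputed from production data.
production = np.array([[15.0, 500.0]])  # illustrative
print(min_max_normalize(production, train_min, train_max))
```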