Question.21 Case study An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3. The dataset has a class imbalance that affects the learning of the model’s algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data. The ML engineer needs to use an Amazon SageMaker built-in algorithm to train the model. Which algorithm should the ML engineer use to meet this requirement? (A) LightGBM (B) Linear learner (C) K-means clustering (D) Neural Topic Model (NTM)
Correct Answer: B
Why Linear Learner?
* SageMaker’s Linear Learner algorithm is well suited for binary classification problems such as fraud detection. It handles class imbalance effectively through built-in options for weighting the classes.
* Linear Learner can capture patterns in the data while being computationally efficient.
Key Features of Linear Learner:
* Can automatically balance the weights of the minority and majority classes.
* Supports both classification and regression tasks.
* Handles interdependencies among features effectively through gradient optimization.
Steps to Implement:
* Use the SageMaker Python SDK to set up a training job with the Linear Learner built-in algorithm (see the sketch after this list).
* Configure the hyperparameters to enable balanced class weights.
* Train the model with the balanced dataset created using SageMaker Data Wrangler.
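A minimal sketch of such a training job follows, assuming an illustrative IAM role, placeholder S3 paths, and a generic instance type. The positive_example_weight_mult="balanced" hyperparameter is the Linear Learner option that counteracts the class imbalance.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

# Placeholder role and buckets; substitute real resources before running.
session = sagemaker.Session()
region = session.boto_region_name
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

# Resolve the built-in Linear Learner container image for the current region.
image_uri = sagemaker.image_uris.retrieve("linear-learner", region)

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-bucket/fraud-model/output",  # hypothetical bucket
    sagemaker_session=session,
)

# binary_classifier matches the fraud / legitimate use case, and
# positive_example_weight_mult="balanced" up-weights the minority (fraud)
# class to counter the class imbalance.
estimator.set_hyperparameters(
    predictor_type="binary_classifier",
    positive_example_weight_mult="balanced",
)

estimator.fit(
    {"train": TrainingInput("s3://example-bucket/fraud-model/train/", content_type="text/csv")}
)
```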
Question.22 An ML engineer needs to use AWS services to identify and extract meaningful unique keywords from documents. Which solution will meet these requirements with the LEAST operational overhead? (A) Use the Natural Language Toolkit (NLTK) library on Amazon EC2 instances for text pre-processing. Use the Latent Dirichlet Allocation (LDA) algorithm to identify and extract relevant keywords. (B) Use Amazon SageMaker and the BlazingText algorithm. Apply custom pre-processing steps for stemming and removal of stop words. Calculate term frequency-inverse document frequency (TF-IDF) scores to identify and extract relevant keywords. (C) Store the documents in an Amazon S3 bucket. Create AWS Lambda functions to process the documents and to run Python scripts for stemming and removal of stop words. Use bigram and trigram techniques to identify and extract relevant keywords. (D) Use Amazon Comprehend custom entity recognition and key phrase extraction to identify and extract relevant keywords.
Correct Answer: D
Amazon Comprehend provides pre-built functionality for key phrase extraction and can identify meaningful keywords from documents with minimal setup or operational overhead. It eliminates the need for manual preprocessing, stemming, or stop-word removal and does not require custom model development or infrastructure management. This makes it the most efficient and low-maintenance solution for the task.
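To illustrate how little setup this involves, the sketch below calls the Comprehend detect_key_phrases API directly through boto3; the region, sample text, and printed fields are placeholders, and batch or asynchronous APIs would be used for larger document sets.

```python
import boto3

# Placeholder region and document text.
comprehend = boto3.client("comprehend", region_name="us-east-1")

document = "Example transaction dispute letter mentioning unauthorized charges."

# Single synchronous call; no preprocessing, stemming, or stop-word removal needed.
response = comprehend.detect_key_phrases(Text=document, LanguageCode="en")

# Each key phrase is returned with a confidence score and character offsets.
for phrase in response["KeyPhrases"]:
    print(f'{phrase["Text"]} (score={phrase["Score"]:.2f})')
```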
Question.23 A company is planning to create several ML prediction models. The training data is stored in Amazon S3. The entire dataset is more than 5 TB in size and consists of CSV, JSON, Apache Parquet, and simple text files. The data must be processed in several consecutive steps. The steps include complex manipulations that can take hours to finish running. Some of the processing involves natural language processing (NLP) transformations. The entire process must be automated. Which solution will meet these requirements? (A) Process data at each step by using Amazon SageMaker Data Wrangler. Automate the process by using Data Wrangler jobs. (B) Use Amazon SageMaker notebooks for each data processing step. Automate the process by using Amazon EventBridge. (C) Process data at each step by using AWS Lambda functions. Automate the process by using AWS Step Functions and Amazon EventBridge. (D) Use Amazon SageMaker Pipelines to create a pipeline of data processing steps. Automate the pipeline by using Amazon EventBridge.
Correct Answer: D
Amazon SageMaker Pipelines is designed for creating, automating, and managing end-to-end ML workflows, including complex data preprocessing tasks. It supports handling large datasets and can integrate with custom steps, such as NLP transformations. By combining SageMaker Pipelines with Amazon EventBridge, the entire workflow can be triggered and automated efficiently, meeting the requirements for scalability, automation, and processing complexity.
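The sketch below outlines what such a pipeline could look like, assuming a hypothetical IAM role, S3 bucket, and processing scripts. The two ProcessingStep objects stand in for the consecutive manipulation and NLP steps; once registered, the pipeline can be started on a schedule or event by Amazon EventBridge.

```python
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

# Hypothetical role, bucket, and script names; the NLP step would run a custom
# script (for example, one that applies tokenization or other text transforms).
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"

processor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,
    instance_type="ml.m5.4xlarge",
    instance_count=2,
)

clean_step = ProcessingStep(
    name="CleanAndConvert",
    processor=processor,
    code="clean_and_convert.py",  # hypothetical script
    inputs=[ProcessingInput(source="s3://example-bucket/raw/", destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output", destination="s3://example-bucket/clean/")],
)

nlp_step = ProcessingStep(
    name="NlpTransforms",
    processor=processor,
    code="nlp_transforms.py",  # hypothetical script
    inputs=[ProcessingInput(source="s3://example-bucket/clean/", destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output", destination="s3://example-bucket/features/")],
)
nlp_step.add_depends_on([clean_step])  # enforce the consecutive ordering

pipeline = Pipeline(name="prediction-data-prep", steps=[clean_step, nlp_step])
pipeline.upsert(role_arn=role)  # register the pipeline; EventBridge can then trigger executions
```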
Question.24 A company wants to improve the sustainability of its ML operations. Which actions will reduce the energy usage and computational resources that are associated with the company’s training jobs? (Choose two.) (A) Use Amazon SageMaker Debugger to stop training jobs when non-converging conditions are detected. (B) Use Amazon SageMaker Ground Truth for data labeling. (C) Deploy models by using AWS Lambda functions. (D) Use AWS Trainium instances for training. (E) Use PyTorch or TensorFlow with the distributed training option.
Correct Answer: A, D
SageMaker Debugger can identify when a training job is not converging or is stuck in a non-productive state. By stopping such jobs early, unnecessary energy and computational resources are conserved, which improves sustainability.
AWS Trainium instances are purpose-built for ML training and are optimized for energy efficiency and cost-effectiveness. They use less energy per training task than general-purpose instances, making them a sustainable choice.
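One way to wire these two choices together with the SageMaker Python SDK is sketched below, assuming a hypothetical training script, role, and framework version. The built-in loss_not_decreasing rule is paired with the StopTraining action, and the comment marks where a Trainium instance type would be substituted.

```python
from sagemaker.debugger import Rule, rule_configs
from sagemaker.pytorch import PyTorch

# Hypothetical role and training script.
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"

# Stop the training job automatically when the loss stops decreasing,
# so a non-converging job does not keep consuming compute.
stop_when_stuck = Rule.sagemaker(
    rule_configs.loss_not_decreasing(),
    actions=rule_configs.ActionList(rule_configs.StopTraining()),
)

estimator = PyTorch(
    entry_point="train.py",           # hypothetical training script
    role=role,
    framework_version="1.13.1",
    py_version="py39",
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    # For option D, a Trainium instance type such as "ml.trn1.2xlarge"
    # (with a Neuron-compatible container) would replace the instance_type above.
    rules=[stop_when_stuck],
)

estimator.fit("s3://example-bucket/training-data/")
```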
Question.25 An ML engineer is developing a fraud detection model by using the Amazon SageMaker XGBoost algorithm. The model classifies transactions as either fraudulent or legitimate. During testing, the model excels at identifying fraud in the training dataset. However, the model is inefficient at identifying fraud in new and unseen transactions. What should the ML engineer do to improve the fraud detection for new transactions? (A) Increase the learning rate. (B) Remove some irrelevant features from the training dataset. (C) Increase the value of the max_depth hyperparameter. (D) Decrease the value of the max_depth hyperparameter.
Correct Answer: D
A high max_depth value in XGBoost can lead to overfitting, where the model learns the training dataset too well but fails to generalize to new and unseen data. By decreasing the max_depth, the model becomes less complex, reducing overfitting and improving its ability to detect fraud in new transactions. This adjustment helps the model focus on general patterns rather than memorizing specific details in the training data.
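A sketch of how the adjustment might look with the SageMaker built-in XGBoost container is shown below; the S3 paths, role, and specific hyperparameter values are illustrative only.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

# Placeholder role and buckets; the values of num_round and max_depth are
# examples only and would normally be tuned against a validation set.
session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")

xgb = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-bucket/fraud-xgb/output",
    sagemaker_session=session,
)

xgb.set_hyperparameters(
    objective="binary:logistic",
    num_round=200,
    max_depth=4,        # shallower trees reduce overfitting and generalize better
    eval_metric="auc",
)

xgb.fit({
    "train": TrainingInput("s3://example-bucket/fraud-xgb/train/", content_type="text/csv"),
    "validation": TrainingInput("s3://example-bucket/fraud-xgb/validation/", content_type="text/csv"),
})
```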