Question.46 A company uses sensors on devices such as motor engines and factory machines to measure parameters such as temperature and pressure. The company wants to use the sensor data to predict equipment malfunctions and reduce service outages. The machine learning (ML) specialist needs to gather the sensor data to train a model to predict device malfunctions. The ML specialist must ensure that the data does not contain outliers before training the model. How can the ML specialist meet these requirements with the LEAST operational overhead?
(A) Load the data into an Amazon SageMaker Studio notebook. Calculate the first and third quartiles. Use a SageMaker Data Wrangler data flow to remove only values that are outside of those quartiles.
(B) Use an Amazon SageMaker Data Wrangler bias report to find outliers in the dataset. Use a Data Wrangler data flow to remove outliers based on the bias report.
(C) Use an Amazon SageMaker Data Wrangler anomaly detection visualization to find outliers in the dataset. Add a transformation to a Data Wrangler data flow to remove outliers.
(D) Use Amazon Lookout for Equipment to find and remove outliers from the dataset.
Correct Answer: C
Amazon SageMaker Data Wrangler is a tool that helps data scientists and ML developers prepare data for ML. One of its features is the anomaly detection visualization, which uses an unsupervised ML algorithm to identify outliers in the dataset based on statistical properties. The ML specialist can use this feature to quickly explore the sensor data and find anomalous values that could degrade model performance, and then add a transformation to a Data Wrangler data flow to remove those outliers. The data flow can be exported as a script or a pipeline to automate the data preparation process. This option requires the least operational overhead. Option A requires manual calculation in a notebook, and removing every value outside the first and third quartiles would discard about half of the data rather than only the outliers. A bias report (option B) measures bias, not outliers. Amazon Lookout for Equipment (option D) detects abnormal equipment behavior; it is not a tool for cleaning a training dataset.
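For intuition about what a statistical outlier rule does, and why cutting everything outside the quartiles (option A) is too aggressive, here is a minimal standard-library Python sketch of Tukey's 1.5 × IQR fences. This rule is an illustrative choice for the example, not Data Wrangler's internal implementation; the function name `remove_iqr_outliers` and the sample readings are hypothetical.

```python
from statistics import quantiles


def remove_iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Keep values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Unlike option A, values between the fences and the quartiles are kept;
    # only points far beyond the typical spread are dropped.
    return [v for v in values if low <= v <= high]


# Hypothetical temperature readings with one faulty spike.
readings = [70.1, 71.4, 69.8, 70.6, 72.0, 70.9, 71.2, 150.0]
cleaned = remove_iqr_outliers(readings)  # 150.0 falls outside the fences and is dropped
print(cleaned)
```

Data Wrangler applies this kind of logic through its visual transforms, which is what keeps the operational overhead low compared with writing and maintaining such code in a notebook.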
References:
* Amazon SageMaker Data Wrangler – Amazon Web Services (AWS)
* Anomaly Detection Visualization – Amazon SageMaker
* Transform Data – Amazon SageMaker
Question.47 A company that promotes healthy sleep patterns by providing cloud-connected devices currently hosts a sleep tracking application on AWS. The application collects device usage information from device users. The company's Data Science team is building a machine learning model to predict if and when a user will stop utilizing the company's devices. Predictions from this model are used by a downstream application that determines the best approach for contacting users. The Data Science team is building multiple versions of the machine learning model to evaluate each version against the company's business goals. To measure long-term effectiveness, the team wants to run multiple versions of the model in parallel for long periods of time, with the ability to control the portion of inferences served by the models. Which solution satisfies these requirements with MINIMAL effort?
(A) Build and host multiple models in Amazon SageMaker. Create multiple Amazon SageMaker endpoints, one for each model. Programmatically control invoking different models for inference at the application layer.
(B) Build and host multiple models in Amazon SageMaker. Create an Amazon SageMaker endpoint configuration with multiple production variants. Programmatically control the portion of the inferences served by the multiple models by updating the endpoint configuration.
(C) Build and host multiple models in Amazon SageMaker Neo to take into account different types of medical devices. Programmatically control which model is invoked for inference based on the medical device type.
(D) Build and host multiple models in Amazon SageMaker. Create a single endpoint that accesses multiple models. Use Amazon SageMaker batch transform to control invoking the different models through the single endpoint.
Correct Answer: B
Amazon SageMaker is a service that lets users build, train, and deploy ML models on AWS. Amazon SageMaker endpoints are scalable, secure web services for real-time inference on ML models. An endpoint configuration defines which models are deployed and which resources the endpoint uses. An endpoint configuration can contain multiple production variants, each representing a different model or model version. The portion of inferences served by each production variant is set with the InitialVariantWeight parameter: each variant receives traffic in proportion to its weight divided by the sum of all variant weights. The weights can later be changed programmatically on the running endpoint, without redeploying it, by calling the UpdateEndpointWeightsAndCapacities API. Therefore, option B is the best solution to satisfy the requirements with minimal effort.
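The weighting scheme described above can be sketched in Python as request builders for the SageMaker API. This is a minimal sketch under stated assumptions: the variant names, model names, endpoint name, and the `ml.m5.large` instance type are hypothetical placeholders, and only `shift_traffic` actually calls AWS (through boto3's `update_endpoint_weights_and_capacities`), so the other helpers can be exercised offline.

```python
from typing import Dict, List


def production_variants(models: Dict[str, str], weights: Dict[str, float],
                        instance_type: str = "ml.m5.large") -> List[dict]:
    """Build the ProductionVariants list for CreateEndpointConfig, one entry per model."""
    return [
        {
            "VariantName": variant,
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
            "InitialVariantWeight": float(weights[variant]),
        }
        for variant, model_name in models.items()
    ]


def traffic_shares(weights: Dict[str, float]) -> Dict[str, float]:
    """Fraction of requests each variant receives: its weight over the sum of all weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {variant: w / total for variant, w in weights.items()}


def weight_update_request(endpoint_name: str, weights: Dict[str, float]) -> dict:
    """Arguments for UpdateEndpointWeightsAndCapacities on a running endpoint."""
    return {
        "EndpointName": endpoint_name,
        "DesiredWeightsAndCapacities": [
            {"VariantName": variant, "DesiredWeight": float(w)}
            for variant, w in weights.items()
        ],
    }


def shift_traffic(endpoint_name: str, weights: Dict[str, float]) -> dict:
    """Apply new weights; requires AWS credentials and an existing endpoint."""
    import boto3  # imported lazily so the pure helpers run without boto3 installed

    client = boto3.client("sagemaker")
    return client.update_endpoint_weights_and_capacities(
        **weight_update_request(endpoint_name, weights)
    )


# Hypothetical setup: two churn-model versions, 75% / 25% split.
models = {"churn-v1": "churn-model-v1", "churn-v2": "churn-model-v2"}
weights = {"churn-v1": 3, "churn-v2": 1}
print(traffic_shares(weights))  # {'churn-v1': 0.75, 'churn-v2': 0.25}
```

Because the weights live on the endpoint rather than in application code, the team can rebalance traffic over a long evaluation period with a single API call.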
Option A is incorrect because creating a separate endpoint for each model incurs more cost and complexity than a single endpoint with multiple production variants. Moreover, routing requests to different models at the application layer requires custom logic and coordination that the UpdateEndpointWeightsAndCapacities API already provides. Option C is incorrect because Amazon SageMaker Neo optimizes ML models for specific hardware platforms, such as edge devices. It does not address running multiple versions of a model in parallel for long periods of time.
Option D is incorrect because Amazon SageMaker batch transform performs offline inference on large datasets and does not serve requests through a persistent endpoint. It is not suitable for serving real-time predictions from ongoing device usage data.
References:
* Deploying models to Amazon SageMaker hosting services – Amazon SageMaker
* Update an Amazon SageMaker endpoint to accommodate new models – Amazon SageMaker
* UpdateEndpointWeightsAndCapacities – Amazon SageMaker
Question.48 An ecommerce company has observed that customers who use the company's website rarely view items that the website recommends to customers. The company wants to recommend items to customers that customers are more likely to want to purchase. Which solution will meet this requirement in the SHORTEST amount of time?
(A) Host the company's website on Amazon EC2 Accelerated Computing instances to increase the website response speed.
(B) Host the company's website on Amazon EC2 GPU-based instances to increase the speed of the website's search tool.
(C) Integrate Amazon Personalize into the company's website to provide customers with personalized recommendations.
(D) Use Amazon SageMaker to train a Neural Collaborative Filtering (NCF) model to make product recommendations.
Correct Answer: C
Amazon Personalize is a managed AWS service specifically designed to deliver personalized recommendations with minimal development time. It uses machine learning algorithms tailored for recommendation systems, making it highly suitable for applications where quick integration is essential. By using Amazon Personalize, the company can leverage existing customer data to generate real-time, personalized product recommendations that align better with customer preferences, enhancing the likelihood of customer engagement with recommended items.
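To show how small the website-side integration can be once a Personalize campaign exists, here is a minimal Python sketch. It assumes a trained campaign is already deployed; the campaign ARN and user ID are hypothetical, and the call uses boto3's `personalize-runtime` client and its `get_recommendations` operation. The parsing helper is separated out so it can be checked without AWS access.

```python
from typing import Any, Dict, List


def top_item_ids(response: Dict[str, Any]) -> List[str]:
    """Extract recommended item IDs, in ranked order, from a GetRecommendations response."""
    return [item["itemId"] for item in response.get("itemList", [])]


def recommend_for_user(campaign_arn: str, user_id: str, num_results: int = 10) -> List[str]:
    """Fetch recommendations for one user; requires AWS credentials and a live campaign."""
    import boto3  # imported lazily so top_item_ids works without boto3 installed

    runtime = boto3.client("personalize-runtime")
    response = runtime.get_recommendations(
        campaignArn=campaign_arn, userId=user_id, numResults=num_results
    )
    return top_item_ids(response)


# Offline check with a response shaped like the real API output (values are made up).
sample = {"itemList": [{"itemId": "sku-123", "score": 0.41}, {"itemId": "sku-987", "score": 0.22}]}
print(top_item_ids(sample))  # ['sku-123', 'sku-987']
```

The website only renders the returned item IDs; model training, retraining, and hosting stay inside the managed service.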
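Options A and B, which host the website on accelerated computing or GPU-based EC2 instances, improve computational performance but do nothing to make the recommendations more relevant. Option D, training a Neural Collaborative Filtering model in Amazon SageMaker, could produce relevant recommendations but requires considerably more development effort (data preparation, training, tuning, and deployment) than integrating a managed service.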
Question.49 A company wants to use machine learning (ML) to improve its customer churn prediction model. The company stores data in an Amazon Redshift data warehouse. A data science team wants to use Amazon Redshift machine learning (Amazon Redshift ML) to build a model and run predictions for new data directly within the data warehouse. Which combination of steps should the company take to use Amazon Redshift ML to meet these requirements? (Select THREE.)
(A) Define the feature variables and target variable for the churn prediction model.
(B) Use the SQL EXPLAIN_MODEL function to run predictions.
(C) Write a CREATE MODEL SQL statement to create a model.
(D) Use Amazon Redshift Spectrum to train the model.
(E) Manually export the training data to Amazon S3.
(F) Use the SQL prediction function to run predictions.
Correct Answer: A,C,F
Amazon Redshift ML lets users create ML models and run predictions with SQL inside the data warehouse, so data scientists can train models without manually exporting data.
To create and run a model for customer churn prediction in Amazon Redshift ML:
* Define the feature variables and target variable: Identify the columns to use as features (predictors) and the target variable (outcome) for the churn prediction model.
* Create the model: Write a CREATE MODEL SQL statement. Redshift ML exports the training data to Amazon S3, trains the model through its integration with Amazon SageMaker, and makes the trained model available in Redshift as a SQL prediction function.
* Run predictions: Call the prediction function named in the FUNCTION clause of the CREATE MODEL statement in a SQL query to generate predictions on new data directly within Redshift.
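The three steps above can be sketched as SQL text generated from Python, ready to submit through any Redshift client. This is a minimal sketch under stated assumptions: the table, column, model, function, and bucket names are hypothetical; it uses the basic form of CREATE MODEL (FROM, TARGET, FUNCTION, IAM_ROLE, SETTINGS with S3_BUCKET); and identifiers are inserted without escaping, so they must come from trusted code, not user input.

```python
from typing import List


def create_model_sql(model_name: str, source_table: str, features: List[str],
                     target: str, function_name: str, s3_bucket: str,
                     iam_role: str = "default") -> str:
    """Step A + C: features and target feed a CREATE MODEL statement."""
    columns = ", ".join(features + [target])
    # IAM_ROLE accepts the keyword default or a quoted role ARN.
    role = "default" if iam_role == "default" else f"'{iam_role}'"
    return (
        f"CREATE MODEL {model_name}\n"
        f"FROM (SELECT {columns} FROM {source_table})\n"
        f"TARGET {target}\n"
        f"FUNCTION {function_name}\n"
        f"IAM_ROLE {role}\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


def predict_sql(function_name: str, features: List[str], table: str, key_column: str) -> str:
    """Step F: call the generated prediction function on new rows."""
    args = ", ".join(features)
    return f"SELECT {key_column}, {function_name}({args}) AS predicted_churn FROM {table};"


features = ["tenure_months", "monthly_charges"]
print(create_model_sql("churn_model", "customer_activity", features,
                       "churned", "predict_churn", "example-redshift-ml-bucket"))
print(predict_sql("predict_churn", features, "new_customers", "customer_id"))
```

The S3 bucket in SETTINGS is where Redshift ML stages the training data itself, which is why no manual export step appears.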
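Options B, D, and E are not required. EXPLAIN_MODEL (B) returns the model's explainability report rather than predictions, Redshift Spectrum (D) queries external data in Amazon S3 and is not used to train models, and a manual export to Amazon S3 (E) is unnecessary because Redshift ML exports the training data automatically.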
Question.50 A Machine Learning Specialist is attempting to build a linear regression model. Given the displayed residual plot only, what is the MOST likely problem with the model?
(A) Linear regression is inappropriate. The residuals do not have constant variance.
(B) Linear regression is inappropriate. The underlying data has outliers.
(C) Linear regression is appropriate. The residuals have a zero mean.
(D) Linear regression is appropriate. The residuals have constant variance.
Correct Answer: A
A residual plot displays the fitted (predicted) values, or the values of a predictor variable, along the x-axis and the residuals along the y-axis. It is used to check whether the residuals are scattered randomly around zero with constant spread, or whether they show patterns such as nonlinearity or heteroscedasticity.
Heteroscedasticity means that the variance of the residuals is not constant across different fitted values. This violates one of the assumptions of linear regression: the coefficient estimates remain unbiased but are no longer efficient, and their standard errors are biased, which makes confidence intervals, hypothesis tests, and prediction intervals unreliable. The displayed residual plot shows a clear pattern of heteroscedasticity, as the residuals spread out as the fitted values increase. This indicates that linear regression, as specified, is inappropriate for this data and a different model should be used.
References:
* Regression – Amazon Machine Learning
* How to Create a Residual Plot by Hand
* How to Create a Residual Plot in Python
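The fan-shaped pattern described in this answer can also be quantified. The following standard-library Python sketch fits ordinary least squares, sorts the residuals by fitted value, and compares residual variance in the upper half with the lower half, a crude Goldfeld–Quandt-style ratio rather than a formal significance test. The function names and the synthetic data are hypothetical and chosen only to illustrate the idea.

```python
from statistics import mean, pvariance


def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residual_spread_ratio(x: list[float], y: list[float]) -> float:
    """Variance of residuals at high fitted values divided by variance at low fitted values."""
    intercept, slope = ols_fit(x, y)
    # Pair each fitted value with its residual, then order by fitted value.
    pairs = sorted(
        (intercept + slope * a, b - (intercept + slope * a)) for a, b in zip(x, y)
    )
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return pvariance(high) / pvariance(low)


xs = [float(i) for i in range(1, 21)]
signs = [1 if i % 2 == 0 else -1 for i in range(1, 21)]
# Noise amplitude grows with x: residuals fan out (heteroscedastic).
y_fan = [2 * x + 0.5 * x * s for x, s in zip(xs, signs)]
# Noise amplitude is constant: residual spread stays flat (homoscedastic).
y_flat = [2 * x + 3 * s for x, s in zip(xs, signs)]

print(residual_spread_ratio(xs, y_fan))   # well above 1
print(residual_spread_ratio(xs, y_flat))  # close to 1
```

A ratio far from 1 is the numeric counterpart of residuals spreading out as the fitted values increase.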