Question.6 HOTSPOT – An ML engineer needs to use Amazon SageMaker Feature Store to create and manage features to train a model.
Select and order the steps from the following list to create and use the features in Feature Store. Each step should be selected one time. (Select and order three.)
• Access the store to build datasets for training.
• Create a feature group.
• Ingest the records.
Answer:
Step 1: Create a feature group.
Step 2: Ingest the records.
Step 3: Access the store to build datasets for training.
Explanation:
Step 1: Create a feature group.
A Feature Group is like a schema for storing features. It defines the structure of the features you want to store, including:
Feature names
Data types
Record identifiers
Event timestamps
Creating a feature group is the first step before you can store any data in the Feature Store.
Think of this like defining a table in a relational database before inserting data into it.
Step 2: Ingest the records.
Once the Feature Group is created, you can ingest (insert) the actual data/records into it.
This means populating the Feature Group with rows of data where each row represents a specific entity (e.g., a user, product, or session).
Ingested records are stored in online and/or offline stores, depending on how the Feature Group is configured.
Step 3: Access the store to build datasets for training.
After data is ingested, you can retrieve the features from the Feature Store (usually the offline store) to build training datasets for machine learning models.
This is typically done before launching training jobs in Amazon SageMaker.
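To make the three steps concrete, here is a minimal sketch using the SageMaker Python SDK. The feature group name, DataFrame columns, S3 paths, and IAM role ARN are placeholders, not part of the question.

```python
import time
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

# Example records: "customer_id" is the record identifier and
# "event_time" is the required event-time feature.
df = pd.DataFrame({
    "customer_id": ["c1", "c2"],
    "age": [34, 52],
    "avg_basket": [42.5, 17.0],
    "event_time": [time.time()] * 2,
})
df["customer_id"] = df["customer_id"].astype("string")  # SDK expects the pandas string dtype

# Step 1: Create a feature group (the schema that defines feature names, types,
# the record identifier, and the event-time feature).
fg = FeatureGroup(name="customers-feature-group", sagemaker_session=session)
fg.load_feature_definitions(data_frame=df)
fg.create(
    s3_uri="s3://my-feature-store-bucket/offline",  # offline store location (placeholder)
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,
)
while fg.describe()["FeatureGroupStatus"] != "Created":  # create() is asynchronous
    time.sleep(5)

# Step 2: Ingest the records into the feature group.
fg.ingest(data_frame=df, max_workers=2, wait=True)

# Step 3: Access the store (offline store via Athena) to build a training dataset.
query = fg.athena_query()
query.run(
    query_string=f'SELECT * FROM "{query.table_name}"',
    output_location="s3://my-feature-store-bucket/query-results",
)
query.wait()
training_df = query.as_dataframe()
```

The offline store is exposed as an AWS Glue Data Catalog table, which is why the last step builds the training dataset with an Athena query.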
Question.7 HOTSPOT – A company wants to host an ML model on Amazon SageMaker. An ML engineer is configuring a continuous integration and continuous delivery (CI/CD) pipeline in AWS CodePipeline to deploy the model. The pipeline must run automatically when new training data for the model is uploaded to an Amazon S3 bucket.
Select and order the pipeline’s correct steps from the following list. Each step should be selected one time or not at all. (Select and order three.)
• An S3 event notification invokes the pipeline when new data is uploaded.
• S3 Lifecycle rule invokes the pipeline when new data is uploaded.
• SageMaker retrains the model by using the data in the S3 bucket.
• The pipeline deploys the model to a SageMaker endpoint.
• The pipeline deploys the model to SageMaker Model Registry.
Answer:
Step 1: An S3 event notification invokes the pipeline when new data is uploaded.
Step 2: SageMaker retrains the model by using the data in the S3 bucket.
Step 3: The pipeline deploys the model to a SageMaker endpoint.
Explanation:
Step 1: An S3 event notification invokes the pipeline when new data is uploaded.
In AWS, an S3 event notification is often used to trigger a pipeline when new data is uploaded to a bucket. This is a common approach to automate ML workflows by detecting new training data and triggering retraining processes.
Step 2: SageMaker retrains the model by using the data in the S3 bucket.
Once the pipeline is triggered by the S3 event, SageMaker retrieves the new data from the S3 bucket and retrains the ML model. This is a key step in continuous model training workflows.
Step 3: The pipeline deploys the model to a SageMaker endpoint.
After retraining, the final step in an ML pipeline is often deployment. The trained model is deployed to a SageMaker endpoint so it can be used for inference in real-time applications.
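As a rough sketch of Step 1, one common pattern is to have the S3 bucket emit object-created events to Amazon EventBridge and to use an EventBridge rule that starts the CodePipeline execution. The bucket name, pipeline ARN, and role ARN below are placeholders, and the bucket must have EventBridge notifications enabled.

```python
import json
import boto3

events = boto3.client("events")

pipeline_arn = "arn:aws:codepipeline:us-east-1:123456789012:ml-retrain-pipeline"  # placeholder
rule_role_arn = "arn:aws:iam::123456789012:role/EventBridgeStartPipelineRole"     # placeholder

# Match "Object Created" events from the training-data bucket.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["my-training-data-bucket"]}},
}

events.put_rule(
    Name="start-ml-pipeline-on-new-training-data",
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)

# The target role must allow codepipeline:StartPipelineExecution on the pipeline.
events.put_targets(
    Rule="start-ml-pipeline-on-new-training-data",
    Targets=[{"Id": "ml-pipeline", "Arn": pipeline_arn, "RoleArn": rule_role_arn}],
)
```

Inside the pipeline, Step 2 typically runs a SageMaker training job (or a SageMaker Pipelines execution), and Step 3 updates the SageMaker endpoint with the newly trained model.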
Question.8 HOTSPOT – An ML engineer is building a generative AI application on Amazon Bedrock by using large language models (LLMs).
Select the correct generative AI term from the following list for each description. Each term should be selected one time or not at all. (Select three.)
• Embedding
• Retrieval Augmented Generation (RAG)
• Temperature
• Token
Answer:
1. Text representation of basic units of data processed by LLMs → Token
2. High-dimensional vectors that contain the semantic meaning of text → Embedding
3. Enrichment of information from additional data sources to improve a generated response → Retrieval Augmented Generation (RAG)
Explanation:
1. Text representation of basic units of data processed by LLMs
Token.
Tokens are the smallest units of text that LLMs process.
A token can be a word, part of a word, or even punctuation depending on the tokenizer.
LLMs (like GPT) break input text into tokens before performing predictions.
2. High-dimensional vectors that contain the semantic meaning of text
Embedding.
Embeddings are numerical representations of text in a high-dimensional space.
These vectors preserve the semantic meaning, allowing LLMs or other models to understand the context and relationships between words or phrases.
3. Enrichment of information from additional data sources to improve a generated response
Retrieval Augmented Generation (RAG)
RAG is a technique where the model retrieves external documents or data (e.g., from a vector store or a knowledge base) before generating a response.
This helps the model augment its output with factual or contextual information it might not have memorized.
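To make the terms tangible, the sketch below calls Amazon Bedrock through boto3: it requests an embedding vector for a piece of text and then generates text with the temperature parameter controlling randomness. The Titan model IDs and request/response shapes are used as examples only; other Bedrock models expect different bodies.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# Embedding: a high-dimensional vector that captures the semantic meaning of the text.
resp = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v1",  # example embedding model
    body=json.dumps({"inputText": "Feature stores simplify reusing ML features."}),
)
payload = json.loads(resp["body"].read())
embedding = payload["embedding"]          # list of floats
print(payload["inputTextTokenCount"])     # how many tokens the input was split into

# Temperature: lower values make token selection more deterministic,
# higher values make the generated text more diverse.
resp = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",  # example text model
    body=json.dumps({
        "inputText": "Summarize what a feature store is in one sentence.",
        "textGenerationConfig": {"temperature": 0.2, "maxTokenCount": 128},
    }),
)
print(json.loads(resp["body"].read())["results"][0]["outputText"])
```

RAG is usually layered on top of such calls: documents are embedded, stored in a vector index (for example, a Bedrock knowledge base), retrieved for each query, and prepended to the prompt before generation.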
Question.9 HOTSPOT – An ML engineer is working on an ML model to predict the prices of similarly sized homes. The model will base predictions on several features. The ML engineer will use the following feature engineering techniques to estimate the prices of the homes:
• Feature splitting
• Logarithmic transformation
• One-hot encoding
• Standardized distribution
Select the correct feature engineering techniques for the following list of features. Each feature engineering technique should be selected one time or not at all. (Select three.)
• City (name)
• Type_year (type of home and year the home was built)
• Size of the building (square feet or square meters)
Answer:
1. City (name) → One-hot encoding
2. Type_year (type of home and year the home was built) → Feature splitting
3. Size of the building (square feet or square meters) → Logarithmic transformation
Explanation:
1. City (name): One-hot encoding.
The city name is a categorical variable (nominal data) that represents a set of distinct categories without any inherent order.
One-hot encoding is a common technique for handling categorical variables where each category is converted into a binary (0 or 1) vector.
This avoids assigning any numerical relationship between categories, which would be misleading in a model.
Example: If we have three cities—New York, Los Angeles, and Chicago—one-hot encoding would transform them into:
New York → [1, 0, 0]
Los Angeles → [0, 1, 0]
Chicago → [0, 0, 1]
2. Type_year (type of home and year the home was built): Feature splitting.
This feature contains two types of information: (1) the type of home and (2) the year it was built.
Feature splitting means breaking this single column into two separate features: one for home type and one for year.
Example:
“Apartment_2000” → Apartment (categorical feature) and 2000 (numerical feature)
This improves data representation and allows the model to process each type of information correctly.
3. Size of the building (square feet or square meters): Logarithmic transformation.
The size of a building is a continuous numerical variable that can have a wide range of values.
Logarithmic transformation helps normalize skewed distributions where there are a few extremely large values.
It reduces the impact of large outliers and makes the data more normally distributed, improving model performance.
Example:
Raw data: 500, 1000, 5000, 10000, 50000
After a base-10 log transformation: approximately 2.7, 3.0, 3.7, 4.0, 4.7 (values are compressed into a more manageable range).
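All three techniques can be sketched in a few lines of pandas; the column names and values below are invented for illustration.

```python
import numpy as np
import pandas as pd

homes = pd.DataFrame({
    "city": ["New York", "Chicago", "Los Angeles"],
    "type_year": ["Apartment_2000", "House_1985", "Condo_2015"],
    "size_sqft": [500, 10000, 1500],
})

# One-hot encoding: the categorical city name becomes binary indicator columns.
homes = pd.get_dummies(homes, columns=["city"], prefix="city")

# Feature splitting: break the combined type_year column into two separate features.
homes[["home_type", "year_built"]] = homes.pop("type_year").str.split("_", expand=True)
homes["year_built"] = homes["year_built"].astype(int)

# Logarithmic transformation: compress the right-skewed size feature
# (base-10 log, matching the example values above).
homes["log_size"] = np.log10(homes["size_sqft"])

print(homes)
```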
Question.10 An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3. The dataset has a class imbalance that affects the learning of the model’s algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
Which AWS service or feature can aggregate the data from the various data sources?
(A) Amazon EMR Spark jobs
(B) Amazon Kinesis Data Streams
(C) Amazon DynamoDB
(D) AWS Lake Formation
Answer: A