Question.6 HOTSPOT – An ML engineer needs to use Amazon SageMaker Feature Store to create and manage features to train a model.
Select and order the steps from the following list to create and use the features in Feature Store. Each step should be selected one time. (Select and order three.)
• Access the store to build datasets for training.
• Create a feature group.
• Ingest the records.
Answer:
Step 1: Create a feature group.
Step 2: Ingest the records.
Step 3: Access the store to build datasets for training.
Explanation:
Step 1: Create a feature group.
A Feature Group is like a schema for storing features. It defines the structure of the features you want to store, including:
Feature names
Data types
Record identifiers
Event timestamps
Creating a feature group is the first step before you can store any data in the Feature Store.
Think of this like defining a table in a relational database before inserting data into it.
Step 2: Ingest the records.
Once the Feature Group is created, you can ingest (insert) the actual data/records into it.
This means populating the Feature Group with rows of data where each row represents a specific entity (e.g., a user, product, or session).
Ingested records are stored in online and/or offline stores, depending on how the Feature Group is configured.
Step 3: Access the store to build datasets for training.
After data is ingested, you can retrieve the features from the Feature Store (usually the offline store) to build training datasets for machine learning models.
This is typically done before launching training jobs in Amazon SageMaker.
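To make the three steps concrete, here is a minimal sketch using the SageMaker Python SDK. The feature group name, DataFrame columns, S3 paths, and IAM role ARN are placeholders, not part of the question.

```python
import time
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

# Example records: "customer_id" is the record identifier and
# "event_time" is the required event-time feature.
df = pd.DataFrame({
    "customer_id": ["c1", "c2"],
    "age": [34, 52],
    "avg_basket": [42.5, 17.0],
    "event_time": [time.time()] * 2,
})
df["customer_id"] = df["customer_id"].astype("string")  # SDK expects the pandas string dtype

# Step 1: Create a feature group (the schema that defines feature names, types,
# the record identifier, and the event-time feature).
fg = FeatureGroup(name="customers-feature-group", sagemaker_session=session)
fg.load_feature_definitions(data_frame=df)
fg.create(
    s3_uri="s3://my-feature-store-bucket/offline",  # offline store location (placeholder)
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,
)
while fg.describe()["FeatureGroupStatus"] != "Created":  # create() is asynchronous
    time.sleep(5)

# Step 2: Ingest the records into the feature group.
fg.ingest(data_frame=df, max_workers=2, wait=True)

# Step 3: Access the store (offline store via Athena) to build a training dataset.
query = fg.athena_query()
query.run(
    query_string=f'SELECT * FROM "{query.table_name}"',
    output_location="s3://my-feature-store-bucket/query-results",
)
query.wait()
training_df = query.as_dataframe()
```

The offline store is exposed as an AWS Glue Data Catalog table, which is why the last step builds the training dataset with an Athena query.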
Question.7 HOTSPOT – A company wants to host an ML model on Amazon SageMaker. An ML engineer is configuring a continuous integration and continuous delivery (CI/CD) pipeline in AWS CodePipeline to deploy the model. The pipeline must run automatically when new training data for the model is uploaded to an Amazon S3 bucket.
Select and order the pipeline’s correct steps from the following list. Each step should be selected one time or not at all. (Select and order three.)
• An S3 event notification invokes the pipeline when new data is uploaded.
• S3 Lifecycle rule invokes the pipeline when new data is uploaded.
• SageMaker retrains the model by using the data in the S3 bucket.
• The pipeline deploys the model to a SageMaker endpoint.
• The pipeline deploys the model to SageMaker Model Registry.
Answer:
Step 1: An S3 event notification invokes the pipeline when new data is uploaded.
Step 2: SageMaker retrains the model by using the data in the S3 bucket.
Step 3: The pipeline deploys the model to a SageMaker endpoint.
Explanation:
Step 1: An S3 event notification invokes the pipeline when new data is uploaded.
In AWS, an S3 event notification is often used to trigger a pipeline when new data is uploaded to a bucket. This is a common approach to automate ML workflows by detecting new training data and triggering retraining processes.
Step 2: SageMaker retrains the model by using the data in the S3 bucket.
Once the pipeline is triggered by the S3 event, SageMaker retrieves the new data from the S3 bucket and retrains the ML model. This is a key step in continuous model training workflows.
Step 3: The pipeline deploys the model to a SageMaker endpoint.
After retraining, the final step in an ML pipeline is often deployment. The trained model is deployed to a SageMaker endpoint so it can be used for inference in real-time applications.
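As a rough sketch of Step 1, one common pattern is to have the S3 bucket emit object-created events to Amazon EventBridge and to use an EventBridge rule that starts the CodePipeline execution. The bucket name, pipeline ARN, and role ARN below are placeholders, and the bucket must have EventBridge notifications enabled.

```python
import json
import boto3

events = boto3.client("events")

pipeline_arn = "arn:aws:codepipeline:us-east-1:123456789012:ml-retrain-pipeline"  # placeholder
rule_role_arn = "arn:aws:iam::123456789012:role/EventBridgeStartPipelineRole"     # placeholder

# Match "Object Created" events from the training-data bucket.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["my-training-data-bucket"]}},
}

events.put_rule(
    Name="start-ml-pipeline-on-new-training-data",
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)

# The target role must allow codepipeline:StartPipelineExecution on the pipeline.
events.put_targets(
    Rule="start-ml-pipeline-on-new-training-data",
    Targets=[{"Id": "ml-pipeline", "Arn": pipeline_arn, "RoleArn": rule_role_arn}],
)
```

Inside the pipeline, Step 2 typically runs a SageMaker training job (or a SageMaker Pipelines execution), and Step 3 updates the SageMaker endpoint with the newly trained model.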
Question.8 HOTSPOT – An ML engineer is building a generative AI application on Amazon Bedrock by using large language models (LLMs).
Select the correct generative AI term from the following list for each description. Each term should be selected one time or not at all. (Select three.)
• Embedding
• Retrieval Augmented Generation (RAG)
• Temperature
• Token
Answer:
1. Text representation of basic units of data processed by LLMs → Token
2. High-dimensional vectors that contain the semantic meaning of text → Embedding
3. Enrichment of information from additional data sources to improve a generated response → Retrieval Augmented Generation (RAG)
Explanation:
1. Text representation of basic units of data processed by LLMs
Token.
Tokens are the smallest units of text that LLMs process.
A token can be a word, part of a word, or even punctuation depending on the tokenizer.
LLMs (like GPT) break input text into tokens before performing predictions.
2. High-dimensional vectors that contain the semantic meaning of text
Embedding.
Embeddings are numerical representations of text in a high-dimensional space.
These vectors preserve the semantic meaning, allowing LLMs or other models to understand the context and relationships between words or phrases.
3. Enrichment of information from additional data sources to improve a generated response
Retrieval Augmented Generation (RAG)
RAG is a technique where the model retrieves external documents or data (e.g., from a vector store or a knowledge base) before generating a response.
This helps the model augment its output with factual or contextual information it might not have memorized.
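To make the terms tangible, the sketch below calls Amazon Bedrock through boto3: it requests an embedding vector for a piece of text and then generates text with the temperature parameter controlling randomness. The Titan model IDs and request/response shapes are used as examples only; other Bedrock models expect different bodies.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# Embedding: a high-dimensional vector that captures the semantic meaning of the text.
resp = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v1",  # example embedding model
    body=json.dumps({"inputText": "Feature stores simplify reusing ML features."}),
)
payload = json.loads(resp["body"].read())
embedding = payload["embedding"]          # list of floats
print(payload["inputTextTokenCount"])     # how many tokens the input was split into

# Temperature: lower values make token selection more deterministic,
# higher values make the generated text more diverse.
resp = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",  # example text model
    body=json.dumps({
        "inputText": "Summarize what a feature store is in one sentence.",
        "textGenerationConfig": {"temperature": 0.2, "maxTokenCount": 128},
    }),
)
print(json.loads(resp["body"].read())["results"][0]["outputText"])
```

RAG is usually layered on top of such calls: documents are embedded, stored in a vector index (for example, a Bedrock knowledge base), retrieved for each query, and prepended to the prompt before generation.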
Question.9 HOTSPOT – An ML engineer is working on an ML model to predict the prices of similarly sized homes. The model will base predictions on several features. The ML engineer will use the following feature engineering techniques to estimate the prices of the homes:
• Feature splitting
• Logarithmic transformation
• One-hot encoding
• Standardized distribution
Select the correct feature engineering techniques for the following list of features. Each feature engineering technique should be selected one time or not at all. (Select three.)
• City (name)
• Type_year (type of home and year the home was built)
• Size of the building (square feet or square meters)
Answer:
1. City (name) → One-hot encoding
2. Type_year (type of home and year the home was built) → Feature splitting
3. Size of the building (square feet or square meters) → Logarithmic transformation
Explanation:
1. City (name): One-hot encoding.
The city name is a categorical variable (nominal data) that represents a set of distinct categories without any inherent order.
One-hot encoding is a common technique for handling categorical variables where each category is converted into a binary (0 or 1) vector.
This avoids assigning any numerical relationship between categories, which would be misleading in a model.
Example: If we have three cities—New York, Los Angeles, and Chicago—one-hot encoding would transform them into:
New York → [1, 0, 0]
Los Angeles → [0, 1, 0]
Chicago → [0, 0, 1]
2. Type_year (type of home and year the home was built): Feature splitting.
This feature contains two types of information: (1) the type of home and (2) the year it was built.
Feature splitting means breaking this single column into two separate features: one for home type and one for year.
Example:
“Apartment_2000” → Apartment (categorical feature) and 2000 (numerical feature)
This improves data representation and allows the model to process each type of information correctly.
3. Size of the building (square feet or square meters): Logarithmic transformation.
The size of a building is a continuous numerical variable that can have a wide range of values.
Logarithmic transformation helps normalize skewed distributions where there are a few extremely large values.
It reduces the impact of large outliers and makes the data more normally distributed, improving model performance.
Example:
Raw data: 500, 1000, 5000, 10000, 50000
After a base-10 log transformation: approximately 2.7, 3.0, 3.7, 4.0, 4.7 (values are compressed into a more manageable range).
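All three techniques can be sketched in a few lines of pandas; the column names and values below are invented for illustration.

```python
import numpy as np
import pandas as pd

homes = pd.DataFrame({
    "city": ["New York", "Chicago", "Los Angeles"],
    "type_year": ["Apartment_2000", "House_1985", "Condo_2015"],
    "size_sqft": [500, 10000, 1500],
})

# One-hot encoding: the categorical city name becomes binary indicator columns.
homes = pd.get_dummies(homes, columns=["city"], prefix="city")

# Feature splitting: break the combined type_year column into two separate features.
homes[["home_type", "year_built"]] = homes.pop("type_year").str.split("_", expand=True)
homes["year_built"] = homes["year_built"].astype(int)

# Logarithmic transformation: compress the right-skewed size feature
# (base-10 log, matching the example values above).
homes["log_size"] = np.log10(homes["size_sqft"])

print(homes)
```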
Question.10 An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3. The dataset has a class imbalance that affects the learning of the model’s algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
Which AWS service or feature can aggregate the data from the various data sources?
(A) Amazon EMR Spark jobs
(B) Amazon Kinesis Data Streams
(C) Amazon DynamoDB
(D) AWS Lake Formation
Answer: A