Start your DEA-C01 Exam Questions Preparation with Updated 132 Questions [Q20-Q44]


A Fully Updated 2025 DEA-C01 Exam Dumps - PDF Questions and Testing Engine

NEW QUESTION # 20
A marketing company uses Amazon S3 to store clickstream data. The company queries the data at the end of each day by using a SQL JOIN clause on S3 objects that are stored in separate buckets.
The company creates key performance indicators (KPIs) based on the objects. The company needs a serverless solution that will give users the ability to query data by partitioning the data.
The solution must maintain the atomicity, consistency, isolation, and durability (ACID) properties of the data.
Which solution will meet these requirements MOST cost-effectively?

  • A. Amazon EMR
  • B. Amazon Redshift Spectrum
  • C. Amazon S3 Select
  • D. Amazon Athena

Answer: D


NEW QUESTION # 21
Ron, a Snowflake developer, needs to capture change data (insert only) on the source views. To do so, he follows the steps below:
He enables change tracking on the source views and their underlying tables.
He inserts the data via scripts scheduled with the help of tasks.
He then runs the SELECT statement below.
select *
from test_table
changes(information => append_only)
at(timestamp => (select current_timestamp()));
Select the correct query execution output from the options below:

  • A. The SELECT query fails with the error: 'SQL compilation error - Incorrect keyword "CHANGES()" found'.
  • B. The developer failed to create a stream on the source table, which could then be queried to capture DML records.
  • C. No error is reported; the SELECT command returns the changed records with metadata columns, because change tracking is enabled on the source views and their underlying tables.
  • D. The SELECT statement compiles but gives erroneous results.

Answer: C

Explanation:
As an alternative to streams, Snowflake supports querying change tracking metadata for tables or views using the CHANGES clause for SELECT statements. The CHANGES clause enables querying change tracking metadata between two points in time without having to create a stream with an explicit transactional offset.
To learn more about the Snowflake CHANGES clause, refer to the following link:
https://docs.snowflake.com/en/sql-reference/constructs/changes


NEW QUESTION # 22
The following CREATE DATABASE command creates a clone of the database snowmy_db:
create database pods_db clone snowmy_db
before (statement => '7e5d0cb9-005e-94e6-b058-k8f5b37c5725');
What are possible reasons for this cloning operation to fail?

  • A. The Time Travel statement's query time is beyond the retention time of any current child (e.g., a table) of the database.
  • B. The CREATE DATABASE query fails with a compilation error because it does not support the statement keyword.
  • C. SQL Compilation error: "Incorrect Syntax 'before' while creating database"
  • D. The Time Travel statement's query time is at or before the point in time when the object was created.

Answer: A,D


NEW QUESTION # 23
By default, a newly created custom role is not assigned to any user, nor granted to any other role.

  • A. TRUE
  • B. FALSE

Answer: A


NEW QUESTION # 24
A company has an extensive script in Scala that transforms data by leveraging DataFrames. A Data engineer needs to move these transformations to Snowpark.
...characteristics of data transformations in Snowpark should be considered to meet this requirement? (Select TWO)

  • A. User-Defined Functions (UDFs) are not pushed down to Snowflake
  • B. It is possible to join multiple tables using DataFrames.
  • C. Snowpark requires a separate cluster outside of Snowflake for computations
  • D. Snowpark operations are executed lazily on the server.
  • E. Columns in different DataFrames with the same name should be referred to with squared brackets

Answer: B,D

Explanation:
The characteristics of data transformations in Snowpark that should be considered to meet this requirement are:
It is possible to join multiple tables using DataFrames.
Snowpark operations are executed lazily on the server.
These characteristics indicate how Snowpark can perform data transformations using DataFrames, which are similar to the ones used in Scala. DataFrames are distributed collections of rows that can be manipulated using various operations, such as joins, filters, aggregations, etc. DataFrames can be created from different sources, such as tables, files, or SQL queries. Snowpark operations are executed lazily on the server, which means that they are not performed until an action is triggered, such as a write or a collect operation. This allows Snowpark to optimize the execution plan and reduce the amount of data transferred between the client and the server.
The other options are not characteristics of data transformations in Snowpark that should be considered to meet this requirement. Option A is incorrect because User-Defined Functions (UDFs) are pushed down to Snowflake and executed on the server. Option C is incorrect because Snowpark does not require a separate cluster outside of Snowflake for computations, but rather uses virtual warehouses within Snowflake. Option E is incorrect because columns in different DataFrames with the same name should be referred to with dot notation, not squared brackets.
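The lazy execution described above can be illustrated with a toy model. The sketch below is plain Python and not the Snowpark API: the `LazyFrame` class, its methods, and the sample rows are hypothetical names invented for illustration. It only shows the general idea that transformations build up a plan and nothing is evaluated until an action such as `collect()` is called.

```python
class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (not the Snowpark API)."""

    def __init__(self, rows, ops=None):
        self._rows = rows
        self._ops = ops or []  # recorded plan: list of (kind, argument)

    def filter(self, predicate):
        # Record the step; do not touch the rows yet.
        return LazyFrame(self._rows, self._ops + [("filter", predicate)])

    def select(self, *cols):
        return LazyFrame(self._rows, self._ops + [("select", cols)])

    def collect(self):
        # The action: only now is the recorded plan executed.
        rows = self._rows
        for kind, arg in self._ops:
            if kind == "filter":
                rows = [r for r in rows if arg(r)]
            else:
                rows = [{c: r[c] for c in arg} for r in rows]
        return rows


calls = []  # records every row the predicate is evaluated on


def is_adult(row):
    calls.append(row["name"])
    return row["age"] >= 18


people = LazyFrame([{"name": "Ann", "age": 31}, {"name": "Bo", "age": 12}])
plan = people.filter(is_adult).select("name")

calls_before_collect = len(calls)  # the predicate has not run yet
result = plan.collect()            # the plan is evaluated here
print(calls_before_collect, result)
```

In this toy model the predicate is not called while the plan is being built, which mirrors why a lazily evaluated pipeline can be optimized as a whole before any data moves.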


NEW QUESTION # 25
Which are the Cloud Platforms that Support Calling an External Function?

  • A. AWS only
  • B. GCP
  • C. AWS,GCP,AZURE
  • D. AWS & AZURE

Answer: C


NEW QUESTION # 26
What is a characteristic of the use of binding variables in JavaScript stored procedures in Snowflake?

  • A. Only JavaScript variables of type number, string and SfDate can be bound
  • B. All Snowflake first-class objects can be bound
  • C. Users are restricted from binding JavaScript variables because they create SQL injection attack vulnerabilities
  • D. All types of JavaScript variables can be bound

Answer: A

Explanation:
A characteristic of binding variables in JavaScript stored procedures in Snowflake is that only JavaScript variables of type number, string, and SfDate can be bound. Binding variables pass values from JavaScript variables to SQL statements within a stored procedure. Binding can improve the security and performance of the stored procedure by preventing SQL injection attacks and reducing parsing overhead. However, not all types of JavaScript variables can be bound: only the primitive types number and string, and the Snowflake-specific type SfDate. Options B and D are incorrect because binding is limited to these three types. Option C is incorrect because users are not restricted from binding JavaScript variables, but encouraged to do so.


NEW QUESTION # 27
A Data Engineer wants to centralize grant management to maximize security. A user needs ownership on a table in a new schema. However, this user should not have the ability to make grant decisions. What is the correct way to do this?

  • A. Grant ownership to the user on the table
  • B. Add the with managed access parameter on the schema
  • C. Revoke grant decisions from the user on the table
  • D. Revoke grant decisions from the user on the schema.

Answer: B

Explanation:
The with managed access parameter on the schema enables the schema owner to control the grant and revoke privileges on the objects within the schema. This way, the user who owns the table cannot make grant decisions, but only the schema owner can. This is the best way to centralize grant management and maximize security.


NEW QUESTION # 28
A data engineer has a one-time task to read data from objects that are in Apache Parquet format in an Amazon S3 bucket. The data engineer needs to query only one column of the data.
Which solution will meet these requirements with the LEAST operational overhead?

  • A. Use S3 Select to write a SQL SELECT statement to retrieve the required column from the S3 objects.
  • B. Run an AWS Glue crawler on the S3 objects. Use a SQL SELECT statement in Amazon Athena to query the required column.
  • C. Configure an AWS Lambda function to load data from the S3 bucket into a pandas dataframe.
    Write a SQL SELECT statement on the dataframe to query the required column.
  • D. Prepare an AWS Glue DataBrew project to consume the S3 objects and to query the required column.

Answer: A

Explanation:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory-athena-query.html
S3 Select allows you to retrieve a subset of data from an object stored in S3 using simple SQL expressions. It is capable of working directly with objects in Parquet format.


NEW QUESTION # 29
A data engineer is configuring an AWS Glue job to read data from an Amazon S3 bucket. The data engineer has set up the necessary AWS Glue connection details and an associated IAM role. However, when the data engineer attempts to run the AWS Glue job, the data engineer receives an error message that indicates that there are problems with the Amazon S3 VPC gateway endpoint.
The data engineer must resolve the error and connect the AWS Glue job to the S3 bucket.
Which solution will meet this requirement?

  • A. Update the AWS Glue security group to allow inbound traffic from the Amazon S3 VPC gateway endpoint.
  • B. Configure an S3 bucket policy to explicitly grant the AWS Glue job permissions to access the S3 bucket.
  • C. Verify that the VPC's route table includes inbound and outbound routes for the Amazon S3 VPC gateway endpoint.
  • D. Review the AWS Glue job code to ensure that the AWS Glue connection details include a fully qualified domain name.

Answer: C

Explanation:
https://docs.aws.amazon.com/glue/latest/dg/connection-VPC-disable-proxy.html
https://docs.aws.amazon.com/glue/latest/dg/connection-S3-VPC.html


NEW QUESTION # 30
A company receives .csv files that contain physical address data. The data is in columns that have the following names: Door_No, Street_Name, City, and Zip_Code. The company wants to create a single column to store these values in the following format:

Which solution will meet this requirement with the LEAST coding effort?

  • A. Write a Lambda function in Python to read the files. Use the Python data dictionary type to create the new column.
  • B. Use AWS Glue DataBrew to read the files. Use the PIVOT transformation to create the new column.
  • C. Use AWS Glue DataBrew to read the files. Use the NEST_TO_MAP transformation to create the new column.
  • D. Use AWS Glue DataBrew to read the files. Use the NEST_TO_ARRAY transformation to create the new column.

Answer: C


NEW QUESTION # 31
Which one is NOT a core benefit of micro-partitioning?

  • A. Enables extremely efficient DML and fine-grained pruning for faster queries.
  • B. Snowflake micro-partitions are derived automatically; they do not need to be explicitly defined up-front or maintained by users.
  • C. Columns are also compressed individually within micro-partitions.
  • D. Micro-partitions can overlap in their range of values, which helps data skewing.
  • E. Columns are stored independently within micro-partitions, often referred to as columnar storage.

Answer: D

Explanation:
The benefits of Snowflake's approach to partitioning table data include:
In contrast to traditional static partitioning, Snowflake micro-partitions are derived automatically; they don't need to be explicitly defined up-front or maintained by users.
As the name suggests, micro-partitions are small in size (50 to 500 MB, before compression), which enables extremely efficient DML and fine-grained pruning for faster queries.
Micro-partitions can overlap in their range of values, which, combined with their uniformly small size, helps prevent skew.
Columns are stored independently within micro-partitions, often referred to as columnar storage. This enables efficient scanning of individual columns; only the columns referenced by a query are scanned.
Columns are also compressed individually within micro-partitions. Snowflake automatically determines the most efficient compression algorithm for the columns in each micro-partition.
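The pruning idea mentioned above can be sketched with a toy model. This is plain Python under stated assumptions, not Snowflake's internal implementation: the partition list, the min/max values, and the `partitions_to_scan` helper are all hypothetical. It shows how per-partition value ranges let a query skip partitions that cannot contain a value, even when ranges overlap.

```python
# Hypothetical per-partition metadata: minimum and maximum value of one column.
partitions = [
    {"id": 1, "min": 1, "max": 40},
    {"id": 2, "min": 30, "max": 70},   # overlaps with partition 1
    {"id": 3, "min": 65, "max": 100},  # overlaps with partition 2
]


def partitions_to_scan(value):
    """Return the ids of partitions whose value range could contain `value`."""
    return [p["id"] for p in partitions if p["min"] <= value <= p["max"]]


print(partitions_to_scan(35))  # value falls in the overlap of partitions 1 and 2
print(partitions_to_scan(90))  # only partition 3 can contain it
```

A value inside an overlapping range forces more than one partition to be scanned, while a value outside the overlap needs only one, which is the sense in which less overlap means better pruning.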


NEW QUESTION # 32
When created, a stream logically takes an initial snapshot of every row in the source object, and the contents of the stream change as DML statements execute on the source table.
A data engineer, Sophie, created a view that queries the table and returns the CURRENT_USER and CURRENT_TIMESTAMP values for the query transaction. A stream has been created on the view to capture CDC.
Tony, another user, inserted data, e.g.
insert into <table> values (1),(2),(3);
Emily, another user, also inserted data, e.g.
insert into <table> values (4),(5),(6);
What will happen when a different user queries the same stream after 1 hour?

  • A. All 6 records would be shown with METADATA$ACTION as 'INSERT', of which 3 records would be displayed with username 'Tony' and the other 3 with username 'Emily'.
  • B. All six records would be displayed with the user 'Sophie', who is the owner of the view.
  • C. The user displayed would be the one who queried during the session, but the recorded timestamp would be from 1 hour earlier, i.e. the actual record insertion time.
  • D. All six records would be displayed with the CURRENT_USER and CURRENT_TIMESTAMP values at the time the stream is queried.

Answer: D

Explanation:
When a user queries the stream, the stream returns the username of the user who runs the query. The stream also returns the current timestamp for the query transaction in each row, NOT the timestamp when each row was inserted.
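The behaviour described above can be pictured with a small toy model. The sketch is plain Python, not Snowflake code: `query_stream`, the user name `DIFFERENT_USER`, and the fixed timestamp are hypothetical. It only illustrates that context values are evaluated when the stream is queried, so every row carries the querying user and the query time regardless of who inserted it.

```python
from datetime import datetime, timezone

# Rows 1-3 were inserted by Tony and rows 4-6 by Emily; the view does not store that.
inserted_rows = [1, 2, 3, 4, 5, 6]


def query_stream(current_user, query_time):
    """Toy model: context values are computed at query time for every row."""
    return [
        {"id": row, "user": current_user, "ts": query_time, "action": "INSERT"}
        for row in inserted_rows
    ]


# Hypothetical query issued one hour after the inserts by a third user.
query_time = datetime(2025, 1, 1, 13, 0, tzinfo=timezone.utc)
result = query_stream("DIFFERENT_USER", query_time)

users = {r["user"] for r in result}
timestamps = {r["ts"] for r in result}
print(len(result), users, timestamps)
```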


NEW QUESTION # 33
A Data Engineer is trying to load the following rows from a CSV file into a table in Snowflake with the following structure:

....engineer is using the following COPY INTO statement:

However, the following error is received.

Which file format option should be used to resolve the error and successfully load all the data into the table?

  • A. FIELD_DELIMITER = ","
  • B. FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  • C. ESCAPE_UNENCLOSED_FIELD = '\\'
  • D. ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE

Answer: B

Explanation:
The file format option that should be used to resolve the error and successfully load all the data into the table is FIELD_OPTIONALLY_ENCLOSED_BY = '"'. This option specifies that fields in the file may be enclosed by double quotes, which allows for fields that contain commas or newlines within them. For example, in row 3 of the file, there is a field that contains a comma within double quotes: "Smith Jr., John". Without specifying this option, Snowflake will treat this field as two separate fields and cause an error due to column count mismatch. By specifying this option, Snowflake will treat this field as one field and load it correctly into the table.
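The effect of honouring an enclosing quote character can be reproduced outside Snowflake. The sketch below uses Python's standard `csv` module rather than a COPY INTO statement; the sample line is hypothetical apart from the quoted name taken from the explanation above. It contrasts a naive split on the delimiter with a parser that treats double quotes as an optional enclosure.

```python
import csv
import io

# Hypothetical sample row; only the quoted name comes from the explanation above.
line = '3,"Smith Jr., John",Sales\n'

# A naive split on the delimiter ignores the enclosing quotes,
# so the comma inside the name produces an extra column.
naive_fields = line.strip().split(",")

# csv.reader honours the double-quote enclosure, similar in spirit to
# FIELD_OPTIONALLY_ENCLOSED_BY = '"'.
quoted_fields = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"'))

print(len(naive_fields))  # 4 fields: column count mismatch
print(quoted_fields)      # ['3', 'Smith Jr., John', 'Sales']
```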


NEW QUESTION # 34
Mark the correct statements with respect to secure views and their creation in a Snowflake account.

  • A. For a secure view, internal optimizations can indirectly expose data and the view definition is visible to other users.
  • B. To convert an existing view to a secure view and back to a regular view, set/unset the SECURE keyword in the ALTER VIEW or ALTER MATERIALIZED VIEW command.
  • C. The internals of a secure view are not exposed in Query Profile (in the web interface). This is the case even for the owner of the secure view, because non-owners might have access to an owner's Query Profile.
  • D. For non-materialized views, the IS_SECURE column in the Information Schema and Account Usage views identifies whether a view is secure.
  • E. Secure views should not be used for views that are defined solely for query convenience, such as views created to simplify queries for which users do not need to understand the underlying data representation.

Answer: B,C,D,E

Explanation:
Why Should I Use Secure Views?
For a non-secure view, internal optimizations can indirectly expose data.
Some of the internal optimizations for views require access to the underlying data in the base tables for the view. This access might allow data that is hidden from users of the view to be exposed through user code, such as user-defined functions, or other programmatic methods. Secure views do not utilize these optimizations, ensuring that users have no access to the underlying data.
For a non-secure view, the view definition is visible to other users.
By default, the query expression used to create a standard view, also known as the view definition or text, is visible to users in various commands and interfaces.
For security or privacy reasons, you might not wish to expose the underlying tables or internal structural details for a view. With secure views, the view definition and details are visible only to authorized users (i.e. users who are granted the role that owns the view).
When Should I Use a Secure View?
Views should be defined as secure when they are specifically designated for data privacy (i.e. to limit access to sensitive data that should not be exposed to all users of the underlying table(s)).
Secure views should not be used for views that are defined solely for query convenience, such as views created to simplify queries for which users do not need to understand the underlying data representation. Secure views can execute more slowly than non-secure views.
Secure views are defined using the SECURE keyword with the standard DDL for views:
To create a secure view, specify the SECURE keyword in the CREATE VIEW or CREATE MATERIALIZED VIEW command.
To convert an existing view to a secure view and back to a regular view, set/unset the SECURE keyword in the ALTER VIEW or ALTER MATERIALIZED VIEW command.
The definition of a secure view is only exposed to authorized users (i.e. users who have been granted the role that owns the view). If an unauthorized user uses any of the following commands or interfaces, the view definition is not displayed:
SHOW VIEWS and SHOW MATERIALIZED VIEWS commands.
GET_DDL utility function.
VIEWS Information Schema view.
VIEWS Account Usage view.
For non-materialized views, the IS_SECURE column in the Information Schema and Account Usage views identifies whether a view is secure.
The internals of a secure view are not exposed in Query Profile (in the web interface). This is the case even for the owner of the secure view, because non-owners might have access to an owner's Query Profile.


NEW QUESTION # 35
For evolving schema and high compatibility, which data format should be chosen for downstream analytics?

  • A. JSON
  • B. CSV
  • C. Parquet
  • D. Avro

Answer: D


NEW QUESTION # 36
As a data engineer, you have been asked to access data held in the AWS Glacier Deep Archive storage class for historical data analysis. Which statement is the correct one to recommend?

  • A. Data can be accessed from External stage using AWS Private link in this case.
  • B. You cannot access data held in archival cloud storage classes that require restoration before it can be retrieved.
  • C. Loading data from AWS cloud storage services is supported regardless of the cloud platform that hosts your Snowflake account.
  • D. We can simply access AWS Glacier Deep Archive storage External Stage data using PUT command.
  • E. Upload (i.e. stage) files to your cloud storage account using the tools provided by the cloud storage service.

Answer: B

Explanation:
External stage
References data files stored in a location outside of Snowflake. Currently, the following cloud storage services are supported:
Amazon S3 buckets
Google Cloud Storage buckets
Microsoft Azure containers
The storage location can be either private/protected or public.
You cannot access data held in archival cloud storage classes that require restoration before it can be retrieved. These archival storage classes include, for example, the Amazon S3 Glacier Flexible Retrieval or Glacier Deep Archive storage class, or Microsoft Azure Archive Storage.


NEW QUESTION # 37
A healthcare company uses Amazon Kinesis Data Streams to stream real-time health data from wearable devices, hospital equipment, and patient records.
A data engineer needs to find a solution to process the streaming data. The data engineer needs to store the data in an Amazon Redshift Serverless warehouse. The solution must support near real-time analytics of the streaming data and the previous day's data.
Which solution will meet these requirements with the LEAST operational overhead?

  • A. Load the data into Amazon S3. Use the COPY command to load the data into Amazon Redshift.
  • B. Load data into Amazon Kinesis Data Firehose. Load the data into Amazon Redshift.
  • C. Use the Amazon Aurora zero-ETL integration with Amazon Redshift.
  • D. Use the streaming ingestion feature of Amazon Redshift.

Answer: D

Explanation:
https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-streaming-ingestion.html
Use the streaming ingestion feature of Amazon Redshift: streaming ingestion allows Redshift to consume data directly from Kinesis Data Streams in near real time. This feature simplifies the architecture by eliminating the need for intermediate steps or services, and it is specifically designed to support near real-time analytics. The operational overhead is minimal because the feature is integrated within Redshift.


NEW QUESTION # 38
Dominic, a data engineer, wants to resume the pipe named stalepipe3, which became stale after 14 days. To do so, he called the SYSTEM$PIPE_FORCE_RESUME function:
select system$pipe_force_resume('snowmydb.mysnowschema.stalepipe3','staleness_check_override');
If the pipe is resumed 16 days after it was paused, what will happen to the event notifications that were received on the first and second days after the pipe was paused?

  • A. A pipe maintains metadata history of files for 64 days, so in this scenario Snowpipe processes all the event notifications that were received over the 16 days.
  • B. Once the pipe became stale, all the events were purged automatically, and the pipe needs to be recreated with modified properties.
  • C. All the events are processed from day 1 if the PURGE property in the pipe object definition was initially set to FALSE.
  • D. Snowpipe generally skips any event notifications that were received on the first and second days after the pipe was paused.

Answer: D

Explanation:
When a pipe is paused, event messages received for the pipe enter a limited retention period. The period is 14 days by default. If a pipe is paused for longer than 14 days, it is considered stale.
To resume a stale pipe, a qualified role must call the SYSTEM$PIPE_FORCE_RESUME function and input the STALENESS_CHECK_OVERRIDE argument. This argument indicates an understanding that the role is resuming a stale pipe.
For example, resume the stale stalepipe3 pipe in the mydb.myschema database and schema:
select system$pipe_force_resume('mydb.myschema.stalepipe3','staleness_check_override');
As an event notification received while a pipe is paused reaches the end of the limited retention period, Snowflake schedules it to be dropped from the internal metadata. If the pipe is later resumed, Snowpipe processes these older notifications on a best effort basis. Snowflake cannot guarantee that they are processed.
For example, if a pipe is resumed 15 days after it was paused, Snowpipe generally skips any event notifications that were received on the first day the pipe was paused (i.e. that are now more than 14 days old).
If the pipe is resumed 16 days after it was paused, Snowpipe generally skips any event notifications that were received on the first and second days after the pipe was paused. And so on.
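The day arithmetic in the two examples above can be written down as a short sketch. It is plain Python, not a Snowflake function: `skipped_days` is a hypothetical helper, and it assumes the default 14-day retention period stated in the explanation. Because Snowflake handles older notifications on a best effort basis, the result should be read as "generally skipped", not as a guarantee.

```python
from typing import List

RETENTION_DAYS = 14  # default limited retention period stated above


def skipped_days(resumed_after_days: int, retention_days: int = RETENTION_DAYS) -> List[int]:
    """Days (counted from the pause, starting at 1) whose event notifications
    are older than the retention period when the pipe is resumed."""
    expired = max(0, resumed_after_days - retention_days)
    return list(range(1, expired + 1))


print(skipped_days(15))  # [1]
print(skipped_days(16))  # [1, 2]
print(skipped_days(10))  # [] (the pipe is not stale yet)
```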


NEW QUESTION # 39
A company is migrating a legacy application to an Amazon S3 based data lake. A data engineer reviewed data that is associated with the legacy application. The data engineer found that the legacy data contained some duplicate information.
The data engineer must identify and remove duplicate information from the legacy application data.
Which solution will meet these requirements with the LEAST operational overhead?

  • A. Write an AWS Glue extract, transform, and load (ETL) job. Use the FindMatches machine learning (ML) transform to transform the data to perform data deduplication.
  • B. Write a custom extract, transform, and load (ETL) job in Python. Import the Python dedupe library. Use the dedupe library to perform data deduplication.
  • C. Write a custom extract, transform, and load (ETL) job in Python. Use the DataFrame.drop_duplicates() function by importing the Pandas library to perform data deduplication.
  • D. Write an AWS Glue extract, transform, and load (ETL) job. Import the Python dedupe library. Use the dedupe library to perform data deduplication.

Answer: A


NEW QUESTION # 40
An airline company is collecting metrics about flight activities for analytics. The company is conducting a proof of concept (POC) test to show how analytics can provide insights that the company can use to increase on-time departures.
The POC test uses objects in Amazon S3 that contain the metrics in .csv format. The POC test uses Amazon Athena to query the data. The data is partitioned in the S3 bucket by date.
As the amount of data increases, the company wants to optimize the storage solution to improve query performance.
Which combination of solutions will meet these requirements? (Choose two.)

  • A. Preprocess the .csv data to JSON format by fetching only the document keys that the query requires.
  • B. Preprocess the .csv data to Apache Parquet format by fetching only the data blocks that are needed for predicates.
  • C. Use an S3 bucket that is in the same AWS Region where the company runs Athena queries.
  • D. Add a randomized string to the beginning of the keys in Amazon S3 to get more throughput across partitions.
  • E. Use an S3 bucket that is in the same account that uses Athena to query the data.

Answer: B,C

Explanation:
https://docs.aws.amazon.com/athena/latest/ug/performance-tuning.html


NEW QUESTION # 41
Which of the following security and governance tools/technologies are known to provide native connectivity to Snowflake? [Select 2]

  • A. Dataiku
  • B. BIG Squid
  • C. Baffle
  • D. ALTR
  • E. Zepl

Answer: C,D

Explanation:
Security and governance tools ensure sensitive data maintained by an organization is protected from inappropriate access and tampering, as well as helping organizations to achieve and maintain regulatory compliance. These tools are often used in conjunction with observability solutions/services to provide organizations with visibility into the status, quality, and integrity of their data, including identifying potential issues.
Together, these tools support a wide range of operations, including risk assessment, intrusion detection/monitoring/notification, data masking, data cataloging, data health/quality checks, issue identification/troubleshooting/resolution, and more.
ALTR & Baffle are correct options here.


NEW QUESTION # 42
A Data Engineer has developed a dashboard that will issue the same SQL select clause to Snowflake every 12 hours.
---will Snowflake use the persisted query results from the result cache provided that the underlying data has not changed?

  • A. 14 days
  • B. 12 hours
  • C. 24 hours
  • D. 31 days

Answer: D

Explanation:
Snowflake persists the results of executed queries in the result cache for 24 hours. Each time a persisted result is reused, Snowflake resets the 24-hour retention period, up to a maximum of 31 days from the date and time the query was first executed. After 31 days the result is purged, and the next run of the query generates and persists a new result. A persisted result is reused only if the table data contributing to it has not changed. Because the dashboard issues the same query every 12 hours, each run falls inside the 24-hour window and resets it, so Snowflake can keep using the persisted results for up to 31 days.


NEW QUESTION # 43
Which function would a data engineer use to recursively resume all tasks in a chain of tasks rather than resuming each task individually (using ALTER TASK ... RESUME)?

  • A. SYSTEM$TASK_RECURSIVE_ENABLE
  • B. SYSTEM$TASK_DEPENDENTS
  • C. SYSTEM$TASK_DEPENDENTS_RESUME
  • D. SYSTEM$TASK_DEPENDENTS_ENABLE

Answer: D

Explanation:
To recursively resume all tasks in a DAG, query the SYSTEM$TASK_DEPENDENTS_ENABLE function rather than resuming each task individually (using ALTER TASK ... RESUME). A Directed Acyclic Graph (DAG) is a series of tasks composed of a single root task and additional tasks, organized by their dependencies.
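What "resuming every task in the graph" covers can be sketched with a toy traversal. This is plain Python, not a Snowflake call: the `tasks_to_resume` helper, the task names, and the dictionary layout are hypothetical. It walks from a given task through its dependents and collects every task that a single call would have to enable, without implying any particular resume order.

```python
def tasks_to_resume(dag, root):
    """Return the given task and every task reachable from it through dependencies."""
    seen = set()
    stack = [root]
    while stack:
        task = stack.pop()
        if task in seen:
            continue  # a task with several predecessors is visited once
        seen.add(task)
        stack.extend(dag.get(task, []))
    return seen


# Hypothetical DAG: each task maps to the tasks that depend on it.
dag = {
    "root_task": ["task_a", "task_b"],
    "task_a": ["task_c"],
    "task_b": ["task_c"],
    "task_c": [],
}

resumed = tasks_to_resume(dag, "root_task")
print(sorted(resumed))
```

Resuming each of these tasks by hand would take one ALTER TASK ... RESUME statement per entry in the returned set, which is the repetition the single function call avoids.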


NEW QUESTION # 44
......
