EXAM PREPARATION PART 3

Authored by Licibeth Delacruz

26 questions


1.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Which of the following describes how Databricks Repos can help facilitate CI/CD workflows on the Databricks Lakehouse Platform?

A. Databricks Repos can facilitate the pull request, review, and approval process before merging branches
B. Databricks Repos can merge changes from a secondary Git branch into a main Git branch
C. Databricks Repos can be used to design, develop, and trigger Git automation pipelines
D. Databricks Repos can store the single-source-of-truth Git repository
E. Databricks Repos can commit or push code changes to trigger a CI/CD process

2.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

A data engineering team needs to query a Delta table to extract rows that all meet the same condition. However, the team has noticed that the query is running slowly. The team has already tuned the size of the data files. Upon investigating, the team has concluded that the rows meeting the condition are sparsely located throughout each of the data files.

Based on the scenario, which of the following optimization techniques could speed up the query?

A. Data skipping
B. Z-Ordering
C. Bin-packing
D. Write as a Parquet file
E. Tuning the file size
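For reference, when matching rows are scattered across many files, co-locating them improves how much data the engine can skip. In Delta Lake this is done with OPTIMIZE ... ZORDER BY; the table and column names below are hypothetical, chosen only to illustrate the command shape:

```sql
-- Illustrative only: "sales_delta" and "region" are hypothetical names.
-- Z-Ordering rewrites the data files so rows with similar values of the
-- listed column(s) sit together, letting per-file min/max statistics
-- (data skipping) prune far more files for selective filters.
OPTIMIZE sales_delta
ZORDER BY (region);
```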

3.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

A junior data engineer needs to create a Spark SQL table my_table for which Spark manages both the data and the metadata. The metadata and data should also be stored in the Databricks Filesystem (DBFS).

Which of the following commands should a senior data engineer share with the junior data engineer to complete this task?

A. CREATE TABLE my_table (id STRING, value STRING) USING org.apache.spark.sql.parquet OPTIONS (PATH "storage-path");
B. CREATE MANAGED TABLE my_table (id STRING, value STRING) USING org.apache.spark.sql.parquet OPTIONS (PATH "storage-path");
C. CREATE MANAGED TABLE my_table (id STRING, value STRING);
D. CREATE TABLE my_table (id STRING, value STRING) USING DBFS;
E. CREATE TABLE my_table (id STRING, value STRING);
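As background for this question: in Spark SQL, a table created without an explicit storage path is a managed table, meaning Spark controls both the metadata and the data location (which on Databricks defaults to DBFS). Supplying a PATH option instead creates an external table whose data Spark does not manage:

```sql
-- Managed table: Spark manages metadata AND data; data lands in the
-- default warehouse location (DBFS on Databricks).
CREATE TABLE my_table (id STRING, value STRING);

-- External (unmanaged) table, for contrast: Spark manages only the
-- metadata, while the data stays at the supplied path.
-- CREATE TABLE my_table (id STRING, value STRING)
--   USING parquet OPTIONS (PATH "storage-path");
```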

4.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

A data engineer wants to create a relational object by pulling data from two tables. The relational object must be used by other data engineers in other sessions. In order to save on storage costs, the data engineer wants to avoid copying and storing physical data.

Which of the following relational objects should the data engineer create?

A. View
B. Temporary view
C. Delta Table
D. Database
E. Spark SQL Table
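For context, a (non-temporary) view stores only its defining query in the metastore, so no physical data is copied, and unlike a temporary view it remains visible to other users and sessions. A minimal sketch, with hypothetical table and column names:

```sql
-- "orders" and "customers" are hypothetical source tables.
-- The view persists only this query text; reading the view re-runs
-- the join, so no data is duplicated in storage.
CREATE VIEW order_summary AS
SELECT c.customer_id, o.order_id, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;
```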

5.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

A junior data engineer has ingested a JSON file into a table raw_table with the following schema:

cart_id STRING,
items ARRAY<item_id:STRING>

The junior data engineer would like to unnest the items column in raw_table to result in a new table with the following schema:

cart_id STRING,
item_id STRING

Which of the following commands should the junior data engineer run to complete this task?

A. SELECT cart_id, filter(items) AS item_id FROM raw_table;
B. SELECT cart_id, flatten(items) AS item_id FROM raw_table;
C. SELECT cart_id, reduce(items) AS item_id FROM raw_table;
D. SELECT cart_id, explode(items) AS item_id FROM raw_table;
E. SELECT cart_id, slice(items) AS item_id FROM raw_table;
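For reference, Spark SQL's explode function is the standard way to unnest an array: it produces one output row per array element, repeating the other selected columns. A small illustration with made-up values:

```sql
-- Given a row: cart_id = 'c1', items = array('a', 'b')
-- explode emits one row per element of the array:
--   c1 | a
--   c1 | b
SELECT cart_id, explode(items) AS item_id
FROM raw_table;
```

The other listed functions (filter, flatten, reduce, slice) all transform an array into another array or a single value; only explode changes the row count.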

6.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

A data engineer has ingested a JSON file into a table raw_table with the following schema:

transaction_id STRING,
payload ARRAY<customer_id:STRING, date:TIMESTAMP, store_id:STRING>

The data engineer wants to efficiently extract the date of each transaction into a table with the following schema:

transaction_id STRING,
date TIMESTAMP

Which of the following commands should the data engineer run to complete this task?

A. SELECT transaction_id, explode(payload) FROM raw_table;
B. SELECT transaction_id, payload.date FROM raw_table;
C. SELECT transaction_id, date FROM raw_table;
D. SELECT transaction_id, payload[date] FROM raw_table;
E. SELECT transaction_id, date from payload FROM raw_table;
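A note on this question: the schema as written (ARRAY<customer_id:STRING, ...>) is not valid Spark type syntax; given that the target output is a single TIMESTAMP per transaction, the intended type is presumably a STRUCT. Under that assumption, dot notation extracts a single named field directly, without exploding the payload:

```sql
-- Assuming payload is STRUCT<customer_id:STRING, date:TIMESTAMP, store_id:STRING>,
-- dot notation reaches into the struct and returns just that field:
SELECT transaction_id, payload.date
FROM raw_table;
```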

7.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

A data analyst has provided a data engineering team with the following Spark SQL query:

SELECT district, avg(sales)
FROM store_sales_20220101
GROUP BY district;

The data analyst would like the data engineering team to run this query every day. The date at the end of the table name (20220101) should automatically be replaced with the current date each time the query is run.

Which of the following approaches could be used by the data engineering team to efficiently automate this process?

A. They could wrap the query using PySpark and use Python's string variable system to automatically update the table name.
B. They could manually replace the date within the table name with the current day's date.
C. They could request that the data analyst rewrite the query to be run less frequently.
D. They could replace the string-formatted date in the table with a timestamp-formatted date.
E. They could pass the table into PySpark and develop a robustly tested module on the existing query.
