Quiz 3

Professional Development

7 Qs

Assessment

Quiz

Education

Professional Development

Hard

Created by CloudThat Technologies

7 questions

1.

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

A data engineer has ingested data from an external source into a PySpark DataFrame raw_df. They need to briefly make this data available in SQL for a data analyst to perform a quality assurance check on the data. Which of the following commands should the data engineer run to make this data available in SQL for only the remainder of the Spark session?

raw_df.createOrReplaceTempView("raw_df")

raw_df.createTable("raw_df")

raw_df.write.save("raw_df")

raw_df.saveAsTable("raw_df")

There is no way to share data between PySpark and SQL.
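
For reference, a minimal sketch of the temporary-view approach (the correct option). The read path and the QA query are hypothetical; the key point is that createOrReplaceTempView registers the DataFrame for SQL access only for the lifetime of the current Spark session:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical ingestion of the external source.
raw_df = spark.read.json("/tmp/external_source/")

# Register a session-scoped temp view; it disappears when the session ends.
raw_df.createOrReplaceTempView("raw_df")

# The analyst can now run SQL against it for the quality assurance check.
spark.sql("SELECT COUNT(*) FROM raw_df").show()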

2.

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

A data engineer has developed a code block to perform a streaming read on a data source.

The code block is below:

(spark
  .read
  .schema(schema)
  .format("cloudFiles")
  .option("cloudFiles.format", "json")
  .load(dataSource))

The code block is returning an error.

Which of the following changes should be made to the code block to configure the block to successfully perform a streaming read?

The .read line should be replaced with .readStream.

A new .stream line should be added after the .read line.

The .format("cloudFiles") line should be replaced with .format("stream").

A new .stream line should be added after the spark line.

A new .stream line should be added after the .load(dataSource) line.
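
For reference, a sketch of the corrected block, assuming schema and dataSource are defined as in the question. Swapping .read for .readStream (the correct option) turns the batch read into a streaming one:

streaming_df = (spark
  .readStream                           # streaming read instead of batch .read
  .schema(schema)
  .format("cloudFiles")                 # cloudFiles is the Auto Loader source
  .option("cloudFiles.format", "json")
  .load(dataSource))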

3.

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

The source system generates files in a shared directory that is also used by other processes, so the files must be kept as is and will accumulate in the directory. The data engineer needs to identify which files are new since the pipeline's previous run and set up the pipeline to ingest only those new files on each run.

Which of the following tools can the data engineer use to solve this problem?

Databricks SQL

Delta Lake

Unity Catalog

Data Explorer

Auto Loader
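
For reference, a minimal Auto Loader sketch (the correct option); all paths and the table name are hypothetical. Auto Loader records which files it has already processed in its checkpoint, so the files in the shared directory are left untouched and only new arrivals are ingested on each run:

new_files_df = (spark
  .readStream
  .format("cloudFiles")                                # Auto Loader source
  .option("cloudFiles.format", "json")
  .option("cloudFiles.schemaLocation", "/tmp/schema")  # hypothetical schema location
  .load("/mnt/shared_dir"))                            # hypothetical shared directory

(new_files_df.writeStream
  .option("checkpointLocation", "/tmp/checkpoint")     # hypothetical; tracks ingested files
  .trigger(availableNow=True)                          # process new files, then stop
  .toTable("bronze_ingest"))                           # hypothetical target table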

4.

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

A data engineering team is converting their existing data pipeline to use Auto Loader for incremental ingestion of JSON files. One data engineer comes across the following code block in the Auto Loader documentation:

streaming_df = (spark.readStream.format("cloudFiles")
  .option("cloudFiles.format", "json")
  .option("cloudFiles.schemaLocation", schemaLocation)
  .load(sourcePath))

Assuming that schemaLocation and sourcePath have been set correctly, which of the following changes does the data engineer need to make to convert this code block to use Auto Loader to ingest the data?

The data engineer needs to change the format("cloudFiles") line to format("autoLoader").

There is no change required. Databricks automatically uses Auto Loader for streaming reads.

There is no change required. The inclusion of format("cloudFiles") enables the use of Auto Loader.

The data engineer needs to add the .autoLoader line before the .load(sourcePath) line.

There is no change required. The data engineer needs to ask their administrator to turn on Auto Loader.
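
Since the block is already valid Auto Loader usage (the correct option), a hedged sketch of attaching it to a streaming write to complete the pipeline; checkpointPath and the target table name are hypothetical additions:

(streaming_df.writeStream
  .option("checkpointLocation", checkpointPath)  # hypothetical variable; stores progress
  .trigger(availableNow=True)                    # ingest all new files, then stop
  .toTable("raw_json_bronze"))                   # hypothetical target table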

5.

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

Which of the following data workloads will utilize a Bronze table as its source?

A job that aggregates cleaned data to create standard summary statistics

A job that queries aggregated data to publish key insights into a dashboard

A job that enriches data by parsing its timestamps into a human-readable format

A job that ingests raw data from a streaming source into the Lakehouse

A job that develops a feature set for a machine learning application
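
For context, a minimal sketch of the kind of Bronze-sourced job described in the timestamp-parsing option; table names and the timestamp format are hypothetical:

from pyspark.sql import functions as F

# Read raw data from a hypothetical Bronze table.
bronze_df = spark.read.table("bronze_events")

# Enrich: parse the raw timestamp string into a proper timestamp column.
silver_df = bronze_df.withColumn(
    "event_time", F.to_timestamp("event_ts", "yyyy-MM-dd'T'HH:mm:ss")
)

# Write the enriched result to a hypothetical Silver table.
silver_df.write.mode("overwrite").saveAsTable("silver_events")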

6.

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

A data engineer has written the following query:

SELECT * FROM json.`/path/to/json/file.json`;

The data engineer asks a colleague for help to convert this query for use in a Delta Live Tables (DLT) pipeline. The query should create the first table in the DLT pipeline.

Which of the following describes the change the colleague needs to make to the query?

They need to add a COMMENT line at the beginning of the query.

They need to add a CREATE LIVE TABLE table_name AS line at the beginning of the query.

They need to add a live. prefix prior to json. in the FROM line.

They need to add a CREATE DELTA LIVE TABLE table_name AS line at the beginning of the query.

They need to add the cloud_files(...) wrapper to the JSON file path.
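
For reference, the converted query might look like the following in a DLT SQL notebook (the CREATE LIVE TABLE option); table_name is a placeholder for whatever the first table should be called:

-- Registers the query's result as the first table in the DLT pipeline.
CREATE LIVE TABLE table_name AS
SELECT * FROM json.`/path/to/json/file.json`;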

7.

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

Which of the following statements describes Delta Lake?

Delta Lake is an open source analytics engine used for big data workloads.

Delta Lake is an open format storage layer that delivers reliability, security, and performance.

Delta Lake is an open source platform to help manage the complete machine learning lifecycle.

Delta Lake is an open source data storage format for distributed data.

Delta Lake is an open format storage layer that processes data.
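
To make the "open format storage layer" framing concrete, a minimal sketch that writes and reads a Delta table; the path is hypothetical:

# Writing in Delta format produces Parquet data files plus a transaction
# log, which is what provides the reliability guarantees.
df = spark.range(5)
df.write.format("delta").mode("overwrite").save("/tmp/delta_demo")  # hypothetical path

spark.read.format("delta").load("/tmp/delta_demo").show()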