
Lakeflow Spark Pipelines

Authored by โปร แกรมเมอร์

English

University

20 questions

1.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

In Spark Declarative Pipelines, what is the primary difference between a streaming table and a materialized view?

Streaming tables are used only for raw data, while materialized views are used only for final reporting tables

Streaming tables always recompute all data on each run, while materialized views never recompute data

Streaming tables ingest and incrementally process incoming data from a source, while materialized views incrementally maintain the results of a query over upstream tables when possible

Streaming tables are used only for batch workloads, while materialized views are used only for streaming workloads
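A toy Python model can make the distinction in question 1 concrete. Everything below is hypothetical: the class and function names are invented, and this is not the Databricks or Spark API. The streaming-table stand-in tracks an offset so that each source record is processed exactly once. The materialized-view stand-in holds the result of a query over its upstream rows, here a count per event type. A real materialized view may refresh its result incrementally when possible, but this sketch simply recomputes it.

```python
# Toy model for illustration only; not the Databricks / Spark API.


class ToyStreamingTable:
    """Append-only table that processes each source record exactly once."""

    def __init__(self):
        self.rows = []
        self.offset = 0  # how far into the source earlier runs have read

    def update(self, source):
        new_rows = source[self.offset:]  # only records not seen on earlier runs
        self.rows.extend(new_rows)
        self.offset = len(source)
        return len(new_rows)  # rows processed in this run


def toy_materialized_view(upstream_rows):
    """Result of a query over upstream data: count of events per type.

    A real materialized view may update this result incrementally;
    this toy simply recomputes it from all upstream rows.
    """
    counts = {}
    for row in upstream_rows:
        counts[row["type"]] = counts.get(row["type"], 0) + 1
    return counts


if __name__ == "__main__":
    source = [{"type": "click"}, {"type": "view"}]
    events = ToyStreamingTable()
    print(events.update(source))  # 2: both records are new
    source.append({"type": "click"})
    print(events.update(source))  # 1: only the newly landed record
    print(toy_materialized_view(events.rows))  # {'click': 2, 'view': 1}
```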

2.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

You are migrating a traditional batch ETL workflow, where the entire dataset is fully reprocessed each time the job runs, into a Lakeflow Spark Declarative Pipeline using a Bronze > Silver > Gold architecture. In the existing workflow:
- Raw files are ingested, cleaned, joined, and aggregated in a single batch job.
- Each run reprocesses all historical data, even when only a small amount of new data has arrived since the last run.

Which approach best reflects how this workflow should be redesigned using Spark Declarative Pipelines?

Keep the single batch job and run it more frequently to reduce data latency

Split the logic into separate batch jobs and manually orchestrate them in sequence

Define a Bronze streaming table for raw ingestion, Silver tables for cleansing and enrichment, and Gold materialized views for aggregations, allowing the pipeline to manage dependencies and incremental updates

Convert the batch SQL into Python and execute it unchanged in a pipeline notebook
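The redesign in question 2 leaves execution order to the pipeline, which works it out from the dependencies between datasets. The sketch below imitates only that piece, using Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The `bronze`, `silver`, and `gold` functions and the `DATASETS` registry are hypothetical. Unlike a real pipeline, this toy recomputes every dataset in full on each call instead of updating it incrementally.

```python
from graphlib import TopologicalSorter

# Toy model for illustration only; not the Databricks / Spark API.


def bronze(_inputs, raw):
    return list(raw)  # raw ingestion: keep records as they arrive


def silver(inputs, _raw):
    # Cleansing: drop records that are missing an amount
    return [r for r in inputs["bronze"] if r.get("amount") is not None]


def gold(inputs, _raw):
    # Aggregation: total amount per customer
    totals = {}
    for r in inputs["silver"]:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount"]
    return totals


# Hypothetical registry: dataset name -> (upstream dependencies, transform).
# Deliberately declared out of order; the sorter works out the run order.
DATASETS = {
    "gold": ({"silver"}, gold),
    "silver": ({"bronze"}, silver),
    "bronze": (set(), bronze),
}


def run_pipeline(raw):
    graph = {name: deps for name, (deps, _fn) in DATASETS.items()}
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for name in order:
        deps, fn = DATASETS[name]
        results[name] = fn({d: results[d] for d in deps}, raw)
    return order, results
```

`gold` is listed first in `DATASETS`, yet `run_pipeline` still returns the order `['bronze', 'silver', 'gold']`. The order comes from the declared dependencies, not from the order of declaration.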

3.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

You have a Lakeflow Spark Declarative Pipeline that includes streaming tables, downstream transformations, and materialized views.
You want to:
- Delete all pipeline checkpoints
- Clear all data from streaming tables
- Reprocess all source data from scratch
- Fully rebuild all downstream tables and materialized views

Which action should you take?

Select delete all and run the pipeline

Manually delete the pipeline and start it again

Run the pipeline with a full table refresh

Run the pipeline with different settings

4.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

When running a Spark Declarative Pipeline for the second time after landing new data, how many rows should be processed?

Zero rows, requiring a manual refresh

The original rows only

All rows in the source volume

Only the new rows added since the last run
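Questions 3 and 4 both turn on checkpoints. A normal run skips source data that a checkpoint marks as already processed. A full refresh discards the checkpoints and the streaming-table data, so everything is reprocessed and downstream results are rebuilt. The toy class below models that behavior under loudly stated assumptions. The checkpoint is just a set of file names, `ToyPipeline` and its attributes are hypothetical, and nothing here reflects how Databricks actually stores pipeline state.

```python
# Toy model for illustration only; not how Databricks stores checkpoints.


class ToyPipeline:
    def __init__(self):
        self.checkpoint = set()  # names of source files already ingested
        self.bronze = []         # streaming-table stand-in (append-only rows)
        self.daily_totals = {}   # materialized-view stand-in

    def run(self, files, full_refresh=False):
        """Process landed files (name -> rows); return how many rows were ingested."""
        if full_refresh:
            # Full refresh: drop checkpoints and streaming-table data first
            self.checkpoint.clear()
            self.bronze.clear()
        ingested = 0
        for name in sorted(files):
            if name in self.checkpoint:
                continue  # already processed on an earlier run
            self.bronze.extend(files[name])
            self.checkpoint.add(name)
            ingested += len(files[name])
        # Rebuild the downstream aggregate from the streaming-table rows
        totals = {}
        for day, amount in self.bronze:
            totals[day] = totals.get(day, 0) + amount
        self.daily_totals = totals
        return ingested
```

Suppose `f1.json` holds two rows. The first run ingests 2 rows. After `f2.json` lands with one row, the second run ingests only that 1 row. `run(files, full_refresh=True)` then ingests all 3 rows again without duplicating them in `bronze`.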

5.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

When using batch notebook-based ETL for large data volumes, what processing or cost consideration often motivates teams to migrate to Spark Declarative Pipelines?

Batch notebook ETL often fully reprocesses data on each run, increasing compute cost, whereas Spark Declarative Pipelines manage incremental processing automatically

Batch notebook ETL does not support distributed processing, while Spark Declarative Pipelines do

Batch notebook ETL cannot use Auto Loader, while Spark Declarative Pipelines can

Batch notebook ETL cannot be scheduled, while Spark Declarative Pipelines can
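Some rough arithmetic makes the cost argument in question 5 concrete. Assume, hypothetically, that n new rows land before each of r runs and that compute cost grows with the number of rows read. Full reprocessing rereads every batch on every run, so run k reads k·n rows and the total is n·r(r+1)/2. Incremental processing reads each batch once, so the total is n·r.

```python
def rows_processed(runs, new_rows_per_run):
    """Rows read over `runs` runs if `new_rows_per_run` rows land before each run.

    Returns (full_reprocessing_total, incremental_total).
    """
    # Full reprocessing: run k rereads all k batches landed so far
    full = sum(k * new_rows_per_run for k in range(1, runs + 1))
    # Incremental processing: each batch is read exactly once
    incremental = runs * new_rows_per_run
    return full, incremental
```

For example, `rows_processed(10, 1000)` returns `(55000, 10000)`. That is a 5.5× difference, and the ratio (r+1)/2 keeps growing with the number of runs.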

6.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Assume you have JSON log files arriving continuously in cloud storage at /Volumes/logs/events. You want to ingest them into a streaming table called events_bronze.

Which SQL statement correctly defines this streaming table in Databricks SQL?

CREATE STREAMING TABLE events_bronze AS SELECT * FROM read_files('/Volumes/logs/events', format => 'json');

CREATE STREAMING TABLE events_bronze AS SELECT * FROM read_files('/Volumes/logs/events', format => 'json') SCHEDULE EVERY 1 HOUR;

CREATE OR REFRESH STREAMING TABLE events_bronze AS SELECT * FROM STREAM read_files('/Volumes/logs/events', format => 'json');

CREATE OR REFRESH TABLE events_bronze AS SELECT * FROM read_files('/Volumes/logs/events', format => 'json');

7.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

You are building a pipeline in Lakeflow Spark Declarative Pipelines. You want to make the path to your input data configurable depending on the environment (dev, test, prod). You set a pipeline configuration parameter named input_path. Which SQL snippet correctly references this parameter to define a streaming table that uses that path?

CREATE OR REFRESH STREAMING TABLE raw_events AS SELECT * FROM STREAM read_files(input_path, format => 'json');

CREATE OR REFRESH STREAMING TABLE raw_events AS SELECT * FROM STREAM read_files('${input_path}', format => 'json');

CREATE OR REFRESH STREAMING TABLE raw_events AS SELECT * FROM STREAM read_files({input_path}, format => 'json');
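In pipeline SQL, configuration values are typically referenced with a `${key}` placeholder and interpolated as text. A path passed as a string argument therefore still needs its surrounding quotes. Python's standard `string.Template` happens to use the same `${name}` syntax, so the sketch below uses it purely as an analogy for that substitution. It is not how Databricks resolves parameters, and the `ENVIRONMENTS` paths are hypothetical.

```python
from string import Template

# Analogy only: string.Template shares the ${name} placeholder syntax;
# this is not how Databricks resolves pipeline configuration values.
QUERY = (
    "CREATE OR REFRESH STREAMING TABLE raw_events AS "
    "SELECT * FROM STREAM read_files('${input_path}', format => 'json');"
)

# Hypothetical per-environment values for the input_path parameter
ENVIRONMENTS = {
    "dev": "/Volumes/dev/landing/events",
    "prod": "/Volumes/prod/landing/events",
}


def render(env):
    """Substitute the environment's input_path into the query text."""
    return Template(QUERY).substitute(input_path=ENVIRONMENTS[env])
```

`render("dev")` yields the same statement with `'/Volumes/dev/landing/events'` in place of the placeholder. The quotes around the placeholder survive, so the result is still a valid string literal.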
