
Lakeflow Spark Pipelines
Authored by โปร แกรมเมอร์
English
University
Used 7+ times

20 questions
1.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
In Spark Declarative Pipelines, what is the primary difference between a streaming table and a materialized view?
Streaming tables are used only for raw data, while materialized views are used only for final reporting tables
Streaming tables always recompute all data on each run, while materialized views never recompute data
Streaming tables ingest and incrementally process incoming data from a source, while materialized views incrementally maintain the results of a query over upstream tables when possible
Streaming tables are used only for batch workloads, while materialized views are used only for streaming workloads
2.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
You are migrating a traditional batch ETL workflow, where the entire dataset is fully reprocessed each time the job runs, into a Lakeflow Spark Declarative Pipeline using a Bronze > Silver > Gold architecture. In the existing workflow:
- Raw files are ingested, cleaned, joined, and aggregated in a single batch job.
- Each run reprocesses all historical data, even when only a small amount of new data has arrived.
Which approach best reflects how this workflow should be redesigned using Spark Declarative Pipelines?
Keep the single batch job and run it more frequently to reduce data latency
Split the logic into separate batch jobs and manually orchestrate them in sequence
Define a Bronze streaming table for raw ingestion, Silver tables for cleansing and enrichment, and Gold materialized views for aggregations, allowing the pipeline to manage dependencies and incremental updates
Convert the batch SQL into Python and execute it unchanged in a pipeline notebook
3.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
You have a Lakeflow Spark Declarative Pipeline that includes streaming tables, downstream transformations, and materialized views.
You want to:
- Delete all pipeline checkpoints
- Clear all data from streaming tables
- Reprocess all source data from scratch
- Fully rebuild all downstream tables and materialized views
Which action should you take?
Select delete all and run the pipeline
Manually delete the pipeline and start it again
Run the pipeline with a full table refresh
Run the pipeline with different settings
4.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
When running a Spark Declarative Pipeline for the second time after landing new data, how many rows should be processed?
Zero rows, requiring a manual refresh
The original rows only
All rows in the source volume
Only the new rows added since the last run
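The idea behind incremental runs and full refreshes can be illustrated with a small plain-Python toy model. This is only a sketch: the class IncrementalTable, its seen_files set (standing in for a checkpoint), and the file names are hypothetical, and it does not use or reproduce any Spark Declarative Pipelines API.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalTable:
    """Toy stand-in for a streaming table that tracks what it has already ingested."""

    rows: list = field(default_factory=list)
    seen_files: set = field(default_factory=set)  # plays the role of a checkpoint

    def update(self, source: dict) -> int:
        """Ingest only files not seen before; return the number of rows processed."""
        processed = 0
        for name in sorted(source):
            if name in self.seen_files:
                continue  # already ingested on an earlier run
            self.rows.extend(source[name])
            self.seen_files.add(name)
            processed += len(source[name])
        return processed

    def full_refresh(self, source: dict) -> int:
        """Clear the data and the checkpoint, then reprocess every file."""
        self.rows.clear()
        self.seen_files.clear()
        return self.update(source)


# Hypothetical source volume: file name -> rows in that file.
source = {"file_001.json": [{"id": 1}, {"id": 2}, {"id": 3}]}
table = IncrementalTable()

first_run = table.update(source)   # processes the three original rows
source["file_002.json"] = [{"id": 4}, {"id": 5}]
second_run = table.update(source)  # processes only the two newly landed rows
```

In this model, calling full_refresh afterwards discards the checkpoint and processes all five rows again.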
5.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
When using batch notebook-based ETL for large data volumes, what processing or cost consideration often motivates teams to migrate to Spark Declarative Pipelines?
Batch notebook ETL often fully reprocesses data on each run, increasing compute cost, whereas Spark Declarative Pipelines manage incremental processing automatically
Batch notebook ETL does not support distributed processing, while Spark Declarative Pipelines do
Batch notebook ETL cannot use Auto Loader, while Spark Declarative Pipelines can
Batch notebook ETL cannot be scheduled, while Spark Declarative Pipelines can
6.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
Assume you have JSON log files arriving continuously in cloud storage at /Volumes/logs/events. You want to ingest them into a streaming table called events_bronze.
Which SQL statement correctly defines this streaming table in Databricks SQL?
CREATE STREAMING TABLE events_bronze AS SELECT * FROM read_files('/Volumes/logs/events', format => 'json');
CREATE STREAMING TABLE events_bronze AS SELECT * FROM read_files('/Volumes/logs/events', format => 'json') SCHEDULE EVERY 1 HOUR;
CREATE OR REFRESH STREAMING TABLE events_bronze AS SELECT * FROM STREAM read_files('/Volumes/logs/events', format => 'json');
CREATE OR REFRESH TABLE events_bronze AS SELECT * FROM read_files('/Volumes/logs/events', format => 'json');
7.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
You are building a pipeline in Lakeflow Spark Declarative Pipelines. You want to make the path to your input data configurable depending on the environment (dev, test, prod). You set a pipeline configuration parameter named input_path. Which SQL snippet correctly references this parameter to define a streaming table that uses that path?
CREATE OR REFRESH STREAMING TABLE raw_events AS SELECT * FROM STREAM read_files(input_path, format => 'json');
CREATE OR REFRESH STREAMING TABLE raw_events AS SELECT * FROM STREAM read_files('${input_path}', format => 'json');
CREATE OR REFRESH STREAMING TABLE raw_events AS SELECT * FROM STREAM read_files({input_path}, format => 'json');
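To see what textual substitution of a ${...} placeholder looks like, here is a minimal sketch using Python's standard string.Template. It only mimics the substitution on a SQL string; it is not how the pipeline itself resolves parameters, and the ENVIRONMENTS mapping, its paths, and the render helper are hypothetical values chosen for illustration.

```python
from string import Template

# Hypothetical per-environment values for the input_path parameter.
ENVIRONMENTS = {
    "dev": "/Volumes/dev/raw/events",
    "test": "/Volumes/test/raw/events",
    "prod": "/Volumes/prod/raw/events",
}

# SQL text containing a ${input_path} placeholder inside a quoted string literal.
SQL_TEMPLATE = Template(
    "CREATE OR REFRESH STREAMING TABLE raw_events AS "
    "SELECT * FROM STREAM read_files('${input_path}', format => 'json');"
)


def render(env: str) -> str:
    """Return the SQL text with the placeholder replaced for the given environment."""
    return SQL_TEMPLATE.substitute(input_path=ENVIRONMENTS[env])


dev_sql = render("dev")
prod_sql = render("prod")
```

The same SQL text yields a different path per environment, so the definition itself does not need to change between dev, test, and prod.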