
Databricks Data Engineer Professional Practice Exam
Authored by Dot Jarv
Computers
University
Used 42+ times

105 questions
1.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
An upstream system has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. The notebook to be scheduled will use this parameter to load data with the following code: df = spark.read.format("parquet").load(f"/mnt/source/{date}")
Which code block should be used to create the date Python variable used in the above code block?
date = spark.conf.get("date")
input_dict = input()
date= input_dict["date"]
import sys
date = sys.argv[1]
date = dbutils.notebooks.getParam("date")
dbutils.widgets.text("date", "null")
date = dbutils.widgets.get("date")
Answer explanation
The "run" example at https://docs.databricks.com/en/notebooks/notebook-workflows.html#dbutilsnotebook-api covers an equivalent use case: a value passed to the notebook as a parameter is read through a widget.
dbutils.widgets.text("date", "null")
date = dbutils.widgets.get("date")
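A minimal sketch of how the widget pattern fits together, assuming the job passes a parameter named date. The helper batch_path and the FakeWidgets class are hypothetical and exist only so the snippet runs outside Databricks; in a notebook you would pass dbutils.widgets itself.

```python
class FakeWidgets:
    """Hypothetical local stand-in for dbutils.widgets (illustration only)."""

    def __init__(self, supplied=None):
        # Values that a job run would have passed in as parameters.
        self._values = dict(supplied or {})

    def text(self, name, default_value, label=None):
        # Only fall back to the default when no value was supplied.
        self._values.setdefault(name, default_value)

    def get(self, name):
        return self._values[name]


def batch_path(widgets, base="/mnt/source"):
    # Declare the widget with a default, then read the value the job passed.
    widgets.text("date", "null")
    date = widgets.get("date")
    return f"{base}/{date}"


# Simulate a job run that supplied date=2024-01-31 (made-up value).
path = batch_path(FakeWidgets({"date": "2024-01-31"}))
print(path)
```

Declaring the widget with a default before reading it means the notebook also runs interactively, where no job parameter has been supplied.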
2.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
The Databricks workspace administrator has configured interactive clusters for each of the data engineering groups. To control costs, clusters are set to terminate after 30 minutes of inactivity. Each user should be able to execute workloads against their assigned clusters at any time of the day.
Assuming users have been added to a workspace but not granted any permissions, which of the following describes the minimal permissions a user would need to start and attach to an already configured cluster?
"Can Manage" privileges on the required cluster
Workspace Admin privileges, cluster creation allowed, "Can Attach To" privileges on the required cluster
Cluster creation allowed, "Can Attach To" privileges on the required cluster
"Can Restart" privileges on the required cluster
Cluster creation allowed, "Can Restart" privileges on the required cluster
Answer explanation
"Can Restart" is the minimum permission that lets a user both start a cluster and attach to it. For more information, see https://docs.databricks.com/en/security/auth-authz/access-control/cluster-acl.html
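As a rough illustration, the grant could be expressed as a request body for the Databricks Permissions API. This sketch assumes the body shape used by the clusters permissions endpoint (an access_control_list of user_name and permission_level entries); the helper name and the email address are hypothetical, and the exact endpoint and fields should be checked against the current API reference.

```python
import json

# Cluster permission levels, ordered from least to most privileged.
CLUSTER_LEVELS = ("CAN_ATTACH_TO", "CAN_RESTART", "CAN_MANAGE")


def cluster_permission_payload(user_name, level="CAN_RESTART"):
    """Build a request body granting one user one cluster permission level."""
    if level not in CLUSTER_LEVELS:
        raise ValueError(f"unknown cluster permission level: {level}")
    return {
        "access_control_list": [
            {"user_name": user_name, "permission_level": level}
        ]
    }


# Hypothetical user; CAN_RESTART covers both starting and attaching.
payload = cluster_permission_payload("engineer@example.com")
print(json.dumps(payload, indent=2))
```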
3.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
When scheduling Structured Streaming jobs for production, which configuration automatically recovers from query failures and keeps costs low?
Cluster: New Job Cluster;
Retries: Unlimited;
Maximum Concurrent Runs: Unlimited
Cluster: New Job Cluster;
Retries: None;
Maximum Concurrent Runs: 1
Cluster: Existing All-Purpose Cluster;
Retries: Unlimited;
Maximum Concurrent Runs: 1
Cluster: New Job Cluster;
Retries: Unlimited;
Maximum Concurrent Runs: 1
Cluster: Existing All-Purpose Cluster;
Retries: None;
Maximum Concurrent Runs: 1
Answer explanation
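Maximum concurrent runs: set to 1, because only one instance of each query may be active at a time. Retries: set to Unlimited, so the job restarts the query automatically after a failure. Cluster: use a new job cluster, which costs less than keeping an all-purpose cluster running.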
https://docs.databricks.com/en/structured-streaming/query-recovery.html
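The recommended settings might look like the following job definition. This is a sketch assuming the Jobs API 2.1 field names (max_concurrent_runs on the job, max_retries on the task, where -1 means retry indefinitely); the job name, notebook path, Spark version, and node type are placeholder values, not taken from the question.

```python
def streaming_job_settings(notebook_path):
    """Build job settings for a production Structured Streaming notebook."""
    return {
        "name": "streaming-ingest",      # placeholder job name
        "max_concurrent_runs": 1,        # only one active instance of the query
        "tasks": [
            {
                "task_key": "run_stream",
                "notebook_task": {"notebook_path": notebook_path},
                # A new job cluster is created per run and released afterwards.
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",  # placeholder
                    "node_type_id": "i3.xlarge",          # placeholder
                    "num_workers": 2,
                },
                "max_retries": -1,       # assumed: -1 = retry indefinitely
            }
        ],
    }


settings = streaming_job_settings("/Repos/prod/streams/ingest")
print(settings["max_concurrent_runs"], settings["tasks"][0]["max_retries"])
```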
4.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
The data engineering team has configured a Databricks SQL query and alert to monitor the values in a Delta Lake table. The recent_sensor_recordings table contains an identifying sensor_id alongside the timestamp and temperature for the most recent 5 minutes of recordings.
The query on the left is used to create the alert:
The query is set to refresh each minute and always completes in less than 10 seconds. The alert is set to trigger when mean(temperature) > 120. Notifications are set to be sent at most once per minute.
If this alert raises notifications for 3 consecutive minutes and then stops, which statement must be true?
The total average temperature across all sensors exceeded 120 on three consecutive executions of the query
The recent_sensor_recordings table was unresponsive for three consecutive runs of the query
The source query failed to update properly for three consecutive minutes and then restarted
The maximum temperature recording for at least one sensor exceeded 120 on three consecutive executions of the query
The average temperature recordings for at least one sensor exceeded 120 on three consecutive executions of the query
Answer explanation
5.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
A junior developer complains that the code in their notebook isn't producing the correct results in the development environment. A shared screenshot reveals that while they're using a notebook versioned with Databricks Repos, they're using a personal branch that contains old logic. The desired branch named dev-2.3.9 is not available from the branch selection dropdown.
Which approach will allow this developer to review the current logic for this notebook?
Use Repos to make a pull request; use the Databricks REST API to update the current branch to dev-2.3.9
Use Repos to pull changes from the remote Git repository and select the dev-2.3.9 branch.
Use Repos to checkout the dev-2.3.9 branch and auto-resolve conflicts with the current branch
Merge all changes back to the main branch in the remote Git repository and clone the repo again
Use Repos to merge the current branch and the dev-2.3.9 branch, then make a pull request to sync with the remote repository
6.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
The security team is exploring whether or not the Databricks secrets module can be leveraged for connecting to an external database.
After testing the code with all Python variables being defined with strings, they upload the password to the secrets module and configure the correct permissions for the currently active user. They then modify their code to the following (leaving all other variables unchanged).
(See screenshot on the left)
Which statement describes what will happen when the above code is executed?
The connection to the external table will fail; the string "REDACTED" will be printed.
An interactive input box will appear in the notebook; if the right password is provided, the connection will succeed and the encoded password will be saved to DBFS.
An interactive input box will appear in the notebook; if the right password is provided, the connection will succeed and the password will be printed in plain text.
The connection to the external table will succeed; the string value of password will be printed in plain text.
The connection to the external table will succeed; the string "REDACTED" will be printed.
Answer explanation
7.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
The data science team has created and logged a production model using MLflow. The following code correctly imports and applies the production model to output the predictions as a new DataFrame named preds with the schema "customer_id LONG, predictions DOUBLE, date DATE".
The data science team would like predictions saved to a Delta Lake table with the ability to compare all predictions across time. Churn predictions will be made at most once per day.
Which code block accomplishes this task while minimizing potential compute costs?
preds.write.mode("append").saveAsTable("churn_preds")
preds.write.format("delta").save("/preds/churn_preds")
Answer explanation
You need:
- Batch operation, since predictions are made at most once a day
- Append, since you need to keep track of past predictions
You don't need to specify "format" when you use saveAsTable.
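A small sketch of the batch append, wrapped in a function so it can be exercised without a cluster. The function name save_churn_predictions is hypothetical; preds is assumed to be the Spark DataFrame described in the question.

```python
def save_churn_predictions(preds, table_name="churn_preds"):
    """Append one day's predictions to a managed table.

    A plain batch write is enough because predictions arrive at most once
    per day, and append mode keeps earlier rows so that predictions can be
    compared across time.
    """
    preds.write.mode("append").saveAsTable(table_name)
```

In a Databricks notebook, calling save_churn_predictions(preds) after scoring would add that day's rows to churn_preds without rewriting existing data.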