Google Professional Data Engineer - Part 2
Quiz • Computers • Professional Development • Hard
Steven Wong
1.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
You are developing an Apache Beam pipeline to extract data from a Cloud SQL instance by using JdbcIO. You have two projects running in Google Cloud. The pipeline will be deployed and executed on Dataflow in Project A. The Cloud SQL instance is running in Project B and does not have a public IP address. After deploying the pipeline, you noticed that the pipeline failed to extract data from the Cloud SQL instance due to a connection failure. You verified that VPC Service Controls and shared VPC are not in use in these projects. You want to resolve this error while ensuring that the data does not go through the public internet. What should you do?
Set up VPC Network Peering between Project A and Project B. Add a firewall rule to allow the peered subnet range to access all instances on the network.
Turn off the external IP addresses on the Dataflow worker. Enable Cloud NAT in Project A.
Add the external IP addresses of the Dataflow worker as authorized networks in the Cloud SQL instance.
Set up VPC Network Peering between Project A and Project B. Create a Compute Engine instance without external IP address in Project B on the peered subnet to serve as a proxy server to the Cloud SQL database.
Answer explanation
Cloud SQL supports private IP addresses through private service access. When you create a Cloud SQL instance, Cloud SQL creates the instance within its own virtual private cloud (VPC), called the Cloud SQL VPC. Enabling private IP requires setting up a peering connection between the Cloud SQL VPC and your VPC network.
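To see how the pieces fit together, here is a minimal sketch of the Beam pipeline, assuming private connectivity is already in place and using placeholder project, subnetwork, and database values. ReadFromJdbc is Beam's cross-language JDBC transform, so the Dataflow job also needs a Java expansion environment. The key Dataflow options are use_public_ips=False and an explicit subnetwork, which keep the workers off the public internet.

```python
# A minimal sketch, assuming VPC peering / private services access is set up
# and the Cloud SQL instance has private IP 10.1.2.3. All names are placeholders.
import apache_beam as beam
from apache_beam.io.jdbc import ReadFromJdbc
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner='DataflowRunner',
    project='project-a',
    region='us-central1',
    temp_location='gs://my-bucket/temp',   # placeholder bucket
    use_public_ips=False,                  # workers get no external IPs
    subnetwork=('https://www.googleapis.com/compute/v1/projects/project-a/'
                'regions/us-central1/subnetworks/peered-subnet'),
)

with beam.Pipeline(options=options) as p:
    rows = (
        p
        | 'ReadFromCloudSQL' >> ReadFromJdbc(
            table_name='orders',
            driver_class_name='org.postgresql.Driver',
            jdbc_url='jdbc:postgresql://10.1.2.3:5432/mydb',  # private IP
            username='dataflow-user',
            password='change-me',
        )
    )
```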
2.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
You currently have transactional data stored on-premises in a PostgreSQL database. To modernize your data environment, you want to run transactional workloads and support analytics needs with a single database. You need to move to Google Cloud without changing database management systems, and minimize cost and complexity. What should you do?
Migrate and modernize your database with Cloud Spanner.
Migrate your workloads to AlloyDB for PostgreSQL.
Migrate to BigQuery to optimize analytics.
Migrate your PostgreSQL database to Cloud SQL for PostgreSQL.
Answer explanation
The company currently stores transactional data on-premises in a PostgreSQL database and wants to modernize while keeping the same database management system. Migrating to Cloud SQL for PostgreSQL minimizes cost and complexity, and the analytics needs can be served by running BigQuery federated queries against the Cloud SQL instance: https://cloud.google.com/bigquery/docs/federated-queries-intro
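For the analytics side, a federated query runs inside BigQuery but reads live from Cloud SQL. A minimal sketch with the BigQuery Python client, assuming a Cloud SQL connection resource named my-psql-conn has already been created, with placeholder project, table, and column names:

```python
# A minimal sketch of a BigQuery federated query against Cloud SQL.
# Project, connection, table, and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project='my-project')

sql = """
SELECT customer_id, SUM(amount) AS total
FROM EXTERNAL_QUERY(
  'projects/my-project/locations/us/connections/my-psql-conn',
  'SELECT customer_id, amount FROM transactions;'
)
GROUP BY customer_id
"""

# The query executes in BigQuery; the inner statement runs in Cloud SQL.
for row in client.query(sql).result():
    print(row.customer_id, row.total)
```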
3.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
You have a Standard Tier Memorystore for Redis instance deployed in a production environment. You need to simulate a Redis instance failover in the most accurate disaster recovery situation, and ensure that the failover has no impact on production data. What should you do?
Create a Standard Tier Memorystore for Redis instance in the development environment. Initiate a manual failover by using the limited-data-loss data protection mode.
Create a Standard Tier Memorystore for Redis instance in a development environment. Initiate a manual failover by using the force-data-loss data protection mode.
Add one replica to the Redis instance in the production environment. Initiate a manual failover by using the force-data-loss data protection mode.
Initiate a manual failover by using the limited-data-loss data protection mode to the Memorystore for Redis instance in the production environment.
Answer explanation
The best option is B: create a Standard Tier Memorystore for Redis instance in a development environment and initiate a manual failover using the force-data-loss data protection mode. The key points:
• The failover should be tested in a separate development environment, not production, to avoid impacting real data.
• The force-data-loss mode forces a full failover regardless of replication lag, which is the most accurate simulation of a disaster.
• The limited-data-loss mode blocks the failover when replication lag is too high, so it may not exercise the complete failover path.
• Adding a replica in production and failing over (C) risks losing real production data.
• Failing over the production instance (D) likewise risks impacting real data and traffic.
Option B isolates the test from production and uses the most rigorous failover mode, fully validating disaster recovery capabilities.
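The failover itself can be triggered with the gcloud CLI or a client library. A minimal sketch with the google-cloud-redis Python client, assuming placeholder project, region, and instance names for the development-environment instance:

```python
# A minimal sketch of triggering a manual failover on a dev instance.
# Project, region, and instance names are placeholders.
from google.cloud import redis_v1

client = redis_v1.CloudRedisClient()

request = redis_v1.FailoverInstanceRequest(
    name='projects/dev-project/locations/us-central1/instances/redis-dr-test',
    # FORCE_DATA_LOSS fails over regardless of replication lag, giving the
    # most realistic disaster simulation.
    data_protection_mode=(
        redis_v1.FailoverInstanceRequest.DataProtectionMode.FORCE_DATA_LOSS),
)

operation = client.failover_instance(request=request)
operation.result()  # blocks until the failover completes
```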
4.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
You have a data processing application that runs on Google Kubernetes Engine (GKE). Containers need to be launched with their latest available configurations from a container registry. Your GKE nodes need to have GPUs, local SSDs, and 8 Gbps bandwidth. You want to efficiently provision the data processing infrastructure and manage the deployment process. What should you do?
Use Compute Engine startup scripts to pull container images, and use gcloud commands to provision the infrastructure.
Use Cloud Build to schedule a job using Terraform build to provision the infrastructure and launch with the most current container images.
Use GKE to autoscale containers, and use gcloud commands to provision the infrastructure.
Use Dataflow to provision the data pipeline, and use Cloud Scheduler to run the job.
Answer explanation
Cloud Build paired with Terraform is the efficient way to both provision and deploy here. Terraform declaratively provisions the specialized GKE infrastructure (GPU node pools, local SSDs, high-bandwidth machine types), and a scheduled Cloud Build job can re-apply the configuration and roll out deployments that pull the latest container images from the registry. Startup scripts and ad hoc gcloud commands are manual and error-prone at this level of complexity, and Dataflow is a managed data processing service, not a way to provision GKE clusters.
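A minimal sketch of submitting such a build with the google-cloud-build Python client, assuming the Terraform configuration is archived in a placeholder Cloud Storage bucket; in practice the build would usually be defined in a cloudbuild.yaml attached to a trigger or schedule:

```python
# A minimal sketch: submit a Cloud Build job that runs Terraform to provision
# the GKE infrastructure. Project, bucket, object, and image tag are placeholders.
from google.cloud.devtools import cloudbuild_v1

client = cloudbuild_v1.CloudBuildClient()

build = cloudbuild_v1.Build(
    source=cloudbuild_v1.Source(
        storage_source=cloudbuild_v1.StorageSource(
            bucket='my-config-bucket', object_='terraform-config.tgz')),
    steps=[
        cloudbuild_v1.BuildStep(
            name='hashicorp/terraform:1.7',   # community Terraform image
            args=['init']),
        cloudbuild_v1.BuildStep(
            name='hashicorp/terraform:1.7',
            args=['apply', '-auto-approve']),  # applies the GKE node pool config
    ],
)

operation = client.create_build(project_id='my-project', build=build)
print(operation.result().status)  # waits for the build and prints its status
```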
5.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
You want to create a machine learning model using BigQuery ML and create an endpoint for hosting the model using Vertex AI. This will enable the processing of continuous streaming data in near-real time from multiple vendors. The data may contain invalid values. What should you do?
Create a new BigQuery dataset and use streaming inserts to land the data from multiple vendors. Configure your BigQuery ML model to use the "ingestion" dataset as the training data.
Use BigQuery streaming inserts to land the data from multiple vendors where your BigQuery dataset ML model is deployed.
Create a Pub/Sub topic and send all vendor data to it. Connect a Cloud Function to the topic to process the data and store it in BigQuery.
Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.
Answer explanation
Dataflow provides a scalable and flexible way to process and clean the incoming data in real-time before loading it into BigQuery.
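A minimal sketch of the Dataflow step, assuming JSON vendor messages and placeholder topic, table, and schema names; the Filter stands in for whatever validation the vendor data actually needs:

```python
# A minimal sketch of a streaming Pub/Sub -> sanitize -> BigQuery pipeline.
# Topic, table, schema, and the validation rule are placeholders.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

def valid(record):
    # Hypothetical sanity check: drop rows with missing or negative values.
    return record.get('vendor_id') is not None and record.get('value', -1) >= 0

with beam.Pipeline(options=options) as p:
    (
        p
        | 'ReadVendorData' >> beam.io.ReadFromPubSub(
            topic='projects/my-project/topics/vendor-data')
        | 'Parse' >> beam.Map(json.loads)
        | 'DropInvalid' >> beam.Filter(valid)
        | 'WriteToBQ' >> beam.io.WriteToBigQuery(
            'my-project:vendors.events',
            schema='vendor_id:STRING,value:FLOAT,ts:TIMESTAMP',
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )
```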
6.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
You work for an advertising company, and you've developed a Spark ML model to predict click-through rates at advertisement blocks. You've been developing everything at your on-premises data center, and now your company is migrating to Google Cloud. Your data center will be closing soon, so a rapid lift-and-shift migration is necessary. However, the data you've been using will be migrated to BigQuery. You periodically retrain your Spark ML models, so you need to migrate existing training pipelines to Google Cloud. What should you do?
Use Vertex AI for training existing Spark ML models
Rewrite your models on TensorFlow, and start using Vertex AI
Use Dataproc for training existing Spark ML models, but start reading data directly from BigQuery
Spin up a Spark cluster on Compute Engine, and train Spark ML models on the data exported from BigQuery
Answer explanation
Option C is the most rapid way to migrate the existing training pipelines to Google Cloud: it lets you keep your existing Spark ML models unchanged, takes advantage of the scalability and performance of Dataproc, and reads data directly from BigQuery, which is more efficient than exporting large datasets first.
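A minimal sketch of such a Dataproc training job, assuming the spark-bigquery connector is available on the cluster and using placeholder table, feature, and label names:

```python
# A minimal sketch of a PySpark training job on Dataproc that reads training
# data straight from BigQuery. Table, feature, and label names are placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName('ctr-training').getOrCreate()

# Read directly from BigQuery via the spark-bigquery connector,
# instead of exporting the table to GCS first.
df = (spark.read.format('bigquery')
      .option('table', 'my-project.ads.click_events')
      .load())

features = VectorAssembler(
    inputCols=['ad_position', 'user_age'], outputCol='features')
train = features.transform(df).select('features', 'clicked')

# Retrain the click-through-rate model on the fresh data.
model = LogisticRegression(labelCol='clicked').fit(train)
```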
7.
MULTIPLE CHOICE QUESTION
30 sec • 1 pt
Your infrastructure team has set up an interconnect link between Google Cloud and the on-premises network. You are designing a high-throughput streaming pipeline to ingest streaming data from an Apache Kafka cluster hosted on-premises. You want to store the data in BigQuery, with as little latency as possible. What should you do?
Setup a Kafka Connect bridge between Kafka and Pub/Sub. Use a Google-provided Dataflow template to read the data from Pub/Sub, and write the data to BigQuery.
Use a proxy host in the VPC in Google Cloud connecting to Kafka. Write a Dataflow pipeline, read data from the proxy host, and write the data to BigQuery.
Use Dataflow, write a pipeline that reads the data from Kafka, and writes the data to BigQuery.
Setup a Kafka Connect bridge between Kafka and Pub/Sub. Write a Dataflow pipeline, read the data from Pub/Sub, and write the data to BigQuery.
Answer explanation
Option C reads from Kafka directly in Dataflow:
• Latency: direct integration between Kafka and Dataflow eliminates the intermediate Pub/Sub hop, giving the lowest latency.
• Flexibility: a custom Dataflow pipeline offers more control over data processing and optimization than a pre-built template.
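A minimal sketch of the direct pipeline, assuming the on-premises brokers are reachable from the Dataflow workers over the interconnect; the broker address, topic, table, and schema are placeholders, and ReadFromKafka is a cross-language transform that needs a Java expansion environment:

```python
# A minimal sketch of a direct Kafka -> Dataflow -> BigQuery streaming pipeline.
# Broker address, topic, table, and schema are placeholders.
import json
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | 'ReadFromKafka' >> ReadFromKafka(
            consumer_config={'bootstrap.servers': '10.0.0.5:9092'},
            topics=['clickstream'])
        | 'ParseValue' >> beam.Map(lambda kv: json.loads(kv[1]))  # (key, value) pairs
        | 'WriteToBQ' >> beam.io.WriteToBigQuery(
            'my-project:events.clicks',
            schema='user_id:STRING,url:STRING,ts:TIMESTAMP',
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )
```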