Google Professional Data Engineer - Part 2

Assessment • Quiz • Computers • Professional Development • Hard

Created by Steven Wong

137 questions

1.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

You are developing an Apache Beam pipeline to extract data from a Cloud SQL instance by using JdbcIO. You have two projects running in Google Cloud. The pipeline will be deployed and executed on Dataflow in Project A. The Cloud SQL instance is running in Project B and does not have a public IP address. After deploying the pipeline, you noticed that the pipeline failed to extract data from the Cloud SQL instance due to a connection failure. You verified that VPC Service Controls and shared VPC are not in use in these projects. You want to resolve this error while ensuring that the data does not go through the public internet. What should you do?

Set up VPC Network Peering between Project A and Project B. Add a firewall rule to allow the peered subnet range to access all instances on the network.

Turn off the external IP addresses on the Dataflow worker. Enable Cloud NAT in Project A.

Add the external IP addresses of the Dataflow worker as authorized networks in the Cloud SQL instance.

Set up VPC Network Peering between Project A and Project B. Create a Compute Engine instance without external IP address in Project B on the peered subnet to serve as a proxy server to the Cloud SQL database.

Answer explanation

Cloud SQL supports private IP addresses through private service access. When you create a Cloud SQL instance, Cloud SQL creates the instance within its own virtual private cloud (VPC), called the Cloud SQL VPC. Enabling private IP requires setting up a peering connection between the Cloud SQL VPC and your VPC network.
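
For context, here is a minimal Apache Beam (Python) sketch of how the pipeline could read Cloud SQL over private connectivity once the networking is in place: the Dataflow workers are pinned to a subnetwork that can route to the Cloud SQL private IP, and public worker IPs are disabled so the data stays off the public internet. Project, subnet, IP, table, and credential values below are hypothetical.

```python
import apache_beam as beam
from apache_beam.io.jdbc import ReadFromJdbc
from apache_beam.options.pipeline_options import PipelineOptions

# Worker networking: run on a subnetwork that can reach the Cloud SQL private IP
# and disable public worker IPs (same effect as passing --no_use_public_ips).
options = PipelineOptions(
    runner="DataflowRunner",
    project="project-a",                                    # hypothetical project ID
    region="us-central1",
    temp_location="gs://project-a-dataflow/tmp",            # hypothetical bucket
    subnetwork="regions/us-central1/subnetworks/peered-subnet",
    use_public_ips=False,
)

with beam.Pipeline(options=options) as p:
    _ = p | "ReadFromCloudSQL" >> ReadFromJdbc(
        table_name="orders",                                # hypothetical table
        driver_class_name="org.postgresql.Driver",
        jdbc_url="jdbc:postgresql://10.10.0.3:5432/appdb",  # Cloud SQL private IP
        username="beam_user",
        password="change-me",
    )
```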

2.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

You currently have transactional data stored on-premises in a PostgreSQL database. To modernize your data environment, you want to run transactional workloads and support analytics needs with a single database. You need to move to Google Cloud without changing database management systems, and minimize cost and complexity. What should you do?

Migrate and modernize your database with Cloud Spanner.

Migrate your workloads to AlloyDB for PostgreSQL.

Migrate to BigQuery to optimize analytics.

Migrate your PostgreSQL database to Cloud SQL for PostgreSQL.

Answer explanation

The data is currently transactional and stored on-premises in a PostgreSQL database, and the goal is to modernize to a database that supports both transactional workloads and analytics without changing database management systems. Cloud SQL for PostgreSQL minimizes cost and complexity, and the analytics needs can be met with BigQuery federated queries over Cloud SQL for PostgreSQL (https://cloud.google.com/bigquery/docs/federated-queries-intro), which keeps the overall cost low.
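
As an illustration, here is a short Python sketch (using the google-cloud-bigquery client) of a federated query against Cloud SQL for PostgreSQL. The connection ID, table, and column names are hypothetical and assume a BigQuery connection resource to the Cloud SQL instance has already been created.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

# EXTERNAL_QUERY pushes the inner statement down to the Cloud SQL (PostgreSQL)
# instance through a BigQuery connection resource, assumed here to be "us.postgres_conn".
sql = """
SELECT customer_id, SUM(amount) AS total_spend
FROM EXTERNAL_QUERY(
  'us.postgres_conn',
  'SELECT customer_id, amount FROM orders;'
)
GROUP BY customer_id
ORDER BY total_spend DESC
LIMIT 10
"""

for row in client.query(sql).result():
    print(row["customer_id"], row["total_spend"])
```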

3.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

You have a Standard Tier Memorystore for Redis instance deployed in a production environment. You need to simulate a Redis instance failover in the most accurate disaster recovery situation, and ensure that the failover has no impact on production data. What should you do?

Create a Standard Tier Memorystore for Redis instance in the development environment. Initiate a manual failover by using the limited-data-loss data protection mode.

Create a Standard Tier Memorystore for Redis instance in a development environment. Initiate a manual failover by using the force-data-loss data protection mode.

Increase one replica to Redis instance in production environment. Initiate a manual failover by using the force-data-loss data protection mode.

Initiate a manual failover by using the limited-data-loss data protection mode to the Memorystore for Redis instance in the production environment.

Answer explanation

The best option is B: create a Standard Tier Memorystore for Redis instance in a development environment and initiate a manual failover using the force-data-loss data protection mode. The key points:
• The failover should be tested in a separate development environment, not production, to avoid impacting real data.
• The force-data-loss mode triggers a full failover regardless of replication state, which is the most accurate simulation of a disaster.
• Limited-data-loss mode blocks the failover when too much unreplicated data would be lost, so it does not exercise the worst-case recovery path.
• Increasing replicas in production and failing over (C) risks losing real production data.
• Failing over production (D) also risks impacting real data and traffic.
Option B isolates the test from production and uses the most rigorous failover mode to fully validate disaster recovery capabilities.
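
For reference, here is a hedged Python sketch of triggering such a test failover with the google-cloud-redis client; the project, location, and instance names are hypothetical, and the call assumes the standard FailoverInstance API with the force-data-loss protection mode.

```python
from google.cloud import redis_v1

client = redis_v1.CloudRedisClient()

# Hypothetical dev-environment test instance.
name = "projects/my-dev-project/locations/us-central1/instances/redis-failover-test"

# force-data-loss triggers a full failover regardless of replication lag,
# which is the most realistic disaster-recovery simulation.
operation = client.failover_instance(
    request=redis_v1.FailoverInstanceRequest(
        name=name,
        data_protection_mode=redis_v1.FailoverInstanceRequest.DataProtectionMode.FORCE_DATA_LOSS,
    )
)
operation.result()  # blocks until the failover completes
print("Failover finished for", name)
```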

4.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

You have a data processing application that runs on Google Kubernetes Engine (GKE). Containers need to be launched with their latest available configurations from a container registry. Your GKE nodes need to have GPUs, local SSDs, and 8 Gbps bandwidth. You want to efficiently provision the data processing infrastructure and manage the deployment process. What should you do?

Use Compute Engine startup scripts to pull container images, and use gcloud commands to provision the infrastructure.

Use Cloud Build to schedule a job using Terraform build to provision the infrastructure and launch with the most current container images.

Use GKE to autoscale containers, and use gcloud commands to provision the infrastructure.

Use Dataflow to provision the data pipeline, and use Cloud Scheduler to run the job.

Answer explanation

Cloud Build can run Terraform to declaratively provision the GKE infrastructure (node pools with GPUs, local SSDs, and the required bandwidth) and deploy the containers with their latest images from the container registry. This manages both the infrastructure provisioning and the deployment process efficiently and repeatably, which the other options do not.

5.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

You want to create a machine learning model using BigQuery ML and create an endpoint for hosting the model using Vertex AI. This will enable the processing of continuous streaming data in near-real time from multiple vendors. The data may contain invalid values. What should you do?

Create a new BigQuery dataset and use streaming inserts to land the data from multiple vendors. Configure your BigQuery ML model to use the "ingestion" dataset as the training data.

Use BigQuery streaming inserts to land the data from multiple vendors where your BigQuery dataset ML model is deployed.

Create a Pub/Sub topic and send all vendor data to it. Connect a Cloud Function to the topic to process the data and store it in BigQuery.

Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.

Answer explanation

Dataflow provides a scalable and flexible way to process and clean the incoming data in real-time before loading it into BigQuery.
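
To make this concrete, here is a minimal Python Apache Beam sketch of the Pub/Sub-to-BigQuery path with a sanitization step; the topic, table, field names, and validity rule are hypothetical.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_and_clean(message: bytes):
    """Parse a Pub/Sub message and drop records with invalid values."""
    record = json.loads(message.decode("utf-8"))
    price = record.get("price")
    if price is None or float(price) < 0:       # hypothetical validity rule
        return                                  # skip invalid rows
    yield {"vendor_id": record["vendor_id"], "price": float(price)}


options = PipelineOptions(streaming=True)       # plus the usual Dataflow options

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/vendor-data")   # hypothetical topic
        | "ParseAndClean" >> beam.FlatMap(parse_and_clean)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.vendor_events",             # hypothetical table
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,  # table assumed to exist
        )
    )
```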

6.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

You work for an advertising company, and you've developed a Spark ML model to predict click-through rates at advertisement blocks. You've been developing everything at your on-premises data center, and now your company is migrating to Google Cloud. Your data center will be closing soon, so a rapid lift-and-shift migration is necessary. However, the data you've been using will be migrated to BigQuery. You periodically retrain your Spark ML models, so you need to migrate existing training pipelines to Google Cloud. What should you do?

Use Vertex AI for training existing Spark ML models

Rewrite your models on TensorFlow, and start using Vertex AI

Use Dataproc for training existing Spark ML models, but start reading data directly from BigQuery

Spin up a Spark cluster on Compute Engine, and train Spark ML models on the data exported from BigQuery

Answer explanation

Option C is the most rapid way to migrate your existing training pipelines to Google Cloud: it lets you keep your existing Spark ML models, take advantage of the scalability and performance of Dataproc, and read data directly from BigQuery, which is a more efficient way to process large datasets than exporting them first.
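
For illustration, a brief PySpark sketch of reading training data directly from BigQuery on Dataproc via the spark-bigquery connector; the table name is hypothetical, and the existing Spark ML training code is assumed to remain unchanged.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ctr-training").getOrCreate()

# Read the training data directly from BigQuery via the spark-bigquery connector,
# which Dataproc images ship with (or which can be added as a connector jar).
clicks = (
    spark.read.format("bigquery")
    .option("table", "my-project.ads.click_events")   # hypothetical table
    .load()
)

# The existing Spark ML feature engineering and training code can stay as is,
# e.g. assembling features and fitting the current click-through-rate model on `clicks`.
```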

7.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

Your infrastructure team has set up an interconnect link between Google Cloud and the on-premises network. You are designing a high-throughput streaming pipeline to ingest streaming data from an Apache Kafka cluster hosted on-premises. You want to store the data in BigQuery with as little latency as possible. What should you do?

Setup a Kafka Connect bridge between Kafka and Pub/Sub. Use a Google-provided Dataflow template to read the data from Pub/Sub, and write the data to BigQuery.

Use a proxy host in the VPC in Google Cloud connecting to Kafka. Write a Dataflow pipeline, read data from the proxy host, and write the data to BigQuery.

Use Dataflow, write a pipeline that reads the data from Kafka, and writes the data to BigQuery.

Setup a Kafka Connect bridge between Kafka and Pub/Sub. Write a Dataflow pipeline, read the data from Pub/Sub, and write the data to BigQuery.

Answer explanation

Latency: Option C, with direct integration between Kafka and Dataflow, offers lower latency by eliminating intermediate steps. Flexibility: Custom Dataflow pipelines (Option C) provide more control over data processing and optimization compared to using a pre-built template.
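
As a sketch, here is a minimal Python Apache Beam pipeline reading directly from the on-premises Kafka cluster (reachable over the interconnect) and streaming into BigQuery; the broker address, topic, table, and single-column schema are hypothetical.

```python
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions


def to_row(kv):
    """Kafka records arrive as (key, value) byte pairs; keep the value as a string."""
    _key, value = kv
    return {"payload": value.decode("utf-8")}   # hypothetical single-column schema


options = PipelineOptions(streaming=True)       # plus the usual Dataflow options

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromKafka" >> ReadFromKafka(
            consumer_config={"bootstrap.servers": "kafka.corp.internal:9092"},  # reachable over the interconnect
            topics=["ad-events"],                # hypothetical topic
        )
        | "Format" >> beam.Map(to_row)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:streaming.ad_events",    # hypothetical table
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,  # table assumed to exist
        )
    )
```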
