Data 268-277

Similar activities

Unit 12 - Big data (12th Grade - University, 10 Qs)
SQL (11th - 12th Grade, 12 Qs)
Data Engineer 288-297 (12th Grade, 10 Qs)
Tools y preguntas específicas (12th Grade, 10 Qs)
SQL (10th - 12th Grade, 11 Qs)
Database (8th - 12th Grade, 10 Qs)
RDBMS & SQL QUERIES (12th Grade, 15 Qs)
Data 211-220 (12th Grade, 10 Qs)

Data 268-277

Assessment

Quiz

Computers

12th Grade

Hard

Created by Academia Google


10 questions


1.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

[Scenario provided as an image in the original quiz]

What should you do?

Group the data by using a tumbling window in a Dataflow pipeline, and write the aggregated data to Memorystore.

Group the data by using a hopping window in a Dataflow pipeline, and write the aggregated data to Memorystore.

Group the data by using a session window in a Dataflow pipeline, and write the aggregated data to BigQuery.

Group the data by using a hopping window in a Dataflow pipeline, and write the aggregated data to BigQuery.
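For context on the windowing terms in these options, here is a minimal Apache Beam (Python) sketch of tumbling, hopping, and session windows; the data, timestamps, and window sizes are illustrative assumptions, not part of the question.

```python
import apache_beam as beam
from apache_beam.transforms import window

with beam.Pipeline() as p:
    events = (
        p
        | beam.Create([("sensor-1", 1), ("sensor-1", 1), ("sensor-2", 1)])
        # Attach an event timestamp so the windowing below has something to group on.
        | beam.Map(lambda kv: window.TimestampedValue(kv, 10))
    )

    # Tumbling (fixed) windows: non-overlapping, one aggregate per key per window.
    (
        events
        | "TumblingWindow" >> beam.WindowInto(window.FixedWindows(60))
        | "SumPerKey" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )

    # Hopping windows overlap:            beam.WindowInto(window.SlidingWindows(60, 15))
    # Session windows close after a gap:  beam.WindowInto(window.Sessions(600))
```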

2.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

You are designing a Dataflow pipeline for a batch processing job. You want to mitigate multiple zonal failures at job submission time. What should you do?

Submit duplicate pipelines in two different zones by using the --zone flag.

Set the pipeline staging location as a regional Cloud Storage bucket.

Specify a worker region by using the --region flag.

Create an Eventarc trigger that resubmits the job in the event of a zonal failure at submission time.
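For reference on the flags mentioned in the options, here is a minimal sketch of launching a Beam pipeline on the Dataflow runner with only a worker region specified; the project, bucket, and region values are placeholders.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Specifying --region (and no --zone) lets the Dataflow service place workers
# in any healthy zone within that region. All names below are placeholders.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=example-project",
    "--region=us-central1",
    "--temp_location=gs://example-bucket/tmp",
])

with beam.Pipeline(options=options) as p:
    p | beam.Create([1, 2, 3]) | beam.Map(lambda x: x * 2)
```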

3.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

[Scenario provided as an image in the original quiz]

What should you do?

Import the ORC files to Bigtable tables for the data scientist team.

Import the ORC files to BigQuery tables for the data scientist team.

Copy the ORC files to Cloud Storage, then deploy a Dataproc cluster for the data scientist team.

Copy the ORC files to Cloud Storage, then create external BigQuery tables for the data scientist team.
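For context on the external-table option, here is a minimal google-cloud-bigquery sketch that defines an external BigQuery table over ORC files in Cloud Storage; the project, dataset, and bucket names are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# External table definition pointing at ORC files already in Cloud Storage.
external_config = bigquery.ExternalConfig("ORC")
external_config.source_uris = ["gs://example-bucket/exports/*.orc"]

table = bigquery.Table("example-project.analytics.orc_external")
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)
```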

4.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

You have a BigQuery table that ingests data directly from a Pub/Sub subscription. The ingested data is encrypted with a Google-managed encryption key. You need to meet a new organization policy that requires you to use keys from a centralized Cloud Key Management Service (Cloud KMS) project to encrypt data at rest. What should you do?

Use Cloud KMS encryption key with Dataflow to ingest the existing Pub/Sub subscription to the existing BigQuery table.

Create a new BigQuery table by using customer-managed encryption keys (CMEK), and migrate the data from the old BigQuery table.

Create a new Pub/Sub topic with CMEK and use the existing BigQuery table by using a Google-managed encryption key.

Create a new BigQuery table and Pub/Sub topic by using customer-managed encryption keys (CMEK), and migrate the data from the old BigQuery table.
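For reference on the CMEK options, here is a minimal sketch that creates a BigQuery table and a Pub/Sub topic encrypted with a key from a centralized Cloud KMS project; every resource name below is a placeholder.

```python
from google.cloud import bigquery, pubsub_v1

# Key held in a centralized Cloud KMS project (placeholder path).
kms_key = (
    "projects/central-kms-project/locations/us/"
    "keyRings/data-keyring/cryptoKeys/bq-ingest-key"
)

# BigQuery table encrypted with the customer-managed key.
bq = bigquery.Client(project="example-project")
table = bigquery.Table(
    "example-project.analytics.events_cmek",
    schema=[bigquery.SchemaField("payload", "STRING")],
)
table.encryption_configuration = bigquery.EncryptionConfiguration(kms_key_name=kms_key)
bq.create_table(table)

# Pub/Sub topic encrypted with the same key.
publisher = pubsub_v1.PublisherClient()
publisher.create_topic(
    request={
        "name": publisher.topic_path("example-project", "events-cmek"),
        "kms_key_name": kms_key,
    }
)
```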

5.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

You are creating the CI/CD cycle for the code of the directed acyclic graphs (DAGs) running in Cloud Composer. Your team has two Cloud Composer instances: one instance for development and another instance for production. Your team is using a Git repository to maintain and develop the code of the DAGs. You want to deploy the DAGs automatically to Cloud Composer when a certain tag is pushed to the Git repository. What should you do?

1. Use Cloud Build to copy the code of the DAG to the Cloud Storage bucket of the development instance for DAG testing.

2. If the tests pass, use Cloud Build to copy the code to the bucket of the production instance.

1. Use Cloud Build to build a container with the code of the DAG and the KubernetesPodOperator to deploy the code to the Google Kubernetes Engine (GKE) cluster of the development instance for testing.

2. If the tests pass, use the KubernetesPodOperator to deploy the container to the GKE cluster of the production instance.

1. Use Cloud Build to build a container and the KubernetesPodOperator to deploy the code of the DAG to the Google Kubernetes Engine (GKE) cluster of the development instance for testing.

2. If the tests pass, copy the code to the Cloud Storage bucket of the production instance.

1. Use Cloud Build to copy the code of the DAG to the Cloud Storage bucket of the development instance for DAG testing.

2. If the tests pass, use Cloud Build to build a container with the code of the DAG and the KubernetesPodOperator to deploy the container to the Google Kubernetes Engine (GKE) cluster of the production instance.
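For context, the "copy the DAG code to the Cloud Storage bucket" step from these options could be scripted roughly as below and run from a Cloud Build step; the bucket names, the dags/ layout, and the test hook are assumptions.

```python
from pathlib import Path
from google.cloud import storage

def deploy_dags(bucket_name: str, dag_dir: str = "dags") -> None:
    """Upload every DAG file in dag_dir to a Composer environment's bucket."""
    bucket = storage.Client().bucket(bucket_name)
    for path in Path(dag_dir).glob("*.py"):
        bucket.blob(f"dags/{path.name}").upload_from_filename(str(path))

# Deploy to development first; promote to production only if the DAG tests pass.
deploy_dags("us-central1-dev-composer-bucket")    # placeholder bucket name
# ... run DAG validation / unit tests against the development environment ...
deploy_dags("us-central1-prod-composer-bucket")   # placeholder bucket name
```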

6.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

You have a BigQuery dataset named “customers”. All tables will be tagged by using a Data Catalog tag template named “gdpr”. The template contains one mandatory field, “has_sensitive_data”, with a boolean value. All employees must be able to do a simple search and find tables in the dataset that have either true or false in the “has_sensitive_data” field. However, only the Human Resources (HR) group should be able to see the data inside the tables for which “has_sensitive_data” is true. You give the all employees group the bigquery.metadataViewer and bigquery.connectionUser roles on the dataset. You want to minimize configuration overhead. What should you do next?

Create the “gdpr” tag template with private visibility. Assign the bigquery.dataViewer role to the HR group on the tables that contain sensitive data.

Create the “gdpr” tag template with private visibility. Assign the datacatalog.tagTemplateViewer role on this tag to the all employees group, and assign the bigquery.dataViewer role to the HR group on the tables that contain sensitive data.

Create the “gdpr” tag template with public visibility. Assign the bigquery.dataViewer role to the HR group on the tables that contain sensitive data.

Create the “gdpr” tag template with public visibility. Assign the datacatalog.tagTemplateViewer role on this tag to the all employees group, and assign the bigquery.dataViewer role to the HR group on the tables that contain sensitive data.
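For reference, here is a minimal google-cloud-datacatalog sketch of creating the “gdpr” tag template with its mandatory boolean field; the project and location are placeholders, and the visibility setting and IAM grants from the options are configured separately.

```python
from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()

# Template with the single mandatory boolean field described in the question.
template = datacatalog_v1.TagTemplate()
template.display_name = "gdpr"
template.fields["has_sensitive_data"] = datacatalog_v1.TagTemplateField()
template.fields["has_sensitive_data"].display_name = "Has sensitive data"
template.fields["has_sensitive_data"].type_.primitive_type = (
    datacatalog_v1.FieldType.PrimitiveType.BOOL
)
template.fields["has_sensitive_data"].is_required = True

client.create_tag_template(
    parent="projects/example-project/locations/us-central1",  # placeholder
    tag_template_id="gdpr",
    tag_template=template,
)
```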

7.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

You are monitoring your organization’s data lake hosted on BigQuery. The ingestion pipelines read data from Pub/Sub and write the data into tables on BigQuery. After a new version of the ingestion pipelines is deployed, the daily stored data increased by 50%. The volumes of data in Pub/Sub remained the same and only some tables had their daily partition data size doubled. You need to investigate and fix the cause of the data increase. What should you do?

1. Check for duplicate rows in the BigQuery tables that have the daily partition data size doubled.

2. Schedule daily SQL jobs to deduplicate the affected tables.

3. Share the deduplication script with the other operational teams to reuse if this occurs to other tables.

1. Check for code errors in the deployed pipelines.

2. Check whether the pipeline writes to the BigQuery sink multiple times.

3. Check for errors in Cloud Logging during the day of the release of the new pipelines.

4. If no errors, restore the BigQuery tables to their content before the last release by using time travel.

1. Check for duplicate rows in the BigQuery tables that have the daily partition data size doubled.

2. Check the BigQuery Audit logs to find job IDs.

3. Use Cloud Monitoring to determine when the identified Dataflow jobs started and the pipeline code version.

4. When more than one pipeline ingests data into a table, stop all versions except the latest one.

1. Roll back the last deployment.

2. Restore the BigQuery tables to their content before the last release by using time travel.

3. Restart the Dataflow jobs and replay the messages by seeking the subscription to the timestamp of the release.
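For context, the duplicate-row check that several of these options begin with could look like the query below; the table, key column, and partition column are illustrative assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

# Count rows per business key in today's partition; any key with more than
# one copy points at duplicate ingestion.
query = """
SELECT event_id, COUNT(*) AS copies
FROM `example-project.lake.events`
WHERE DATE(ingest_time) = CURRENT_DATE()
GROUP BY event_id
HAVING COUNT(*) > 1
ORDER BY copies DESC
LIMIT 100
"""

for row in client.query(query).result():
    print(row.event_id, row.copies)
```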
