Your infrastructure includes a set of YouTube channels. You have been tasked with creating a process for sending the YouTube channel data to Google Cloud for analysis. You want to design a solution that allows your worldwide marketing teams to perform ANSI SQL and other types of analysis on up-to-date YouTube channel log data. How should you set up the log data transfer into Google Cloud?

Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.

Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.

Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Regional bucket as a final destination.

Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Regional storage bucket as a final destination.

You are designing storage for very large text files for a data pipeline on Google Cloud. You want to support ANSI SQL queries. You also want to support compression and parallel load from the input locations using Google recommended practices. What should you do?

Transform text files to compressed Avro using Cloud Dataflow. Use Cloud Storage and BigQuery permanent linked tables for query.

Transform text files to compressed Avro using Cloud Dataflow. Use BigQuery for storage and query.

Compress text files to gzip using the Grid Computing Tools. Use BigQuery for storage and query.

Compress text files to gzip using the Grid Computing Tools. Use Cloud Storage, and then import into Cloud Bigtable for query.

You are developing an application on Google Cloud that will automatically generate subject labels for users' blog posts. You are under competitive pressure to add this feature quickly, and you have no additional developer resources. No one on your team has experience with machine learning. What should you do?

Call the Cloud Natural Language API from your application. Process the generated Entity Analysis as labels.

Call the Cloud Natural Language API from your application. Process the generated Sentiment Analysis as labels.

Build and train a text classification model using TensorFlow. Deploy the model using Cloud Machine Learning Engine. Call the model from your application and process the results as labels.

Build and train a text classification model using TensorFlow. Deploy the model using a Kubernetes Engine cluster. Call the model from your application and process the results as labels.

You are designing storage for 20 TB of text files as part of deploying a data pipeline on Google Cloud. Your input data is in CSV format. You want to minimize the cost of querying aggregate values for multiple users who will query the data in Cloud Storage with multiple engines. Which storage service and schema design should you use?

Use Cloud Storage for storage. Link as permanent tables in BigQuery for query.

Use Cloud Bigtable for storage. Install the HBase shell on a Compute Engine instance to query the Cloud Bigtable data.

Use Cloud Bigtable for storage. Link as permanent tables in BigQuery for query.

Use Cloud Storage for storage. Link as temporary tables in BigQuery for query.

You are designing storage for two relational tables that are part of a 10-TB database on Google Cloud. You want to support transactions that scale horizontally. You also want to optimize data for range queries on non-key columns. What should you do?

Use Cloud Spanner for storage. Add secondary indexes to support query patterns.

Use Cloud SQL for storage. Add secondary indexes to support query patterns.

Use Cloud SQL for storage. Use Cloud Dataflow to transform data to support query patterns.

Use Cloud Spanner for storage. Use Cloud Dataflow to transform data to support query patterns.

An organization maintains a Google BigQuery dataset that contains tables with user-level data. They want to expose aggregates of this data to other Google Cloud projects, while still controlling access to the user-level data. Additionally, they need to minimize their overall storage cost and ensure the analysis cost for other projects is assigned to those projects. What should they do?

Create and share an authorized view that provides the aggregate results.

Create and share a new dataset and view that provides the aggregate results.

Create and share a new dataset and table that contains the aggregate results.

Create dataViewer Identity and Access Management (IAM) roles on the dataset to enable sharing.

Government regulations in your industry mandate that you have to maintain an auditable record of access to certain types of data. Assuming that all expiring logs will be archived correctly, where should you store data that is subject to that mandate?

In a BigQuery dataset that is viewable only by authorized personnel, with the Data Access log used to provide the auditability.

Encrypted on Cloud Storage with user-supplied encryption keys. A separate decryption key will be given to each authorized user.

In Cloud SQL, with separate database user names to each user. The Cloud SQL Admin activity logs will be used to provide the auditability.

In a bucket on Cloud Storage that is accessible only by an AppEngine service that collects user information and logs the access before providing a link to the bucket.

Your neural network model is taking days to train. You want to increase the training speed. What can you do?

Subsample your training dataset.

Increase the number of input features to your model.

Increase the number of layers in your neural network.

Your company maintains a hybrid deployment with GCP, where analytics are performed on your anonymized customer data. The data are imported to Cloud Storage from your data center through parallel uploads to a data transfer server running on GCP. Management informs you that the daily transfers take too long and have asked you to fix the problem. You want to maximize transfer speeds. Which action should you take?

Increase your network bandwidth from your datacenter to GCP.

Increase the CPU size on your server.

Increase the size of the Google Persistent Disk on your server.

Increase your network bandwidth from Compute Engine to Cloud Storage.

After migrating ETL jobs to run on BigQuery, you need to verify that the output of the migrated jobs is the same as the output of the original. You've loaded a table containing the output of the original job and want to compare the contents with output from the migrated job to show that they are identical. The tables do not contain a primary key column that would enable you to join them together for comparison. What should you do?

Use a Dataproc cluster and the BigQuery Hadoop connector to read the data from each table and calculate a hash from non-timestamp columns of the table after sorting. Compare the hashes of each table.

Select random samples from the tables using the RAND() function and compare the samples.

Select random samples from the tables using the HASH() function and compare the samples.

Create stratified random samples using the OVER() function and compare equivalent samples from each table.
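
The hashing idea mentioned in the first option above can be illustrated with a minimal sketch in plain Python. It does not use Dataproc or the BigQuery connector; the row dictionaries, the `loaded_at` column, and the `table_fingerprint` helper are all hypothetical names chosen for illustration. The sketch canonicalizes each row, drops the excluded (timestamp) columns, sorts the rows, and hashes the result so that row order does not affect the digest.

```python
import hashlib


def table_fingerprint(rows, exclude=()):
    """Return a SHA-256 digest of the rows that ignores row order.

    rows: iterable of dicts (column name -> value).
    exclude: columns to skip, e.g. load timestamps that legitimately differ.
    """
    canonical = []
    for row in rows:
        kept = sorted((k, repr(v)) for k, v in row.items() if k not in exclude)
        canonical.append(repr(kept))
    canonical.sort()  # sorting makes the digest independent of row order
    digest = hashlib.sha256()
    for line in canonical:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical outputs of the original and migrated jobs (same data, new order).
original = [
    {"name": "a", "amount": 10, "loaded_at": "2022-01-01T00:00:00"},
    {"name": "b", "amount": 20, "loaded_at": "2022-01-01T00:00:01"},
]
migrated = [
    {"name": "b", "amount": 20, "loaded_at": "2022-02-01T09:30:00"},
    {"name": "a", "amount": 10, "loaded_at": "2022-02-01T09:30:01"},
]

same = (
    table_fingerprint(original, exclude={"loaded_at"})
    == table_fingerprint(migrated, exclude={"loaded_at"})
)
print(same)
```

Because the rows are kept in a list rather than a set, duplicate rows still contribute to the digest, so two tables that differ only in how many times a row appears produce different fingerprints.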

You are the head of BI at a large enterprise with multiple business units that each have different priorities and budgets. You use on-demand pricing for BigQuery with a quota of 2K concurrent on-demand slots per project. Users at your organization sometimes don't get slots to execute their queries, and you need to correct this. You'd like to avoid introducing new projects to your account. What should you do?

Switch to flat-rate pricing and establish a hierarchical priority model for your projects.

Convert your batch BQ queries into interactive BQ queries.

Create an additional project to overcome the 2K on-demand per-project quota.

Increase the number of concurrent slots per project on the Quotas page in the Cloud Console.

You have an Apache Kafka cluster on-prem with topics containing web application logs. You need to replicate the data to Google Cloud for analysis in BigQuery and Cloud Storage. The preferred replication method is mirroring to avoid deployment of Kafka Connect plugins. What should you do?

Deploy a Kafka cluster on GCE VM Instances. Configure your on-prem cluster to mirror your topics to the cluster running in GCE. Use a Dataproc cluster or Dataflow job to read from Kafka and write to GCS.

Deploy the Pub/Sub Kafka connector to your on-prem Kafka cluster and configure Pub/Sub as a Source connector. Use a Dataflow job to read from Pub/Sub and write to GCS.

Deploy the Pub/Sub Kafka connector to your on-prem Kafka cluster and configure Pub/Sub as a Sink connector. Use a Dataflow job to read from Pub/Sub and write to GCS.

You've migrated a Hadoop job from an on-prem cluster to Dataproc and GCS. Your Spark job is a complicated analytical workload that consists of many shuffling operations, and the initial data are Parquet files (on average 200-400 MB each). You see some degradation in performance after the migration to Dataproc, so you'd like to optimize for it. You need to keep in mind that your organization is very cost-sensitive, so you'd like to continue using Dataproc on preemptibles (with only 2 non-preemptible workers) for this workload. What should you do?

Switch from HDDs to SSDs, override the preemptible VMs configuration to increase the boot disk size.

Increase the size of your Parquet files so that each is at least 1 GB.

Switch to the TFRecords format (approx. 200 MB per file) instead of Parquet files.

Switch from HDDs to SSDs, copy initial data from GCS to HDFS, run the Spark job and copy results back to GCS.

Your team is responsible for developing and maintaining ETLs in your company. One of your Dataflow jobs is failing because of some errors in the input data, and you need to improve reliability of the pipeline (incl. being able to reprocess all failing data). What should you do?

Add a try catch block to your DoFn that transforms the data, write erroneous rows to Pub/Sub directly from the DoFn.

Add a filtering step to skip these types of errors in the future, extract erroneous rows from logs.

Add a try catch block to your DoFn that transforms the data, extract erroneous rows from logs.

Add a try catch block to your DoFn that transforms the data, use a sideOutput to create a PCollection that can be stored to Pub/Sub later.
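
Several options above rely on catching errors inside the transform and keeping the failing rows for later reprocessing. The sketch below shows that dead-letter pattern in plain Python rather than the Apache Beam API: the two returned lists stand in for a main output and a side output, and `parse_event` is a hypothetical transform invented for illustration.

```python
import json


def parse_event(raw):
    """Hypothetical transform: parse a JSON string and require a 'user' field."""
    record = json.loads(raw)
    return {"user": record["user"], "value": int(record.get("value", 0))}


def process_with_dead_letter(raw_elements):
    """Split the input into parsed records and failed raw inputs."""
    good, dead_letter = [], []
    for raw in raw_elements:
        try:
            good.append(parse_event(raw))
        except (ValueError, KeyError, TypeError) as err:
            # Keep the untouched input plus the reason so it can be replayed.
            dead_letter.append({"raw": raw, "error": type(err).__name__})
    return good, dead_letter


good, dead_letter = process_with_dead_letter(
    ['{"user": "u1", "value": "3"}', "not json", '{"value": 5}']
)
print(good)
print(dead_letter)
```

Keeping the raw input next to the error name is the design choice that makes reprocessing possible: once the transform is fixed, the dead-letter records can be fed through it again unchanged.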

You're training a model to predict housing prices based on an available dataset with real estate properties. Your plan is to train a fully connected neural net, and you've discovered that the dataset contains latitude and longitude of the property. Real estate professionals have told you that the location of the property is highly influential on price, so you'd like to engineer a feature that incorporates this physical dependency. What should you do?

Create a feature cross of latitude and longitude, bucketize it at the minute level and use L1 regularization during optimization.

Provide latitude and longitude as input vectors to your neural net.

Create a numeric column from a feature cross of latitude and longitude.

Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization.
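
To make "bucketize at the minute level" and "feature cross" concrete, here is a minimal sketch in plain Python. It assumes coordinates are given in decimal degrees and treats one bucket as one arc-minute (1/60 of a degree); the function names and the sample coordinates are hypothetical, and the regularization step mentioned in the options is not shown.

```python
import math


def minute_bucket(degrees):
    """Index of the one-arc-minute (1/60 degree) bucket containing `degrees`."""
    return math.floor(degrees * 60)


def crossed_feature(lat, lon):
    """Combine the latitude and longitude buckets into one categorical key."""
    return f"{minute_bucket(lat)}x{minute_bucket(lon)}"


# Two nearby points fall into the same grid cell; a point further north does not.
a = crossed_feature(37.4219, -122.0841)
b = crossed_feature(37.4225, -122.0836)
c = crossed_feature(37.5000, -122.0841)
print(a, b, c)
```

Each distinct key would then be treated as a category (for example, one-hot encoded), which lets the model learn a separate effect per grid cell instead of a single weight for raw latitude and another for raw longitude.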

You are deploying MariaDB SQL databases on GCE VM Instances and need to configure monitoring and alerting. You want to collect metrics including network connections, disk IO and replication status from MariaDB with minimal development effort and use StackDriver for dashboards and alerts. What should you do?

Install the OpenCensus Agent and create a custom metric collection application with a StackDriver exporter.

Place the MariaDB instances in an Instance Group with a Health Check.

Install the StackDriver Logging Agent and configure fluentd in_tail plugin to read MariaDB logs.

Install the StackDriver Agent and configure the MySQL plugin.

You work for a bank. You have a labelled dataset that contains information on loan applications that have already been granted and whether those loans have defaulted. You have been asked to train a model to predict default rates for credit applicants. What should you do?

Train a linear regression to predict a credit default risk score.

Increase the size of the dataset by collecting additional data.

Remove the bias from the data and collect applications that have been declined loans.

Match loan applicants with their social profiles to enable feature engineering.

You're using Bigtable for a real-time application, and you have a heavy load that is a mix of reads and writes. You've recently identified an additional use case and need to run an hourly analytical job to calculate certain statistics across the whole database. You need to ensure the reliability of both your production application and the analytical workload. What should you do?

Add a second cluster to an existing instance with multi-cluster routing, use a live-traffic app profile for your regular workload and a batch-analytics profile for the analytics workload.

Export a Bigtable dump to GCS and run your analytical job on top of the exported files.

Add a second cluster to an existing instance with single-cluster routing, use a live-traffic app profile for your regular workload and a batch-analytics profile for the analytics workload.

Double the size of your existing cluster and execute your analytics workload on the resized cluster.

You are designing an Apache Beam pipeline to enrich data from Cloud Pub/Sub with static reference data from BigQuery. The reference data is small enough to fit in memory on a single worker. The pipeline should write enriched results to BigQuery for analysis. Which job type and transforms should this pipeline use?

Streaming job, PubSubIO, BigQueryIO, side-inputs

Batch job, PubSubIO, side-inputs

Streaming job, PubSubIO, JdbcIO, side-outputs

Streaming job, PubSubIO, BigQueryIO, side-outputs

You have a data pipeline that writes data to Cloud Bigtable using well-designed row keys. You want to monitor your pipeline to determine when to increase the size of your Cloud Bigtable cluster. Which two actions can you take to accomplish this? (Choose two.)

Monitor the latency of write operations. Increase the size of the Cloud Bigtable cluster when there is a sustained increase in write latency.

Monitor storage utilization. Increase the size of the Cloud Bigtable cluster when utilization increases above 70% of max capacity.

Review Key Visualizer metrics. Increase the size of the Cloud Bigtable cluster when the Read pressure index is above 100.

Review Key Visualizer metrics. Increase the size of the Cloud Bigtable cluster when the Write pressure index is above 100.

Monitor latency of read operations. Increase the size of the Cloud Bigtable cluster if read operations take longer than 100 ms.

You want to analyze hundreds of thousands of social media posts daily at the lowest cost and with the fewest steps. You have the following requirements:
- You will batch-load the posts once per day and run them through the Cloud Natural Language API.
- You will extract topics and sentiment from the posts.
- You must store the raw posts for archiving and reprocessing.
- You will create dashboards to be shared with people both inside and outside your organization.
You need to store both the data extracted from the API to perform analysis as well as the raw social media posts for historical archiving. What should you do?

Store the raw social media posts in Cloud Storage, and write the data extracted from the API into BigQuery.

Store the social media posts and the data extracted from the API in BigQuery.

Store the social media posts and the data extracted from the API in Cloud SQL.

Feed the social media posts into the API directly from the source, and write the extracted data from the API into BigQuery.

You store historic data in Cloud Storage. You need to perform analytics on the historic data. You want to use a solution to detect invalid data entries and perform data transformations that will not require programming or knowledge of SQL. What should you do?

Use Cloud Dataprep with recipes to detect errors and perform transformations.

Use Cloud Dataflow with Beam to detect errors and perform transformations.

Use Cloud Dataproc with a Hadoop job to detect errors and perform transformations.

Use federated tables in BigQuery with queries to detect errors and perform transformations.

Your company needs to upload their historic data to Cloud Storage. The security rules don't allow access from external IPs to their on-premises resources. After an initial upload, they will add new data from existing on-premises applications every day. What should they do?

Execute gsutil rsync from the on-premises servers.

Use Dataflow and write the data to Cloud Storage.

Write a job template in Dataproc to perform the data transfer.

Install an FTP server on a Compute Engine VM to receive the files and move them to Cloud Storage.

You have a query that filters a BigQuery table using a WHERE clause on timestamp and ID columns. By using bq query --dry_run you learn that the query triggers a full scan of the table, even though the filters on timestamp and ID select a tiny fraction of the overall data. You want to reduce the amount of data scanned by BigQuery with minimal changes to existing SQL queries. What should you do?

Recreate the table with a partitioning column and clustering column.

Create a separate table for each ID.

Use the LIMIT keyword to reduce the number of rows returned.

Use the bq query --maximum_bytes_billed flag to restrict the number of bytes billed.

You have a requirement to insert minute-resolution data from 50,000 sensors into a BigQuery table. You expect significant growth in data volume and need the data to be available within 1 minute of ingestion for real-time analysis of aggregated trends. What should you do?

Use a Cloud Dataflow pipeline to stream data into the BigQuery table.

Use bq load to load a batch of sensor data every 60 seconds.

Use the INSERT statement to insert a batch of data every 60 seconds.

Use the MERGE statement to apply updates in batch every 60 seconds.

You need to copy millions of sensitive patient records from a relational database to BigQuery. The total size of the database is 10 TB. You need to design a solution that is secure and time-efficient. What should you do?

Export the records from the database as an Avro file. Copy the file onto a Transfer Appliance and send it to Google, and then load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.

Export the records from the database as an Avro file. Upload the file to GCS using gsutil, and then load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.

Export the records from the database into a CSV file. Create a public URL for the CSV file, and then use Storage Transfer Service to move the file to Cloud Storage. Load the CSV file into BigQuery using the BigQuery web UI in the GCP Console.

Export the records from the database as an Avro file. Create a public URL for the Avro file, and then use Storage Transfer Service to move the file to Cloud Storage. Load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.

You need to create a near real-time inventory dashboard that reads the main inventory tables in your BigQuery data warehouse. Historical inventory data is stored as inventory balances by item and location. You have several thousand updates to inventory every hour. You want to maximize performance of the dashboard and ensure that the data is accurate. What should you do?

Leverage BigQuery UPDATE statements to update the inventory balances as they are changing.

Partition the inventory balance table by item to reduce the amount of data scanned with each inventory update.

Use BigQuery streaming to stream changes into a daily inventory movement table. Calculate balances in a view that joins it to the historical inventory balance table. Update the inventory balance table nightly.

Use the BigQuery bulk loader to batch load inventory changes into a daily inventory movement table. Calculate balances in a view that joins it to the historical inventory balance table. Update the inventory balance table nightly.

You have data stored in BigQuery. The data in the BigQuery dataset must be highly available. You need to define a storage, backup, and recovery strategy for this data that minimizes cost. How should you configure the BigQuery table that has a recovery point objective (RPO) of 30 days?

Set the BigQuery dataset to be multi-regional. In the event of an emergency, use a point-in-time snapshot to recover the data.

Set the BigQuery dataset to be regional. In the event of an emergency, use a point-in-time snapshot to recover the data.

Set the BigQuery dataset to be regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.

Set the BigQuery dataset to be multi-regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.

You used Dataprep to create a recipe on a sample of data in a BigQuery table. You want to reuse this recipe on a daily upload of data with the same schema, after the load job with variable execution time completes. What should you do?

Export the Dataprep job as a Dataflow template, and incorporate it into a Composer job.

Create a cron schedule in Dataprep.

Create an App Engine cron job to schedule the execution of the Dataprep job.

Export the recipe as a Dataprep template, and create a job in Cloud Scheduler.

You are managing a Cloud Dataproc cluster. You need to make a job run faster while minimizing costs, without losing work in progress on your clusters. What should you do?

Increase the cluster size with preemptible worker nodes, and configure them to use graceful decommissioning.

Increase the cluster size with more non-preemptible workers.

Increase the cluster size with preemptible worker nodes, and configure them to forcefully decommission.

Increase the cluster size with preemptible worker nodes, and use Cloud Stackdriver to trigger a script to preserve work.

You work for a shipping company that uses handheld scanners to read shipping labels. Your company has strict data privacy standards that require scanners to only transmit tracking numbers when events are sent to Kafka topics. A recent software update caused the scanners to accidentally transmit recipients' personally identifiable information (PII) to analytics systems, which violates user privacy rules. You want to quickly build a scalable solution using cloud-native managed services to prevent exposure of PII to the analytics systems. What should you do?

Build a Cloud Function that reads the topics and makes a call to the Cloud Data Loss Prevention (Cloud DLP) API. Use the tagging and confidence levels to either pass or quarantine the data in a bucket for review.

Create an authorized view in BigQuery to restrict access to tables with sensitive data.

Install a third-party data validation tool on Compute Engine virtual machines to check the incoming data for sensitive information.

Use Cloud Logging to analyze the data passed through the total pipeline to identify transactions that may contain sensitive information.

You have developed three data processing jobs. One executes a Cloud Dataflow pipeline that transforms data uploaded to Cloud Storage and writes results to BigQuery. The second ingests data from on-premises servers and uploads it to Cloud Storage. The third is a Cloud Dataflow pipeline that gets information from third-party data providers and uploads the information to Cloud Storage. You need to be able to schedule and monitor the execution of these three workflows and manually execute them when needed. What should you do?

Create a Directed Acyclic Graph in Cloud Composer to schedule and monitor the jobs.

Use Stackdriver Monitoring and set up an alert with a Webhook notification to trigger the jobs.

Develop an App Engine application to schedule and request the status of the jobs using GCP API calls.

Set up cron jobs in a Compute Engine instance to schedule and monitor the pipelines using GCP API calls.

You have Cloud Functions written in Node.js that pull messages from Cloud Pub/Sub and send the data to BigQuery. You observe that the message processing rate on the Pub/Sub topic is orders of magnitude higher than anticipated, but there is no error logged in Cloud Logging. What are the two most likely causes of this problem? (Choose two.)

Error handling in the subscriber code is not handling run-time errors properly.

The subscriber code does not acknowledge the messages that it pulls.

Publisher throughput quota is too small.

Total outstanding messages exceed the 10-MB maximum.

The subscriber code cannot keep up with the messages.

You are creating a new pipeline in Google Cloud to stream IoT data from Cloud Pub/Sub through Cloud Dataflow to BigQuery. While previewing the data, you notice that roughly 2% of the data appears to be corrupt. You need to modify the Cloud Dataflow pipeline to filter out this corrupt data. What should you do?

Add a ParDo transform in Cloud Dataflow to discard corrupt elements.

Add a SideInput that returns a Boolean if the element is corrupt.

Add a Partition transform in Cloud Dataflow to separate valid data from corrupt data.

Add a GroupByKey transform in Cloud Dataflow to group all of the valid data together and discard the rest.

You have historical data covering the last three years in BigQuery and a data pipeline that delivers new data to BigQuery daily. You have noticed that when the Data Science team runs a query filtered on a date column and limited to 30-90 days of data, the query scans the entire table. You also noticed that your bill is increasing more quickly than you expected. You want to resolve the issue as cost-effectively as possible while maintaining the ability to conduct SQL queries. What should you do?

Re-create the tables using DDL. Partition the tables by a column containing a TIMESTAMP or DATE Type.

Recommend that the Data Science team export the table to a CSV file on Cloud Storage and use Cloud Datalab to explore the data by reading the files directly.

Modify your pipeline to maintain the last 30-90 days of data in one table and the longer history in a different table to minimize full table scans over the entire history.

Write an Apache Beam pipeline that creates a BigQuery table per day. Recommend that the Data Science team use wildcards on the table name suffixes to select the data they need.

You operate a logistics company, and you want to improve event delivery reliability for vehicle-based sensors. You operate small data centers around the world to capture these events, but leased lines that provide connectivity from your event collection infrastructure to your event processing infrastructure are unreliable, with unpredictable latency. You want to address this issue in the most cost-effective way. What should you do?

Have the data acquisition devices publish data to Cloud Pub/Sub.

Deploy small Kafka clusters in your data centers to buffer events.

Establish a Cloud Interconnect between all remote data centers and Google.

Write a Cloud Dataflow pipeline that aggregates all data in session windows.

You use a dataset in BigQuery for analysis. You want to provide third-party companies with access to the same dataset. You need to keep the costs of data sharing low and ensure that the data is current. Which solution should you choose?

Use Analytics Hub to control data access, and provide third-party companies with access to the dataset.

Use Cloud Scheduler to export the data on a regular basis to Cloud Storage, and provide third-party companies with access to the bucket.

Create a separate dataset in BigQuery that contains the relevant data to share, and provide third-party companies with access to the new dataset.

Create a Dataflow job that reads the data in frequent time intervals, and writes it to the relevant BigQuery dataset or Cloud Storage bucket for third-party companies to use.

PDE-2022-2

Professional Development

•

50 Qs

PDE-2022-2

Quiz • Professional Development • Practice Problem • Medium

Balamurugan R

Used 78+ times

50 questions

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They want to run the model on Apache Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly. How should you optimize the cluster for cost?

Migrate the workload to Google Cloud Dataflow

Use pre-emptible virtual machines (VMs) for the cluster

Use a higher-memory node so that the job runs faster

Use SSDs on the worker nodes so that the job can run faster

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

Your company receives both batch- and stream-based event data. You want to process the data using Google Cloud Dataflow over a predictable time period. However, you realize that in some instances data can arrive late or out of order. How should you design your Cloud Dataflow pipeline to handle data that is late or out of order?

Set a single global window to capture all the data.

Set sliding windows to capture all the lagged data.

Use watermarks and timestamps to capture the lagged data.

Ensure every datasource type (stream or batch) has a timestamp, and use the timestamps to define the logic for lagged data.
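
The options above refer to event timestamps, windows, and watermarks. The sketch below illustrates how these ideas interact, in plain Python rather than the Dataflow or Apache Beam API. It makes two simplifying assumptions that are not from the question: the watermark is just the largest event time seen so far, and an element is dropped once the watermark has passed the end of its window by more than `allowed_lateness`. All names and sample values are hypothetical.

```python
from collections import defaultdict


def assign_to_windows(events, window_size, allowed_lateness):
    """Group (event_time, value) pairs, given in arrival order, into fixed windows.

    The watermark is simplified to the largest event time seen so far.
    """
    windows = defaultdict(list)
    dropped = []
    watermark = float("-inf")
    for event_time, value in events:
        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size
        if watermark > window_end + allowed_lateness:
            dropped.append(value)  # too late: the window is considered closed
        else:
            windows[window_start].append(value)
        watermark = max(watermark, event_time)
    return dict(windows), dropped


# "c" arrives out of order but within the allowed lateness; "e" arrives too late.
events = [(1, "a"), (12, "b"), (8, "c"), (25, "d"), (3, "e")]
windows, dropped = assign_to_windows(events, window_size=10, allowed_lateness=5)
print(windows)
print(dropped)
```

The point of the sketch is that elements are placed by their event timestamp, not by arrival order, and that the watermark decides how long a window keeps accepting out-of-order data.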

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

You are integrating one of your internal IT applications and Google BigQuery, so users can query BigQuery from the application's interface. You do not want individual users to authenticate to BigQuery and you do not want to give them access to the dataset. You need to securely access BigQuery from your IT application. What should you do?

Create groups for your users and give those groups access to the dataset

Integrate with a single sign-on (SSO) platform, and pass each user's credentials along with the query request

Create a service account and grant dataset access to that account. Use the service account's private key to access the dataset

Create a dummy user and grant dataset access to that user. Store the username and password for that user in a file on the file system, and use those credentials to access the BigQuery dataset

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

You are building a data pipeline on Google Cloud. You need to prepare data using a casual method for a machine-learning process. You want to support a logistic regression model. You also need to monitor and adjust for null values, which must remain real-valued and cannot be removed. What should you do?

Use Cloud Dataprep to find null values in sample source data. Convert all nulls to 'none' using a Cloud Dataproc job.

Use Cloud Dataprep to find null values in sample source data. Convert all nulls to 0 using a Cloud Dataprep job.

Use Cloud Dataflow to find null values in sample source data. Convert all nulls to 'none' using a Cloud Dataprep job.

Use Cloud Dataflow to find null values in sample source data. Convert all nulls to 0 using a custom script.

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

You set up a streaming data insert into a Redis cluster via a Kafka cluster. Both clusters are running on Compute Engine instances. You need to encrypt data at rest with encryption keys that you can create, rotate, and destroy as needed. What should you do?

Create a dedicated service account, and use encryption at rest to reference your data stored in your Compute Engine cluster instances as part of your API service calls.

Create encryption keys in Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.

Create encryption keys locally. Upload your encryption keys to Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.

Create encryption keys in Cloud Key Management Service. Reference those keys in your API service calls when accessing the data in your Compute Engine cluster instances.

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

You are developing an application that uses a recommendation engine on Google Cloud. Your solution should display new videos to customers based on past views. Your solution needs to generate labels for the entities in videos that the customer has viewed. Your design must be able to provide very fast filtering suggestions based on data from other customer preferences on several TB of data. What should you do?

Build and train a complex classification model with Spark MLlib to generate labels and filter the results. Deploy the models using Cloud Dataproc. Call the model from your application.

Build and train a classification model with Spark MLlib to generate labels. Build and train a second classification model with Spark MLlib to filter results to match customer preferences. Deploy the models using Cloud Dataproc. Call the models from your application.

Build an application that calls the Cloud Video Intelligence API to generate labels. Store data in Cloud Bigtable, and filter the predicted labels to match the user's viewing history to generate preferences.

Build an application that calls the Cloud Video Intelligence API to generate labels. Store data in Cloud SQL, and join and filter the predicted labels to match the user's viewing history to generate preferences.

MULTIPLE CHOICE QUESTION

2 mins • 1 pt

You are selecting services to write and transform JSON messages from Cloud Pub/Sub to BigQuery for a data pipeline on Google Cloud. You want to minimize service costs. You also want to monitor and accommodate input data volume that will vary in size with minimal manual intervention. What should you do?

Use Cloud Dataproc to run your transformations. Monitor CPU utilization for the cluster. Resize the number of worker nodes in your cluster via the command line.

Use Cloud Dataproc to run your transformations. Use the diagnose command to generate an operational output archive. Locate the bottleneck and adjust cluster resources.

Use Cloud Dataflow to run your transformations. Monitor the job system lag with Stackdriver. Use the default autoscaling setting for worker instances.

Use Cloud Dataflow to run your transformations. Monitor the total execution time for a sampling of jobs. Configure the job to use nondefault Compute Engine machine types when needed.
