Data Lab PreTest

Professional Development

50 Qs

Similar activities

YONO
Professional Development • 50 Qs

Legislation
KG - Professional Development • 51 Qs

SIA MIDTERM
Professional Development • 50 Qs

Cuestionario de Herramientas y Algoritmos
Professional Development • 50 Qs

Assessment

Quiz

Instructional Technology

Professional Development

Practice Problem

Hard

Created by

Sinh Hoàng Văn


50 questions

1.

MULTIPLE SELECT QUESTION

1 min • 1 pt

You have the following dataset and need to calculate the average salary per department, but only include departments where the average salary is greater than 4000:

df.groupBy("Department").avg("Salary").filter("avg(Salary) > 4000")

df.createOrReplaceTempView("employees")
spark.sql("SELECT Department, AVG(Salary) AS AvgSalary FROM employees GROUP BY Department HAVING AVG(Salary) > 4000")

from pyspark.sql.functions import col, avg

df.groupBy("Department").agg(avg(col("Salary")).alias("AvgSalary")).filter(col("AvgSalary") > 4000)

df.filter("Salary > 4000").groupBy("Department").avg("Salary")
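As a sanity check on what the GROUP BY ... HAVING pattern above computes, here is a minimal plain-Python sketch. The department names and salaries are made up for illustration and are not part of the quiz dataset:

```python
# Plain-Python sketch of "average salary per department, keep only
# departments averaging above 4000" (mirrors HAVING AVG(Salary) > 4000).
from collections import defaultdict

rows = [
    ("Sales", 3000), ("Sales", 3500),
    ("Engineering", 5000), ("Engineering", 4500),
]

by_dept = defaultdict(list)
for dept, salary in rows:
    by_dept[dept].append(salary)

avg_salary = {d: sum(s) / len(s) for d, s in by_dept.items()}
high_avg = {d: a for d, a in avg_salary.items() if a > 4000}
print(high_avg)  # {'Engineering': 4750.0}
```

Note that filtering rows before grouping (as in the last option) answers a different question than filtering on the aggregated average.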

2.

MULTIPLE SELECT QUESTION

1 min • 1 pt

You need to find all employees in departments with a budget greater than 8000. Which code snippet achieves this?

employees_df.join(departments_df, "Department").filter("Budget > 8000").select("Name", "Department")

employees_df.createOrReplaceTempView("employees")
departments_df.createOrReplaceTempView("departments")
spark.sql("SELECT e.Name, e.Department FROM employees e JOIN departments d ON e.Department = d.Department WHERE d.Budget > 8000")

from pyspark.sql.functions import col

employees_df.join(departments_df, "Department").where(col("Budget") > 8000).select("Name", "Department")

employees_df.join(departments_df, employees_df.Department == departments_df.Department).filter("Budget > 8000")
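The join-then-filter pattern in this question can be sketched in plain Python. The employee and department rows below are hypothetical, chosen only to show which rows survive the budget filter:

```python
# Plain-Python sketch of an inner join on Department followed by a
# Budget > 8000 filter (made-up data, not the quiz dataset).
employees = [("Alice", "Sales"), ("Bob", "R&D")]
departments = [("Sales", 5000), ("R&D", 9000)]  # (Department, Budget)

budget = dict(departments)
result = [(name, dept) for name, dept in employees if budget.get(dept, 0) > 8000]
print(result)  # [('Bob', 'R&D')]
```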

3.

MULTIPLE SELECT QUESTION

1 min • 1 pt

From the dataset below, find the top-earning employee in each department, including their department name and salary. Use Spark SQL:

df.createOrReplaceTempView("employees")
spark.sql("SELECT Department, Name, Salary FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) AS rank FROM employees) t WHERE rank = 1")

df.createOrReplaceTempView("employees")
spark.sql("SELECT Department, Name, MAX(Salary) AS Salary FROM employees GROUP BY Department, Name")

df.createOrReplaceTempView("employees")
spark.sql("SELECT Department, Name, MAX(Salary) AS MaxSalary FROM employees GROUP BY Department")

from pyspark.sql.window import Window
from pyspark.sql.functions import col, row_number

window_spec = Window.partitionBy("Department").orderBy(col("Salary").desc())
df.withColumn("rank", row_number().over(window_spec)).filter("rank == 1").select("Department", "Name", "Salary")
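The "top earner per department" logic (ROW_NUMBER over a partition, keep rank 1) can be sketched without Spark. The rows below are invented purely to illustrate the expected shape of the result:

```python
# Plain-Python sketch of "highest-paid employee per department"
# (ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) = 1).
rows = [
    ("Sales", "Alice", 4000), ("Sales", "Bob", 5200),
    ("R&D", "Carol", 6100), ("R&D", "Dan", 5900),
]

top = {}
for dept, name, salary in rows:
    # Keep only the highest salary seen so far for each department.
    if dept not in top or salary > top[dept][1]:
        top[dept] = (name, salary)

print(top)  # {'Sales': ('Bob', 5200), 'R&D': ('Carol', 6100)}
```

Grouping by both Department and Name, by contrast, makes every employee their own group, so no rows are eliminated.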

4.

MULTIPLE SELECT QUESTION

1 min • 1 pt

Which of the following code snippets computes the total sales per region efficiently in Spark SQL?

df.createOrReplaceTempView("sales")
spark.sql("SELECT Region, SUM(Sales) AS TotalSales FROM sales GROUP BY Region")

df.groupBy("Region").sum("Sales")

from pyspark.sql.functions import col, sum

df.groupBy(col("Region")).agg(sum(col("Sales")).alias("TotalSales"))

df.filter("Sales IS NOT NULL").groupBy("Region").sum("Sales")
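A grouped SUM is the simplest aggregation in this set; here is what it computes, sketched in plain Python on made-up regional figures:

```python
# Plain-Python sketch of SELECT Region, SUM(Sales) ... GROUP BY Region
# (hypothetical data).
from collections import defaultdict

sales = [("East", 100), ("West", 250), ("East", 50)]

totals = defaultdict(int)
for region, amount in sales:
    totals[region] += amount

print(dict(totals))  # {'East': 150, 'West': 250}
```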

5.

MULTIPLE SELECT QUESTION

1 min • 1 pt

How can you rank products within each category by their sales in descending order?

from pyspark.sql.window import Window
from pyspark.sql.functions import col, rank

window_spec = Window.partitionBy("Category").orderBy(col("Sales").desc())
df.withColumn("rank", rank().over(window_spec))

df.createOrReplaceTempView("products")
spark.sql("SELECT Category, Product, RANK() OVER (PARTITION BY Category ORDER BY Sales DESC) AS Rank FROM products")

df.groupBy("Category", "Product").agg(rank().alias("Rank"))

from pyspark.sql.functions import row_number

window_spec = Window.partitionBy("Category").orderBy("Sales DESC")
df.withColumn("row_number", row_number().over(window_spec))
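The key behavior being tested is RANK() semantics: ties share a rank and the following rank is skipped. A plain-Python sketch on invented sales figures makes this concrete:

```python
# Plain-Python sketch of RANK() OVER (PARTITION BY Category ORDER BY Sales DESC)
# within a single made-up category: products A and B tie, so rank 2 is skipped.
sales = [("Toys", "A", 90), ("Toys", "B", 90), ("Toys", "C", 70)]

ordered = sorted(sales, key=lambda row: -row[2])
# A product's rank is 1 + the number of products with strictly higher sales.
ranks = {prod: 1 + sum(1 for r in ordered if r[2] > amt) for _, prod, amt in ordered}
print(ranks)  # {'A': 1, 'B': 1, 'C': 3}
```

ROW_NUMBER(), by contrast, would break the tie arbitrarily and assign 1, 2, 3.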

6.

MULTIPLE SELECT QUESTION

1 min • 1 pt

Which function would you use to replace all NULL values in a Spark DataFrame with default values?

df.na.fill({"Column1": 0, "Column2": "N/A"})

df.na.replace(NULL, {"Column1": 0, "Column2": "N/A"})

df.fillna({"Column1": 0, "Column2": "N/A"})

df.replace(NULL, {"Column1": 0, "Column2": "N/A"})
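What per-column NULL filling does can be sketched in plain Python, using None for NULL and a made-up two-row table:

```python
# Plain-Python sketch of filling NULLs with per-column defaults
# (the behavior of Spark's df.na.fill / df.fillna; data is hypothetical).
defaults = {"Column1": 0, "Column2": "N/A"}
rows = [{"Column1": 5, "Column2": None}, {"Column1": None, "Column2": "x"}]

filled = [
    {col: (defaults[col] if val is None else val) for col, val in row.items()}
    for row in rows
]
print(filled)  # [{'Column1': 5, 'Column2': 'N/A'}, {'Column1': 0, 'Column2': 'x'}]
```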

7.

MULTIPLE SELECT QUESTION

1 min • 1 pt

How do you efficiently join two DataFrames on multiple keys in PySpark?

df1.join(df2, (df1.Key1 == df2.Key1) & (df1.Key2 == df2.Key2))

df1.join(df2, "Key1", "inner")

df1.createOrReplaceTempView("df1")
df2.createOrReplaceTempView("df2")
spark.sql("SELECT * FROM df1 JOIN df2 ON df1.Key1 = df2.Key1 AND df1.Key2 = df2.Key2")

df1.merge(df2, "Key1")
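The multi-key join condition can be sketched in plain Python: a row matches only when both keys agree. The rows below are hypothetical:

```python
# Plain-Python sketch of an inner join on the composite key (Key1, Key2),
# mirroring (df1.Key1 == df2.Key1) & (df1.Key2 == df2.Key2). Made-up data.
df1 = [("a", 1, "left1"), ("b", 2, "left2")]
df2 = [("a", 1, "right1"), ("b", 3, "right2")]

index = {(k1, k2): v for k1, k2, v in df2}
# ("b", 2) and ("b", 3) share Key1 but not Key2, so they do not match.
joined = [(k1, k2, v, index[(k1, k2)]) for k1, k2, v in df1 if (k1, k2) in index]
print(joined)  # [('a', 1, 'left1', 'right1')]
```

Joining on a single key, as in the second and fourth options, would incorrectly pair rows that differ on Key2 (and `merge` is a pandas method, not a PySpark one).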
