Mastering Your Data - Post Test

Professional Development

9 Qs

Similar activities

Quiz - AIML5 - Day 7 · Professional Development · 10 Qs
Quiz - AIML5 - Day 12 · Professional Development · 10 Qs
Quiz on Enhance Productivity with Gen Ai for Business · Professional Development · 10 Qs
How well do you understand LOMAS? · Professional Development · 9 Qs
NDF - JULY 2024 · Professional Development · 7 Qs
Understanding SPO Library and List · Professional Development · 6 Qs
Tools for Analysis · Professional Development · 7 Qs
Crestsage/TQ Excel Training · Professional Development · 5 Qs

Assessment · Quiz · Information Technology (IT) · Professional Development · Easy

Created by Irsyad Firsandi Wahyudi · Used 2+ times

9 questions

1.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

You are working with a dataset of customer support tickets where each row represents a single interaction (e.g., a phone call, an email, a chat message) a customer had with support. What is the granularity of this dataset?

Per unique customer

Per customer support agent

Per customer interaction

Per issue resolution

2.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

A column named 'RegistrationDate' in your dataset contains values like "01/20/2023" or "March 15, 2024". Your data analysis tool initially identifies this column as a 'string' or 'object' type. You need to calculate the average time customers remain active since registration. What is the most crucial step you must take with this column before performing such a calculation?

Remove all rows with missing 'RegistrationDate' values.

Convert the column to an integer type.

Standardize the date format and convert the column to a datetime object.

Calculate the mean of the string values.
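In pandas (assuming that is the analysis tool, as the question suggests), the correct answer might look like this minimal sketch; the two sample values are the formats quoted in the question, and the reference date is invented for illustration:

```python
import pandas as pd

# Hypothetical 'RegistrationDate' values in the two mixed formats from the question
df = pd.DataFrame({"RegistrationDate": ["01/20/2023", "March 15, 2024"]})
assert df["RegistrationDate"].dtype == object  # loaded as strings, not dates

# Parse each value individually (tolerates mixed formats), then cast the column
df["RegistrationDate"] = pd.to_datetime(df["RegistrationDate"].map(pd.to_datetime))

# Date arithmetic now works, e.g. days active up to a fixed reference date
days_active = (pd.Timestamp("2024-06-01") - df["RegistrationDate"]).dt.days
```

Until the column is a datetime type, subtraction like the last line would fail outright on strings.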

3.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

After loading a new dataset into your analytical environment, you execute a command (e.g., df.isnull().sum() in Pandas) and observe that the 'Email' column has a count of 50 missing values. What does this observation primarily tell you about your dataset?

All 50 missing emails belong to the same customer.

The 'Email' column contains 50 unique email addresses.

50 records in your dataset lack email information, indicating a data completeness issue for that column.

The 'Email' column has been incorrectly imported as a numeric type.
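A small sketch of the check the question describes, using an invented customer table where two records lack an email:

```python
import pandas as pd
import numpy as np

# Illustrative customer table in which two records lack an email address
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Email": ["a@example.com", None, "c@example.com", np.nan],
})

# Count missing values per column: a completeness check, not a uniqueness one
missing = df.isnull().sum()
print(missing["Email"])  # prints 2: two records lack email information
```

The count says nothing about which customers are affected or about the column's type; it only measures completeness.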

4.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

You are analyzing a dataset of product reviews. The 'Rating' column contains numbers from 1 to 5. However, you discover that some users accidentally submitted a rating of '10', which the system then truncated to '1' during data entry. This issue is not visible simply by checking df.dtypes or df.isnull().sum(). What kind of hidden data quality issue is this?

Duplicate records

Inconsistent data entry leading to data corruption

High cardinality

Outdated information
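One way to surface this kind of corruption is a distribution check rather than a type or null check. This is a hedged sketch with invented ratings, where several of the '1's stand in for truncated '10's:

```python
import pandas as pd

# Invented ratings; several of the '1's stand in for truncated '10's
df = pd.DataFrame({"Rating": [5, 3, 1, 4, 1, 2, 5, 1, 4, 1]})

# The checks named in the question come back clean, so the issue stays hidden
assert df["Rating"].isnull().sum() == 0
assert df["Rating"].between(1, 5).all()

# A frequency check can reveal it: an unexpected spike at the value 1
counts = df["Rating"].value_counts()
```

An implausibly large share of 1-star ratings relative to the rest of the distribution is the kind of signal that would prompt a closer look at the data-entry pipeline.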

5.

MULTIPLE CHOICE QUESTION

1 min • 1 pt

What is the primary benefit of using methods like df.head() and df.tail() during the initial data inspection phase?

They perform complex statistical analysis on the entire dataset.

They automatically clean and preprocess the data for analysis.

They provide a quick visual overview of the dataset's beginning and end, helping to spot immediate structural issues or unexpected values.

They are used to calculate the mean and median of all numerical columns.
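The two methods in question, shown on an invented 100-row table:

```python
import pandas as pd

# Illustrative 100-row dataset
df = pd.DataFrame({"TicketID": range(1, 101), "Status": ["open"] * 100})

first_rows = df.head()  # first 5 rows by default
last_rows = df.tail(3)  # last 3 rows, e.g. to spot trailing junk rows
```

Neither method computes anything; each simply returns a slice of rows for a quick visual check of column layout and values at the file's edges.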

6.

OPEN ENDED QUESTION

10 mins • 1 pt

Imagine you have two datasets related to customer orders:

Dataset A: Each row represents a single order placed by a customer.
Dataset B: Each row represents a single item within an order (so one order might have multiple rows).

If you want to calculate the average number of items per order, which dataset's granularity is more suitable for direct calculation, and why? If the other dataset were used, what initial step would be required?

Evaluate responses using AI: OFF
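A sketch of the aggregation step the question alludes to, using an invented item-level table (Dataset B's granularity): rows must first be rolled up to one-per-order before the average is meaningful.

```python
import pandas as pd

# Dataset B granularity: one row per item within an order (invented example)
items = pd.DataFrame({
    "OrderID": [1, 1, 2, 3, 3, 3],
    "Item": ["pen", "notepad", "mug", "pen", "mug", "notepad"],
})

# Aggregate up to order granularity first, then take the average
items_per_order = items.groupby("OrderID").size()
avg_items = items_per_order.mean()  # (2 + 1 + 3) / 3 = 2.0
```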

7.

OPEN ENDED QUESTION

10 mins • 1 pt

You load a CSV file, and when you check the data types, a column named 'Customer_Age' is identified as an 'object' (string) type, even though you know it should contain numbers.

a) What is the most likely reason for 'Customer_Age' being read as an 'object' instead of an integer or float?
b) What problem would this incorrect data type cause if you tried to calculate the average customer age?

Evaluate responses using AI: OFF
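A minimal sketch of one plausible cause and its fix, assuming pandas; the stray 'unknown' entry is invented for illustration:

```python
import pandas as pd

# A single non-numeric entry (here 'unknown', invented for illustration)
# forces the whole column to be read as object/string
df = pd.DataFrame({"Customer_Age": ["34", "29", "unknown", "41"]})
assert df["Customer_Age"].dtype == object

# A numeric mean on strings is not possible, so coerce bad entries to NaN first
ages = pd.to_numeric(df["Customer_Age"], errors="coerce")
mean_age = ages.mean()  # NaN entries are skipped: (34 + 29 + 41) / 3
```

With the column left as strings, an average either raises an error or, in tools that "helpfully" concatenate, produces nonsense.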

8.

OPEN ENDED QUESTION

10 mins • 1 pt

Explain why using df.sample() (to view random rows) can be more beneficial than just using df.head() or df.tail() when performing initial data inspection on a very large dataset.

Evaluate responses using AI: OFF
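A sketch of the contrast, on an invented large table: head() and tail() only ever show the edges, while sample() can draw rows from anywhere in the file, so problems confined to the middle still have a chance of surfacing.

```python
import pandas as pd

# Illustrative large dataset; imagine it is sorted by date, so rows near
# the middle may differ systematically from rows at either edge
df = pd.DataFrame({"RowID": range(10_000)})

middle_view = df.sample(n=5, random_state=0)  # 5 rows drawn from anywhere
```

Fixing `random_state` just makes the draw reproducible; in practice any random sample serves the purpose.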

9.

OPEN ENDED QUESTION

10 mins • 1 pt

Scenario: You are a junior data analyst tasked with performing an initial inspection of a new dataset containing information about online course enrollments. The data is provided as a CSV.

Granularity: What is the granularity of this dataset (what does each row represent)?

Expected Data Types: For 'EnrollmentDate', 'CompletionStatus', and 'PaymentAmount', what data type would you expect them to be after proper loading and conversion in a tool like Pandas?

Missing Values: Identify any column(s) with missing values in this snippet and state the count of missing values for each.

Uniqueness/Cardinality: If the full dataset contains 5000 rows and you run df['EnrollmentID'].nunique(), what value would you expect to see, and why? What is the cardinality of the 'CompletionStatus' column in this snippet?

Potential Hidden Issue (Beyond Snippet): Based on the 'StudentID' and 'CourseName' columns, describe one potential hidden data quality issue that might exist in the full dataset (not just this snippet) that could lead to incorrect analysis if not addressed. Explain why it's 'hidden'.

Evaluate responses using AI: OFF
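The question's data snippet is not reproduced on this page, so the uniqueness and cardinality checks it asks about can only be sketched on an invented 5-row stand-in with the columns the question names:

```python
import pandas as pd

# Invented stand-in for the missing snippet, using the question's column names
df = pd.DataFrame({
    "EnrollmentID": [101, 102, 103, 104, 105],
    "CompletionStatus": ["Completed", "In Progress", "Completed",
                         "Dropped", "In Progress"],
})

# A per-row unique key should satisfy nunique() == number of rows
# (so 5000 in the full dataset, if 'EnrollmentID' really is unique per row)
assert df["EnrollmentID"].nunique() == len(df)

# 'CompletionStatus' is low-cardinality: only a few distinct labels
status_cardinality = df["CompletionStatus"].nunique()  # 3 in this stand-in
```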