

Data Cleaning and Preparation Flashcard
Hard
Wayground Content

11 questions
1.
FLASHCARD QUESTION
Front
Which of the following is NOT a part of data cleaning? Options: Handling missing values, Removing duplicates, Correcting inconsistent data, Applying machine learning algorithms
Back
Applying machine learning algorithms
2.
FLASHCARD QUESTION
Front
What is the main disadvantage of dropping rows with missing values?
Back
It shrinks the dataset and discards the valid values stored in the dropped rows, which loses information and can bias results when values are not missing at random
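To make the cost concrete, here is a minimal sketch on a made-up DataFrame (the column names and values are hypothetical, not from the flashcard set). It counts how many rows and how many valid cells disappear when `dropna()` is applied.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three of the four rows contain one missing value each.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 47],
    "city": ["Oslo", "Lima", None, "Pune"],
    "income": [52000, 61000, 58000, np.nan],
})

rows_before = len(df)
dropped = df.dropna()  # removes every row that has at least one missing value
rows_after = len(dropped)

# Valid (non-missing) cells that were discarded together with the gaps.
valid_cells_lost = int(df.notna().sum().sum() - dropped.notna().sum().sum())

print(rows_before, rows_after, valid_cells_lost)
```

Only three cells are missing here, yet dropping rows also throws away six valid ones, which is why imputation is often considered first.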
3.
FLASHCARD QUESTION
Front
Which of the following methods is best suited for handling categorical variables with many unique values? Options: Label encoding, One-hot encoding, Frequency encoding, Min-max normalization
Back
Frequency encoding
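One common way to frequency-encode a column in pandas is sketched below, assuming a hypothetical `city` column. Each category is replaced by its share of the rows, so the result is a single numeric column no matter how many unique values exist.

```python
import pandas as pd

# Hypothetical high-cardinality column, shortened for illustration.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Oslo", "Pune", "Oslo", "Lima"]})

# Relative frequency of each category (fraction of rows it appears in).
freq = df["city"].value_counts(normalize=True)

# Replace each category with its frequency.
df["city_freq"] = df["city"].map(freq)

print(df)
```

One trade-off to keep in mind: two categories that occur equally often receive the same code and become indistinguishable to the model.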
4.
FLASHCARD QUESTION
Front
Why is feature scaling important in data preparation?
Back
It puts numerical features on comparable ranges, so that features with large raw values do not dominate distance-based models
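The effect on distances can be seen in a small sketch. The data and the helper names `min_max_scale` and `dist` are hypothetical, and min-max scaling is used as just one of several possible scalers.

```python
import numpy as np

# Hypothetical rows: [age in years, income in dollars].
X = np.array([
    [25.0, 50000.0],
    [60.0, 51000.0],
    [26.0, 58000.0],
])

def min_max_scale(data):
    """Rescale each column to [0, 1]; assumes no column is constant."""
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    return (data - col_min) / (col_max - col_min)

def dist(a, b):
    """Euclidean distance between two rows."""
    return float(np.linalg.norm(a - b))

# Raw distances: the income column dwarfs the 35-year age gap.
raw_01 = dist(X[0], X[1])
raw_02 = dist(X[0], X[2])

# Scaled distances: both features now carry comparable weight.
S = min_max_scale(X)
scaled_01 = dist(S[0], S[1])
scaled_02 = dist(S[0], S[2])

print(raw_01, raw_02, scaled_01, scaled_02)
```

Before scaling, row 1 looks far closer to row 0 than row 2 does, purely because income is measured in larger numbers. After scaling, the two distances are nearly equal.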
5.
FLASHCARD QUESTION
Front
If a dataset has a column with inconsistent entries such as ["Male", "male", "M", "FEMALE", "female", "F"], which data cleaning technique should be applied?
Back
Convert all values to lowercase and map variations to standard categories
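A minimal sketch of that approach in pandas follows. The `mapping` dictionary and the target labels `"male"` and `"female"` are assumptions chosen for illustration.

```python
import pandas as pd

s = pd.Series(["Male", "male", "M", "FEMALE", "female", "F"])

# Hypothetical lookup from lowercase variants to standard categories.
mapping = {"male": "male", "m": "male", "female": "female", "f": "female"}

# Trim whitespace, lowercase, then map each variant to its standard label.
cleaned = s.str.strip().str.lower().map(mapping)

print(cleaned.tolist())
```

Values that are absent from `mapping` come back as `NaN`, which makes unexpected variants easy to spot and add to the lookup.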
6.
FLASHCARD QUESTION
Front
Given a DataFrame df, how do you replace all missing values in the "Age" column with the median value? Options: df['Age'].fillna(df['Age'].mean(), inplace=True), df['Age'].fillna(df['Age'].median(), inplace=True), df['Age'].replace(np.nan, 0, inplace=True), df.dropna(subset=['Age'], inplace=True)
Back
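df['Age'].fillna(df['Age'].median(), inplace=True) (in recent pandas versions, prefer df['Age'] = df['Age'].fillna(df['Age'].median()), because an in-place call on a selected column may not update df)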
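As a rough illustration, the median fill can be tried on a made-up `Age` column. The sketch uses the assignment form rather than `inplace=True`, which is an assumption made so that it behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing ages.
df = pd.DataFrame({"Age": [22.0, np.nan, 35.0, 41.0, np.nan]})

# median() skips missing values by default.
median_age = df["Age"].median()

# Assign the filled column back to the DataFrame.
df["Age"] = df["Age"].fillna(median_age)

print(df["Age"].tolist())
```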
7.
FLASHCARD QUESTION
Front
How can you remove duplicate rows from a DataFrame while keeping only the last occurrence?
Back
df.drop_duplicates(keep='last', inplace=True)
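A short sketch on hypothetical data shows which rows survive. It returns a new DataFrame instead of modifying in place so that the original stays available for comparison.

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 are identical, as are rows 1 and 4.
df = pd.DataFrame({
    "id": [1, 2, 1, 3, 2],
    "score": [90, 85, 90, 70, 85],
})

# Keep only the last occurrence of each fully duplicated row.
deduped = df.drop_duplicates(keep="last")

print(deduped.index.tolist())
```

The surviving rows keep their original index labels. Chaining `.reset_index(drop=True)` renumbers them, and the `subset` parameter restricts the comparison to selected columns.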