EDA Quiz 1

Assessment

Quiz

Computers

Professional Development

Medium

Created by

Vijay Agrawal

Used 1+ times

FREE Resource

27 questions

1.

MULTIPLE SELECT QUESTION

30 sec • 1 pt

You are ingesting data from multiple sources (CSV, JSON, and Parquet). Which of the following statements are correct?

CSV files cannot handle hierarchical data structures.

JSON files are human-readable and can handle nested objects.

Parquet files store data in a columnar format and allow efficient compression.

CSV files always load faster than Parquet files.
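To make the contrast between flat and nested formats concrete, here is a minimal sketch using only the Python standard library. The record and the `flatten` helper are hypothetical, introduced for illustration; Parquet is left out because reading it requires a third-party library.

```python
import csv
import io
import json

# Hypothetical record with one nested object.
record = {"id": 1, "name": "Ada", "address": {"city": "London", "zip": "NW1"}}

# JSON keeps the nested object intact across a round trip.
json_round_trip = json.loads(json.dumps(record))

def flatten(d, prefix=""):
    """Flatten nested dicts into dotted column names (illustrative helper)."""
    flat = {}
    for key, value in d.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

# CSV has only flat rows and columns, so nesting must be flattened first.
flat_record = flatten(record)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(flat_record))
writer.writeheader()
writer.writerow(flat_record)
csv_text = buffer.getvalue()

print(json_round_trip)
print(csv_text)
```

The dotted column names are one possible convention; the point is only that the hierarchy has to be encoded by hand before it fits a CSV row.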

2.

MULTIPLE SELECT QUESTION

30 sec • 1 pt

You have a 50 GB CSV file that you need to ingest and analyze. Which approaches would be most practical?

Use Python's built-in open() and read line by line in a loop.

Use pandas.read_csv() without specifying chunksize.

Use chunking in pandas (chunksize parameter) to process data in smaller batches.

Convert the CSV into a more compressed format like Parquet and use a distributed environment (e.g., PySpark).
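The chunked approach named in the options can be sketched as follows. This is a minimal illustration, assuming pandas is installed; the tiny in-memory CSV and the `price` column are stand-ins for a real large file, which would normally be passed as a path.

```python
import io

import pandas as pd

# Stand-in for a large file: ten rows in a single hypothetical "price" column.
csv_data = io.StringIO("price\n" + "\n".join(str(i) for i in range(1, 11)))

total = 0.0
rows = 0

# With chunksize set, read_csv returns an iterator of DataFrames,
# so only one chunk is held in memory at a time.
for chunk in pd.read_csv(csv_data, chunksize=4):
    total += chunk["price"].sum()
    rows += len(chunk)

mean_price = total / rows
print(rows, mean_price)
```

Aggregates such as sums and counts combine cleanly across chunks; statistics like the median need a different strategy because they cannot be merged from per-chunk results this simply.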

3.

MULTIPLE SELECT QUESTION

30 sec • 1 pt

Which of the following summary statistics are typically useful during EDA?

Mean, median, mode

Standard deviation, variance

Range, interquartile range

Confusion matrix
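As a small illustration, the descriptive statistics listed above can all be computed from a single numeric column with Python's `statistics` module. The sample values are hypothetical. A confusion matrix is not shown because it needs predicted and actual class labels, which a raw column does not have.

```python
import statistics

# Hypothetical sample of one numeric column, already sorted for readability.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

variance = statistics.variance(data)   # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)       # sample standard deviation

value_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1

print(mean, median, mode)
print(variance, std_dev)
print(value_range, iqr)
```

Note that quartile conventions differ between tools, so the interquartile range from `statistics.quantiles` may not match another library's default exactly.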

4.

MULTIPLE SELECT QUESTION

30 sec • 1 pt

When examining a dataset's distribution, which are signs that the data might be right-skewed?

The mean is greater than the median.

The mean is less than the median.

A histogram shows a longer tail to the right.

The mode is greater than the median.
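The relationship between mean, median, and tail direction can be checked on a small hypothetical sample. The sketch below uses only the standard library; `skewness` is an illustrative helper that computes the population skewness (the mean cubed z-score), not a library function.

```python
import statistics

# Hypothetical sample: most values are small, one is very large.
values = [1, 2, 2, 3, 3, 3, 4, 4, 50]

mean = statistics.mean(values)
median = statistics.median(values)

def skewness(data):
    """Population skewness: the average of the cubed z-scores."""
    m = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - m) / sd) ** 3 for x in data) / len(data)

skew = skewness(values)

# The single large value pulls the mean above the median,
# and the skewness coefficient comes out positive.
print(mean, median, skew)
```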

5.

MULTIPLE SELECT QUESTION

30 sec • 1 pt

You have a dataset of housing prices. You notice that some houses are extremely expensive compared to the rest. Which methods can help you identify outliers effectively?

Box plot to detect points more than 1.5 × IQR beyond the quartiles.

Z-scores to find values far from the mean.

Dropping all data above the median price.

Calculating the difference between max and min values.
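The IQR rule behind a box plot and the z-score approach can both be sketched with the standard library. The prices below are hypothetical, and the z-score cutoff of 2.5 is an assumption chosen for this tiny sample; cutoffs between 2 and 3 are common, and in a sample of only ten points a single extreme value cannot reach a z-score much above 3.

```python
import statistics

# Hypothetical house prices in thousands; one listing is far above the rest.
prices = [200, 210, 220, 230, 240, 250, 260, 270, 280, 2000]

# IQR rule: flag points more than 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles(prices, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
iqr_outliers = [p for p in prices if p < lower_fence or p > upper_fence]

# Z-score rule: flag points far from the mean in standard-deviation units.
mean = statistics.fmean(prices)
std = statistics.pstdev(prices)
z_cutoff = 2.5  # assumed threshold for this illustration
z_outliers = [p for p in prices if abs((p - mean) / std) > z_cutoff]

print(iqr_outliers)
print(z_outliers)
```

Because the extreme value inflates both the mean and the standard deviation, z-scores tend to be less sensitive than the IQR rule on small or heavily skewed samples.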

6.

MULTIPLE CHOICE QUESTION

30 sec • 1 pt

What does a high variance in a dataset indicate?

The data points are spread out from the mean

The data points are closely clustered around the mean

The dataset has a low level of variability

The dataset has many missing values
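A quick way to see what variance measures is to compare two hypothetical samples that share a mean but differ in spread. This is a minimal sketch using the standard library.

```python
import statistics

# Two hypothetical samples with the same mean of 50.
tight = [49, 50, 51]
spread = [10, 50, 90]

tight_var = statistics.pvariance(tight)    # population variance
spread_var = statistics.pvariance(spread)

# Same center, very different average squared distance from it.
print(statistics.mean(tight), statistics.mean(spread))
print(tight_var, spread_var)
```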

7.

MULTIPLE SELECT QUESTION

30 sec • 1 pt

Which transformations are commonly used to stabilize variance or reduce skew in data?

Box-Cox transformation

Log transformation

Min-Max scaling

One-hot encoding
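The effect of these transformations on skew can be compared on a small hypothetical sample. The sketch below uses only the standard library: `box_cox` implements the textbook Box-Cox formula with a hand-picked lambda (libraries such as SciPy estimate lambda from the data instead), and `skewness` is an illustrative helper. One-hot encoding is omitted because it applies to categorical values, not numeric ones.

```python
import math
import statistics

# Hypothetical strictly positive, right-skewed sample.
raw = [1, 2, 3, 4, 5, 10, 20, 50, 100, 1000]

def skewness(data):
    """Population skewness: the average of the cubed z-scores."""
    m = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - m) / sd) ** 3 for x in data) / len(data)

def box_cox(x, lam):
    """Box-Cox transform for x > 0; lam = 0 reduces to the natural log."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam

logged = [math.log(x) for x in raw]
box_coxed = [box_cox(x, 0.25) for x in raw]  # lambda chosen by hand here

# Min-max scaling is a linear rescaling to [0, 1]; it changes the units,
# not the shape of the distribution.
lo, hi = min(raw), max(raw)
min_maxed = [(x - lo) / (hi - lo) for x in raw]

print(skewness(raw))
print(skewness(logged))
print(skewness(box_coxed))
print(skewness(min_maxed))
```

On this sample the log and Box-Cox versions come out less skewed than the raw values, while the min-max version keeps the same skewness as the raw data.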
