CSF 7.8 Module 7 Test Review


Assessment

Flashcard

Computers

9th - 12th Grade

Hard

Created by Quizizz Content


22 questions


1.

FLASHCARD QUESTION

Front

Which image best portrays your current mood?

Back

No correct answer (this is an opinion question; the original image options are not available).

2.

FLASHCARD QUESTION

Front

Which of the following can be introduced accidentally during data selection?
- Bias
- Ambiguity
- Storytelling
- Correlation

Back

Bias

Answer explanation

Bias can be introduced accidentally during data selection if the selection process is not representative of the entire population or if certain factors are weighted more heavily than others. This can lead to results that are skewed and not reflective of reality.

Storytelling is not introduced during data selection. It refers to how data is presented and interpreted, and although a story can be distorted by personal biases or preconceived notions, that happens when results are communicated, not when the data is chosen.

Correlation is a statistical relationship between variables, not an error that data selection introduces. A biased selection can make unrelated variables appear to be correlated (a spurious correlation), but the underlying problem in that case is still bias.

Ambiguity refers to multiple possible interpretations or unclear meaning in the data, and it is more likely to arise from issues in data collection or measurement than from the selection step.
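A minimal Python sketch of how a non-representative selection biases a result. The test scores and the "only high scorers volunteered" scenario are hypothetical, chosen to make the effect easy to see.

```python
import statistics

# Hypothetical population: test scores 1 through 100 for 100 students.
population = list(range(1, 101))

# Representative selection: every 10th student (scores 5, 15, ..., 95).
representative = population[4::10]

# Biased selection (assumed scenario): only students scoring above 50
# volunteered, so the sample leaves out the lower half entirely.
biased = [score for score in population if score > 50]

print(statistics.mean(population))      # 50.5
print(statistics.mean(representative))  # 50
print(statistics.mean(biased))          # 75.5
```

Nothing in the biased selection is "wrong" row by row; the distortion comes entirely from which rows were chosen.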

3.

FLASHCARD QUESTION

Front

Which of the following tasks would make data storytelling less accurate?
- Be clear and concise
- Address a specific customer’s problem
- Provide only one viewpoint
- Provide background information about the problem being addressed

Back

Provide only one viewpoint

Answer explanation

Providing only one viewpoint would make data storytelling less accurate.

While being clear and concise, addressing a specific customer's problem, and providing background information about the problem being addressed can all improve the accuracy of data storytelling, presenting only one viewpoint can skew the story and limit the audience's understanding of the full picture.

It's important to present a balanced view of the data, including any conflicting or alternative viewpoints, so that the audience can make informed decisions based on the information presented.

4.

FLASHCARD QUESTION

Front

How can you determine if a data set contains outliers by looking at a boxplot?

Back

The outliers will be marked by a circle beyond the whiskers on a boxplot.

Answer explanation

The outliers will be marked by a circle beyond the whiskers on a boxplot.

A boxplot is a graphical representation of the distribution of a dataset that provides information about the median, quartiles, and range of the data. It also indicates the presence of any outliers in the data.

Outliers are plotted as individual points (often drawn as circles) beyond the whiskers, which are the lines that extend from the box. In the most common convention, each whisker reaches to the most extreme data point that lies within 1.5 times the interquartile range (IQR) of the box, where the IQR is the distance between the first quartile (Q1) and the third quartile (Q3). Data points below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR fall beyond the whiskers and are shown individually as outliers.

Therefore, to determine if a data set contains outliers by looking at a boxplot, you can visually inspect the plot and look for any circles beyond the whiskers.
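The 1.5 × IQR rule that decides which points a boxplot draws individually can be sketched in Python as follows. This assumes quartiles computed by linear interpolation (`statistics.quantiles` with `method="inclusive"`); the data values and the function name `iqr_outliers` are made up for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule.

    Assumes quartiles from linear interpolation; other tools may use a
    slightly different quartile method and get slightly different fences.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr   # points below this are outliers
    upper = q3 + k * iqr   # points above this are outliers
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

data = [2, 3, 4, 5, 5, 6, 7, 8, 30]
lower, upper, outliers = iqr_outliers(data)
print(lower, upper, outliers)  # -0.5 11.5 [30]
```

Here Q1 = 4 and Q3 = 7, so the IQR is 3 and the fences are 4 - 4.5 and 7 + 4.5. The value 30 would appear as a separate point, while the upper whisker would stop at 8.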

5.

FLASHCARD QUESTION

Front

Which of the following summary statistics can help determine the center of the data?
- Median
- Lowest value
- Highest value
- Correlation coefficient

Back

Median

Answer explanation

The median can help determine the center of the data.

The median is a measure of central tendency that represents the middle value in a dataset when the values are arranged in order. Roughly half of the data points lie above the median and half lie below it. Therefore, the median is a useful summary statistic for determining the center of a dataset.

The lowest and highest values are not measures of central tendency and do not provide information about the center of the data.

The correlation coefficient is a measure of the strength and direction of the linear relationship between two variables and is not related to determining the center of the data.
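A short Python sketch of the definition of the median: sort the values and take the middle one, or the average of the two middle values when the count is even. The sample lists are made up; Python's built-in `statistics.median` returns the same results.

```python
import statistics

def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [7, 1, 3, 9, 5]   # sorted: 1, 3, 5, 7, 9
even = [4, 8, 1, 6]     # sorted: 1, 4, 6, 8

print(median(odd), median(even))  # 5 5.0
print(statistics.median(odd), statistics.median(even))  # 5 5.0
```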

6.

FLASHCARD QUESTION

Front

Which of the following data errors would affect the accuracy of a predictive model?
- Duplicate entries
- Misspellings
- Missing values
- All of these options

Back

All of these options

Answer explanation

All of these options would affect the accuracy of a predictive model.

Missing values, misspellings, and duplicate entries are all common types of data errors that can have a significant impact on the accuracy of predictive models.

Missing values can introduce bias and reduce the precision of the model. Misspellings can split one category into several (for example, "Boston" and "Bostn" counted as different cities), leading to incorrect matches and inaccurate predictions. Duplicate entries also cause problems, because the same record is counted multiple times and skews the results.

To build an accurate predictive model, it is important to ensure that the data is clean, complete, and error-free. This includes checking for missing values, correcting misspellings, and removing duplicate entries before training the model.
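A minimal pure-Python sketch of the three fixes named above, applied to a handful of hypothetical records. The field names (`city`, `sales`), the `CORRECTIONS` lookup table, and the choice to drop incomplete rows rather than fill them in are all assumptions for illustration.

```python
# Hypothetical raw records; field names and values are illustrative only.
raw = [
    {"city": "Boston", "sales": 120},
    {"city": "Boston", "sales": 120},    # duplicate entry
    {"city": "Bostn", "sales": 95},      # misspelling
    {"city": "Chicago", "sales": None},  # missing value
    {"city": "Chicago", "sales": 80},
]

# Assumed lookup table of known misspellings -> correct spelling.
CORRECTIONS = {"Bostn": "Boston"}

def clean(records):
    seen = set()
    cleaned = []
    for rec in records:
        if rec["sales"] is None:                          # drop rows with missing values
            continue
        city = CORRECTIONS.get(rec["city"], rec["city"])  # fix known misspellings
        key = (city, rec["sales"])
        if key in seen:                                   # skip duplicate entries
            continue
        seen.add(key)
        cleaned.append({"city": city, "sales": rec["sales"]})
    return cleaned

for row in clean(raw):
    print(row)
```

Correcting spellings before checking for duplicates means a misspelled copy of an existing record is also caught. Dropping rows with missing values is only one option; filling them in (imputation) is another, and the right choice depends on the data.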

7.

FLASHCARD QUESTION

Front

Which of the following is true?
- Data cleaning is the very last thing a data scientist does with data
- Data cleaning is not necessary for most data sets
- Data cleaning has to be done by hand
- Data cleaning takes a lot of time but can be made faster by using a programming language

Back

Data cleaning takes a lot of time but can be made faster by using a programming language

Answer explanation

Data cleaning takes a lot of time but can be made faster by using a programming language.

Data cleaning, one part of data preprocessing, is an essential step in the data analysis process that involves identifying and correcting errors, inconsistencies, and missing values in a dataset. Because cleaning by hand is time-consuming and repetitive, it is often done with a programming language.

While some aspects of data cleaning, such as deciding how to correct an ambiguous entry, may require human judgment, many cleaning tasks (finding missing values, removing duplicates, standardizing formats) can be automated in programming languages such as Python and R.

Data cleaning is a crucial step in preparing data for analysis and modeling, and it is usually done early, before modeling begins. That is why the other options are false: cleaning is not the last step, most real-world data sets need it, and it does not have to be done entirely by hand.
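A small Python sketch of why code speeds cleaning up: the rules are written once and then applied to every row, whether there are three rows or three million. The field names and cleaning rules here are hypothetical.

```python
# Hypothetical messy rows, e.g. as typed into a spreadsheet.
rows = [
    {"name": "  ada lovelace ", "grade": "10"},
    {"name": "ALAN TURING", "grade": " 12 "},
    {"name": "grace hopper", "grade": ""},
]

def normalize_row(row):
    """Apply the same (illustrative) cleaning rules to one row."""
    name = row["name"].strip().title()   # trim spaces, consistent capitalization
    grade = row["grade"].strip()
    return {
        "name": name,
        "grade": int(grade) if grade else None,  # blank -> missing value
    }

cleaned = [normalize_row(r) for r in rows]
for r in cleaned:
    print(r)
```

Simple rules like `str.title()` mishandle some names (it turns "McDonald" into "Mcdonald"), which is one reason a person still reviews the edge cases.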
