Feature Engineering

Professional Development • 10 Qs

Similar activities

SQL basics • Professional Development • 12 Qs

Intro SQL 2 - Data Dictionary • Professional Development • 12 Qs

Power BI Mastery • Professional Development • 15 Qs

FinTech 02-3 Python • Professional Development • 10 Qs

Pre-Quiz Self Assessment • Professional Development • 15 Qs

Qualitative Data • University - Professional Development • 15 Qs

JMP Quizerable • Professional Development • 10 Qs

Daily Quiz (24.12.20) • 7th Grade - Professional Development • 10 Qs

Feature Engineering

Assessment • Quiz • Other • Professional Development • Easy

Created by Bayu Prasetya

10 questions

1.

OPEN ENDED QUESTION

15 mins • 1 pt

What is feature engineering?

Answer explanation

Feature engineering is the process of selecting and transforming raw data into features that are more informative, useful, and predictive for machine learning models. In other words, it means identifying and creating, from the raw data, new features that can improve a model's performance.

The goal of feature engineering is to create a set of features that best represents the underlying patterns and relationships in the data, and that can improve the accuracy of a machine learning model. Feature engineering is a crucial step in machine learning, as the quality of the input features can have a significant impact on the model's ability to learn and make accurate predictions.

Some common techniques used in feature engineering include:

1. Feature scaling or normalization

2. Handling missing data

3. Encoding categorical variables

4. Creating new features from existing ones (e.g. combining features, binning, scaling, etc.)

5. Feature selection (e.g. removing irrelevant or redundant features)

Effective feature engineering requires a deep understanding of the data and the problem at hand, as well as knowledge of the various techniques available. It can be a time-consuming process, but it is often essential for achieving high performance in machine learning models.
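As a rough illustration, the sketch below applies several of these techniques to a hypothetical toy dataset (all column names and values here are invented for the example):

```python
# A minimal sketch of common feature engineering steps on made-up data.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "age": [25, 32, None, 51],                              # numeric, with a missing value
    "income": [40_000, 55_000, 72_000, None],               # numeric, with a missing value
    "city": ["Jakarta", "Bandung", "Jakarta", "Surabaya"],  # categorical
})

# Handle missing data: impute numeric columns with their median.
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].median())

# Encode the categorical variable with one-hot encoding.
df = pd.get_dummies(df, columns=["city"])

# Create a new feature from existing ones.
df["income_per_age"] = df["income"] / df["age"]

# Scale the numeric features to [0, 1].
num_cols = ["age", "income", "income_per_age"]
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])

print(df.head())
```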

2.

OPEN ENDED QUESTION

15 mins • 1 pt

Explain what scaling means and when we use it.

Answer explanation

Scaling refers to the process of transforming the numerical values of features in a dataset to a common scale, typically between 0 and 1 or between -1 and 1. The purpose of scaling is to ensure that all features carry comparable weight and importance when training a machine learning model.

When we use scaling: In many machine learning algorithms, the magnitude and distribution of the features can have a significant impact on the performance of the model. Features with larger magnitudes can dominate the model, leading to biased results. Additionally, some algorithms like K-Nearest Neighbors (KNN) and Support Vector Machines (SVM) use distance measures to determine similarity between data points, and features with larger magnitudes can heavily influence the distance calculations.

In summary, scaling is a crucial step in preparing data for machine learning models, as it ensures that all features are treated equally during training and prevents biases caused by differences in feature magnitudes.
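For concreteness, here is a minimal sketch of two common scalers from scikit-learn applied to made-up values on very different scales:

```python
# A minimal sketch of min-max scaling vs. standardization (values made up).
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])  # two features with very different magnitudes

# Min-max scaling maps each feature to the range [0, 1].
print(MinMaxScaler().fit_transform(X))

# Standardization gives each feature zero mean and unit variance.
print(StandardScaler().fit_transform(X))
```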

3.

OPEN ENDED QUESTION

15 mins • 1 pt

What is an outlier, and what is a proper scaling method for data that have outlier values?

Answer explanation

Outliers are data points that are significantly different from the majority of the data points in a dataset. When scaling data that contains outliers, it is important to choose a scaling method that is robust to outliers and does not heavily influence the scaling of the rest of the data.

One such method is the RobustScaler (in scikit-learn), which is designed to be robust to outliers. It scales the data using the interquartile range (IQR), the difference between the 75th and 25th percentiles, transforming each feature so that its median is 0 and its IQR is 1.
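A minimal sketch, using scikit-learn's RobustScaler on made-up values with one outlier, shows why it is preferred here:

```python
# A minimal sketch of RobustScaler vs. MinMaxScaler on data with an outlier.
import numpy as np
from sklearn.preprocessing import RobustScaler, MinMaxScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # 100 is an outlier

# RobustScaler centers on the median and scales by the IQR, so the
# outlier barely affects how the bulk of the data is transformed.
print(RobustScaler().fit_transform(X).ravel())

# Compare: MinMaxScaler squashes the inliers into a tiny range,
# because the outlier alone defines the maximum.
print(MinMaxScaler().fit_transform(X).ravel())
```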

4.

OPEN ENDED QUESTION

15 mins • 1 pt

Explain what One Hot Encoding is and what the difference is between One Hot Encoding and Binary Encoding.

Answer explanation

One hot encoding: One hot encoding is a process of representing categorical variables as a binary vector. It involves creating a new binary feature for each possible category of the original categorical feature. The value of the binary feature is 1 if the data point belongs to that category, and 0 otherwise. For example, if we have a categorical variable "color" with possible values "red," "blue," and "green," one hot encoding would create three new binary features "color_red," "color_blue," and "color_green."

Binary encoding: Binary encoding is similar to one hot encoding, but instead of one column per category it represents each category's code as a binary number spread across a few binary columns. It involves assigning each category a unique number and then writing that number in binary digits. For example, if we have a categorical variable "color" with possible values "red," "blue," and "green," binary encoding might assign the numbers 0, 1, and 2 to these categories and represent them with two binary columns (0 = 00, 1 = 01, 2 = 10).

The main difference between one hot encoding and binary encoding is that one hot encoding creates a new binary feature for each category, while binary encoding represents each category as a binary number. This means that binary encoding uses fewer features and may be more memory-efficient, but it may also be less expressive and have lower accuracy in some cases.
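As a minimal sketch of the difference (assuming the third-party category_encoders package is installed, e.g. via pip install category_encoders):

```python
# A minimal sketch comparing one-hot and binary encoding on made-up data.
import pandas as pd
import category_encoders as ce  # third-party package, assumed installed

df = pd.DataFrame({"color": ["red", "blue", "green", "blue"]})

# One-hot: one binary column per category (3 categories -> 3 columns).
print(pd.get_dummies(df, columns=["color"]))

# Binary: each category's ordinal code is written out in binary digits,
# so 3 categories fit in 2 columns instead of 3.
print(ce.BinaryEncoder(cols=["color"]).fit_transform(df))
```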

5.

OPEN ENDED QUESTION

15 mins • 1 pt

If we encode the data with 7 categories,

1. How many new columns are formed if we use One Hot Encoding?

2. How many new columns are formed if we use One Hot Encoding with the parameter drop_first = True (pandas get_dummies; the scikit-learn equivalent is drop='first')?

3. How many new columns are formed if we use the Binary Encoder?

Answer explanation

1. 7 columns (one binary column per category)

2. 6 columns (the column for the first category is dropped)

3. 3 columns (the 7 ordinal codes fit in 3 binary digits, since 2³ = 8 > 7)
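These counts can be checked empirically with a short sketch (the category labels are made up; BinaryEncoder again comes from the third-party category_encoders package):

```python
# A quick empirical check of the column counts above.
import pandas as pd
import category_encoders as ce  # third-party package, assumed installed

df = pd.DataFrame({"cat": list("ABCDEFG")})  # 7 distinct categories

print(pd.get_dummies(df["cat"]).shape[1])                         # expect 7
print(pd.get_dummies(df["cat"], drop_first=True).shape[1])        # expect 6
print(ce.BinaryEncoder(cols=["cat"]).fit_transform(df).shape[1])  # expect 3
```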

6.

OPEN ENDED QUESTION

15 mins • 1 pt

Suppose we have a dataset of survey results (e.g. customer satisfaction) consisting of 3 categories: Dissatisfied, Satisfied, Very Satisfied. What encoding method is suitable for this data, and why?

Answer explanation

For categorical data like customer satisfaction levels, Ordinal encoding is a suitable encoding method. Ordinal encoding can be useful when there is a natural ordering to the categories, such as in the case of customer satisfaction levels or education levels. The categories "Dissatisfied," "Satisfied," and "Very Satisfied" have a natural ordering, and assigning numerical values to each category in a way that preserves this ordering can be useful for some machine learning algorithms.

However, it is important to note that ordinal encoding assumes that the distance between the values is equal, which may not be true in all cases. For example, the difference between Dissatisfied and Satisfied may not be the same as the difference between Satisfied and Very Satisfied.
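A minimal sketch of ordinal encoding with an explicit category order, using scikit-learn's OrdinalEncoder (the column name is illustrative):

```python
# A minimal sketch of ordinal encoding that preserves the natural order
# Dissatisfied < Satisfied < Very Satisfied.
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"satisfaction": ["Satisfied", "Dissatisfied", "Very Satisfied"]})

order = [["Dissatisfied", "Satisfied", "Very Satisfied"]]  # explicit ordering
enc = OrdinalEncoder(categories=order)
df["satisfaction_code"] = enc.fit_transform(df[["satisfaction"]]).ravel()
print(df)  # codes 1.0, 0.0, 2.0 respectively
```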

7.

OPEN ENDED QUESTION

15 mins • 1 pt

What is feature selection, and how do we do feature selection?

Answer explanation

Feature selection is the process of selecting a subset of relevant features (i.e., variables or attributes) from a larger set of features in a dataset, with the goal of improving the performance of a machine learning model.

There are several methods for feature selection, including:

1. Filter methods: These methods use statistical measures (e.g., correlation, mutual information) to rank the importance of each feature and select the top k features. Filter methods are generally fast and computationally efficient, but they do not consider the interaction between features.

2. Wrapper methods: These methods evaluate the performance of a machine learning model using different subsets of features and select the subset that achieves the best performance. Wrapper methods can be computationally expensive, but they can capture the interaction between features.

3. Embedded methods: These methods incorporate feature selection into the process of building a machine learning model, such as using regularization techniques (e.g., Lasso, Ridge) that penalize the coefficients of less important features. Embedded methods can be computationally efficient and effective, but they may not be suitable for all types of machine learning models.

4. Domain knowledge: Sometimes, domain knowledge can be used to select relevant features based on the understanding of the problem domain and the potential impact of each feature on the outcome.

When selecting features, it is important to consider the trade-off between the number of features and the performance of the machine learning model. Having too many irrelevant features can lead to overfitting and reduce the performance of the model, while having too few relevant features can result in underfitting and likewise reduce performance.
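The following sketch illustrates one representative from each method family on a synthetic dataset, using standard scikit-learn APIs:

```python
# Minimal sketches of filter, wrapper, and embedded feature selection.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression

# Synthetic data: 20 features, only 5 of which are informative.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

# 1. Filter: rank features by a univariate statistic and keep the top k.
X_filter = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

# 2. Wrapper: recursively fit a model and drop the weakest features.
X_wrap = RFE(LogisticRegression(max_iter=1000),
             n_features_to_select=5).fit_transform(X, y)

# 3. Embedded: L1 regularization drives unhelpful coefficients to zero.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
X_embed = SelectFromModel(lasso).fit_transform(X, y)

print(X_filter.shape, X_wrap.shape, X_embed.shape)
```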
