Chapter 4 - Text Data Mining

Created by

Ryan Gente

37 Slides • 31 Questions

1

Chapter 4 - Text Data Mining

Prepared by: RYAN JISON DE LA GENTE, Ph.D.

2

Text Data Mining

3

Text Data Mining

  • Text data mining can be described as the process of extracting essential information from standard (natural) language text.

  • The data that we generate via text messages, documents, emails, and files is written in common language text.

  • Text mining is primarily used to draw useful insights or patterns from such data.

  • The text mining market has experienced exponential growth and adoption over the last few years and is also expected to gain significant growth and adoption in the future. One of the primary reasons behind the adoption of text mining is higher competition in the business market: many organizations are seeking value-added solutions to compete with other organizations.

4

Text Data Mining

  • With increasing competition in business and changing customer perspectives, organizations are making huge investments to find solutions capable of analyzing customer and competitor data to improve competitiveness.

  • The primary sources of data are e-commerce websites, social media platforms, published articles, surveys, and many more. Most of the generated data is unstructured, which makes it challenging and expensive for organizations to analyze manually.

5

Multiple Choice

Text data mining can be described as the process of extracting essential data from non standard language text.

1

True

2

False

6

Multiple Choice

All the data that we generate via text messages, documents, emails, files are written in common language text.

1

True

2

False

7

Multiple Choice

Text mining is primarily used to draw useful insights or patterns from such data.

1

True

2

False

8

Multiple Choice

The text mining market has experienced exponential growth and adoption over the last few years and is also expected to gain significant growth and adoption in the future.

1

True

2

False

9

Multiple Choice

The primary sources of data are e-commerce websites, social media platforms, published articles, surveys, and many more. Most of the generated data is unstructured, which makes it challenging and expensive for organizations to analyze manually.

1

True

2

False

10

Areas of text mining in data mining

11

Information Extraction

The automatic extraction of structured data, such as entities, entity relationships, and attributes describing entities, from an unstructured source is called information extraction.
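
As a minimal sketch of information extraction, the Python snippet below pulls two kinds of entities (email addresses and ISO-style dates) out of free text using regular expressions. The patterns, the function name extract_entities, and the sample sentence are illustrative assumptions; real systems usually rely on trained NLP models rather than hand-written patterns.

```python
import re

# Simplified, illustrative patterns (assumptions, not complete validators).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def extract_entities(text):
    """Return structured records found in unstructured text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }

# Hypothetical unstructured input.
sample = "Contact ana@example.com before 2024-05-01 or write to bob@example.org."
print(extract_entities(sample))
```

The result is a small structured record (a dictionary of lists) built from an unstructured sentence, which is the essence of the definition above.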

12

Natural Language Processing

NLP stands for natural language processing. It enables computer software to understand human language as it is spoken or written. NLP is primarily a component of artificial intelligence (AI).

Developing NLP applications is difficult because computers generally expect humans to "speak" to them in a programming language that is precise, unambiguous, and highly structured. Human speech, however, is often imprecise and ambiguous, and it depends on many complex variables, including slang, social context, and regional dialects.

13

Data Mining

Data mining refers to the extraction of useful data and hidden patterns from large data sets. Data mining tools can predict behaviors and future trends, allowing businesses to make better data-driven decisions.

Data mining tools can be used to resolve many business problems that have traditionally been too time-consuming to solve.

14

Information Retrieval

Information retrieval deals with retrieving useful data from the data stored in our systems. As a familiar example, the search engines on e-commerce sites and other websites are a form of information retrieval.
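
To make the search-engine example concrete, here is a minimal sketch of an inverted index, one common way to implement retrieval. The product titles and the function names build_index and search are hypothetical; real search engines add ranking, stemming, and much more.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical product catalogue of an e-commerce site.
docs = {
    1: "red running shoes",
    2: "blue running jacket",
    3: "red rain jacket",
}
index = build_index(docs)
print(sorted(search(index, "red jacket")))
```

Only the document that contains both query words is returned; documents matching a single word are filtered out by the set intersection.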

15

Multiple Choice

The automatic extraction of structured data, such as entities, entity relationships, and attributes describing entities, from an unstructured source.

1

Information Extraction

2

Natural Language Processing

3

Data Mining

4

Information Retrieval

16

Multiple Choice

Primarily a component of artificial intelligence (AI).

1

Information Extraction

2

Natural Language Processing

3

Data Mining

4

Information Retrieval

17

Multiple Choice

Refers to the extraction of useful data, hidden patterns from large data sets.

1

Information Extraction

2

Natural Language Processing

3

Data Mining

4

Information Retrieval

18

Multiple Choice

Deals with retrieving useful data from data that is stored in our systems.

1

Information Extraction

2

Natural Language Processing

3

Data Mining

4

Information Retrieval

19

Text Mining Process

20

Text Mining Process

Text transformation - Text transformation converts the text into a consistent form. A simple example is a technique that is used to control (normalize) the capitalization of the text.

Text Pre-processing - Pre-processing is a significant task and a critical step in text mining, natural language processing (NLP), and information retrieval (IR). In the field of text mining, data pre-processing is used for extracting useful information and knowledge from unstructured text data. Information retrieval (IR) is a matter of choosing which documents in a collection should be retrieved to fulfill the user's need.
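
The two steps above can be sketched in a few lines of Python: the text is lowercased (transformation), split into word tokens, and stripped of very common words (pre-processing). The stop-word list and the function name preprocess are assumptions made for illustration; real pipelines use longer lists and often add stemming or lemmatization.

```python
import re

# A tiny illustrative stop-word list (an assumption; real lists are longer).
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}

def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

# Hypothetical customer message.
print(preprocess("The delivery is LATE, and the box arrived damaged!"))
```

The output is a clean list of content words that later steps (feature selection, data mining) can work with.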

21

Text Mining Process

Feature selection - Feature selection is a significant part of data mining. It can be defined as the process of reducing the input to processing by finding the most essential information sources. Feature selection is also called variable selection.

Data Mining - In this step, the text mining procedure merges with the conventional data mining process. Classic data mining procedures are applied to the resulting structured database.

Evaluate - Afterward, the results are evaluated. Once the result has been evaluated, it can be discarded.

Applications – includes: Risk Management, Customer Care Service, Business Intelligence and Social Media Analysis.
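
As a hedged illustration of the feature selection step, the sketch below keeps only the terms that appear in at least a minimum number of documents, which reduces the input passed on to the data mining step. Document-frequency filtering is just one simple selection criterion chosen here for illustration; the threshold min_docs, the function name select_features, and the sample token lists are assumptions.

```python
from collections import Counter

def select_features(token_lists, min_docs=2):
    """Keep only terms that occur in at least `min_docs` documents."""
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    return sorted(term for term, n in doc_freq.items() if n >= min_docs)

# Hypothetical pre-processed customer messages.
docs = [
    ["late", "delivery", "refund"],
    ["late", "delivery", "damaged"],
    ["great", "service", "delivery"],
]
print(select_features(docs))
```

Terms that occur in only one document are dropped, leaving a smaller vocabulary of features.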

22

Multiple Choice

Is a technique that is used to control the capitalization of the text.

1

Text transformation

2

Text Pre-processing

3

Feature selection

4

Data Mining

23

Multiple Choice

Is a significant task and a critical step in text mining, natural language processing (NLP), and information retrieval (IR).

1

Text transformation

2

Text Pre-processing

3

Feature selection

4

Data Mining

24

Multiple Choice

Can be defined as the process of reducing the input of processing or finding the essential information sources.

1

Text transformation

2

Text Pre-processing

3

Feature selection

4

Data Mining

25

Multiple Choice

Classic procedures are applied to the structured database.

1

Text transformation

2

Text Pre-processing

3

Feature selection

4

Data Mining

26

Multiple Choice

It evaluates the results. Once the result has been evaluated, it can be discarded.

1

Evaluate

2

Text Pre-processing

3

Feature selection

4

Data Mining

27

Multiple Choice

includes: Risk Management, Customer Care Service, Business Intelligence and Social Media Analysis

1

Evaluate

2

Applications

3

Feature selection

4

Data Mining

28

Text Mining Process Applications

29

Text Mining Process Applications

Risk Management is a systematic and logical procedure of analyzing, identifying, treating, and monitoring the risks involved in any action or process in organizations. Insufficient risk analysis is usually a leading cause of failure.

This is particularly true in financial organizations, where adopting risk management software based on text mining technology can effectively enhance the ability to reduce risk. It enables the management of millions of sources and petabytes of text documents, gives the ability to connect the data, and helps to access the appropriate data at the right time.

30

Text Mining Process Applications

Text mining methods, particularly NLP, are becoming increasingly important in the field of customer care. Organizations are investing in text analytics software to improve the overall customer experience by drawing on textual data from different sources such as customer feedback, surveys, and customer calls.

The primary objective of text analysis is to reduce the response time of the organizations and help to address the complaints of the customer rapidly and productively.


31

Text Mining Process Applications

Companies and business firms have started to use text mining strategies as a major aspect of their business intelligence.

Besides providing significant insights into customer behavior and trends, text mining strategies also help organizations analyze the strengths and weaknesses of their competitors, giving them a competitive advantage in the market.


32

Text Mining Process Applications

Social media analysis helps to track online data, and there are numerous text mining tools designed specifically for analyzing the performance of social media sites.

These tools help to monitor and interpret the text generated on the internet from news, emails, blogs, etc. Text mining tools can analyze the total number of posts, followers, and likes your brand has on a social media platform, which helps you understand the response of the individuals who are interacting with your brand and content.

33

Bitcoin Data Mining

Bitcoin mining refers to the process of authenticating and adding transaction records to the public ledger. The public ledger is known as the blockchain because it comprises a chain of blocks. Before we understand the Bitcoin mining concept, we should understand what Bitcoin is. Bitcoin is virtual money whose value is not static; it varies over time. There is no central regulatory body that controls Bitcoin transactions.

Bitcoin was created under the pseudonym (false name) Satoshi Nakamoto, who announced the invention; it was later implemented as open-source code. A purely peer-to-peer version of electronic money would allow online payments to be sent directly from one person to another without going through a financial institution. Bitcoin is a network protocol that enables people to transfer ownership of account units called bitcoins, which are created in limited quantity. When an individual sends a couple of bitcoins to another individual, this data is broadcast to the peer-to-peer Bitcoin network.

34

Bitcoin Data Mining

This technology is similar to purchasing something with any other virtual currency. However, one advantage of Bitcoin is that the parties are not directly identified: the sender and the beneficiary (receiver) appear on the network as addresses rather than by their personal identities.

This is a primary reason why it has become a trusted form of money transaction on the web. Traditionally, the difficulty in creating distributed money is the need for a scheme that prevents double-spending: one individual may simultaneously broadcast two transactions, sending the same coins to two distinct parties on the network.

Bitcoin solves this problem and ensures agreement on ownership by maintaining a community ledger of all transactions, called the blockchain. New transactions are grouped together into blocks and checked against the existing record to make sure all new transactions are valid. Bitcoin's accuracy is ensured by individuals, known as miners, who contribute computing power to the system to validate transactions and append them to the public ledger.
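
The idea of a chained ledger that miners extend can be sketched as a toy model in Python. This is a deliberately simplified illustration, not Bitcoin's actual protocol: real Bitcoin hashes a binary block header twice with SHA-256 and compares it against a numeric target, whereas this sketch hashes a text payload once and requires a few leading zero hex digits. The block fields, the difficulty value, and the sample transactions are all assumptions.

```python
import hashlib

def block_hash(prev_hash, transactions, nonce):
    """Hash the block contents together with the previous block's hash."""
    payload = f"{prev_hash}|{transactions}|{nonce}".encode()
    return hashlib.sha256(payload).hexdigest()

def mine(prev_hash, transactions, difficulty=3):
    """Try nonces until the hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = block_hash(prev_hash, transactions, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

def is_valid_chain(chain, difficulty=3):
    """Check that every block links to its predecessor and meets the target."""
    prev = "0" * 64  # placeholder hash that precedes the first block
    for block in chain:
        digest = block_hash(block["prev"], block["tx"], block["nonce"])
        if block["prev"] != prev or digest != block["hash"]:
            return False
        if not digest.startswith("0" * difficulty):
            return False
        prev = digest
    return True

# Build a two-block toy chain from hypothetical transactions.
chain = []
prev = "0" * 64
for tx in ["alice->bob:2", "bob->carol:1"]:
    nonce, digest = mine(prev, tx)
    chain.append({"prev": prev, "tx": tx, "nonce": nonce, "hash": digest})
    prev = digest

print(is_valid_chain(chain))
```

Because each block's hash covers the previous block's hash, changing any recorded transaction invalidates that block and every block after it, which is what makes the ledger hard to rewrite.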

35

How the Bitcoin Mining Works

36

Bitcoin Transaction

A Bitcoin transaction is a piece of data that is transmitted to the network and, if valid, ends up in a block in the blockchain. The purpose of a Bitcoin transaction is to transfer ownership of an amount of bitcoin to a Bitcoin address.

37

Data Mining Models

Data mining uses raw data to extract information and present it in a useful form.

Data mining is found in a diverse range of applications, including business intelligence studies, political model forecasting, web ranking forecasting, weather pattern model forecasting, etc.

38

What are data mining models?

A data mining model refers to a method that is used to present information, together with the various ways in which that information can be applied to specific questions and problems.

39

Types of data mining models

40

Types of data mining models


A predictive data mining model predicts data values using known results from different data sets. Predictive modeling cannot be classified as a separate discipline; it occurs in all organizations or industries across all disciplines.

The main objective of predictive data mining models is to predict the future based on past data, generally, but not always, using statistical modeling. Predictive modeling is used in the healthcare industry to identify high-risk patients with congestive heart failure, high blood pressure, diabetes, infection, cancer, etc. It is also used by vehicle insurance companies to assign accident risk to policyholders.

41

Types of data mining models


A descriptive model differentiates the patterns and relationships in data. A descriptive model does not attempt to generalize to a statistical population or random process. A predictive model attempts to generalize to a population or random process. Predictive models should give prediction intervals and must be cross-validated; that is, they must prove that they can be used to make predictions with data that was not used in constructing the model. Descriptive analytics focuses on the summarization and conversion of the data into useful information for reporting and monitoring.

42

Data Mining Models

43

Predictive data mining models

01 Classification
02 Regression
03 Prediction
04 Time Series Analysis

44

Predictive data mining models


Classification: In data mining, classification refers to a form of data analysis where a machine learning model assigns a specific category to a new observation. It is based on what the model has learned from the training data sets. In other words, classification is the act of assigning objects to one of several predefined categories. One example of classification in the banking and financial services industry is identifying whether transactions are fraudulent or not. In the same way, machine learning can also be used to predict whether a loan application would be approved or not.
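
The fraud example can be sketched with a nearest-centroid classifier, one of the simplest classification methods: the model learns the average feature vector of each category and assigns a new observation to the closest one. The choice of method, the two features, and the training values are assumptions made purely for illustration.

```python
import math

def train_centroids(samples):
    """Average the feature vectors of each class (nearest-centroid classifier)."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Assign the class whose centroid is closest to the new observation."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

# Hypothetical training data: (amount in dollars, transactions in the last hour).
training = [
    ((20.0, 1), "legitimate"),
    ((35.0, 2), "legitimate"),
    ((900.0, 9), "fraudulent"),
    ((1200.0, 12), "fraudulent"),
]
centroids = train_centroids(training)
print(classify(centroids, (1000.0, 10)))
```

In practice the features would be scaled first, since the dollar amount dominates the distance here; the sketch skips that step to stay short.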

45

Predictive data mining models


Regression: Regression refers to a method that estimates the value of data by fitting a function. Generally, it is used for numeric (continuous) data. In machine learning and statistics, a linear regression model is a linear approach to modeling the relationship between the dependent variable, known as the result, and one or more independent variables, known as features.

If your model has only one independent variable, it is called simple linear regression; otherwise, it is called multiple linear regression.
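
For simple linear regression, the fitted line y = slope * x + intercept can be computed directly with the ordinary least squares formulas. The sketch below assumes a single feature; the function name fit_line and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # hypothetical data lying exactly on y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With more than one independent variable the same idea applies, but the fit is usually solved with matrix methods from a numerical library.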


46

Predictive data mining models


Prediction: In data mining, prediction is used to identify a data value based on the description of another, corresponding data value. Prediction in data mining is known as numeric prediction. Generally, regression analysis is used for prediction. For example, in credit card fraud detection, the history of a particular person's credit card usage has to be analyzed. If any abnormal pattern is detected, it should be reported as a 'fraudulent action'.

47

Predictive data mining models


Time series analysis: Time series analysis refers to the analysis of data sets that are based on time. Time serves as the independent variable used to predict the dependent variable.
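
A very simple time series forecast is the moving average: the next value is estimated as the mean of the most recent observations. This is only one basic technique, chosen here for illustration; the window size and the monthly sales figures are assumptions.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window

# Hypothetical monthly sales, ordered from oldest to newest.
monthly_sales = [100, 120, 130, 150, 170]
print(moving_average_forecast(monthly_sales))
```

The order of the observations matters: shuffling the list would change the forecast, which is what distinguishes time series data from an ordinary data set.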

48

Descriptive data mining models

Descriptive Data Mining

Clustering
Association Rules
Sequence
Summarization

49

Descriptive data mining models

Clustering: Clustering is grouping a set of objects so that objects in the same group, called a cluster, are more similar to each other than to those in other clusters.
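
A minimal sketch of clustering is one-dimensional k-means: each value is assigned to the nearest cluster center, and each center is then moved to the mean of its cluster. K-means is one of many clustering algorithms and is used here only as an example; the starting centers and the customer ages are assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Simple 1-D k-means: assign to the nearest center, then recompute the means."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical customer ages and two assumed starting centers.
ages = [18, 21, 22, 45, 48, 52]
centers, clusters = kmeans_1d(ages, centers=[20.0, 50.0])
print(centers, clusters)
```

No labels are supplied: the two groups emerge from the data alone, which is why clustering counts as a descriptive rather than a predictive model.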


Association rules: Association rules identify relationships between items in huge sets of data objects; these are co-occurrence patterns and do not by themselves establish causation. For example, given a list of the items you purchased at the grocery store over the past six months, the algorithm calculates the percentage at which items are purchased together: what are the chances of you buying milk with cereal?
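
The milk-and-cereal question can be answered with the two standard association rule measures, support and confidence. The sketch below computes both from a handful of made-up shopping baskets; the basket contents are hypothetical.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Estimated share of baskets with the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

# Hypothetical grocery baskets.
baskets = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"milk", "eggs"},
    {"bread", "eggs"},
]
print(support(baskets, {"milk", "cereal"}))       # share of baskets with both items
print(confidence(baskets, {"cereal"}, {"milk"}))  # how often cereal buyers also buy milk
```

Note that confidence is not symmetric: in this data every cereal buyer also buys milk, but only some milk buyers buy cereal.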

50

Descriptive data mining models

Sequence: Sequence refers to the discovery of useful patterns in sequential data, evaluated in relation to some objective measure of how interesting they are.

Summarization: Summarization presents a data set in a more compact, easy-to-understand form.
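
As a small illustration of summarization, the snippet below condenses a list of values into a few descriptive statistics using Python's standard statistics module. The choice of statistics and the order values are assumptions.

```python
import statistics

def summarize(values):
    """Condense a list of numbers into a few easy-to-read statistics."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

# Hypothetical order values in dollars.
order_values = [12, 15, 15, 20, 38]
print(summarize(order_values))
```

Five numbers replace the full list, which is the kind of compact view used for reporting and monitoring.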

51

Multiple Choice

Systematic and logical procedure of analyzing, identifying, treating, and monitoring the risks involved in any action or process in organizations.

1

Risk Management

2

Customer Care Service

3

Business Intelligence

4

Social media analysis

52

Multiple Choice

Reduce the response time of the organizations and help to address the complaints of the customer rapidly and productively.

1

Risk Management

2

Customer Care Service

3

Business Intelligence

4

Social media analysis

53

Multiple Choice

Providing significant insights into customer behavior and trends, text mining strategies also help organizations analyze the strengths and weaknesses of their competitors, giving them a competitive advantage in the market.

1

Risk Management

2

Customer Care Service

3

Business Intelligence

4

Social media analysis

54

Multiple Choice

Helps to track the online data, and there are numerous text mining tools designed particularly for performance analysis of social media sites.

1

Risk Management

2

Customer Care Service

3

Business Intelligence

4

Social media analysis

55

Multiple Choice

Refers to the process of authenticating and adding transaction records to the public ledger. The public ledger is known as the blockchain because it comprises a chain of blocks.

1

Bitcoin mining

2

Bitcoin

3

Predictive data mining model

4

Descriptive model

56

Multiple Choice

Virtual money having some value, and its value is not static, it varies according to time.

1

Bitcoin mining

2

Bitcoin

3

Predictive data mining model

4

Descriptive model

57

Multiple Choice

Predicts the values of data using known results gathered from the different data sets.

1

Bitcoin mining

2

Bitcoin

3

Predictive data mining model

4

Descriptive model

58

Multiple Choice

Differentiates the patterns and relationships in data.

1

Bitcoin mining

2

Bitcoin

3

Predictive data mining model

4

Descriptive model

59

Multiple Choice

Refers to a form of data analysis where a machine learning model assigns a specific category to a new observation.

1

Classification

2

Regression

3

Prediction

4

Time series analysis

60

Multiple Choice

Refers to a method that estimates the value of data by fitting a function.

1

Classification

2

Regression

3

Prediction

4

Time series analysis

61

Multiple Choice

Is used to identify data value based on the description of another corresponding data value.

1

Classification

2

Regression

3

Prediction

4

Time series analysis

62

Multiple Choice

Refers to the data sets based on time.

1

Classification

2

Regression

3

Prediction

4

Time series analysis

63

Multiple Choice

Is grouping a set of objects so that objects in the same group, called a cluster, are more similar to each other than to those in other clusters.

1

Clustering

2

Association rules

3

Sequence

4

Summarization

64

Multiple Choice

Identify relationships (co-occurrence patterns) between items in huge sets of data objects.

1

Clustering

2

Association rules

3

Sequence

4

Summarization

65

Multiple Choice

Refers to the discovery of useful patterns in sequential data, evaluated in relation to some objective measure of how interesting they are.

1

Clustering

2

Association rules

3

Sequence

4

Summarization

66

Multiple Choice

Presents a data set in a more compact, easy-to-understand form.

1

Clustering

2

Association rules

3

Sequence

4

Summarization

67

End of Topic 4

68

THANK YOU
