
Chapter 4 - Text Data Mining
1
Chapter 4 - Text Data Mining
Prepared by: RYAN JISON DE LA GENTE, Ph.D.
2
Text Data Mining
3
Text Data Mining
Text data mining can be described as the process of extracting essential information from natural (standard) language text.
The data that we generate via text messages, documents, emails, and files is written in common language text.
Text mining is primarily used to draw useful insights or patterns from such data.
The text mining market has experienced exponential growth and adoption over the last few years and is also expected to grow significantly in the coming years. One of the primary reasons for the adoption of text mining is increasing competition in the business market: many organizations are seeking value-added solutions to compete with other organizations.
4
Text Data Mining
With increasing competition in business and changing customer perspectives, organizations are making huge investments in solutions capable of analyzing customer and competitor data to improve their competitiveness.
The primary sources of data are e-commerce websites, social media platforms, published articles, surveys, and many more. Most of the generated data is unstructured, which makes it challenging and expensive for organizations to analyze manually.
5
Multiple Choice
Text data mining can be described as the process of extracting essential data from non-standard language text.
True
False
6
Multiple Choice
The data that we generate via text messages, documents, emails, and files is written in common language text.
True
False
7
Multiple Choice
Text mining is primarily used to draw useful insights or patterns from such data.
True
False
8
Multiple Choice
The text mining market has experienced exponential growth and adoption over the last few years and is also expected to grow significantly in the coming years.
True
False
9
Multiple Choice
The primary sources of data are e-commerce websites, social media platforms, published articles, surveys, and many more. Most of the generated data is unstructured, which makes it challenging and expensive for organizations to analyze manually.
True
False
10
Areas of text mining in data mining
11
Information Extraction
The automatic extraction of structured data, such as entities, relationships between entities, and attributes describing entities, from an unstructured source is called information extraction.
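To make the idea concrete, here is a minimal sketch that pulls structured fields out of one unstructured sentence using regular expressions. The three entity types and their patterns are hypothetical choices for illustration; production information extraction systems typically rely on trained named-entity recognition models rather than hand-written patterns.

```python
import re

# Hypothetical patterns for three entity types (illustration only).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),     # ISO dates such as 2024-03-15
    "amount": re.compile(r"\$\d+(?:\.\d{2})?"),        # dollar amounts such as $49.99
}

def extract_entities(text):
    """Turn unstructured text into a structured record: entity type -> matches."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}

message = "On 2024-03-15, ana@example.com paid $49.99 for order 1187."
record = extract_entities(message)
print(record)
```

The output is a dictionary, that is, structured data that could be stored as a database row.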
12
Natural Language Processing
The development of NLP applications is difficult because computers generally expect humans to "speak" to them in a programming language that is precise, unambiguous, and highly structured. Human speech, however, is rarely that precise; its meaning can depend on many complex variables, including slang, social context, and regional dialects.
NLP stands for natural language processing: the ability of computer software to understand human language as it is spoken or written. NLP is primarily a component of artificial intelligence (AI).
13
Data Mining
Data mining refers to the extraction of useful data and hidden patterns from large data sets. Data mining tools can predict behaviors and future trends, allowing businesses to make better data-driven decisions.
Data mining tools can be used to resolve many business problems that have traditionally been too time-consuming.
14
Information Retrieval
Information retrieval deals with retrieving useful data from data that is stored in our systems. For example, the search engines found on e-commerce sites and other websites are a form of information retrieval.
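A common building block behind such search engines is an inverted index, which maps each word to the documents that contain it. The sketch below is a minimal version assuming simple boolean AND matching; the product catalog is hypothetical, and real search engines add ranking (for example, TF-IDF or BM25) on top.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy so the index is not modified
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Hypothetical product descriptions from a small e-commerce catalog.
docs = {
    1: "Red running shoes for men",
    2: "Blue running jacket",
    3: "Red leather wallet",
}
index = build_index(docs)
print(sorted(search(index, "red running")))  # [1]
```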
15
Multiple Choice
The automatic extraction of structured data, such as entities, relationships between entities, and attributes describing entities, from an unstructured source is called information extraction.
Information Extraction
Natural Language Processing
Data Mining
Information Retrieval
16
Multiple Choice
Primarily a component of artificial intelligence (AI).
Information Extraction
Natural Language Processing
Data Mining
Information Retrieval
17
Multiple Choice
Refers to the extraction of useful data, hidden patterns from large data sets.
Information Extraction
Natural Language Processing
Data Mining
Information Retrieval
18
Multiple Choice
Deals with retrieving useful data from data that is stored in our systems.
Information Extraction
Natural Language Processing
Data Mining
Information Retrieval
19
Text Mining Process
20
Text Mining Process
Text transformation - Text transformation is a technique used to control the capitalization of the text, for example converting all words to lowercase so that "Data" and "data" are treated as the same word.
Text Pre-processing - Pre-processing is a significant task and a critical step in text mining, natural language processing (NLP), and information retrieval (IR). In text mining, data pre-processing is used to extract useful information and knowledge from unstructured text data. Information retrieval is a matter of choosing which documents in a collection should be retrieved to fulfill the user's need.
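The two steps above can be sketched in a few lines: lowercasing handles the capitalization part of text transformation, and tokenization plus stop-word removal are typical pre-processing steps. The stop-word list here is a small hypothetical one; real projects usually take a fuller list from an NLP library.

```python
import re

# A small, hypothetical stop-word list (illustration only).
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "was"}

def preprocess(text):
    """Lowercase the text, split it into words, and drop stop words."""
    text = text.lower()                   # text transformation: normalize capitalization
    tokens = re.findall(r"[a-z]+", text)  # tokenization: keep alphabetic words only
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The Delivery was LATE and the box is damaged!"))
# ['delivery', 'late', 'box', 'damaged']
```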
21
Text Mining Process
Feature selection - Feature selection is a significant part of data mining. It can be defined as the process of reducing the number of inputs to be processed by keeping only the most essential sources of information. Feature selection is also called variable selection.
Data Mining - In this step, the text mining procedure merges with the conventional data mining process: classic data mining procedures are applied to the structured database.
Evaluate - Afterward, the results are evaluated. Once the results have been evaluated, those that are not useful are discarded.
Applications - Risk Management, Customer Care Service, Business Intelligence, and Social Media Analysis.
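One simple way to perform the feature selection step on text is document-frequency thresholding: keep only words that appear in enough documents. The sketch below assumes the documents have already been pre-processed into word lists; `min_df` and `max_features` are hypothetical parameter names, and this is only one of many selection techniques (others use statistics such as chi-square or mutual information).

```python
from collections import Counter

def select_features(token_lists, min_df=2, max_features=None):
    """Keep terms that appear in at least `min_df` documents.

    Terms are ranked by document frequency (ties broken alphabetically)
    and optionally truncated to the top `max_features`.
    """
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # count each term once per document
    kept = [term for term, count in df.items() if count >= min_df]
    kept.sort(key=lambda term: (-df[term], term))
    return kept[:max_features] if max_features is not None else kept

# Hypothetical pre-processed customer reviews.
docs = [
    ["delivery", "late", "box", "damaged"],
    ["delivery", "fast", "great"],
    ["box", "damaged", "refund"],
    ["delivery", "late", "refund"],
]
print(select_features(docs, min_df=2))
```

Rare words such as "fast" and "great" are dropped, which shrinks the input passed on to the data mining step.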
22
Multiple Choice
Is a technique that is used to control the capitalization of the text.
Text transformation
Text Pre-processing
Feature selection
Data Mining
23
Multiple Choice
Is a significant task and a critical step in Text Mining, Natural Language Processing (NLP), and information retrieval(IR)
Text transformation
Text Pre-processing
Feature selection
Data Mining
24
Multiple Choice
Can be defined as the process of reducing the number of inputs to be processed by keeping only the most essential sources of information.
Text transformation
Text Pre-processing
Feature selection
Data Mining
25
Multiple Choice
Procedures that are applied to the structured database.
Text transformation
Text Pre-processing
Feature selection
Data Mining
26
Multiple Choice
Evaluates the results; once the results have been evaluated, those that are not useful are discarded.
Evaluate
Text Pre-processing
Feature selection
Data Mining
27
Multiple Choice
Includes: Risk Management, Customer Care Service, Business Intelligence, and Social Media Analysis.
Evaluate
Applications
Feature selection
Data Mining
28
Text Mining Process Applications
29
Text Mining Process Applications
Risk Management is a systematic and logical procedure for analyzing, identifying, treating, and monitoring the risks involved in any action or process in an organization. Insufficient risk analysis is often a leading cause of failure.
This is particularly true in financial organizations, where adopting risk management software based on text mining technology can effectively improve the ability to reduce risk. Such software can manage millions of sources and petabytes of text documents and connect the data across them, helping users access the appropriate data at the right time.
30
Text Mining Process Applications
Text mining methods, particularly NLP, are becoming increasingly important in customer care. Organizations are investing in text analytics software to improve the overall customer experience by accessing textual data from different sources such as customer feedback, surveys, and customer calls.
The primary objective of text analysis here is to reduce the organization's response time and help it address customer complaints quickly and effectively.
31
Text Mining Process Applications
Companies and business firms have started to use text mining strategies as a major part of their business intelligence.
Besides providing significant insights into customer behavior and trends, text mining strategies also help organizations analyze the strengths and weaknesses of their competitors, giving them a competitive advantage in the market.
32
Text Mining Process Applications
Social media analysis helps track online data, and there are numerous text mining tools designed specifically for analyzing the performance of social media sites.
These tools help monitor and interpret text generated on the internet from news, emails, blogs, etc. Text mining tools can precisely analyze the total number of posts, followers, and likes your brand has on a social media platform, which helps you understand the response of the people interacting with your brand and content.
33
Bitcoin Data Mining
Bitcoin mining refers to the process of validating transaction records and adding them to the public ledger. The public ledger is known as the blockchain because it consists of a chain of blocks. Before we look at Bitcoin mining, we should understand what Bitcoin is. Bitcoin is a virtual currency whose value is not fixed; it varies over time. There is no central regulatory body that controls Bitcoin transactions.
Bitcoin was created by someone using the pseudonym (false name) Satoshi Nakamoto, who announced the invention; it was later implemented as open-source code. A purely peer-to-peer version of electronic money allows online payments to be sent directly from one party to another without going through a financial institution. Bitcoin is a network protocol that enables people to transfer ownership rights over units of account called bitcoins, which are issued in limited quantity. When one person sends some bitcoins to another, this information is broadcast to the peer-to-peer Bitcoin network.
34
Bitcoin Data Mining
Using Bitcoin is similar to purchasing something with any other virtual currency. However, one advantage of Bitcoin is that transactions are pseudonymous: the sender and the receiver are identified by their Bitcoin addresses rather than by their personal identities.
This is one reason it has become a popular form of money transfer on the web. Traditionally, the main difficulty in creating digital money has been preventing double-spending: one individual could simultaneously broadcast two transactions that send the same coins to two different parties on the network.
Bitcoin solves this problem and ensures agreement on ownership by maintaining a shared public ledger of all transactions, called the blockchain. New transactions are grouped together and checked against the existing record to make sure they are valid. Bitcoin's integrity is maintained by individuals, known as miners, who contribute computing power to the network to validate transactions and append them to the public ledger.
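The following is a simplified sketch of what miners do: each block stores the hash of the previous block (forming the chain), and a miner searches for a nonce that makes the block's hash start with a required number of zeros (proof of work). This is an illustration under simplifying assumptions, not Bitcoin's actual format: real Bitcoin hashes a binary block header with double SHA-256 against a numeric difficulty target, and the transaction strings and `difficulty` parameter here are hypothetical.

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 of the block's JSON encoding (simplified; not Bitcoin's header format)."""
    data = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(data).hexdigest()

def mine(prev_hash, transactions, difficulty=3):
    """Try nonces until the block hash starts with `difficulty` zero hex digits."""
    nonce = 0
    while True:
        block = {"prev_hash": prev_hash, "transactions": transactions, "nonce": nonce}
        digest = block_hash(block)
        if digest.startswith("0" * difficulty):
            return block, digest
        nonce += 1

# A tiny chain: each block records the hash of the block before it.
genesis, genesis_hash = mine("0" * 64, ["coinbase -> alice: 50"])
block1, block1_hash = mine(genesis_hash, ["alice -> bob: 10"])
print(block1["nonce"], block1_hash)
```

Because each block embeds the previous block's hash, altering an old transaction changes that block's hash and breaks every link after it, which is what makes the ledger tamper-evident.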
35
How Bitcoin Mining Works
36
Bitcoin Transaction
A Bitcoin transaction is a piece of data that is broadcast to the network and, if valid, ends up in a block in the blockchain. The purpose of a Bitcoin transaction is to transfer ownership of an amount of bitcoin to a Bitcoin address.
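Checking a new transaction against the existing record is what stops double-spending. Bitcoin tracks unspent transaction outputs (UTXOs); the sketch below models that idea with a plain dictionary. The output IDs, owners, and amounts are hypothetical, and the signature verification that real nodes perform is deliberately omitted.

```python
def apply_transaction(utxos, tx):
    """Check tx against the unspent outputs in utxos and apply it if valid.

    utxos: dict mapping a hypothetical output ID -> (owner, amount).
    tx: dict with "inputs" (list of output IDs being spent) and
        "outputs" (dict mapping new output ID -> (owner, amount)).
    Signature verification, which real Bitcoin nodes perform, is omitted.
    """
    inputs = tx["inputs"]
    if not inputs or len(set(inputs)) != len(inputs):
        return False  # no inputs, or the same output listed twice
    if any(output_id not in utxos for output_id in inputs):
        return False  # unknown or already spent: a double-spend attempt
    total_in = sum(utxos[output_id][1] for output_id in inputs)
    total_out = sum(amount for _, amount in tx["outputs"].values())
    if total_out > total_in:
        return False  # outputs may not exceed inputs
    for output_id in inputs:
        del utxos[output_id]  # mark the inputs as spent
    utxos.update(tx["outputs"])
    return True

utxos = {"tx0:0": ("alice", 5)}
pay_bob = {"inputs": ["tx0:0"], "outputs": {"tx1:0": ("bob", 5)}}
pay_carol = {"inputs": ["tx0:0"], "outputs": {"tx2:0": ("carol", 5)}}
print(apply_transaction(utxos, pay_bob))    # True
print(apply_transaction(utxos, pay_carol))  # False: tx0:0 was already spent
```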
37
Data Mining Models
Data mining uses raw data to extract information and present it in a useful form.
Data mining is used in a wide range of applications, including business intelligence studies, political forecasting, web ranking, and weather pattern forecasting.
38
What are data mining models?
A data mining model is a method used to present information and the various ways in which that information can be applied to specific questions and problems.
39
Types of data mining models
40
Types of data mining models
A predictive data mining model predicts data values using known results gathered from different data sets. Predictive modeling is not a separate discipline; it is used by organizations in every industry and across all disciplines.
The main objective of predictive data mining models is to predict the future based on past data, generally, but not always, using statistical modeling. In healthcare, predictive modeling is used to identify high-risk patients with conditions such as congestive heart failure, high blood pressure, diabetes, infection, and cancer. Vehicle insurance companies also use it to assess each policyholder's risk of having an accident.
41
Types of data mining models
A descriptive model identifies the patterns and relationships in data. A descriptive model does not attempt to generalize to a statistical population or random process, whereas a predictive model does. Predictive models should give prediction intervals and must be cross-validated; that is, they must show that they can make accurate predictions on data that was not used to build the model. Descriptive analytics focuses on summarizing data and converting it into useful information for reporting and monitoring.
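Cross-validation can be sketched with k-fold splitting: the data is divided into k folds, and each fold is held out once while the model is built on the rest. The sketch below uses a deliberately trivial "predict the training mean" model so the validation loop itself stays visible; the data is hypothetical, folds are taken in order without shuffling, and `k` must be at least 2.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k-fold cross-validation over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

def mean_model_cv_error(y, k=3):
    """Mean absolute error of a 'predict the training mean' model on held-out folds."""
    errors = []
    for train, test in k_fold_splits(len(y), k):
        prediction = sum(y[i] for i in train) / len(train)  # built without the test fold
        errors.extend(abs(y[i] - prediction) for i in test)
    return sum(errors) / len(errors)

y = [3, 5, 4, 6, 5, 7]  # hypothetical observed values
print(mean_model_cv_error(y, k=3))
```

Every error is measured on values the model did not see when it was built, which is exactly the property the paragraph above requires of predictive models.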
42
Data Mining Models
43
Predictive data mining models
01 Classification
02 Regression
03 Prediction
04 Time Series Analysis
44
Predictive data mining models
Classification: In data mining, classification refers to a form of data analysis in which a machine learning model assigns a specific category to a new observation, based on what the model has learned from the training data. In other words, classification is the act of assigning each object to one of several predefined categories. One example of classification in the banking and financial services industry is identifying whether transactions are fraudulent. Similarly, machine learning can be used to predict whether a loan application will be approved.
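As an illustration of the loan-approval example, here is a minimal k-nearest-neighbors classifier: a new application receives the majority label of the k most similar past applications. k-NN is just one simple classification technique chosen for brevity; the features and data are hypothetical, and in practice features would be scaled first (here income dominates the distance).

```python
import math
from collections import Counter

def knn_classify(training, new_point, k=3):
    """Assign new_point the majority label among its k nearest training points."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], new_point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical past loan applications:
# (income in $1000s, debt-to-income ratio) -> outcome
training = [
    ((85, 0.20), "approved"),
    ((92, 0.15), "approved"),
    ((70, 0.25), "approved"),
    ((30, 0.60), "rejected"),
    ((25, 0.55), "rejected"),
    ((40, 0.70), "rejected"),
]
print(knn_classify(training, (80, 0.30)))  # approved
```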
45
Predictive data mining models
Regression: Regression refers to a method that estimates the value of a variable as a function of other variables, that is, it fits a function to the data. It is generally used for numeric (continuous) data. In machine learning and statistics, a linear regression model is a linear approach to modeling the relationship between a dependent variable, known as the target or outcome, and one or more independent variables, known as features.
If the model has only one independent variable, it is called simple linear regression; otherwise, it is called multiple linear regression.
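Simple linear regression can be fitted with the ordinary least-squares formulas: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and intercept = ȳ − slope · x̄. The sketch below implements those formulas for one independent variable; the data points are hypothetical. Multiple linear regression follows the same idea but is usually solved with matrix methods.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [1, 2, 3, 4, 5]            # independent variable (feature)
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # dependent variable (target), hypothetical
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept + slope * 6)  # prediction for x = 6, roughly 12.91
```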
46
Predictive data mining models
Prediction: In data mining, prediction is used to estimate a data value based on the description of other, related data values. Prediction in data mining is also known as numeric prediction, and regression analysis is generally used for it. For example, in credit card fraud detection, the history of a particular person's credit card usage is analyzed; if an abnormal pattern is detected, it should be reported as a 'fraudulent action'.
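For the credit card example, one very simple way to detect an "abnormal pattern" is a z-score check: flag a new amount that lies far from the card's usual spending. Note that this is a basic statistical anomaly test swapped in for illustration, not the regression-based approach mentioned above; the purchase history and the `threshold` value are hypothetical, and the history needs at least two differing values so the standard deviation is non-zero.

```python
import statistics

def flag_unusual(history, new_amount, threshold=3.0):
    """Flag new_amount if it is more than `threshold` standard deviations from the past mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation of past amounts
    z = (new_amount - mean) / stdev
    return abs(z) > threshold

# Hypothetical past purchase amounts for one card.
history = [42.0, 38.5, 55.0, 47.2, 40.0, 51.3, 44.8]
print(flag_unusual(history, 49.0))   # False: within the usual range
print(flag_unusual(history, 900.0))  # True: report for review
```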
47
Predictive data mining models
Time series analysis: Time series analysis works with data sets indexed by time. Time serves as the independent variable used to predict the dependent variable.
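A minimal example of forecasting from a time series is the moving-average forecast: the next value is predicted as the mean of the most recent observations. This is only the simplest baseline, chosen for illustration; the sales figures and the `window` size are hypothetical, and real time series models (such as exponential smoothing or ARIMA) also capture trend and seasonality.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    recent = series[-window:]
    return sum(recent) / window

# Hypothetical monthly sales, oldest first; position in time is the independent variable.
monthly_sales = [120, 132, 128, 140, 151, 149]
print(moving_average_forecast(monthly_sales))  # mean of 140, 151, and 149
```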
48
Descriptive data mining models
Association Rules
Sequence
Clustering
Summarization
49
Descriptive data mining models
Clustering: Clustering groups a set of objects so that objects in the same group, called a cluster, are more similar to each other than to objects in other clusters.
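k-means is one widely used clustering algorithm: it alternates between assigning each point to its nearest cluster center and moving each center to the mean of its points. The sketch below is a basic version with a deterministic initialization (the first k points) for readability; the points are hypothetical, and practical implementations use smarter initialization such as k-means++ and stop when assignments no longer change.

```python
import math

def k_means(points, k, iterations=10):
    """Basic k-means: assign points to the nearest centroid, then move centroids to cluster means."""
    centroids = list(points[:k])  # simple deterministic initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, clusters

# Hypothetical 2-D data with two obvious groups.
points = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (0.5, 1.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
print(centroids)  # [(1.0, 1.5), (8.5, 8.5)]
```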
Association rules: Association rules identify relationships between items that frequently occur together in large data sets; these relationships indicate co-occurrence, not necessarily causation. For example, given a list of the items you purchased at the grocery store over the past six months, the algorithm calculates the percentage of purchases in which items are bought together: what are the chances that you buy milk along with cereal?
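The milk-and-cereal question is usually answered with two standard measures: support (the fraction of baskets containing all the items) and confidence (how often the consequent appears in baskets that contain the antecedent). The sketch computes both for a single rule; the baskets are hypothetical, and full algorithms such as Apriori search over many candidate item sets rather than one rule.

```python
def count_containing(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)

def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    return count_containing(baskets, items) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Share of baskets with the antecedent that also contain the consequent."""
    both = count_containing(baskets, set(antecedent) | set(consequent))
    return both / count_containing(baskets, antecedent)

# Hypothetical grocery baskets.
baskets = [
    {"cereal", "milk", "bread"},
    {"cereal", "milk"},
    {"cereal", "bananas"},
    {"milk", "bread"},
    {"cereal", "milk", "eggs"},
]
print(support(baskets, {"cereal", "milk"}))       # 0.6
print(confidence(baskets, {"cereal"}, {"milk"}))  # 0.75
```

A confidence of 0.75 means three out of four baskets containing cereal also contain milk; it says nothing about one purchase causing the other.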
50
Descriptive data mining models
Sequence: Sequence discovery refers to finding useful patterns in data whose items occur in a particular order, where usefulness is judged relative to some objective.
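A very simplified form of sequence discovery counts how many sequences contain item "a" followed later by item "b". The sketch below does this for ordered pairs only; the purchase sequences and the `min_count` threshold are hypothetical, and dedicated sequential pattern mining algorithms (for example GSP or PrefixSpan) handle longer patterns efficiently.

```python
from collections import Counter
from itertools import combinations

def frequent_ordered_pairs(sequences, min_count=2):
    """Count how many sequences contain a followed (not necessarily directly) by b."""
    counts = Counter()
    for seq in sequences:
        # combinations() keeps the original order, so each pair is (earlier, later).
        counts.update(set(combinations(seq, 2)))
    return {pair: c for pair, c in counts.items() if c >= min_count}

# Hypothetical customer purchase sequences, in time order.
sequences = [
    ["laptop", "mouse", "bag"],
    ["laptop", "bag"],
    ["phone", "case"],
    ["laptop", "mouse"],
]
print(frequent_ordered_pairs(sequences))
```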
Summarization: Summarization presents a data set in a compact form that is easy to understand.
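A basic form of summarization condenses many values into a handful of descriptive statistics. The sketch below uses Python's standard `statistics` module; the order values are hypothetical, and the choice of statistics is only one possible summary.

```python
import statistics

def summarize(values):
    """Condense a list of numbers into a few descriptive statistics."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

order_values = [12, 40, 25, 31, 25, 90, 18]  # hypothetical order totals
print(summarize(order_values))
```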
51
Multiple Choice
Systematic and logical procedure of analyzing, identifying, treating, and monitoring the risks involved in any action or process in organizations.
Risk Management
Customer Care Service
Business Intelligence
Social media analysis
52
Multiple Choice
Reduces the organization's response time and helps it address customer complaints quickly and effectively.
Risk Management
Customer Care Service
Business Intelligence
Social media analysis
53
Multiple Choice
Besides providing significant insights into customer behavior and trends, text mining strategies also help organizations analyze the strengths and weaknesses of their competitors, giving them a competitive advantage in the market.
Risk Management
Customer Care Service
Business Intelligence
Social media analysis
54
Multiple Choice
Helps track online data; there are numerous text mining tools designed specifically for analyzing the performance of social media sites.
Risk Management
Customer Care Service
Business Intelligence
Social media analysis
55
Multiple Choice
Refers to the process of validating transaction records and adding them to the public ledger. The public ledger is known as the blockchain because it consists of a chain of blocks.
Bitcoin mining
Bitcoin
Predictive data mining model
Descriptive model
56
Multiple Choice
Virtual money whose value is not fixed; it varies over time.
Bitcoin mining
Bitcoin
Predictive data mining model
Descriptive model
57
Multiple Choice
Predicts the values of data using known results gathered from different data sets.
Bitcoin mining
Bitcoin
Predictive data mining model
Descriptive model
58
Multiple Choice
Differentiates the patterns and relationships in data.
Bitcoin mining
Bitcoin
Predictive data mining model
Descriptive model
59
Multiple Choice
Refers to a form of data analysis where a machine learning model assigns a specific category to a new observation.
Classification
Regression
Prediction
Time series analysis
60
Multiple Choice
Refers to a method that estimates the value of a variable as a function of other variables.
Classification
Regression
Prediction
Time series analysis
61
Multiple Choice
Is used to estimate a data value based on the description of other, related data values.
Classification
Regression
Prediction
Time series analysis
62
Multiple Choice
Refers to the data sets based on time.
Classification
Regression
Prediction
Time series analysis
63
Multiple Choice
Groups a set of objects so that objects in the same group, called a cluster, are more similar to each other than to objects in other clusters.
Clustering
Association rules
Sequence
Summarization
64
Multiple Choice
Identify relationships between items that frequently occur together in large data sets.
Clustering
Association rules
Sequence
Summarization
65
Multiple Choice
Refers to finding useful patterns in data whose items occur in a particular order, judged relative to some objective.
Clustering
Association rules
Sequence
Summarization
66
Multiple Choice
Presents a data set in a compact form that is easy to understand.
Clustering
Association rules
Sequence
Summarization
67
End of Topic 4
68
THANK YOU
Chapter 4 - Text Data Mining
Prepared by: RYAN JISON DE LA GENTE, Ph.D.