An Introduction to Text Mining CAS 2009 RPM Seminar pdf

**project girl** · 24-11-2012, 01:16 PM



Objectives

• Present a new data mining technology
• Show how the technology uses a combination of
  • String processing functions
  • Natural language processing
  • Common multivariate procedures available in most statistical software
• Discuss practical issues for implementing the methods
• Discuss software for text mining

Parsing Text

Separate words from spaces and punctuation
Clean up
Remove redundant words
Remove words with no content
The cleaned-up list of words is referred to as tokens
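
The parsing steps above can be sketched in a few lines of Python. This is only an illustration, not the method used in the presentation: the `parse_text` function, the regular expression, and the short `STOP_WORDS` set are all assumptions made for the example, and "words with no content" is interpreted here as a small stop-word list.

```python
import re

# Hypothetical stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"a", "the", "to", "from", "and", "of"}

def parse_text(text, stop_words=STOP_WORDS):
    """Split text into lowercase words, dropping punctuation and stop words."""
    # Keep runs of letters, digits and apostrophes; everything else separates words.
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Drop the words that carry no content.
    return [w for w in words if w not in stop_words]

tokens = parse_text("The claimant fell from a ladder, and injured the back.")
print(tokens)
```

The resulting list is what the slides call tokens; a real application would likely also handle misspellings and abbreviations during clean-up.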

Term Document Matrix/Index

Uses a frequency measure for each word instead of an on-off binary indicator
“The Index representation does not do justice
to the complexity of human language but is
dictated by the practical difficulty of storing
more information objects”
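
As a rough sketch of the idea, a term-document matrix can be built from tokenized documents with the standard library alone. The function name `term_document_matrix` and the sample documents are made up for illustration; each row is a document, each column a term, and each cell holds a count rather than a 0/1 indicator.

```python
from collections import Counter

def term_document_matrix(docs):
    """Return (vocabulary, rows) where rows[i][j] counts term j in document i."""
    counts = [Counter(doc) for doc in docs]
    # Sorted vocabulary gives a stable column order.
    vocab = sorted(set().union(*docs))
    # Counter returns 0 for terms that do not occur in a document.
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows

# Hypothetical tokenized documents.
docs = [
    ["claimant", "fell", "ladder"],
    ["ladder", "broke", "ladder", "fell"],
]
vocab, rows = term_document_matrix(docs)
print(vocab)
for row in rows:
    print(row)
```

Word order is lost in this representation, which is the simplification the quotation above refers to.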

Natural Language Processing

Draws on many disciplines
Artificial Intelligence
Linguistics
Statistics
Speech Recognition
Includes lexical analysis, multiword phrase
groupings, sense disambiguation, part of
speech tagging
Arguments against: it is error-prone and
the output contains too much detail and noise

Consequences of Zipf

There are a few very frequent tokens or words
that add little information
Known as stop words
Examples: a, the, to, from
Usually there are
A small number of very common words (i.e., stop
words)
A medium number of medium-frequency words
A large number of infrequent words
The medium-frequency words are the most useful
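
One way to see this pattern in a corpus is to rank tokens by frequency and then keep only the middle band. The sketch below is illustrative only: the function names and the `low`/`high` cut-offs are assumptions, and sensible thresholds depend on the size of the corpus.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, token, count) tuples, most frequent token first."""
    counts = Counter(tokens)
    # Sort by descending count, breaking ties alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, tok, n) for rank, (tok, n) in enumerate(ordered, start=1)]

def medium_frequency(tokens, low=2, high=3):
    """Keep tokens whose count lies between the hypothetical cut-offs."""
    counts = Counter(tokens)
    return sorted(t for t, n in counts.items() if low <= n <= high)

# Hypothetical token list: one very common word, one rare word, two in between.
tokens = "the the the the the claim claim injury injury injury ladder".split()
for rank, tok, n in rank_frequency(tokens):
    print(rank, tok, n)
print(medium_frequency(tokens))
```

In this toy list the very frequent token "the" and the single-occurrence token "ladder" are both dropped, leaving the medium-frequency tokens.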
