Text Mining pdf

**project girl** · 30-11-2012, 04:37 PM



What is Text Mining?

•The discovery by computer of new, previously unknown information, by automatically extracting information from a usually large collection of different unstructured textual resources.
•What does previously unknown mean?
–Implies discovering genuinely new information.
–Hearst’s analogy: Discovering new knowledge vs. merely finding patterns is like the difference between a detective following clues to find the criminal vs. analysts looking at crime statistics to assess overall trends in car theft.
•What about unstructured?
–Free naturally occurring text.
–As opposed to marked-up formats such as HTML, XML, …

Document Clustering

•Large volume of textual data
–Billions of documents must be handled efficiently.
•No clear picture of what documents suit the application.
•Solution: use Document Clustering (Unsupervised Learning).
•Most popular Document Clustering methods are:
–K-Means clustering.
–Agglomerative hierarchical clustering.
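
To make the K-Means idea concrete, here is a minimal sketch in plain Python. It is not taken from the slides: it assumes bag-of-words term counts as document vectors, cosine similarity as the closeness measure (the "spherical" variant of K-Means), and seed documents chosen by hand instead of at random. The names tf_vector, mean_vector, kmeans and the four toy documents are made up for illustration.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term counts, L2-normalised so a dot product is a cosine."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}

def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def mean_vector(vectors):
    """Centroid of a cluster: summed member vectors, renormalised to unit length."""
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}

def kmeans(docs, seed_indices, iterations=10):
    """Assign each document to the most similar centroid until labels stop changing."""
    vectors = [tf_vector(d) for d in docs]
    centroids = [vectors[i] for i in seed_indices]  # assumption: hand-picked seeds
    labels = []
    for _ in range(iterations):
        new_labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = mean_vector(members)
    return labels

# Toy documents, invented for this example.
docs = [
    "stock market shares trading",
    "market shares fall trading",
    "football match goal team",
    "team wins football match",
]
labels = kmeans(docs, seed_indices=[0, 2])
print(labels)  # [0, 0, 1, 1]
```

On real collections the vectors would normally be TF-IDF weighted and the seeds picked randomly or with a smarter initialisation; the loop itself stays the same.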

Text Characteristics

•Several input modes
–Text is intended for different consumers, i.e. different languages (human consumers) and different formats (automated consumers).
•Dependency
–Words and phrases create context for each other.

Text Processing (continued)

•Semantic Structures:
–Two methods:
•Full parsing: Produces a parse tree for a sentence.
•Chunking with partial parsing: Produces syntactic constructs like Noun Phrases and Verb Groups for a sentence.
–Which is better?
•Producing a full parse tree often fails due to grammatical inaccuracies, novel words, bad tokenization, wrong sentence splits, errors in POS tagging, …
•Hence, chunking with partial parsing is more commonly used.
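
A rough sketch of what chunking with partial parsing can look like, in plain Python. It assumes the sentence has already been tokenized and POS-tagged with Penn Treebank style tags (the tags below are assigned by hand), and it uses two simple made-up patterns: a noun phrase is an optional determiner, any adjectives, then one or more nouns; a verb group is an optional modal followed by one or more verbs. Real chunkers use richer grammars or trained models, so treat this as an illustration only.

```python
def chunk(tagged):
    """Group (word, tag) pairs into NP and VG chunks; anything else is tagged 'O'."""
    chunks, i, n = [], 0, len(tagged)
    while i < n:
        # Noun phrase: optional determiner, any adjectives, one or more nouns.
        j = i
        if tagged[j][1] == "DT":
            j += 1
        while j < n and tagged[j][1].startswith("JJ"):
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:  # at least one noun was found
            chunks.append(("NP", [w for w, _ in tagged[i:k]]))
            i = k
            continue
        # Verb group: optional modal, then one or more verbs.
        j = i
        if tagged[j][1] == "MD":
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("VB"):
            k += 1
        if k > j:  # at least one verb was found
            chunks.append(("VG", [w for w, _ in tagged[i:k]]))
            i = k
            continue
        # No pattern matched: leave the token outside any chunk.
        chunks.append(("O", [tagged[i][0]]))
        i += 1
    return chunks

# Hand-tagged example sentence (the tags are an assumption, not tagger output).
sentence = [
    ("The", "DT"), ("quick", "JJ"), ("parser", "NN"),
    ("will", "MD"), ("produce", "VB"),
    ("partial", "JJ"), ("chunks", "NNS"), (".", "."),
]
for label, words in chunk(sentence):
    print(label, " ".join(words))
```

Note that the chunker never fails on a sentence: tokens that match no pattern are simply left outside the chunks, which is why this approach tolerates tagging and tokenization errors better than a full parse.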

Data Mining

•At this point the text mining process merges with the traditional data mining process.
•Classic Data Mining techniques are used on the structured database that resulted from the previous stages.
•This is a purely application-dependent stage.
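
As one possible illustration of this stage (the slides do not name a specific technique), the sketch below assumes the earlier stages produced one record per document holding the set of terms or entities extracted from it, and then counts which pairs occur together in at least min_support records. This is the frequent-pair counting step behind association rule mining. The function name frequent_pairs and the toy records are invented for the example.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(records, min_support):
    """Return item pairs that co-occur in at least min_support records."""
    counts = Counter()
    for items in records:
        # Sort so each pair has one canonical ordering regardless of set order.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}

# Toy structured records, one per document (made up for illustration).
records = [
    {"merger", "bank"},
    {"merger", "bank", "lawsuit"},
    {"merger", "lawsuit"},
    {"bank", "loan"},
]
pairs = frequent_pairs(records, min_support=2)
print(pairs)  # {('bank', 'merger'): 2, ('lawsuit', 'merger'): 2}
```

Which technique fits (association rules, classification, clustering, trend analysis) depends on the application, so this is only one of many options.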
