18-06-2012, 05:52 PM
Discovering Hot Topics From Dirty Text
Introduction.
Manual approach.
Example: Yahoo Answers.
Automatic approach.
Technical description.
First we need to tackle the different aspects of the problem; for that we use the following techniques.
Document categorization can be done manually or automatically.
Two kinds of automatic categorization:
Classification technique.
Unsupervised technique using a clustering algorithm.
Clustering:
Clustering algorithms find the intrinsic structure in the data and organize it into meaningful groups.
Example: given 5 documents, the clustering algorithm reports the related pairs (1,3), (1,4), (3,4), so documents 1, 3 and 4 form one group.
With clustering, related data is grouped in a single place, which makes it easier to find a solution for the customer.
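The example above can be sketched in a few lines of Python. The slides do not say which clustering algorithm is used, so this is only an illustration under an assumption: the related pairs are taken as given and merged into groups with a simple union-find (connected components). The function name group_related is hypothetical.

```python
def group_related(num_docs, related_pairs):
    """Merge documents 1..num_docs into groups using the related pairs.

    Assumption: two documents belong to the same group if a chain of
    related pairs links them (connected components via union-find).
    """
    parent = {doc: doc for doc in range(1, num_docs + 1)}

    def find(doc):
        # Follow parent links to the group's representative,
        # shortening the path along the way.
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]
            doc = parent[doc]
        return doc

    for a, b in related_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller document number as the representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = {}
    for doc in parent:
        groups.setdefault(find(doc), []).append(doc)
    return sorted(groups.values())


clusters = group_related(5, [(1, 3), (1, 4), (3, 4)])
print(clusters)  # [[1, 3, 4], [2], [5]]
```

Documents 2 and 5 have no related pair, so each stays in a group of its own.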
Sentence identifier:
Code line remover.
Removes lines such as “else if” and “{”.
Table remover.
Removes the row and column structure and extracts the data.
Sentence boundary identifier.
A blank line indicates the end of a sentence.
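The three steps of the sentence identifier could look roughly like the sketch below. The slides do not give the actual rules, so everything here is an assumption made for illustration: the regular expression for code lines, the use of "|" or a tab as the table cell separator, and the function names remove_code_lines, flatten_table_rows and split_sentences are all hypothetical.

```python
import re

# Assumed heuristic: a line is code if it starts with a control keyword
# followed by "(" (or with "else if"), or ends with "{", "}" or ";".
CODE_LINE = re.compile(
    r"^\s*(?:else\s+if\b|if\s*\(|for\s*\(|while\s*\()|[{};]\s*$"
)


def remove_code_lines(lines):
    """Code line remover: drop lines that look like source code."""
    return [line for line in lines if not CODE_LINE.search(line)]


def flatten_table_rows(lines):
    """Table remover: strip the cell separators and keep the cell text."""
    result = []
    for line in lines:
        if "|" in line or "\t" in line:
            cells = [cell.strip() for cell in re.split(r"[|\t]", line)]
            # Discard empty cells and pure ruler cells such as "-----".
            cells = [c for c in cells if c and not re.fullmatch(r"[-+=:]+", c)]
            if cells:
                result.append(" ".join(cells))
        else:
            result.append(line)
    return result


def split_sentences(lines):
    """Sentence boundary identifier: a blank line ends the current sentence."""
    sentences, current = [], []
    for line in lines:
        if line.strip():
            current.append(line.strip())
        elif current:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


raw = """My printer stops after the first page
and shows error 52.

if (status == 52) {
    retry();
} else if (status == 0) {
    done();
}

| model | firmware |
|-------|----------|
| X200  | 1.4      |

Please advise how to fix it.
"""

lines = raw.splitlines()
sentences = split_sentences(flatten_table_rows(remove_code_lines(lines)))
for sentence in sentences:
    print(sentence)
```

Running the code remover first keeps stray braces and semicolons out of the later steps; a real system would likely need more careful rules than these.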