14-09-2012, 04:42 PM
Development of an auto-summarization tool
Objective/Vision
Auto-summarization has applications such as summarizing search-engine results and providing briefs of long documents that lack an abstract. There are two categories of summarizers: linguistic and statistical. Linguistic summarizers use knowledge of the language to summarize a document. Statistical summarizers find the important sentences using statistical methods and normally do not use any linguistic information.
User of the System
A. The tool is used to generate summaries of electronic documents
B. It relies on statistical techniques
C. The techniques involve finding word frequencies, scoring the sentences, and ranking the sentences
D. It should handle document types such as plain text, HTML, and Word documents
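The frequency, scoring, and ranking steps listed above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the project's prescribed algorithm: the tiny STOP_WORDS set, the function names, the regex-based sentence splitter, and the "average word frequency" scoring rule are all choices made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real list would be collected separately).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and",
              "in", "it", "on", "for", "that", "this"}

def split_sentences(text):
    """Naive sentence separator: split on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def split_words(sentence):
    """Word separator: lowercase alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", sentence.lower())

def word_frequencies(sentences):
    """Count every non-stop-word across the whole document."""
    return Counter(w for s in sentences for w in split_words(s)
                   if w not in STOP_WORDS)

def score_sentence(sentence, freqs):
    """Score a sentence by the average document frequency of its non-stop-words."""
    words = [w for w in split_words(sentence) if w not in STOP_WORDS]
    if not words:
        return 0.0
    return sum(freqs[w] for w in words) / len(words)

def summarize(text, max_sentences=2):
    """Rank sentences by score and return the top ones in document order."""
    sentences = split_sentences(text)
    freqs = word_frequencies(sentences)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score_sentence(sentences[i], freqs),
                    reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)
```

Averaging rather than summing the frequencies keeps long sentences from winning on length alone; that is one of several possible heuristics and would need tuning against real documents.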
Functional Requirements
i. Study auto-summarization techniques (some references are given in the references section of this document) and concentrate on summarizers based on statistical techniques
ii. Collect a list of stop-words from an Internet site
iii. Come up with algorithms for the different functional components listed in the previous section. Heuristics may be used to modify an existing algorithm
iv. Implement the pre-processor, sentence separator, word separator, and word frequency calculator. These need little algorithmic work, and existing algorithms will do fine.
v. Implement the scoring and ranking component
vi. Test it with some documents and tune the algorithms if needed
vii. Benchmark your tool against some tools available on the Internet
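For the pre-processor step, one possible starting point for HTML input is Python's standard html.parser module. The sketch below is an assumption-laden illustration: the class name TextExtractor, the helper html_to_text, and the choice of which tags count as block-level are all made up for this example, and Word documents are not covered.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""
    SKIP = {"script", "style"}
    # Tags after which a space is inserted so adjacent blocks do not run together
    # (an illustrative, incomplete set).
    BLOCK = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "br":
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)

def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

The plain text returned by html_to_text could then be handed to the sentence separator in the same way as a plain-text document.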