|
Development of an auto-summarization tool
Auto-summarization is a technique for generating summaries of
electronic documents. Its applications include summarizing
search-engine results and providing briefs of long documents that lack
an abstract. Summarizers fall into two categories, linguistic and
statistical. Linguistic summarizers use knowledge about the language
(syntax, semantics, usage, etc.) to summarize a document. Statistical
summarizers find the important sentences using statistical measures
(such as the frequency of a particular word) and normally do not use
any linguistic information.
In this project, an auto-summarization tool is developed using
statistical techniques. The techniques include computing word
frequencies, scoring the sentences, and ranking them. The summary is
obtained by selecting the top-ranked sentences from this list; the
number of sentences is specified by the user when invoking the tool.
The tool operates on a single document, but could be extended to
multiple documents by choosing suitable algorithms for integrating
them.
Pre-processing interfaces are provided for the following document
types: plain text, HTML, and Word documents.
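
The frequency, scoring, and ranking steps described above can be
illustrated with a minimal Python sketch. This is not the project's
actual implementation: the function names, the small stop-word list,
the averaging of word frequencies per sentence, and the choice to return
the selected sentences in their original document order are all
assumptions made for illustration. It handles plain text only.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the source names none).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in",
              "and", "it", "on", "for", "this", "that"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Return lowercase word tokens with stop words removed."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def summarize(text, num_sentences):
    """Return the num_sentences highest-scoring sentences of text."""
    sentences = split_sentences(text)

    # Step 1: frequency of every word across the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    # Step 2: score a sentence as the average frequency of its words
    # (assumed scoring rule; averaging avoids favoring long sentences).
    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Step 3: rank sentence indices by score, highest first.
    # sorted() is stable, so ties keep their document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)

    # Step 4: take the top of the list, then restore document order
    # (assumed) so the summary reads naturally.
    chosen = sorted(ranked[:num_sentences])
    return [sentences[i] for i in chosen]
```

Calling `summarize(document_text, 3)` would return three sentences,
mirroring the user-specified summary size mentioned above. HTML or Word
input would first need to be converted to plain text by a pre-processing
step, which this sketch leaves out.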
|