
Digital Archiving and Data Mining of Historic Document
Abstract
The development of research methodologies and innovative technologies, as described in this paper, allows real-time archival data extraction and assessment while ensuring preservation. The system establishes a new standard for the electronic preservation, retrieval, and research of manuscripts and other materials. It uses computer-assisted technology for the initial handwriting recognition of manuscript sources, which enables a search by word throughout an entire database. The data mining component of the system has been designed using lexicon, feature extraction, and neural network methods.
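The abstract names a lexicon as one component but does not say how it is applied. One common way a lexicon constrains handwriting recognition is to snap each raw recognizer output to the nearest valid word. The sketch below assumes exactly that, using Levenshtein edit distance; the function names, the `max_dist` threshold, and the sample lexicon are hypothetical illustrations, not the ADEAP implementation, and the feature extraction and neural network stages are left out.

```python
from typing import List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def snap_to_lexicon(raw: str, lexicon: List[str], max_dist: int = 2) -> Optional[str]:
    """Hypothetical helper: return the lexicon word closest to the raw
    recognizer output, or None if no word lies within max_dist edits.
    Ties go to the word that appears first in the lexicon."""
    best_word, best_dist = None, max_dist + 1
    for word in lexicon:
        d = edit_distance(raw.lower(), word)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word


# Illustrative lexicon of period vocabulary (invented for this example).
LEXICON = ["wrecked", "owing", "foul", "weather", "vessel", "shore"]
```

For instance, a misread `"wrccked"` maps to `"wrecked"` (one substitution), while a string far from every entry returns `None` so it can be flagged for human review instead of being silently forced onto a word.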
1. Introduction
A major deficiency of digitized historical handwritten documents is that researchers cannot easily search them. Digitized handwritten materials are typically stored in a database and searched by predefined keywords used as metadata tags associated with each file, but the words in the images themselves are not subject to data mining or extraction. For example, some library collections allow a reader to search the metadata of a vessel's log for the phrase “wrecked owing to foul weather,” but the reader only obtains each document that contains the keywords and cannot ask for a summary of results over all documents in the database, such as “How many ships were driven to shore?” Similarly, other library collections may contain images of historical documents for which searches in significant manuscript collections are limited to keywords or names contained in correspondence, leaving researchers to pore over the collection image by image on those frequent occasions when a name or keyword search is insufficient.
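To make the contrast concrete, the following sketch compares a tag-only search with an aggregate query over recognized text. It assumes each record carries both curated metadata tags and a machine transcription; the `Record` type, the helper names, and the three log entries are invented for illustration and do not come from any real collection.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Record:
    doc_id: str
    tags: Set[str]    # curated metadata keywords attached to the image file
    transcript: str   # text produced by handwriting recognition (assumed available)


# Toy records, invented for this example.
records = [
    Record("log-001", {"shipwreck"},
           "Vessel wrecked owing to foul weather; crew driven to shore."),
    Record("log-002", {"cargo"},
           "Gale from the NE. Ship driven to shore near the cape."),
    Record("log-003", {"shipwreck"},
           "Mast lost in gale, vessel towed to harbour."),
]


def metadata_search(records: List[Record], keyword: str) -> List[str]:
    """Tag-only search: returns matching document IDs, nothing more."""
    return [r.doc_id for r in records if keyword in r.tags]


def count_pages_mentioning(records: List[Record], phrase: str) -> int:
    """Aggregate query over transcribed text: how many pages mention the phrase?"""
    phrase = phrase.lower()
    return sum(1 for r in records if phrase in r.transcript.lower())
```

Searching the tag `"shipwreck"` returns `log-001` and `log-003`, yet counting pages that mention "driven to shore" finds two, one of which (`log-002`) is tagged only as `"cargo"`. The summary question can be answered only because the words inside the page have been extracted.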
The use of innovative technologies, as described in this paper, allows real-time archival data extraction and assessment while ensuring preservation. The Archival Data Extraction, Assessment, and Preservation (ADEAP) system establishes a new standard for the electronic preservation, retrieval, and research of manuscripts and other materials. A key element is the use of computer-assisted technology for the initial handwriting recognition of manuscript sources, which enables a search by word throughout an entire database. The objective is not only to archive manuscript material, but to archive it with the capability of data extraction. ADEAP is an ongoing project; its features and capabilities will be subject to further research, innovative improvements, and updates. The author has developed the prototype data extraction and archival features of ADEAP to support selected applications, and improved versions will be reported in the future.
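The paper does not describe how word search across the whole database is implemented. A standard structure for this task is a positional inverted index, which maps each word to the pages and positions where it occurs and so supports both single-word and exact-phrase lookup. The sketch below assumes recognized text is already available per page; `build_index`, `phrase_search`, and the simple lowercase, letters-only tokenizer are illustrative choices, not ADEAP components.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

Index = Dict[str, Dict[str, List[int]]]  # word -> page_id -> token positions


def tokenize(text: str) -> List[str]:
    # Simplification: lowercase and keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages: Dict[str, str]) -> Index:
    """Positional inverted index over recognized page text."""
    index: Index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][page_id].append(pos)
    return index


def phrase_search(index: Index, phrase: str) -> Set[str]:
    """Return page IDs containing the words of `phrase` consecutively."""
    words = tokenize(phrase)
    if not words:
        return set()
    # Candidate pages must contain every word (intersect posting lists).
    pages = set(index.get(words[0], {}))
    for w in words[1:]:
        pages &= set(index.get(w, {}))
    hits = set()
    for page in pages:
        positions = [set(index[w][page]) for w in words]
        # A match needs some start p with word k located at p + k.
        if any(all(p + k in positions[k] for k in range(len(words)))
               for p in positions[0]):
            hits.add(page)
    return hits


# Toy pages, invented for this example.
pages = {
    "p1": "Wrecked owing to foul weather",
    "p2": "Foul weather kept the vessel in port",
    "p3": "Weather fair; owing wages paid",
}
index = build_index(pages)
```

Here `phrase_search(index, "foul weather")` returns `{"p1", "p2"}`, a single word such as `"weather"` matches all three pages, and the reversed phrase `"weather foul"` matches none, because the positional check enforces word order.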