29-11-2012, 01:46 PM
Ontology Abstract:
Abstract:
Existing approaches to data extraction include wrapper induction and automated methods. This project studies an instance-based learning method, which performs extraction by comparing each new instance to be extracted with labeled instances. The key advantage of this method is that, unlike wrapper induction, it does not require an initial set of labeled pages from which to learn extraction rules. Instead, the algorithm can start extracting from a single labeled instance, and a new instance needs labeling only when it cannot be extracted. This avoids unnecessary page labeling and solves a major problem with inductive learning (wrapper induction): the set of labeled instances may not be representative of all other instances.
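The labeling loop described above can be sketched as follows. This is a minimal Python sketch, assuming pages are token lists and records are field-to-value dictionaries. The names `extract_all`, `try_extract` and `label_page` are hypothetical, and the matching step is left as a pluggable function rather than the exact procedure used in the studied method.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representations (assumptions, not from the source):
Page = List[str]          # a page as a list of HTML/text tokens
Record = Dict[str, str]   # extracted data items: field name -> value
Extractor = Callable[[Page, Page, Record], Optional[Record]]


def extract_all(
    pages: List[Page],
    try_extract: Extractor,
    label_page: Callable[[Page], Record],
) -> Tuple[List[Record], int]:
    """Extract a record from every page, labeling pages only on demand.

    try_extract(new_page, labeled_page, labeled_record) returns the record
    for new_page, or None if that labeled instance cannot extract it.
    label_page(page) stands in for the human labeler (hypothetical).
    Returns the records and the number of pages that had to be labeled.
    """
    labeled: List[Tuple[Page, Record]] = []
    records: List[Record] = []
    for page in pages:
        record: Optional[Record] = None
        # Compare the new instance with each labeled instance in turn.
        for lab_page, lab_record in labeled:
            record = try_extract(page, lab_page, lab_record)
            if record is not None:
                break
        if record is None:
            # No labeled instance fits (always the case for the first page),
            # so this page is labeled and becomes a new labeled instance.
            record = label_page(page)
            labeled.append((page, record))
        records.append(record)
    return records, len(labeled)
```

For example, with five pages drawn from two templates and a matcher that succeeds whenever two pages share a template, only two pages (the first of each template) would be passed to `label_page`.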
The instance-based approach is very natural because structured data on the Web usually follows fixed templates, so pages generated from the same template can usually be extracted based on a single labeled page of that template. This novel technique matches a new instance against a manually labeled instance and, in the process, extracts the required data items from the new instance. The system provides a domain-specific search utility that can access and collect data from the deep Web.
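The matching step can be illustrated with a simplified prefix/suffix scheme. This sketch assumes (it is not taken from the source) that a labeled instance records the token span of each data item, and that the item is located on a new page by searching for the few tokens that precede and follow it on the labeled page. `match_instance`, `find_sub` and the `context` width are hypothetical names and parameters.

```python
from typing import Dict, List, Optional, Tuple

Page = List[str]        # a page as a list of tokens (assumption)
Span = Tuple[int, int]  # [start, end) token indices of a labeled data item


def find_sub(tokens: Page, pattern: Page, start: int = 0) -> int:
    """Index of the first occurrence of pattern in tokens at or after start, or -1."""
    n = len(pattern)
    for i in range(start, len(tokens) - n + 1):
        if tokens[i:i + n] == pattern:
            return i
    return -1


def match_instance(
    new_page: Page,
    labeled_page: Page,
    labels: Dict[str, Span],
    context: int = 2,
) -> Optional[Dict[str, str]]:
    """Hypothetical matcher: extract each labeled field from new_page using the
    tokens that surround it in labeled_page; None if any field is not found."""
    result: Dict[str, str] = {}
    for field, (start, end) in labels.items():
        # Up to `context` tokens before and after the item on the labeled page.
        prefix = labeled_page[max(0, start - context):start]
        suffix = labeled_page[end:end + context]
        p = find_sub(new_page, prefix)
        if p < 0:
            return None
        value_start = p + len(prefix)
        # An item at the very end of the labeled page has no suffix: take the rest.
        s = find_sub(new_page, suffix, value_start) if suffix else len(new_page)
        if s < 0:
            return None
        result[field] = " ".join(new_page[value_start:s])
    return result
```

With `context=2`, a product page `<li> <b> Nikon D3500 Kit </b> <i> 399 </i> </li>` matched against a labeled page of the same layout yields the name and price, while a page with a different layout returns `None`, which signals that the page needs labeling.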