EFFICIENT MULTI-DIMENSIONAL FUZZY SEARCH FOR PERSONAL INFORMATION MANAGEMENT REPORT

**study tips** · 30-07-2013, 02:14 PM

EFFICIENT MULTI-DIMENSIONAL FUZZY SEARCH FOR PERSONAL INFORMATION MANAGEMENT SYSTEMS

.doc

EFFICIENT MULTI-DIMENSIONAL.doc (Size: 52 KB / Downloads: 42)

Abstract

With the explosion in the amount of semi-structured data users access and store in personal information management systems, there is a critical need for powerful search tools to retrieve often very heterogeneous data in a simple and efficient way..We propose a novel multi-dimensional search approach that allows users to perform fuzzy searches for structure and metadata conditions in addition to keyword conditions. Our techniques individually score each dimension and integrate the three dimension scores into a meaningful unified score. We also design indexes and algorithms to efficiently identify the most relevant files that match multi-dimensional queries. We perform a thorough experimental evaluation of our approach and show that our relaxation and scoring framework for fuzzy query conditions in noncontent dimensions can significantly improve ranking accuracy. We also show that our query processing strategies perform and scale well, making our fuzzy search approach practical for every day usage.

EXISTING SYSTEM

filtering conditions. We adapt an existing algorithm called the Threshold Algorithm (TA) to drive query processing. TA uses a threshold condition to avoid evaluating all possible matches to a query, focusing instead on identifying the k best answers. It takes as input several sorted lists, each containing the system’s objects (files in our scenario) sorted in descending order according to their relevance scores for a particular attribute (dimension in our scenario), and This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Dynamically accesses the sorted lists until the threshold condition is met to find the k best answers.

DISADVANTAGE

Existing tools typically support some IR-style ranking on the textual part of the query, but only consider structure filtering conditions.

PROPOSED SYSTEM

Propose a novel approach that allows users to efficiently perform fuzzy searches across three different dimensions: content, metadata, and structure. We describe individual IDF-based scoring approaches for each dimension and present a unified scoring framework for multi-dimensional queries over personal information file systems. We also present new data structures and index construction optimizations to make finding and scoring fuzzy matches efficient. While our work could be extended to a variety of data space applications and queries, we focus on a file search scenario in this paper. That is, we consider the granularity of the search results to be a single file in the personal information system. Of course, our techniques could be extended to a more flexible query model where pieces of data within files could be returned as results.

ADVANTAGE

We incrementally build the query dependent DAG structures at query time, only materializing those DAG nodes necessary to answer a query.
To improve sorted access efficiency, we propose techniques to skip the scoring of unneeded DAG nodes by taking advantage of the containment property of the DAG.

MODULES DECRIPTION

Data set
We use a data set that contains files and directories from the working environment of one of the authors.

Scoring Content

Where Q is the content query condition, f is the file being scored, N is the total number of files, Nt is the number of files containing the term t, and Norm Length(f) is a normalizing factor that is a function of f’s length. 2 Note that relaxation is an integral part of the above formulas since they score all files that contain a subset of the terms in the query condition.

Scoring Metadata

We introduce a hierarchical relaxation approach for each type of searchable metadata to support scoring. A key characteristic of this hierarchical representation is containment; that is, the set of files matching a node must be equal to or subsume the set of files matching each of its children nodes. This ensures that the score of a file matching a more relaxed form of a query condition is always less than or equal to the score of a file matching a less relaxed

Scoring Structure

Most users use a hierarchical directory structure to organize their files. When searching for a particular file, a user may often remember some of the components of the containing directory path and their approximate ordering rather than the exact path itself. Thus, allowing for some approximation on structure query conditions is desirable because it allows users to leverage their partial memory to help the search engine locate the desired file.

CONCLUSIONS

We presented a scoring framework for multi-dimensional queries over personal information management systems. Specifically, we defined structure and metadata relaxations and proposed IDF-based scoring approaches for content, metadata, and structure query conditions. This uniformity of scoring allows individual dimension scores to be easily aggregated. We have also designed indexing structures and dynamic index construction and query processing optimizations
to support efficient evaluation of multi-dimensional queries.
We implemented and evaluated our scoring framework and query processing techniques. Our evaluation show that our multi-dimensional score aggregation technique preserves the properties of individual dimension scores and has the potential to significantly improve ranking accuracy. We also show that our indexes and optimizations are necessary to make multi-dimensional searches efficient enough for practical everyday usage. Our optimized query processing strategies exhibit good behavior across all dimensions, resulting in good overall query performance and scalability.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Biometrics Security System Full Download Seminar Report and Paper Presentation	computer science crazy	30	190,561,110	24-02-2021, 08:13 AM Last Post: buy cialis generic
	Ultrasonic Trapping In Capillaries For Trace-Amount Bi (Download Full Seminar Report)	Computer Science Clay	2	104,277,107	17-01-2018, 11:59 AM Last Post: dhanabhagya
	nanorobotics full report	project topics	24	176,551,278	16-01-2018, 05:50 PM Last Post: Guest
	robotic surgery full report	project report tiger	16	150,961,205	07-01-2018, 07:28 PM Last Post: Raymondnof
	Human Computer Interface : Seminar Report and PPT	seminar post	1	1,337	22-09-2017, 11:23 AM Last Post: jaseela123
	4G Broadband : Seminar Report and PPT	study tips	1	1,261	22-09-2017, 11:19 AM Last Post: jaseela123
	Amoeba full report	project topics	1	1,631,984	22-09-2017, 10:38 AM Last Post: jaseela123
	Itanium Processor : Seminar Report and PPT	seminar projects maker	1	1,052	21-09-2017, 12:46 PM Last Post: jaseela123
	Design and Analysis Of Algorithms : Seminar Report and PPT	seminar projects maker	1	1,315	21-09-2017, 12:04 PM Last Post: jaseela123
	Ranked, Efficient and Secure Keyword search over encrypted cloud data PPT	seminar post	1	814	21-09-2017, 11:55 AM Last Post: jaseela123

Quick Reply
Message Type your reply to this message here. Disable Smilies	You have selected one or more posts to quote. Quote these posts now or deselect them.