A Multidimensional Sequence Approach to Measuring Tree Similarity

**jaseela123** · 09-09-2017, 10:19 AM

The tree is one of the most common and well-studied data structures in computing. Measuring the similarity of such structures is key to analyzing this type of data. However, measuring tree similarity is not trivial because of the inherent complexity of trees and the resulting large search space. Tree kernels, a state-of-the-art measure of tree similarity, represent trees as vectors in a feature space and measure similarity in that space. When different features are used, different algorithms are required. Tree edit distance is another widely used tree similarity measure. It measures similarity through the edit operations needed to transform one tree into another. Without any restriction on the edit operations, the computational cost is too high for the measure to be applicable to large volumes of data. To improve efficiency, some approximations of tree edit distance have been introduced. However, their effectiveness may be compromised.
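To make the feature-space idea concrete, here is a minimal sketch of one very simple tree kernel. It is not the kernel used in the work described here: the tree encoding (`(label, children)` tuples), the choice of features (counts of parent-to-children label "productions"), and the function names `productions` and `production_kernel` are all assumptions made for illustration.

```python
from collections import Counter

# Assumed encoding: a tree is (label, [child_tree, ...]); a leaf has an empty list.

def productions(tree):
    """Count (parent label, child labels) pairs over all internal nodes."""
    label, children = tree
    feats = Counter()
    if children:
        feats[(label, tuple(child[0] for child in children))] += 1
        for child in children:
            feats.update(productions(child))  # Counter.update adds counts
    return feats

def production_kernel(t1, t2):
    """Dot product of the two production-count vectors."""
    f1, f2 = productions(t1), productions(t2)
    return sum(f1[key] * f2[key] for key in f1.keys() & f2.keys())

t1 = ("S", [("NP", [("D", []), ("N", [])]), ("VP", [("V", [])])])
t2 = ("S", [("NP", [("N", [])]), ("VP", [("V", [])])])
print(production_kernel(t1, t2))  # shared productions: S->NP VP and VP->V
```

Choosing a different feature set (for example, whole subtrees instead of single productions) changes what has to be counted, which is why different features call for different algorithms.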

In this approach, trees are represented as multidimensional sequences and their similarity is measured on the basis of these sequential representations. Multidimensional sequences have sequential dimensions and spatial dimensions. Sequential similarity is measured by the all common subsequences measure or the longest common subsequence measure, and spatial similarity is measured by dynamic time warping. The two are then combined to give a measure of tree similarity. A brute-force algorithm for computing the similarity would have a high computational cost. In the spirit of dynamic programming, two efficient algorithms with quadratic time complexity are designed to compute it. The new measures are evaluated in terms of classification accuracy with two popular classifiers (k-nearest neighbor and support vector machine) and in terms of search effectiveness and efficiency in nearest-neighbor similarity search, using three data sets from natural language processing and information retrieval. The experimental results show that the new measures consistently and significantly outperform the baseline measures.
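The two building blocks named above, longest common subsequence and dynamic time warping, both have textbook dynamic-programming formulations that run in quadratic time. The sketch below shows those standard versions on plain one-dimensional sequences. It does not reproduce the post's tree-to-sequence encoding or the way the two scores are combined, since neither is spelled out here; the function names `lcs_length` and `dtw_distance` and the absolute-difference local cost are assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b, O(len(a) * len(b))."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1          # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip one element
        prev = curr
    return prev[len(b)]

def dtw_distance(x, y):
    """Dynamic time warping distance between numeric sequences x and y.

    Uses |x_i - y_j| as the local cost (an assumption) and fills an
    (len(x) + 1) by (len(y) + 1) table, so it is also quadratic.
    """
    inf = float("inf")
    cost = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat y[j-1]
                                     cost[i][j - 1],      # repeat x[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[len(x)][len(y)]

print(lcs_length("ABCBDAB", "BDCABA"))
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Both functions visit each pair of positions once, which is where the quadratic time bound comes from; a brute-force enumeration of all subsequences or all alignments would instead grow exponentially with sequence length.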
