09-09-2016, 02:45 PM
Abstract - We use Wikipedia for searching knowledge about objects. There exist two types of relationships between wiki pages: explicit and implicit. An explicit relationship is represented by a single link between two wiki pages, while an implicit relationship is represented by a link structure between the objects. In earlier work, cohesion-based methods were used for measuring implicit relationships, but they are inadequate for this purpose. In the existing system, a generalized maximum flow method is used for measuring the implicit relationship, but with it we may not get quality and quantitative results. To overcome this drawback, in the proposed system we partition the co-citations into high-probability and low-probability links using partitioning algorithms; the system eliminates the low-probability links and follows the high-probability links to reach the destination in a short span of time.
INTRODUCTION
In this decade, to learn about things we are not aware of, we depend on search engines. Among the various search engines and knowledge sources, Wikipedia is one of the most popular. In Wikipedia, the knowledge about a particular object is brought into a single wiki page (web page), which is updated repeatedly by different volunteers. To search for a particular object, we first enter the search string in the search engine; after we press the search button, the related server displays a set of co-citations in a wiki page. With the help of this set of co-citations, we reach the destination node. For example, if we want to search for the object "notebook", we enter the string "notebook" in the search engine, and the server displays the related co-citations.
DISTANCE, CONNECTIVITY, CO-CITATION:
Earlier, the Erdős number (named after the famous mathematician Paul Erdős) was used for calculating distance. The source co-citation has Erdős number 0, the next intermediate node has Erdős number 1, the node after that has Erdős number 2, and so on. The Erdős number represents the shortest path from the source co-citation to the destination co-citation, and this shortest path is considered the strongest relationship. But the Erdős number is inadequate to represent the implicit relationship, as it does not estimate the connectivity between two objects. The hitting time from co-citation A to co-citation B is defined as the expected number of steps of a random walk from A to B. Sarkar and Moore proposed the truncated hitting time (THT) to calculate the average length of paths between the source object and the destination object; a smaller distance value represents larger similarity. THT is also inadequate to represent the connectivity between two co-citations. To effectively calculate the connectivity from node A to node B, we count the minimum number of vertices that must be removed so that no path exists from A to B. If the connectivity from A to B is large, then A has a strong relationship with B. The connectivity value from A to B equals the value of the maximum flow where every vertex and edge capacity is 1. However, the distance estimated by maximum flow may not lead to the correct path. To overcome this drawback, Lu et al. proposed a technique for calculating the strength of a relationship, reflecting the distance between two nodes in the maximum flow value by setting edge capacities. However, the maximum flow value does not change with these edge capacities, so this method does not effectively encode distance in the value of the maximum flow. Instead of setting capacities, we use the generalized maximum flow, setting every gain value to less than 1. Thus, in our method, the value of the maximum flow decreases as the distance becomes longer.
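The two notions above can be sketched in code: a breadth-first search assigns each node its Erdős number (hop count from the source), and when every edge gain γ is set below 1, the value surviving along a path of k edges is γ^k, so longer distances yield smaller flow values. The toy graph and the gain value 0.5 are illustrative assumptions, not data from the paper.

```python
from collections import deque

def erdos_numbers(graph, source):
    """BFS hop counts: the source gets 0, its neighbours 1, and so on."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Toy co-citation graph (an assumption for illustration).
graph = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
dist = erdos_numbers(graph, "A")   # {'A': 0, 'B': 1, 'C': 2, 'D': 3}

# With a uniform gain gamma < 1, one unit of flow sent along a path
# of length k arrives as gamma ** k, so value decays with distance.
gamma = 0.5
arriving = {v: gamma ** d for v, d in dist.items()}
```

Under this decay, the node at distance 3 receives only 0.5³ of the unit flow, which is exactly the behaviour the text describes: longer distance, smaller maximum flow value.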
Co-citation:
Co-citation-based techniques assume that two nodes have a stronger relationship if the number of nodes linked by both nodes is large; at the other end, co-occurrence is a concept in which the strength is represented by the number of nodes linking to both objects. The Google similarity distance, proposed by Cilibrasi and Vitányi, is regarded as a co-occurrence-based technique. It measures the strength of a relationship between two words by counting the web pages containing both words, i.e., it implicitly regards web pages as nodes linking to the nodes representing the two words. In an information network, a node linked by both nodes becomes a node linking to both if the direction of every edge is reversed; thus co-occurrence can be treated as the reverse of co-citation. Milne and Witten also proposed techniques for measuring relationships between words in Wikipedia using Wikipedia links, based on co-citation. Co-citation-based techniques cannot deal with a typical implicit relationship such as "A is a friend of B, and B is a friend of C": the relationships (A, B) and (B, C) form a path of two edges, and co-citation-based methods are inadequate for measuring such an implicit relationship. Moreover, co-citation-based methods cannot deal with three-hop (three-jump) implicit relationships, because they estimate only relationships represented by two edges, as stated before. Jeh and Widom proposed SimRank, an extension of co-citation, which can therefore deal with a path whose length is longer than two, although it cannot deal with an implicit relationship such as "a friend of A is a friend of C". Similarly to co-citation-based methods, if we define all the edges as bidirectional, then SimRank could measure typical implicit relationships.
However, SimRank computes the strength of any relationship represented by a path with an odd number of edges to be 0, even if all the edges are bidirectional. For instance, SimRank scores the relationship represented by the path (A, C) or (A, B1, B2, C) as 0, so such paths are effectively ignored in the Wikipedia information network. Therefore SimRank is inadequate for measuring relationships on Wikipedia.
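The odd-length-path behaviour can be checked with a small sketch of the iterative SimRank recurrence, s(a,b) = C/(|I(a)||I(b)|) · Σ s(i,j) over in-neighbour pairs, with s(a,a) = 1. The bidirectional chain A-B1-B2-C below is an illustrative assumption: every A-C path in it has an odd number of edges, and the score stays at 0.

```python
def simrank(in_nbrs, nodes, c=0.8, iters=20):
    """Iterative SimRank: s(a,a)=1; s(a,b) averages the similarity
    of the in-neighbour pairs of a and b, damped by c."""
    s = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                    continue
                ia, ib = in_nbrs.get(a, []), in_nbrs.get(b, [])
                if not ia or not ib:
                    new[(a, b)] = 0.0
                    continue
                total = sum(s[(i, j)] for i in ia for j in ib)
                new[(a, b)] = c * total / (len(ia) * len(ib))
        s = new
    return s

# Bidirectional chain A - B1 - B2 - C (every edge in both directions).
in_nbrs = {"A": ["B1"], "B1": ["A", "B2"], "B2": ["B1", "C"], "C": ["B2"]}
nodes = ["A", "B1", "B2", "C"]
s = simrank(in_nbrs, nodes)
# s[("A", "C")] stays 0.0: every A-C path has an odd number of edges,
# while the even-distance pair (A, B2) gets a positive score.
```

Pairs at odd distance only ever combine scores of other odd-distance pairs, which all start at 0, so they remain 0 through every iteration; this is exactly the defect described above.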
COHESION:
In social network analysis, cohesion-based methods measure the strength of a relationship by counting all paths between two objects. Hubbell and Katz, and Wasserman and Faust, originally proposed such measures. These measures have the property that the value increases for a popular object, i.e., an object linked to by many objects, which is a defect for measuring the strength of a relationship. Two cohesion-based methods, PFIBF and CFEC, are explained below. PFIBF, a cohesion-based method proposed by Nakayama et al., counts paths whose length is at most i > 0 using the i-th power of the adjacency matrix of an information network; this power also counts walks containing cycles of length at most i-1. The drawback of PFIBF is that it cannot differentiate a path containing a cycle from a path with no cycle. For example, for i >= 3, given the two edges (a, b) and (b, a), PFIBF counts both the path (a, b) and the walk (a, b, a, b), which contains the cycle (a, b, a); for i <= 2 no such cycle exists. Thus PFIBF is inadequate for measuring implicit relationships. Effective conductance, proposed by Doyle and Snell for measuring implicit relationships, faces the same drawback. To overcome it, Koren et al. proposed CFEC (cycle-free effective conductance), based on effective conductance. In measuring an implicit relationship, CFEC does not traverse a path containing a cycle, though it does not count all the paths.
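The PFIBF drawback can be seen with the two-node example from the text: powering the adjacency matrix counts walks, and a walk containing a cycle is indistinguishable from a cycle-free path in the resulting count. This is a generic linear-algebra illustration of the problem, not the PFIBF formula itself.

```python
def mat_mult(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, i):
    """i-th power of adjacency matrix A: (A^i)[u][v] counts length-i walks."""
    result = A
    for _ in range(i - 1):
        result = mat_mult(result, A)
    return result

# Two nodes a (index 0) and b (index 1) with edges (a, b) and (b, a).
A = [[0, 1],
     [1, 0]]

# (A^3)[a][b] = 1: the only length-3 walk a->b->a->b contains the cycle
# (a, b, a), yet the matrix-power count cannot tell it apart from a
# cycle-free path of the same length.
walks3 = mat_pow(A, 3)
```

Since the count conflates the cyclic walk with a genuine path, any measure built directly on these matrix powers inherits the ambiguity, which is what motivates cycle-free variants such as CFEC.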
All of the above cohesion-based methods are inadequate for measuring implicit relationships in Wikipedia. To overcome this drawback, a generalized maximum flow based method was proposed, which supports all three concepts (distance, connectivity, and co-citation) and does not unduly favor popular objects in the process of measuring implicit relationships. In the generalized maximum flow, every edge e has a gain γ(e) > 0, and the flow value along edge e is multiplied by γ(e). The flow on edge e satisfies f(e) >= 0, its capacity satisfies µ(e) >= 0, and f(e) <= µ(e) must hold for every edge e. In the generalized maximum flow, we push as much flow as possible from the source vertex to the destination vertex; the value of f is defined as the total amount of f arriving at the destination.
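The effect of gains and capacities on a single path can be sketched as follows: the flow entering each edge is capped at µ(e) and what passes through is scaled by γ(e), so the amount arriving at the destination shrinks with path length. This is a single-path illustration under assumed uniform gains and unit capacities, not a full generalized maximum flow algorithm.

```python
def push_along_path(path, gain, cap, amount=1.0):
    """Send `amount` units along `path`; each edge e caps the flow at
    mu(e) (so f(e) <= mu(e)) and multiplies what passes by gamma(e)."""
    flow = amount
    for e in zip(path, path[1:]):
        flow = min(flow, cap[e])   # enforce f(e) <= mu(e)
        flow *= gain[e]            # flow is scaled by the gain gamma(e)
    return flow

# Hypothetical 3-edge path with uniform gain 0.8 and unit capacities.
path = ["s", "x", "y", "t"]
edges = list(zip(path, path[1:]))
gain = {e: 0.8 for e in edges}
cap = {e: 1.0 for e in edges}
value = push_along_path(path, gain, cap)   # 0.8 ** 3 arrives at t
```

With every gain below 1, the value arriving at the destination is the product of the gains along the path, so a longer path yields a smaller flow value, matching the definition above.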
PROBLEM STATEMENT:
Using the generalized maximum flow method we may not get quality and quantitative results. The existing system implementation steps are:
Step 1: Enter the object name to be searched in Wikipedia.
Step 2: Wikipedia displays the related wiki pages.
Step 3: In the obtained wiki pages, find the source node.
Step 4: After selecting the source node, traverse the possible paths to reach the destination nodes.
Step 5: Select the object path that has less distance and more connectivity.
Step 6: Apply the gain function on the selected object path.
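Steps 4-6 of the existing system can be sketched as follows. The candidate paths, the scoring rule (shortest distance first, higher connectivity as tie-breaker), and the uniform gain value are all illustrative assumptions, since the post does not fix them.

```python
def select_path(paths, connectivity):
    """Step 5: prefer the shortest path; break ties by higher connectivity."""
    return min(paths, key=lambda p: (len(p) - 1, -connectivity[tuple(p)]))

def apply_gain(path, gamma=0.9):
    """Step 6: value surviving a path of k edges under a uniform gain gamma."""
    return gamma ** (len(path) - 1)

# Hypothetical candidate paths from a source page to a destination page,
# with assumed connectivity scores.
paths = [["src", "a", "dst"], ["src", "b", "c", "dst"]]
connectivity = {tuple(p): c for p, c in zip(paths, [2, 3])}

best = select_path(paths, connectivity)    # the 2-edge path wins on distance
strength = apply_gain(best)                # 0.9 ** 2
```

Note that every candidate path must still be enumerated and scored before the gain is applied, which is the cost the proposed system tries to cut down.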
PROPOSED STATEMENT:
The proposed system implementation steps are:
Step 1: Enter the object name to be searched in Wikipedia.
Step 2: Wikipedia displays the related wiki pages.
Step 3: The system displays two kinds of links:
i) high-probability links;
ii) low-probability links.
Step 4: Eliminate the low-probability links.
Step 5: With the high-probability links, we reach the destination through fewer paths.
Step 6: This supports faster knowledge search results and saves time.
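The proposed steps can be sketched as follows. The per-link probability scores, the 0.5 threshold, and the tiny link set are placeholders, since the post does not specify the partitioning algorithm; the point is only that pruning low-probability links shrinks the graph the search has to traverse.

```python
from collections import deque

def partition_links(links, threshold=0.5):
    """Steps 3-4: split co-citation links by probability, drop the low ones."""
    high = {e for e, p in links.items() if p >= threshold}
    low = set(links) - high
    return high, low

def reachable(high_links, source, target):
    """Step 5: BFS over the surviving high-probability links only."""
    adj = {}
    for u, v in high_links:
        adj.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

# Hypothetical link probabilities; the scores and threshold are assumptions.
links = {("src", "a"): 0.9, ("a", "dst"): 0.8, ("src", "b"): 0.2}
high, low = partition_links(links)
found = reachable(high, "src", "dst")   # destination reached via high links only
```

Because the low-probability links are discarded before the search, the BFS explores fewer paths, which is the source of the time saving claimed in Step 6.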