**project maker** · 21-07-2014, 10:20 AM

EFFICIENT SEMISUPERVISED MEDLINE DOCUMENT CLUSTERING WITH MESH-SEMANTIC AND GLOBAL-CONTENT CONSTRAINTS

Introduction

This project addresses the clustering of biomedical documents, a common need when searching biomedical text. For clustering biomedical documents, we can consider three different types of information: the local-content (LC) information from the documents themselves, the global-content (GC) information from the whole MEDLINE collection, and the medical subject heading (MeSH) semantic (MS) information. Recently, the performance of MEDLINE document clustering has been enhanced by linearly combining the LC and MS information. However, a simple linear combination can be ineffective, because the representation space limits how well different types of information (similarities) with different reliability can be combined. To overcome this limitation, we propose a new semisupervised spectral clustering method, SSNCut, which clusters over the LC similarities under two types of constraints: must-link (ML) constraints on document pairs with high MS (or GC) similarities and cannot-link (CL) constraints on pairs with low similarities. Experimental results show that SSNCut outperformed a linear combination method and several well-known semisupervised clustering methods, and the improvement was statistically significant.
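
As a rough illustration of how ML and CL constraints might be derived from a similarity matrix, here is a minimal Python sketch. It is not the SSNCut implementation: the function name derive_constraints, the thresholds high and low, and the toy similarity values are all assumptions made for this example, and the spectral clustering step itself is not shown.

```python
from itertools import combinations


def derive_constraints(similarity, high=0.8, low=0.1):
    """Split document pairs into must-link and cannot-link lists.

    similarity: symmetric matrix (list of lists) of MS or GC similarities.
    high, low: hypothetical thresholds, not values taken from the source.
    """
    must_link, cannot_link = [], []
    for i, j in combinations(range(len(similarity)), 2):
        s = similarity[i][j]
        if s >= high:
            must_link.append((i, j))      # very similar: keep together
        elif s <= low:
            cannot_link.append((i, j))    # very dissimilar: keep apart
        # pairs in between carry no constraint
    return must_link, cannot_link


# Toy MeSH-semantic similarities for four documents (made-up numbers).
ms_similarity = [
    [1.00, 0.90, 0.05, 0.40],
    [0.90, 1.00, 0.08, 0.30],
    [0.05, 0.08, 1.00, 0.85],
    [0.40, 0.30, 0.85, 1.00],
]

must_link, cannot_link = derive_constraints(ms_similarity)
print("ML:", must_link)    # ML: [(0, 1), (2, 3)]
print("CL:", cannot_link)  # CL: [(0, 2), (1, 2)]
```

Only the most reliable pairs (very high or very low similarity) become constraints; pairs in the middle range are left to the LC similarities, which is the point of using constraints rather than adding the similarity matrices together.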

Scope of the Project:

In our project we explore an alternative approach that improves performance when a user searches biomedical text. At present, every search has to go through online databases: to find biomedical text, the user retrieves documents from PubMed, MEDLINE, PMC, MeSH, etc. These databases contain a very large amount of data, so retrieving documents from them is slow. We give the user the option of getting documents either from the online databases or from our local database. We cluster all the documents in our database, so that documents can be retrieved from the different clusters.

Scalable Clustering Algorithms with Balancing Constraints
Author: Arindam Banerjee, Joydeep Ghosh
Year: 2010
In this paper, the authors propose a general framework for scalable, balanced clustering. The data clustering process is broken down into three steps: sampling a small representative subset of the points, clustering the sampled data, and populating the initial clusters with the remaining data, followed by refinements. The last step has two parts:
1. Populate: The points that were not sampled, and hence do not yet belong to any cluster, are assigned to the existing clusters in a manner that satisfies the balancing constraints while ensuring good-quality clusters.
2. Refine: Iterative refinements are made to improve the clustering objective function while satisfying the balancing constraints throughout.
Hence, part 1 gives a reasonably good feasible solution, i.e., a clustering in which the balancing constraints are satisfied, and part 2 iteratively refines the solution while always remaining in the feasible space.
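
The sample, cluster, populate and refine steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the balancing constraint is taken to be a maximum cluster size of ceil(n / k), plain k-means is used on the sample, the populate step assigns every point (including the sampled ones) greedily, and refinement is done by pairwise swaps. All function names are hypothetical.

```python
import math
import random


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, k, iters=10):
    """Plain k-means on the sample; returns a list of k centroids."""
    centroids = list(points[:k])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            groups[nearest].append(p)
        # An empty group keeps its previous centroid.
        centroids = [mean(g) if g else centroids[c] for c, g in enumerate(groups)]
    return centroids


def populate(points, centroids, capacity):
    """Assign each point to the nearest centroid that still has room."""
    sizes = [0] * len(centroids)
    labels = []
    for p in points:
        open_clusters = [c for c in range(len(centroids)) if sizes[c] < capacity]
        best = min(open_clusters, key=lambda c: dist2(p, centroids[c]))
        labels.append(best)
        sizes[best] += 1
    return labels


def refine(points, labels, centroids, max_rounds=20):
    """Swap pairs of points between clusters when that lowers the cost.

    A swap never changes cluster sizes, so the solution stays feasible.
    """
    n = len(points)
    for _ in range(max_rounds):
        swapped = False
        for i in range(n):
            for j in range(i + 1, n):
                a, b = labels[i], labels[j]
                if a == b:
                    continue
                now = dist2(points[i], centroids[a]) + dist2(points[j], centroids[b])
                after = dist2(points[i], centroids[b]) + dist2(points[j], centroids[a])
                if after < now:
                    labels[i], labels[j] = b, a
                    swapped = True
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
        if not swapped:
            break
    return labels


def balanced_clustering(points, k, sample_size, seed=0):
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)      # step 1: sample
    centroids = kmeans(sample, k)                 # step 2: cluster the sample
    capacity = math.ceil(len(points) / k)         # assumed balancing constraint
    labels = populate(points, centroids, capacity)  # step 3a: populate
    return refine(points, labels, centroids)        # step 3b: refine


points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.1, 5.0),
          (9.0, 0.1), (9.2, 0.3), (8.9, 0.0), (9.1, 0.2)]
labels = balanced_clustering(points, k=3, sample_size=6)
sizes = [labels.count(c) for c in range(3)]
print(sizes)
```

With 12 points, 3 clusters and a capacity of 4, every cluster ends up with exactly 4 points by construction: populate never exceeds the capacity, and the swaps in refine leave the sizes unchanged.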

Class Diagram:

A class diagram in the UML is a type of static structure diagram that describes the structure of a system by showing the system’s classes, their attributes, and the relationships between the classes.
Private visibility hides information from anything outside the class partition. Public visibility allows all other classes to view the marked information.
Protected visibility allows child classes to access information they inherited from a parent class.

Object Diagram:

An object diagram in the Unified Modeling Language (UML) is a diagram that shows a complete or partial view of the structure of a modeled system at a specific time. An object diagram focuses on some particular set of object instances and attributes, and the links between the instances. A correlated set of object diagrams provides insight into how an arbitrary view of a system is expected to evolve over time. Object diagrams are more concrete than class diagrams, and are often used to provide examples, or act as test cases for the class diagrams. Only those aspects of a model that are of current interest need be shown on an object diagram.

Activity Diagram:

Activity diagrams are loosely defined diagrams that show workflows of stepwise activities and actions, with support for choice, iteration and concurrency. In UML, activity diagrams can be used to describe the business and operational step-by-step workflows of components in a system. UML activity diagrams can also model the internal logic of a complex operation. In many ways UML activity diagrams are the object-oriented equivalent of flow charts and data flow diagrams (DFDs) from structured development.

Sequence Diagram:

A sequence diagram in UML is a kind of interaction diagram that shows how processes operate with one another and in what order.
It is a construct of a message sequence chart. Sequence diagrams are sometimes called Event-trace diagrams, event scenarios, and timing diagrams.
The diagram below shows the sequence flow of the Compression of View on Anonymous Networks Folded View.

Component Diagram:

Components are wired together by using an assembly connector to connect the required interface of one component with the provided interface of another component. This illustrates the service consumer - service provider relationship between the two components. An assembly connector is a "connector between two components that defines that one component provides the services that another component requires. An assembly connector is a connector that is defined from a required interface or port to a provided interface or port." When using a component diagram to show the internal structure of a component, the provided and required interfaces of the encompassing component can delegate to the corresponding interfaces of the contained components.

Conclusion

We have presented a new semisupervised spectral clustering method, i.e., SSNCut, which can incorporate both ML and CL constraints, for integrating different information for document clustering. We have emphasized that our idea behind this paper is to incorporate three different types of document similarities, i.e., the LC, GC and MS similarities. SSNCut realizes this new idea, providing a more flexible framework than a method of linearly combining the three similarities.
