17-04-2013, 04:24 PM
Service-Oriented Architecture for High-Dimensional Private Data Mashup
Abstract:
Mashup is a web technology that allows different service providers to flexibly integrate their expertise and to deliver highly customized services to their customers. Data mashup is a special type of mashup application that aims at integrating data from multiple data providers depending on the user’s request. However, integrating data from multiple sources brings about three challenges. First, simply joining multiple private data sets together would reveal sensitive information to the other data providers. Second, the integrated (mashup) data could potentially sharpen the identification of individuals and, therefore, reveal person-specific sensitive information that was not available before the mashup. Third, the mashup data from multiple sources often contains many data attributes. When enforcing a traditional privacy model, such as K-anonymity, the high-dimensional data would suffer from the problem known as the curse of high dimensionality, resulting in data that is useless for further analysis. In this paper, we study and resolve a privacy problem in a real-life mashup application for the online advertising industry in social networks, and propose a service-oriented architecture along with a privacy-preserving data mashup algorithm to address the aforementioned challenges. Experiments on real-life data suggest that our proposed architecture and algorithm are effective for simultaneously preserving both privacy and information utility on the mashup data. To the best of our knowledge, this is the first work that integrates high-dimensional data for a mashup service.
Existing System
Data mashup is a special type of mashup application that aims at integrating data from multiple data providers depending on the user’s request. However, integrating data from multiple sources brings about three challenges. First, simply joining multiple private data sets together would reveal sensitive information to the other data providers. Second, the integrated (mashup) data could potentially sharpen the identification of individuals and, therefore, reveal person-specific sensitive information that was not available before the mashup. Third, the mashup data from multiple sources often contains many data attributes, so enforcing a traditional privacy model such as K-anonymity leaves the data useless for further analysis.
Proposed System
This work identifies a new privacy problem through collaboration with the social networks industry and generalizes the industry’s requirements to formulate the privacy-preserving high-dimensional data mashup problem. It proposes a service-oriented architecture for privacy-preserving data mashup in order to securely integrate private data from multiple parties. It also studies the privacy threats caused by data mashup and proposes a privacy-preserving data mashup algorithm that securely integrates person-specific sensitive data from different data providers, such that the integrated data still retains the essential information for supporting general data exploration or a specific data mining task.
Privacy Measure:
The mashup coordinator notifies all contributing data providers of the session identifier. All prospective data providers share a common session context that represents a stateful presentation of information related to a specific execution of the privacy-preserving mashup algorithm, called PHDMashup. An established session context contains several attributes that identify a PHDMashup process: the data recipient’s address; the data providers’ addresses and certificates; an authentication token that contains the data recipient’s certificate; and a unique session identifier that uses an end-point reference composed of the service address, a PHDMashup process identifier, and runtime status information about the execution.
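The session context described above can be pictured as a small record type. The following is only a minimal sketch: the class names, field names, the way the end-point reference is formatted, and the `notify_providers` helper are all assumptions made for illustration, not the architecture's actual interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import uuid


@dataclass(frozen=True)
class Provider:
    """A contributing data provider (hypothetical shape)."""
    address: str
    certificate: str  # placeholder string standing in for a real certificate


@dataclass
class SessionContext:
    """Attributes shared by all parties for one PHDMashup execution."""
    recipient_address: str
    providers: List[Provider]
    auth_token: str       # assumed to carry the data recipient's certificate
    service_address: str
    # Hypothetical process identifier, generated per execution.
    process_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "created"  # runtime status; the value set is an assumption

    @property
    def session_id(self) -> str:
        # Assumed end-point reference format: service address + process id.
        return f"{self.service_address}/phdmashup/{self.process_id}"


def notify_providers(ctx: SessionContext) -> Dict[str, str]:
    """Return the session identifier each provider would be notified with."""
    return {p.address: ctx.session_id for p in ctx.providers}


ctx = SessionContext(
    recipient_address="https://recipient.example",
    providers=[
        Provider("https://provider-a.example", "CERT-A"),
        Provider("https://provider-b.example", "CERT-B"),
    ],
    auth_token="TOKEN-WITH-RECIPIENT-CERT",
    service_address="https://coordinator.example",
)
notifications = notify_providers(ctx)
```

Every provider receives the same identifier, which is what lets them refer to the same stateful execution in later messages.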
Anonymous Mashup Data:
The mashup data from multiple data providers usually contains many attributes. Enforcing traditional privacy models on high-dimensional data would result in significant information loss. As the number of attributes increases, more generalization is required in order to achieve K-anonymity, even if K is small, thereby rendering the data useless for further analysis.
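A toy check makes the effect concrete. The sketch below tests K-anonymity by counting how many records share each combination of quasi-identifier values; the records and attribute names are invented for illustration. With two attributes every group is large enough, but adding more attributes splits the same records into groups of size one, so further generalization would be needed.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Invented sample data, already partly generalized (age is a range).
records = [
    {"age": "30-39", "sex": "F", "job": "Engineer", "city": "Montreal"},
    {"age": "30-39", "sex": "F", "job": "Engineer", "city": "Toronto"},
    {"age": "30-39", "sex": "F", "job": "Lawyer", "city": "Montreal"},
    {"age": "30-39", "sex": "F", "job": "Lawyer", "city": "Toronto"},
]

two_attrs = is_k_anonymous(records, ["age", "sex"], k=2)
three_attrs = is_k_anonymous(records, ["age", "sex", "job"], k=2)
four_attrs = is_k_anonymous(records, ["age", "sex", "job", "city"], k=2)
```

Here `two_attrs` and `three_attrs` hold, while `four_attrs` fails because each record becomes unique once `city` is included.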
Raw Data Method:
Data mashup is a special type of mashup application that aims at integrating data from multiple data providers depending on the service request from a user. An information service request could be a general count statistic task or a sophisticated data mining task such as classification analysis. Upon receiving a service request, the data mashup web application dynamically determines the data providers, collects information from them through their web service interfaces, and then integrates the collected information to fulfill the service request. Further computation and visualization can be performed at the user’s site or on the web application server. This is very different from a traditional web portal, which simply divides a web page or a website into independent sections for displaying information from different sources.
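The raw (non-private) flow described above, which is to determine the providers, collect their data, and integrate it to answer a request, can be sketched as follows. The provider registry, the fetch functions, the shared `id` join key, and the sample records are all hypothetical; in-process functions stand in for real web service calls. Note that this plain join is exactly the step that exposes each provider's data, which is what the privacy-preserving approach is meant to avoid.

```python
# Hypothetical providers: each exposes some attributes and a fetch function.
def fetch_social():
    return [
        {"id": "1", "interest": "sports"},
        {"id": "2", "interest": "music"},
        {"id": "3", "interest": "sports"},
    ]


def fetch_retail():
    return [
        {"id": "1", "segment": "premium"},
        {"id": "2", "segment": "basic"},
        {"id": "3", "segment": "premium"},
    ]


PROVIDERS = {
    "social": ({"interest"}, fetch_social),
    "retail": ({"segment"}, fetch_retail),
}


def select_providers(requested_attrs):
    """Pick the providers that offer at least one requested attribute."""
    wanted = set(requested_attrs)
    return [name for name, (attrs, _) in PROVIDERS.items() if attrs & wanted]


def integrate(provider_names, key="id"):
    """Join the providers' records on a shared key (assumed to exist)."""
    merged = {}
    for name in provider_names:
        _, fetch = PROVIDERS[name]
        for rec in fetch():
            merged.setdefault(rec[key], {}).update(rec)
    return list(merged.values())


def count_request(requested_attrs, condition):
    """Answer a count statistic request over the integrated data."""
    rows = integrate(select_providers(requested_attrs))
    return sum(
        1 for r in rows if all(r.get(a) == v for a, v in condition.items())
    )


sporty_premium = count_request(
    ["interest", "segment"], {"interest": "sports", "segment": "premium"}
)
```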
Privacy-Preserving High-Dimensional Data Mashup:
The objective of Phase II is to integrate the high-dimensional data from multiple data providers such that the final mashup data satisfies a given privacy requirement and preserves as much information as possible for the specified information requirement. Recall that the problem definition specifies three requirements. Two of them specify the properties of the final mashup data. The remaining one states that no data provider should learn more detailed information than the final mashup data during the process of integration. To satisfy this last requirement, we propose a top-down specialization approach called Privacy-preserving High-dimensional Data Mashup (PHDMashup). We present an overview of the algorithm, followed by the details of each step.
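To give a feel for top-down specialization in general, here is a heavily simplified sketch. It is not PHDMashup itself: it runs on a single table held by one party, uses plain K-anonymity as the privacy check, accepts the first valid specialization instead of ranking candidates by a score, and involves no secure multi-party protocol. The taxonomy trees and records are invented. The idea it illustrates is to start with every value generalized to the top of its taxonomy and then repeatedly replace a general value with its children, as long as the privacy requirement still holds.

```python
from collections import Counter

# Hypothetical taxonomy trees: parent -> children. Values without an entry are leaves.
TAXONOMY = {
    "job": {
        "ANY_JOB": ["Professional", "Artist"],
        "Professional": ["Engineer", "Lawyer"],
        "Artist": ["Painter", "Writer"],
    },
    "sex": {"ANY_SEX": ["M", "F"]},
}
ROOT = {"job": "ANY_JOB", "sex": "ANY_SEX"}


def leaves(attr, node):
    """All raw values covered by a taxonomy node."""
    children = TAXONOMY[attr].get(node)
    if not children:
        return {node}
    covered = set()
    for child in children:
        covered |= leaves(attr, child)
    return covered


def generalize(record, cut):
    """Map each raw value to the node in the current cut that covers it."""
    return tuple(
        next(n for n in nodes if record[attr] in leaves(attr, n))
        for attr, nodes in cut.items()
    )


def is_k_anonymous(records, cut, k):
    groups = Counter(generalize(r, cut) for r in records)
    return min(groups.values()) >= k


def top_down_specialize(records, k):
    """Specialize greedily from the most general cut while K-anonymity holds."""
    cut = {attr: {root} for attr, root in ROOT.items()}
    changed = True
    while changed:
        changed = False
        for attr in cut:
            for node in sorted(cut[attr]):
                children = TAXONOMY[attr].get(node)
                if not children:
                    continue  # already a leaf value
                candidate = dict(cut)
                candidate[attr] = (cut[attr] - {node}) | set(children)
                if is_k_anonymous(records, candidate, k):
                    cut = candidate
                    changed = True
                    break
            if changed:
                break  # restart the scan from the updated cut
    return cut


records = [
    {"job": "Engineer", "sex": "M"},
    {"job": "Engineer", "sex": "M"},
    {"job": "Lawyer", "sex": "M"},
    {"job": "Lawyer", "sex": "M"},
    {"job": "Painter", "sex": "F"},
    {"job": "Writer", "sex": "F"},
]
final_cut = top_down_specialize(records, k=2)
```

In this toy run with K = 2, the `Professional` node can be split into `Engineer` and `Lawyer`, but `Artist` stays generalized because splitting it would leave a painter and a writer each in a group of size one.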