
OPTIMIZATION BASED AGGREGATION FOR RANKING FRAUD DETECTION IN MOBILE APPLICATIONS



ABSTRACT


Ranking fraud in the mobile application market refers to fraudulent activities performed to raise an application's position in the popularity list. Because of such activities, users cannot tell fake reviews from genuine ones. To overcome this difficulty, a system for detecting ranking fraud in mobile Apps has been proposed. Historic ranking records, ratings and reviews have been collected from real-world App data, and the ranking fraud detection system uses them to locate ranking fraud accurately. A leading session may be viewed as the time range during which an App gains popularity. Three types of evidence are used: ranking-based evidence, rating-based evidence and review-based evidence. An optimization-based aggregation method is employed to integrate all of them, and the behaviour of each type of evidence has been investigated and evaluated with real-world App data.
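As an illustration of the leading-session idea, the following minimal Python sketch (an assumption, not the method of this work) treats a day as "leading" when the App's chart position is within a hypothetical top-`k_star` threshold, groups consecutive leading days into events, and merges events separated by at most `max_gap` days into one session. The function names and both thresholds are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # (first_day, last_day), inclusive day indices


def leading_events(ranks: List[int], k_star: int) -> List[Interval]:
    """Return maximal runs of consecutive days on which the App ranks in the top k_star.

    ranks[d] is the App's chart position on day d (1 = best).
    """
    events: List[Interval] = []
    start: Optional[int] = None
    for day, rank in enumerate(ranks):
        if rank <= k_star:
            if start is None:
                start = day  # a new run of leading days begins
        elif start is not None:
            events.append((start, day - 1))  # the run ended on the previous day
            start = None
    if start is not None:
        events.append((start, len(ranks) - 1))  # the run lasts until the last day
    return events


def leading_sessions(events: List[Interval], max_gap: int) -> List[List[Interval]]:
    """Group events into sessions: an event joins the current session if it
    starts at most max_gap days after the previous event ended."""
    sessions: List[List[Interval]] = []
    for event in events:
        if sessions and event[0] - sessions[-1][-1][1] <= max_gap:
            sessions[-1].append(event)
        else:
            sessions.append([event])
    return sessions
```

For example, with `k_star=10` the rank series `[120, 80, 5, 3, 60, 4, 2, 150, 150, 150, 150, 1, 150]` yields the events `(2, 3)`, `(5, 6)` and `(11, 11)`; with `max_gap=3` the first two form one session and the third stands alone.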



INTRODUCTION

This section presents an overview of data mining, its various types and its areas of application. It also discusses the scope of data mining for detecting ranking fraud in mobile applications from their historic data.
Data mining is the analysis step of Knowledge Discovery in Databases (KDD), an interdisciplinary subfield of computer science. It is the computational process of discovering patterns in large data sets using methods at the intersection of several fields of computer science, such as computer vision, artificial intelligence, machine learning, statistics and database systems. The objective of data mining is to extract information from a data set and transform it into an understandable structure. Data mining is also described as sorting through data to identify patterns and establish relationships; it combines data analysis techniques with high-end technology within a process. Its primary goal is to develop usable knowledge about future events. The key properties of data mining are the automatic discovery of patterns, the prediction of likely outcomes, the creation of actionable information, and a focus on large data sets and databases.
Depending on where the mining technique is applied, data mining takes various forms, such as text mining, spatial data mining, web mining and sequence mining. The KDD process is commonly defined by the stages Selection, Pre-processing, Transformation, Data Mining and Interpretation/Evaluation. Data mining itself involves six common classes of tasks: anomaly detection, association rule learning, clustering, classification, regression and summarization. Verifying the patterns produced by the data mining algorithm is what makes it possible to discover knowledge from data.
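The five KDD stages can be made concrete with a toy pipeline over leaderboard records. In this sketch the record fields (`app`, `category`, `day`, `rank`), the one-day rank-jump rule used as the mining step and its `jump` threshold are all hypothetical; anomaly detection is just one of the task classes listed above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def select(records: List[dict], category: str) -> List[dict]:
    # Selection: keep only the records relevant to the analysis task.
    return [r for r in records if r["category"] == category]


def preprocess(records: List[dict]) -> List[dict]:
    # Pre-processing: drop records with a missing or out-of-range chart position.
    return [r for r in records if r.get("rank") is not None and 1 <= r["rank"] <= 100]


def transform(records: List[dict]) -> Dict[str, List[int]]:
    # Transformation: reshape flat records into one day-ordered rank series per App.
    points: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for r in records:
        points[r["app"]].append((r["day"], r["rank"]))
    return {app: [rank for _, rank in sorted(pts)] for app, pts in points.items()}


def mine(series: Dict[str, List[int]], jump: int) -> Dict[str, int]:
    # Data mining (anomaly detection): flag Apps whose largest single-day
    # rank improvement exceeds the threshold.
    flagged = {}
    for app, ranks in series.items():
        best_gain = max((prev - cur for prev, cur in zip(ranks, ranks[1:])), default=0)
        if best_gain > jump:
            flagged[app] = best_gain
    return flagged


def interpret(flagged: Dict[str, int]) -> List[str]:
    # Interpretation/Evaluation: turn the mined patterns into readable findings.
    return [f"{app}: climbed {gain} places in one day" for app, gain in sorted(flagged.items())]


def kdd(records: List[dict], category: str, jump: int) -> List[str]:
    # Run the five stages in order.
    return interpret(mine(transform(preprocess(select(records, category))), jump))
```

Each stage is a separate function so that, for example, a different mining task could be swapped in without touching selection or pre-processing.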
Fraud detection is the identification of prohibited, deceptive acts, which occur all around the world. A fraud detection study characterizes the skilled impostor, formalizes the main forms and sub-forms of known fraud, and describes the nature of the collected data. Fraud detection is the identification of symptoms of fraud where no prior suspicion exists. Ranking fraud in the mobile App market refers to fraudulent or deceptive activities whose purpose is to bump Apps up the popularity list. Detecting it is important because it has become very common for App developers to use shady means, such as inflating their Apps' sales or posting phony App ratings, to commit ranking fraud and thereby increase downloads and revenue.
Many mobile App stores publish a daily App leaderboard showing the chart rankings of popular Apps. The leaderboard is important for promoting Apps, and fraudulent Apps push genuine Apps down its rankings. A higher rank on the leaderboard leads to a huge number of downloads and large profits for the App developers, which is why detecting fraudulent Apps is essential. The main objective of the system is to find the fraudulent Apps on the leaderboard. The ranking-based evidence is analyzed to detect manipulation of App ranks on the leaderboard; the rating-based evidence is inspected to obtain the correct rating for a given application; and the reviews are examined to determine users' views about the App. The ranking-, rating- and review-based evidence is then aggregated to detect fraudulent rankings of mobile applications.
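The aggregation step is not spelled out in this section. One way to realise an optimization-based aggregation, shown here as an assumed stand-in rather than the method used in this work, is to learn one weight per evidence type by block coordinate descent. With the weights fixed, the fused score of each leading session is the weighted mean of the evidence scores; with the fused scores fixed, the weights that minimise the weighted disagreement plus an entropy regulariser are a softmin of each evidence type's mean squared disagreement. The score scale (0 to 1, higher meaning more suspicious), `temperature` and `iters` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def aggregate_evidence(
    scores: Dict[str, List[float]], temperature: float = 0.1, iters: int = 50
) -> Tuple[Dict[str, float], List[float]]:
    """Aggregate per-session evidence scores into one fraud score per session.

    Block coordinate descent on
        J(w, f) = sum_i w_i * mean_s (x_is - f_s)^2 + T * sum_i w_i * log(w_i)
    over weights w on the probability simplex and fused scores f.
    """
    names = list(scores)
    n_sessions = len(scores[names[0]])
    weights = {n: 1.0 / len(names) for n in names}  # start from uniform weights
    for _ in range(iters):
        # Step 1 (w fixed): the weighted mean minimises the weighted squared error.
        fused = [sum(weights[n] * scores[n][s] for n in names) for s in range(n_sessions)]
        # Step 2 (f fixed): the entropy-regularised minimiser is a softmin of the disagreements.
        disagreement = {
            n: sum((scores[n][s] - fused[s]) ** 2 for s in range(n_sessions)) / n_sessions
            for n in names
        }
        lowest = min(disagreement.values())  # shift by the minimum for numerical stability
        unnorm = {n: math.exp(-(disagreement[n] - lowest) / temperature) for n in names}
        total = sum(unnorm.values())
        weights = {n: unnorm[n] / total for n in names}
    fused = [sum(weights[n] * scores[n][s] for n in names) for s in range(n_sessions)]
    return weights, fused
```

An evidence type that consistently disagrees with the consensus receives a small weight, while raising `temperature` pushes the weights back toward uniform.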

EXTRACTING EVIDENCES FOR RANKING FRAUD DETECTION
Growing technology has led to enormous growth in the App development market. With over a million Apps available in the App store, it is difficult to find the best App; accurate ratings would let users navigate the App store with more confidence. However, various fraudulent activities are also performed to market these Apps and increase their revenue and popularity. These give rise to fake chart rankings that provide inaccurate information to the user. Therefore, a system to detect ranking fraud in mobile Apps has been proposed.
The system takes the historic records of mobile Apps as input. Real-world data have been collected for the top 100 Apps, and pre-processing has been performed to remove unnecessary data. Validation has then been performed on the ranking evidence to ensure the validity of the data set, and the ranking-based evidence has been analyzed. Fraudulent Apps are discovered by aggregating the evidence; in this way, the system detects the fraudulent Apps within the list of Apps.
Table 3.1 presents the data set description. Data on the top 100 Apps have been collected for three categories of applications, namely top free, top paid and top grossing.
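The pre-processing and validation steps described above could look roughly like the following sketch. The field names, the `RankRecord` type and the specific validation rule (each chart position held by at most one App per category and day) are assumptions for illustration; the actual checks performed in this work are not specified.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

CATEGORIES = {"top free", "top paid", "top grossing"}
REQUIRED = ("app_id", "category", "day", "rank")  # hypothetical field names


@dataclass(frozen=True)
class RankRecord:
    app_id: str
    category: str
    day: int
    rank: int


def preprocess(raw: List[dict]) -> List[RankRecord]:
    """Keep only the fields the detector needs and drop malformed rows."""
    cleaned = []
    for row in raw:
        if any(row.get(key) is None for key in REQUIRED):
            continue  # incomplete row
        if row["category"] not in CATEGORIES or not 1 <= int(row["rank"]) <= 100:
            continue  # outside the top-100 charts being studied
        cleaned.append(RankRecord(str(row["app_id"]), row["category"], int(row["day"]), int(row["rank"])))
    return cleaned


def validate(records: List[RankRecord]) -> List[Tuple[str, int, int]]:
    """Return the (category, day, rank) slots claimed by more than one App.

    On a consistent chart each position is held by exactly one App per day.
    """
    holders: Dict[Tuple[str, int, int], Set[str]] = defaultdict(set)
    for r in records:
        holders[(r.category, r.day, r.rank)].add(r.app_id)
    return sorted(slot for slot, apps in holders.items() if len(apps) > 1)
```

Extra fields in a raw row (for example an icon URL) are simply not copied into `RankRecord`, which is one concrete reading of "removing unnecessary data".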



RELATED WORK
This section briefly reviews the works related to this project, which include the detection of web spam, spam in online reviews, fraud detection in taxi driving patterns, and the problems in fraud detection.
Ntoulas et al. [14] studied various aspects of content-based spam on the Web using a real-world data set from the MSN Search crawler, and presented a number of heuristic methods for detecting it. Web spam is the injection of artificially created pages into the web in order to influence search engine results and drive traffic to certain pages for fun or profit. The automatic detection of spam pages draws on a variety of methods, such as detecting keyword stuffing, analysing the number of words in the page title, and n-gram likelihood. Each method is highly parallelizable because it runs in time proportional to the size of the page, and each identifies spam by analysing the content of the downloaded pages. Experiments on a subset of a crawl by MSN Search demonstrated the relative merits of each method. The authors also showed how machine learning techniques can be employed to create a highly efficient and reasonably accurate spam detection algorithm, and examined the effectiveness of the heuristics both in isolation and when aggregated using classification algorithms. The approach correctly identified 86.2% of the spam pages, misidentifying 13.8% of the spam pages and 3.1% of the non-spam pages. This spam detection method proved more efficient than previous methods, but when used in isolation it did not identify all the spam pages.
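To make the content-based heuristics concrete, the sketch below computes three page-level signals in the spirit of those mentioned: the number of words in the title, how well the body compresses (keyword-stuffed text is highly redundant), and the share of the body taken by its most frequent word. The feature names and the simple threshold rule are hypothetical simplifications; the study combined its heuristics with classification algorithms rather than fixed thresholds.

```python
import zlib
from collections import Counter
from typing import Dict


def content_features(title: str, body: str) -> Dict[str, float]:
    """Compute a few content-based spam signals for one page."""
    words = body.lower().split()
    raw = body.encode("utf-8")
    top_count = Counter(words).most_common(1)[0][1] if words else 0
    return {
        # Spam pages often stuff many keywords into the title.
        "title_words": float(len(title.split())),
        # Repetitive, keyword-stuffed text compresses unusually well.
        "compression_ratio": len(raw) / len(zlib.compress(raw)) if raw else 0.0,
        # Share of the body taken up by its single most frequent word.
        "top_word_fraction": top_count / len(words) if words else 0.0,
    }


def looks_like_spam(features: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """Flag a page if any feature exceeds its (hypothetical) threshold."""
    return any(features[name] > limit for name, limit in thresholds.items())
```

A page consisting of "cheap pills" repeated hundreds of times has a compression ratio far above that of ordinary prose and a most-frequent-word share of one half, so both signals separate it from a normal sentence.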
Zhou et al. [15] studied the problem of unsupervised Web ranking spam detection. Specifically, they proposed Mobile App Classification with Enriched Contextual Information. The study focused on mobile App usage and emphasised the key role of understanding user preferences, which creates opportunities for intelligent, personalized, context-based services. The key step in mobile App usage analysis was found to be classifying Apps into predefined categories. However, classifying mobile Apps effectively is non-trivial because little contextual information is available for the analysis; for instance, App names carry limited information, so the contextual information is usually incomplete and ambiguous. The proposed approach first enriches the contextual information of mobile Apps with additional web knowledge obtained from a web search engine. It then extracts contextual features from the context-rich device logs of mobile users, based on the observation that different types of mobile Apps may be relevant to different real-world contexts. The enriched contextual information is combined in a Maximum Entropy model to train a mobile App classifier. Extensive experiments on the device logs of 443 mobile users showed both the effectiveness and the efficiency of the approach, which outperformed two state-of-the-art benchmark methods by a significant margin. The work is also described as an efficient online link spam and term spam detection method using spamicity. It is an efficient and effective approach to automatic App classification. However, it cannot be embedded into mobile devices. Moreover, since different users have different App usage behaviours, integrating personal preferences into contextual feature extraction remains unexplored.
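A Maximum Entropy classifier is equivalent to multinomial logistic regression: p(y | x) is proportional to exp(Σ_f w_{y,f} x_f). The minimal sketch below trains one by batch gradient ascent on sparse contextual features. The feature names (such as `ctx:commute`) and the training settings are hypothetical, and the web-search enrichment step is not modelled.

```python
import math
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, float], str]  # (sparse feature vector, category label)


def predict_proba(weights: Dict[str, Dict[str, float]], x: Dict[str, float]) -> Dict[str, float]:
    """p(y | x) proportional to exp(sum_f w[y][f] * x[f])."""
    scores = {y: sum(w.get(f, 0.0) * v for f, v in x.items()) for y, w in weights.items()}
    top = max(scores.values())  # shift for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train_maxent(data: List[Example], epochs: int = 200, lr: float = 0.5) -> Dict[str, Dict[str, float]]:
    """Fit a Maximum Entropy (multinomial logistic regression) classifier by batch gradient ascent."""
    labels = sorted({y for _, y in data})
    weights: Dict[str, Dict[str, float]] = {y: {} for y in labels}  # weights[label][feature]
    for _ in range(epochs):
        grad: Dict[str, Dict[str, float]] = {y: {} for y in labels}
        for x, y_true in data:
            probs = predict_proba(weights, x)
            for y in labels:
                # Log-likelihood gradient: observed minus expected feature counts.
                err = (1.0 if y == y_true else 0.0) - probs[y]
                for f, v in x.items():
                    grad[y][f] = grad[y].get(f, 0.0) + err * v
        for y in labels:
            for f, g in grad[y].items():
                weights[y][f] = weights[y].get(f, 0.0) + lr * g / len(data)
    return weights
```

No regularisation is applied, so on perfectly separable toy data the weights keep growing slowly with more epochs; a practical classifier would normally add a penalty term.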
Lim et al. [10] proposed a method for detecting Spammers and Spam nets in the LinkedIn social network. A manually constructed dataset of real LinkedIn users was built, and classification was performed to separate Spammers from legitimate users. The method for detecting LinkedIn Spammers consists of a set of new heuristics used with a kNN classifier. A method for detecting Spam nets (fake companies) in LinkedIn was also put forward, based on a set of new heuristics together with machine learning; classification techniques such as decision trees, rule-based techniques, neural networks and kNN were applied. It relies on the idea that the fake profiles of a fake company usually share similarities that distinguish fake companies from legitimate ones. The method calculates the similarity between the profiles of each company using several distance functions, and the resulting similarity values are used as thresholds to detect fake companies (Spam nets) among legitimate ones. The proposed methods proved very effective: an F-measure of 0.971 and an AUC close to 1 were obtained for detecting Spammer profiles. The proposed heuristics are adequate for detecting Spammer profiles, and the method performs very well at detecting Spam nets in LinkedIn. However, it does not detect Spam nets effectively in other social networks.
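The kNN classification and the F-measure reported above can be sketched as follows. Each profile is reduced to a hypothetical numeric feature vector (for example, heuristic scores scaled to [0, 1]); the actual heuristics of Lim et al. are not reproduced here.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Profile = Sequence[float]  # hypothetical numeric features of a profile


def knn_predict(train: List[Tuple[Profile, str]], query: Profile, k: int = 3) -> str:
    """Label a profile by majority vote among its k nearest training profiles."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def f_measure(y_true: List[str], y_pred: List[str], positive: str = "spammer") -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Euclidean distance is used here for simplicity; as the study notes, other distance functions can be substituted when comparing profiles.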
D. M. Blei et al. [2] proposed Latent Dirichlet Allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model in which each item of a collection is modelled as a finite mixture over an underlying set of topics. Each topic is, in turn, modelled as an infinite mixture over an underlying set of topic probabilities. In the context of text modelling, the topic probabilities provide an explicit representation of a document. The work addresses the problem of modelling text corpora and other collections of discrete data. The objective is to find short descriptions of the members of a collection that enable efficient processing of large collections while preserving the essential statistical relationships that are useful for basic tasks such as classification, novelty detection, summarization, and similarity and relevance judgments. LDA is based on a simple exchangeability assumption for the words and topics in a document, and is therefore realized by a straightforward application of de Finetti's representation theorem. LDA can also be viewed as a dimensionality reduction technique. A simple convexity-based variational approach to inference was put forward, and the authors observed that higher accuracy could be achieved by dispensing with the requirement of maintaining a bound. The approximate inference technique is efficient and improves performance. However, the classification results are compared only against unigram-based models and the probabilistic LSI model.
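The generative process behind LDA can be sketched in a few lines: draw per-document topic proportions θ from a Dirichlet(α) prior, then for each word draw a topic z from θ and a word from that topic's word distribution. This sketch fixes the document length (the original model also places a Poisson prior on it) and uses hypothetical toy topics; it shows generation only, not the variational inference described above.

```python
import random
from typing import Dict, List


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(
    n_words: int, alpha: List[float], topics: List[Dict[str, float]], rng: random.Random
) -> List[str]:
    """LDA generative process for a single document.

    topics[k] is the word distribution of topic k (word -> probability).
    """
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # topic for this word
        vocab, probs = zip(*topics[z].items())
        words.append(rng.choices(vocab, weights=probs)[0])  # word drawn from topic z
    return words
```

Passing a seeded `random.Random` makes the generated documents reproducible, which is convenient when checking an inference routine against synthetic data.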
Y. Ge et al. [12] observed that advances in GPS tracking technology have allowed GPS tracking devices to be installed in taxis, gathering huge amounts of GPS traces over time. These traces offer an unparalleled opportunity to uncover taxi driving fraud. A fraud detection system was proposed to identify taxi driving fraud. Two kinds of evidence were identified, travel route evidence and driving distance evidence, and a third component was developed to combine them based on Dempster-Shafer theory. First, interesting locations were identified from the tremendous amount of taxi GPS logs, and a parameter-free method was proposed to extract the travel route evidence. Second, the concept of a route mark was developed to represent the driving path between locations; based on these route marks, a model was built for the distribution of driving distance in order to discover the driving distance evidence. Speed information was also used to design a speed-based fraud detection system that models taxi behaviours and detects taxi fraud. The method was found to be robust to location errors and independent of map information and road networks. However, it is not applicable to real-world taxi driving fraud detection with large-scale taxi GPS logs.
The problem of ranking fraud in mobile applications has been stated, and an overview of the system and a description of its various modules have been presented.
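Dempster's rule of combination, the standard way to fuse two independent bodies of evidence in Dempster-Shafer theory, can be sketched as below for the combination step in the taxi fraud work of Ge et al. The frame of discernment {fraud, normal} and the mass values for the two evidence types are hypothetical; the sketch does not reproduce how Ge et al. derive their masses.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]  # basic probability assignment over subsets of the frame


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination for two independent bodies of evidence."""
    combined: Mass = {}
    conflict = 0.0
    for (a, pa), (b, pb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + pa * pb
        else:
            conflict += pa * pb  # joint mass on contradictory hypotheses
    if conflict >= 1.0:
        raise ValueError("evidence is totally conflicting")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


FRAUD, NORMAL = frozenset({"fraud"}), frozenset({"normal"})
EITHER = FRAUD | NORMAL  # mass on the whole frame expresses ignorance

route_evidence = {FRAUD: 0.6, NORMAL: 0.1, EITHER: 0.3}     # hypothetical masses
distance_evidence = {FRAUD: 0.5, NORMAL: 0.2, EITHER: 0.3}  # hypothetical masses
fused = combine(route_evidence, distance_evidence)
```

For these masses, 0.17 of the joint mass falls on contradictory pairs (fraud against normal), so the remaining mass is renormalised by 0.83, and fraud receives 0.63 / 0.83, roughly 0.76.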
CONCLUSION AND FUTURE WORK
A system has been designed to detect fraud in the ranking of mobile applications. It aims to provide accurate rankings of mobile apps, helping users navigate the app store with confidence. To this end, the data set has been collected from the leaderboard charts and pre-processed, the ranking-based evidence has been validated, and fraud detection using ranking-based evidence has been implemented. As future work, rating-based and review-based evidence will be gathered for the real-world apps, and the three types of evidence will be aggregated to detect fraudulent mobile app rankings.