06-10-2016, 12:22 PM
An Effective Web Server Log Analysis System Using Different Techniques in Cloud Computing
Abstract:
The proposed framework provides runtime profiling, which optimizes personalization utility while respecting the user’s privacy requirements. It allows privacy needs to be customized and does not require iterative user interaction. The framework works in two phases for each user, an offline phase and an online phase. During the offline phase, a hierarchical user profile is constructed and customized with the user-specified privacy requirements. Cloud computing is the delivery of computing services over the Internet. Cloud services allow individuals and businesses to use software and hardware that are managed by third parties at remote locations. Examples of cloud services include online file storage, social networking sites, webmail, and online business applications. Cluster analysis, or clustering, is the task of grouping a set of objects so that objects in the same group (called a cluster) are more similar to each other, in some sense, than to those in other groups. We implement two methods, the Connect Based Method and the Report Based Method, and an effective generalization algorithm is used in this technique. Input design is the process of converting a user-oriented description of the input into a computer-based system. This design is important for avoiding errors in the data input process and for guiding management toward correct information from the computerized system. It is achieved by creating user-friendly data entry screens that can handle large volumes of data. The goal of input design is to make data entry easier and free from errors, so the objective is an input layout that is easy to follow.
Introduction
Cloud Computing is the delivery of computing services over the Internet. Cloud services allow individuals and businesses to use software and hardware that are managed by third parties at remote locations. Examples of cloud services include online file storage, social networking sites, webmail, and online business applications. The cloud computing model allows access to information and computer resources from anywhere that a network connection is available. Cloud computing provides a shared pool of resources, including data storage space, networks, computer processing power, and specialized corporate and user applications.
Types:
Software as a service
Platform as a service
Infrastructure as a service
In a private cloud, the cloud infrastructure is operated solely for a specific organization and is managed by the organization or a third party.
In a community cloud, the service is shared by several organizations and made available only to those groups. The infrastructure may be owned and operated by the organizations or by a cloud service provider.
A hybrid cloud is a combination of different methods of resource pooling (for example, combining public and community clouds).
Cloud Computing Overview:
A cloud computing architecture consists of a front end and a back end. They connect to each other through a network, usually the Internet. The front end is the side the computer user, or client, sees. The back end is the “cloud” section of the system.
Front end (Cloud Computing Architecture):
The front end of a cloud computing system consists of the client’s devices (or a computer network) and the applications needed to access the cloud computing system. Not all cloud computing systems give users the same interface. Web services such as electronic mail programs use existing web browsers such as Firefox, Microsoft’s Internet Explorer, or Apple’s Safari. Other types of systems have their own applications that provide network access to their clients.
Back end (Cloud Computing Architecture):
The back end consists of physical hardware. In cloud computing, the back end is the cloud itself, which may comprise various computers, data storage systems, and servers.
Layers of cloud computing
Software as a service
Platform as a service
Infrastructure as a service
Software as a service (SaaS)
Software as a service provides a complete web application offered as a service on demand. Users can access web applications and web services such as the Google Maps API and the Flickr API.
Platform as a service (PaaS)
PaaS wraps a layer of software and provides it as a platform that can be used to build higher-level services. There are at least two perspectives on PaaS, depending on whether one is the producer or the consumer of the services:
The producer builds a platform by integrating an OS, middleware, application software, and even a development environment, which is then provided to a customer as a service.
The consumer sees an encapsulated service in which applications are developed using a set of programming languages and tools offered by the provider through an API. The customer interacts with the platform through the API.
Infrastructure as a service (IaaS)
The third segment in cloud computing, known as the infrastructure, is the backbone of the entire concept. Infrastructure vendors’ environments such as Google Gears allow users to build applications. Cloud storage such as Amazon’s is also considered part of the infrastructure segment.
Literature Survey:
One approach extracts actual user behavior from Web server logs, captures anticipated user behavior with the help of cognitive user models, and compares the two. This deviation analysis helps identify some navigation-related usability problems. Correcting these problems leads to better functional convenience, characterized by both better effectiveness (higher task completion rate) and better efficiency (less time for given tasks). The method complements traditional usability practices and overcomes some of the existing challenges.
Two types of logs, server-side logs and client-side logs, are commonly used for Web usage and usability analysis. Server-side logs can be generated automatically by Web servers, with each entry corresponding to a user request. By analysing these logs, Web workload has been characterized and used to suggest performance enhancements for Internet Web servers. Because of vastly uneven Web traffic, a massive user population, and diverse usage environments, coverage-based testing is insufficient to ensure the quality of Web applications. Therefore, server-side logs have been used to construct Web usage models for usage-based Web testing, or to generate test cases automatically and so improve test efficiency. Server logs have also been used by organizations to learn about the usability of their products. For example, search queries can be extracted from server logs to discover user information needs for usability task analysis. Data preparation techniques and algorithms can be used to process the raw Web server logs, and mining can then be performed to discover users’ visitation patterns for further usability analysis. For example, organizations can mine server-side logs to predict users’ behavior and context in order to satisfy users’ needs. Users’ revisitation patterns can be discovered by mining server logs to develop guidelines for browser history mechanisms that reduce users’ cognitive and physical effort. Navigation-related Web usability problems can be identified by comparing actual and anticipated usage patterns [1]. This paper reviews the cloud computational framework Apache Hadoop, highlights the differences and similarities between Hadoop MapReduce and Apache Spark, and evaluates their performance [2]. Log files provide valuable insight into the history of a system’s usage. Using the information from a log file can help improve future access to a system.
However, log files often contain huge amounts of data that require a significant amount of time to process [9]. Cloud computing employs distributed storage and distributed computing technology to store large volumes of data and to analyze and process them quickly [7]. In this paper, we study the file management mechanism of large-scale cloud-based log data. With the rise of big data, there are more and more Hadoop-based applications [5]. This huge amount of data is processed at more than 140 computing centers distributed across 34 countries. The MapReduce paradigm has emerged as a highly successful programming model for large-scale data-intensive computing applications [6]. The Hadoop framework provides reliable data storage through the Hadoop Distributed File System, together with the MapReduce programming model, a parallel processing system for large data sets [4]. In this context we will also present the special requirements of handling logging systems in highly dynamic infrastructures such as enterprise cloud environments, which provide dynamic systems, services, and applications [3]. Web server logs store clickstream data, recorded as a result of users’ access to a website, which can be useful for mining purposes [10].
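The survey above describes server-side logs with one entry per request, processed with a MapReduce-style model. As a rough single-machine illustration only, the sketch below parses entries and counts requests per page with a map step and a reduce step. The Common Log Format layout, the sample lines, and the function names are assumptions made for this example, not details taken from the project, and a real deployment would run the same two steps on a framework such as Hadoop.

```python
import re
from collections import Counter
from functools import reduce

# Assumed layout: Common Log Format. The project's real log format is not stated.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def map_line(line):
    """Map step: turn one log line into a {path: 1} count, or nothing if malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return Counter()
    return Counter({match.group("path"): 1})

def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right

def count_requests(lines):
    """Count requests per path by mapping every line and reducing the results."""
    return reduce(reduce_counts, map(map_line, lines), Counter())

# Invented sample entries, for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2016:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2016:13:55:40 +0000] "GET /about.html HTTP/1.1" 200 1024',
    '10.0.0.1 - - [10/Oct/2016:13:56:01 +0000] "GET /index.html HTTP/1.1" 304 0',
    'malformed line',
]

if __name__ == "__main__":
    for path, hits in count_requests(SAMPLE_LOG).most_common():
        print(path, hits)
```

Because the reduce step only merges partial counts, the lines can be split across workers in any order, which is the property that makes this kind of log analysis suit a distributed setting.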
Proposed System
The proposed system provides runtime profiling, which optimizes personalization utility while respecting the user’s privacy requirements. It allows privacy needs to be customized and does not require iterative user interaction. The framework works in two phases for each user, an offline phase and an online phase. During the offline phase, a hierarchical user profile is constructed and customized with the user-specified privacy requirements. Connect based methods are straightforward: they simply impose a bias toward clicked pages in the user’s query history. Report based methods improve the search experience with complicated user-interest models generated from user profiling techniques.
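To make the idea of biasing results toward previously clicked pages concrete, here is a minimal sketch. The additive scoring rule, the `boost` parameter, and the function name are hypothetical choices for illustration; the project does not specify how the bias is computed.

```python
def rerank_with_click_bias(results, click_history, boost=0.5):
    """Re-rank search results, favoring pages the user clicked before.

    results: list of (url, base_score) pairs from the search engine.
    click_history: dict mapping url -> number of past clicks by this user.
    boost: assumed weight added per past click (a hypothetical parameter).
    """
    def biased_score(item):
        url, base_score = item
        return base_score + boost * click_history.get(url, 0)

    ranked = sorted(results, key=biased_score, reverse=True)
    return [url for url, _ in ranked]

if __name__ == "__main__":
    results = [("a.example", 1.0), ("b.example", 0.9), ("c.example", 0.2)]
    print(rerank_with_click_bias(results, {"b.example": 1}))
```

With an empty click history the original ranking is returned unchanged, so the bias only affects users who have a query history.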
Module Description
1. User Based Profile Process
2. Hyperlink Based Personalized Web Search
3. Personalized web site
4. Recommender system
Process Description:
User Based Profile Process
The generalization process has to meet specific prerequisites to handle the user profile. This is achieved by preprocessing the user profile. At first, the process initializes the user profile by taking the indicated parent user profile into account. The process adds the inherited properties to the properties of the local user profile. Thereafter the process loads the data for the foreground and the background of the map according to the described selection in the user profile.
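The initialization step described above, where a profile takes on the properties of its parent, could look roughly like the following. The dictionary layout, the `parent` and `properties` keys, and the rule that local values override inherited ones are all assumptions for this sketch.

```python
def resolve_profile(name, profiles):
    """Merge a user profile with the properties inherited from its ancestors.

    profiles: dict mapping profile name -> {"parent": name or None,
                                            "properties": dict}.
    Assumption: a local property overrides an inherited one with the same key.
    """
    chain = []
    seen = set()
    while name is not None:
        if name in seen:
            raise ValueError("cyclic parent reference in user profiles")
        seen.add(name)
        profile = profiles[name]
        chain.append(profile)
        name = profile.get("parent")

    merged = {}
    # Apply the most distant ancestor first so nearer profiles take precedence.
    for profile in reversed(chain):
        merged.update(profile.get("properties", {}))
    return merged

if __name__ == "__main__":
    profiles = {
        "default": {"parent": None, "properties": {"scale": 1, "theme": "light"}},
        "alice": {"parent": "default", "properties": {"theme": "dark"}},
    }
    print(resolve_profile("alice", profiles))
```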
If a process involves remote data services, which might be updated frequently, the cached generalization results might become outdated. Selecting a specific caching strategy therefore requires careful analysis.
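One possible strategy, shown here only as a sketch, is time-based expiry: a cached result is reused until it is older than a chosen time-to-live and is then recomputed. The class name, the `ttl_seconds` parameter, and the injectable clock are assumptions for illustration, and the right expiry interval depends on how often the remote data actually changes.

```python
import time

class TTLCache:
    """Cache values for a limited time so stale results are eventually refreshed."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the cache can be tested without waiting
        self._store = {}        # key -> (value, time stored)

    def get(self, key, compute):
        """Return the cached value for key, calling compute() if missing or expired."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]
        value = compute()
        self._store[key] = (value, now)
        return value

if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)
    print(cache.get("profile:alice", lambda: "generalized result"))
```

A short time-to-live keeps results fresh at the cost of more recomputation, and a long one does the opposite, which is the trade-off the analysis has to settle.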
Hyperlink Based Personalized Web Search
This paper introduces an approach to personalizing digital multimedia content based on user profile information. Two main mechanisms were developed for this purpose: a profile generator that automatically creates user profiles representing the user’s preferences, and a content-based recommendation algorithm that estimates the user’s interest in unknown content by matching her profile to metadata descriptions of the content. Both features are integrated into a personalization system. Web search involves retrieving information from the structure of hyperlinked web pages, as Google does. Engines of this kind have the following problems: (1) how weight is allocated to each Web page, and (2) hyperlinked Web pages may have related contents that are not considered. The use of personalized PageRank was suggested as a modification to enable personalized Web searches.
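As an illustration of personalized PageRank, the sketch below runs a plain power iteration in which the random jump goes to pages the user prefers rather than to all pages uniformly. The graph, the preference weights, the damping factor of 0.85, and the fixed iteration count are assumptions for this example and are not taken from the project.

```python
def personalized_pagerank(links, preference, damping=0.85, iterations=50):
    """Compute PageRank with a user-specific teleport (preference) vector.

    links: dict mapping page -> list of pages it links to.
    preference: dict mapping page -> non-negative weight of user interest.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    total = sum(preference.get(p, 0.0) for p in pages)
    if total <= 0:
        raise ValueError("preference must give positive weight to some page")
    teleport = {p: preference.get(p, 0.0) / total for p in pages}

    rank = dict(teleport)
    for _ in range(iterations):
        # Every page first receives its share of the personalized random jump.
        new_rank = {p: (1 - damping) * teleport[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links returns its rank to the preferred pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport[p]
        rank = new_rank
    return rank

if __name__ == "__main__":
    links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
    for page, score in sorted(personalized_pagerank(links, {"a": 1.0}).items()):
        print(page, round(score, 4))
```

Changing the preference weights changes the ranking for the same link graph, which is what allows the same index to serve different users differently.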
Personalized web site
A personalized Web site is constructed using the contents of the web pages, the structure of those contents, and the link topologies used in the web pages. Web site personalization takes two forms: link personalization and content personalization. Link personalization deals with the site URLs and the links given in those web pages. Content personalization involves analyzing the content.
Recommender system
It has become increasingly difficult to search for useful information on the Web because the amount of information on the Web continues to grow. As a result, users feel overwhelmed by the number of choices. This is known as “information overload.” As an approach to reducing this overload, recommender systems have emerged in domains such as e-commerce, digital libraries, and knowledge management. These systems provide personalized suggestions based on user preferences.
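A minimal sketch of preference-based suggestion follows, assuming that both the user profile and each item are described by weighted keywords and that cosine similarity is used to compare them. The keyword representation, the similarity measure, and the sample data are assumptions; the project does not name a specific recommendation algorithm.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse keyword-weight dictionaries."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(profile, items, top_n=2):
    """Return the names of the top_n items most similar to the user profile."""
    ranked = sorted(
        items.items(),
        key=lambda entry: cosine_similarity(profile, entry[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_n]]

if __name__ == "__main__":
    # Invented profile and item metadata, for illustration only.
    profile = {"cloud": 1.0, "hadoop": 0.5}
    items = {
        "doc_a": {"cloud": 1.0, "hadoop": 1.0},
        "doc_b": {"cooking": 1.0},
        "doc_c": {"cloud": 1.0},
    }
    print(recommend(profile, items))
```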
Conclusion
This paper presented a client-side privacy protection framework for personalized web search (PWS). The framework could potentially be adopted by any PWS system that captures user profiles in a hierarchical taxonomy. It allowed users to specify customized privacy requirements via the hierarchical profiles. In addition, it performed online generalization on user profiles to protect the users’ data.
Acknowledgement
We would like to acknowledge the guidance of Dr. A.B. Arockia Christopher and thank her for her insightful support and inspiration throughout the various stages of this project. We sincerely appreciate the help and advice she gave, which went a long way in helping us understand the key concepts of this project.