18-03-2014, 11:35 AM
Web-Content Mining
Specifics
The WWW is a huge, widely distributed, global information service centre for:
Information services: news, advertisements, consumer information, financial management, education, government, e-commerce, etc.
Hyper-link information
Access and usage information
The WWW therefore provides rich sources of data for data mining
The Web: Opportunities & Challenges
The amount of information on the Web is huge
The coverage of Web information is very wide and diverse
Information/data of almost all types exists on the Web
Much of the Web information is semi-structured
Much of the Web information is linked
Much of the Web information is redundant
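Because Web pages are semi-structured and linked, a single HTML page yields both content (its text) and structure (its hyperlinks). A minimal sketch using only Python's standard `html.parser` module; the class `PageExtractor`, the helper `extract`, and the sample page are hypothetical illustrations, and a real crawler would also need URL normalization, encoding detection, and more robust handling of malformed markup.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects visible text and outgoing hyperlinks from one HTML page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.text_parts = []  # web content: non-empty text fragments
        self.links = []       # web structure: href targets of <a> tags
        self._skip = 0        # nesting depth inside <script>/<style>, whose text is not content

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            # attrs is a list of (name, value) pairs; value may be None
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Return (page text, list of link targets) for an HTML string."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


page = """<html><head><title>Shop</title><style>p {color: red}</style></head>
<body><h1>Offers</h1><p>Cheap <a href="/laptops">laptops</a> and
<a href="https://example.com/phones">phones</a>.</p></body></html>"""

text, links = extract(page)
print(text)
print(links)
```

The extracted text feeds content mining (classification, clustering, search), while the link list feeds structure mining.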
What is Web Data?
Web data includes:
Web content: text, images, records, etc.
Web structure: hyperlinks, tags, etc.
Web usage: HTTP logs, app server logs, etc.
Intra-page structures
Inter-page structures
Supplemental data
Profiles
Registration information
Cookies
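Web usage data usually starts as server access logs. A minimal sketch, assuming the logs are in the Common Log Format written by many HTTP servers; the regular expression, the function `url_counts_per_visitor`, and the sample lines are hypothetical. Treating one client IP address as one user is a simplification: proxies and shared addresses break it, which is one reason sites also rely on supplemental data such as cookies and registration information.

```python
import re
from collections import Counter, defaultdict

# Common Log Format: host ident authuser [date] "METHOD path PROTOCOL" status bytes
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)')


def url_counts_per_visitor(lines):
    """Map each client host to a Counter of the URLs it successfully requested."""
    visits = defaultdict(Counter)
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip malformed lines
        host, _date, _method, path, status, _size = m.groups()
        if status.startswith("2"):  # keep only successful (2xx) requests
            visits[host][path] += 1
    return visits


log = [
    '10.0.0.1 - - [01/Jan/2014:10:00:00 +0000] "GET /laptops HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2014:10:01:10 +0000] "GET /laptops HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2014:10:02:42 +0000] "GET /stocks HTTP/1.1" 200 2048',
    '10.0.0.2 - - [01/Jan/2014:10:03:01 +0000] "GET /missing HTTP/1.1" 404 -',
]
visits = url_counts_per_visitor(log)
print(dict(visits))
```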
Web Mining
Web Mining is the use of data mining techniques to automatically discover and extract information from web documents/services
Web mining is the application of data mining techniques to find interesting and potentially useful knowledge from web data
Web mining is the application of data mining techniques to extract knowledge from web data, including web documents, hyperlinks between documents, usage logs of web sites, etc.
Why is Web Mining Different?
The Web is a huge collection of documents, but it also carries:
Hyper-link information
Access and usage information
The Web is very dynamic
New pages are constantly being generated
Challenge: Develop new Web mining algorithms and adapt traditional data mining algorithms to
Exploit hyper-links and access patterns
Be incremental
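One way to exploit hyperlinks incrementally is to keep in-link counts and, when a page is added or changes, apply only the difference between its old and new out-links instead of recomputing over the whole collection. A minimal sketch; the class `IncrementalLinkIndex` and its methods are hypothetical, and raw in-degree is only a crude stand-in for link-analysis methods such as PageRank or HITS.

```python
from collections import defaultdict


class IncrementalLinkIndex:
    """Keeps in-link counts current as pages are (re)crawled, without a full rescan."""

    def __init__(self):
        self.out_links = {}                # page -> set of pages it links to
        self.in_degree = defaultdict(int)  # page -> number of distinct pages linking to it

    def update_page(self, page, links):
        """Add a new page or refresh a changed one; only the link difference is applied."""
        new = set(links) - {page}          # ignore self-links
        old = self.out_links.get(page, set())
        for target in old - new:           # links that disappeared
            self.in_degree[target] -= 1
            if self.in_degree[target] == 0:
                del self.in_degree[target]
        for target in new - old:           # links that appeared
            self.in_degree[target] += 1
        self.out_links[page] = new

    def top(self, k):
        """Pages with the most in-links (ties broken alphabetically)."""
        return sorted(self.in_degree.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


index = IncrementalLinkIndex()
index.update_page("a.html", ["b.html", "c.html"])
index.update_page("b.html", ["c.html"])
index.update_page("a.html", ["c.html"])  # a.html changed: it no longer links to b.html
print(index.top(2))
```

Each update costs time proportional to the changed page's links, not to the size of the collection, which is the property the "be incremental" challenge asks for.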
Web Mining Applications
E-commerce (Infrastructure)
Generate user profiles
Targeted advertising
Fraud detection
Similar image retrieval
Information retrieval (Search) on the Web
Automated generation of topic hierarchies
Web knowledge bases
Extraction of schema for XML documents
Network Management
Performance management
Fault management
User Profiling
Important for improving customization
Provide users with pages and advertisements of interest
Example profiles: on-line trader, on-line shopper
Generate user profiles based on their access patterns
Cluster users based on frequently accessed URLs
Use classifier to generate a profile for each cluster
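The two steps above can be sketched as follows, assuming each user is represented by the set of URLs they accessed. The sketch uses plain k-means on binary access vectors for clustering, summarizes each cluster by the URLs that at least half its members visited, and uses a nearest-centroid rule as the classifier that assigns a new user to a profile. All names, the URL list, the 0.5 threshold, and the sample users are hypothetical; the slides do not say which clustering or classification algorithm is used.

```python
URLS = ["/stocks", "/quotes", "/portfolio", "/laptops", "/phones", "/cart"]


def to_vector(visited):
    """Binary access vector: 1.0 if the user requested the URL, else 0.0."""
    return [1.0 if url in visited else 0.0 for url in URLS]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, centroids):
    """Nearest-centroid classifier: index of the closest cluster centre."""
    return min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))


def kmeans(vectors, k, iters=10):
    centroids = [list(v) for v in vectors[:k]]  # deterministic init: first k users
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, centroids)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def profile(centroid, threshold=0.5):
    """URLs accessed by at least `threshold` of the cluster's members."""
    return [url for url, w in zip(URLS, centroid) if w >= threshold]


users = {
    "u1": {"/stocks", "/quotes", "/portfolio"},
    "u2": {"/laptops", "/phones", "/cart"},
    "u3": {"/stocks", "/quotes"},
    "u4": {"/phones", "/cart"},
}
centroids = kmeans([to_vector(v) for v in users.values()], k=2)
profiles = [profile(c) for c in centroids]
new_user = to_vector({"/quotes"})
print(profiles[nearest(new_user, centroids)])
```

Here the first cluster resembles the "on-line trader" profile and the second the "on-line shopper" profile. With real data, choosing k and the initial centroids needs more care than taking the first k users.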
Engage Technologies
Tracks web traffic to create anonymous user profiles of Web surfers
Has profiles for more than 35 million anonymous users