10-05-2014, 10:26 AM
Association Rule Mining for Punjabi Text
Abstract
Association Rule Mining (ARM) is the process of finding important
association rules and relationships in large databases. ARM has
been implemented for various applications, and Market Basket
Analysis is one of its best-known examples. ARM can also support
decision-making processes in business. As a Text Mining technique,
it can be applied to various languages to extract association
rules. This paper describes the proposed methodology for
implementing Association Rule Mining on Punjabi text.
Introduction
Data mining is the process of extracting important patterns from
data; these patterns are helpful in the decision-making process.
Text Mining is one of the research fields in Data Mining and
works on free text. The available Text Mining techniques
include [4]:
• Information extraction
• Topic Tracking
• Summarization
• Clustering
• Categorization
• Question Answering
• Association Rule Mining
This paper is concerned only with the proposed work on one of
these text mining techniques, namely Association Rule Mining. It is
the technique by which important associations within the text are
extracted. One example of ARM is Market Basket Analysis, which
reveals customer purchasing habits so as to increase the sales
of items [11].
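
As an illustration of what a market basket rule measures, the sketch
below computes the two standard ARM quantities, support and confidence,
for a rule of the form "bread -> milk". The transactions, item names,
and function names are hypothetical and are not taken from the paper.

```python
# Hypothetical toy transactions; item names are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

# {bread, milk} appears in 2 of the 4 baskets.
bread_milk_support = support({"bread", "milk"}, transactions)
# Of the 3 baskets containing bread, 2 also contain milk.
bread_to_milk_conf = confidence({"bread"}, {"milk"}, transactions)
```

A rule is usually kept only if both values exceed user-chosen minimum
thresholds.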
Association Rule Mining can also be applied to textual data to
mine association rules. These rules can take a form similar to
grammatical rules, and they are used for efficient searching of
online data [5-6].
Related Work
Association Rule Mining is an active research area of text mining
which helps to fetch important associations from the text.
In [1], Zhou extracted association rules from engineering
documents. The process is divided into two sub-processes: the
first is document structure generation and the second is document
content generation. The Apriori algorithm is used to find the
association rules, yielding structure-structure, structure-item,
and item-item association rules.
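
For readers unfamiliar with Apriori, the following is a minimal generic
sketch of its level-wise search for frequent itemsets. It is not the
implementation used in [1]; the function name, the use of an absolute
support count, and the toy transactions are assumptions made for
illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    `min_support` is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k:
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(frozenset(sub) in current
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical transactions; item names are illustrative only.
demo = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
frequent_itemsets = apriori(demo, min_support=3)
```

With these toy transactions every single item and every pair is
frequent, while the triple {a, b, c} occurs only twice and is dropped.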
In [5], Nazish Asad et al. address the importance of association
rules for the Urdu language. An Urdu Mining Model (UMM) based on
the Apriori algorithm is proposed and used to extract unique
words and phrases from Urdu text.
UMM consists of three steps: a pre-processing phase, creation of
a transactional database, and application of the algorithm to
obtain rules from it. The reported experiments indicate that
Apriori is effective for mining the text database: the number of
association rules decreases as the number of words increases, and
the running time decreases as the minimum support value
increases.
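
The three steps can be sketched roughly as follows. This is not the UMM
of [5]: the stop-word list, the sentence-per-transaction choice, the
simplified pair counting in step 3, and the English sample text are all
assumptions made for illustration. A pipeline for Urdu or Punjabi would
need its own tokenizer and stop-word list.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list, chosen only for illustration.
STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in"}

def preprocess(text):
    """Step 1: split into sentences, lowercase, tokenize, drop stop words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [[w for w in re.findall(r"\w+", s.lower()) if w not in STOP_WORDS]
            for s in sentences]

def to_transactions(token_lists):
    """Step 2: one transaction (set of unique words) per sentence."""
    return [set(tokens) for tokens in token_lists if tokens]

def frequent_pairs(transactions, min_support):
    """Step 3 (simplified): pairs co-occurring in >= min_support sentences."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return {pair: c for pair, c in counts.items() if c >= min_support}

# Hypothetical sample text.
sample = (
    "Data mining finds patterns. "
    "Text mining finds patterns in text. "
    "Data mining helps decisions."
)
transactions = to_transactions(preprocess(sample))
pairs_at_2 = frequent_pairs(transactions, min_support=2)
pairs_at_3 = frequent_pairs(transactions, min_support=3)
```

Raising the minimum support from 2 to 3 leaves no frequent pairs in this
sample, which illustrates why a higher threshold leaves less work for
the later stages.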
In [6], Nazish Asad et al. present a new approach for mining
Urdu text. Because the Apriori algorithm did not work well for
Urdu text, a new algorithm, Transaction Hash Table Apriori
(THT-Apriori), is proposed to extract strong association rules,
and the two approaches are compared. THT-Apriori combines
Multipass with inverted hashing and pruning with the Apriori
algorithm. Hash tables are used to store frequent itemsets and
their frequencies. The algorithm uses the minimum support value
to retain the strong association rules.
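
The idea of keeping itemset frequencies in a hash table and filtering
rules by thresholds can be sketched as below. This is not THT-Apriori
itself: the multipass, inverted hashing, and pruning steps of [6] are
not reproduced, the table here stores counts for all itemsets up to
size 2 rather than only frequent ones, and every name, threshold, and
transaction is hypothetical.

```python
from itertools import combinations

def count_itemsets(transactions, max_size=2):
    """Store each itemset up to `max_size` and its frequency in a dict."""
    table = {}
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                table[key] = table.get(key, 0) + 1
    return table

def strong_rules(table, min_support, min_confidence):
    """Derive rules x -> y from the pair counts held in the table."""
    rules = []
    for itemset, count in table.items():
        if len(itemset) != 2 or count < min_support:
            continue
        for x in itemset:
            (y,) = itemset - {x}
            # confidence(x -> y) = count(x, y) / count(x)
            conf = count / table[frozenset([x])]
            if conf >= min_confidence:
                rules.append((x, y, count, conf))
    return sorted(rules)

# Hypothetical word transactions, e.g. one per sentence.
docs = [
    {"rule", "mining", "text"},
    {"rule", "mining"},
    {"text", "mining"},
    {"rule", "mining", "data"},
]
table = count_itemsets(docs)
rules = strong_rules(table, min_support=2, min_confidence=0.9)
```

Each lookup of an itemset's frequency is a single dictionary access, so
rule generation needs no further pass over the transactions.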
Conclusion
The volume of online documents has increased dramatically due
to the widespread use of the Internet in day-to-day life.
Association Rule Mining, the technique of generating specific
rules from text, is one of the important data mining tasks.
In this paper, a methodology has been proposed to generate
association rules from Punjabi text. Much work has been done on
languages such as Urdu and Chinese for the generation of
association rules, but the Punjabi language has not yet been used
for ARM. The proposed approach aims to generate association
rules for it.