INTRODUCTION
Due to the rapid growth of digital data made available in recent years, knowledge discovery and data mining have attracted a great deal of attention, driven by an imminent need to turn such data into useful information and knowledge. Many applications, such as market analysis and business management, can benefit from the information and knowledge extracted from large amounts of data. Knowledge discovery can be viewed as the process of non-trivial extraction of information from large databases, information that is implicitly present in the data, previously unknown, and potentially useful to users. Data mining is therefore an essential step in the process of knowledge discovery in databases. In the last decade, a significant number of data mining techniques have been presented to perform different knowledge tasks. These techniques include association rule mining, frequent itemset mining, sequential pattern mining, maximum pattern mining, and closed pattern mining. Most of them aim to provide efficient mining algorithms that find particular patterns within a reasonable and acceptable time frame. Given the large number of patterns that data mining methods generate, how to use and update these patterns effectively is still an open research problem.
In this article, we focus on developing a knowledge discovery model that effectively uses and updates the discovered patterns, and we apply it to the field of text mining. Text mining is the discovery of interesting knowledge in text documents. It is a challenge to find accurate knowledge (or features) in text documents to help users find what they want. In the beginning, Information Retrieval (IR) provided many term-based methods to address this challenge, such as Rocchio and probabilistic models, rough set models, BM25, and support vector machine (SVM) based filtering models.
The advantages of term-based methods include efficient computational performance as well as mature theories for term weighting, which have emerged over the last couple of decades from the IR and machine learning communities. However, term-based methods suffer from the problems of polysemy and synonymy, where polysemy means that a word has several meanings, and synonymy means that multiple words have the same meaning. As a result, the semantic meaning of many discovered terms is too uncertain to answer what users want. Over the years, people have often held the hypothesis that phrase-based approaches could perform better than term-based ones, since phrases may carry more "semantic" information. This hypothesis has not fared well in the history of IR. Although phrases are less ambiguous and more discriminative than individual terms, the likely reasons for their discouraging performance include:
1) phrases have inferior statistical properties to terms,
2) they have a low frequency of occurrence, and
3) there are large numbers of redundant and noisy phrases among them.
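The second reason can be illustrated with a small sketch. The corpus below is invented purely for illustration, two-word phrases (bigrams) stand in for phrases in general, and the helper names `ngrams` and `mean_frequency` are hypothetical; it is not the model developed in this article. It compares how often single terms and bigrams recur in the same text.

```python
from collections import Counter

# Toy corpus, invented for illustration only (not data from the article).
docs = [
    "data mining finds patterns in data",
    "text mining finds patterns in text documents",
    "pattern mining in text data",
]


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def mean_frequency(counts):
    """Average number of occurrences per distinct item."""
    return sum(counts.values()) / len(counts)


def singleton_share(counts):
    """Fraction of distinct items that occur exactly once."""
    return sum(1 for c in counts.values() if c == 1) / len(counts)


term_counts = Counter()
phrase_counts = Counter()
for doc in docs:
    tokens = doc.split()
    term_counts.update(tokens)               # single terms
    phrase_counts.update(ngrams(tokens, 2))  # two-word phrases

print("distinct terms:", len(term_counts),
      "mean frequency:", round(mean_frequency(term_counts), 2),
      "singleton share:", round(singleton_share(term_counts), 2))
print("distinct phrases:", len(phrase_counts),
      "mean frequency:", round(mean_frequency(phrase_counts), 2),
      "singleton share:", round(singleton_share(phrase_counts), 2))
```

Even on this tiny corpus there are more distinct phrases than distinct terms, each phrase recurs less often on average, and a larger share of phrases occurs only once. Sparse counts of this kind are what make phrase statistics harder to estimate reliably than term statistics.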