
Incremental Learning of Chunk Data for Online Pattern Classification Systems



ABSTRACT

This paper presents a pattern classification system in which feature extraction and classifier learning are carried out simultaneously, not only online but also in one pass, where training samples are presented only once. For this purpose, we have extended incremental principal component analysis (IPCA) and effectively combined it with several classifier models. However, this approach has the drawback that training samples must be learned one by one due to a limitation of IPCA. To overcome this problem, we propose another extension of IPCA, called chunk IPCA, in which a chunk of training samples is processed at a time. In the experiments, we evaluate the classification performance on several large-scale data sets to discuss the scalability of chunk IPCA under one-pass incremental learning environments. The experimental results suggest that chunk IPCA can reduce the training time effectively compared with IPCA unless the number of input attributes is too large. We study the influence of the size of the initial training data and the size of the given chunk data on classification accuracy and learning time. We also show that chunk IPCA can obtain the major eigenvectors with fairly good approximation.
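As a rough illustration only (not the chunk IPCA algorithm proposed in the paper), the sketch below uses scikit-learn's IncrementalPCA, which likewise updates an eigenspace from chunks of samples presented once; the data, chunk size, and number of components are made up for the example.

```python
# Minimal sketch: chunk-wise incremental PCA with scikit-learn's IncrementalPCA.
# This is NOT the paper's chunk IPCA algorithm; it only illustrates updating an
# eigenspace from chunks of training samples presented once (one-pass).
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 50))           # hypothetical data stream
chunk_size = 500                            # size of each chunk

ipca = IncrementalPCA(n_components=10)
for start in range(0, len(X), chunk_size):
    chunk = X[start:start + chunk_size]     # a chunk of training samples
    ipca.partial_fit(chunk)                 # update eigenvectors/eigenvalues

# Project new samples onto the learned eigenspace
Z = ipca.transform(X[:5])
print(Z.shape)                              # (5, 10)
```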

About the company

Honeypot IT Consulting Private Limited is an internationally established software development company with offices in the USA and India. The Indian office and development centre is located in a state-of-the-art facility in the heart of Hyderabad, India. Honeypot IT works in software product development and enterprise consulting services. Honeypot IT's service offerings cater to multiple industry domains and verticals, following a full or partial SDLC, optimally customized to specific client needs. Honeypot also provides human resources for the different areas of a software development life cycle.

Vision

Honeypot IT Solutions is an interactive and technology solutions provider. We help clients plan their online strategy, budget their technology investments, integrate critical applications, and implement projects that achieve business objectives and improve performance. Since our inception we have focused on building dynamic, user-focused web sites, intranets, and extranets, supported by interactive marketing and e-communication campaigns.

ABOUT THE PROJECT

In many real-world applications such as pattern recognition, data mining, and time-series prediction, we often confront situations where a complete set of training samples is not available when constructing a system. In face recognition, for example, since human faces have large variations due to expressions, lighting conditions, makeup, hairstyles, and so forth, it is hard to account for all variations of a face in advance. In many cases, training samples are provided only when the system misclassifies objects; hence the system is trained online to improve its classification performance. On the other hand, in many practical applications of data mining and time-series prediction, the data are generally provided little by little and the properties of the data source may change over time. Therefore, the learning of a system must also be conducted sequentially in an online fashion. This type of learning is called incremental learning or continuous learning, and it has recently received great attention in many practical applications. In pattern recognition and data mining, input data often have a large set of attributes.
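To make the one-pass, online setting concrete, here is a minimal sketch of a classifier updated chunk by chunk with scikit-learn's SGDClassifier. This is a generic stand-in, not the classifier models combined with IPCA in the paper, and the data stream is synthetic.

```python
# Minimal sketch of one-pass online learning: training chunks arrive over time
# and each sample is seen only once. SGDClassifier here is an illustrative
# stand-in for the paper's classifier models.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)
classes = np.array([0, 1])                  # all class labels known in advance
clf = SGDClassifier()

for _ in range(100):                        # 100 incoming chunks
    X_chunk = rng.normal(size=(20, 5))      # hypothetical feature vectors
    y_chunk = (X_chunk[:, 0] > 0).astype(int)
    clf.partial_fit(X_chunk, y_chunk, classes=classes)   # incremental update

print(clf.predict(rng.normal(size=(3, 5))))
```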

Literature Review

Feature Extraction

When the input data to an algorithm are too large to be processed and are suspected to be highly redundant (much data, but not much information), the input data are transformed into a reduced representation set of features (also called a feature vector). Transforming the input data into this set of features is called feature extraction. If the features are carefully chosen, the feature set is expected to capture the relevant information in the input data, so that the desired task can be performed using this reduced representation instead of the full-size input. Feature extraction thus reduces the resources required to describe a large set of data accurately. When analyzing complex data, one of the major problems stems from the number of variables involved. Analysis with a large number of variables generally requires a large amount of memory and computation power, or a classification algorithm that overfits the training sample and generalizes poorly to new samples. Feature extraction is a general term for methods of constructing combinations of the variables that get around these problems while still describing the data with sufficient accuracy.
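A minimal sketch of the idea, assuming raw fixed-length signals as input: each high-dimensional signal is replaced by a small hand-chosen feature vector. The particular features below are illustrative, not prescribed by the text.

```python
# Minimal sketch of feature extraction: summarize each raw 1-D signal by a
# compact feature vector instead of feeding the full signal to a classifier.
import numpy as np

def extract_features(signal: np.ndarray) -> np.ndarray:
    """Map a raw 1-D signal to a small feature vector (illustrative features)."""
    return np.array([
        signal.mean(),                  # average level
        signal.std(),                   # spread
        signal.max() - signal.min(),    # dynamic range
        np.abs(np.diff(signal)).mean()  # average local variation
    ])

rng = np.random.default_rng(2)
raw = rng.normal(size=(100, 1000))           # 100 signals, 1000 samples each
features = np.vstack([extract_features(s) for s in raw])
print(features.shape)                        # (100, 4): reduced representation
```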

Principal component analysis (PCA)

Principal component analysis (PCA) is a vector space transform often used to reduce multidimensional data sets to lower dimensions for analysis. Depending on the field of application, it is also named the discrete Karhunen-Loève transform (KLT), the Hotelling transform, or proper orthogonal decomposition (POD). PCA was invented in 1901 by Karl Pearson. It is now mostly used as a tool in exploratory data analysis and for making predictive models. PCA involves the eigenvalue decomposition of a data covariance matrix or the singular value decomposition of a data matrix, usually after mean-centering the data for each attribute. The results of a PCA are usually discussed in terms of component scores and loadings (Shaw, 2003). PCA is the simplest of the true eigenvector-based multivariate analyses. Its operation can often be thought of as revealing the internal structure of the data in a way which best explains the variance in the data.
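The description above maps directly onto a few lines of NumPy: mean-center the data, take the eigenvalue decomposition of the covariance matrix, and read off loadings and component scores. The data here are synthetic and the number of retained components is an arbitrary illustrative choice.

```python
# Minimal sketch of PCA via eigenvalue decomposition of the covariance matrix.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))

X_centered = X - X.mean(axis=0)             # mean-center each attribute
cov = np.cov(X_centered, rowvar=False)      # data covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalue decomposition

order = np.argsort(eigvals)[::-1]           # sort by explained variance
components = eigvecs[:, order[:3]]          # top-3 principal axes (loadings)
scores = X_centered @ components            # component scores
print(scores.shape)                         # (200, 3)
```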

Semidefinite Embedding (SDE) or Maximum Variance Unfolding (MVU)

Semidefinite embedding (SDE), or maximum variance unfolding (MVU), is an algorithm that uses semidefinite programming to perform non-linear dimensionality reduction of high-dimensional vectorial input data. Non-linear dimensionality reduction algorithms attempt to map high-dimensional data onto a low-dimensional Euclidean vector space. Maximum variance unfolding is a member of the manifold learning family, which also includes algorithms such as Isomap and locally linear embedding. In manifold learning, the input data are assumed to be sampled from a low-dimensional manifold that is embedded inside a higher-dimensional vector space. The main intuition behind MVU is to exploit the local linearity of manifolds and create a mapping that preserves local neighborhoods at every point of the underlying manifold.
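A toy sketch of the MVU optimization, assuming the cvxpy package for the semidefinite program: the Gram matrix of the embedding is maximized in trace subject to centering and to preserving squared distances between k-nearest neighbors, and the embedding is recovered from its top eigenvectors. The spiral data, neighborhood size, and solver defaults are illustrative choices, and this direct formulation only scales to small data sets.

```python
# Toy sketch of maximum variance unfolding (MVU) as a semidefinite program.
import numpy as np
import cvxpy as cp

t = np.linspace(0, 3 * np.pi, 20)
X = np.column_stack([np.cos(t), np.sin(t), 0.3 * t])   # points along a 3-D spiral
n, k = len(X), 4

D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
neighbors = np.argsort(D, axis=1)[:, 1:k + 1]           # k nearest neighbors per point

K = cp.Variable((n, n), PSD=True)                       # Gram matrix of the embedding
constraints = [cp.sum(K) == 0]                          # center the embedding
for i in range(n):
    for j in neighbors[i]:
        # preserve squared distances to local neighbors
        constraints.append(K[i, i] + K[j, j] - 2 * K[i, j] == D[i, j] ** 2)

cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()

eigvals, eigvecs = np.linalg.eigh(K.value)
Y = eigvecs[:, -2:] * np.sqrt(np.maximum(eigvals[-2:], 0))  # 2-D unfolded embedding
print(Y.shape)                                          # (20, 2)
```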

Multifactor dimensionality reduction (MDR)

Multifactor dimensionality reduction (MDR) is a data mining approach for detecting and characterizing combinations of attributes or independent variables that interact to influence a dependent or class variable. MDR was designed specifically to identify interactions among discrete variables that influence a binary outcome and is considered a nonparametric alternative to traditional statistical methods such as logistic regression.
The basis of the MDR method is a constructive induction algorithm that converts two or more variables or attributes into a single attribute. This process of constructing a new attribute changes the representation space of the data. The end goal is to create or discover a representation that facilitates the detection of nonlinear or nonadditive interactions among the attributes, such that prediction of the class variable is improved over that of the original representation of the data.
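A minimal sketch of the constructive-induction step, assuming two discrete attributes and a binary class label: each combination of attribute values is pooled into a "high-risk" or "low-risk" group by its case/control ratio, yielding a single new binary attribute. The data and threshold rule here are illustrative.

```python
# Minimal sketch of MDR-style constructive induction on two discrete attributes.
import numpy as np

rng = np.random.default_rng(5)
a = rng.integers(0, 3, size=500)             # first discrete attribute (0/1/2)
b = rng.integers(0, 3, size=500)             # second discrete attribute (0/1/2)
y = ((a == 1) & (b == 2)).astype(int)        # toy binary outcome with an interaction

threshold = y.mean() / (1 - y.mean())        # overall case/control ratio
new_attr = np.zeros_like(y)
for va in np.unique(a):
    for vb in np.unique(b):
        cell = (a == va) & (b == vb)         # one combination of attribute values
        cases, controls = y[cell].sum(), (1 - y[cell]).sum()
        if cases / max(controls, 1) > threshold:
            new_attr[cell] = 1               # label the cell "high-risk"

print(new_attr[:10])                         # the constructed single attribute
```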

Nonlinear dimensionality reduction

High-dimensional data, meaning data that require more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualized in the low-dimensional space. Several important algorithms from the history of manifold learning and nonlinear dimensionality reduction were summarized above, and many of these non-linear methods are related to linear methods such as PCA. The non-linear methods can be broadly classified into two groups: those which actually provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those which merely give a visualization of the data.
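For a concrete example, the sketch below applies two of the manifold-learning methods mentioned above, Isomap and locally linear embedding, to the classic swiss-roll data set via scikit-learn; it is an illustration, not part of the original text.

```python
# Minimal sketch: nonlinear dimensionality reduction of a 3-D swiss roll to 2-D.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)  # 3-D points on a 2-D manifold

iso = Isomap(n_neighbors=10, n_components=2)
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)

X_iso = iso.fit_transform(X)                 # geodesic-distance-based embedding
X_lle = lle.fit_transform(X)                 # locally linear embedding
print(X_iso.shape, X_lle.shape)              # (1000, 2) (1000, 2)
```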