22-08-2012, 02:25 PM
Mathematical Handwriting Recognition with a Neural Network and Calculation
Abstract
The goal of this project was to create a software system that recognizes handwritten
mathematical expressions and computes the answer. No special syntax or formatting was to be
required for these expressions, since a major goal of this system was for users to be able to use
it without having to learn anything new. Support was desired for algebraic expressions,
integrals, and summations. The Java programming language was chosen for this project because
Java programs can run on many different operating systems and architectures without recompiling
the source code. A graphical front end was also desired to make the system
more user-friendly.
Introduction
Handwriting recognition can be done in two ways. The first is on-line recognition, which
examines the characters as the user draws them. This method is the simpler of the two, since the
system only deals with one character at a time. An example of this method is character recognition on
a personal digital assistant (PDA). The second type is off-line recognition, in which the
system must look at an entire group of characters instead of just one at a time. An example of this is
optical character recognition (OCR) software for scanners. This system uses off-line character
recognition. Once the system has broken a picture into its individual characters, a neural network
identifies each character. Next, these characters, together with information about
their locations, are sent to the scanner (the lexical-analysis stage, not an image scanner). The scanner rebuilds the individual characters into
numbers and determines which symbol goes to the parser next. In some cases, the scanner must
also insert additional characters. The parser then requests one token at a time from the scanner and
calculates the value of the expression. Finally, a pop-up window displays the calculated answer.
Images
In this system, images can be input in two ways. In either case, images must be
grayscale. Support may eventually be added for color images, but this was not
considered important for the initial version of the system. The first input method is loading a
bitmap file. This functionality was included for several reasons. First, since
bitmap files store uncompressed pixel data, no external libraries were required, and converting the
file into the data structure used by this system was much simpler. Second, for testing the system, it is
much easier to send it a list of bitmap images to calculate than to use the graphical user interface
(GUI) of the system to draw test equations repeatedly. Finally, a future goal of the system is to allow
users to load pictures from a scanner, and being able to handle image files will make this much
easier to add. The system currently does not support loading pictures from a scanner because
scanned images typically contain a lot of noise; in tests, this noise caused problems when
breaking up the image into individual characters.
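The report does not say how the image is broken into individual characters. The sketch below assumes one common approach, labeling 8-connected groups of dark pixels, and returns one bounding box per group. The class name `Segmenter`, the `Box` record, and the `threshold` parameter (pixels darker than it count as ink) are hypothetical. The sketch also illustrates why noise is a problem for this kind of segmentation: every isolated speck of dark pixels becomes its own "character".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class Segmenter {
    /** Bounding box of one connected blob of ink (inclusive pixel coordinates). */
    record Box(int minX, int minY, int maxX, int maxY) {}

    /**
     * Hypothetical segmentation step: labels 8-connected components of pixels
     * darker than threshold in a grayscale image (0 = black, 255 = white) and
     * returns one bounding box per component. Multi-part symbols such as '='
     * come out as several boxes and would need to be merged afterwards.
     */
    static List<Box> segment(int[][] gray, int threshold) {
        int h = gray.length;
        int w = gray[0].length;
        boolean[][] seen = new boolean[h][w];
        List<Box> boxes = new ArrayList<>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (seen[y][x] || gray[y][x] >= threshold) {
                    continue;  // background pixel or already labeled
                }
                // Iterative flood fill of this component, tracking its extent.
                int minX = x, minY = y, maxX = x, maxY = y;
                Deque<int[]> stack = new ArrayDeque<>();
                stack.push(new int[] {x, y});
                seen[y][x] = true;
                while (!stack.isEmpty()) {
                    int[] p = stack.pop();
                    minX = Math.min(minX, p[0]);
                    maxX = Math.max(maxX, p[0]);
                    minY = Math.min(minY, p[1]);
                    maxY = Math.max(maxY, p[1]);
                    // Visit all 8 neighbors that are ink and not yet labeled.
                    for (int dy = -1; dy <= 1; dy++) {
                        for (int dx = -1; dx <= 1; dx++) {
                            int nx = p[0] + dx;
                            int ny = p[1] + dy;
                            if (nx >= 0 && ny >= 0 && nx < w && ny < h
                                    && !seen[ny][nx] && gray[ny][nx] < threshold) {
                                seen[ny][nx] = true;
                                stack.push(new int[] {nx, ny});
                            }
                        }
                    }
                }
                boxes.add(new Box(minX, minY, maxX, maxY));
            }
        }
        return boxes;
    }
}
```

In a 5×10 white image containing a vertical stroke, a short dash, and a single stray dark pixel, `segment(img, 128)` returns three boxes; the stray pixel is treated exactly like a real character.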
Neural Network
The neural network used to recognize individual characters is a feed-forward neural
network with four layers. The first (input) layer contains 100 nodes, one for each input pixel. The
output layer contains one output for each character that the system can recognize. The value of
each input pixel is sent into the corresponding node of the input layer. Each node in the input
layer passes its input value through an activation function, in this case the logistic sigmoid function f(x) = 1/(1 + e^(-x)). The
output of this function is sent to each node in the next layer. However, the output is not sent directly;
it is first multiplied by the weight of the connection to each node in the next layer. Each node in the
next layer sums all of the weighted signals it receives and passes this sum through its activation function. This process
repeats layer by layer until the final output vector of the network is produced.
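A minimal sketch of the forward pass described above, in Java since that is the project's language. As in the description, there are no bias terms. The hidden-layer sizes, the number of outputs, the random (untrained) weights, and the rule of picking the strongest output as the recognized character are assumptions for illustration; the report does not specify them, and training is not shown. The class and method names are hypothetical.

```java
import java.util.Random;

public class FeedForwardNetwork {
    // weights[l][j][i] is the weight on the signal from node i of layer l
    // to node j of layer l + 1; four layers therefore need three matrices.
    private final double[][][] weights;

    FeedForwardNetwork(double[][][] weights) {
        this.weights = weights;
    }

    /** Builds an untrained network with uniform random weights in [-1, 1). */
    static FeedForwardNetwork random(int[] layerSizes, long seed) {
        Random rng = new Random(seed);
        double[][][] w = new double[layerSizes.length - 1][][];
        for (int l = 0; l < w.length; l++) {
            w[l] = new double[layerSizes[l + 1]][layerSizes[l]];
            for (double[] row : w[l]) {
                for (int i = 0; i < row.length; i++) {
                    row[i] = rng.nextDouble() * 2.0 - 1.0;
                }
            }
        }
        return new FeedForwardNetwork(w);
    }

    /** Logistic sigmoid activation, 1 / (1 + e^-x). */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Propagates one pixel vector through every layer and returns the output vector. */
    double[] forward(double[] pixels) {
        // Input layer: each node passes its pixel value through the activation.
        double[] activations = new double[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            activations[i] = sigmoid(pixels[i]);
        }
        // Each later node sums its weighted inputs and applies the activation.
        for (double[][] layer : weights) {
            double[] next = new double[layer.length];
            for (int j = 0; j < layer.length; j++) {
                double sum = 0.0;
                for (int i = 0; i < activations.length; i++) {
                    sum += layer[j][i] * activations[i];
                }
                next[j] = sigmoid(sum);
            }
            activations = next;
        }
        return activations;
    }

    /** Index of the strongest output, taken as the recognized character (assumed rule). */
    int classify(double[] pixels) {
        double[] out = forward(pixels);
        int best = 0;
        for (int k = 1; k < out.length; k++) {
            if (out[k] > out[best]) {
                best = k;
            }
        }
        return best;
    }
}
```

For example, `FeedForwardNetwork.random(new int[] {100, 30, 30, 12}, 42L)` builds an untrained four-layer network with 100 inputs and 12 outputs; the 30-node hidden layers and 12 outputs are placeholder sizes.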
Scanner
The scanner for this project works quite differently from the scanner of a programming-language
compiler. In a compiler, the next character in the sequence is simply the next character in the file; in this
implementation, however, the next character is not necessarily known. When a user draws an equation, the
system breaks the image into individual characters and has the neural network recognize each of
them. Once each character is recognized and its location information is
stored, this information is sent to the scanner. The scanner turns this information into tokens, which are
then sent to the parser. Digits (0-9) and decimal points (.) must be combined to form the numbers they
constitute. For example, if the user writes the number 10.4, the system sees each character
separately and must determine that these four individual characters make up the real number 10.4.
Also, when two adjacent terms are implicitly multiplied, as in 3x, the scanner must insert a multiplication
symbol between them. Similarly, when the system encounters a power, it must insert the ^ symbol
so that the parser knows it has reached a power. When series or integrals are found, the system must
look for their bounds instead of simply taking the next character. The system must also determine which
character or group of characters is the next to be sent to the parser.
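The following hypothetical sketch shows two of the scanner's jobs: merging digits and decimal points into numbers, and inserting `*` and `^`. It assumes each recognized character arrives as a `Glyph` with a bounding box in image coordinates (y grows downward), that characters are read left to right, and that a character whose bottom edge lies above the vertical center of the previous main-line character is an exponent. These rules and all names are assumptions, not the project's code; the sketch handles only single-token exponents and single-letter variables, and it does not look for the bounds of series or integrals.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ExpressionScanner {
    /** A recognized character and its bounding box (image coordinates, y grows downward). */
    record Glyph(char symbol, int x, int y, int width, int height) {
        int bottom() { return y + height; }
        double centerY() { return y + height / 2.0; }
    }

    private static boolean isNumberChar(char c) {
        return Character.isDigit(c) || c == '.';
    }

    /** True for tokens that can end an operand: numbers, variables, ')'. */
    private static boolean endsOperand(String t) {
        char c = t.charAt(t.length() - 1);
        return Character.isLetterOrDigit(c) || c == '.' || c == ')';
    }

    /** True for tokens that can start an operand: numbers, variables, '('. */
    private static boolean startsOperand(String t) {
        char c = t.charAt(0);
        return Character.isLetterOrDigit(c) || c == '.' || c == '(';
    }

    static List<String> scan(List<Glyph> glyphs) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingInt(Glyph::x));  // read left to right

        // Pass 1: mark glyphs raised above the most recent main-line glyph as exponents.
        boolean[] raised = new boolean[sorted.size()];
        Glyph base = null;
        for (int i = 0; i < sorted.size(); i++) {
            Glyph g = sorted.get(i);
            raised[i] = base != null && g.bottom() <= base.centerY();
            if (!raised[i]) {
                base = g;
            }
        }

        // Pass 2: build tokens, inserting '^' and implicit '*' where needed.
        List<String> tokens = new ArrayList<>();
        boolean prevRaised = false;
        int i = 0;
        while (i < sorted.size()) {
            // Merge a run of digits and decimal points on the same level into one number.
            int j = i + 1;
            if (isNumberChar(sorted.get(i).symbol())) {
                while (j < sorted.size() && isNumberChar(sorted.get(j).symbol())
                        && raised[j] == raised[i]) {
                    j++;
                }
            }
            StringBuilder text = new StringBuilder();
            for (int k = i; k < j; k++) {
                text.append(sorted.get(k).symbol());
            }
            String token = text.toString();

            if (raised[i] && !prevRaised) {
                tokens.add("^");  // stepping up to a superscript starts a power
            } else if (!tokens.isEmpty() && endsOperand(tokens.get(tokens.size() - 1))
                    && startsOperand(token)) {
                tokens.add("*");  // adjacent operands such as 3x imply multiplication
            }
            tokens.add(token);
            prevRaised = raised[i];
            i = j;
        }
        return tokens;
    }
}
```

With glyphs for 2, x, a raised 2, +, and 1, `scan` yields the tokens `2 * x ^ 2 + 1`, and the four glyphs of 10.4 yield the single token `10.4`.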
Parser
The parser takes the tokens given by the scanner and calculates the result. The parser must
ensure that operations are performed in the correct order and that operations requiring numerical
methods are calculated accurately. Integrals are calculated with the trapezoidal method: the number
of trapezoids is increased until the estimated error is below 0.00001 or until 100 trapezoids have been used.
Since multiple integrals require intensive computation, these stopping criteria are relaxed for them in
order to return a result in a reasonable time.
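A minimal sketch of the integration step, assuming that the number of trapezoids doubles on each refinement and that the error is estimated as |T(2n) - T(n)| / 3, a standard estimate for the trapezoidal rule. The report does not say how the count is increased or how the error is estimated; with doubling, the 100-trapezoid cap means at most 64 trapezoids are actually used. The class and method names are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

public class TrapezoidIntegrator {
    // Stopping rule from the text: estimated error below 0.00001, or at most 100 trapezoids.
    static final double TOLERANCE = 0.00001;
    static final int MAX_TRAPEZOIDS = 100;

    /**
     * Composite trapezoidal rule on [a, b]. The number of trapezoids doubles
     * (1, 2, 4, ...) until the estimate |T(2n) - T(n)| / 3 drops below tol
     * or the next doubling would exceed maxTrapezoids.
     */
    static double integrate(DoubleUnaryOperator f, double a, double b,
                            double tol, int maxTrapezoids) {
        int n = 1;
        double h = b - a;  // width of each trapezoid
        double estimate = h * (f.applyAsDouble(a) + f.applyAsDouble(b)) / 2.0;  // T(1)
        while (2 * n <= maxTrapezoids) {
            // T(2n) = T(n)/2 + (h/2) * (sum of f at the midpoints of the n current
            // intervals), so each refinement only evaluates f at the new points.
            double midpointSum = 0.0;
            for (int i = 0; i < n; i++) {
                midpointSum += f.applyAsDouble(a + (i + 0.5) * h);
            }
            double refined = estimate / 2.0 + (h / 2.0) * midpointSum;
            double errorEstimate = Math.abs(refined - estimate) / 3.0;
            estimate = refined;
            n *= 2;
            h /= 2.0;
            if (errorEstimate < tol) {
                break;
            }
        }
        return estimate;
    }

    /** Uses the default stopping rule for single integrals. */
    static double integrate(DoubleUnaryOperator f, double a, double b) {
        return integrate(f, a, b, TOLERANCE, MAX_TRAPEZOIDS);
    }
}
```

For example, `TrapezoidIntegrator.integrate(x -> x * x, 0, 1)` returns a value within about 0.00005 of 1/3. The relaxed criteria for multiple integrals could be expressed by calling the five-argument overload with a larger `tol` or a smaller `maxTrapezoids`; the report does not give the relaxed values.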