Developing proprietary OCR system

**seminar ideas** · 26-07-2012, 03:38 PM


Developing a proprietary OCR system is a complicated task that requires a lot of effort. Such systems are usually very complex and can hide a great deal of logic behind the code. Using an artificial neural network in an OCR application can dramatically simplify the code and improve recognition quality while still achieving good performance. Another benefit of using a neural network in OCR is the extensibility of the system: the ability to recognize more character sets than were initially defined. Most traditional OCR systems are not extensible enough. Why? Because a task such as handling tens of thousands of Chinese characters is far harder than handling the 68-character typed English set, and it can easily bring a traditional system to its knees!
Well, the Artificial Neural Network (ANN) is a wonderful tool that can help resolve this kind of problem. An ANN is an information-processing paradigm inspired by the way the human brain processes information. Artificial neural networks are collections of mathematical models that represent some of the observed properties of biological nervous systems and draw on analogies with adaptive biological learning. A key element of an ANN is its topology: it consists of a large number of highly interconnected processing elements (nodes) tied together by weighted connections (links). Learning in biological systems involves adjustments to the synaptic connections between neurons, and the same is true of ANNs. Learning typically occurs by example through training, that is, exposure to a set of input/output pairs (patterns), during which the training algorithm adjusts the link weights. The link weights store the knowledge needed to solve a specific problem.
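To make the ideas of nodes, weighted links, and learning by example concrete, here is a minimal sketch of a single-layer perceptron that learns to label tiny character bitmaps. The 5x3 glyphs, the class names, and the use of the classic multi-class perceptron rule are illustrative assumptions, not the design of any particular OCR product; a practical system would use larger images, hidden layers, and many training samples per character.

```python
from typing import Dict, List, Tuple

# Hypothetical 5x3 glyph bitmaps ('#' = ink, '.' = paper); not from any real font.
GLYPHS: Dict[str, List[str]] = {
    "I": [".#.", ".#.", ".#.", ".#.", ".#."],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
}

Sample = Tuple[List[int], str]


def to_vector(rows: List[str]) -> List[int]:
    """Flatten a bitmap to 0/1 inputs and append a constant bias input of 1."""
    return [1 if ch == "#" else 0 for row in rows for ch in row] + [1]


class Perceptron:
    """Single-layer network: one output node per character class.

    Every output node is linked to every input by a weighted link;
    training changes only these weights.
    """

    def __init__(self, labels: List[str], n_inputs: int) -> None:
        self.labels = list(labels)
        self.weights: Dict[str, List[int]] = {lab: [0] * n_inputs for lab in self.labels}

    def score(self, label: str, x: List[int]) -> int:
        return sum(w * xi for w, xi in zip(self.weights[label], x))

    def predict(self, x: List[int]) -> str:
        # The output node with the highest weighted sum wins (ties: first label).
        return max(self.labels, key=lambda lab: self.score(lab, x))

    def train(self, samples: List[Sample], max_epochs: int = 10_000) -> int:
        """Perceptron rule: on each mistake, move the correct class's weights
        toward the input and the wrongly chosen class's weights away from it.
        Returns the number of epochs used."""
        for epoch in range(1, max_epochs + 1):
            mistakes = 0
            for x, target in samples:
                guess = self.predict(x)
                if guess != target:
                    mistakes += 1
                    self.weights[target] = [w + xi for w, xi in zip(self.weights[target], x)]
                    self.weights[guess] = [w - xi for w, xi in zip(self.weights[guess], x)]
            if mistakes == 0:
                return epoch
        return max_epochs


if __name__ == "__main__":
    samples = [(to_vector(rows), label) for label, rows in GLYPHS.items()]
    net = Perceptron(list(GLYPHS), n_inputs=len(samples[0][0]))
    epochs = net.train(samples)
    print(f"converged after {epochs} epoch(s)")
    for label, rows in GLYPHS.items():
        # Once training converges, each glyph maps to its own label.
        print(label, "->", net.predict(to_vector(rows)))
```

Because the training set here is linearly separable, the perceptron rule is guaranteed to stop making mistakes after a finite number of updates. Supporting another character amounts to adding one more entry to `GLYPHS` (one more output node) and retraining, which mirrors the extensibility argument made earlier.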

Typewritten OCR

The accurate recognition of Latin-script typewritten text is now considered largely a solved problem.

Recognition of hand printing, cursive handwriting, and even the printed versions of some other scripts (especially those with a very large number of characters) is still the subject of active research.

Hand print OCR

Systems for recognizing hand-printed text on the fly have enjoyed commercial success in recent years. Among these are the input methods of personal digital assistants (PDAs) such as those running Palm OS; the Apple Newton pioneered this technology. The algorithms used in these devices take advantage of the fact that the order, speed, and direction of individual line segments are known as they are entered. Users can also be trained to use only specific letter shapes. These methods cannot be used in software that scans paper documents, so accurate recognition of hand-printed documents remains largely an open problem. Accuracy rates of 80% to 90% can be achieved on neat, clean hand-printed characters, but even that translates to dozens of errors per page, making the technology useful only in very limited contexts. This variety of OCR is now commonly known in the industry as "ICR" (intelligent character recognition).
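As a rough illustration of why knowing stroke order and direction helps, the sketch below encodes a single pen stroke as a string of dominant directions and matches it against per-character templates by edit distance. The templates, the four-direction quantization, and the `min_len` jitter threshold are assumptions made for illustration; this is not how Palm OS or the Newton actually implemented recognition. A scanned page offers only the finished ink, so none of this ordering information is available there.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) with y increasing upward

# Hypothetical single-stroke templates: the sequence of dominant pen
# directions (R/L/U/D) a user is taught to draw for each character.
TEMPLATES: Dict[str, str] = {
    "I": "D",
    "L": "DR",
    "7": "RD",
    "U": "DRU",
    "C": "LDR",
}


def direction(p: Point, q: Point) -> str:
    """Quantize the segment p->q to its dominant axis direction."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    if abs(dx) >= abs(dy):
        return "R" if dx > 0 else "L"
    return "U" if dy > 0 else "D"


def encode(stroke: List[Point], min_len: float = 1.0) -> str:
    """Turn ordered pen samples into a direction string such as 'DR'."""
    codes: List[str] = []
    for p, q in zip(stroke, stroke[1:]):
        if math.hypot(q[0] - p[0], q[1] - p[1]) < min_len:
            continue  # ignore tiny jitter between consecutive samples
        d = direction(p, q)
        if not codes or codes[-1] != d:
            codes.append(d)  # collapse runs of the same direction
    return "".join(codes)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two direction strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def recognize(stroke: List[Point]) -> str:
    """Return the template character closest to the stroke's direction code."""
    code = encode(stroke)
    return min(TEMPLATES, key=lambda ch: edit_distance(code, TEMPLATES[ch]))


if __name__ == "__main__":
    # An 'L' drawn top to bottom, then left to right, with a little jitter.
    stroke = [(0, 10), (0.2, 9.9), (0, 5), (0, 0), (5, 0)]
    print(encode(stroke), recognize(stroke))  # DR L
```

Restricting users to fixed letter shapes, as the paragraph notes, is what keeps such a template table small and unambiguous.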

Cursive OCR

Recognition of cursive text is an active area of research, with recognition rates even lower than those for hand-printed text. Higher recognition rates for general cursive script will likely not be possible without contextual or grammatical information. For example, recognizing entire words from a dictionary is easier than trying to parse individual characters from script. Reading the amount line of a cheque (which is always a written-out number) is an example where using a smaller dictionary can increase recognition rates greatly. Knowledge of the grammar of the language being scanned can also help determine whether a word is likely to be a verb or a noun, for example, allowing greater accuracy. The shapes of individual cursive characters simply do not contain enough information to recognize all handwritten cursive script accurately (above 98%).
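The following sketch shows how a closed lexicon can override a noisy character-by-character reading. It assumes a hypothetical classifier that has already segmented the word and produced a probability for each candidate letter at each position, and that errors at different positions are independent; both are simplifications, since real cursive recognizers must also cope with uncertain segmentation. The word list and the `FLOOR` value are illustrative choices.

```python
import math
from typing import Dict, List, Optional

# Per-position character probabilities, as a hypothetical character
# classifier might produce them for one segmented handwritten word.
Candidates = List[Dict[str, float]]

# A small, closed lexicon such as the words on a cheque's amount line.
AMOUNT_WORDS = [
    "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "eleven", "twelve", "twenty", "thirty", "forty", "fifty",
    "hundred", "thousand",
]

FLOOR = 1e-6  # probability given to a letter the classifier never proposed


def raw_reading(cands: Candidates) -> str:
    """Best character at each position, ignoring any dictionary."""
    return "".join(max(pos, key=pos.get) for pos in cands)


def word_score(word: str, cands: Candidates) -> float:
    """Log-probability of `word`, assuming independent per-position errors."""
    return sum(math.log(pos.get(ch, FLOOR)) for ch, pos in zip(word, cands))


def best_word(cands: Candidates, lexicon: List[str]) -> Optional[str]:
    """Most probable lexicon word of the right length, or None if there is none."""
    same_length = [w for w in lexicon if len(w) == len(cands)]
    if not same_length:
        return None
    return max(same_length, key=lambda w: word_score(w, cands))


if __name__ == "__main__":
    cands = [{"t": 0.6, "f": 0.4}, {"u": 0.5, "w": 0.4, "i": 0.1}, {"o": 0.7, "e": 0.3}]
    print(raw_reading(cands))              # tuo
    print(best_word(cands, AMOUNT_WORDS))  # two
```

The smaller the lexicon, the fewer competing words there are, which is why the cheque amount line benefits so much from this constraint.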

Music OCR

Early research into the recognition of printed sheet music was performed at the graduate level in the mid-1970s at MIT and other institutions. Successive efforts focused on locating and removing the musical staff lines, leaving the remaining symbols to be recognized and parsed. The first commercial music-scanning product, MIDISCAN, was released in 1991. Several commercial products are now available.
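One common way to locate staff lines is a horizontal projection profile: rows whose ink covers most of the page width are treated as staff lines. This is offered as an assumption, since the text does not say which methods those early systems used. The sketch works on a hypothetical binary image stored as lists of 0/1 values and keeps a staff pixel only where a symbol continues directly above or below it.

```python
from typing import List

Image = List[List[int]]  # 1 = ink, 0 = paper


def find_staff_rows(img: Image, threshold: float = 0.8) -> List[int]:
    """Rows whose ink fraction reaches `threshold` (horizontal projection)."""
    width = len(img[0])
    return [r for r, row in enumerate(img) if sum(row) / width >= threshold]


def remove_staff_lines(img: Image, threshold: float = 0.8) -> Image:
    """Erase staff-line pixels unless a symbol continues directly above or below."""
    staff = set(find_staff_rows(img, threshold))
    height = len(img)
    out = [row[:] for row in img]  # leave the input image untouched
    for r in staff:
        for c in range(len(img[r])):
            above = r > 0 and (r - 1) not in staff and img[r - 1][c] == 1
            below = r < height - 1 and (r + 1) not in staff and img[r + 1][c] == 1
            if not (above or below):
                out[r][c] = 0
    return out


if __name__ == "__main__":
    # One staff line (row 2) crossed by a note stem at column 4.
    img = [[0] * 10 for _ in range(5)]
    img[2] = [1] * 10
    img[1][4] = img[3][4] = 1
    for row in remove_staff_lines(img):
        print("".join("#" if v else "." for v in row))
```

Real scores need skew correction, tolerance for curved or broken lines, and smarter handling of note heads that sit on a line; the 0.8 threshold is an arbitrary choice.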

MICR

One area where the accuracy and speed of computer character input exceed those of humans is magnetic ink character recognition (MICR), where error rates are around one read error for every 20,000 to 30,000 checks.
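Converting the quoted MICR figures into per-check error rates makes the comparison easier to read:

```latex
\[
\frac{1}{30\,000} \approx 3.3 \times 10^{-5} \;(0.0033\%)
\qquad\text{to}\qquad
\frac{1}{20\,000} = 5 \times 10^{-5} \;(0.005\%)
\]
```

That corresponds to a per-check read accuracy of roughly 99.995% to 99.997%. Note that the units differ from the hand-print figures: the MICR rate is counted per check (a whole line of characters), whereas the 80% to 90% hand-print rates are per character.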

Other research areas

A particularly difficult problem for both computers and humans is reading old church baptismal and marriage records, which contain mostly names. The pages may be damaged by age, water, or fire, and the names may be obsolete or have rare spellings. Another research area is cooperative approaches, in which computers assist humans and vice versa. Computer image-processing techniques can help humans read extremely difficult texts such as the Archimedes Palimpsest or the Dead Sea Scrolls.
