09-02-2013, 02:59 PM
A design of computer recognition system of Kazakh language text: OCR, morphotactics and morphophonemics
Objectives
Development of Kazakh as the state language
Giving everyone easy access to texts and documentation in the state language
Enabling online editing, correction, and manipulation of different documents
The Kazakh Language Morphology
Kazakh is an agglutinative language
The Kazakh alphabet (based on Cyrillic) contains 33 Cyrillic characters and 9 additional symbols representing specifically Kazakh sounds
Morphophonemics
Morphotactics (Nominal Paradigm)
Stem + Plural suffix + Possessive suffix + Case suffix + Personal ending (for nominal verb)
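The slot order above can be sketched as a simple order check over morpheme tags. This is a minimal illustration, not the system's actual implementation: the tag names are hypothetical labels, and the assumption that every slot after the stem is optional and used at most once is ours, not stated in the slides.

```python
# Slot order of the nominal paradigm as listed above.
# The tag names are illustrative labels (an assumption, not from the source).
NOMINAL_SLOTS = ["STEM", "PLURAL", "POSSESSIVE", "CASE", "PERSONAL"]


def is_valid_nominal(tags):
    """Return True if tags begin with STEM and respect the slot order.

    Assumption: each slot after the stem is optional and occurs at most once.
    """
    if not tags or tags[0] != "STEM":
        return False
    last = 0  # index of the last slot seen (STEM)
    for tag in tags[1:]:
        if tag not in NOMINAL_SLOTS:
            return False
        pos = NOMINAL_SLOTS.index(tag)
        if pos <= last:  # repeated slot or slot out of order
            return False
        last = pos
    return True


print(is_valid_nominal(["STEM", "PLURAL", "CASE"]))  # True
print(is_valid_nominal(["STEM", "CASE", "PLURAL"]))  # False
```

A check of this kind lets a morphological module reject OCR candidates whose suffix sequence cannot occur in a nominal form.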
Morphophonemics
Vowel harmony
Consonance of syllables (back/hard, front/soft)
Consonance of sounds (combination of consonants with vowels)
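As a rough illustration of the back/front distinction, the sketch below picks the back or front variant of a suffix from the last vowel of the stem. It is a simplification under stated assumptions: the function names are hypothetical, only the unambiguous vowels are classified (а, о, ұ, ы as back; ә, ө, ү, і, е as front), and letters such as и, у and the Russian-only vowels are ignored.

```python
# Simplified vowel classes; ambiguous letters (и, у, э, ю, я) are
# deliberately left out of this sketch.
BACK_VOWELS = set("аоұы")
FRONT_VOWELS = set("әөүіе")


def harmony_class(stem):
    """Return 'back' or 'front' from the last classified vowel, else None."""
    for ch in reversed(stem.lower()):
        if ch in BACK_VOWELS:
            return "back"
        if ch in FRONT_VOWELS:
            return "front"
    return None


def harmonize(stem, back_suffix, front_suffix):
    """Attach the suffix variant that agrees with the stem's harmony class."""
    cls = harmony_class(stem)
    if cls is None:
        raise ValueError("no classified vowel in stem: " + stem)
    return stem + (back_suffix if cls == "back" else front_suffix)


print(harmonize("бала", "лар", "лер"))  # балалар
print(harmonize("үй", "лар", "лер"))    # үйлер
```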
Consonant harmony
Progressive assimilation
the following consonant becomes like the preceding consonant at the syllable boundary
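The plural suffix is a convenient example of progressive assimilation, since its first consonant (л, д or т) depends on the last sound of the stem. The sketch below encodes the usual textbook rule in simplified form; the function name and the letter sets are our own assumptions, and loanword exceptions are not handled.

```python
# Simplified letter classes for choosing the plural suffix onset.
VOWELS = set("аәеиоөұүыіуэюя")
VOICED_D = set("лмнңжз")  # stem-final sounds that take a д- onset


def plural_onset(stem):
    """Return the first consonant of the plural suffix for this stem."""
    if not stem:
        raise ValueError("empty stem")
    last = stem[-1].lower()
    if last in VOWELS or last in "рй":
        return "л"
    if last in VOICED_D:
        return "д"
    return "т"  # voiceless consonants and б, в, г, д


print(plural_onset("бала"))   # л
print(plural_onset("адам"))   # д
print(plural_onset("кітап"))  # т
```

Combined with a vowel chosen by harmony, these onsets give балалар, адамдар and кітаптар.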
Regressive assimilation
the following sound affects the preceding one
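One commonly cited case of the following sound changing the preceding one is the voicing of stem-final п, к, қ before a vowel-initial suffix. The sketch below covers only that case; the function name is hypothetical, and known exceptions (loanwords, stems that drop a vowel, verb forms) are outside its scope.

```python
# Stem-final voiceless stops and their voiced counterparts.
VOICING = {"п": "б", "к": "г", "қ": "ғ"}
VOWELS = set("аәеиоөұүыіуэюя")


def attach_suffix(stem, suffix):
    """Join stem and suffix, voicing a final п/к/қ before a vowel."""
    if stem and suffix and suffix[0] in VOWELS and stem[-1] in VOICING:
        return stem[:-1] + VOICING[stem[-1]] + suffix
    return stem + suffix


print(attach_suffix("кітап", "ы"))    # кітабы
print(attach_suffix("тарақ", "ы"))    # тарағы
print(attach_suffix("кітап", "тар"))  # кітаптар (no vowel follows, no change)
```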
Morphological Module: test results
Test text: one page on economics, approx. 2,500 characters excluding spaces
The OCR module output contains 46 distinct errors
The morphological module corrects 38 of the 46 errors (correction rate = 83%)
After manual review, only 1 of the remaining 8 errors is an actual error (the other 7 are semantically acceptable and do not count as errors), which gives a correction rate of 97.8%
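The two percentages follow directly from the counts above. The short check below reproduces them; the variable names are ours, and the second rate is read as (46 - 1) / 46, which is the interpretation that matches the reported 97.8%.

```python
ocr_errors = 46       # errors produced by the OCR module
auto_corrected = 38   # errors fixed by the morphological module
false_alarms = 7      # remaining items judged not to be errors

remaining = ocr_errors - auto_corrected    # 8
actual_errors = remaining - false_alarms   # 1

auto_rate = auto_corrected / ocr_errors
effective_rate = (ocr_errors - actual_errors) / ocr_errors

print(f"{auto_rate:.1%}")       # 82.6%, reported as 83%
print(f"{effective_rate:.1%}")  # 97.8%
```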