In computational linguistics, word sense disambiguation (WSD) is an open problem of natural language processing and ontology. WSD is the task of identifying which sense of a word (i.e., which meaning) is used in a sentence when the word has multiple meanings. A solution to this problem impacts other computer-related tasks, such as discourse analysis, improving the relevance of search engines, anaphora resolution, coherence, and inference.

The human brain is quite proficient at word sense disambiguation. That natural language is formed in a way that requires so much of it is a reflection of that neurological reality. In other words, human language developed in a way that reflects (and has also helped to shape) the innate ability provided by the brain's neural networks. In computer science and the information technology it enables, it has been a long-term challenge to develop the ability of computers to do natural language processing and machine learning.
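To make the task concrete, here is a minimal sketch in Python of a dictionary-based approach (a simplified Lesk-style overlap measure): pick the sense whose dictionary gloss shares the most content words with the surrounding context. The two-sense inventory for "bank" and its glosses are invented for illustration; real implementations draw senses and glosses from a lexical resource such as WordNet.

```python
# Simplified Lesk-style disambiguation: choose the sense whose gloss
# shares the most content words with the context sentence.
# The two-sense inventory below is a toy example, not a real lexicon.

SENSES = {
    "bank_finance": "a financial institution that accepts deposits and lends money",
    "bank_river": "sloping land alongside a river or a lake",
}

STOPWORDS = {"a", "an", "and", "the", "of", "or", "that", "to", "on"}

def content_words(text):
    """Lowercase, split on whitespace, and drop common function words."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def disambiguate(senses, context):
    """Return the sense whose gloss overlaps most with the context."""
    ctx = content_words(context)
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))

print(disambiguate(SENSES, "He sat on the bank of the river and watched the water"))
# -> bank_river (its gloss shares "river" with the context)
```

The toy overlap count stands in for what fuller Lesk variants do with WordNet glosses, example sentences, and related synsets.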
To date, a rich variety of techniques has been investigated, ranging from dictionary-based methods that use the knowledge encoded in lexical resources, to supervised machine learning methods in which a classifier is trained for each distinct word on a corpus of manually sense-annotated examples, to completely unsupervised methods that cluster occurrences of words, thereby inducing word senses. Among these, supervised learning approaches have been the most successful algorithms to date.

The accuracy of current algorithms is difficult to state without a series of caveats. In English, accuracy at the coarse-grained (homograph) level is routinely above 90%, with some methods on particular homographs achieving over 96%. On finer-grained sense distinctions, top accuracies from 59.1% to 69.0% have been reported in recent evaluation exercises (SemEval-2007, Senseval-2), where the baseline accuracy of the simplest possible algorithm, always choosing the most frequent sense, was 51.4% and 57%, respectively.
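As a sketch of the supervised setup described above, the snippet below trains a tiny per-word classifier for "bank" from bag-of-words features. It assumes scikit-learn is available, and the handful of labeled training sentences is invented for illustration; real systems train one such classifier per word on a sense-annotated corpus such as SemCor, with many instances per sense.

```python
# A toy "word expert" classifier for one ambiguous word ("bank"),
# trained on hand-labeled context sentences.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training data: a context sentence and its sense label.
train_sentences = [
    "she deposited her paycheck at the bank",
    "the bank approved the loan application",
    "the bank charged an overdraft fee",
    "they had a picnic on the bank of the river",
    "fish were jumping near the muddy bank",
    "the canoe drifted toward the grassy bank",
]
train_labels = ["finance", "finance", "finance", "river", "river", "river"]

# Bag-of-words features with a Naive Bayes classifier, a common
# pairing in early supervised WSD work.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_sentences, train_labels)

print(model.predict(["he walked along the bank watching the fish"]))
# expected: ['river'] -- "fish" occurs only in river-sense training data
```

The most-frequent-sense baseline mentioned above amounts to ignoring the features entirely and always predicting the majority sense in the training data.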
History
WSD was first formulated as a distinct computational task during the early days of machine translation in the 1940s, making it one of the oldest problems in computational linguistics. Warren Weaver, in his famous 1949 memorandum on translation, first introduced the problem in a computational context. Early researchers understood well the importance and the difficulty of WSD. In fact, Bar-Hillel (1960) used his famous example "the box was in the pen" to argue that WSD could not be solved by an "electronic computer" because of the general need to model all world knowledge.

In the 1970s, WSD was a subtask of semantic interpretation systems developed within the field of artificial intelligence, but since WSD systems were largely rule-based and hand-coded, they were prone to a knowledge acquisition bottleneck.
In the 1980s, large-scale lexical resources, such as the Oxford Advanced Learner's Dictionary of Current English (OALD), became available: hand coding was replaced with knowledge automatically extracted from these resources, but disambiguation was still knowledge-based or dictionary-based.

In the 1990s, the statistical revolution swept through computational linguistics, and WSD became a paradigm problem on which to apply supervised machine learning techniques.
In the 2000s, supervised techniques reached a plateau in accuracy, and so attention has shifted to coarser-grained senses, domain adaptation, semi-supervised and unsupervised corpus-based systems, combinations of different methods, and the return of knowledge-based systems via graph-based methods. Supervised systems, however, continue to perform best.