Enrichment of terminological glossaries of Albanian language using specialized corpora


Conference Paper

Ali Caka, Vehbi Neziri, Ramadan Dervishi

Publication year: 2013

ABSTRACT

Recent developments in information and communication technologies have affected all areas of knowledge and created new fields of study, such as computational linguistics, language technologies and corpus linguistics, which continue to develop rapidly. Drawing on language technologies, computational linguistics and corpus linguistics, large text corpora are now built and used for various purposes. These include teaching and the compilation of different kinds of dictionaries, such as explanatory and terminological dictionaries, both monolingual and multilingual.

The purpose of this paper is to use a specialized corpus, focused on a particular field of knowledge, to show how such corpora are created. It also shows how they support the development of terminological dictionaries, either to enrich existing ones or to create new ones for a specific field.

A specialized corpus contains texts that illustrate, with concrete examples, the language variety under study in a given field of knowledge. Such a corpus should therefore contain a sufficient number of examples of the phenomenon to be investigated. How many examples are sufficient depends on the phenomenon. Research on general morphological or syntactic structures is possible with a corpus of 1 million words. Research on lexical units, specific grammatical structures and idioms requires larger corpora of several hundred million words.

Among other things, this paper aims to show how specialized software tools can be used for corpus analysis. It covers the extraction of words in context (concordances) needed to develop the appropriate vocabulary.
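The abstract refers to software tools that extract words in context (concordances). The Python sketch below is a rough illustration only and is not the tool used in this paper. It builds a keyword-in-context (KWIC) concordance from a plain-text corpus and lists the most frequent words as candidate terms. Several choices are assumptions made for this example:

* the tokenizer (letters only, lowercased);
* the context window size;
* the stopword list;
* the names `tokenize`, `concordance` and `top_terms`.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of Unicode letters (so Albanian ë and ç are kept);
# digits and underscores are dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Split raw text into lowercase word tokens (letters only)."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def concordance(tokens, keyword, window=3):
    """Return KWIC hits as (left context, keyword, right context) tuples.

    The context is taken from the flat token list, so it may cross
    sentence boundaries in this simplified sketch.
    """
    keyword = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def top_terms(tokens, stopwords=frozenset(), n=5):
    """Return the n most frequent tokens, ignoring an assumed stopword list."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample text standing in for a specialized corpus.
    sample = ("A computer network connects computers. "
              "Every network uses a protocol. "
              "The network protocol defines the rules.")
    tokens = tokenize(sample)
    for left, kw, right in concordance(tokens, "network", window=2):
        print(f"{left:>20} [{kw}] {right}")
    print(top_terms(tokens, stopwords={"a", "the", "every"}, n=3))
```

Dedicated corpus tools go further than this sketch, adding lemmatization, part-of-speech tagging and multi-word term detection. Because Albanian is highly inflected, a glossary-oriented frequency count would likely need to group inflected forms under one lemma before it becomes reliable.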

