The aim of the historical dictionary project is to create a full-type corpus-based dictionary of the early written Latvian texts. The main tasks cover the whole dictionary-making process: developing the necessary methodology, writing sample entries covering all parts of speech, and producing an electronic version of the entries. A further task is to find or create a lexicographer’s workbench.
To date, about 500 entries have been compiled (~300 appellatives and ~200 proper names), and guidelines for writing dictionary entries have been established.
The present report deals with issues of corpus compilation, finding spelling variants of the headword, and determining the meaning of lexemes with a very small or a very large number of occurrences. Special emphasis is put on describing the origin of the lexeme and on detecting lexical, derivational and semantic loans (in the terminology of Betz 1959: Lehnwörter, Lehnbildungen, Lehnbedeutungen). Loans can also be found among collocations and idioms, as well as in syntax (German Lehnwendungen, Lehnsyntax). As the early sources are mainly religious texts, religious discourse analysis is of special interest.
The Dictionary is the first and, for the moment, the only known corpus-based dictionary in Latvia. Its input data is the Corpus of the Early Latvian Texts ‘SENIE’. The Corpus includes 43 full-text sources with almost 965,000 tokens covering the 16th–18th c.
All main sources of the 16th c. are represented in the Corpus, but more data could be explored: 1) the Lord’s Prayer as published in different collections; 2) manuscript data (songs, separate sentences). A large quantity of 17th c. data should still be added to the Corpus, e.g., the Old Testament; dictionaries, both printed and in manuscript; grammars; texts of the late 17th c.; and manuscripts (both ecclesiastic and clerical texts).
One of the issues of corpus development is the unavailability of the early sources in Latvian libraries. Thus, international co-operation should be established in order to raise awareness of the Latvian texts kept in foreign collections and, if possible, to digitize them.
The early texts are rich in spelling variants, which puzzle lexicographers: the root of māja ‘house’, for example, is written in five ways: mahj-, mahy-, mai-, maj- and may-. To facilitate finding all the occurrences of a headword, one option, albeit a time-consuming one, is to rewrite all the texts in standardized form; another is to use software to detect the spelling variants. Such a solution exists for Early Modern English texts (the VARD software, short for Variant Detector), and the adoption of this practice is worth considering.
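As a minimal sketch of the software-assisted route (not the VARD algorithm, and not the project's own tooling), the five root spellings listed above can be combined into one regular expression that collects candidate word forms. The function names and the sample line are hypothetical; the sample tokens are invented for illustration and are not taken from the Corpus. Short variants such as mai- will also match unrelated words, so the hits are only candidates for manual review.

```python
import re

# Root spellings of māja 'house' as listed in the text.
ROOT_VARIANTS = ["mahj", "mahy", "mai", "maj", "may"]


def build_root_pattern(variants):
    """Compile a regex matching any word that starts with one of the variants."""
    alternatives = "|".join(re.escape(v) for v in variants)
    return re.compile(r"\b(?:" + alternatives + r")\w*", re.IGNORECASE)


def find_candidates(text, pattern):
    """Return every word form in `text` that begins with a known root spelling."""
    return [match.group(0) for match in pattern.finditer(text)]


if __name__ == "__main__":
    pattern = build_root_pattern(ROOT_VARIANTS)
    # Invented sample line, for illustration only.
    sample = "Tee mahjas vnd maja, mayu nams"
    print(find_candidates(sample, pattern))
```

A prefix pattern of this kind favours recall over precision: it is meant to ensure that no spelling of the headword is missed, leaving the lexicographer to discard false hits.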
In determining the meaning of a lexeme, problems are caused both by words with only one or two occurrences in the Corpus and by those with several thousand occurrences. The concordance program can process words with up to 2,000 occurrences. See the entry pasaule ‘world’ (1,528 occurrences at the time of writing the entry), where not only the word meanings with their first and last citations are listed, but a number of collocations are presented as well. For headwords with a very large number of occurrences (e.g., the conjunction ka ‘that’ with more than 16,000 occurrences or Dievs ‘God’ with more than 11,000), the compilers decided to analyze only two sources per century in detail.
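The workflow described above can be sketched roughly as follows, under stated assumptions: a simple keyword-in-context (KWIC) routine stands in for the concordance program, and a small helper chooses a strategy from the occurrence count. The function names, the strategy labels and the exact cut-offs (two occurrences or fewer as "sparse", more than 2,000 as "sampled") are illustrative readings of the text, not the project's actual implementation.

```python
CONCORDANCE_LIMIT = 2000  # upper bound the text gives for the concordance program


def kwic(tokens, forms, width=3):
    """Return (left context, keyword, right context) for each token in `forms`."""
    wanted = {form.lower() for form in forms}
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() in wanted:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


def analysis_strategy(occurrences):
    """Pick a (hypothetical) strategy label from the number of occurrences."""
    if occurrences <= 2:
        return "consult additional sources"
    if occurrences <= CONCORDANCE_LIMIT:
        return "full concordance"
    return "two sources per century"


if __name__ == "__main__":
    # Placeholder tokens; only the keyword is a real headword from the text.
    tokens = "a b pasaule c d e pasaule".split()
    for line in kwic(tokens, {"pasaule"}, width=2):
        print(line)
    print(analysis_strategy(1528))
```

In practice the `forms` argument would hold all spelling variants of the headword, so that one concordance covers every attested form.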
If only one occurrence is found in the Corpus, additional sources should be examined to determine the meaning: other 17th–18th c. dictionaries, dictionaries of different vernaculars, studies in history and botany, the Mühlenbach-Endzelin dictionary, and Grimm’s Deutsches Wörterbuch. In some cases Luther’s Bible is consulted (e.g., pakaļazobi, cf. Luther’s Backenzähne ‘molars’).
The ongoing Dictionary supplies new data for studies of the origin of Latvian words: it establishes more precisely when a lexeme entered the written language, which in most cases is identical to the time of the word’s origin in general.
The compilers of the Dictionary draw on earlier studies of semantic and lexical loans, and only a few new explanations or previously unrecognized lexemes are expected to be found. Derivational loans are a challenge for researchers, and new examples are found through corpus analysis: e.g., next to kapsēta ‘graveyard’ we find the lexeme baznīcsēta ‘churchyard’, which is a derivational loan from Middle Low German kerkhof. Contemporary German has Kirchhof, as Swedish has kyrkogård.
Early Latvian texts are rich in derivations and compounds whose origins are still to be clarified. Hopefully, work with the corpus, careful text analysis and comparison with possible source texts can supply new data for a historical dictionary and for studies of the early religious lexis.
Riga: Latvijas Universitātes Latviešu valodas institūts, 2012, pp. 196–209.