![Interlinear gloss](https://www.english.nina.az/wikipedia/image/aHR0cHM6Ly91cGxvYWQud2lraW1lZGlhLm9yZy93aWtpcGVkaWEvY29tbW9ucy90aHVtYi84Lzg5L1RvdXNzYWludC1MYW5nZW5zY2hlaWR0X1NwYW5pc2NoXzcuMTMucG5nLzE2MDBweC1Ub3Vzc2FpbnQtTGFuZ2Vuc2NoZWlkdF9TcGFuaXNjaF83LjEzLnBuZw==.png )
In linguistics and pedagogy, an interlinear gloss is a gloss (a series of brief explanations, such as definitions or pronunciations) placed between lines, for example between a line of original text and its translation into another language. When glossed, each line of the original text acquires one or more corresponding lines of transcription, known as an interlinear text or interlinear glossed text (IGT), or an interlinear for short. Such glosses help the reader follow the relationship between the source text and its translation, and the structure of the original language. In its simplest form, an interlinear gloss is a literal, word-for-word translation of the source text.
History
![image](https://www.english.nina.az/wikipedia/image/aHR0cHM6Ly93d3cuZW5nbGlzaC5uaW5hLmF6L3dpa2lwZWRpYS9pbWFnZS9hSFIwY0hNNkx5OTFjR3h2WVdRdWQybHJhVzFsWkdsaExtOXlaeTkzYVd0cGNHVmthV0V2WTI5dGJXOXVjeTkwYUhWdFlpODRMemc1TDFSdmRYTnpZV2x1ZEMxTVlXNW5aVzV6WTJobGFXUjBYMU53WVc1cGMyTm9YemN1TVRNdWNHNW5Mekl5TUhCNExWUnZkWE56WVdsdWRDMU1ZVzVuWlc1elkyaGxhV1IwWDFOd1lXNXBjMk5vWHpjdU1UTXVjRzVuLnBuZw==.png)
*Interlinear text in Toussaint-Langenscheidt Spanisch, a Spanish-language textbook for German speakers, 1910*
Interlinear glosses have been used for a variety of purposes over a long period of time. One common usage has been to annotate bilingual textbooks for language education. This sort of interlinearization serves to help make the meaning of a source text explicit without attempting to formally model the structural characteristics of the source language.
Such annotations have occasionally been expressed not through interlinear layout, but by numbering the corresponding words of the object language and the metalanguage. One such example is Wilhelm von Humboldt's annotation of Classical Nahuatl:
¹ni- ²c- ³chihui ⁴-lia ⁵in ⁶no- ⁷piltzin ⁸ce ⁹calli
¹ich ³mache ²es ⁴für ⁵der ⁶mein ⁷Sohn ⁸ein ⁹Haus
This "inline" style allows examples to be included within the flow of text, and allows the gloss to follow the word order of the metalanguage rather than that of the source. (In the gloss here, mache es is reordered from the corresponding source order, es mache, to follow German syntax more naturally.) Even so, this approach requires readers to "re-align" the correspondences between source and target forms.
Later 19th- and 20th-century approaches glossed vertically, aligning the same kind of word-by-word content so that each metalanguage term was placed directly below the corresponding source-language term. In this style, the example might be rendered as follows (here with an English gloss):
ni- | c- | chihui | -lia | in | no- | piltzin | ce | calli |
---|---|---|---|---|---|---|---|---|
I | it | make | for | to-the | my | son | a | house |
"I made my son a house."
Here word ordering is determined by the syntax of the object language.
Finally, modern linguists have adopted the practice of using abbreviated grammatical category labels. A 2008 publication which repeats this example labels it as follows:
ni-c-chihui-lia | in | no-piltzin | ce | calli |
---|---|---|---|---|
1SG.SUBJ-3SG.OBJ-mach-APPL | DET | 1SG.POSS-Sohn | ein | Haus |
This approach is denser and takes more effort to read, but it relies less on the grammatical structure of the metalanguage to convey the meaning of the object-language forms.
In computing, special text markers are provided in the Specials Unicode block to indicate the start and end of interlinear glosses.
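The Specials block contains three interlinear annotation characters: U+FFF9 INTERLINEAR ANNOTATION ANCHOR, U+FFFA INTERLINEAR ANNOTATION SEPARATOR and U+FFFB INTERLINEAR ANNOTATION TERMINATOR. As a minimal Python sketch, assuming exactly one annotation per annotated segment, they could wrap each source word together with its gloss; the helpers `encode` and `decode` are hypothetical illustrations, not part of any library:

```python
import unicodedata

# Interlinear annotation characters from the Unicode Specials block.
ANCHOR = "\uFFF9"      # start of the annotated (base) text
SEPARATOR = "\uFFFA"   # end of the base text, start of its annotation
TERMINATOR = "\uFFFB"  # end of the annotation


def encode(pairs: list[tuple[str, str]]) -> str:
    """Wrap each (base, gloss) pair as ANCHOR base SEPARATOR gloss TERMINATOR."""
    return " ".join(f"{ANCHOR}{base}{SEPARATOR}{gloss}{TERMINATOR}" for base, gloss in pairs)


def decode(text: str) -> list[tuple[str, str]]:
    """Recover (base, gloss) pairs; text outside annotations is ignored."""
    pairs = []
    pos = 0
    while (start := text.find(ANCHOR, pos)) != -1:
        sep = text.find(SEPARATOR, start)
        end = text.find(TERMINATOR, sep) if sep != -1 else -1
        if end == -1:
            raise ValueError("incomplete interlinear annotation")
        pairs.append((text[start + 1:sep], text[sep + 1:end]))
        pos = end + 1
    return pairs


if __name__ == "__main__":
    for ch in (ANCHOR, SEPARATOR, TERMINATOR):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    text = encode([("piltzin", "son"), ("calli", "house")])
    print(decode(text))  # [('piltzin', 'son'), ('calli', 'house')]
```

Ordinary text renderers will not necessarily lay such annotations out as interlinear lines, so the sketch treats the characters purely as a machine-readable encoding.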
Structure
Though there is no formal specification for the IGT format, the Leipzig Glossing Rules are a set of guidelines that aim to standardize the format as much as possible.
An interlinear text for linguistics will commonly consist of some or all of the following, usually in this order, from top to bottom:
- the original orthography (typically in italic or bold italic);
- a conventional transliteration into the Latin alphabet;
- a phonetic transcription;
- a morphophonemic transliteration;
- a word-by-word or morpheme-by-morpheme gloss, in which morphemes within a word are separated by hyphens or other punctuation; and finally
- a free translation, which may be placed in a separate paragraph or on the facing page if the structures of the languages are too different for it to follow the text line by line.
As an example, the following Taiwanese Minnan clause has been transcribed with five lines of text:
- 1. the standard pe̍h-ōe-jī transliteration,
- 2. a gloss using tone numbers for the surface tones,
- 3. a gloss showing the underlying tones in citation form (before undergoing tone sandhi),
- 4. a morpheme-by-morpheme gloss in English, and
- 5. an English translation:
(1.) | goá | iáu-boē | koat-tēng | tang-sî | boeh | tńg-khì |
---|---|---|---|---|---|---|
(2.) | goa1 | iau1-boe3 | koat2-teng3 | tang7-si5 | boeh2 | tng1-khi3. |
(3.) | goa2 | iau2-boe7 | koat4-teng7 | tang1-si5 | boeh4 | tng2-khi3. |
(4.) | I | not-yet | decide | when | want | return. |
(5.) "I have not yet decided when I shall return."
Word-by-word alignment. According to the Leipzig Glossing Rules, it is standard to left-align the words in the object language with the corresponding words in the metalanguage; this alignment can be seen between lines (1-3) and line (4).
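In a monospaced font, this left alignment amounts to padding each column to the width of its widest entry. A minimal Python sketch, assuming each tier is already a string of space-separated words with the same word count (`align_tiers` is a hypothetical name):

```python
def align_tiers(*tiers: str) -> list[str]:
    """Left-align space-separated tiers word by word for monospaced display."""
    columns = [tier.split() for tier in tiers]
    if len({len(words) for words in columns}) != 1:
        raise ValueError("every tier must contain the same number of words")
    # Width of each column = length of its longest entry across all tiers.
    widths = [max(len(word) for word in column) for column in zip(*columns)]
    return [
        "  ".join(word.ljust(width) for word, width in zip(words, widths)).rstrip()
        for words in columns
    ]


if __name__ == "__main__":
    for line in align_tiers(
        "goa1 iau1-boe3 koat2-teng3 tang7-si5 boeh2 tng1-khi3",
        "I not-yet decide when want return",
    ):
        print(line)
```

Because `len` counts code points, words containing combining diacritics or double-width characters would need a display-width measure instead for the columns to line up.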
Morpheme-by-morpheme correspondence. At the sub-word level, segmentable morphemes are separated by hyphens, both in the example and in the gloss. There should be the same number of hyphens in the example and in the gloss, as shown in the following example:
Gila | abur-u-n | ferma | hamišaluǧ | güǧüna | amuqʼ-da-č |
---|---|---|---|---|---|
now | they-OBL-GEN | farm | forever | behind | stay-FUT-NEG |
'Now their farm will not stay behind forever.'
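The hyphen-count requirement lends itself to a mechanical check. The Python sketch below (the helper `pair_morphemes` is hypothetical) pairs each object-language morpheme with its gloss and raises an error where word or hyphen counts differ; it assumes spaces and hyphens are the only separators, so the finer conventions for clitics, infixes and reduplication are out of scope:

```python
def pair_morphemes(text: str, gloss: str) -> list[list[tuple[str, str]]]:
    """Pair morphemes word by word, requiring equal word and hyphen counts."""
    text_words, gloss_words = text.split(), gloss.split()
    if len(text_words) != len(gloss_words):
        raise ValueError(f"{len(text_words)} words in text but {len(gloss_words)} in gloss")
    paired = []
    for word, glossed in zip(text_words, gloss_words):
        morphemes, labels = word.split("-"), glossed.split("-")
        if len(morphemes) != len(labels):
            raise ValueError(f"hyphen mismatch: {word!r} vs {glossed!r}")
        paired.append(list(zip(morphemes, labels)))
    return paired


if __name__ == "__main__":
    rows = pair_morphemes(
        "Gila abur-u-n ferma hamišaluǧ güǧüna amuqʼ-da-č",
        "now they-OBL-GEN farm forever behind stay-FUT-NEG",
    )
    print(rows[-1])  # [('amuqʼ', 'stay'), ('da', 'FUT'), ('č', 'NEG')]
```

A zero morpheme written as ø counts like any overt morpheme, so puer-ø / boy-NOM also passes the check.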
Grammatical category labels. In amuqʼ-da-č, the stem amuqʼ is translated by the corresponding English lexeme stay, while the inflectional suffixes -da and -č, which mark future tense and negation, are glossed with the category labels FUT and NEG. The Leipzig Glossing Rules include a list of standard abbreviations for grammatical categories that are widely used in linguistics.
One-to-many correspondences. When a single object-language element corresponds to several metalanguage elements, the metalanguage elements are separated by periods, e.g.:
çık-mak
come.out-INF
'to come out'
Non-overt elements. If the morpheme-by-morpheme gloss (the middle line) contains an element that does not correspond to an overt element in the example, a standard strategy is to include a zero symbol "ø" in the object-language text, separated by a hyphen just as an overt morpheme would be:
puer-ø
boy-NOM
'boy'
Reduplication is treated similarly to affixation but with a tilde (instead of the standard hyphen) that connects the copied element to the stem:
bi~bili
IPFV~buy
'is buying'
Punctuation
In interlinear morphological glosses, various forms of punctuation separate the glosses. Typically, the words are aligned with their glosses; within words, a hyphen is used when a boundary is marked in both the text and its gloss, a period when a boundary appears in only one. That is, there should be the same number of words separated with spaces in the text and its gloss, as well as the same number of hyphenated morphemes within a word and its gloss. This is the basic system, and can be applied universally. For example:
oda-dan | hız-lı | çık-tı-m |
---|---|---|
room-ABL | speed-COM | go.out-PFV-1SG |
room-from | speed-with | go_out-perfective-I |
(Turkish)
'I left the room quickly.'
An underscore may be used instead of a period, as in go_out-PFV, when a single word in the source language corresponds to a phrase in the glossing language; a period is still used in other cases, such as Greek oikíais, glossed house.FEM.PL.DAT 'to the houses'.
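Within a single gloss word, the basic system therefore uses hyphens for boundaries shared by text and gloss, periods for several metalanguage elements standing for one morpheme, and underscores to join the words of a metalanguage phrase. The Python sketch below (hypothetical `split_gloss_word`) splits a gloss word on that basis. Deciding whether an element is a category label by its form (two or more capital letters, optionally preceded by a person digit) is a simplifying assumption rather than a Leipzig rule; real glosses rely on small capitals or an explicit label list:

```python
import re

# Simplifying assumption: a category label is a bare person digit, or two or
# more capitals optionally preceded by a person digit (1SG, PFV, DAT, ...).
LABEL = re.compile(r"[1-3]|[1-3]?[A-Z]{2,}")


def split_gloss_word(gloss_word: str) -> list[list[tuple[str, str]]]:
    """Split on '-' (shared boundaries), then '.' (one-to-many); '_' joins a phrase."""
    morphemes = []
    for morpheme in gloss_word.split("-"):
        elements = [element.replace("_", " ") for element in morpheme.split(".")]
        morphemes.append(
            [(e, "label" if LABEL.fullmatch(e) else "lexical") for e in elements]
        )
    return morphemes


if __name__ == "__main__":
    print(split_gloss_word("go.out-PFV-1SG"))
    # [[('go', 'lexical'), ('out', 'lexical')], [('PFV', 'label')], [('1SG', 'label')]]
```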
However, sometimes finer distinctions may be made. For example, clitics may be separated with a double hyphen (or, for ease of typing, an equal sign) rather than a hyphen. A French example:
je⹀te⹀aime
I⹀you⹀love
(French)
'I love you.'
Affixes which cause discontinuity (infixes, circumfixes, transfixes, etc.) may be set off by angle brackets, and reduplication with tildes, rather than with hyphens:
sulat | su~sulat | s⟨um⟩ulat | s⟨um⟩u~sulat |
---|---|---|---|
write | ~write | ⟨agent trigger.past⟩write | ⟨agent trigger⟩contemplative~write |
(Tagalog)
(See affix for other examples.)
Morphemes which cannot be easily separated out, such as umlaut, may be marked with a backslash rather than a period:
unser-n | Väter-n |
---|---|
our-DAT.PL | father\PL-DAT.PL |
(German)
'to our fathers' (the singular of Väter 'fathers' is Vater)
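These finer separators can be read back with a small tokenizer. The Python sketch below (hypothetical `segment`) splits a word into pieces and records how each piece attaches to what precedes it: `-` affix, `=` or `⹀` clitic, `~` reduplication, `\` a morpheme that cannot be separated out, and angle brackets for an infix. It assumes infixes are written with the mathematical angle brackets ⟨⟩ (U+27E8/U+27E9) or ASCII `<>`, and it does not check that the text and gloss lines agree:

```python
import re

# Boundary type introduced by each separator (simplified reading of the conventions).
SEPARATORS = {"-": "affix", "=": "clitic", "\u2E40": "clitic",
              "~": "reduplication", "\\": "nonsegmental"}

# Group 1: bracketed infix; group 2: a separator; group 3: a run of ordinary text.
TOKEN = re.compile(
    r"[\u27E8<]([^\u27E9>]*)[\u27E9>]|([-=~\\\u2E40])|([^-=~\\\u2E40\u27E8\u27E9<>]+)"
)


def segment(word: str) -> list[tuple[str, str]]:
    """Return (piece, boundary) pairs describing how each piece attaches."""
    pieces = []
    boundary = "initial"
    for match in TOKEN.finditer(word):
        infix, sep, chunk = match.groups()
        if infix is not None:
            pieces.append((infix, "infix"))
            boundary = "after-infix"  # next chunk continues the interrupted material
        elif sep is not None:
            boundary = SEPARATORS[sep]
        else:
            pieces.append((chunk, boundary))
    return pieces


if __name__ == "__main__":
    print(segment("bi~bili"))  # [('bi', 'initial'), ('bili', 'reduplication')]
```

The piece after an infix is labeled `after-infix` because it continues the material that the infix interrupted (in s⟨um⟩u~sulat, the u completes the reduplicant su-), rather than starting a new morpheme.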
A few other conventions which are sometimes seen are illustrated in the Leipzig Glossing Rules.
Interlinear gloss resources
Efforts have been undertaken to digitize IGT for hundreds of the world's languages.
Online Database of Interlinear Text
The Online Database of Interlinear Text (ODIN) is a database of over 200,000 instances of interlinear glosses for more than 1,500 languages extracted from scholarly linguistic research. The database was constructed in two phases: automatic construction followed by manual correction. The automatic construction stage itself was completed in three steps:
- First, search engines (e.g., Google, Bing) were queried to retrieve scholarly documents that were likely to contain interlinear glosses. The queries comprised terms relevant to linguistic research such as grammatical morphemes (e.g., "NOM", short for nominative; "3SG", short for 3rd person singular).
- Second, each line in a retrieved document was labeled as belonging or not belonging to an interlinear gloss, using sequence-labeling methods from machine learning.
- Third, each interlinear gloss instance was assigned a language name (e.g., Tagalog) and an ISO 639-3 language code. Names and codes were assigned automatically using coreference resolution models from natural language processing: each instance was tagged with the language name (and code) mentioned in the scholarly document from which it was extracted.
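ODIN's second step used trained sequence-labeling models. As a much cruder, hypothetical stand-in that only illustrates the input and output of that step, the Python sketch below flags a line as a likely gloss line when a large share of its hyphen- or period-separated elements are known category labels such as NOM or 3SG. The label set and the 0.3 threshold are arbitrary assumptions, not values from ODIN:

```python
import re

# A tiny, illustrative subset of Leipzig-style category labels (assumption).
KNOWN_LABELS = {"NOM", "ACC", "GEN", "DAT", "ERG", "ABL", "INS", "OBL",
                "PST", "FUT", "PFV", "IPFV", "NEG", "PL", "SG", "1SG", "2SG", "3SG"}


def label_share(line: str) -> float:
    """Fraction of the line's elements that are known category labels."""
    elements = [e for e in re.split(r"[\s\-.=~]+", line) if e]
    if not elements:
        return 0.0
    return sum(e in KNOWN_LABELS for e in elements) / len(elements)


def tag_lines(lines: list[str], threshold: float = 0.3) -> list[bool]:
    """Tag each line independently: True if it looks like a gloss line."""
    return [label_share(line) >= threshold for line in lines]


if __name__ == "__main__":
    document = [
        "Gila abur-u-n ferma hamišaluǧ güǧüna amuqʼ-da-č",
        "now they-OBL-GEN farm forever behind stay-FUT-NEG",
        "This sentence is ordinary prose.",
    ]
    print(tag_lines(document))
```

A trained sequence labeler can also use neighboring lines, for instance the fact that a gloss line usually follows a source-language line with the same number of words; this per-line heuristic cannot, and it flags only gloss tiers rather than whole instances.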
In the manual correction phase, the database creators corrected the boundaries of the interlinear gloss instances detected by the sequence-labeling method in step 2 of the automatic construction phase. They then verified the language names and the language codes in a second and a third pass over the data, respectively.
Language distribution of interlinear gloss instances in ODIN after phase 1 (automatic construction), with the figures after phase 2 (manual correction) in parentheses:
Instances per language | Number of languages | Number of instances | Percent of instances |
---|---|---|---|
>10,000 | 3 (1) | 36,691 (10,814) | 19.39 (6.88) |
1,000-9,999 | 37 (31) | 97,158 (81,218) | 51.34 (51.69) |
100-999 | 122 (139) | 40,260 (46,420) | 21.27 (29.55) |
10-99 | 326 (460) | 12,822 (15,560) | 6.78 (9.96) |
1-9 | 838 (862) | 2,313 (3,012) | 1.22 (1.92) |
Total | 1,326 (1,493) | 189,244 (157,114) | 100 (100) |
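The table groups languages into bands by how many gloss instances each has, then reports each band's share of all instances; for example, 36,691 / 189,244 ≈ 19.39%. A Python sketch of that bucketing, run on made-up per-language counts (the `counts` dictionary is hypothetical, not ODIN data):

```python
# Bands from the ODIN table: (label, lowest count, highest count or None).
# Assumption: a language with exactly 10,000 instances goes in the top band.
BANDS = [
    (">10,000", 10_000, None),
    ("1,000-9,999", 1_000, 9_999),
    ("100-999", 100, 999),
    ("10-99", 10, 99),
    ("1-9", 1, 9),
]


def distribution(counts: dict[str, int]) -> list[tuple[str, int, int, float]]:
    """Return (band, languages, instances, percent of all instances) rows."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no instances to summarize")
    rows = []
    for label, low, high in BANDS:
        in_band = [n for n in counts.values() if n >= low and (high is None or n <= high)]
        share = round(100 * sum(in_band) / total, 2)
        rows.append((label, len(in_band), sum(in_band), share))
    return rows


if __name__ == "__main__":
    # Hypothetical per-language instance counts, for illustration only.
    counts = {"lang_a": 12_000, "lang_b": 2_500, "lang_c": 400, "lang_d": 40, "lang_e": 3}
    for row in distribution(counts):
        print(row)
```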
Automatic processing of interlinear gloss instances
Natural language processing (NLP) models that make use of interlinear gloss resources, such as the Online Database of Interlinear Text, have been developed.
Automatic glossing
For example, NLP systems have been developed to produce interlinear glosses automatically, as in the following case:
mi-s | ħumukuli | elu-ab-ok'ek'-asi | anu |
---|---|---|---|
you-GEN | camel | we.OBL-ERG.1.PL-steal-PRT | be.NEG |
'We didn't steal your camel.'
Given the morpheme-segmented text and the free translation, the task is to produce the gloss line, which comprises translations of stems (e.g., mi:you) and the grammatical category labels corresponding to affixes (e.g., ab:ERG.1.PL). Sequence prediction models from NLP have been used to perform this task. Two factors make it difficult:
- The translation is not necessarily aligned with the morpheme-segmented line (e.g., camel is the last word in the translation but the second word in the morpheme-segmented line).
- Some words in the morpheme-segmented line have multiple correspondences in the gloss (e.g., anu:be.NEG).
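Published systems for this task use sequence models that can also exploit the translation. For comparison only, the simplest possible baseline ignores the translation and glosses each morpheme with the gloss it received most often in training data. A Python sketch of that baseline, assuming training examples that are already segmented and glossed morpheme by morpheme (`train` and `gloss_line` are hypothetical names, not the method of any cited system):

```python
from collections import Counter, defaultdict


def train(examples: list[tuple[str, str]]) -> dict[str, str]:
    """Map each morpheme to its most frequent gloss in (segmented, glossed) pairs."""
    seen: dict[str, Counter] = defaultdict(Counter)
    for text, gloss in examples:
        # zip() silently drops extra words/morphemes; real data should be validated first.
        for word, glossed in zip(text.split(), gloss.split()):
            for morpheme, label in zip(word.split("-"), glossed.split("-")):
                seen[morpheme][label] += 1
    return {m: counts.most_common(1)[0][0] for m, counts in seen.items()}


def gloss_line(model: dict[str, str], text: str, unknown: str = "???") -> str:
    """Gloss a morpheme-segmented line, marking unseen morphemes."""
    return " ".join(
        "-".join(model.get(m, unknown) for m in word.split("-")) for word in text.split()
    )


if __name__ == "__main__":
    model = train([
        ("mi-s ħumukuli", "you-GEN camel"),
        ("elu-ab-ok'ek'-asi anu", "we.OBL-ERG.1.PL-steal-PRT be.NEG"),
    ])
    print(gloss_line(model, "mi-s ħumukuli anu"))  # you-GEN camel be.NEG
```

Such a lookup fails on unseen morphemes and on morphemes whose gloss depends on context, which is where models that also read the translation are meant to help.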
Some constructed languages, such as Ithkuil and Lojban, have automated tools that can, in theory, always produce accurate glosses, because these languages are designed to be highly regular and logical. The following are glosses of an Ithkuil and a Lojban sentence respectively:
A'zvaţcaxüẓpöňḑeššaščëirktöňçogjahnói | nnţ |
---|---|
S1-“dog”-‘what is inferred to be X’₁-‘huge’₁-‘as a planned result of human action’₁-‘some or other’₁-DDF-‘as powder or dust’₁-‘eaten as afternoon snack’₁-‘trustworthiness of source unknown, and info not verifiable’₁-‘conjecture/theory/hypothesis that is testable/verifiable’₁-COU-POT | "It can only mean one thing..." |
There's only one explanation; can't prove this and my mental state is somewhat foggy, but it would definitely have been an ill formed fusion of that pair of different man-made huge creatures that seem to be dogs in the form of dust served as an afternoon snack way over there by you. Oh and don't quote me on that.
mi | lumci | le | creka | le | grasu | le | rirxe |
---|---|---|---|---|---|---|---|
I=x1 | wash | DET | shirt=x2 | DET | grease=x3 | DET | river=x4 |
'I wash the grease off the shirt in the river.'
Automatic discovery of morphological structure from glosses
Researchers have used interlinear glosses to obtain the morphological paradigms of the object language (i.e., the language being glossed). To create paradigms automatically from interlinear glosses, researchers have built a table for every stem in the glosses, with a (possibly empty) slot for every grammatical category (e.g., ERG) found in the glosses. For instance, given the glossed sentence below:
Vecher-om | ya | pobeja-la | v | magazin |
---|---|---|---|---|
evening-INS | 1.SG.NOM | run-PFV.PST.SG.FEM | in | store.ACC |
'In the evening I ran to the store.'
There would be a paradigm for the stem pobeja with slots for PFV.PST.SG.FEM and PFV.PST.SG.MASC:
Slot | inflection |
---|---|
PFV.PST.SG.FEM | pobeja-la |
PFV.PST.SG.MASC | ? |
The slot for PFV.PST.SG.FEM would be filled (since it was observed in the interlinear gloss data) but the slot for PFV.PST.SG.MASC would be empty (assuming that no other interlinear gloss instance contains pobeja inflected for the PFV.PST.SG.MASC grammatical category). A statistical machine learning model for morphological inflection can be used to fill in the missing entries.
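The table-building step can be sketched in Python under two simplifying assumptions: every input word has the form stem-suffix with a single feature bundle after the hyphen, and all input words belong to one part of speech, so that the slots of each paradigm are simply the union of all feature bundles observed with any stem. The helper name and the second word (prochita-l 'read', PFV.PST.SG.MASC) are hypothetical illustrations:

```python
from typing import Optional


def build_paradigms(words: list[tuple[str, str]]) -> dict[str, dict[str, Optional[str]]]:
    """Map each stem to {feature bundle: observed form, or None if unobserved}."""
    observed: dict[str, dict[str, str]] = {}
    slots: set[str] = set()
    for form, gloss in words:
        if "-" not in form or "-" not in gloss:
            continue  # assumed uninflected or unsegmented; skipped
        stem = form.split("-", 1)[0]
        features = gloss.split("-", 1)[1]
        observed.setdefault(stem, {})[features] = form
        slots.add(features)
    # Every stem receives every slot seen anywhere; missing cells stay None.
    return {
        stem: {slot: cells.get(slot) for slot in sorted(slots)}
        for stem, cells in observed.items()
    }


if __name__ == "__main__":
    verbs = [("pobeja-la", "run-PFV.PST.SG.FEM"), ("prochita-l", "read-PFV.PST.SG.MASC")]
    for stem, cells in build_paradigms(verbs).items():
        print(stem, cells)
```

The `None` cells are the entries that an inflection model would then be asked to predict.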
See also
- Kanbun – Japanese tradition of glossing Classical Chinese texts
- Ruby text – a gloss sometimes used with Chinese or Japanese to show the pronunciation
- Part-of-speech tagging, often displayed as interlinear glosses under the tagged words, sometimes at the same time as an interlinear word-by-word translation
- Treebanks, often displayed as a gloss or annotation to the original text.
- James Hamilton, nineteenth-century composer and promoter of interlinear texts for language learning
- Metaphrase
References
- Lehmann, Christian (2004-01-23). "Directions for interlinear morphemic translations". In Geert Booij; Christian Lehmann; Joachim Mugdan; Stavros Skopeteas (eds.). Morphologie. Ein internationales Handbuch zur Flexion und Wortbildung. Handbücher der Sprach- und Kommunikationswissenschaft. Vol. 2. Berlin: W. de Gruyter. pp. 1834–1857.
- Haspelmath, Martin (2008). Language typology and language universals: an international handbook. Walter de Gruyter. p. 715. ISBN 978-3-11-011423-2.
- Bickel, Balthasar; Comrie, Bernard; Haspelmath, Martin (February 2008). "The Leipzig Glossing Rules. Conventions for Interlinear Morpheme by Morpheme Glosses". Dept. of Linguistics – Resources – Glossing Rules. Retrieved 2010-06-30.
- Example from A Basic Vocabulary for a Beginner in Taiwanese by Ko Chek Hoan and Tan Pang Tin
- Georgi, Ryan (2016). From Aari to Zulu: massively multilingual creation of language tools using interlinear glossed text (PhD). University of Washington.
- Xia, Fei; Lewis, William; Wayne, Michael; Slayden, Glenn; Georgi, Ryan; Crowgey, Joshua; Bender, Emily (2016). "Enriching a massively multilingual database of interlinear glossed text". Language Resources and Evaluation. 50 (2): 321–349. doi:10.1007/s10579-015-9325-4. S2CID 2674996. Retrieved 2021-12-15.
- Zhao, Xingyuan; Ozaki, Satoru; Anastasopoulos, Antonios; Neubig, Graham; Levin, Lori (2020). "Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations". COLING. Proceedings of the 28th International Conference on Computational Linguistics: 5397–5408. doi:10.18653/v1/2020.coling-main.471. S2CID 227231816. Retrieved 2021-12-15.
- Moeller, Sarah; Liu, Ling; Yang, Changbing; Kann, Katharina; Hulden, Mans (2020). "IG2P: From Interlinear Glossed Texts to Paradigms". EMNLP. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP): 5251–5262. doi:10.18653/v1/2020.emnlp-main.424. S2CID 226262296. Retrieved 2021-12-15.
- Silfverberg, Miikka; Hulden, Mans (2018). "An Encoder-Decoder Approach to the Paradigm Cell Filling Problem". Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics. pp. 2883–2889. doi:10.18653/v1/D18-1315. S2CID 53082616.
- Wu, Shijie; Cotterell, Ryan; Hulden, Mans (2021). "Applying the Transformer to Character-level Transduction". Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Online: Association for Computational Linguistics. pp. 1901–1907. arXiv:2005.10213. doi:10.18653/v1/2021.eacl-main.163. S2CID 218718982.
- Nicolai, Garrett; Cherry, Colin; Kondrak, Grzegorz (2015). "Inflection Generation as Discriminative String Transduction". Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Denver, Colorado: Association for Computational Linguistics. pp. 922–931. doi:10.3115/v1/N15-1093. S2CID 14929030.
- Bhargava, Aditya; Kondrak, Grzegorz (2012). "Leveraging supplemental representations for sequential transduction". Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Montréal, Canada: Association for Computational Linguistics: 396–406.
External links
- The Leipzig Glossing Rules: Conventions for interlinear morpheme-by-morpheme glosses
- Interlinear Glossed Text Standards (E-MELD)
- Interlinear Glossed Text Levels (E-MELD)
- Towards a General Model of Interlinear Text (E-MELD)
- Interlinear Morphemic Glosses
- Glossing Ancient Languages and Texts. A forum for recommendations on the Interlinear Morphemic Glossing of ancient languages as attested in ancient manuscripts.
- Online Interlinear of Biblical Greek Scriptures (New Testament) text
- ODIN - The Online Database of INterlinear text
- Latinum Interlinear Method page Listing of older interlinear and construed texts, mostly from Latin or Ancient Greek and mostly to English
- Ernest Blum, "The New Old Way of Learning Languages", The American Scholar, Autumn 2008.
In linguistics and pedagogy an interlinear gloss is a gloss series of brief explanations such as definitions or pronunciations placed between lines such as between a line of original text and its translation into another language When glossed each line of the original text acquires one or more corresponding lines of transcription known as an interlinear text or interlinear glossed text IGT an interlinear for short Such glosses help the reader follow the relationship between the source text and its translation and the structure of the original language In its simplest form an interlinear gloss is a literal word for word translation of the source text HistoryInterlinear text in Toussaint Langenscheidt Spanisch a Spanish language textbook for German speakers 1910 Interlinear glosses have been used for a variety of purposes over a long period of time One common usage has been to annotate bilingual textbooks for language education This sort of interlinearization serves to help make the meaning of a source text explicit without attempting to formally model the structural characteristics of the source language Such annotations have occasionally been expressed not through interlinear layout but rather through enumeration of words in the object and meta language One such example is Wilhelm von Humboldt s annotation of Classical Nahuatl 1 ni 1 ich2 c 3 mache3 chihui 2 es4 lia 4 fur5 in 5 der6 no 6 mein7 piltzin 7 Sohn8 ce 8 ein9 calli 9 Haus 1 2 3 4 5 6 7 8 9 ni c chihui lia in no piltzin ce calli 1 3 2 4 5 6 7 8 9 ich mache es fur der mein Sohn ein Haus This inline style allows examples to be included within the flow of text and for the word order of the target language to be written in an order which approximates the target language syntax In the gloss here mache es is reordered from the corresponding source order to approximate German syntax more naturally Even so this approach requires the readers to re align the correspondences between source and target forms More 
modern 19th and 20th century approaches took to glossing vertically aligning the same sort of word by word content in such a way that the metalanguage terms were placed vertically below the source language terms In this style the given example might be rendered thus here English gloss ni Ic itchihui make lia forin to theno mypiltzin sonce acalli house ni c chihui lia in no piltzin ce calli I it make for to the my son a house I made my son a house Here word ordering is determined by the syntax of the object language Finally modern linguists have adopted the practice of using abbreviated grammatical category labels A 2008 publication which repeats this example labels it as follows ni c chihui lia 1SG SUBJ 3SG OBJ mach APPLin DETno piltzin 1SG POSS Sohnce eincalli Haus ni c chihui lia in no piltzin ce calli 1SG SUBJ 3SG OBJ mach APPL DET 1SG POSS Sohn ein Haus This approach is denser and also requires effort to read but it is less reliant on the grammatical structure of the metalanguage for expressing the semantics of the target forms In computing special text markers are provided in the Specials Unicode block to indicate the start and end of interlinear glosses StructureThough there is no formal specification for the IGT format the Leipzig Glossing Rules are a set of guidelines that aim to standardize the format as much as possible An interlinear text for linguistics will commonly consist of some or all of the following usually in this order from top to bottom The original orthography typically in italic or bold italic a conventional transliteration into the Latin alphabet a phonetic transcription a morphophonemic transliteration a word by word or morpheme by morpheme gloss where morphemes within a word are separated by hyphens or other punctuation and finally a free translation which may be placed in a separate paragraph or on the facing page if the structures of the languages are too different for it to follow the text line by line As an example the following 
Taiwanese Minnan clause has been transcribed with five lines of text 1 the standard pe h ōe ji transliteration 2 a gloss using tone numbers for the surface tones 3 a gloss showing the underlying tones in citation form before undergoing tone sandhi 4 a morpheme by morpheme gloss in English and 5 an English translation 1 2 3 4 goa goa1 goa2 Iiau boe iau1 boe3 iau2 boe7 not yetkoat teng koat2 teng3 koat4 teng7 decidetang si tang7 si5 tang1 si5 whenboeh boeh2 boeh4 wanttng khi tng1 khi3 tng2 khi3 return 1 goa iau boe koat teng tang si boeh tng khi 2 goa1 iau1 boe3 koat2 teng3 tang7 si5 boeh2 tng1 khi3 3 goa2 iau2 boe7 koat4 teng7 tang1 si5 boeh4 tng2 khi3 4 I not yet decide when want return 5 I have not yet decided when I shall return Word by word alignment According to the Leipzig Glossing Rules it is standard to left align the words in the object language with the corresponding words in the metalanguage this alignment can be seen between lines 1 3 and line 4 Morpheme by morpheme correspondence At the sub word level segmentable morphemes are separated by hyphens both in the example and in the gloss There should be the same number of hyphens in the example and in the gloss as shown in the following example Gila nowabur u n they OBL GENferma farmhamisaluǧ foreverguǧuna behindamuqʼ da c stay FUT NEG Gila abur u n ferma hamisaluǧ guǧuna amuqʼ da c now they OBL GEN farm forever behind stay FUT NEG Now their farm will not stay behind forever Grammatical category labels In amuqʼ da c the stem amuq is translated into the corresponding English lexeme stay while the inflectional affixes da and c are inflectional affixes representing future tense and negation These inflectional affixes are glossed as FUT and NEG a list of standard abbreviations for grammatical categories that are widely used in linguistics can be found in the Leipzig Glossing Rules One to many correspondences When a single object language element corresponds to several metalanguage elements they are separated by 
periods E g cik mak come out INF cik mak come out INF to come out Non overt elements if the morpheme by morpheme gloss middle line contains an element that does not correspond to an overt element in the example a standard strategy is to include an overt o in the object language text which is separated by a hyphen like an overt element would be puer o boy NOM puer o boy NOM boy Reduplication is treated similarly to affixation but with a tilde instead of the standard hyphen that connects the copied element to the stem bi bili IPFV buy bi bili IPFV buy is buying PunctuationIn interlinear morphological glosses various forms of punctuation separate the glosses Typically the words are aligned with their glosses within words a hyphen is used when a boundary is marked in both the text and its gloss a period when a boundary appears in only one That is there should be the same number of words separated with spaces in the text and its gloss as well as the same number of hyphenated morphemes within a word and its gloss This is the basic system and can be applied universally For example Odadan hizli ciktim oda dan room ABL room fromhiz li speed COM speed withcik ti m go out PFV 1sg go out perfective ITurkish oda dan hiz li cik ti m room ABL speed COM go out PFV 1sg room from speed with go out perfective I I left the room quickly An underscore may be used instead of a period as in go out PFV when a single word in the source language happens to correspond to a phrase in the glossing language though a period would still be used for other situations such as Greek oikiais house FEM PL DAT to the houses However sometimes finer distinctions may be made For example clitics may be separated with a double hyphen or for ease of typing an equal sign rather than a hyphen A French example Je t aime je te aime I you love French je te aime I you love I love you Affixes which cause discontinuity infixes circumfixes transfixes etc may be set off by angle brackets and reduplication with tildes 
rather than with hyphens sulat susulat sumulat sumusulat verbal declensions Tagalog sulat writesu sulat writes um ulat agent trigger past writes um u sulat agent trigger contemplative write sulat su sulat s um ulat s um u sulat write write agent trigger past write agent trigger contemplative write See affix for other examples Morphemes which cannot be easily separated out such as umlaut may be marked with a backslash rather than a period unser n our DAT PLVater n father PL DAT PL German unser n Vater n our DAT PL father PL DAT PL to our fathers the singular of Vater fathers is Vater A few other conventions which are sometimes seen are illustrated in the Leipzig Glossing Rules Interlinear gloss resourcesEfforts have been undertaken to digitize IGT for hundreds of the world s languages Online Database of Interlinear Text The Online Database of Interlinear Text ODIN is a database of over 200 000 instances of interlinear glosses for more than 1 500 languages extracted from scholarly linguistic research The database was constructed in two phases automatic construction followed by manual correction The automatic construction stage itself was completed in three steps First search engines e g Google Bing were queried to retrieve scholarly documents that were likely to contain interlinear glosses The queries comprised terms relevant to linguistic research such as grammatical morphemes e g NOM short for nominative 3SG short for 3rd person singular Second each line in an extracted document was tagged for whether it was a line belonging to an interlinear gloss or not using sequence labeling methods from Machine Learning Third each interlinear gloss instance was assigned a language name e g Tagalog and an ISO 693 3 language ID Language names and IDs were automatically assigned to interlinear glosses using Coreference Resolution models from Natural Language Processing where the interlinear gloss instance was tagged with the language name and ID that appears in the scholarly 
document the interlinear gloss instance was extracted from In the manual correction phase the database creators manually corrected the boundaries of the interlinear gloss instances discovered by the sequence labelling method in Step 2 of the automatic construction phase The creators then verified the language names and language codes in a second and third pass over the data respectively The language distribution of interlinear gloss instances in Online Database of Interlinear Text after phase 1 and phase 2 Range of interlinear gloss instances Number of languages Number of interlinear gloss instances Percent of interlinear gloss instances gt 10 000 3 1 36 691 10 814 19 39 6 88 1000 9999 37 31 97 158 81 218 51 34 51 69 100 999 122 139 40 260 46 420 21 27 29 55 10 99 326 460 12 822 15 560 6 78 9 96 1 9 838 862 2 313 3 012 1 22 1 92 Total 1 326 1 493 189 244 157 114 100 100 Automatic processing of interlinear gloss instancesNatural Language Processing models leveraging interlinear gloss resources such as the Online Database of Interlinear Text have been developed Automatic glossing Natural Language Processing systems for example have been developed to automatically produce interlinear glosses mi s you GENħumukuli camelelu ab ok ek asi we OBL ERG 1 PL steal PRTanu be NEG mi s ħumukuli elu ab ok ek asi anu you GEN camel we OBL ERG 1 PL steal PRT be NEG We didn t steal your camel Given the morpheme segmented line first line above and the free translation line third line above the task is to produce the middle glossed line comprising stem translations e g mi you and the grammatical category labels corresponding to affixes e g a ERG 1 PL Sequence prediction models from Natural Language Processing have been used to perform this task Two factors contribute to the difficulty of this task The translation is not necessarily in alignment with the morpheme segmented line e g camel is the last word in the translation but the second word in the morpheme segmented line Some words in 
the morpheme-segmented line have multiple correspondences in the gloss, e.g. anu glossed as be.NEG.
Some constructed languages, like Ithkuil and Lojban, have automated tools that, in theory, will always result in accurate glossing due to the regularized and logical nature of these languages. Here are examples of glosses of Ithkuil and Lojban, respectively:

    A-zvaţcaxuẓponḑessasceirktoncogjahnoi
    S1-dog-what.is.inferred.to.be.X-huge-as.a.planned.result.of.human.action-some.or.other-DDF-as.powder.or.dust-eaten.as.afternoon.snack-trustworthiness.of.source.unknown.and.info.not.verifiable-conjecture/theory/hypothesis.that.is.testable/verifiable-COU-POT

    nnţ
    It.can.only.mean.one.thing

    'There's only one explanation: can't prove this, and my mental state is somewhat foggy, but it would definitely have been an ill-formed fusion of that pair of different man-made huge creatures that seem to be dogs, in the form of dust, served as an afternoon snack way over there by you. Oh, and don't quote me on that.'

    mi     lumci  le   creka     le   grasu      le   rirxe
    I.x1   wash   DET  shirt.x2  DET  grease.x3  DET  river.x4
    'I wash the grease off the shirt in the river.'

Automatic discovery of morphological structure from glosses
Researchers have used interlinear glosses to obtain the morphological paradigms of the object language, i.e. the language being glossed. To create morphological paradigms automatically from interlinear glosses, researchers have built a table for every stem in the gloss, with a (possibly empty) slot for every grammatical category (e.g. ERG) in the gloss. For instance, given the glossed sentence below:

    Vecher-om    ya       pobeja-la           v    magazin
    evening-INS  1SG.NOM  run-PFV.PST.SG.FEM  in   store.ACC
    'In the evening I ran to the store.'

there would be a paradigm for the stem pobeja with slots for PFV.PST.SG.FEM and PFV.PST.SG.MASC.
Partial paradigm for pobeja:

| Slot | Inflection |
|---|---|
| PFV.PST.SG.FEM | pobeja-la |
| PFV.PST.SG.MASC | |

The slot for PFV.PST.SG.FEM would be filled, since it was observed in the interlinear gloss data, but the slot for PFV.PST.SG.MASC would be empty, assuming that no other interlinear gloss instance contains pobeja inflected for the PFV.PST.SG.MASC grammatical category. A statistical machine learning model for morphological inflection can then be used to fill in the missing entries.
See also
- Kanbun: Japanese tradition of glossing Classical Chinese texts
- Ruby text: a gloss sometimes used with Chinese or Japanese to show the pronunciation
- Part-of-speech tagging: often displayed as interlinear glosses under the tagged words, sometimes at the same time as an interlinear word-by-word translation
- Treebanks: often displayed as a gloss or annotation to the original text
- James Hamilton: nineteenth-century composer and promoter of interlinear texts for language learning
- Metaphrase
References
- Lehmann, Christian (2004-01-23). "Directions for interlinear morphemic translations". In Geert Booij; Christian Lehmann; Joachim Mugdan; Stavros Skopeteas (eds.). Morphologie: Ein internationales Handbuch zur Flexion und Wortbildung. Handbücher der Sprach- und Kommunikationswissenschaft. Vol. 2. Berlin: W. de Gruyter. pp. 1834–1857.
- Haspelmath, Martin (2008). Language typology and language universals: an international handbook. Walter de Gruyter. p. 715. ISBN 978-3-11-011423-2.
- Bickel, Balthasar; Comrie, Bernard; Haspelmath, Martin (February 2008). "The Leipzig Glossing Rules: Conventions for Interlinear Morpheme-by-Morpheme Glosses". Dept. of Linguistics, Resources, Glossing Rules. Retrieved 2010-06-30.
- Example from A Basic Vocabulary for a Beginner in Taiwanese by Ko Chek Hoan and Tan Pang Tin.
- Georgi, Ryan (2016). From Aari
to Zulu: massively multilingual creation of language tools using interlinear glossed text (PhD). University of Washington.
- Xia, Fei; Lewis, William; Wayne, Michael; Slayden, Glenn; Georgi, Ryan; Crowgey, Joshua; Bender, Emily (2016). "Enriching a massively multilingual database of interlinear glossed text". Language Resources and Evaluation. 50 (2): 321–349. doi:10.1007/s10579-015-9325-4. S2CID 2674996. Retrieved 2021-12-15.
- Zhao, Xingyuan; Ozaki, Satoru; Anastasopoulos, Antonios; Neubig, Graham; Levin, Lori (2020). "Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations". COLING: Proceedings of the 28th International Conference on Computational Linguistics: 5397–5408. doi:10.18653/v1/2020.coling-main.471. S2CID 227231816. Retrieved 2021-12-15.
- Moeller, Sarah; Liu, Ling; Yang, Changbing; Kann, Katharina; Hulden, Mans (2020). "IGT2P: From Interlinear Glossed Texts to Paradigms". EMNLP: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP): 5251–5262. doi:10.18653/v1/2020.emnlp-main.424. S2CID 226262296. Retrieved 2021-12-15.
- Silfverberg, Miikka; Hulden, Mans (2018). "An Encoder-Decoder Approach to the Paradigm Cell Filling Problem". Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics. pp. 2883–2889. doi:10.18653/v1/D18-1315. S2CID 53082616.
- Wu, Shijie; Cotterell, Ryan; Hulden, Mans (2021). "Applying the Transformer to Character-level Transduction". Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Online: Association for Computational Linguistics. pp. 1901–1907. arXiv:2005.10213. doi:10.18653/v1/2021.eacl-main.163. S2CID 218718982.
- Nicolai, Garrett; Cherry, Colin; Kondrak, Grzegorz (2015). "Inflection Generation as Discriminative String Transduction". Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Denver, Colorado: Association for Computational Linguistics. pp. 922–931. doi:10.3115/v1/N15-1093. S2CID 14929030.
- Bhargava, Aditya; Kondrak, Grzegorz (2012). "Leveraging supplemental representations for sequential transduction". Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Montreal, Canada: Association for Computational Linguistics: 396–406.
External links
- The Leipzig Glossing Rules: Conventions for interlinear morpheme-by-morpheme glosses
- Interlinear Glossed Text Standards, E-MELD
- Interlinear Glossed Text Levels, E-MELD
- Towards a General Model of Interlinear Text, E-MELD
- Interlinear Morphemic Glosses
- Glossing Ancient Languages and Texts: A forum for recommendations on the Interlinear Morphemic Glossing of ancient languages as attested in ancient manuscripts
- Online Interlinear of Biblical Greek Scriptures (New Testament text)
- ODIN: The Online Database of INterlinear text
- Latinum Interlinear Method page: Listing of older interlinear and construed texts, mostly from Latin or Ancient Greek and mostly to English
- Ernest Blum, "The New Old Way of Learning Languages", The American Scholar, Autumn 2008