Author: Nikolett Mus
Date: 28.05.2025
This document outlines the annotation conventions for features specific to spoken language in Tundra Nenets, including prosodic tags, discourse phenomena, and structural relations. It reflects the current annotation practice and serves as a working reference.
<n_n> = Non-linguistic noise, but speaker generated (e.g., cough, laugh, sigh)
<w_n> = Environmental/world noise (e.g., background chatter, traffic, mic bump)
ID: lexical unit on its own
FORM: tag
LEMMA: tag
UPOS: X
DEPREL: noise, attached to the element it follows
Gloss: NOISE
Interruptions in the speaker's normal flow, including errors, hesitations, and restarts.
<a_d> = Audible disfluency (e.g., "uh", "erm")
ID: lexical unit on its own
FORM: tag
LEMMA: tag
UPOS: INTJ
DEPREL: discourse, attached to the following element
Gloss: DISFL
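The field values for the standalone tags above can be collected in a small lookup table. The following is a minimal Python sketch, not part of the project's tooling: the names STANDALONE_TAGS and standalone_token are hypothetical, and the dictionary keys are only an assumed in-memory layout. The head of the tag (the element it follows or precedes) is passed in by the caller.

```python
# Field values copied from the conventions above; the names are hypothetical.
STANDALONE_TAGS = {
    "<n_n>": {"upos": "X", "deprel": "noise", "gloss": "NOISE"},
    "<w_n>": {"upos": "X", "deprel": "noise", "gloss": "NOISE"},
    "<a_d>": {"upos": "INTJ", "deprel": "discourse", "gloss": "DISFL"},
}


def standalone_token(token_id, tag, head):
    """Build a record for a tag that is a lexical unit on its own.

    FORM and LEMMA are both the tag itself. `head` is the ID of the
    element the tag attaches to; selecting it is left to the caller.
    """
    fields = STANDALONE_TAGS[tag]
    return {
        "id": token_id,
        "form": tag,
        "lemma": tag,
        "upos": fields["upos"],
        "head": head,
        "deprel": fields["deprel"],
        "gloss": fields["gloss"],
    }
```

For example, standalone_token(3, "<n_n>", 2) describes a speaker-generated noise in position 3 attached to the element in position 2.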
<a> = Articulatory lengthening (e.g., prolonged sound)
ID: attached to lexeme
FORM: The tag <a> should be attached to the relevant lexeme without a space.
LEMMA: not relevant
UPOS: not relevant
DEPREL: not relevant
Gloss: not relevant
<u_l> = Unfinished lexeme
ID: attached to lexeme
FORM: The tag <u_l> should be attached to the relevant lexeme without a space.
LEMMA: Full form of expected word
UPOS: not relevant
DEPREL: not relevant
Gloss: not relevant
<f_s> = False start
ID: attached to lexeme
FORM: The tag <f_s> should be attached to the lexeme without a space.
LEMMA: underlying lemma (base word) without the tag
UPOS: POS of the base word
DEPREL: reparandum attached to the node that replaces or repairs it
Gloss: Gloss of base/intended word
<e_r> = Exact repetition
ID: attached to lexeme
FORM: The tag <e_r> should be attached to the lexeme without a space.
LEMMA: underlying lemma (base word) without the tag
UPOS: POS of the base word
DEPREL: reparandum, attached to the next instance of the word (the "actual" use).
Gloss: Gloss of base/intended word
<p_r> = Partial repetition
ID: attached to lexeme
FORM: The tag <p_r> should be attached to the lexeme without a space.
LEMMA: underlying lemma (base word) without the tag
UPOS: POS of the base word
DEPREL: reparandum, attached to the next instance of the word (the "actual" use).
Gloss: Gloss of base/intended word
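The reparandum attachment for exact repetitions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's procedure: the function name repetition_heads is hypothetical, the tag is assumed to be suffixed to the form, indices are 0-based list positions rather than token IDs, and the forms "aaa" and "bbb" are placeholders, not Tundra Nenets data. Each <e_r>-tagged form is linked to the next untagged occurrence of the same base form. Partial repetitions are left out because their surface form differs from the repaired word.

```python
def repetition_heads(forms):
    """Map the index of each <e_r>-tagged form to the index of its head.

    The head is taken to be the next occurrence of the same base form
    that does not itself carry the <e_r> tag.
    """
    heads = {}
    for i, form in enumerate(forms):
        if "<e_r>" not in form:
            continue
        base = form.replace("<e_r>", "")
        # Scan forward for the untagged ("actual") instance.
        for j in range(i + 1, len(forms)):
            if forms[j] == base:
                heads[i] = j
                break
    return heads
```

With the placeholder input ["aaa<e_r>", "aaa", "bbb"], the tagged form at index 0 is attached to the form at index 1. A tagged form with no later untagged match is simply left out of the result.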
<m_l> = Merged lexemes: two or more lexemes that are pronounced without a boundary in speech but are distinct syntactic units.
ID: attached to lexeme
FORM: The lexemes must be separated and the tag <m_l> should precede the subsequent lexeme without a space.
LEMMA: underlying lemma (base word) without the tag
UPOS: POS of the base word
DEPREL: As per syntax
Gloss: Gloss of lemma
<u_w> = Unusual or incorrect word (word choice error)
ID: attached to lexeme
FORM: The tag <u_w> should be attached to the word without a space.
LEMMA: Lemma of incorrect word
UPOS: POS of the word as it appears
DEPREL: As if the word were correct
Gloss: Literal gloss of the incorrect word
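Since the attached tags are written on the FORM without a space, a script that needs the bare lexeme has to separate the two. The following Python sketch shows one way to do this with a regular expression. The names ATTACHED_TAG and split_attached_tags are hypothetical, the forms in the usage note are placeholders, and the pattern accepts a tag on either side of the lexeme because the conventions above fix the position only for <m_l>.

```python
import re

# The seven tags that attach to a lexeme, as listed above.
# Standalone tags such as <n_n> or <a_d> are deliberately not matched.
ATTACHED_TAG = re.compile(r"<(a|u_l|f_s|e_r|p_r|m_l|u_w)>")


def split_attached_tags(form):
    """Return (bare_form, tags) for a FORM that may carry attached tags."""
    tags = ATTACHED_TAG.findall(form)
    bare_form = ATTACHED_TAG.sub("", form)
    return bare_form, tags
```

For instance, split_attached_tags("aaa<f_s>") yields the bare form "aaa" with the tag list ["f_s"], and split_attached_tags("<m_l>bbb") yields "bbb" with ["m_l"]. The bare form is not necessarily the LEMMA: for <u_l>, the lemma is the full form of the expected word and has to be supplied by the annotator.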
In Tundra Nenets it appears:
<s_p> = Silent pause
<b_p> = Breathing pause
ID: lexical unit on its own
FORM: Tag
LEMMA: Tag
UPOS: PUNCT
DEPREL: punct, attached to the head of the clause/sentence it follows. Note that the last punct is always a dependent of the root.
Gloss: SIL
In Tundra Nenets it appears:
<s_p> = Silent pause
<b_p> = Breathing pause
ID: lexical unit on its own
FORM: Tag
LEMMA: Tag
UPOS: X
DEPREL: discourse, attached to the constituent it follows.
Gloss: PAUSE
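The pause tags <s_p> and <b_p> are listed above with two different sets of field values. A simple consistency check can confirm that an annotated pause uses one of these two sets and does not mix them. This is a minimal Python sketch with hypothetical names (PAUSE_TAGS, ALLOWED_PAUSE_ANNOTATIONS, is_valid_pause). It checks only the combination of values, not which of the two sets is appropriate in a given context.

```python
PAUSE_TAGS = {"<s_p>", "<b_p>"}

# The two (UPOS, DEPREL, Gloss) combinations given above for pauses.
ALLOWED_PAUSE_ANNOTATIONS = {
    ("PUNCT", "punct", "SIL"),
    ("X", "discourse", "PAUSE"),
}


def is_valid_pause(form, upos, deprel, gloss):
    """Return True if a pause token uses one of the two listed combinations."""
    return form in PAUSE_TAGS and (upos, deprel, gloss) in ALLOWED_PAUSE_ANNOTATIONS
```

A token such as ("<s_p>", "PUNCT", "discourse", "SIL") is rejected because it combines values from both sets.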
This document is part of the [Project Name] and is freely available under a CC BY 4.0 license.