Author: Nikolett Mus
Date: 28.05.2025
This document outlines the annotation conventions for spoken language-specific features in Tundra Nenets, including prosodic tags, discourse phenomena, and structural relations. It reflects the current annotation practice and serves as a working reference.
<n_n>
= Non-linguistic noise, but speaker generated (e.g., cough, laugh, sigh)
<w_n>
= Environmental/world noise (e.g., background chatter, traffic, mic bump)
ID: lexical unit on its own
FORM: tag
LEMMA: tag
UPOS: X
DEPREL: noise, attached to the element it follows
Gloss: NOISE
Interruptions in the speaker's normal flow, including errors, hesitations, restarts.
<a_d>
= Audible disfluency (e.g., "uh", "erm")
ID: lexical unit on its own
FORM: tag
LEMMA: tag
UPOS: INTJ
DEPREL: discourse, attached to the following element
Gloss: DISFL
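The standalone tags <n_n>, <w_n> and <a_d> can be summarised as a small lookup table. The following is a minimal sketch, not part of the project's tooling: the names STANDALONE_TAGS and annotate_standalone are hypothetical, and the head is assumed to be the immediately adjacent token (the previous one for noise, the next one for audible disfluencies).

```python
# Hypothetical lookup table; field values are copied from the conventions above.
STANDALONE_TAGS = {
    "<n_n>": {"upos": "X", "deprel": "noise", "gloss": "NOISE", "attach": "previous"},
    "<w_n>": {"upos": "X", "deprel": "noise", "gloss": "NOISE", "attach": "previous"},
    "<a_d>": {"upos": "INTJ", "deprel": "discourse", "gloss": "DISFL", "attach": "next"},
}


def annotate_standalone(tag, token_id):
    """Return the annotation fields for a standalone tag at 1-based position token_id.

    Assumption: the head is the directly adjacent token. Sentence-initial or
    sentence-final tags are not handled here and need a manual decision.
    """
    spec = STANDALONE_TAGS[tag]
    head = token_id - 1 if spec["attach"] == "previous" else token_id + 1
    return {
        "id": token_id,
        "form": tag,    # FORM is the tag itself
        "lemma": tag,   # LEMMA is the tag itself
        "upos": spec["upos"],
        "head": head,
        "deprel": spec["deprel"],
        "gloss": spec["gloss"],
    }
```

For example, annotate_standalone("<a_d>", 3) yields a row with UPOS INTJ, DEPREL discourse and head 4, while a noise tag in the same position would point back to token 2.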
<a>
= Articulatory lengthening (e.g., prolonged sound)
ID: attached to lexeme
FORM: The tag <a> should be attached to the relevant lexeme without a space.
LEMMA: not relevant
UPOS: not relevant
DEPREL: not relevant
Gloss: not relevant
<u_l>
= Unfinished lexeme
ID: attached to lexeme
FORM: The tag <u_l> should be attached to the relevant lexeme without a space.
LEMMA: Full form of expected word
UPOS: not relevant
DEPREL: not relevant
Gloss: not relevant
<f_s>
= False start
ID: attached to lexeme
FORM: The tag <f_s> should be attached to the lexeme without a space.
LEMMA: underlying lemma (base word) without the tag
UPOS: POS of the base word
DEPREL: reparandum, attached to the node that replaces or repairs it
Gloss: Gloss of base/intended word
<e_r>
= Exact repetition
ID: attached to lexeme
FORM: The tag <e_r> should be attached to the lexeme without a space.
LEMMA: underlying lemma (base word) without the tag
UPOS: POS of the base word
DEPREL: reparandum, attached to the next instance (the "actual", i.e. the first use).
Gloss: Gloss of base/intended word
<p_r>
= Partial repetition
ID: attached to lexeme
FORM: The tag <p_r> should be attached to the lexeme without a space.
LEMMA: underlying lemma (base word) without the tag
UPOS: POS of the base word
DEPREL: reparandum, attached to the next instance (the "actual", i.e. the first use).
Gloss: Gloss of base/intended word
<m_l>
= Merged lexemes, when two or more lexemes are pronounced without a boundary in speech, but are distinct syntactic units.
ID: attached to lexeme
FORM: The lexemes must be separated and the tag <m_l> should precede the subsequent lexeme without a space.
LEMMA: underlying lemma (base word) without the tag
UPOS: POS of the base word
DEPREL: As per syntax
Gloss: Gloss of lemma
<u_w>
= Unusual or incorrect word (word choice error)
ID: attached to lexeme
FORM: The tag <u_w> should be attached to the word without a space.
LEMMA: Lemma of incorrect word
UPOS: POS of the word as it appears
DEPREL: As if correct
Gloss: Literal gloss of the incorrect word
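Because the lexeme-attached tags are written without a space, a script that needs the bare word form has to separate them again. The following is a minimal sketch under stated assumptions: the function name split_attached_tags is hypothetical, the tag may sit before or after the lexeme, and the forms in the usage note are English placeholders rather than Tundra Nenets data.

```python
import re

# Tags that attach directly to a lexeme, as listed in the conventions above.
ATTACHED_TAG = re.compile(r"<(?:a|u_l|f_s|e_r|p_r|m_l|u_w)>")

# Tags whose DEPREL is "reparandum" according to the conventions above.
REPARANDUM_TAGS = {"<f_s>", "<e_r>", "<p_r>"}


def split_attached_tags(form):
    """Split a FORM into the bare lexeme and the list of tags attached to it.

    Standalone tags such as <n_n> or <s_p> are not matched and stay in the
    returned lexeme unchanged.
    """
    tags = ATTACHED_TAG.findall(form)
    bare = ATTACHED_TAG.sub("", form)
    return bare, tags


def is_reparandum(form):
    """Return True if any attached tag calls for the reparandum relation."""
    _, tags = split_attached_tags(form)
    return any(tag in REPARANDUM_TAGS for tag in tags)
```

With placeholder forms, split_attached_tags("word<f_s>") separates the lexeme "word" from the tag <f_s>, and is_reparandum("word<a>") is False because lengthening does not change the dependency relation.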
In Tundra Nenets it appears:
<s_p>
= Silent pause
<b_p>
= Breathing pause
ID: lexical unit on its own
FORM: Tag
LEMMA: Tag
UPOS: PUNCT
DEPREL: punct, attached to the head of the clause/sentence it follows. Note that the last punct is always dependent on the root.
Gloss: SIL
In Tundra Nenets it appears:
<s_p>
= Silent pause
<b_p>
= Breathing pause
ID: lexical unit on its own
FORM: Tag
LEMMA: Tag
UPOS: X
DEPREL: discourse, attached to the constituent it follows.
Gloss: PAUSE
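The two pause treatments differ only in UPOS, DEPREL and gloss, so they can be captured in one helper. This is a minimal sketch, and the criterion is an assumption: the hypothetical flag closes_clause is taken to mean that the pause follows a complete clause or sentence (PUNCT/punct), and any other pause is taken to follow a constituent (X/discourse). The function name pause_fields is likewise hypothetical.

```python
PAUSE_TAGS = ("<s_p>", "<b_p>")


def pause_fields(tag, closes_clause):
    """Return the annotation fields for a silent or breathing pause.

    Assumption: closes_clause=True selects the PUNCT/punct treatment,
    closes_clause=False selects the X/discourse treatment.
    """
    if tag not in PAUSE_TAGS:
        raise ValueError(f"not a pause tag: {tag}")
    if closes_clause:
        upos, deprel, gloss = "PUNCT", "punct", "SIL"
    else:
        upos, deprel, gloss = "X", "discourse", "PAUSE"
    return {"form": tag, "lemma": tag, "upos": upos, "deprel": deprel, "gloss": gloss}
```

The head is left out on purpose: it depends on the syntactic analysis (clause head or preceding constituent) and cannot be derived from the tag alone.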
This document is part of the [Project Name] and is freely available under a CC BY 4.0 license.