NCHLT Tshivenda POS tag set

Tag set

 

For annotators' purposes, this tag set is largely adopted from Taljard et al. (2008) and from various documents compiled by G. Faasz and U. Heid of the IMS, Stuttgart, and by D. J. Prinsloo and E. Taljard of the University of Pretoria. The information below describes the current state of the tag set; further development will probably require a number of changes.

The tag set is based mainly on the lexical and morphological criteria defined by Lombard (1985) and Louwrens (1991). Its logical structure is divided into two layers of linguistic description (annotation levels):

The first annotation level contains all mandatory information (obligatory information, in EAGLES terms), consisting of up to three elements: one indicating the word class, a second specifying functional or syntactic properties, and a third giving morphological specifics; cf. PRO(noun)EMP(hatic)PERS(on).

The second annotation level contains recommended and optional information. In most cases it is used for a detailed description of the closed-class items listed in the tagger lexicon. Compare the following excerpt:

 

Figure 1: Annotation levels

Description              Tag 1st level              Tag 2nd level
                         (mandatory information)    (optional/recommended information)

Pronouns:
   emphatic personal     PROEMPPERS                 1sg, 2sg, 1pl, 2pl

Verbals:                 V                          tr

Morphemes:
   deficient             MORPH                      def
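The split between the two levels can be illustrated with a small helper. This is a minimal sketch, not part of any NCHLT tool: it assumes that a full tag is written as the level-1 part followed by underscore-separated level-2 values, as in PROEMPPERS_1sg or N03_dim_loc in the examples further on, and that the placeholder nil, which those examples use where no level-2 value applies, means "no level-2 information". The function name split_tag is hypothetical.

```python
from typing import List, Tuple


def split_tag(tag: str) -> Tuple[str, List[str]]:
    """Split a full tag into (level-1 part, list of level-2 values).

    Assumes level-2 values follow the first underscore and are themselves
    separated by underscores; the placeholder "nil" is dropped.
    """
    level1, _, rest = tag.partition("_")
    level2 = [value for value in rest.split("_") if value and value != "nil"]
    return level1, level2


for example in ["PROEMPPERS_1sg", "N03_dim_loc", "ADV_nil", "CDEM07"]:
    print(example, "->", split_tag(example))
```

For most categories the class number stays in the level-1 part (CDEM07), whereas CDEMCOP places it at level 2 (CDEMCOP_01); for such tags this helper returns the class number as a level-2 value.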

 

As for the actual tagging, an additional first level of tagging is envisaged, on which linguistic words will be tagged. For Tshivenda, this means that the four orthographic units ndi + a + mu + funa will be tagged together as V, since they jointly constitute one linguistic verb.
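A minimal sketch of how this word-level layer might be stored alongside the orthographic units, assuming a hypothetical span format of (start, end, tag) triples with an exclusive end index. The names LinguisticWord and group_units are illustrative only and do not come from the NCHLT tools.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LinguisticWord:
    units: List[str]  # orthographic units, in text order
    tag: str          # a single tag for the whole linguistic word

    @property
    def text(self) -> str:
        return " ".join(self.units)


def group_units(units: List[str], spans: List[Tuple[int, int, str]]) -> List[LinguisticWord]:
    """Group orthographic units into linguistic words.

    Each span is (start, end, tag) with end exclusive; spans must be
    contiguous, non-empty and together cover every unit exactly once.
    """
    words: List[LinguisticWord] = []
    position = 0
    for start, end, tag in spans:
        if start != position or end <= start or end > len(units):
            raise ValueError(f"invalid span {(start, end)} at position {position}")
        words.append(LinguisticWord(units[start:end], tag))
        position = end
    if position != len(units):
        raise ValueError("spans do not cover all orthographic units")
    return words


words = group_units(["ndi", "a", "mu", "funa"], [(0, 4, "V")])
print([(w.text, w.tag) for w in words])  # [('ndi a mu funa', 'V')]
```

Keeping the units as a list, rather than joining them into one string, leaves room for the unit-level tags to be attached later without re-tokenizing.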

 

The tag set currently distinguishes 29 categories and several levels of annotation. The first part of each tag gives a general indication of the nature of the unit in question. The categories are as follows:

1.         $ = Punctuation

2.         ABBR = abbreviation

3.         ADJ = adjective

4.         ADV = adverb

5.         ASP = aspectual marker

6.         AUX = auxiliary verb

7.         CCOP = class-indicating copulative subject concord

8.         CDEM = class-indicating demonstrative

9.         CDEMCOP = class-indicating demonstrative copulative

10.      CN = class-indicating nominal prefix

11.      CO = class-indicating object concord

12.      CPOSS = class-indicating possessive concord

13.      CS = class-indicating subject concord

14.      ENUM = enumerative

15.      IDEO = ideophone

16.      INT = interjection

17.      JUNC = conjunction

18.      MNEG = negative morpheme

19.      N = noun

20.      NPP = place and brand name

21.      NUM = numerative

22.      PART = particle

23.      PROEMP = emphatic pronoun

24.      PROPOSS = possessive pronoun

25.      PROQUANT = quantitative pronoun

26.      QUE = question word

27.      TENSE = tense marker

28.      V = verbal

29.      VCOP = copulative verb

As we intend to go deeper into morphological analysis, we also plan to implement the following tags:

AS = adjectival stem

CA = class-indicating adjectival prefix

NS = noun stem

NSuf = nominal suffix

VEnd = verbal ending

VExt = verbal extension

VR = verb root
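Several of the 29 level-1 codes listed above are prefixes of other codes (CDEM and CDEMCOP, N and NPP or NUM, V and VCOP), so a full tag such as CDEMCOP_08 has to be matched against the longest code it starts with. A minimal sketch of such a lookup, assuming tags are written as in the examples in this document; the names CATEGORIES and category_of are hypothetical, and the planned morphological tags above are deliberately left out.

```python
# The 29 level-1 codes and their descriptions (planned morphological tags omitted).
CATEGORIES = {
    "$": "punctuation",
    "ABBR": "abbreviation",
    "ADJ": "adjective",
    "ADV": "adverb",
    "ASP": "aspectual marker",
    "AUX": "auxiliary verb",
    "CCOP": "class-indicating copulative subject concord",
    "CDEM": "class-indicating demonstrative",
    "CDEMCOP": "class-indicating demonstrative copulative",
    "CN": "class-indicating nominal prefix",
    "CO": "class-indicating object concord",
    "CPOSS": "class-indicating possessive concord",
    "CS": "class-indicating subject concord",
    "ENUM": "enumerative",
    "IDEO": "ideophone",
    "INT": "interjection",
    "JUNC": "conjunction",
    "MNEG": "negative morpheme",
    "N": "noun",
    "NPP": "place and brand name",
    "NUM": "numerative",
    "PART": "particle",
    "PROEMP": "emphatic pronoun",
    "PROPOSS": "possessive pronoun",
    "PROQUANT": "quantitative pronoun",
    "QUE": "question word",
    "TENSE": "tense marker",
    "V": "verbal",
    "VCOP": "copulative verb",
}


def category_of(tag: str) -> str:
    """Return the level-1 code that is the longest prefix of `tag`."""
    matches = [code for code in CATEGORIES if tag.startswith(code)]
    if not matches:
        raise ValueError(f"unknown tag: {tag!r}")
    return max(matches, key=len)


for tag in ["CDEMCOP_08", "CDEM07", "NUM", "N09_dim", "VCOP_neg", "PROEMPPERS_1sg"]:
    code = category_of(tag)
    print(tag, "->", code, f"({CATEGORIES[code]})")
```

A plain "first match" over the dictionary would be order-dependent; taking the longest match makes the result independent of the order in which the codes are listed.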

 

1.         PUNCTUATION

The tag $ is used for all punctuation marks. These include full stops, commas, colons, semi-colons, quotation marks, hyphens, exclamation marks, brackets, etc.
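One way to apply this rule automatically is sketched below. It assumes that a token counts as punctuation when every character belongs to a Unicode punctuation category (P*); characters in the Unicode symbol categories (S*), such as + or the currency sign $, are therefore not tagged by this sketch. The function names are hypothetical.

```python
import unicodedata
from typing import Optional


def is_punctuation(token: str) -> bool:
    """True if the token is non-empty and every character is Unicode punctuation (category P*)."""
    return bool(token) and all(unicodedata.category(ch).startswith("P") for ch in token)


def punct_tag(token: str) -> Optional[str]:
    """Return "$" for punctuation tokens, None otherwise (left to the other rules)."""
    return "$" if is_punctuation(token) else None


for token in [".", ",", "(", "...", "ṱafula"]:
    print(repr(token), "->", punct_tag(token))
```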

2.         ABBREVIATION

All abbreviations are tagged as ABBR.

 

3.         ADJECTIVE

The following tags are used:

Level 1: ADJ01-14, ADJLOC

Notes:

Examples:

            tshi tshihulu ADJ07

            ha  Hamasia ADJLOC

4.         ADVERB

The following tags are used:

Level 1:          ADV

Level 2:         ADV_loc

Notes:

Examples:   

Nga maana            ADV_nil

Hamaenzhe           ADV_loc

5.         ASPECTUAL MARKER

The following tags are used:

Level 1: ASP

Level 2: ASP_pot, ASP_prog

Note:

Examples:

vha vho amba

ASP_nil

vha kha i amba

ASP_prog

vha  nga amba

ASP_pot

 

6.         AUXILIARY

The following tag is used:

Level 1: AUX

Notes:

Examples:

vho no  swika

AUX

o mbo i amba 

AUX

7.         [CLASS-INDICATING] COPULATIVE SUBJECT CONCORD

The following tags are used:

Level 1: CCOP01-10, CCOP14-15, CCOPLOC, CCOPPERS

Level 2: CCOPPERS_1sg, CCOPPERS_1pl, CCOPPERS_2sg, CCOPPERS_2pl

Notes:

Examples:

na nṋe ndi hone

CCOPPERS_1sg

vhurotho vhu hone

CCOP14_nil

ri hafha oroboni

CCOPPERS_1pl
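The Level 1 line above combines a range shorthand (CCOP01-10, CCOP14-15) with literal tags (CCOPLOC, CCOPPERS), and the same shorthand recurs throughout the tag set. The sketch below expands such a list into individual tags, assuming that a range keeps the zero-padded width of its first number; it also tolerates spaces and an en dash around the range separator. The name expand_tag_list is hypothetical.

```python
import re
from typing import List

# A range item such as "CCOP01-10" or "CDEMCOP_01-10": a non-digit prefix,
# two numbers, and a hyphen or en dash (optionally spaced) between them.
RANGE = re.compile(r"^(?P<prefix>\D*?)(?P<lo>\d+)\s*[-\u2013]\s*(?P<hi>\d+)$")


def expand_tag_list(spec: str) -> List[str]:
    """Expand a comma-separated tag list, turning range items into single tags."""
    tags: List[str] = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        match = RANGE.match(item)
        if match is None:
            tags.append(item)  # literal tag such as CCOPLOC or CCOPPERS
            continue
        width = len(match["lo"])  # keep the zero padding of the first bound (01, 02, ..., 10)
        for number in range(int(match["lo"]), int(match["hi"]) + 1):
            tags.append(f"{match['prefix']}{number:0{width}d}")
    return tags


print(expand_tag_list("CCOP01-10, CCOP14-15, CCOPLOC, CCOPPERS"))
```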

 

8.         [CLASS-INDICATING] DEMONSTRATIVES

The following tags are used:

Level 1: CDEM01-10, CDEM14-15, CDEMLOC

Notes:

Examples:   

            Vhathu  vha  CDEM02

            tshioni tshino        CDEM07

            kule  afho      CDEMLOC

 

9.         [CLASS-INDICATING] COPULATIVE DEMONSTRATIVES

The following tags are used:

Level 1: CDEMCOP

Level 2: CDEMCOP_01-10, CDEMCOP_14-15, CDEMCOP_loc

Notes:

Examples:

            khouno          CDEMCOP_01

            khezwino                  CDEMCOP_08

            khefha                       CDEMCOP_loc

 

10.      [CLASS-INDICATING] NOMINAL PREFIX

11.      [CLASS-INDICATING] OBJECT CONCORD

The following tags are used:

Level 1: CO01-10, CO14-15, COLOC, COPERS

Level 2: COPERS_1pl, COPERS_2pl, COPERS_2sg

Notes:

Examples:

Vho ri  thusa

COPERS_1pl

Ri a u  oa

COPERS_2sg

Ndi  a a funa

CO06

vha o tshi  renga

CO07

12.      [CLASS-INDICATING] POSSESSIVE CONCORD

The following tags are used:

Level 1: CPOSS01-10, CPOSS14-15, CPOSSLOC

Notes:

Examples:

Vhana  vha hawe

CPOSS02

zwiambaro zwa    vhana

CPOSS08

Fhasi ha ṱafula

CPOSSLOC

 

13.      [CLASS-INDICATING] SUBJECT CONCORD

The following tags are used:

Level 1: CS01-10, CS14-15, CSLOC, CSINDEF, CSNEUT, CSPERS

Level 2: CSPERS_1sg, CSPERS_1pl, CSPERS_2sg, CSPERS_2pl

Notes:

Examples:

zwo  swika

CS07

a  dzi athu u swika

CS10

fhasi  hu a rothola

CSLOC

hu a fhisa

CSINDEF

ho vha hu vhuriha

CSNEUT

u a dina

CSPERS_2sg

ra thoma mushumo

CSPERS_1pl

 

14.      ENUMERATIVE

The following tag is used:

Level 1:          ENUM

Note:

Examples:

bulayo ivhi  

ENUM

mukhwa  muvhi

ENUM

 

15.      IDEOPHONE

The following tag is used:

Level 1:          IDEO

Examples:   

ngindi

IDEO

nzuru

IDEO

 

16.      INTERJECTION

The following tags are used:

Level 1: INT

Level 2: INT_neg

Notes:          

Examples:

nandi

INT_nil

aiwa

INT_neg

 

17.      CONJUNCTION

The following tag is used:

Level 1:          JUNC

Notes:

Examples:

ngavhe

JUNC

uri

JUNC

 

18.      NEGATIVE MORPHEME

The following tag is used:

Level 1: MNEG

Notes:

Examples:

a vha ambi

MNEG

vha sa ambi

MNEG

uri  vha si ambe

MNEG

 

19.      NOUN

The following tags are used:

Level 1: N01-10, N01a, N02b, N14, NLOC

Level 2: _aug, _dim, _loc, _name

Notes:

Examples:

mpho

N09_nil

mpho

N01a_name

tshubwana

N09_dim

mphohoni

N09_loc

noukadzi

N09_aug

wahani

N03_dim_loc

fhasi

NLOC

vhoshumani

N02b_name
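The noun examples show that level-2 values can be combined (N03_dim_loc) and that nil marks the absence of a level-2 value. The sketch below checks a noun tag against the Level 1 classes and Level 2 features listed above. It is an illustration under assumptions, not a rule stated by the tag set: it rejects a repeated feature and does not allow nil to be combined with other features. The name is_valid_noun_tag is hypothetical.

```python
import re

# Level 1 noun classes as listed above: N01-10, N01a, N02b, N14 (plus NLOC).
NOUN_CLASSES = {f"{n:02d}" for n in range(1, 11)} | {"01a", "02b", "14"}
# Level 2 noun features as listed above.
NOUN_FEATURES = {"aug", "dim", "loc", "name"}

NOUN_TAG = re.compile(r"^N(?P<cls>\d{2}[ab]?|LOC)(?:_(?P<feats>[a-z_]+))?$")


def is_valid_noun_tag(tag: str) -> bool:
    match = NOUN_TAG.match(tag)
    if match is None:
        return False
    cls, feats = match["cls"], match["feats"]
    if cls != "LOC" and cls not in NOUN_CLASSES:
        return False
    if feats is None or feats == "nil":
        return True
    parts = feats.split("_")
    # Assumption: each feature appears at most once; "nil" is not a combinable feature.
    return len(parts) == len(set(parts)) and all(p in NOUN_FEATURES for p in parts)


for tag in ["N09_nil", "N01a_name", "N03_dim_loc", "NLOC", "N11", "N09_big"]:
    print(tag, is_valid_noun_tag(tag))
```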

20.      PLACE AND BRAND NAMES

The following tags are used:

Level 1: NPP

Level 2: NPP_place, NPP_brand

Notes:

Examples:

Bulugwane

NPP_place

coke

NPP_brand

 

21.      NUMERATIVE

The following tag is used:

Level 1: NUM

Note:

22.      PARTICLE

The following tags are used:

Level 1:          PART

Level 2:         PART_cop, PART_agen, PART_hort, PART_loc, PART_que, PART_temp, PART_ins, PART_con

Notes:

Examples:

ndi vhuriha

PART_cop

i vhonwa nga dzimmbwa

PART_agen

kha  ri vhale

PART_hort

ho vha hu oroboni

PART_loc

na vho a?

PART_que

nga Mugivhela

PART_temp

nga lufhanga

PART_ins

hu na khombo

PART_con

23.      EMPHATIC PRONOUN

The following tags are used:

Level 1: PROEMP01-10, PROEMP14-15, PROEMPLOC, PROEMPPERS

Level 2: PROEMPPERS_1sg, PROEMPPERS_1pl, PROEMPPERS_2sg, PROEMPPERS_2pl

Notes:

Examples:

ene

PROEMP01

rie

PROEMPPERS_1pl

hune

PROEMPLOC

dzibugu dzone

PROEMP10

kha yone

PROEMP09

24.      POSSESSIVE PRONOUNS

The following tags are used:

Level 1: PROPOSS01-10, PROPOSS14-15, PROPOSSLOC, PROPOSSPERS

Level 2: PROPOSSPERS_1sg, PROPOSSPERS_1pl, PROPOSSPERS_2sg, PROPOSSPERS_2pl

Notes:

Examples:

vhana vha hawe

PROPOSS01

vhana vha hashu

PROPOSSPERS_1pl

vhana vha rie

PROPOSSPERS_1pl

milenzhe ya  dzone

PROPOSS10

zwikolo zwa hone

PROPOSSLOC

 

25.      QUANTITATIVE PRONOUNS

The following tags are used:

Level 1: PROQUANT01-10, PROQUANT14-15, PROQUANTLOC

Notes:

Examples:

vhana vhohe

PROQUANT02

dzohe  dzo fhela

PROQUANT10

Ri na  vhohe

PROQUANT02

 

26.      QUESTION WORDS

The following tags are used:

Level 1: QUE

Level 2: QUE_N01a, QUE_N02b, QUE_loc, QUE_time, QUE_man, QUE_01-10, QUE_14-15

Notes:

Examples:

vho swika lini?

QUE_time

vha dzula ngafhi?

QUE_loc

vhathu vhafhio

QUE_02

U oa nnyi?

QUE_N01a

No renga mini?

QUE_nil

 

27.      TENSE MARKER

The following tags are used:

Level 1: TENSE

Level 2: TENSE_fut, TENSE_pres, TENSE_past

Notes:

Examples:

vha o amba

TENSE_fut

vha a amba

TENSE_pres

vha  nga si o amba

TENSE_fut

a vha nga ambi

TENSE_neg

28.      VERBAL

The following tags are used:

Level 1: V

Level 2: V_tr, V_dtr

Notes:

Examples:

mmbudza

V_tr

ifunza

V_tr

nndina

V_tr

shumela

V_dtr

raha

V_tr

 

29.      COPULATIVE VERB

The following tags are used:

Level 1: VCOP

Level 2: VCOP_neg

Notes:

Examples:

ndi na li

VCOP_nil

ho vha hu vhuriha

VCOP_nil

o vha a si hone

VCOP_neg

ha vha tshilimo

VCOP_nil