NCHLT: isiNdebele POS tag set

Tag set

For annotators' purposes, this tag set is largely taken over from Taljard et al. (2008) and from various documents compiled by G Faasz and U Heid (IMS, Stuttgart) and by D J Prinsloo and E Taljard (University of Pretoria). The information below reflects the current state of the tagset; further development will probably necessitate changes.

The tagset is mainly based on the lexical and morphological criteria defined by Lombard (1985) and Louwrens (1991). As described above, the logical structure of the tagset is divided into two layers of linguistic description (annotation levels):

The first annotation level includes all mandatory, or, according to EAGLES, obligatory information, namely up to three elements: an element hinting at the word class, a second one specifying functional or syntactic properties, and a third one giving morphological specifics, cf. e.g. PRO(noun)EMP(hatic)PERS(on).

 

The second level of annotation includes recommended and optional information. This level is in most cases used for a detailed description of closed class items described in the tagger lexicon. Compare the following excerpt:

 

Figure 1: Annotation levels

Description             Tag 1st level               Tag 2nd level
                        (mandatory information)     (optional/recommended information)

Pronouns:
  emphatic personal     PROEMPPERS                  1sg, 2sg, 1pl, 2pl

Verbals:                V                           tr

Morphemes:
  deficient             MORPH                       def
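
The two annotation levels can be separated mechanically when a full tag is read. The following is a minimal sketch, assuming that second-level information is attached to the first-level tag with underscores (as in the examples further below, e.g. PROEMPPERS_1pl) and that `_nil` marks an empty second level. The function name `split_tag` is hypothetical and not part of the tagset.

```python
from __future__ import annotations


def split_tag(tag: str) -> tuple[str, list[str]]:
    """Split a full tag into its first-level part and its second-level features."""
    level1, *features = tag.split("_")
    # Assumption: "_nil" means "no second-level information", so it is dropped.
    return level1, [f for f in features if f != "nil"]


print(split_tag("PROEMPPERS_1pl"))
print(split_tag("N03_dim_loc"))
```

A tag without any underscore, such as V, yields the tag itself and an empty feature list.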

 

As for the actual tagging, an additional first level of tagging is envisaged. On this level, linguistic words will be tagged. For Northern Sotho, this implies that the four orthographic units ke + a + mo + rata will be tagged as V, since together they constitute a linguistic verb.
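
One possible way to represent this additional level is to keep the orthographic units together with the tag of the linguistic word they form. The sketch below is illustrative only; the class name `LinguisticWord` and its fields are assumptions, and no tags are assigned to the individual units because the text above does not give them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LinguisticWord:
    units: list[str]  # orthographic units as written in the text
    tag: str          # tag assigned to the linguistic word as a whole

    @property
    def surface(self) -> str:
        """The units joined as they appear in running text."""
        return " ".join(self.units)


# The Northern Sotho example from the paragraph above.
word = LinguisticWord(units=["ke", "a", "mo", "rata"], tag="V")
print(word.surface, word.tag)
```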

 

The tagset currently distinguishes 29 categories and different levels of annotation. The first part of the tag gives a general indication of the nature of the unit in question. These are as follows:

1.         $ = Punctuation

2.         ABBR = abbreviation

3.         ADJ = adjective

4.         ADV = adverb

5.         ASP = aspectual marker

6.         AUX = auxiliary verb

7.         CCOP = class-indicating copulative subject concord

8.         CDEM = class-indicating demonstrative

9.         CDEMCOP = class-indicating demonstrative copulative

10.      CN = class-indicating nominal prefix

11.      CO = class-indicating object concord

12.      CPOSS = class-indicating possessive concord

13.      CS = class-indicating subject concord

14.      ENUM = enumerative

15.      IDEO = ideophone

16.      INT = interjection

17.      JUNC = conjunction

18.      MNEG = negative morpheme

19.      N = noun

20.      NPP = place and brand name

21.      NUM = numerative

22.      PART = particle

23.      PROEMP = emphatic pronoun

24.      PROPOSS = possessive pronoun

25.      PROQUANT = quantitative pronoun

26.      QUE = question word

27.      TENSE = tense marker

28.      V = verbal

29.      VCOP = copulative verb
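
Because several category names are prefixes of others (N and NPP, V and VCOP, CDEM and CDEMCOP), a program that recovers the category from a full tag has to prefer the longest match. The following is a minimal sketch under that assumption; the names `CATEGORIES` and `category_of` are hypothetical, and longest-prefix matching is a heuristic chosen for illustration, not a rule stated by the tagset.

```python
CATEGORIES = {
    "$": "punctuation", "ABBR": "abbreviation", "ADJ": "adjective",
    "ADV": "adverb", "ASP": "aspectual marker", "AUX": "auxiliary verb",
    "CCOP": "class-indicating copulative subject concord",
    "CDEM": "class-indicating demonstrative",
    "CDEMCOP": "class-indicating demonstrative copulative",
    "CN": "class-indicating nominal prefix",
    "CO": "class-indicating object concord",
    "CPOSS": "class-indicating possessive concord",
    "CS": "class-indicating subject concord",
    "ENUM": "enumerative", "IDEO": "ideophone", "INT": "interjection",
    "JUNC": "conjunction", "MNEG": "negative morpheme", "N": "noun",
    "NPP": "place and brand name", "NUM": "numerative", "PART": "particle",
    "PROEMP": "emphatic pronoun", "PROPOSS": "possessive pronoun",
    "PROQUANT": "quantitative pronoun", "QUE": "question word",
    "TENSE": "tense marker", "V": "verbal", "VCOP": "copulative verb",
}


def category_of(tag: str) -> str:
    """Return the category whose name is the longest prefix of the tag."""
    level1 = tag.split("_", 1)[0]  # ignore second-level information
    matches = [name for name in CATEGORIES if level1.startswith(name)]
    if not matches:
        raise ValueError(f"unknown tag: {tag!r}")
    return max(matches, key=len)


print(category_of("CDEMCOP_01"), category_of("NPP_brand"), category_of("N09_dim"))
```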

As we envisage going deeper into morphological analysis, we also plan for the implementation of the following tags:

AS = adjectival stem

CA = class-indicating adjectival prefix

NS = noun stem

NSuf = nominal suffix

VEnd = verbal ending

VExt = verbal extension

VR = verb root

1.         PUNCTUATION

 

The tag $ is used for all punctuation marks. These include full stops, commas, colons, semi-colons, quotation marks, hyphens, exclamation marks, brackets, etc.

 

2.         ABBREVIATION

 

All abbreviations are tagged as ABBR.

 

3.         ADJECTIVE

 

The following tags are used:

Level 1: ADJ01-14, ADJLOC

Notes:

Examples:

            esikhulu        ADJ07

            la kukuhle     ADJLOC
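
The tag lists in this document use a range shorthand such as ADJ01-14. The sketch below expands such a list into individual tags, assuming that a range is inclusive and that class numbers are always written with two digits (as in ADJ07). The function name `expand_tags` is hypothetical.

```python
from __future__ import annotations

import re


def expand_tags(spec: str) -> list[str]:
    """Expand a comma-separated tag list, turning ranges like ADJ01-14 into single tags."""
    tags = []
    for item in (part.strip() for part in spec.split(",")):
        match = re.fullmatch(r"([A-Z]+)(\d{2})-(\d{2})", item)
        if match:
            prefix = match.group(1)
            first, last = int(match.group(2)), int(match.group(3))
            # Assumption: ranges are inclusive at both ends.
            tags.extend(f"{prefix}{n:02d}" for n in range(first, last + 1))
        else:
            tags.append(item)
    return tags


print(expand_tags("ADJ01-14, ADJLOC"))
```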

4.         ADVERB

 

The following tags are used:

Level 1:          ADV

Level 2:         ADV_loc

Notes:

Examples:   

            kwamambala      ADV_nil

            kwaZakhele      ADV_loc

5.         ASPECTUAL MARKER

 

The following tags are used:

Level 1: ASP

Level 2: ASP_pot, ASP_prog

Note:

Examples:

            bayakhuluma     ASP_nil

            basakhuluma     ASP_prog

            bangakhuluma    ASP_pot

 

6.         AUXILIARY

 

The following tag is used:

Level 1: AUX

Notes:

Examples:

            sebafikile          AUX

            besele bafikile     AUX

 

 

7.         [CLASS-INDICATING] COPULATIVE SUBJECT CONCORD

 

The following tags are used:

Level 1: CCOP01-10, CCOP14-15, CCOPLOC, CCOPPERS

Level 2: CCOPPERS_1sg, CCOPPERS_1pl, CCOPPERS_2sg, CCOPPERS_2pl

Notes:

Examples:

            Nami ngikhona       CCOPPERS_1sg

            Uborotho bukhona    CCOP14_nil

            sisedorobheni       CCOPPERS_1pl

 

8.         [CLASS-INDICATING] DEMONSTRATIVES

 

The following tags are used:

CDEM01-10, CDEM14-15, CDEMLOC

Notes:

Examples:   

            abantu laba  CDEM02

            isililo leso      CDEM07

            khona lapha CDEMLOC
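
Example lines of this kind pair a word or phrase with its tag, and the phrase itself may contain spaces. A minimal sketch of reading such a line, assuming the tag is always the last whitespace-separated item; the function name `parse_example` is hypothetical.

```python
from __future__ import annotations


def parse_example(line: str) -> tuple[str, str]:
    """Split an example line into (phrase, tag); the tag is the last item on the line."""
    parts = line.strip().rsplit(None, 1)
    if len(parts) != 2:
        raise ValueError(f"expected a phrase followed by a tag: {line!r}")
    phrase, tag = parts
    return phrase, tag


print(parse_example("            abantu laba  CDEM02"))
```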

 

9.         [CLASS-INDICATING] COPULATIVE DEMONSTRATIVES

 

The following tags are used:

Level 1: CDEMCOP

Level 2: CDEMCOP_01-10, CDEMCOP_14-15, CDEMCOP_loc

Notes:

Examples:

            ngilaba          CDEMCOP_01

            ngilabo          CDEMCOP_08

            ngilabaya      CDEMCOP_loc

 

10.      [CLASS-INDICATING] NOMINAL PREFIX

11.      [CLASS-INDICATING] OBJECT CONCORD

 

The following tags are used:

Level 1: CO01-10, CO14-15, COLOC, COPERS

Level 2: COPERS_1pl, COPERS_2pl, COPERS_2sg

Notes:

Examples:

            basisizile      COPERS_1pl

            siyakufuna      COPERS_2sg

            ngiyathanda     CO06

            bazosithenga    CO07

12.      [CLASS-INDICATING] POSSESSIVE CONCORD

 

The following tags are used:

Level 1: CPOSS01-10, CPOSS14-15, CPOSSLOC

Notes:

Examples:

            Abantwana bakhe         CPOSS02

            Izambatho zakhe         CPOSS08

            Ngaphasi kwetafula      CPOSSLOC

 

13.      [CLASS-INDICATING] SUBJECT CONCORD

 

The following tags are used:

Level 1: CS01-10, CS14-15, CSLOC, CSINDEF, CSNEUT, CSPERS

Level 2: CSPERS_1sg, CSPERS_1pl, CSPERS_2sg, CSPERS_2pl

Notes:

Examples:

            sifikile            CS07

            azikafiki           CS10

            phasi kumakhaza     CSLOC

            kuyatjhisa          CSINDEF

            bekusebusika        CSNEUT

            uyatshwenya         CSPERS_2sg

            sithoma umsebenzi   CSPERS_1pl
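
The second-level suffixes _1sg, _1pl, _2sg and _2pl recur across the personal tags (CCOPPERS, COPERS, CSPERS and the pronoun tags). The sketch below decodes them, reading the digit as grammatical person and sg/pl as singular/plural; the function name `person_number` is hypothetical.

```python
import re


def person_number(tag: str):
    """Return (person, number) for a tag ending in _1sg, _1pl, _2sg or _2pl, else None."""
    match = re.search(r"_([12])(sg|pl)$", tag)
    if match is None:
        return None
    number = "singular" if match.group(2) == "sg" else "plural"
    return int(match.group(1)), number


print(person_number("CSPERS_2sg"))
print(person_number("CS07"))
```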

 

14.      ENUMERATIVE

 

The following tag is used:

Level 1:          ENUM

Note:

Examples:

            Ilanga linye    ENUM

            Umuzi muphi     ENUM

 

15.      IDEOPHONE

 

The following tag is used:

Level 1:          IDEO

Examples:   

            dutlu           IDEO

            phara           IDEO

 

16.      INTERJECTION

 

The following tag is used:

Level 1: INT

Level 2: INT_neg

Notes:          

Examples:

            iye             INT_nil

            awa             INT_neg

 

17.      CONJUNCTION

 

The following tag is used:

Level 1:          JUNC

Notes:

Examples:

            nanyana         JUNC

            ngombana        JUNC

 

18.      NEGATIVE MORPHEME

 

The following tag is used:

Level 1: MNEG

Notes:

Examples:

            abakhulumi              MNEG

            bangakhulumi            MNEG

            Ukobana bangakhulumi    MNEG

 

19.      NOUN

 

The following tags are used:

Level 1: N01-10, N01a, N02b, N14, NLOC

Level 2: _aug, _dim, _loc, _name

Notes:

Examples:

            UBavukile       N09_nil

            UThokozani      N01a_name

            UMsanyana       N09_dim

            Embusweni       N09_loc

            indlovukazi     N09_aug

            emtjhaneni      N03_dim_loc

            phasi           NLOC

            AboNtuli        N02b_name
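
The noun tags combine a class label with optional second-level suffixes, which can be stacked (N03_dim_loc). A minimal sketch of a validator follows, built only from the tags listed in this section; treating `_nil` as a stand-alone placeholder and allowing the other suffixes in any order are assumptions, and the names `NOUN_TAG` and `is_noun_tag` are hypothetical.

```python
import re

NOUN_TAG = re.compile(
    r"N(?:0[1-9]|10|14|01a|02b|LOC)"         # level 1: N01-10, N01a, N02b, N14, NLOC
    r"(?:_nil|(?:_(?:aug|dim|loc|name))+)?"  # level 2: _nil, or one or more suffixes
)


def is_noun_tag(tag: str) -> bool:
    """Check whether a tag matches the noun tags listed in this section."""
    return NOUN_TAG.fullmatch(tag) is not None


for tag in ["N09_nil", "N01a_name", "N03_dim_loc", "NLOC", "N11", "N09_big"]:
    print(tag, is_noun_tag(tag))
```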

20.      PLACE AND BRAND NAMES

 

The following tag is used:

Level 1: NPP

Level 2: NPP_place, NPP_brand

Notes:

Examples:

            EPitori         NPP_place

            icoke           NPP_brand

 

21.      NUMERATIVE

 

The following tag is used:

NUM

Note:

22.      PARTICLE

 

The following tags are used:

Level 1:          PART

Level 2:         PART_cop, PART_agen, PART_hort, PART_loc, PART_que, PART_temp, PART_ins, PART_con

Notes:

Examples:

            busika              PART_cop

            Ibonwe zizinja      PART_agen

            asibale             PART_hort

            le edorobheni       PART_loc

            Alo bafikile?       PART_que

            ngoMgqibelo         PART_temp

            ngomukhwa           PART_ins

            kunengozi           PART_con

 


23.      EMPHATIC PRONOUN

 

The following tags are used:

Level 1: PROEMP01-10, PROEMP14-15, PROEMPLOC, PROEMPPERS

Level 2: PROEMPPERS_1sg, PROEMPPERS_1pl, PROEMPPERS_2sg, PROEMPPERS_2pl

Notes:

Examples:

            yena            PROEMP01

            thina           PROEMPPERS_1pl

            khona           PROEMPLOC

            Iincwadi zona   PROEMP10

            ngayo           PROEMP09

24.      POSSESSIVE PRONOUNS

 

The following tags are used:

Level 1: PROPOSS01-10, PROPOSS14-15, PROPOSSLOC, PROPOSSPERS

Level 2: PROPOSSPERS_1sg, PROPOSSPERS_1pl, PROPOSSPERS_2sg, PROPOSSPERS_2pl

Notes:

Examples:

            Abantwana bakhe         PROPOSS01

            Abantwana bekhethu      PROPOSSPERS_1pl

            Abantwana bethu         PROPOSSPERS_1pl

            Iinyawo zazo            PROPOSS10

            Iinkolo zakhona         PROPOSSLOC

 

25.      QUANTITATIVE PRONOUNS

 

The following tags are used:

PROQUANT01-10, PROQUANT14-15, PROQUANTLOC

Notes:

Examples:

            abantwana boke      PROQUANT02

            koke kuphelile      PROQUANT10

            thina soke          PROQUANT02

 

26.      QUESTION WORDS

 

The following tags are used:

Level 1: QUE

Level 2: QUE_N01a, QUE_N02b, QUE_loc, QUE_time, QUE_man, QUE_01-10, QUE_14-15

Notes:

Examples:

            bafike nini?        QUE_time

            bahlala kuphi?      QUE_loc

            abantu baphi?       QUE_02

            ufuna bani?         QUE_N01a

            uthenge ini?        QUE_nil

 

27.      TENSE MARKER

 

The following tags are used:

Level 1: TENSE

Level 2: TENSE_fut, TENSE_pres, TENSE_past

Notes:

Examples:

            bazokukhuluma       TENSE_fut

            Bayakhuluma         TENSE_pres

            angikazokutjela     TENSE_fut

            abakazokukhuluma    TENSE_neg

 

28.      VERBAL

 

The following tag is used:

Level 1: V

Notes:

Examples:

            mila            V_tr

            funda           V_tr

            tshwenya        V_tr

            yenzela         V_dtr

            idla            V_tr

 

29.      COPULATIVE VERB

 

The following tag is used:

Level 1: VCOP

Level 2: VCOP_neg

Notes:

Examples:

            Ukuba ngusiyazi         VCOP_nil

            Aba makhaza             VCOP_nil

            Ukungabi ngusiyazi      VCOP_neg

            Akabi makhaza           VCOP_nil