NCHLT: Sesotho POS tag set

Tag set

 

For the purposes of annotators, this tag set is largely taken over from Taljard et al. (2008) and from various documents compiled by G Faasz and U Heid of the IMS, Stuttgart, and by D J Prinsloo and E Taljard of the University of Pretoria. The information below reflects the current state of the tag set; further development will probably necessitate changes.

The tagset is mainly based on the lexical and morphological criteria defined by Lombard (1985) and Louwrens (1991). The logical structure of the tagset is divided into two layers of linguistic description (annotation levels):

The first annotation level includes all mandatory, or, according to EAGLES, obligatory information, namely up to three elements: an element hinting at the word class, a second one specifying functional or syntactic properties, and a third one giving morphological specifics, cf. e.g. PRO(noun)EMP(hatic)PERS(on).

 

The second level of annotation includes recommended and optional information. This level is in most cases used for a detailed description of closed class items described in the tagger lexicon. Compare the following excerpt:

 

Figure 1: Annotation levels

Description            Tag 1st level              Tag 2nd level
                       (mandatory information)    (optional/recommended information)
Pronouns:
  emphatic personal    PROEMPPERS                 1sg, 2sg, 1pl, 2pl
Verbals:               V                          tr
Morphemes:
  Deficient            MORPH                      def
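The two annotation levels can be separated mechanically when the second-level information is attached to the first-level tag with an underscore, as in the examples later in this document (e.g. PROEMPPERS_1sg, N03_dim_loc). The following Python sketch illustrates this. The function name split_tag and the treatment of "nil" as "no second-level information" are assumptions made for illustration, not part of the tag set definition.

```python
from typing import List, Tuple


def split_tag(tag: str) -> Tuple[str, List[str]]:
    """Split a tag into its first-level part and its second-level parts.

    Assumption: second-level information follows the first-level tag,
    separated by underscores, and "nil" marks the absence of such information.
    """
    first, *second = tag.split("_")
    # Drop the "nil" placeholder so that ADV_nil and ADV compare equal.
    second = [part for part in second if part != "nil"]
    return first, second


if __name__ == "__main__":
    for example in ("PROEMPPERS_1sg", "V_tr", "N03_dim_loc", "ADV_nil"):
        print(example, "->", split_tag(example))
```

Keeping the second level as a list rather than a single string allows for tags that carry more than one second-level element.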

 

As for the actual tagging, an additional first level of tagging is envisaged. On this level, linguistic words will be tagged. For Northern Sotho, this implies that the four orthographic units ke + a + mo + rata will be tagged as V, since together they constitute a linguistic verb. <Sesotho adaptation required here>
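One possible way to represent this additional level in software is to group the orthographic units of a linguistic word and attach a single tag to the group. The Python sketch below uses the Northern Sotho example from the text; the class name LinguisticWord and its fields are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinguisticWord:
    """A group of orthographic units that together form one linguistic word."""

    units: List[str]  # orthographic units in text order
    tag: str          # tag assigned to the linguistic word as a whole

    def surface(self) -> str:
        # The units are written as separate orthographic words,
        # so they are joined with spaces.
        return " ".join(self.units)


# Northern Sotho example from the text: four units, one linguistic verb.
word = LinguisticWord(units=["ke", "a", "mo", "rata"], tag="V")
print(f"{word.surface()}\t{word.tag}")
```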

 

The tagset currently distinguishes 29 categories and different levels of annotation. The first part of the tag gives a general indication of the nature of the unit in question. These are as follows:

1.         $ = Punctuation

2.         ABBR = abbreviation

3.         ADJ = adjective

4.         ADV = adverb

5.         ASP = aspectual marker

6.         AUX = auxiliary verb

7.         CCOP = class-indicating copulative subject concord

8.         CDEM = class-indicating demonstrative

9.         CDEMCOP = class-indicating demonstrative copulative

10.      CN = class-indicating nominal prefix

11.      CO = class-indicating object concord

12.      CPOSS = class-indicating possessive concord

13.      CS = class-indicating subject concord

14.      ENUM = enumerative

15.      IDEO = ideophone

16.      INT = interjection

17.      JUNC = conjunction

18.      MNEG = negative morpheme

19.      N = noun

20.      NPP = place and brand name

21.      NUM = numerative

22.      PART = particle

23.      PROEMP = emphatic pronoun

24.      PROPOSS = possessive pronoun

25.      PROQUANT = quantitative pronoun

26.      QUE = question word

27.      TENSE = tense marker

28.      V = verbal

29.      VCOP = copulative verb
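Because several category names are prefixes of others (CDEM and CDEMCOP, N and NPP, V and VCOP), a tool that maps a full tag back to its category has to try the longer names first. The Python sketch below shows one way to do this. The function name category_of and the assumed shape of the remainder of a tag (an optional class marker followed by optional underscore-separated parts) are illustrative assumptions derived from the examples in this document.

```python
import re
from typing import Optional

# The 29 first-part categories listed above.
CATEGORIES = {
    "$": "punctuation",
    "ABBR": "abbreviation",
    "ADJ": "adjective",
    "ADV": "adverb",
    "ASP": "aspectual marker",
    "AUX": "auxiliary verb",
    "CCOP": "class-indicating copulative subject concord",
    "CDEM": "class-indicating demonstrative",
    "CDEMCOP": "class-indicating demonstrative copulative",
    "CN": "class-indicating nominal prefix",
    "CO": "class-indicating object concord",
    "CPOSS": "class-indicating possessive concord",
    "CS": "class-indicating subject concord",
    "ENUM": "enumerative",
    "IDEO": "ideophone",
    "INT": "interjection",
    "JUNC": "conjunction",
    "MNEG": "negative morpheme",
    "N": "noun",
    "NPP": "place and brand name",
    "NUM": "numerative",
    "PART": "particle",
    "PROEMP": "emphatic pronoun",
    "PROPOSS": "possessive pronoun",
    "PROQUANT": "quantitative pronoun",
    "QUE": "question word",
    "TENSE": "tense marker",
    "V": "verbal",
    "VCOP": "copulative verb",
}

# Assumed shape of whatever follows the category name: an optional class
# marker (two digits with an optional a/b, or LOC/PERS/INDEF/NEUT) and
# optional underscore-separated second-level parts.
_REST = re.compile(r"(\d{2}[ab]?|LOC|PERS|INDEF|NEUT)?(_[A-Za-z0-9]+)*")


def category_of(tag: str) -> Optional[str]:
    """Return the category a tag belongs to, or None if none matches."""
    # Longer names first, so that CDEMCOP wins over CDEM and NPP over N.
    for cat in sorted(CATEGORIES, key=len, reverse=True):
        if tag.startswith(cat) and _REST.fullmatch(tag[len(cat):]):
            return cat
    return None


if __name__ == "__main__":
    for example in ("CDEMCOP_08", "CDEM02", "NPP_brand", "N01a_name", "VCOP_neg"):
        print(example, "->", category_of(example))
```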

As we envisage going deeper into morphological analysis, we also plan for the implementation of the following tags:

AS = adjectival stem

CA = class-indicating adjectival prefix

NS = noun stem

NSuf = nominal suffix

VEnd = verbal ending

VExt = verbal extension

VR = verb root

 

1.         PUNCTUATION

The tag $ is used for all punctuation marks. These include full stops, commas, colons, semi-colons, quotation marks, hyphens, exclamation marks, brackets, etc.

2.         ABBREVIATION

All abbreviations are tagged as ABBR.

 

3.         ADJECTIVE

The following tags are used:

Level 1: ADJ01-14, ADJLOC

Notes:

Examples:

            se seholo      ADJ07

4.         ADVERB

The following tags are used:

Level 1:          ADV

Level 2:         ADV_loc

Notes:

Examples:   

ruri                  ADV_nil

haModjadji    ADV_loc

5.         ASPECTUAL MARKER

The following tags are used:

Level 1: ASP

Level 2: ASP_pot, ASP_prog

Note:

The deficient verb forms, also called deficient auxiliary verb forms, -mo, -no, yo and -tšo are tagged as ASP. <Sesotho examples required here>

 

Examples:

ba sa bua    ASP_prog
ba ka bua    ASP_pot

 

6.         AUXILIARY

The following tag is used:

Level 1: AUX

Notes:

Examples:

ba se ba fihlile    AUX
o ile bua jwalo     AUX

 

7.         [CLASS-INDICATING] COPULATIVE SUBJECT CONCORD

The following tags are used:

Level 1: CCOP01-10, CCOP14-15, CCOPLOC, CCOPPERS

Level 2: CCOPPERS_1sg, CCOPPERS_1pl, CCOPPERS_2sg, CCOPPERS_2pl

 

 

Notes:

Examples:

le nna ke hona     CCOPPERS_1sg
borotho bo teng    CCOP14_nil
re toropong        CCOPPERS_1pl
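The "Level 1" lines in this document abbreviate runs of class-numbered tags as ranges such as CCOP01-10. A minimal Python sketch of how such shorthand could be expanded into individual tags follows. The function names expand and expand_all are hypothetical, and it is assumed that class numbers are always written with two digits. Items written without a tag prefix are returned unchanged rather than guessed at.

```python
import re
from typing import List

# A tag prefix in capitals followed by a two-digit range, e.g. "CCOP01-10".
# Both a hyphen and an en dash are accepted, with optional spaces around them.
_RANGE = re.compile(r"([A-Z]+)(\d{2})\s*[-–]\s*(\d{2})")


def expand(item: str) -> List[str]:
    """Expand one shorthand item into the individual tags it stands for."""
    item = item.strip()
    match = _RANGE.fullmatch(item)
    if match is None:
        return [item]  # not a range, e.g. CCOPLOC or CCOPPERS
    prefix = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    return [f"{prefix}{number:02d}" for number in range(start, end + 1)]


def expand_all(line: str) -> List[str]:
    """Expand a comma-separated list of shorthand items."""
    tags: List[str] = []
    for item in line.split(","):
        tags.extend(expand(item))
    return tags


if __name__ == "__main__":
    print(expand_all("CCOP01-10, CCOP14-15, CCOPLOC, CCOPPERS"))
```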

 

8.         [CLASS-INDICATING] DEMONSTRATIVES

The following tags are used:

CDEM01-10, CDEM14-15, CDEMLOC

Notes:

Examples:   

            batho bao     CDEM02

            sefate  seo    CDEM07

            hona moo       CDEMLOC

 

9.         [CLASS-INDICATING] COPULATIVE DEMONSTRATIVES

The following tags are used:

Level 1: CDEMCOP

Level 2: CDEMCOP_01-10, CDEMCOP_14-15, CDEMCOP_loc

Notes:

Examples:

            sedi       CDEMCOP_08

            ke sela    CDEMCOP_loc

10.      [CLASS-INDICATING] NOMINAL PREFIX

11.      [CLASS-INDICATING] OBJECT CONCORD

The following tags are used:

Level 1: CO01-10, CO14-15, COLOC, COPERS

Level 2: COPERS_1pl, COPERS_2pl, COPERS_2sg

Notes:

Examples:

Ba re thusitse    COPERS_1pl
Re a ho batla     COPERS_2sg
Ke a a rata       CO06
Ba tlo se reka    CO07

12.      [CLASS-INDICATING] POSSESSIVE CONCORD

The following tags are used:

Level 1: CPOSS01-10, CPOSS14-15, CPOSSLOC

Notes:

Examples:

bana ba hae                                 CPOSS02
diaparo tsa bana                            CPOSS08
tlasa tafole <insert possessive concord>    CPOSSLOC

 

 

13.      [CLASS-INDICATING] SUBJECT CONCORD

The following tags are used:

Level 1: CS01-10, CS14-15, CSLOC, CSINDEF, CSNEUT, CSPERS

Level 2: CSPERS_1sg, CSPERS_1pl, CSPERS_2sg, CSPERS_2pl

Notes:

Examples:

se fihlile           CS07
fatse ho a bata      CSLOC
ho a tjhesa          CSINDEF
e ne e le mariha     CSNEUT
o a tshwenya         CSPERS_2sg
ra qala mosebetsi    CSPERS_1pl

 

14.      ENUMERATIVE

The following tag is used:

Level 1:          ENUM

Note:

Examples:

mokgwa o sele    ENUM

 

 

 

15.      IDEOPHONE

The following tag is used:

Level 1:          IDEO

Examples:   

thwa    IDEO
Pha     IDEO

 

16.      INTERJECTION

The following tag is used:

Level 1: INT

Level 2: INT_neg

Notes:          

Examples:

A e!    INT_neg

 

17.      CONJUNCTION

The following tag is used:

Level 1:          JUNC

Notes:

Examples:

hore    JUNC

 

18.      NEGATIVE MORPHEME

The following tag is used:

Level 1: MNEG

Notes:

 

 

Examples:

ha ba bue         MNEG
ba sa bue         MNEG
hore ba se bue    MNEG

 

19.      NOUN

The following tags are used:

Level 1: N01-10, N01a, N02b, N14, NLOC

Level 2: _aug, _dim, _loc, _name

Notes:

Examples:

Mpho             N09_nil
Mpho             N01a_name
Mphonyana        N09_dim
Mphong           N09_loc
Tauhadi          N09_aug
sefatenyaneng    N03_dim_loc
fatse            NLOC
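The noun examples show that second-level elements can be combined (N03_dim_loc) and that "nil" is used when none applies (N09_nil). The Python sketch below checks a noun tag against the Level 1 and Level 2 inventories given for this category. The names NounTag, parse_noun_tag and KNOWN_FEATURES are hypothetical, and reading "nil" as "no feature" is an assumption based on the examples.

```python
import re
from typing import NamedTuple, Optional, Tuple

# "N", then a class marker (two digits with an optional a/b, or LOC),
# then zero or more underscore-separated lowercase parts.
_NOUN = re.compile(r"N(\d{2}[ab]?|LOC)((?:_[a-z]+)*)")

# Second-level elements listed for nouns in this section.
KNOWN_FEATURES = {"aug", "dim", "loc", "name"}


class NounTag(NamedTuple):
    noun_class: str            # e.g. "09", "01a" or "LOC"
    features: Tuple[str, ...]  # second-level elements, in tag order


def parse_noun_tag(tag: str) -> Optional[NounTag]:
    """Parse a noun tag; return None if it does not fit the inventory."""
    match = _NOUN.fullmatch(tag)
    if match is None:
        return None
    # "nil" is treated as a placeholder for "no second-level element".
    features = tuple(
        part for part in match.group(2).split("_") if part and part != "nil"
    )
    if any(part not in KNOWN_FEATURES for part in features):
        return None
    return NounTag(match.group(1), features)


if __name__ == "__main__":
    for example in ("N09_nil", "N01a_name", "N03_dim_loc", "NLOC"):
        print(example, "->", parse_noun_tag(example))
```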

 

20.      PLACE AND BRAND NAMES

The following tag is used:

Level 1: NPP

Level 2: NPP_place, NPP_brand

Notes:

Examples:

polokwane    NPP_place
coke         NPP_brand

 

21.      NUMERATIVE

The following tag is used:

NUM

Note:

22.      PARTICLE

The following tags are used:

Level 1:          PART

Level 2:         PART_cop, PART_agen, PART_hort, PART_loc, PART_que, PART_temp, PART_ins, PART_con

Notes:

Examples:

ke mariha            PART_cop
e bonwa ke dintja    PART_agen
a re bale            PART_hort
ka kua toropong      PART_loc
na ba tlile?         PART_que
ka Moqebelo          PART_temp
ka thipa             PART_ins
ho na le kotsi       PART_con

 

23.      EMPHATIC PRONOUN

The following tags are used:

Level 1: PROEMP01-10, PROEMP14-15, PROEMPLOC, PROEMPPERS

Level 2: PROEMPPERS_1sg, PROEMPPERS_1pl, PROEMPPERS_2sg, PROEMPPERS_2pl

Notes:

Examples:

yena            PROEMP01
Rona            PROEMPPERS_1pl
hona            PROEMPLOC
dibuka tsona    PROEMP10
ka yona         PROEMP09

24.      POSSESSIVE PRONOUNS

The following tags are used:

Level 1: PROPOSS01-10, PROPOSS14-15, PROPOSSLOC, PROPOSSPERS

Level 2: PROPOSSPERS_1sg, PROPOSSPERS_1pl, PROPOSSPERS_2sg, PROPOSSPERS_2pl

 

Notes:

Examples:

bana ba gagwe      PROPOSS01
bana ba geso       PROPOSSPERS_1pl
bana ba rena       PROPOSSPERS_1pl
maoto a tsona      PROPOSS10
dikolo tsa gona    PROPOSSLOC

 

25.      QUANTITATIVE PRONOUNS

The following tags are used:

PROQUANT01-10, PROQUANT14-15, PROQUANTLOC

Notes:

Examples:

bana bohle          PROQUANT02
tsohle di fedile    PROQUANT10
rena bohle          PROQUANT02

 

26.      QUESTION WORDS

The following tags are used:

Level 1: QUE

Level 2: QUE_N01a, QUE_N02b, QUE_loc, QUE_time, QUE_man, QUE_01-10, QUE_14-15

Notes:

Examples:

                 

ba fihlile neng?    QUE_time
ba dula kae?        QUE_loc
batho bafe          QUE_02
o batla mang?       QUE_N01a
o rekile eng?       QUE_nil

 

27.      TENSE MARKER

The following tags are used:

Level 1: TENSE

Level 2: TENSE_fut, TENSE_pres, TENSE_past

Notes:

 

 

 

 

 

Examples: <check for correctness>

ba tlo bua      TENSE_fut
ba a bua        TENSE_pres
ba ka se bua    TENSE_fut
ha ba a bua     TENSE_neg
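Since the examples in this section are still marked for checking, a small consistency check between the tags declared at Level 2 and the tags used in the examples may be useful. The Python sketch below is one way to do it; the function name undeclared is hypothetical, and the two lists are copied from this section.

```python
from typing import Iterable, Set


def undeclared(declared: Iterable[str], used: Iterable[str]) -> Set[str]:
    """Return the tags that occur in examples but not in the declared list."""
    return set(used) - set(declared)


# Level 2 tags declared for this section, and the tags used in its examples.
declared = ["TENSE_fut", "TENSE_pres", "TENSE_past"]
used = ["TENSE_fut", "TENSE_pres", "TENSE_fut", "TENSE_neg"]

print(sorted(undeclared(declared, used)))  # prints ['TENSE_neg']
```

The same check can be run on any other section by replacing the two lists.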

 

28.      VERBAL

 

 

The following tag is used:

Level 1: V

Notes:

Examples:

mmotsa       V_tr
Ithuta       V_tr
ntshwenya    V_tr
Etsetsa      V_dtr
Eja          V_tr

 

29.      COPULATIVE VERB

The following tag is used:

Level 1: VCOP

Level 2: VCOP_neg

 

 

 

Notes:

Examples:

ke na le                                 VCOP_nil
h e le mariha <check for correctness>    VCOP_nil
ha a le siko                             VCOP_neg
ya ba selemo                             VCOP_nil

 

 

Working Tagset

 

Tag            Description
ADJ01          Adjective
ADJ02          Adjective
ADJ03          Adjective
ADJ04          Adjective
ADJ05          Adjective
ADJ06          Adjective
ADJ07          Adjective
ADJ08          Adjective
ADJ09          Adjective
ADJ10          Adjective
ADJ14          Adjective
ADJC01         Adjective Concord
ADJC02         Adjective Concord
ADJC03         Adjective Concord
ADJC04         Adjective Concord
ADJC05         Adjective Concord
ADJC06         Adjective Concord
ADJC07         Adjective Concord
ADJC08         Adjective Concord
ADJC09         Adjective Concord
ADJC10         Adjective Concord
ADJC15         Adjective Concord
ADJLOC         Adjective
ADJ15          Adjective
ADV            Adverb
CCOP07         Copulative concord
CCOP09         Copulative concord
CCOP09         Copulative concord
CCOP09         Copulative concord
CCOP10         Copulative concord
CCOP10         Copulative concord
CCOP10         Copulative concord
CCOPPERS       Copulative concord
CD01           Demonstrative
CD02           Demonstrative
CD03           Demonstrative
CD04           Demonstrative
CD05           Demonstrative
CD06           Demonstrative
CD07           Demonstrative
CD08           Demonstrative
CD09           Demonstrative
CD10           Demonstrative
CD14           Demonstrative
CD15           Demonstrative
CD17           Demonstrative
CD18           Demonstrative
CDLOC          Demonstrative
CN             Infinitive class prefix
CO01           Object concord
CO02           Object concord
CO03           Object concord
CO04           Object concord
CO05           Object concord
CO06           Object concord
CO07           Object concord
CO08           Object concord
CO09           Object concord
CO10           Object concord
CO14           Object concord
CO15           Object concord
COLOC          Object concord
CONJ           Conjunctive
COPERS         Object concord
CPOSS01        Possessive concord
CPOSS02        Possessive concord
CPOSS03        Possessive concord
CPOSS04        Possessive concord
CPOSS05        Possessive concord
CPOSS06        Possessive concord
CPOSS07        Possessive concord
CPOSS08        Possessive concord
CPOSS09        Possessive concord
CPOSS10        Possessive concord
CPOSS14        Possessive concord
CPOSS15        Possessive concord
CPOSSLOC       Possessive concord
CS01           Subject concord
CS02           Subject concord
CS03           Subject concord
CS04           Subject concord
CS05           Subject concord
CS06           Subject concord
CS07           Subject concord
CS08           Subject concord
CS09           Subject concord
CS10           Subject concord
CS14           Subject concord
CS15           Subject concord
CSINDEF        Subject concord
CSLOC          Subject concord
CSNEUT         Subject concord
CSPERS         Subject concord
ENUM           Enumerative
IDEO           Ideophone
INF            Infinitive class prefix
INT            Interjection
MORPHFUT       Future
MNEG           Negative morpheme
MORPHPER       Progressive
MORPHPOT       Potential
MORPHPRES      Present tense marker
N01            Noun
N01a           Noun
N02            Noun
N02b           Noun
N03            Noun
N04            Noun
N05            Noun
N06            Noun
N07            Noun
N08            Noun
N09            Noun
N10            Noun
N14            Noun
N16            Noun
N17            Noun
N18            Noun
NLOC           Noun
PART           Particle
PROEMP01       Emphatic pronoun
PROEMP02       Emphatic pronoun
PROEMP03       Emphatic pronoun
PROEMP04       Emphatic pronoun
PROEMP05       Emphatic pronoun
PROEMP06       Emphatic pronoun
PROEMP07       Emphatic pronoun
PROEMP08       Emphatic pronoun
PROEMP09       Emphatic pronoun
PROEMP10       Emphatic pronoun
PROEMP14       Emphatic pronoun
PROEMP15       Emphatic pronoun
PROEMPLOC      Emphatic pronoun
PROEMPPERS     Emphatic pronoun
PROPOSS02      Possessive pronoun
PROPOSS03      Possessive pronoun
PROPOSS04      Possessive pronoun
PROPOSS05      Possessive pronoun
PROPOSS06      Possessive pronoun
PROPOSS07      Possessive pronoun
PROPOSS08      Possessive pronoun
PROPOSS09      Possessive pronoun
PROPOSS10      Possessive pronoun
PROPOSS14      Possessive pronoun
PROPOSSPERS    Possessive pronoun
PROQUANT01     Quantitative pronoun
PROQUANT02     Quantitative pronoun
PROQUANT03     Quantitative pronoun
PROQUANT04     Quantitative pronoun
PROQUANT05     Quantitative pronoun
PROQUANT06     Quantitative pronoun
PROQUANT07     Quantitative pronoun
PROQUANT08     Quantitative pronoun
PROQUANT09     Quantitative pronoun
PROQUANT10     Quantitative pronoun
PROQUANT14     Quantitative pronoun
PROQUANT15     Quantitative pronoun
PROQUANT17     Quantitative pronoun
PROQUANTLOC    Quantitative pronoun
QUE            Question word
RO
RS
RV
V              Verb
VAUX           Auxiliary verb
VCOP           Copulative verb
ZE
ZM
ZPL
ZPR