Technologies and Supported Languages


Core Technologies

In total, 85 core technologies are available through the web services. The number of services for each technology type is shown in brackets below; see the relevant topic for more information on each technology.

Language identifier (1)

OCR engines (10)

Tokenisers (10)

Sentence boundary detectors (10)

Part of speech (POS) taggers (10)

Named entity recognisers (10)

Phrase chunkers (10)

Universal part of speech (UPOS) taggers (10)

Lemmatisers (10)

Morphological analysers (4)
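The total of 85 is the sum of the per-technology counts above: 1 + 8 × 10 + 4. A minimal Python sketch of that tally follows. The dictionary keys are descriptive labels only, not identifiers used by the web services.

```python
# Services available per core technology, as listed above.
# Keys are descriptive labels only, not web-service identifiers.
SERVICE_COUNTS = {
    "Language identifier": 1,
    "OCR engines": 10,
    "Tokenisers": 10,
    "Sentence boundary detectors": 10,
    "Part of speech (POS) taggers": 10,
    "Named entity recognisers": 10,
    "Phrase chunkers": 10,
    "Universal part of speech (UPOS) taggers": 10,
    "Lemmatisers": 10,
    "Morphological analysers": 4,
}

# 1 + 8 * 10 + 4 = 85
total = sum(SERVICE_COUNTS.values())
print(f"Total core technologies: {total}")  # Total core technologies: 85
```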


Supported Languages

CTexT NCHLT Web Services fully supports all official South African languages except English. English tokenisation, part-of-speech tagging, named entity recognition, phrase chunking, universal part-of-speech tagging, lemmatisation and morphological analysis are not included, as these tools are readily available elsewhere. Morphological analysis is offered only for the conjunctively written languages: isiNdebele, isiXhosa, isiZulu and Siswati.

The Undefined entry applies only to language identification: it is used when LID cannot determine the language of a given document or line (see Language ID).
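A client that receives a language code from LID might map it back to a language name and treat NA or NONE as Undefined. The sketch below assumes the result arrives as one of the 2- or 3-letter codes in the table below; the actual response format is not described on this page, and the `language_name` helper is hypothetical.

```python
# Code -> language name, transcribed from the Supported Languages table.
# Each row lists (2-letter code, 3-letter code, language name).
_LANGUAGE_ROWS = [
    ("AF", "AFR", "Afrikaans"),
    ("EN", "ENG", "English"),
    ("NR", "NBL", "isiNdebele"),
    ("XH", "XHO", "isiXhosa"),
    ("ZU", "ZUL", "isiZulu"),
    ("NSO", "NSO", "Sesotho sa Leboa"),
    ("ST", "SOT", "Sesotho"),
    ("TN", "TSN", "Setswana"),
    ("SS", "SSW", "Siswati"),
    ("VE", "VEN", "Tshivenḓa"),
    ("TS", "TSO", "Xitsonga"),
]

CODE_TO_NAME = {}
for two_letter, three_letter, name in _LANGUAGE_ROWS:
    CODE_TO_NAME[two_letter] = name
    CODE_TO_NAME[three_letter] = name

# Codes the table lists for the Undefined entry (LID only).
UNDEFINED_CODES = {"NA", "NONE"}


def language_name(code: str) -> str:
    """Hypothetical helper: map a LID code to a language name.

    Returns "Undefined" for NA/NONE and raises ValueError for codes
    that do not appear in the table.
    """
    key = code.strip().upper()
    if key in UNDEFINED_CODES:
        return "Undefined"
    try:
        return CODE_TO_NAME[key]
    except KeyError:
        raise ValueError(f"unknown language code: {code!r}") from None


print(language_name("zul"))   # isiZulu
print(language_name("NONE"))  # Undefined
```

Lookups are case-insensitive, so both `ZU` and `zul` resolve to isiZulu.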

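Language name    | ISO 2-letter code | ISO 3-letter code | Conjunctive/Disjunctive | TOK | SENT | LID | OCR | POS | NER | PC | UPOS | LEM | MA
-----------------|-------------------|-------------------|-------------------------|-----|------|-----|-----|-----|-----|----|------|-----|---
Afrikaans        | AF                | AFR               | Disjunctive             | x   | x    | x   | x   | x   | x   | x  | x    | x   |
English          | EN                | ENG               | Disjunctive             |     | x    | x   | x   |     |     |    |      |     |
isiNdebele       | NR                | NBL               | Conjunctive             | x   | x    | x   | x   | x   | x   | x  | x    | x   | x
isiXhosa         | XH                | XHO               | Conjunctive             | x   | x    | x   | x   | x   | x   | x  | x    | x   | x
isiZulu          | ZU                | ZUL               | Conjunctive             | x   | x    | x   | x   | x   | x   | x  | x    | x   | x
Sesotho sa Leboa | NSO               | NSO               | Disjunctive             | x   | x    | x   | x   | x   | x   | x  | x    | x   |
Sesotho          | ST                | SOT               | Disjunctive             | x   | x    | x   | x   | x   | x   | x  | x    | x   |
Setswana         | TN                | TSN               | Disjunctive             | x   | x    | x   | x   | x   | x   | x  | x    | x   |
Siswati          | SS                | SSW               | Conjunctive             | x   | x    | x   | x   | x   | x   | x  | x    | x   | x
Tshivenḓa        | VE                | VEN               | Disjunctive             | x   | x    | x   | x   | x   | x   | x  | x    | x   |
Xitsonga         | TS                | TSO               | Disjunctive             | x   | x    | x   | x   | x   | x   | x  | x    | x   |
Undefined        | NA                | NA / NONE         | -                       |     |      | x   |     |     |     |    |      |     |

Key: TOK = tokeniser; SENT = sentence boundary detector; LID = language identifier; OCR = optical character recognition engine; POS = part-of-speech tagger; NER = named entity recogniser; PC = phrase chunker; UPOS = universal part-of-speech tagger; LEM = lemmatiser; MA = morphological analyser. An x marks a technology supported for that language. Conjunctive/Disjunctive refers to the language's writing system (conjunctive or disjunctive orthography).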
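Client code that needs to decide whether a service can be called for a given language could hold the table above as a small lookup structure. The following is a minimal sketch transcribed from that table; the names `SUPPORT`, `is_supported` and `languages_for` are hypothetical and are not part of the web services. The Undefined row is omitted because it applies only to LID.

```python
from typing import List

# Technology columns of the Supported Languages table.
TECHS = ("TOK", "SENT", "LID", "OCR", "POS", "NER", "PC", "UPOS", "LEM", "MA")

ALL = frozenset(TECHS)
ALL_BUT_MA = ALL - {"MA"}

# Support matrix keyed by 3-letter code, transcribed from the table
# (x = supported). The Undefined row is omitted: it applies only to LID.
SUPPORT = {
    "AFR": ALL_BUT_MA,
    "ENG": frozenset({"SENT", "LID", "OCR"}),
    "NBL": ALL,
    "XHO": ALL,
    "ZUL": ALL,
    "NSO": ALL_BUT_MA,
    "SOT": ALL_BUT_MA,
    "TSN": ALL_BUT_MA,
    "SSW": ALL,
    "VEN": ALL_BUT_MA,
    "TSO": ALL_BUT_MA,
}


def is_supported(code: str, tech: str) -> bool:
    """Hypothetical check: does the table mark `tech` for language `code`?"""
    if tech not in ALL:
        raise ValueError(f"unknown technology: {tech!r}")
    return tech in SUPPORT.get(code.strip().upper(), frozenset())


def languages_for(tech: str) -> List[str]:
    """Hypothetical query: sorted 3-letter codes of languages supporting `tech`."""
    if tech not in ALL:
        raise ValueError(f"unknown technology: {tech!r}")
    return sorted(code for code, techs in SUPPORT.items() if tech in techs)


print(languages_for("MA"))         # ['NBL', 'SSW', 'XHO', 'ZUL']
print(is_supported("eng", "POS"))  # False
```

Here `languages_for("MA")` returns the four conjunctively written languages, and checks for English succeed only for SENT, LID and OCR.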