Technologies and Supported Languages


In total, 61 existing core technologies are available through web services. See the relevant topics below for more information on these technologies.

 

Language identifier (1)
OCR engines (10)
Tokenisers (10)
Sentence boundary detectors (10)
Part-of-speech (POS) taggers (10)
Named-entity recognisers (10)
Phrase chunkers (10)

 

The core technologies fully support every official South African language listed in the table below except English. English part-of-speech tagging, named-entity recognition, and phrase chunking are not included, as these tools are readily available elsewhere.

Undefined applies only to language identification. It is used when LID cannot determine the language of a given document or line (see Language ID).

 

 

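| Language name    | ISO 2-letter code | ISO 3-letter code | Conjunctive / Disjunctive | LID | OCR | POS | NER | PC |
|------------------|-------------------|-------------------|---------------------------|-----|-----|-----|-----|----|
| Afrikaans        | AF                | AFR               | Disjunctive               | x   | x   | x   | x   | x  |
| English          | EN                | ENG               | Disjunctive               | x   | x   |     |     |    |
| isiNdebele       | NR                | NBL               | Conjunctive               | x   | x   | x   | x   | x  |
| isiXhosa         | XH                | XHO               | Conjunctive               | x   | x   | x   | x   | x  |
| isiZulu          | ZU                | ZUL               | Conjunctive               | x   | x   | x   | x   | x  |
| Sesotho sa Leboa | NSO               | NSO               | Disjunctive               | x   | x   | x   | x   | x  |
| Sesotho          | ST                | SOT               | Disjunctive               | x   | x   | x   | x   | x  |
| Setswana         | TN                | TSN               | Disjunctive               | x   | x   | x   | x   | x  |
| SiSwati          | SS                | SSW               | Conjunctive               | x   | x   | x   | x   | x  |
| Tshivenḓa        | VE                | VEN               | Disjunctive               | x   | x   | x   | x   | x  |
| Xitsonga         | TS                | TSO               | Disjunctive               | x   | x   | x   | x   | x  |
| Undefined        | NA                | NA / NONE         | -                         | x   |     |     |     |    |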
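
Client code that calls the web services may want to check the table above before requesting a technology for a given language. The following is a minimal sketch of that lookup in Python. The codes and support flags are copied from the table; the names Language, LANGUAGES, UNDEFINED_CODES, find_language and supports are hypothetical helpers for illustration and are not part of the web services themselves.

```python
from typing import NamedTuple, Optional


class Language(NamedTuple):
    name: str
    iso2: str
    iso3: str
    writing: str             # "Conjunctive" or "Disjunctive"
    technologies: frozenset  # column abbreviations marked "x" in the table


# Technologies marked for every language except English.
ALL = frozenset({"LID", "OCR", "POS", "NER", "PC"})

# Rows transcribed from the support table (hypothetical client-side copy).
LANGUAGES = [
    Language("Afrikaans", "AF", "AFR", "Disjunctive", ALL),
    Language("English", "EN", "ENG", "Disjunctive", frozenset({"LID", "OCR"})),
    Language("isiNdebele", "NR", "NBL", "Conjunctive", ALL),
    Language("isiXhosa", "XH", "XHO", "Conjunctive", ALL),
    Language("isiZulu", "ZU", "ZUL", "Conjunctive", ALL),
    Language("Sesotho sa Leboa", "NSO", "NSO", "Disjunctive", ALL),
    Language("Sesotho", "ST", "SOT", "Disjunctive", ALL),
    Language("Setswana", "TN", "TSN", "Disjunctive", ALL),
    Language("SiSwati", "SS", "SSW", "Conjunctive", ALL),
    Language("Tshivenḓa", "VE", "VEN", "Disjunctive", ALL),
    Language("Xitsonga", "TS", "TSO", "Disjunctive", ALL),
]

# Codes listed in the "Undefined" row; only meaningful for LID.
UNDEFINED_CODES = {"NA", "NONE"}

# Index each language under both its 2-letter and 3-letter code.
_BY_CODE = {}
for _lang in LANGUAGES:
    _BY_CODE[_lang.iso2] = _lang
    _BY_CODE[_lang.iso3] = _lang


def find_language(code: str) -> Optional[Language]:
    """Return the table row for a 2- or 3-letter code, or None if unknown."""
    return _BY_CODE.get(code.strip().upper())


def supports(code: str, technology: str) -> bool:
    """Report whether the table marks `technology` for the given code."""
    key = code.strip().upper()
    tech = technology.strip().upper()
    if key in UNDEFINED_CODES:
        # The Undefined row is marked for language identification only.
        return tech == "LID"
    lang = _BY_CODE.get(key)
    return lang is not None and tech in lang.technologies
```

For example, supports("EN", "NER") is False because English named-entity recognition is not included, while supports("ZU", "NER") is True. The sketch covers only the five technology columns shown in the table.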