Language Identifier (LID)


Language Identifier (LID) is a tool that automatically classifies text, either line by line or as a whole document, as one of the eleven official South African languages.

 

Evaluation of LID:

(In-house evaluation)

 

Language            Document-level F-score    Line-level F-score
Afrikaans           0.99                      0.99
English             0.99                      0.99
isiNdebele          0.99                      0.95
isiXhosa            0.99                      0.97
isiZulu             0.98                      0.97
Sesotho sa Leboa    0.99                      0.99
Sesotho             0.99                      0.99
Setswana            0.99                      0.98
SiSwati             0.99                      0.98
Tshivenḓa           1.0                       0.99
Xitsonga            1.0                       0.99
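
For readers who want to reproduce this kind of score on their own data, the following is a minimal sketch of how a per-language F-score could be computed from gold and predicted labels. It assumes the balanced F1 measure (the harmonic mean of precision and recall); the evaluation above does not state which F-score variant was used. The function name `per_language_f1` and the sample labels are hypothetical and are not part of LID.

```python
from collections import Counter


def per_language_f1(gold, predicted):
    """Return {language: F1} for two parallel lists of labels.

    Assumption: F1 = 2*TP / (2*TP + FP + FN), computed per language.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for language g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true label but was missed
    scores = {}
    for lang in set(gold) | set(predicted):
        denom = 2 * tp[lang] + fp[lang] + fn[lang]
        scores[lang] = 2 * tp[lang] / denom if denom else 0.0
    return scores


# Hypothetical labels, one per line (or one per document).
gold = ["isiZulu", "isiZulu", "isiXhosa", "Afrikaans"]
predicted = ["isiZulu", "isiXhosa", "isiXhosa", "Afrikaans"]
scores = per_language_f1(gold, predicted)
for lang in sorted(scores):
    print(f"{lang}: {scores[lang]:.2f}")
```

The same function serves both columns of the table: pass one label per line for a line-level score, or one label per document for a document-level score.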