Named Entity Recognition (NER)

Named Entity Recognition (NER) is the process of automatically classifying named entities (NEs), which are distinct, unique identifiers, according to a predefined set of categories. An automatic NER system was developed for each of the languages listed in the results below. These systems are intended as baselines that can be used in other development projects or as starting points for improving NER for South African languages. Although several techniques have proven accurate for NER classification, linear-chain conditional random fields (CRFs) with L2 regularisation were chosen, as this method has been shown to be both effective and scalable for sequence labelling problems in the NER domain.
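
To make the sequence labelling setup more concrete, the sketch below shows how each token in a sentence might be turned into a feature dictionary of the kind a linear-chain CRF toolkit typically consumes. The feature names, the context window of one token, and the function names token_features and sentence_features are assumptions made for illustration only; the features used by the actual NER systems are not described here.

```python
def token_features(tokens, i):
    """Build a feature dict for the token at position i.

    The feature set is hypothetical: it is a common starting point for
    CRF-based NER, not the feature set of the systems described above.
    """
    word = tokens[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),   # capitalisation is a strong NE cue
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "prefix3": word[:3],
        "suffix3": word[-3:],
    }
    # Context features let the model use neighbouring tokens.
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True            # beginning of sentence
    if i < len(tokens) - 1:
        features["next.lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True            # end of sentence
    return features


def sentence_features(tokens):
    """Return one feature dict per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in sentence_features(["Nelson", "Mandela", "visited", "Pretoria"]):
        print(feats)
```

A CRF trainer would receive one such list per sentence together with the matching tag sequence. The L2 regularisation mentioned above would then penalise large feature weights during training, which helps when many sparse features such as prefixes and suffixes are used.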

See Annotation Tag Sets for tag details.

Evaluation results for NER:

(As reported in Eiselen, R, 2016, Government Domain Named Entity Recognition for South African Languages, In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, European Language Resources Association (ELRA), pp. 3344–3348.)

Language	F-score
Afrikaans	0.7586
isiNdebele	0.7510
isiXhosa	0.7708
isiZulu	0.6993
Sesotho sa Leboa	0.7446
Sesotho	0.7309
Setswana	0.7806
Siswati	0.6429
Tshivenḓa	0.7343
Xitsonga	0.7093
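
For readers unfamiliar with the metric, the following sketch shows one common way an F-score for NER can be computed: as the harmonic mean of precision and recall over exactly matching entity spans. It assumes BIO-style tags and uses the hypothetical tag names PER and LOC. Whether the figures in the table were computed at entity level or token level, and with which tagging scheme, is not stated here, so this is an illustration of the metric rather than a reproduction of the reported evaluation.

```python
def extract_entities(tags):
    """Collect (start, end, type) spans from a BIO tag sequence (assumed scheme)."""
    entities = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.add((start, i, etype))
            start, etype = None, None
        # "B-" always opens an entity; a stray "I-" is treated as an opening too.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return entities


def f_score(gold_tags, predicted_tags):
    """Balanced F-score (F1) over exactly matching entity spans."""
    gold = extract_entities(gold_tags)
    predicted = extract_entities(predicted_tags)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["B-PER", "I-PER", "O", "B-LOC"]   # hypothetical tag names
    pred = ["B-PER", "I-PER", "O", "O"]
    print(round(f_score(gold, pred), 4))
```

In this example the prediction finds one of the two gold entities and proposes no wrong ones, so precision is 1.0, recall is 0.5, and the F-score is about 0.6667.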