Named Entity Recognition (NER)


Named entity recognition (NER) is the process of automatically identifying named entities (NEs), that is, unique identifiers such as the names of people, organisations and locations, and classifying them according to a predefined set of types.

In the second part of the NCHLT Text Phase II project, an automatic NER system was developed for each of the languages. In most cases these are baseline systems that can be used in other development projects, or as starting points from which to improve NER for the South African languages. Several techniques have been shown to be accurate for NER classification, but linear-chain conditional random fields (CRFs) with L2 regularisation were chosen because they have been shown to be effective and scalable for sequence labelling problems in the NER domain.
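For orientation, the standard textbook form of a linear-chain CRF with an L2 penalty can be written as below. This is a general sketch, not the project's exact configuration: the feature functions f_k, the weights λ_k, and the regularisation strength c are notation assumed here, and the source does not state which features or settings were used.

```latex
% Conditional probability of a label sequence y given a token sequence x of length T
p_{\lambda}(y \mid x) = \frac{1}{Z_{\lambda}(x)}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big)

% Normalisation term: sum over all candidate label sequences y'
Z_{\lambda}(x) = \sum_{y'} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)

% L2-regularised log-likelihood, maximised over N annotated sentences
\mathcal{L}(\lambda) = \sum_{i=1}^{N} \log p_{\lambda}\big(y^{(i)} \mid x^{(i)}\big)
  - \frac{c}{2} \sum_{k} \lambda_k^{2}
```

The penalty term shrinks the weights towards zero, which helps limit overfitting when the feature set is large relative to the annotated data.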

 

See Annotation Tag Sets for tag details.

 

Evaluation results for NER:

(As reported in Eiselen, R., 2016. Government Domain Named Entity Recognition for South African Languages. In LREC.)

 

Language            F-score
Afrikaans           0.7586
isiNdebele          0.7510
isiXhosa            0.7708
isiZulu             0.6993
Sesotho sa Leboa    0.7446
Sesotho             0.7309
Setswana            0.7806
SiSwati             0.6429
Tshivenḓa           0.7343
Xitsonga            0.7093
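
An F-score is commonly computed as the harmonic mean of precision and recall. This page does not say whether the reported scores count matches per token or per entity, so the following is only a minimal sketch that assumes exact entity-level matching over BIO-style tags. The tag names (B-PER, B-LOC, B-ORG) and the function names are hypothetical and are not taken from the NCHLT tag set or tools.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO-tagged sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and etype == tag[2:]
        if not inside:
            if etype is not None:
                spans.add((start, i, etype))
            # Only a B- tag opens a new span; stray I- tags are ignored.
            start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans


def f_score(gold: List[str], pred: List[str]) -> float:
    """Balanced F-score over exactly matching entity spans."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical five-token sentence: one entity matches, one has the wrong type.
gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "B-ORG", "O"]
print(f_score(gold, pred))  # 0.5
```

In this example one of two predicted entities is correct and one of two gold entities is found, so precision and recall are both 0.5, as is the F-score.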