Technologies and Supported Languages
In total, 85 existing core technologies are available through web services. See the relevant topics below for more information on these technologies.
• Language identifier (1)
• OCR engines (10)
• Tokenisers (10)
• Sentence boundary detectors (10)
• Part of speech (POS) taggers (10)
• Named entity recognisers (10)
• Phrase chunkers (10)
• Universal part of speech (UPOS) taggers (10)
• Lemmatisers (10)
• Morphological analysers (4)
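As a quick sanity check on the stated total, the per-category counts in the list above can be summed. This is a minimal sketch only: the dictionary name `TECHNOLOGY_COUNTS` is illustrative and is not part of the web services themselves.

```python
# Counts copied from the list above; the variable names are illustrative only
# and do not correspond to anything exposed by the web services.
TECHNOLOGY_COUNTS = {
    "Language identifier": 1,
    "OCR engines": 10,
    "Tokenisers": 10,
    "Sentence boundary detectors": 10,
    "Part of speech (POS) taggers": 10,
    "Named entity recognisers": 10,
    "Phrase chunkers": 10,
    "Universal part of speech (UPOS) taggers": 10,
    "Lemmatisers": 10,
    "Morphological analysers": 4,
}

# Sum the per-category counts to get the overall number of core technologies.
total = sum(TECHNOLOGY_COUNTS.values())
print(total)  # prints 85
```

The result matches the figure of 85 core technologies given above.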
CTexT NCHLT Web Services fully supports all official South African languages except English. We do not include English part of speech tagging, named entity recognition, phrase chunking, universal part of speech tagging, lemmatisation, tokenisation or morphological analysis, as these tools are readily available elsewhere.
"Undefined" applies only to language identification (LID), in cases where LID cannot determine the language of a given document or line (see Language ID).