Text Corpora
Appen has a variety of text collections in different languages available for license. Additional corpora are being added continuously.
- Single-language texts: the texts are collected from many different situations and writers, in various languages. The text types we are collecting are travel, sports, recipes, instructions, descriptions, narrative, letters, opinions, fables, children's stories and email.
- Parallel text corpora: some of these texts are now available in parallel format, i.e. each text is paired with its translation into a second language.
- Named entity annotated texts: corpora of 500,000 words have also been developed in several languages (named entities include persons, titles, quantities, geopolitical entities, locations, facilities, etc.).
Please use our interactive Product Catalogue for a list and detailed descriptions of text corpora available for license.