Lexicons

Appen provides pronunciation lexicons in many languages, including standard and non-standard varieties of languages spoken in different countries.

Examples include:

  • French (Canada, France, Belgium, etc.)
  • Arabic (Maghreb, Iraq, UAE, Levantine, etc.)
  • German (Switzerland, Germany, Italy, etc.)
  • English (USA, UK, NZ, South Africa, Singapore, etc.)

Our holdings include millions of words in 40+ languages. See the Appen holdings listed in the Product Catalogue.

Appen works in all scripts and text directions. Appen lexicons are structured to maximise their usefulness to clients and can be delivered in formats to suit end-user needs. All lexicons are developed with consistent and correct spelling. Phonemes are written in SAMPA, a machine-readable, ASCII-based encoding of the International Phonetic Alphabet (IPA), and can be converted to client formats if required.
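
As an illustration of what such a conversion involves, the following minimal Python sketch maps a SAMPA transcription to IPA. It is not Appen's tooling: the function name sampa_to_ipa, the space-separated input layout and the small symbol table are assumptions made for this example, and the table covers only a handful of well-known English SAMPA symbols. Symbols that are missing from the table are passed through unchanged, which suits SAMPA symbols that coincide with IPA (such as p, t, k) but would need a fuller table in practice.

```python
# Illustrative subset of SAMPA-to-IPA correspondences (not a complete table).
SAMPA_TO_IPA = {
    "tS": "tʃ", "dZ": "dʒ",                                      # affricates
    "i:": "iː", "u:": "uː", "A:": "ɑː", "O:": "ɔː", "3:": "ɜː",  # long vowels
    "S": "ʃ", "Z": "ʒ", "T": "θ", "D": "ð", "N": "ŋ",            # consonants
    "@": "ə", "{": "æ", "I": "ɪ", "U": "ʊ", "V": "ʌ", "Q": "ɒ",  # short vowels
    '"': "ˈ", "%": "ˌ",                                          # primary / secondary stress
}


def sampa_to_ipa(transcription: str) -> str:
    """Convert a space-separated SAMPA transcription to an IPA string.

    Assumes one symbol per whitespace-separated token (a hypothetical
    layout chosen to avoid ambiguous segmentation). Unknown symbols are
    copied through unchanged.
    """
    return "".join(SAMPA_TO_IPA.get(symbol, symbol) for symbol in transcription.split())


# Usage: "thing" with primary stress, and "ship".
thing_ipa = sampa_to_ipa('" T I N')
ship_ipa = sampa_to_ipa("S I p")
```

Separating symbols with spaces keeps the lookup unambiguous; an unspaced format would need longest-match segmentation so that, for example, tS is not read as t followed by S.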

Appen lexicons include some or all of the following features:

  • Multiple pronunciation variants
  • Variant ranking
  • Stress and syllabification marking
  • Part-of-speech, lemma and frequency tagging
  • Annotation of regional and dialectal varieties
  • Processing by native-speaker staff
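
To make these features concrete, here is a minimal Python sketch of how one entry carrying them might be modelled. The class names, field names, the top_variants helper and the sample values (including the frequency count and the variant ranking) are all hypothetical and chosen only for illustration; they do not describe Appen's actual delivery format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pronunciation:
    phonemes: str      # SAMPA; '"' marks primary stress, '.' a syllable boundary (assumed layout)
    rank: int          # variant ranking: 1 = most preferred
    region: str = ""   # regional or dialectal label, e.g. "UK"


@dataclass
class LexiconEntry:
    word: str
    lemma: str
    pos: str           # part-of-speech tag
    frequency: int     # corpus frequency count (made-up value below)
    pronunciations: List[Pronunciation] = field(default_factory=list)

    def top_variants(self, n: int) -> List[Pronunciation]:
        """Return at most n pronunciation variants, best-ranked first."""
        return sorted(self.pronunciations, key=lambda p: p.rank)[:n]


# Usage: one word with two regional variants; the ranking here is arbitrary.
entry = LexiconEntry(
    word="tomato",
    lemma="tomato",
    pos="NOUN",
    frequency=1200,
    pronunciations=[
        Pronunciation('t @ . " m A: . t @U', rank=2, region="UK"),
        Pronunciation('t @ . " m eI . t oU', rank=1, region="US"),
    ],
)
best = entry.top_variants(1)
```

Keeping the rank on each variant, rather than relying on list order, lets a consumer request any number of variants without depending on how the entry was stored.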

For most languages, sub-lexicons are available in specific categories such as common words, numbers, given names, family names and place names.

Automatic Speech Recognition (ASR)

Appen develops variant pronunciation lexicons based on common forms and regional dialects. The number of variants provided is determined by client request. Appen linguists work in all languages and provide phonological analyses of lesser-known languages and less well-researched regional varieties, helping ASR developers understand the phonotactic processes at work when speech is decoded. Information on mapping between languages can be provided, and Appen has also developed lexicons to match transcription data.

Text-to-Speech (TTS)

Appen provides pronunciation lexicons for text-to-speech applications and tunes the exceptions dictionaries of live TTS applications. This is especially relevant where personal names and place names are involved: Appen has extensive experience in name pronunciation and extensive holdings of names. Aligned grammatical and prosodic markup of speech can also be performed, delivering the information needed to generate natural-sounding speech. Appen can also provide evaluation reports on TTS systems in speech applications to help developers optimise performance and adapt products to user behaviour.