
Data Collection Types and Locations

Appen has performed speech and language data collections in more than 80 languages across 40+ countries around the world, including North and Southeast Asia, North Africa, the Middle East, Europe, Scandinavia, and North and South America.

Appen has experience collecting data in a variety of modes, including:

  • telephony - fixed-line, mobile, in-car
  • microphone-recorded - for embedded device applications
  • broadcast - for acoustic search applications
  • desktop
  • web interface
  • field microphone
  • tablet-style PCs

Appen data collections have been conducted in a wide range of locations:

  • in-car (microphone and telephony), including some collections that draw on our experience with the Lombard effect
  • recording studio
  • office environment
  • street and public place recordings

The range of speech and language types collected includes:

  • scripted speech
  • elicited speech
  • free speech
  • two-way conversational speech
  • multi-speaker meeting interaction
  • text corpora - emails, SMS, vowelised Arabic, ontologies, domain-specific materials
  • handwriting - databases for handwriting recognition and document parsing (diagrams, etc.)