CORPORA from CSLU: Yes/No v1.2


The Yes/No Corpus is a collection of answers to yes/no questions from other CSLU corpora.

Recording Details:
The data in this corpus were collected over both analog and digital telephone lines.

The analog data were recorded using a Gradient Technologies analog-to-digital conversion box. These files were recorded at 8 kHz with 16-bit samples and stored in a linear format.

The digital data were recorded with the CSLU T1 digital data collection system. These files were sampled at 8 kHz with 8-bit samples and stored as μ-law (ulaw) files.
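As an illustration of how 8-bit ulaw samples relate to 16-bit linear samples, here is a minimal Python sketch of the standard ITU-T G.711 μ-law expansion. The function names are hypothetical and are not part of the corpus distribution; the sketch assumes you already have the raw 8-bit sample bytes in hand.

```python
BIAS = 0x84  # bias added by the G.711 mu-law encoder (132)

def ulaw_to_linear(byte: int) -> int:
    """Expand one 8-bit mu-law code to a signed linear sample (G.711)."""
    u = ~byte & 0xFF                      # mu-law codes are stored bit-inverted
    mantissa = u & 0x0F                   # low 4 bits
    exponent = (u & 0x70) >> 4            # 3-bit segment number
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if u & 0x80 else magnitude   # top bit carries the sign

def decode_ulaw(data: bytes) -> list[int]:
    """Expand a sequence of mu-law bytes to a list of linear samples."""
    return [ulaw_to_linear(b) for b in data]
```

For example, `decode_ulaw(bytes([0xFF, 0x80, 0x00]))` returns `[0, 32124, -32124]`: silence followed by the largest positive and negative values the code can represent.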

All of the data use the RIFF standard file format. This file format is 16-bit linearly encoded.
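Because the corpus mixes analog-line and digital-line recordings, it can be useful to check how a given file is actually encoded. The following Python sketch reads the "fmt " chunk of a RIFF/WAVE file and reports the encoding, sample rate, and sample width. It assumes the .wav files carry a standard "fmt " chunk; the function name and the returned dictionary keys are illustrative choices, not part of the distribution.

```python
import struct

# WAVE format tags: 1 = linear PCM, 7 = ITU G.711 mu-law.
FORMAT_NAMES = {1: "linear PCM", 7: "mu-law"}

def read_wav_format(data: bytes) -> dict:
    """Return the encoding parameters found in the 'fmt ' chunk."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            tag, channels, rate, _, _, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8)
            return {
                "encoding": FORMAT_NAMES.get(tag, f"format tag {tag}"),
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("no 'fmt ' chunk found")
```

A typical call would be `read_wav_format(open(path, "rb").read())`, where `path` points at a file under the speech directory.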

Directory Structure:
There are five top-level directories in this distribution:

  • docs -- the docs directory contains assorted documentation files
  • labels -- the labels directory contains .phn files containing time-aligned phoneme transcriptions
  • misc -- the misc directory contains scripts and archival information
  • speech -- the speech directory contains the .wav files containing speech data
  • trans -- the trans directory contains .txt files containing transcriptions of the corresponding .wav files in the speech directory

The speech and trans directories contain the data files, whose names are built from the following components:

xxxxx = call number
y = utterance code
zzz = file extension (txt/wav)
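Since each transcription in trans corresponds to a .wav file in speech, the two can be matched by base name. The Python sketch below is one way to do that. It assumes that a transcription sits at the same relative path under trans as its recording does under speech, differing only in the extension; that layout and the function name are assumptions, not something the distribution documents.

```python
from pathlib import Path

def pair_speech_with_transcripts(root):
    """Match each speech/*.wav file with its trans/*.txt counterpart.

    Returns (pairs, missing): pairs is a list of (wav, txt) paths and
    missing lists the .wav files for which no transcription was found.
    """
    root = Path(root)
    speech_dir, trans_dir = root / "speech", root / "trans"
    pairs, missing = [], []
    for wav in sorted(speech_dir.rglob("*.wav")):
        # Same relative path under trans/, with .wav swapped for .txt
        # (assumed layout).
        txt = (trans_dir / wav.relative_to(speech_dir)).with_suffix(".txt")
        if txt.is_file():
            pairs.append((wav, txt))
        else:
            missing.append(wav)
    return pairs, missing
```

Checking the `missing` list first is a cheap way to confirm that the assumed layout matches the copy of the corpus you received.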

The Center for Spoken Language Understanding (CSLU) distributes corpora to commercial entities and academic institutions for a fee. Commercial entities may use these corpora for research and for creating commercial products, for example by training acoustic models for speech recognition.


To place your order:

1. Click on the type of license you wish to order: "Academic or non-profit entity" or "Commercial entity".

2. Terms of the license agreement can be viewed by clicking on the word "terms".

3. You agree to the terms of the license agreement when you click on "Add to Order" and proceed to the next screen.

4. If information on the "Order Contents" screen is correct, press "Check out".

5. On the next screen, a brief "Intended Use" is required. For "Recipient Scientist Information", enter your own details or, if you are placing the order for another person, that person's details. We will use this information if we have questions about the order, payment, or shipping address.

6. Once your payment has been received and verified by OHSU, your order will be approved by Technology Transfer & Business Development, and the Center for Spoken Language Understanding will ship the DVD by FedEx within 5-10 business days.


For demos and more information, visit the CSLU Corpora website at: 


Files will be made available by download from which requires customers to set up a free account. 


For Information, Contact:
Arvin Paranjpe
Technology Development Manager
Oregon Health & Science University
(503) 494-8200