Open Speech and Language Resources


Identifier: SLR39

Summary: Spanish data, mirrored from the LDC

Category: Speech

License: Apache 2.0

Download: LDC2006S37.tar.gz [2.1G] (Speech and transcripts) Mirrors: [US]

About this resource:


The Heroico corpus (LDC2006S37) was originally collected to train acoustic models for pronunciation modeling in Spanish language learning applications. The corpus consists of two main subcorpora:

1. A subcorpus called Heroico, collected at Mexico's Military Academy.

2. A subcorpus collected at the United States Military Academy (USMA) in West Point, New York.

The Heroico subcorpus is further divided into recited and prompted speech. The recited speech appears under the recordings directory and the prompted speech under the answers directory.
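
As an illustration of the directory split described above, here is a minimal Python sketch that labels a file from the Heroico data as recited or prompted based on which directory it sits under. Only the directory names recordings and answers come from the description; the function name speech_style and the example paths are hypothetical, and the actual layout inside the archive may differ.

```python
from pathlib import PurePosixPath


def speech_style(path: str) -> str:
    """Label a Heroico file path by the directory it sits under.

    Assumption: paths use forward slashes and contain a component named
    exactly 'recordings' (recited speech) or 'answers' (prompted speech).
    """
    parts = PurePosixPath(path).parts
    if "recordings" in parts:
        return "recited"
    if "answers" in parts:
        return "prompted"
    return "unknown"


# Hypothetical example paths; the real file names may look different.
for example in ("heroico/recordings/001/1.wav", "heroico/answers/001/1.wav"):
    print(example, "->", speech_style(example))
```

Paths that contain neither directory name are labeled unknown rather than guessed, which is useful for the USMA files, whose directory names are not given here.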

The USMA subcorpus includes 1.2 hours of speech from nonnative informants and 1 hour of speech from native speakers. All the speech in the USMA subcorpus was recited.

The Heroico subcorpus has 11.8 hours of speech. A one-hour segment of the speech in the Heroico subcorpus was recited from the same set of prompts that was used in the USMA collection.
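
The durations quoted above can be tallied with a short Python sketch. It assumes the three figures do not overlap and that the one-hour shared-prompt segment is already counted within the 11.8 hours of Heroico speech; the dictionary keys are illustrative labels, not names used in the corpus.

```python
# Durations in hours, as quoted in the description above.
# Assumption: the figures are disjoint, and the one-hour shared-prompt
# segment is part of the 11.8 Heroico hours rather than additional to it.
HOURS = {
    "usma_nonnative": 1.2,
    "usma_native": 1.0,
    "heroico": 11.8,
}

usma_total = round(HOURS["usma_nonnative"] + HOURS["usma_native"], 1)
corpus_total = round(sum(HOURS.values()), 1)

print(f"USMA subcorpus:    {usma_total} h")
print(f"Heroico subcorpus: {HOURS['heroico']} h")
print(f"Whole corpus:      {corpus_total} h")
```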

External URL: