Consult the corpus

The CORMA corpus (Corpus Oral del Español Hablado en Madrid) is a collection of spontaneous spoken Spanish recorded in Madrid. It forms part of a broader research initiative on variation and change in contemporary Spanish. The corpus aims to provide empirically grounded data for the study of pragmatic, syntactic, and sociolinguistic phenomena in natural conversation.

Corpus contents

The materials provided here are the pseudonymised versions of the recorded conversations, distributed as plain-text (.txt) files.

Each file contains:

  • A short metadata section specifying information about the speakers involved in the conversation (e.g., code, age, gender, social class), and
  • The orthographic transcription of the conversation itself.

Metadata files

Two accompanying Excel files provide detailed metadata and quantitative information:

CORMA21_metadatos_conversaciones.xlsx

Contains structured metadata about the conversations as a whole, including:

  • Conversation codes and structure of identifiers
  • Recording period
  • Context or discourse situation
  • Number of participants
  • Distribution by generation, gender, and social class

Download the conversations metadata (Excel)

CORMA21_metadatos_hablantes.xlsx

Provides sociodemographic and quantitative information about the speakers, including:

  • Sociodemographic features (age, gender, social class, discourse situation)
  • Number of words produced by each speaker in the corpus

Download the speakers metadata (Excel)

Ethical and methodological considerations

All files have been pseudonymised to protect speaker privacy, following institutional ethical guidelines. Identifying information has been removed or replaced with neutral codes.

Recordings and transcriptions

  • AM.GEN4.F.01a_(1): audio recording; transcription (pdf, txt); transcription with chrono (pdf, txt)
  • AM.GEN4.F.01a_(2): audio recording; transcription (pdf, txt); transcription with chrono (pdf, txt)
  • AM.GEN4.F.01a_(3): audio recording; transcription (pdf, txt); transcription with chrono (pdf, txt)
  • AM.GEN4.F.01b_(1): audio recording; transcription (pdf, txt); transcription with chrono (pdf, txt)
  • AM.GEN4.F.01b_(2): audio recording; transcription (pdf, txt); transcription with chrono (pdf, txt)
  • AM.GEN4.F.01b_(3): audio recording; transcription (pdf, txt); transcription with chrono (pdf, txt)
  • FA.02_sobre_negocios: audio recording; transcription (pdf, txt); transcription with chrono (pdf, txt)
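
Several of the recording identifiers listed above end in a "_(n)" suffix. When working with the downloaded files, it can be convenient to group such items under a common base code. The following is a minimal Python sketch, not an official tool of the corpus. It assumes (this is not stated on this page) that "_(1)", "_(2)", "_(3)" mark numbered parts of one recording; the function names are hypothetical, and the authoritative description of the identifier structure is the one given in CORMA21_metadatos_conversaciones.xlsx.

```python
import re
from collections import defaultdict

# Assumption (not documented on this page): a trailing "_(n)" marks
# part n of a recording that was split into several files.
PART_SUFFIX = re.compile(r"^(?P<base>.+)_\((?P<part>\d+)\)$")


def split_recording_id(recording_id):
    """Return (base_code, part_number).

    part_number is None when the identifier has no "_(n)" suffix.
    """
    match = PART_SUFFIX.match(recording_id)
    if match is None:
        return recording_id, None
    return match.group("base"), int(match.group("part"))


def group_by_base(recording_ids):
    """Map each base code to the list of part numbers found for it."""
    groups = defaultdict(list)
    for recording_id in recording_ids:
        base, part = split_recording_id(recording_id)
        groups[base].append(part)
    return dict(groups)


# Identifiers copied from the list on this page.
RECORDINGS = [
    "AM.GEN4.F.01a_(1)",
    "AM.GEN4.F.01a_(2)",
    "AM.GEN4.F.01a_(3)",
    "AM.GEN4.F.01b_(1)",
    "AM.GEN4.F.01b_(2)",
    "AM.GEN4.F.01b_(3)",
    "FA.02_sobre_negocios",
]

grouped = group_by_base(RECORDINGS)
for base, parts in grouped.items():
    print(base, parts)
```

The sketch only separates the suffix from the rest of the code; it does not attempt to interpret the remaining components of the identifier.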

 

Download corpus

Disclaimer

The corpus data is to be used for academic purposes only.

By accessing the data, you agree to the following terms and conditions of use:

  • The data cannot be publicly distributed, published, transferred, or sold without the explicit written permission of the corpus creators.
  • When using the data in academic work (publications, presentations, conferences, etc.), remove any personal data from the extracts used, to protect the privacy of the participants.
  • Citing the corpus is mandatory.

Access to the corpus may be revoked if there is evidence that a user has not respected these conditions.