About

The aim of the Spoken Turkish Corpus (STC) project, which has been hosted and supported by the Department of Foreign Language Education since October 2008, is to construct a linguistically analyzed resource of one million words of face-to-face and mediated interaction in present-day Turkish. The corpus will be made available to academics and researchers in all fields of study related to language.

STC was supported by TÜBİTAK from October 2008 to April 2010 under project no. 108K283, and has since been analyzed with the support of METU project no. BAP-05-03-2011-001. The first small-scale trial corpus has been accessible to researchers since 2010. A new 400,000-word version (2.0) will be available toward the end of 2024. In the following years, we plan to increase the size of the corpus. In the project, sound and video recordings are transcribed with EXMARaLDA. The transcriptions are time-aligned with the recordings for viewing and searching.

1. Distribution
Users of the Spoken Turkish Corpus will not distribute the corpus to any third party under any condition.

2. Protection of Speaker Identities
Users of the Spoken Turkish Corpus will not disclose any information that might be used to identify the speakers in the recordings. Any written or oral publication, or any other sharing of information resulting from use of the corpus, will keep the identifying information of the speakers strictly confidential (e.g., names, home and workplace addresses, names of institutions).

Users of the Spoken Turkish Corpus will not share any information about speakers beyond what is coded in the corpus, whether in their publications or when otherwise sharing information about the corpus.

3. Evaluating Speaker Utterances
Critical or derogatory comments on, or evaluations of, the speakers' utterances or the speakers themselves may under no circumstances be made public.

Your Contribution

The success of the STC project crucially depends on voluntary contributions.

If you wish to contribute existing recordings of contemporary Turkish to the project, please fill in the Contributing Archives Form on the project website or send an e-mail to the address below. Please do not hesitate to write to us with any questions.

The names of individuals and institutions contributing recordings to the corpus are published on the website. Individuals and institutions who make or donate recordings, or who help with transcription, are awarded certificates.

odtustd@metu.edu.tr
