Conference, colloquium or symposium

Infoling 1.34 (2013)
Title: Compilation and annotation of spoken corpora: Towards best practice? (ICAME34)
Organizing body: NHH Norwegian School of Economics
Venue: Santiago de Compostela, Spain
Date: 22 May 2013
Circular No.: 1
Contact: Gisle Andersen
Description: This workshop provides a meeting ground for scholars involved in the creation of corpora of spoken language or with a more general interest in the representation of spoken data based on audio/video recordings. The workshop addresses the need to harmonise corpus-building methods by developing or utilising internationally recognised standards in corpus linguistics or best practice guidelines for the transcription and annotation of audio/video data.

The aim is to facilitate the exchange of experience from large-scale, coordinated corpus-building efforts as well as small-scale, local initiatives. This includes accounts of, on the one hand, the practicalities encountered in corpus compilation, transcription and annotation, and on the other hand, how annotation decisions are grounded in linguistic theory. We hope this will stimulate a fruitful discussion about whether and how cross-corpus comparison is hampered by a lack of uniformity in annotation schemes and procedures, what solutions corpus builders recommend at different annotation levels, practical experience with the use of existing standards or de facto standards (e.g. COBUILD/NERC, TEI, XCES), methods for testing and improving inter-annotator agreement, etc. Relevant topics include, but are not restricted to:
- Corpus design (techniques for capturing and linking text and audio/video data; ensuring consistency in transcription; ensuring inter-annotator agreement)
- Orthographic transcription (transcription of non-standard vocabulary, slang, swearing, neologisms; standardised vs. idiosyncratic orthography; standardised representation of pauses, backchannels and hesitation phenomena)
- Annotation of syntactic features (the relevance and reliability of part-of-speech tagging for (informal/messy) conversational data; syntactic parsing of speech; parsers’/taggers’ capability of handling non-standard forms and neologisms)
- Annotation of prosodic, phonetic, or acoustic features (standardised vs. in-house annotation schemes, simple vs. detailed prosodic annotation; the relevance and reliability of phonetic annotation)
- Pragmatic or gestural annotation (standardised/in-house systems for annotation of speech act information, discourse functions, pragmatic markers, quotatives, anaphora and deixis; gestural annotation schemes)
We invite papers that discuss specific corpus initiatives dealing with any of the above topics, or that report on corpus-based case studies which illustrate or problematise the need for methodological harmonisation and standardisation in the field.

The workshop will be organised as a series of thematic slots consisting of 15-minute papers followed by joint discussions.

Abstracts of 300-400 words should be submitted by e-mail to all three convenors. Notification of acceptance will be sent out in late February 2013.

Workshop convenors: Gisle Andersen (NHH-NO), John Kirk (QUB-UK), Susan Lee Nacey (HiHm-NO)
Subject area: Corpus linguistics
Submission deadline: 31 January 2013
Official language(s) of the event: English

Date of publication in Infoling: 17 January 2013
Gisle Andersen
NHH Norwegian School of Economics