The automatic analysis of corpora

Sinclair, John

doi:10.1515/9783110867275.379

Cited by 23 publications

(13 citation statements)

References 1 publication

“…Opinions differ on whether a subsequent manual review of the tagging is necessary. For example, Sinclair (1992) opposes human post-editing and favours fully automatic analysis: "Analysis should be restricted to what the machine can do without human checking, or intervention" (1992: 381). By contrast, Kahrel, Barnett and Leech (1997) hold that "ultimately it is the human being's mental interpretation that enables us to evaluate the quality of annotation" (1997: 244).…”

Section: Figure 4. File Tagged With TreeTagger (unclassified)

Compilación Y Análisis De Un Corpus Paralelo Para La Investigación en Traducción: Proyecto Con Déjà Vu, Treetagger E Ims Open Corpus Workbench

Molés-Cases

2016

RLA

ABSTRACT

Although corpus linguistics has advanced a great deal in recent years and is now increasingly included in research projects on Linguistics and Translation (for instance: Kübler & Foucou, 2003; Laroche & Langlais, 2010), the most advanced technical procedures for compiling and exploiting corpora remain a stumbling block. The main aim of this paper is therefore to make this kind of information accessible to researchers with little experience in the field. In particular, it presents the experience of creating a parallel corpus that was aligned with Déjà Vu, linguistically tagged with TreeTagger, documented with Notepad++ and indexed with IMS Open Corpus Workbench. It also includes a brief introduction to corpus exploration and analysis with Corpus Query Processor, the main tool of IMS Open Corpus Workbench.

Keywords: Corpus linguistics; Déjà Vu; TreeTagger; IMS Open Corpus Workbench.

RLA. Revista de Lingüística Teórica y Aplicada, Concepción (Chile), 54 (1), I Sem. 2016, pp. 149-174. CL ISSN 0033-698X

* This work was made possible by the projects "Refinamiento y sistematización del análisis del corpus COVALT a través de su preprocesamiento y ampliación mediante la inclusión de traducciones al castellano" (FFI2012-35239/FILO) of the Spanish Ministry of Education and "Los corpus en la enseñanza de la traducción. Ampliación y explotación didáctica del corpus COVALT" (P1.1B2013-44) of the Universitat Jaume I (Spain), and by a researcher mobility grant from the Fundació Caixa Castelló-Bancaixa ("Acción 2 del Plan de promoción a la investigación de la Universitat Jaume I para el curso 2012/2013") at the Universität Leipzig (Germany). I would like to thank Ulrike Oster, Víctor González, Daniel Renau, Francisco Nevado and the two anonymous reviewers for their advice and comments.

INTRODUCTION

There is broad consensus that corpus linguistics is an optimal tool for studying linguistic and translation phenomena (see, for example: Bernardini, 2004; Aston, 2009; Kübler, 2011). When a researcher has a corpus suited to their purposes and a search interface that allows them to extract the specific information they seek, electronic corpora ...

“…Modern corpus studies originated in studies of language (Firth, 1957; Sinclair, 1992; Quirk, 1960), and the working hypotheses of those studies continue to influence corpus studies in music. If one thinks of language comprehension as involving a type of auditory scene analysis (Bregman, 1990), one sees that studies of verbal corpora have made the quite reasonable and often methodologically necessary simplification that language constitutes a single auditory stream, one represented either as transcribed speech or as written text to be heard internally by a reader.…”

Section: The Assumption Of a Single Auditory Stream (mentioning)

confidence: 99%

“Historically Informed” Corpus Studies

Gjerdingen

2012

Music Perception

Musicians can choose between various “historicist” or “presentist” ways of performing works from the past. Music scholars who study early music sometimes are forced to make similar choices. If one thinks of corpus studies in music as an objective form of counting the “elements of music,” the question of what constitutes an “element” can involve similar historicist/presentist dilemmas. The article examines three historically significant characteristics of European art music—three historicist features—that are not always recognized in presentist corpus studies. For an illustrative example, a comparison is made between how the cadenza doppia in a Bach toccata for organ might be represented in a corpus study as either a two-voice framework or a series of Roman numerals in the tradition of Allen McHose (1947). Because that type of cadence was a commonplace in Bach’s time and in Bach’s compositions, a corpus analysis should be able to detect its multiple occurrences as a core element of the music.

“…Stubbs briefly describes work at the University of Birmingham which has gone towards the compilation of a 200 million word machine-readable corpus of spoken and written English entitled the Bank of English, and he refers to other work which has developed dictionaries, grammars, and a variety of linguistic scholarship describing English according to these principles (Sinclair 1987, 1990, 1991, 1992c). In his introduction, Stubbs also establishes the tone of much of the collection: his final principle flatly declares that Sinclair's work provides sufficient evidence to conclude that "Saussurian [and Chomskyan] dualisms are misconceived" (p. 3).…”

Section: Reviewed By W Wilfried Schuhmacher (mentioning)

confidence: 99%