We argue for a perspective on bilingual heritage speakers as native speakers of both their languages and present results from a large-scale, cross-linguistic study that took such a perspective and approached bilinguals and monolinguals on equal grounds. We targeted comparable language use in bilingual and monolingual speakers, crucially covering broader repertoires than just formal language. A main database was the open-access RUEG corpus, which covers comparable informal vs. formal and spoken vs. written productions by adolescent and adult bilinguals with heritage-Greek, -Russian, and -Turkish in Germany and the United States and with heritage-German in the United States, and matching data from monolinguals in Germany, the United States, Greece, Russia, and Turkey. Our main results lie in three areas. (1) We found non-canonical patterns not only in bilingual, but also in monolingual speakers, including patterns that have so far been considered absent from native grammars, in domains of morphology, syntax, intonation, and pragmatics. (2) We found a degree of lexical and morphosyntactic inter-speaker variability in monolinguals that was sometimes higher than that of bilinguals, further challenging the model of the streamlined native speaker. (3) In majority language use, non-canonical patterns were dominant in spoken and/or informal registers, and this was true for monolinguals and bilinguals. In some cases, bilingual speakers were leading quantitatively. In heritage settings where the language was not part of formal schooling, we found tendencies of register leveling, presumably due to the fact that speakers had limited access to formal registers of the heritage language. Our findings thus indicate possible quantitative differences and different register distributions rather than distinct grammatical patterns in bilingual and monolingual speakers. This supports the integration of heritage speakers into the native-speaker continuum. 
Approaching heritage speakers from this perspective helps us to better understand the empirical data and can shed light on language variation and change in native grammars. Furthermore, our findings for monolinguals lead us to reconsider the state of the art on majority languages, given recurring evidence for non-canonical patterns that deviate from what has been assumed in the literature so far. These patterns might have been attributed to bilingualism had we not included informal and spoken registers in monolinguals and bilinguals alike.
The present study analyzes the morphological productivity of complex verbs in second language acquisition using a corpus of German as a Foreign Language (GFL). It shows that advanced learners of GFL use prefix and particle verbs relatively frequently and productively, though less so than native speakers, and discusses these findings in light of different linguistic models and acquisition theories. It argues that corpus data must be evaluated against good models and that categorization decisions need to be made available as annotations.
In this paper, we present corpus data that questions the concept of native speaker homogeneity as it is presumed in many studies using native speakers (L1) as a control group for learner data (L2), especially in corpus contexts. Usage-based research on second and foreign language acquisition often investigates quantitative differences between learners, and usually a group of native speakers serves as a control group, but often without elaborating on differences within this group to the same extent. We examine inter-personal differences using data from two well-controlled German native speaker corpora collected as control groups in the context of second and foreign language research. Our results suggest that certain linguistic aspects vary to an extent in the native speaker data that undermines general statements about quantitative expectations in L1. However, we also find differences between phenomena: while morphological and syntactic sub-classes of verbs and nouns show great variability in their distribution in native speaker writing, other, coarser categories, like parts of speech, or types of syntactic dependencies, behave more predictably and homogeneously. Our results highlight the necessity of accounting for inter-individual variance in native speakers where L1 is used as a target ideal for L2. They also raise theoretical questions concerning a) explanations for the divergence between phenomena, b) the role of frequency distributions of morphosyntactic phenomena in usage-based linguistic frameworks, and c) the notion of the individual adult native speaker as a general representative of the target language in language acquisition studies or language in general.
The language of foreign-language learners differs from that of native speakers at all linguistic levels. For several decades, learner corpora have been built in order to analyze learner language quantitatively and qualitatively. Here we argue, on the basis of three case studies (on modification, co-selection, and rhetorical structures), for linguistically informed, deep modeling and annotation of phenomena, and for formal and quantitative modeling suited to the phenomenon in question. We discuss the trade-off between deep, multi-layer analysis on the one hand and the amounts of data required for certain quantitative methods on the other, and show that medium-sized corpora (like most learner corpora) allow interesting insights that large, more shallowly annotated corpora would not permit in the same way.
Research question and background
If we look at the following examples, written by advanced learners of German as a foreign language (from the corpora Falko and Kobalt-DaF, see Sect. 1.3), we first see surface-level errors and oddities.¹ These include, for example, orthographic deviations (such as ausgeshen in (1a) or aüßern in (1c)), case or argument-structure problems (such as kann ich die Frauenbewegung für vieles dankbar sein in (1a)), inflection problems (such as müssten in (1b)), and phraseological problems (such as Haushalt einhalten in (1b)).
¹ We cannot discuss the large and interesting topic of 'errors' in detail here. We use the term 'error' as it is often used in the learner-corpus discussion. In other articles we have dealt in detail with the theoretical background and the modeling of deviations, Fehler, errors, and mistakes (for an overview see Lüdeling/Hirschmann 2015).
Quantitative approaches are gaining popularity in German legal research. The analysis of large corpora of legal text may be supported by text mining methods. In this study, we employ topic modeling—which aims at retrieving the “topics” of a corpus—to identify words related to certain areas of law present in the case law of the German Federal Constitutional Court (FCC). This information is then evaluated by legal experts and used to show significant content-related differences between the two most frequent types of proceedings before the FCC. Technical and somewhat unstable areas of law, such as tax law, social law, and civil service law, are significantly overrepresented in referrals for judicial review, whereas areas of law characterized by well-developed case law and judicial doctrine appear substantially more often in constitutional complaints. This insight may come as a surprise due to the fact that the Court’s material scope of review is identical in both types of proceedings. Our considerations do not, however, seem to apply to private law. Though we recognize the methodological and epistemological concerns regarding the heuristic nature of topic modeling, this study exemplifies its productive use in complementing, rather than replacing, more traditional techniques of analysis in legal studies.