Abstract: While Gallo-Italic varieties clearly belong to the Romance language family, their subgrouping as either Gallo-Romance or Italo-Romance has been the source of disagreement in the classificatory literature. Whereas earlier analyses tended to classify Gallo-Italic as Gallo-Romance (notably Schmid, 1956; Bec, 1970-1971), later work has either argued for or tacitly assumed a classification of Gallo-Italic as part of the Italo-Romance branch, a view that is both different from and irreconcilable with the earlie…
“…The lack of availability is very annoying, not in the sense that we have no access to a given article in the form of a scan or a book, but rather because many authors collect data, write articles about them, but then do not share their data officially. It is still not surprising that articles are being published in which new ideas are postulated or new conclusions are being made, but in which scholars do not share the data upon which they base their conclusions openly (Tamburelli and Brasca 2017). The same holds for many grammatical descriptions, in which scholars extract individual sentences from their personally collected private corpus but never reference them sufficiently, nor offer the full corpus.…”
Section: Data Problems In Comparative Linguistics
While the discipline of computational linguistics mostly deals with the modeling and the investigation of individual languages (often "big" languages such as English, German, Arabic, or Chinese), Multilingual Computational Linguistics focuses on the comparison of languages, trying to develop new methods and techniques by which languages can be compared automatically or in a computer-assisted manner. The comparison itself follows different perspectives (maintaining a historical, typological, or areal viewpoint). In this scientific practice course, we will take a closer look at basic theories and methods which are relevant for the discipline of Multilingual Computational Linguistics. We will look at large corpora with multiple languages of the world as well as data from individual languages and language families.
“…(Mahsun, 1995; Onishi, 2019) Furthermore, the results of the calculation via the dialectology formula need to be consulted with the isolect status rules based on the following Comparative Historical Linguistics criteria/rules. Tamburelli & Brasca (2018) If the historical relationship between ML and LL in figure 2 is the relationship between dialects of a language, then in Table 2, this determination has undergone an innovation, namely the average number of kinship relations is the result of the number 48.5%, this percentage is in the range of status criteria isolect of a language, which is 36-81%. So the isolect status, shows the kinship relationship of the language of 'Language of Families'.…”
Section: New Developing Group Proto and Language Contemporary
The purpose of this paper is to determine the linguistic facts that have historical relations in comparative diachronic linguistic studies. The Lampung language (LL) as a reflex of Proto *WPM was first studied by Dyen (1965), and its status as conservative has held for four decades. Apart from Dyen (1965), whose results showed 89.1% (Figure 0), there is also Walker's (1976) inference at the Way Lima Lampung observation point, which used the 200-item Swadesh vocabulary list and yielded 82.2%. Furthermore, Sudirman and Fernandez, at the National Seminar on Austronesian Language and Culture II, studied the "Status of the Komering Isolect in the LL Group"; their dialectometric lexicostatistic result of 82.16% was close to Walker's earlier figure of 82.2%, so the historical relation of the isolect as a conservative dialect can be observed in Figure 2. In addition to these conservative results, there are also innovative results, namely shared retentions and shared innovations, shown by the 48.5% dialectometric lexicostatistic result as the realization of the historical relation between the Malay Language (ML) and the Lampung Language (LL).
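The percentages above are lexicostatistic scores: the share of meanings in a fixed list (here the 200-item Swadesh list) whose forms in two varieties are judged cognate, read against status bands. The sketch below shows that arithmetic under stated assumptions. The function names are hypothetical, the cognate judgments are invented counts chosen only to show how a figure like 48.5% can arise (97 of 200), and only the 36-81% band comes from the quoted text; the other band labels are illustrative. It shows plain cognate counting, not necessarily the exact dialectometric formula used in the cited work.

```python
from typing import Sequence

# Band boundaries (assumption): the quoted text gives 36-81% for
# "languages of a family"; treating anything above 81% as dialects of one
# language is inferred from the 82.2% "conservative dialect" result.
FAMILY_LOWER = 36.0
FAMILY_UPPER = 81.0


def cognate_percentage(judgments: Sequence[bool]) -> float:
    """Percentage of compared meanings whose forms were judged cognate."""
    if not judgments:
        raise ValueError("no meanings were compared")
    return 100.0 * sum(judgments) / len(judgments)


def isolect_status(percentage: float) -> str:
    """Map a cognate percentage onto a status band (hypothetical labels)."""
    if percentage > FAMILY_UPPER:
        return "dialects of one language"
    if percentage >= FAMILY_LOWER:
        return "languages of one family"
    return "more distant than one family"


# Hypothetical cognate judgments over a 200-item Swadesh list: 97 shared.
judgments = [True] * 97 + [False] * 103
pct = cognate_percentage(judgments)
print(pct, isolect_status(pct))  # 48.5 languages of one family
```

Reading the quoted range "36-81%" as inclusive, so that exactly 81% falls in the family band, is a choice made here rather than something the source specifies.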
“…It is still very difficult to find particular datasets, since linguistic journals often do not have a policy on supplementary data and may lack resources for hosting data on their servers. It is also often difficult to access data, and many papers which are based on original data are still being published without the data [1] and having to request the data from the authors is sometimes a more serious obstacle than it should be [22, 23]. Due to idiosyncratic formats, linguistic datasets also often lack interoperability and are therefore not reusable.…”
The amount of available digital data for the languages of the world is constantly increasing. Unfortunately, most of the digital data are provided in a large variety of formats and are therefore not amenable to comparison and reuse. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typological language comparison (word lists and structural datasets) and a framework to incorporate more data types (e.g., parallel texts and dictionaries). The new specification for cross-linguistic data formats comes with a software package for validation and manipulation, a basic ontology which links to more general frameworks, and usage examples of best practices.
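To make the idea of a shared word-list format concrete, the following standard-library sketch writes and checks a small CLDF-style form table. It assumes the FormTable columns ID, Language_ID, Parameter_ID and Form used by the CLDF Wordlist module; the helper names `write_forms` and `check_forms` and the sample rows are hypothetical, and the checks are far simpler than those of the validation package the abstract mentions (pycldf), so the CLDF specification remains the authority on required columns and metadata.

```python
import csv
import io

# Columns assumed for a minimal CLDF-style FormTable; consult the CLDF
# specification for the authoritative definition.
REQUIRED_COLUMNS = ("ID", "Language_ID", "Parameter_ID", "Form")


def write_forms(rows):
    """Serialize word-list rows (dicts) as CSV text with a fixed header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def check_forms(text):
    """Return a list of problems found in CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [f"missing column {c}" for c in REQUIRED_COLUMNS if c not in header]
    present = [c for c in REQUIRED_COLUMNS if c in header]
    seen_ids = set()
    # Line numbers assume the header is line 1 and no field spans several lines.
    for line_no, row in enumerate(reader, start=2):
        if "ID" in header:
            if row["ID"] in seen_ids:
                problems.append(f"line {line_no}: duplicate ID {row['ID']}")
            seen_ids.add(row["ID"])
        for column in present:
            if not row[column]:
                problems.append(f"line {line_no}: empty {column}")
    return problems


# Hypothetical two-language entry for the concept "hand".
rows = [
    {"ID": "deu-hand", "Language_ID": "deu", "Parameter_ID": "hand", "Form": "Hand"},
    {"ID": "eng-hand", "Language_ID": "eng", "Parameter_ID": "hand", "Form": "hand"},
]
print(check_forms(write_forms(rows)))  # []
```

Fixed column names and unique IDs are what allow independently collected word lists to be merged on Language_ID and Parameter_ID without writing per-dataset conversion code.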