Nagy Viktor scite author profile

Parallel corpora for medium density languages

The choice of natural language technology appropriate for a given language is greatly impacted by density (availability of digitally stored material). More than half of the world speaks medium density languages, yet many of the methods appropriate for high or low density languages yield suboptimal results when applied to the medium density case. In this paper we describe a general methodology for rapidly collecting, building, and aligning parallel corpora for medium density languages, illustrating our main points on the case of Hungarian, Romanian, and Slovenian. We also describe and evaluate the hybrid sentence alignment method we are using.

Introduction

There are only a dozen large languages with a hundred million speakers or more, accounting for about 40% of the world population, and there are over 5,000 small languages with fewer than half a million speakers, accounting for about 4% (Grimes 2003). In this paper we discuss some ideas about how to build parallel corpora for the five hundred or so medium density languages that lie between these two extremes, based on our experience building a 50M word sentence-aligned Hungarian-English parallel corpus. Throughout the paper we illustrate our strategy mainly on Hungarian (14m speakers), also mentioning Romanian (26m speakers) and Slovenian (2m speakers), but we emphasize that the key factor leading to the success of our method, a vigorous culture of native language use and (digital) literacy, is by no means restricted to Central European languages. Needless to say, the density of a language (the availability of digitally stored material) is predicted only imperfectly by the population of speakers: major Prakrit or Han dialects, with tens, sometimes hundreds, of millions of speakers, are low density, while minor populations, such as the Inuktitut, can attain high levels of digital literacy given the political will and a conscious Hansard-building effort (Martin et al. 2003). With this caveat, population (or better, GDP) is a very good approximation for density, on a par with web size.

The rest of the paper is structured as follows. In Section 1 we describe our methods of corpus collection and preparation. Our hybrid sentence-level aligner is discussed in Section 2. Evaluation is the subject of Section 3.

Impact of Autonomous Vehicles on Roundabout Capacity

Boualam

Borsos

Koren

et al. 2022

Sustainability

Studying the impact of AVs on road infrastructure offers considerable potential in the transportation domain; one open issue is how capacity will be affected. This paper contributes to this research area by investigating the impact of AVs on the capacity of single-lane roundabouts using a microsimulation model. To develop the model, a roundabout in Győr (Hungary) was selected, and field data on its geometric characteristics and traffic volumes were used. Simulations were run in Vissim for scenarios with varying input traffic volumes and AV market penetration rates to assess queue lengths. The Highway Capacity Manual (HCM) roundabout model was used to estimate the capacity of the existing roundabout. Follow-up times and critical gaps were set to decrease as the penetration rate of AVs increases. The results showed that 20% and 40% AVs in the flow would increase leg capacities by about 10% and 20%, respectively. Furthermore, a reduction in excessive queue lengths was estimated, and capacities and queue lengths were calculated by leg. These were found to be highly influenced by the distribution of flows among legs and by the share of flows in the various directions.
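The relationship the abstract describes, where shorter critical gaps and follow-up times yield higher entry capacity, can be sketched with the exponential gap-acceptance form commonly associated with the HCM roundabout model, c = A * exp(-B * v_c), with A = 3600 / t_f and B = (t_c - t_f / 2) / 3600. This is a minimal sketch and not the authors' Vissim model: the parameter values and the linear blending by AV share are assumptions made here for illustration, and the function names are hypothetical.

```python
import math


def hcm_entry_capacity(conflicting_flow, critical_gap, follow_up_time):
    """Entry capacity in pc/h for a conflicting flow given in pc/h.

    Uses the exponential gap-acceptance form c = A * exp(-B * v_c),
    with A = 3600 / t_f and B = (t_c - t_f / 2) / 3600 (times in seconds).
    """
    a = 3600.0 / follow_up_time
    b = (critical_gap - follow_up_time / 2.0) / 3600.0
    return a * math.exp(-b * conflicting_flow)


def blend(human_value, av_value, av_share):
    """Linearly interpolate a parameter by AV share (an assumption, not from the paper)."""
    return (1.0 - av_share) * human_value + av_share * av_value


# Hypothetical parameter values in seconds, chosen for illustration only.
HUMAN = {"critical_gap": 5.0, "follow_up_time": 3.0}
AV = {"critical_gap": 4.0, "follow_up_time": 2.4}


def capacity_with_avs(conflicting_flow, av_share):
    """Entry capacity when a fraction av_share of the flow is autonomous."""
    t_c = blend(HUMAN["critical_gap"], AV["critical_gap"], av_share)
    t_f = blend(HUMAN["follow_up_time"], AV["follow_up_time"], av_share)
    return hcm_entry_capacity(conflicting_flow, t_c, t_f)


if __name__ == "__main__":
    # Compare entry capacity at a fixed conflicting flow for three AV shares.
    for share in (0.0, 0.2, 0.4):
        print(f"AV share {share:.0%}: {capacity_with_avs(500, share):.0f} pc/h")
```

With any parameter set in which the AV values are smaller than the human ones, capacity rises with AV share; the size of the gain depends entirely on the assumed values, so the figures printed here should not be compared with the paper's reported 10% and 20%.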

Web-based frequency dictionaries for medium density languages

Kornai

Halácsy

Viktor

et al. 2006

The effects of autonomous buses to vehicle scheduling system

Viktor

Horváth

2020

Procedia Computer Science

Effect of Self-driving Buses on Vehicle Scheduling

Viktor

Horváth

2020
