Maria Han Veiga scite author profile

What Do Language Representations Really Represent?

A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on the one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation similarity, while genetic relationships (a convenient benchmark used for evaluation in previous work) appear to be a confounding factor. Apart from implications about translation effects, we see this more generally as a case where NLP and linguistic typology can interact and benefit one another.
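A minimal sketch of the correlational part of such an analysis. It assumes each language is already represented as a vector and that a structural (typological) similarity matrix is available for the same languages. It then takes the Pearson correlation between the pairwise cosine similarities of the vectors and the structural similarities, counting each language pair once. The vectors, similarity values, and function names below are hypothetical toy choices, not data or code from the paper. Significance would usually be assessed with a permutation (Mantel-style) test, and the causal analysis mentioned in the abstract is not reproduced here.

```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


def matrix_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between the strict upper triangles of two
    symmetric similarity matrices (each language pair counted once)."""
    iu = np.triu_indices_from(a, k=1)
    return float(np.corrcoef(a[iu], b[iu])[0, 1])


# Hypothetical toy data: 4 languages with 3-dimensional learned vectors.
lang_vectors = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.0, 1.0, 0.3],
    [0.1, 0.9, 0.4],
])

# Hypothetical structural similarity between the same 4 languages.
structural_sim = np.array([
    [1.0, 0.8, 0.2, 0.3],
    [0.8, 1.0, 0.3, 0.3],
    [0.2, 0.3, 1.0, 0.9],
    [0.3, 0.3, 0.9, 1.0],
])

rep_sim = cosine_similarity_matrix(lang_vectors)
print(f"correlation: {matrix_correlation(rep_sim, structural_sim):.3f}")
```

The same `matrix_correlation` call can be repeated with genetic or geographical similarity matrices to compare which one tracks the representation similarities most closely.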


Machine learning applied to simulations of collisions between rotating, differentiated planets

Timpe, Veiga, Knabenhans et al. (2020). Comput. Astrophys. Cosmol.


In the late stages of terrestrial planet formation, pairwise collisions between planetary-sized bodies act as the fundamental agent of planet growth. These collisions can lead to either growth or disruption of the bodies involved and are largely responsible for shaping the final characteristics of the planets. Despite their critical role in planet formation, an accurate treatment of collisions has yet to be realized. While semi-analytic methods have been proposed, they remain limited to a narrow set of post-impact properties and have only achieved relatively low accuracies. However, the rise of machine learning and access to increased computing power have enabled novel data-driven approaches. In this work, we show that data-driven emulation techniques are capable of classifying and predicting the outcome of collisions with high accuracy and are generalizable to any quantifiable post-impact quantity. In particular, we focus on the dataset requirements, training pipeline, and classification and regression performance for four distinct data-driven techniques from machine learning (ensemble methods and neural networks) and uncertainty quantification (Gaussian processes and polynomial chaos expansion). We compare these methods to existing analytic and semi-analytic methods. Such data-driven emulators are poised to replace the methods currently used in N-body simulations, while avoiding the cost of direct simulation. This work is based on a new set of 14,856 SPH simulations of pairwise collisions between rotating, differentiated bodies at all possible mutual orientations.
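To illustrate one of the four emulator families named above, here is a minimal Gaussian-process regression sketch in NumPy. It assumes a squared-exponential kernel with fixed, hand-picked hyperparameters (length scale 0.3, small noise term) rather than fitted ones. The two inputs and the target function `toy_outcome` are hypothetical stand-ins for scaled impact parameters and a post-impact quantity. They are not drawn from the 14,856 SPH simulations, and all helper names are illustrative. Collision-regime classification, hyperparameter optimization, and the other three techniques are not shown.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    """Squared-exponential kernel between the rows of a (n, d) and b (m, d)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_fit_predict(x_train, y_train, x_test, noise=1e-5):
    """Exact GP regression: posterior mean and variance of f at x_test."""
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    chol = np.linalg.cholesky(k)                                      # k = L L^T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))  # k^{-1} y
    k_star = rbf_kernel(x_test, x_train)       # (m, n) cross-covariance
    mean = k_star @ alpha
    v = np.linalg.solve(chol, k_star.T)        # L^{-1} k_*^T
    var = rbf_kernel(x_test, x_test).diagonal() - np.sum(v**2, axis=0)
    return mean, var


def toy_outcome(x):
    # Hypothetical smooth stand-in for a post-impact quantity (NOT SPH data).
    return np.sin(3.0 * x[:, 0]) * np.cos(2.0 * x[:, 1])


rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(40, 2))  # two scaled impact parameters
y_train = toy_outcome(x_train)
x_test = rng.uniform(0.1, 0.9, size=(200, 2))  # held-out interior points

mean, var = gp_fit_predict(x_train, y_train, x_test)
rmse = float(np.sqrt(np.mean((mean - toy_outcome(x_test)) ** 2)))
max_std = float(np.sqrt(np.clip(var, 0.0, None).max()))
print(f"held-out RMSE: {rmse:.2e}, largest predictive std: {max_std:.2e}")
```

The predictive variance returned with the mean is what makes Gaussian processes attractive as emulators. It shows where in parameter space the training set is too sparse for the prediction to be trusted.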


DeC and ADER: Similarities, Differences and a Unified Framework

2021


Capturing Near-Equilibrium Solutions: A Comparison between High-Order Discontinuous Galerkin Methods and Well-Balanced Schemes

Veiga, Romero, Abgrall et al. (2019). CiCP


Equilibrium or stationary solutions usually arise from an exact balance between hyperbolic transport terms and source terms. Such equilibrium solutions are affected by truncation errors that prevent any classical numerical scheme from capturing the evolution of small-amplitude waves of physical significance. To overcome this problem, we compare two commonly adopted strategies: going to very high order to drastically reduce the truncation errors on the equilibrium solution, or designing a specific scheme that preserves the equilibrium exactly by construction, the so-called well-balanced approach. We present a modern numerical implementation of these two strategies and compare them in detail, using hydrostatic but also dynamical equilibrium solutions of several simple test cases. Finally, we apply our methodology to the simulation of a protoplanetary disc in centrifugal equilibrium around its star and model its interaction with an embedded planet, illustrating in a realistic application the strength of both methods.
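The problem of truncation errors at equilibrium, and the well-balanced remedy, can be sketched on a simpler system than the one studied in the paper. The example below assumes the 1D shallow-water equations with a smooth bottom bump and periodic boundaries. It starts from the lake-at-rest equilibrium (h + z constant, u = 0) and compares two schemes. The first is a naive first-order Rusanov scheme with a centred source term. The second is the hydrostatic reconstruction of Audusse et al. (2004), one well-known well-balanced technique that is not necessarily the one used by the authors. No high-order discontinuous Galerkin method is included. The grid size, bump shape, and step count are arbitrary, all function names are illustrative, and the scheme assumes the depth stays strictly positive (no dry cells).

```python
import numpy as np

G = 9.81  # gravitational acceleration


def physical_flux(h, hu):
    """Shallow-water flux F(U) = (hu, hu^2/h + g h^2/2); assumes h > 0."""
    return np.array([hu, hu * hu / h + 0.5 * G * h**2])


def rusanov(hl, hul, hr, hur):
    """Rusanov (local Lax-Friedrichs) flux between left and right states."""
    a = np.maximum(np.abs(hul / hl) + np.sqrt(G * hl),
                   np.abs(hur / hr) + np.sqrt(G * hr))
    return (0.5 * (physical_flux(hl, hul) + physical_flux(hr, hur))
            - 0.5 * a * np.array([hr - hl, hur - hul]))


def step_naive(h, hu, z, dx, dt):
    """Rusanov fluxes on cell averages plus a centred bottom-slope source."""
    f = rusanov(h, hu, np.roll(h, -1), np.roll(hu, -1))  # flux at i+1/2
    f_left = np.roll(f, 1, axis=1)                        # flux at i-1/2
    dzdx = (np.roll(z, -1) - np.roll(z, 1)) / (2.0 * dx)
    h_new = h - dt / dx * (f[0] - f_left[0])
    hu_new = hu - dt / dx * (f[1] - f_left[1]) - dt * G * h * dzdx
    return h_new, hu_new


def step_well_balanced(h, hu, z, dx, dt):
    """Hydrostatic reconstruction: rebuild interface depths from h + z."""
    u = hu / h
    hp, up, zp = np.roll(h, -1), np.roll(u, -1), np.roll(z, -1)
    z_face = np.maximum(z, zp)                  # bottom at i+1/2
    h_minus = np.maximum(0.0, h + z - z_face)   # depth left of i+1/2
    h_plus = np.maximum(0.0, hp + zp - z_face)  # depth right of i+1/2
    f = rusanov(h_minus, h_minus * u, h_plus, h_plus * up)
    f_out = f.copy()                            # flux leaving cell i at i+1/2
    f_out[1] += 0.5 * G * (h**2 - h_minus**2)
    f_in = f.copy()                             # flux entering cell i+1 at i+1/2
    f_in[1] += 0.5 * G * (hp**2 - h_plus**2)
    f_in_left = np.roll(f_in, 1, axis=1)        # flux entering cell i at i-1/2
    h_new = h - dt / dx * (f_out[0] - f_in_left[0])
    hu_new = hu - dt / dx * (f_out[1] - f_in_left[1])
    return h_new, hu_new


n = 200
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
z = 0.2 * np.exp(-((x - 0.5) ** 2) / 0.01)  # smooth bottom bump
h0 = 1.0 - z                                 # lake at rest: h + z = 1, u = 0
hu0 = np.zeros(n)
dt = 0.4 * dx / np.sqrt(G * h0.max())        # CFL-limited time step

results = {}
for name, step in [("naive", step_naive), ("well-balanced", step_well_balanced)]:
    h, hu = h0.copy(), hu0.copy()
    for _ in range(200):
        h, hu = step(h, hu, z, dx, dt)
    results[name] = float(np.abs(hu).max())
    print(f"{name:>14}: max |hu| after 200 steps = {results[name]:.2e}")
```

Run as a script, the naive scheme should leave a spurious momentum many orders of magnitude larger than the well-balanced one, whose residual stays at round-off level. The exact printed values depend on the platform. Such spurious flows are the truncation-error artefacts that would swamp small physical perturbations around the equilibrium.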


A Cross-Platform Collection of Social Network Profiles

Veiga, Eickhoff (2016)


The proliferation of Internet-enabled devices and services has led to a shifting balance between digital and analogue aspects of our everyday lives. In the face of this development, there is a growing demand for the study of privacy hazards, the potential for unique user de-anonymization, and information leakage between the various social media profiles many of us maintain. To enable the structured study of such adversarial effects, this paper presents a dedicated dataset of cross-platform social network personas (i.e., the same person has accounts on multiple platforms). The corpus comprises 850 users who generate predominantly English content. Each user object contains the online footprint of the same person in three distinct social networks: Twitter, Instagram and Foursquare. In total, it encompasses over 2.5M tweets, 340k check-ins and 42k Instagram posts. We describe the collection methodology, the characteristics of the dataset, and how to obtain it. Finally, we discuss a common use case: cross-platform user identification.

Comment: 4 pages, 5 figures, SIGIR 2016, short paper. SIGIR 2016: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval.



Maria Han Veiga

What Do Language Representations Really Represent?

Machine learning applied to simulations of collisions between rotating, differentiated planets

DeC and ADER: Similarities, Differences and a Unified Framework

Capturing Near-Equilibrium Solutions: A Comparison between High-Order Discontinuous Galerkin Methods and Well-Balanced Schemes

A Cross-Platform Collection of Social Network Profiles
