2018
DOI: 10.48550/arxiv.1808.10681
Preprint
Beyond Weight Tying: Learning Joint Input-Output Embeddings for Neural Machine Translation

Cited by 1 publication (4 citation statements)
References 0 publications
“…We call our deep learning model life2vec. The life2vec model is based on a transformer architecture [31,30,44,45,46,47,48,49,50,51]. Transformers are well suited for representing life-sequences due to their ability to compress contextual information [52,53] and take into account temporal and positional information [5,54].…”
Section: The life2vec Model
confidence: 99%
“…Each contextual representation, x_i, is transformed via f_1(x) = tanh(x W_1 + b_1), followed by l2-normalisation, norm(x) = x/∥x∥. The weights of the final layer, f_2, are tied to the embedding matrix, E_V, which is further normalized to preserve only directions [48]. The resulting scores are scaled by α to sharpen the distribution [46].…”
Section: Pre-training: Learning Structure of the Data
confidence: 99%
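The quoted passage describes a tied, l2-normalised output layer: contextual states are projected with a tanh layer, normalised to unit length, scored against the (direction-only) shared embedding matrix, and sharpened by a scale α. Below is a minimal PyTorch sketch of that idea; the class name, the default value of α, and the dimensions are illustrative assumptions, not the cited paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedNormalizedOutputLayer(nn.Module):
    """Sketch of an output layer whose weights are tied to the token embedding
    matrix E_V, with l2-normalised directions and a sharpening scale alpha.
    Names and defaults are assumptions, not the authors' code."""

    def __init__(self, embedding: nn.Embedding, hidden_dim: int, alpha: float = 10.0):
        super().__init__()
        emb_dim = embedding.embedding_dim
        self.embedding = embedding                 # E_V, shared with the input side (weight tying)
        self.f1 = nn.Linear(hidden_dim, emb_dim)   # f_1(x) = tanh(x W_1 + b_1)
        self.alpha = alpha                         # scale that sharpens the output distribution

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim) contextual representations x_i
        h = torch.tanh(self.f1(x))                             # project into the embedding space
        h = F.normalize(h, p=2, dim=-1)                        # norm(x) = x / ||x||
        e = F.normalize(self.embedding.weight, p=2, dim=-1)    # keep only directions of E_V
        logits = self.alpha * (h @ e.t())                      # scaled similarities: (batch, seq, vocab)
        return logits

# Hypothetical usage: feed the logits into a cross-entropy loss over the vocabulary.
vocab_size, emb_dim, hidden_dim = 1000, 64, 128
emb = nn.Embedding(vocab_size, emb_dim)
head = TiedNormalizedOutputLayer(emb, hidden_dim)
scores = head(torch.randn(2, 5, hidden_dim))
```

Because both the projected states and the embedding rows are unit-normalised, the logits are cosine similarities; α plays the role of an inverse temperature that controls how peaked the resulting softmax is.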