2017
DOI: 10.48550/arxiv.1707.05154
Preprint
Preliminary Exploration of Formula Embedding for Mathematical Information Retrieval: can mathematical formulae be embedded like a natural language?

Abstract: While neural network approaches are achieving breakthrough performance in natural language related fields, there have been few similar attempts at mathematical language related tasks. In this study, we explore the potential of applying neural representation techniques to Mathematical Information Retrieval (MIR) tasks. In more detail, we first briefly analyze the characteristic differences between natural language and mathematical language. Then we design a "symbol2vec" method to learn the vector representations of …
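The abstract is truncated before the details of "symbol2vec", but the name suggests a word2vec-style model trained on formulae treated as sequences of symbols. A minimal, purely illustrative sketch of the preprocessing such a model would need — tokenizing a LaTeX formula into a flat symbol sequence — might look as follows (the tokenizer rules here are an assumption, not the paper's actual method):

```python
import re

# Hypothetical tokenizer for a symbol2vec-style pipeline: each LaTeX
# command, single-letter identifier, number, or operator/brace becomes
# one "word", so a formula can be fed to a skip-gram model like a sentence.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|[A-Za-z]|\d+|[^\sA-Za-z\d]")

def tokenize_formula(latex: str) -> list[str]:
    """Split a LaTeX formula into a flat sequence of symbol tokens."""
    return TOKEN_RE.findall(latex)

# The quadratic formula becomes a token sequence suitable as one
# training "sentence" for a word2vec-style embedding model.
tokens = tokenize_formula(r"x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}")
```

Given a corpus of such token sequences, an off-the-shelf skip-gram implementation could then learn an embedding per symbol from co-occurrence within a context window, which is the general approach the later citation statements attribute to this line of work.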

Cited by 7 publications (21 citation statements)
References 5 publications
“…More recently, data-driven approaches which incorporate classic word embeddings (Gao et al., 2017; Mansouri et al., 2019), GNN (Song and Chen, 2021) or transformer models (Reusch et al., 2021a,b) have also been proposed for the MIR domain. By observing token co-occurrence during training, these models can easily discover synonyms, equivalent mathematically transformed formulas, or high-level semantic similarities, making them a good enhancement for structure search approaches.…”
Section: Data-driven Methods
confidence: 99%
“…Recent work (Roy, Upadhyay, and Roth 2016; Zanibbi et al. 2016) focused mainly on the structural features of math equations, and utilized tree structures to represent equations for mathematical information retrieval and mathematical word problem solving. Other work (Krstovski and Blei 2018; Yasunaga and Lafferty 2019) instead focused mainly on the semantic features of equations. They processed an equation as a sequence of symbols in order to learn its representation.…”
Section: Related Work — Mathematical Equation Representation
confidence: 99%
“…We view headline generation as a special type of summarization, with the constraint that only a short sequence of words is generated and that it preserves the essential meaning of a math question document. Recently, headline generation methods with end-to-end frameworks (Tan, Wan, and Xiao 2017b; Narayan, Cohen, and Lapata 2018; Zhang et al. 2018; Gavrilov, Kalaidin, and Malykh 2019) achieved significant success. Math headline generation is similar to existing headline generation tasks, but still differs in several aspects.…”
Section: Summarization and Headline Generation
confidence: 99%
“…is extracted from the English Wikipedia page about Van der Waerden's theorem. Without further explanation, the symbols W, k, and ε might have several possible meanings.…”
Section: Introduction
confidence: 99%
“…Word embedding techniques have received significant attention over the last years in the Natural Language Processing (NLP) community, especially after the publication of word2vec [18]. Recently, more and more projects try to adapt this knowledge for solving Mathematical Information Retrieval (MIR) tasks [4, 12, 36, 34]. While all of these projects follow similar approaches and obtain promising results, all of them fail to understand mathematical expressions because of the same fundamental issues.…”
Section: Introduction
confidence: 99%