Abstract: We propose the task of disambiguating symbolic expressions in informal STEM documents in the form of LaTeX files (that is, determining their precise semantics and abstract syntax tree) as a neural machine translation task. We discuss the distinct challenges involved and present a dataset with roughly 33,000 entries. We evaluated several baseline models on this dataset, which failed to yield even syntactically valid LaTeX before overfitting. Consequently, we describe a methodology using a transformer la…