This paper presents the design and implementation of MFIRS, a system for retrieving information about mathematical formulas. The system is organized into the following modules: input normalization, mathematical formula unification, mathematical formula encoding, text feature extraction, mathematical formula feature extraction, mathematical formula indexing, and retrieval and ranking. A method for extracting mathematical formulas and keywords based on FastText word embeddings is proposed. The method captures the structural features of a formula, and the resulting vectors make it straightforward to compute the similarity between formulas. The model also incorporates semantic features from the context surrounding each formula to improve the domain relevance of search results. The MathRetEval dataset was built from about 7.9 × 10^5 arXiv documents containing about 1.5 × 10^8 mathematical formulas, and it is used to verify the scalability of the system. Formulas can be written in TeX or MathML. A formula given in TeX is converted to a tree representation of its MathML form and then indexed. MFIRS is a math-aware information retrieval system that supports similarity search over partial formulas.
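The idea of comparing formulas through their vector representations can be illustrated with a minimal sketch. It is not the MFIRS implementation: plain token-count vectors stand in for the FastText embeddings described above, and the tokenizer and the function names (`tokenize`, `vectorize`, `cosine`, `rank`) are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the MFIRS tokenizer):
# TeX commands, single letters, digit runs, and any other non-space symbol.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|[A-Za-z]|\d+|[^\sA-Za-z\d]")


def tokenize(tex):
    """Split a TeX formula string into a list of tokens."""
    return TOKEN_RE.findall(tex)


def vectorize(tex):
    """Represent a formula as a sparse token-count vector.

    Stand-in for a learned embedding such as FastText.
    """
    return Counter(tokenize(tex))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, formulas):
    """Return (score, formula) pairs sorted by decreasing similarity to the query."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(f)), f) for f in formulas]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    corpus = [r"\frac{1}{2}", r"a^2 + b^2 = c^2", r"\sum_{i=1}^{n} i"]
    for score, formula in rank(r"a^2 + b^2", corpus):
        print(f"{score:.3f}  {formula}")
```

Because a partial formula shares most of its tokens with the full formula that contains it, a query such as `a^2 + b^2` scores highest against `a^2 + b^2 = c^2` in this toy corpus. A real system would replace the count vectors with learned embeddings and use an index rather than a linear scan.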