“…The results t_s(e, X) are in semantic LaTeX, which is in L_C. For the second step (the mapping), we rely on the original LaCASt implementation (from semantic LaTeX to CAS syntaxes) for t_m(e) and presume that t_m(e) is complete and appropriate [7], [8].…”
Section: Methods (mentioning, confidence: 99%)
“…These steps are: (1) pre-processing Wikipedia articles to enable natural language processing on them, (2) constructing an annotated mathematical dependency graph, (3) generating semantic-enhancing replacement patterns, and (4) performing CAS-specific translations (see Figure 2). In addition, we perform automatic symbolic and numeric computations on the translated expressions to verify equations from Wikipedia articles [7], [8]. We show that the system is capable of detecting potential errors in mathematical equations in Wikipedia articles.…”
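The four steps above can be sketched in miniature. Every function and pattern below is an illustrative toy, not LaCASt's actual API; it only demonstrates the shape of the pipeline (markup pre-processing, math extraction, semantic replacement, CAS translation):

```python
import re

def preprocess(wikitext: str) -> str:
    """(1) Strip wiki markup so NLP tools can run on plain text (toy version)."""
    return re.sub(r"'{2,}|\[\[|\]\]", "", wikitext)

def extract_math(text: str) -> list:
    """(2) Collect <math> expressions; the real system builds a dependency graph."""
    return re.findall(r"<math>(.*?)</math>", text)

def apply_replacements(expr: str, patterns: dict) -> str:
    """(3) Replace ambiguous presentational tokens with semantic macros."""
    for plain, semantic in patterns.items():
        expr = expr.replace(plain, semantic)
    return expr

def translate_to_cas(expr: str) -> str:
    """(4) Map semantic LaTeX to a CAS-like syntax (trivial stand-in rule)."""
    return expr.replace(r"\JacobiP", "JacobiP")

# Illustrative replacement rule and article snippet:
patterns = {r"P_n^{(\alpha,\beta)}": r"\JacobiP"}
article = "The '''Jacobi polynomial''' <math>P_n^{(\\alpha,\\beta)}(x)</math> ..."

results = [translate_to_cas(apply_replacements(e, patterns))
           for e in extract_math(preprocess(article))]
print(results)  # ['JacobiP(x)']
```

In the actual system, step (3) is driven by textual descriptions recovered from the dependency graph rather than a hand-written dictionary.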
Section: Via the Hypergeometric Function (mentioning, confidence: 99%)
“…Our proposed pipeline touches on several well-known tasks from MathIR, namely descriptive entity recognition for mathematical expressions [15], [17]-[20], math tokenization [21], [22], math dependency recognition [23], [24], and automatic verification [7], [8]. Existing approaches to translate mathematical formulae from presentational languages, e.g., LaTeX or MathML, to content languages, e.g., content MathML or CAS syntax, do not analyze the context of a formula [24]-[26].…”
Section: Related Work (mentioning, confidence: 99%)
“…CAS, such as Maple [5] and Mathematica [6], are complex mathematical software tools that allow users to manipulate, simplify, plot, and evaluate mathematical expressions. Hence, translating mathematics in Wikipedia to CAS syntaxes enables automatic numeric and symbolic verification checks on complex mathematical equations [7], [8]. Integrating such verifications into the existing ORES system can significantly reduce the workload of moderating mathematical content while increasing credibility in the quality of Wikipedia articles [9].…”
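The numeric half of such verification checks can be illustrated with a small, stdlib-only sketch: evaluate both sides of a candidate equation at random test points and compare. The symbolic half, not shown here, would instead ask a CAS to simplify the difference of both sides to zero. The identities below are illustrative, not drawn from the evaluated articles:

```python
import math, random

def numerically_verified(lhs, rhs, trials=50, tol=1e-9):
    """Plausibility check: sample random points and compare both sides.
    A failure at any point marks the equation as a potential error."""
    random.seed(0)  # reproducible test points
    for _ in range(trials):
        x = random.uniform(0.1, 2.0)
        if not math.isclose(lhs(x), rhs(x), rel_tol=tol):
            return False
    return True

# A correct identity: sin(2x) = 2 sin(x) cos(x)
ok = numerically_verified(lambda x: math.sin(2 * x),
                          lambda x: 2 * math.sin(x) * math.cos(x))

# A deliberately broken variant, as might arise from a faulty edit:
bad = numerically_verified(lambda x: math.sin(2 * x),
                           lambda x: 2 * math.sin(x) * math.cos(2 * x))
print(ok, bad)  # True False
```

Numeric checks can only flag potential errors, never prove correctness, which is why the paper combines them with symbolic computations.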
Wikipedia combines the power of AI solutions and human reviewers to safeguard article quality. Quality control objectives include detecting malicious edits, fixing typos, and spotting inconsistent formatting. However, no automated quality control mechanisms currently exist for mathematical formulae. Spell checkers are widely used to highlight textual errors, yet no equivalent tool exists to detect algebraically incorrect formulae. Our paper addresses this shortcoming by making mathematical formulae computable. We present a method that (1) gathers the semantic information from the surrounding context of each mathematical formula, (2) provides access to the information in a graph-structured dependency hierarchy, and (3) performs automatic plausibility checks on equations. We evaluate the performance of our approach on 6,337 mathematical expressions contained in 104 Wikipedia articles on the topic of orthogonal polynomials and special functions. Our system, LaCASt, verified 358 out of 1,516 equations as error-free. LaCASt successfully translated 27% of the mathematical expressions and outperformed existing translation approaches by 16%. Additionally, LaCASt achieved an F1 score of .495 for annotating mathematical expressions with relevant textual descriptions, which is a significant step towards advancing searchability, readability, and accessibility of mathematical formulae in Wikipedia. A prototype of LaCASt and the semantically enhanced Wikipedia articles are available at: https://tpami.wmflabs.org.
“…Another feature we added to LaCASt is the support of packages in Maple. Some functions are only available in modules (packages) that must be preloaded, such as QPochhammer in the package QDifferenceEquations. The general simplify method in Maple does not cover q-hypergeometric functions.…”
Digital mathematical libraries assemble the knowledge of years of mathematical research. Numerous disciplines (e.g., physics, engineering, pure and applied mathematics) rely heavily on compendia of gathered findings. Likewise, modern research applications rely more and more on computational solutions, which are often calculated and verified by computer algebra systems. Hence, the correctness, accuracy, and reliability of both digital mathematical libraries and computer algebra systems are crucial attributes for modern research. In this paper, we present a novel approach to verify a digital mathematical library and two computer algebra systems with one another by converting mathematical expressions from one system to the other. We use our previously developed conversion tool (referred to as LaCASt) to translate formulae from the NIST Digital Library of Mathematical Functions to the computer algebra systems Maple and Mathematica. The contributions of our presented work are as follows: (1) we present the most comprehensive verification of computer algebra systems and digital mathematical libraries with one another; (2) we significantly enhance the performance of the underlying translator in terms of coverage and accuracy; and (3) we provide open access to translations for Maple and Mathematica of the formulae in the NIST Digital Library of Mathematical Functions.
Small to medium-scale data science experiments often rely on research software developed ad-hoc by individual scientists or small teams. Often there is no time to make the research software fast, reusable, and open access. The consequence is twofold. First, subsequent researchers must spend significant work hours building upon the proposed hypotheses or experimental framework. In the worst case, others cannot reproduce the experiment and reuse the findings for subsequent research. Second, if the ad-hoc research software fails during long-running, computationally expensive experiments, the overall effort to iteratively improve the software and rerun the experiments creates significant time pressure on the researchers. We suggest making caching an integral part of the research software development process, even before the first line of code is written. This article outlines caching recommendations for developing research software in data science projects. Our recommendations provide a perspective to circumvent common problems such as dependence on proprietary software, speed, etc. At the same time, caching contributes to the reproducibility of experiments in the open science workflow. Concerning the four guiding principles, i.e., Findability, Accessibility, Interoperability, and Reusability (FAIR), we foresee that including the proposed recommendations in research software development will make the data related to that software FAIRer for both machines and humans. We exhibit the usefulness of some of the proposed recommendations on our recently completed research software project in mathematical information retrieval.
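The core caching recommendation, persisting intermediate results so a crashed long-running experiment can resume without recomputing finished steps, can be sketched as a minimal disk-memoization decorator. This is a stdlib-only illustration under assumed conventions (cache directory in the system temp folder, pickle-based keys), not the article's prescribed implementation:

```python
import functools, hashlib, os, pickle, tempfile

def disk_cache(func):
    """Persist each call's result to disk, keyed by function name and
    arguments, so reruns after a crash skip completed computations."""
    cache_dir = os.path.join(tempfile.gettempdir(), "exp_cache")
    os.makedirs(cache_dir, exist_ok=True)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Hash the call signature into a stable filename.
        key = hashlib.sha256(
            pickle.dumps((func.__name__, args, kwargs))).hexdigest()
        path = os.path.join(cache_dir, key + ".pkl")
        if os.path.exists(path):            # cache hit: load persisted result
            with open(path, "rb") as f:
                return pickle.load(f)
        result = func(*args, **kwargs)      # cache miss: compute and persist
        with open(path, "wb") as f:
            pickle.dump(result, f)
        return result
    return wrapper

@disk_cache
def expensive_step(n):
    """Stand-in for a costly experiment stage."""
    return sum(i * i for i in range(n))

print(expensive_step(1000))  # computed once; served from disk on reruns
```

A design note: keying on the function name alone means stale caches survive code changes; hashing the function's source as well, or versioning the cache directory, avoids silently reusing outdated results.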