2024
DOI: 10.1007/s40593-024-00403-3

GPT-4 in Education: Evaluating Aptness, Reliability, and Loss of Coherence in Solving Calculus Problems and Grading Submissions

Alberto Gandolfi

Abstract: In this paper, we initially investigate the capabilities of GPT-3.5 and GPT-4 in solving college-level calculus problems, an essential segment of mathematics that remains under-explored so far. Although improving upon earlier versions, GPT-4 attains approximately 65% accuracy for standard problems and decreases to 20% for competition-like scenarios. Overall, the models prove to be unreliable due to common arithmetic errors. Our primary contribution lies then in examining the use of ChatGPT for grading solutions…

Cited by 3 publications
(1 citation statement)
References 33 publications

“…The tension between potentially transformative impacts (e.g., as intelligent assistants [1]) and inaccuracy (e.g., hallucination, biases) or improper use (e.g., plagiarism [2]) is one of the most pressing research questions of our times. For example, well-known tools such as ChatGPT and GPT-4 can be helpful mathematical assistants, with a level currently equivalent to an undergraduate student on several tasks [3], but they can lose focus when prompted repeatedly, and they still make frequent mistakes on seemingly simple tasks such as identifying whether some numbers are contained within a given interval [4].…”
Section: Introduction
Citation type: mentioning
confidence: 99%