Generative intelligence technologies like ChatGPT hold significant
promise across various sectors, particularly in education. This study
assessed ChatGPT’s proficiency in responding to questions from
University Entrance Exams typically administered to senior secondary
students. Our findings indicate that ChatGPT version 4.0 consistently
outperformed students, achieving higher average scores across exams
from the past four years. However, it still committed errors in about
20% of its responses. Despite this, ChatGPT 4.0 demonstrated a robust
capability to comprehend and produce natural language within a chemical
context. Consequently, by applying diverse prompt engineering techniques,
this AI was able to create short-answer questions and numerical problems
that closely mimic the format and conceptual content of University
Entrance Exams. We also confirmed that ChatGPT 4.0 could grade exams:
its scores correlated significantly with those given by human
evaluators, although this correlation was lower than the correlation
among the human graders themselves. This discrepancy and other
practical considerations limit its application in grading exams.