GSM8K
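Takeaway:
The dataset is not difficult: every problem is a grade-school math problem that uses only elementary arithmetic, but solving it requires multi-step reasoning.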
GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support question answering on basic mathematical problems that require multi-step reasoning.
These problems take between 2 and 8 steps to solve.
Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+, −, ×, ÷) to reach the final answer.
A bright middle school student should be able to solve every problem: from the paper, "Problems require no concepts beyond the level of early Algebra, and the vast majority of problems can be solved without explicitly defining a variable."
Solutions are provided in natural language, as opposed to pure math expressions. From the paper: "We believe this is the most generally useful data format, and we expect it to shed light on the properties of large language models’ internal monologues."
{
'question': 'Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?',
'answer': 'Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n#### 72',
}
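
The sample record above suggests a simple layout for the answer field: natural-language steps with inline calculator annotations of the form `<<expression=result>>`, followed by the final answer after a `####` marker. The following is a minimal sketch of how such a record might be parsed, assuming every record follows the layout of this one sample. The helper names (`final_answer`, `calculator_steps`, `strip_annotations`) are hypothetical and are not part of any official GSM8K tooling.

```python
import re

# The sample record shown above, copied verbatim.
record = {
    "question": "Natalia sold clips to 48 of her friends in April, and then "
                "she sold half as many clips in May. How many clips did "
                "Natalia sell altogether in April and May?",
    "answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
              "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
              "#### 72",
}

# Assumed annotation shape: <<expression=result>>, as seen in the sample.
CALC_RE = re.compile(r"<<([^=<>]+)=([^<>]+)>>")


def final_answer(answer: str) -> str:
    """Return the text after the last '####' marker."""
    _, marker, tail = answer.rpartition("####")
    if not marker:
        raise ValueError("no '####' marker found in answer")
    return tail.strip()


def calculator_steps(answer: str) -> list[tuple[str, str]]:
    """Return (expression, result) pairs from the <<...>> annotations."""
    return CALC_RE.findall(answer)


def strip_annotations(answer: str) -> str:
    """Remove the <<...>> annotations, leaving the natural-language solution."""
    return re.sub(r"<<[^<>]*>>", "", answer)


if __name__ == "__main__":
    print(final_answer(record["answer"]))
    for expression, result in calculator_steps(record["answer"]):
        print(expression, "=", result)
    print(strip_annotations(record["answer"]))
```

Counting the annotations gives a rough proxy for the number of reasoning steps in a solution. The sample has two, which is consistent with the 2 to 8 steps mentioned above.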