Training Verifiers to Solve Math Word Problems
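The GSM8K dataset and the paper Training Verifiers to Solve Math Word Problems
This is a 2021 paper from OpenAI. At the time, language models performed poorly on multi-step mathematical reasoning, and the paper uses a generator + verifier approach to improve results. The GSM8K (Grade School Math 8.5K) benchmark that is still in use today was also introduced in this paper.
GSM8K: 8.5K high-quality grade school math word problems, split into a 7.5K training set and a 1K test set. Each problem takes 2 to 8 steps to solve and needs only basic arithmetic (addition, subtraction, multiplication, division).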
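The generator + verifier idea can be sketched roughly as follows: sample several candidate solutions from the generator, score each one with the verifier, and keep the highest-scoring candidate. This is only a minimal sketch. `generate` and `score` are hypothetical callables standing in for the two models, and the toy stand-ins below exist only so the snippet runs; none of these names come from the paper.

```python
import itertools
from typing import Callable, Iterable


def best_of_n(problem: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 100) -> str:
    """Sample n candidate solutions and return the one the verifier scores highest.

    `generate` and `score` are placeholders (assumptions) for a generator
    model and a verifier model; n is a tunable sample count.
    """
    candidates = [generate(problem) for _ in range(n)]
    return max(candidates, key=lambda solution: score(problem, solution))


def make_toy_generator(solutions: Iterable[str]) -> Callable[[str], str]:
    """Toy stand-in for a generator: cycles through a fixed list of solutions."""
    pool = itertools.cycle(solutions)
    return lambda problem: next(pool)


def toy_verifier(problem: str, solution: str) -> float:
    """Toy stand-in for a verifier: prefers solutions that end in '13'."""
    return 1.0 if solution.strip().endswith("13") else 0.0


if __name__ == "__main__":
    gen = make_toy_generator([
        "The answer is: 12",
        "The answer is: 13",
        "The answer is: 14",
    ])
    print(best_of_n("What is 5 + 8?", gen, toy_verifier, n=3))
```

In a real setup the verifier would be a trained model that outputs a correctness estimate for each candidate; the selection logic stays the same.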
Digging into early OpenAI and Anthropic papers, part 1: Training Verifiers to Solve Math Word Problems
Data format. The input takes the following form:
If Buzz bought a pizza with 78 slices at a restaurant and then decided to share it with the waiter in the ratio of 5:8, with Buzz's ratio being 5, what's twenty less the number of slices of pizza that the waiter ate?
Step 1: The total ratio representing the pizza is 5+8 = <<5+8=13>>13. ки
Step 2: The waiter ate 13 x 8 / 13 = <<13*8/13=6>>6 slices of the pizza. ки
Step 3: Buzz ate 78 - 6 = <<78-6=72>>72 slices of the pizza. ки
Step 4: The waiter ate 20 less than the number of slices that Buzz ate which is 72 - 20 = 52. ки
Step 5: The waiter ate 52 slices of the pizza. The answer is: 52 ки
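To make the format above concrete, here is a small sketch that splits a solution into steps and checks the arithmetic inside each `<<expr=result>>` calculator annotation. It assumes the format shown in the sample (each step ends with the tag `ки`, annotations contain only numbers and + - * /). The function names are hypothetical and are not part of any dataset tooling.

```python
import ast
import operator
import re
from typing import List, Tuple

STEP_TAG = "ки"  # step separator, as it appears in the sample above
CALC_RE = re.compile(r"<<([^=<>]+)=([^<>]+)>>")  # matches <<expr=result>>

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_arith(expr: str) -> float:
    """Evaluate a plain + - * / expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr.strip(), mode="eval").body)


def split_steps(solution: str) -> List[str]:
    """Split a solution into steps on the step tag, dropping empty pieces."""
    return [s.strip() for s in solution.split(STEP_TAG) if s.strip()]


def check_annotations(step: str) -> List[Tuple[str, bool]]:
    """Return (expression, is_consistent) for each annotation in a step."""
    results = []
    for expr, claimed in CALC_RE.findall(step):
        consistent = abs(eval_arith(expr) - float(claimed)) < 1e-9
        results.append((expr, consistent))
    return results


if __name__ == "__main__":
    sample = (
        "Step 1: The total ratio representing the pizza is 5+8 = <<5+8=13>>13. ки\n"
        "Step 2: The waiter ate 13 x 8 / 13 = <<13*8/13=6>>6 slices of the pizza. ки\n"
        "Step 3: Buzz ate 78 - 6 = <<78-6=72>>72 slices of the pizza. ки\n"
    )
    for step in split_steps(sample):
        print(check_annotations(step), step)
```

On the sample, this marks the annotation in Step 2 as inconsistent, since 13*8/13 evaluates to 8 rather than 6. The check covers only the arithmetic inside the annotations, not whether the reasoning of a step is correct.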
Labels: