Datasets&Benchmarks
Why write this section?
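We often overlook how much datasets matter and assume that algorithm design is what counts most. In practice, the training and evaluation results of today's LLMs depend heavily on which datasets are chosen. This section therefore records the characteristics of different datasets and the tasks each one is suited for.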