SomeNewPapers
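Some Recent Papers on RLHF & LLMs
Keep a close eye on new papers from institutions such as Google DeepMind; these papers often present the latest research results and tend to be the most forward-looking.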