IRL & LM (Inverse Reinforcement Learning & Language Models)
This chapter covers several relatively recent papers that focus on combining IRL (Inverse Reinforcement Learning) with LMs (Language Models).
Imitating Language via Scalable Inverse Reinforcement Learning