
PessimisticQ-Ensemble

Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble

A reasonably well-written blog post on CSDN
