What happens after LLM pre-training?

This map explains the post-training pipeline: supervised fine-tuning, preference data, reward modelling, RLHF, DPO, PPO, GRPO, RLVR, evaluation, and where extra compute is going.