Lexicon

Pretraining vs fine-tuning

Two stages, two objectives: broad statistical modeling first; task-shaped behavior second—often including preferences and safety policies.

What pretraining optimizes

Pretraining pushes a model to compress patterns in large text corpora: syntax, genre conventions, factual associations (with no guarantee of truth), and shallow reasoning regularities that emerge from next-token prediction. The result is a powerful generative prior—useful, but not inherently aligned with instructions, refusals, or citation discipline. Scale shifts which skills emerge first; it does not automatically inject judgment. For how scaling interacts with marginal returns, see compute curves and scaling laws.
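The next-token objective can be made concrete with a toy sketch. Everything here is illustrative, assuming a tiny vocabulary and random scores in place of a real model: pretraining minimizes the cross-entropy between the model's predicted distribution at position t and the token actually observed at position t+1.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def next_token_loss(logits, token_ids):
    """Mean cross-entropy of predicting token t+1 from position t.

    logits: (T, V) scores at each position; token_ids: (T,) observed tokens.
    """
    probs = softmax(logits[:-1])             # predictions at positions 0..T-2
    targets = token_ids[1:]                  # the tokens they should have predicted
    picked = probs[np.arange(len(targets)), targets]
    return float(-np.log(picked).mean())

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 8))             # 5 positions, toy vocabulary of 8
tokens = np.array([1, 4, 2, 2, 7])
loss = next_token_loss(logits, tokens)
```

Nothing in this loss rewards truthfulness or instruction-following; it rewards only assigning probability to whatever token came next in the corpus, which is why the resulting prior is powerful but unaligned.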

What fine-tuning changes

Instruction tuning, supervised fine-tuning (SFT), and preference optimization reshape behavior: tone, format, tool use, and refusal boundaries. Reinforcement learning from human feedback (RLHF) is one popular family, but not the only one. The key conceptual point: fine-tuning aligns behavior in the situations humans actually rated; it does not instill a complete moral theory. For nuance on what RLHF does and does not promise, read RLHF: alignment of behavior, not beliefs.
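One mechanical difference from pretraining is worth making concrete. SFT typically supervises only the assistant's response tokens: prompt positions are masked out of the loss so the model learns to produce answers rather than to regenerate instructions. The token ids and per-token losses below are made up for illustration.

```python
import numpy as np

# A short sequence: prompt tokens followed by response tokens.
token_ids     = np.array([101, 7, 9, 3, 102, 55, 61, 2])
is_response   = np.array([0,   0, 0, 0, 0,   1,  1,  1], dtype=bool)
per_token_nll = np.array([2.1, 1.7, 3.0, 0.9, 1.2, 0.4, 0.6, 0.5])

# Average the negative log-likelihood over response positions only;
# prompt tokens contribute nothing to the gradient.
sft_loss = per_token_nll[is_response].mean()
```

The mask is also where rating coverage enters: only behaviors represented in the supervised responses get shaped, which is the mechanical face of "aligned in situations humans rated."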

Fine-tuning can also specialize models to domains—medicine, law, coding—where the base model’s priors are too generic. That specialization can improve metrics while leaving rare failure modes intact; evaluation must follow the domain, not the leaderboard.

Where harms can originate

Toxic or biased generations can reflect pretraining data, under-specified fine-tuning objectives, or mismatches between deployment contexts and rating sets. Privacy risks can arise when fine-tuning memorizes user content—see fine-tuning on conversations. Synthetic data loops, used to augment training, can amplify systematic errors unless carefully filtered—discussed in synthetic data: amplifier or echo chamber.

How to read product claims

When a vendor says “we aligned the model,” ask which stage changed, with what data, and against what adversarial evaluation—not only whether the chatbot sounds polite. Politeness and correctness are different axes; benchmarks often measure only one. Our essay on benchmarks as proxies explains why leaderboard scores diverge from messy workflows.

Continued pretraining and domain adaptation

Teams often insert a “continued pretraining” phase on curated corpora before instruction tuning—useful when generic web text under-represents a domain (medicine, law, codebases). The risk is catastrophic forgetting if learning rates or data mixes are wrong; evaluation on both general and specialist tasks is mandatory.
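One common mitigation for forgetting is to replay a slice of general-domain data alongside the specialist corpus. A minimal sketch, assuming an 80/20 domain-to-general mix and placeholder corpora—both the ratio and the corpus names are illustrative choices, not recommendations:

```python
import random

# Placeholder corpora standing in for real tokenized datasets.
general_corpus = ["general doc %d" % i for i in range(100)]
medical_corpus = ["medical doc %d" % i for i in range(100)]

def mixed_batch(batch_size, domain_fraction=0.8, seed=0):
    """Sample a batch drawing domain_fraction of examples from the
    specialist corpus and the remainder from general data."""
    rng = random.Random(seed)
    batch = []
    for _ in range(batch_size):
        source = medical_corpus if rng.random() < domain_fraction else general_corpus
        batch.append(rng.choice(source))
    return batch

batch = mixed_batch(1000)
```

The mix ratio is exactly the kind of knob that silently causes forgetting when set wrong, which is why evaluation on both general and specialist tasks has to bracket any continued-pretraining run.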

Parameter-efficient fine-tuning

Adapters, LoRA, and prefix tuning update small parameter subsets, lowering cost and making per-tenant customization feasible. They also change failure modes: multiple LoRAs composed poorly can interfere, and routing and capacity planning become product concerns, not notebook details. These choices overlap with open-weights debates about who may ship which adapter on whose base model.
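The LoRA idea itself is compact. Instead of updating a full weight matrix W, train a low-rank correction B·A scaled by alpha/r, leaving W frozen. The shapes and the zero-initialization of B follow the standard LoRA formulation; the dimensions and values below are toy.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 16, 4, 8

W = rng.normal(size=(d_out, d_in))        # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01     # trainable, small random init
B = np.zeros((d_out, r))                  # trainable, zero init: starts as a no-op

def lora_forward(x):
    # Base path plus scaled low-rank correction; only A and B would train.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
y = lora_forward(x)
```

Because B starts at zero, the adapted model is initially identical to the base model; and because only r·(d_in + d_out) parameters train instead of d_in·d_out, per-tenant adapters stay cheap to store and swap—which is precisely what makes composition and routing a product-level concern.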