How AI Models Are Trained: Pre-training, Fine-tuning and Alignment Explained

A clear explanation of how large language models are built — covering pre-training, supervised fine-tuning, reinforcement learning from human feedback, and alignment techniques.

By Edison Ngu, Founder, Edison AI | 30 May 2026 | 5 min read

Quick answer

Large language models are not programmed with rules — they are trained on data. The training process has three distinct phases: pre-training, fine-tuning, and alignment. Each phase shapes a different aspect of the model's behaviour, and understanding the distinction helps leaders assess vendor claims, make sense of capability differences between models, and evaluate when customisation through fine-tuning is genuinely warranted.

What this means

Training an LLM is the process of adjusting the model's weights, which number from billions to hundreds of billions of numerical parameters, so that the model produces useful, accurate, and appropriately behaved outputs. This is not programming in the conventional sense: no one writes rules specifying what the model should say. Instead, the model learns statistical associations from enormous quantities of text, then is progressively shaped toward useful and safe behaviour through subsequent training phases.

The three phases are sequential, each building on the previous:

  1. Pre-training — establishing general language and world knowledge
  2. Fine-tuning (instruction tuning) — teaching the model to follow instructions and perform tasks
  3. Alignment (RLHF and related techniques) — shaping the model to be helpful, honest and safe

Why it matters for business

Understanding this training pipeline matters for three practical reasons.

First, it explains model differences. Two models with similar parameter counts can behave very differently based on their training data, fine-tuning approach and alignment technique. Benchmark scores do not fully capture these differences — behaviour in context does.

Second, it frames the fine-tuning decision correctly. Fine-tuning is often proposed as a way to make a model perform well on domain-specific tasks. But fine-tuning changes the model's weights: it is a training operation that requires labelled data, compute, ongoing maintenance, and re-evaluation after each update. For most business use cases, well-designed retrieval and prompting outperform fine-tuning at a fraction of the cost and operational overhead.

Third, it grounds alignment expectations. A model's safety behaviour, tone, and tendency to follow instructions are not inherent; they are trained. As a result, different deployment configurations, model versions, and providers will behave differently in ways that reflect their respective alignment choices.

How it works technically

Pre-training: The model is trained on a very large dataset, commonly several trillion tokens, using a self-supervised objective: predict the next token given all preceding tokens. No human labels are required, because the text itself supplies the target at every position. The model learns grammar, factual associations, logical patterns, coding conventions, and linguistic structures from this data. Pre-training is computationally intensive: frontier pre-training runs require thousands of specialised processors over weeks or months and cost in the tens to hundreds of millions of dollars.
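
The next-token objective described above can be illustrated with a minimal sketch. This is not how a production training loop is written; it only shows how the loss is computed from a model's output scores. The function names, the three-token vocabulary, and the score values are all hypothetical and chosen for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits_per_position, token_ids):
    """Average cross-entropy of predicting each next token.

    logits_per_position[t] holds the (hypothetical) model's scores over
    the vocabulary after it has read token_ids[: t + 1].
    """
    targets = token_ids[1:]  # the label at each position is simply the next token
    losses = []
    for logits, target in zip(logits_per_position, targets):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Toy 3-token vocabulary and a 3-token sequence (illustrative values only).
token_ids = [0, 2, 1]
uniform = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]    # model has no idea
confident = [[0.0, 0.0, 5.0], [0.0, 5.0, 0.0]]  # model favours the right token

loss_uniform = next_token_loss(uniform, token_ids)      # ln(3), about 1.0986
loss_confident = next_token_loss(confident, token_ids)  # about 0.0134
```

Training lowers this loss by adjusting the weights that produce the scores. No annotator is involved at any point, which is why the objective scales to very large datasets.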

Supervised fine-tuning (SFT): A pre-trained model is further trained on a curated dataset of (prompt, ideal response) pairs. This teaches the model to follow instructions rather than simply complete text. The dataset is human-generated or human-curated and represents the instruction-following behaviour the developers want the model to exhibit. SFT is much cheaper than pre-training: it starts from the pre-trained weights and requires far less compute and data.
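
As a rough sketch of how a (prompt, ideal response) pair might be turned into a training example: one common practice, assumed here rather than stated in this article, is to compute the loss only on the response tokens so the model learns to produce answers rather than to reproduce prompts. The helper names and token ids below are hypothetical, and the alignment between tokens and per-token losses is simplified.

```python
def build_sft_example(prompt_ids, response_ids):
    """Concatenate a (prompt, response) pair and mark which tokens are trained on."""
    input_ids = prompt_ids + response_ids
    # 0 = prompt token (ignored by the loss), 1 = response token (counts toward the loss)
    loss_mask = [0] * len(prompt_ids) + [1] * len(response_ids)
    return input_ids, loss_mask

def masked_mean(per_token_losses, loss_mask):
    """Average the per-token losses over response tokens only."""
    total = sum(loss * m for loss, m in zip(per_token_losses, loss_mask))
    return total / sum(loss_mask)

# Illustrative token ids: a 3-token prompt and a 2-token ideal response.
input_ids, loss_mask = build_sft_example([11, 12, 13], [21, 22])

# Made-up per-token losses; the large prompt losses are ignored by the mask.
sft_loss = masked_mean([9.0, 9.0, 9.0, 1.0, 3.0], loss_mask)  # 2.0
```

The underlying objective is the same next-token prediction used in pre-training. What changes is the data and, under this assumption, which tokens count toward the loss.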

Reinforcement learning from human feedback (RLHF): Human evaluators compare pairs of model outputs and indicate which is preferred. These preferences are used to train a reward model, a separate neural network that scores outputs according to human preferences. The language model is then further fine-tuned using reinforcement learning to maximise the reward model's score, producing outputs that humans find more helpful, less harmful and more accurate. Related techniques include RLAIF, which uses AI feedback in place of human feedback, and Constitutional AI (used by Anthropic), which builds a written set of principles into the alignment process.
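
The reward-model step can be sketched with a pairwise preference loss. The formulation below, the negative log-sigmoid of the score difference, is a widely used choice and is an assumption here; the article does not specify a loss. The function name and reward values are hypothetical.

```python
import math

def pairwise_preference_loss(reward_chosen, reward_rejected):
    """Loss for one human comparison: -log(sigmoid(r_chosen - r_rejected)).

    Small when the reward model scores the preferred output higher,
    large when it ranks the pair the wrong way round.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Illustrative reward scores for one (preferred, rejected) pair.
loss_agrees = pairwise_preference_loss(3.0, 1.0)     # about 0.1269
loss_undecided = pairwise_preference_loss(1.0, 1.0)  # ln(2), about 0.6931
loss_disagrees = pairwise_preference_loss(1.0, 3.0)  # about 2.1269
```

Minimising this loss over many comparisons pushes the reward model to score preferred outputs higher. The reinforcement learning stage then uses those scores as its training signal.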

LoRA and parameter-efficient fine-tuning (PEFT): For organisations undertaking their own fine-tuning, full fine-tuning of all parameters is rarely practical. Low-rank adaptation (LoRA) freezes the original weights and trains a small set of added low-rank matrices; its quantised variant QLoRA applies the same idea on top of a quantised base model to reduce memory use further. Because only a small fraction of the parameter count is trained, domain-specific fine-tuning becomes computationally accessible. These approaches are widely used in enterprise fine-tuning projects.
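
A quick calculation shows why low-rank adaptation is cheaper. For a single weight matrix of shape d_out by d_in, full fine-tuning trains every entry, while LoRA trains two thin matrices of rank r whose product has the same shape. The layer size and rank below are illustrative assumptions, not figures for any particular model.

```python
def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """Trainable parameters for a LoRA adapter on the same matrix.

    The adapter is B (d_out x rank) times A (rank x d_in); the original
    weights stay frozen.
    """
    return rank * (d_out + d_in)

# Illustrative layer: a 4096 x 4096 projection with a rank-8 adapter.
full = full_finetune_params(4096, 4096)  # 16,777,216
adapter = lora_params(4096, 4096, 8)     # 65,536
fraction = adapter / full                # 0.00390625, i.e. 1/256
```

Under these assumed dimensions the adapter trains well under one percent of the layer's parameters. The saving grows as the layer gets larger or the rank gets smaller.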

Practical implementation considerations

For most Australian mid-market organisations, the relevant decision is not whether to run pre-training — that remains the domain of major AI labs — but whether fine-tuning is warranted for a specific use case, and if so, which fine-tuning approach is appropriate.

The general guidance: fine-tune when the required behaviour cannot be reliably achieved through retrieval-augmented generation and well-structured prompting, and when a high volume of labelled examples of the desired behaviour is available. Fine-tuning for style, tone, or format is often warranted. Fine-tuning to inject factual knowledge is usually a worse choice than RAG, because knowledge fine-tuned into weights becomes stale as the world changes.
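
The guidance above can be written down as a simple decision sketch. This is a deliberate simplification for discussion, not a substitute for evaluating a real use case. The function name, its inputs, and especially the `min_examples` threshold are hypothetical placeholders; what counts as a "high volume" of labelled examples depends on the task.

```python
def recommend_approach(needs_current_facts, prompting_and_rag_reliable,
                       labelled_examples, min_examples=1000):
    """Return a first-pass recommendation following the guidance in the text.

    min_examples is an arbitrary placeholder, not a recommended value.
    """
    if needs_current_facts:
        # Knowledge baked into weights goes stale; retrieve it instead.
        return "rag"
    if prompting_and_rag_reliable:
        # The required behaviour is already achievable without training.
        return "prompting_and_rag"
    if labelled_examples >= min_examples:
        # Behaviour (style, tone, format) is the gap and data is available.
        return "fine_tune"
    return "gather_more_labelled_data"

# Illustrative cases.
case_knowledge = recommend_approach(True, False, 5000)  # "rag"
case_style = recommend_approach(False, False, 5000)     # "fine_tune"
case_no_data = recommend_approach(False, False, 50)     # "gather_more_labelled_data"
```

The order of the checks reflects the argument in the text: knowledge-grounding goes to retrieval first, and fine-tuning is considered only once prompting and retrieval have been shown to fall short.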

Edison AI's AI training programmes include structured guidance on the pre-training / fine-tuning / RAG decision framework, helping teams evaluate options with reference to their specific data, task requirements and operational constraints rather than vendor marketing claims.

Common mistakes

  • Fine-tuning to inject knowledge rather than to shape behaviour — fine-tuned knowledge is static and will go stale; RAG retrieves current information dynamically and is almost always preferable for knowledge-grounding.
  • Underestimating the ongoing cost of fine-tuning — fine-tuned models require re-training when the base model updates, when data distributions shift, or when requirements change. This ongoing cost is rarely included in initial business cases.
  • Conflating instruction-following capability with alignment — a model that follows instructions well is not necessarily safe or aligned. Instruction tuning and alignment are separate phases with distinct objectives.
  • Assuming all fine-tuning uses the same approach — LoRA, QLoRA, full fine-tuning, and instruction tuning differ in cost, capability impact, and suitability. The choice matters.
  • Not evaluating fine-tuned models on representative production queries — fine-tuned models can excel on the training distribution and fail on variations common in production. Evaluation must use realistic inputs.

What leaders should do next

  1. Before committing to fine-tuning, require a written justification that includes: why RAG plus structured prompting cannot achieve the same result, what labelled data is available, and what the ongoing maintenance cost is estimated to be.
  2. When evaluating AI vendors, ask about their alignment approach — not just their benchmark scores. Models with different alignment techniques will behave differently in sensitive or ambiguous situations.
  3. Ensure your AI governance framework includes a policy on when organisational data may be used for model fine-tuning, under what data handling conditions, and with what privacy review.
  4. Include the knowledge cutoff and training data recency of any model under evaluation in your assessment criteria, because both directly affect its reliability on current regulatory, market and product information.

Edison AI runs practical AI training that turns this understanding into day-to-day team capability.

Frequently asked

Questions, answered.

  • What is pre-training in AI?

    Pre-training is the initial phase of training a large language model, in which the model learns to predict the next token across a very large dataset, commonly several trillion tokens drawn from text sources including books, websites, code and documents. Pre-training builds the model's general language capability and world knowledge.

  • What is fine-tuning and when is it used?

    Fine-tuning is a further training phase that adjusts a pre-trained model on a smaller, task-specific dataset to specialise its behaviour — for example, to follow instructions more reliably, to adopt a particular tone, or to perform well on domain-specific tasks. Fine-tuning is appropriate when prompting alone cannot reliably produce the required behaviour, but it requires labelled data, compute budget and careful evaluation.

  • What is RLHF and why does it matter for AI safety?

    Reinforcement learning from human feedback (RLHF) is a training technique in which human evaluators compare or rate model outputs, and those judgements are used to train a reward model. The language model is then trained to maximise that reward. RLHF is a primary mechanism for aligning AI behaviour with human preferences: it reduces harmful outputs, improves helpfulness, and makes models more likely to follow the intent of an instruction rather than an overly literal reading of the prompt.
