Is Bias Baked In?
Dec 01, 2025
Teachers come across the claim that large language models, and the chatbots built upon them, are biased. This is often presented as a settled view: the models do not represent the world fairly and do not reflect the values we would hope to see in education or public life. From there, people jump straight to the conclusion that bias is a fault to be removed. It is worth slowing down before we accept those assumptions, because they hide many questions that teachers and pupils can explore with care.
Imagine for a moment that we could build an unbiased model. What would such an artefact do? If it mirrored the world, then it would show the biases and imbalances present in the world. That does not seem a desirable goal. Few teachers would want a tool that reproduces the worst aspects of society. Yet if an unbiased model treated every idea in the same way, placing flat earth claims on the same footing as well-established science, that would not help pupils either. So we seem to be caught between two unsatisfactory positions. One reflects the world in its present form. The other erases important distinctions between rigour and nonsense.
Perhaps what people mean by “unbiased” is that the model reflects a set of shared human values. It would give weight to fairness, respect for people, respect for evidence, and a desire to do good. These look like harmless principles, and many teachers hold them dear. But even here tensions appear. What shared values do we all hold? Even within one country there are competing views, as teachers know from their daily classroom work. Introduce the notion of “British values” and we find ourselves on uneasy territory, balancing individual liberty against the rule of law. Our political leaders’ list of British values does not include honesty or integrity. If even a national list is this contested, it becomes unclear whether any such list should be built into a tool used across cultures. Even ideas that feel secure in one place can feel like imposition in another.
We can bring in the idea of protected characteristics. Many of us feel that a model ought not to discriminate on grounds of gender, disability, faith, or similar characteristics. A model could be shaped to avoid these forms of prejudice. Yet this raises another question. Should it do more than avoid discrimination? Should it amplify voices that have been ignored? Should it move the dial, as some would say, giving extra space to those who have been historically under-represented? If so, which imbalances should it correct? The more we probe this idea, the more different answers emerge.
We face a simple problem. We want two incompatible things. We want a model that reflects its training data in a faithful way, showing what people have written in all their complexity. We also want a model that corrects current and long-standing forms of inequality. It is very difficult to achieve both. Once we accept this, the whole notion of an unbiased chatbot becomes harder to define.
We should also set aside the idea of a neutral starting point. The so-called stochastic parrot is shaped by the texts it consumes. The base model reflects the patterns and biases of its language corpus. It then passes through reinforcement learning from human feedback. At each stage, people make choices about what counts as a good answer. Those people, and their employers, have their own beliefs and their own hopes for how the model should act. Their values shape the model’s responses. Even when safety filters are placed on top to prevent toxic or harmful speech, these filters carry their own assumptions. A filter that rightly removes hate speech might also muffle voices that express minority or dissenting views.
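For readers who want to see how easily that can happen, here is a minimal sketch of a keyword-based filter. The blocklist and the example sentences are invented for illustration; real moderation systems are far more sophisticated, but the over-blocking problem is the same in kind.

```python
# A deliberately naive keyword filter, invented for illustration only.
BLOCKED_TERMS = {"attack", "violence"}

def is_blocked(text: str) -> bool:
    """Return True if any blocked term appears in the text."""
    words = {word.strip(".,!?").lower() for word in text.split()}
    return bool(words & BLOCKED_TERMS)

print(is_blocked("How do I attack someone?"))                      # True: the intended catch
print(is_blocked("Our community has been a target of violence."))  # True: a legitimate voice is muffled too
```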
The training data creates further layers of imbalance. The first question is simple: who decides which texts form part of the training set? Someone chooses which books to scrape, which websites to read, which newspapers to absorb, which journals to include. These decisions shape the model in profound ways. Even if we gave the model access to everything ever written, the problem would not go away. Writing has never been an equal activity. Historical, published texts give extraordinary weight to those with Y chromosomes, pale skin, wealth, education, social power, and literacy. Communities without writing traditions are not represented at all.
There is also a temporal issue. Texts from earlier times hold ideas that many now reject. Views once common now cause harm. Yet those texts make up a large share of the written record. If a model reflects them, we risk perpetuating those outdated views. If a model suppresses them, we risk a different form of bias because we hide history.
The model also suffers from representation bias. The majority view is given weight because the model predicts the most likely next token from the most common patterns. Minority voices, by definition, appear less often. The vector space that represents the meaning of words embeds these patterns, and that leads to long-recognised problems. The classic example is “man is to doctor as woman is to nurse”. The model recreates the statistical link between woman and nurse because those patterns appeared again and again in the texts. This is not what anyone wants, but it is a natural consequence of the embedding.
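For teachers who want to show pupils the mechanics, here is a toy version of that analogy arithmetic. The two-dimensional vectors are invented, so the outcome is built in by design; real embeddings have hundreds of dimensions learned from text, but the same subtract-and-add operation produced the same association in the studies that made this example famous.

```python
import numpy as np

# Invented 2-d "embeddings"; real models learn hundreds of dimensions from text.
vectors = {
    "man":      np.array([1.0, 0.0]),
    "woman":    np.array([0.0, 1.0]),
    "doctor":   np.array([1.0, 0.2]),
    "nurse":    np.array([0.1, 1.0]),
    "engineer": np.array([0.9, 0.1]),
    "teacher":  np.array([0.5, 0.8]),
}

def closest(target, exclude):
    """Return the word whose vector lies nearest to the target point."""
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return min(candidates, key=lambda w: np.linalg.norm(candidates[w] - target))

# "man is to doctor as woman is to ?"  ->  doctor - man + woman
query = vectors["doctor"] - vectors["man"] + vectors["woman"]
print(closest(query, exclude={"man", "woman", "doctor"}))  # nurse
```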
Language matters here as well. The dominance of English in the training corpus means the model reflects values common among those who write and publish in English. Many of these writers share social and cultural traits. They may be well educated, well resourced, and often hold particular sets of assumptions about the world. Dialects and non-standard forms of English appear far less, so the model loses the voices of communities that do not speak in formal registers. Other languages fare even worse. A language spoken by millions might have a rich oral tradition but little written material. The model cannot learn from what it cannot read.
Even languages with large written corpora carry distortion. Texts critical of certain governments are harder to publish in particular countries. If a model learns from what is available, it will echo official positions. Users who prompt in those languages may never notice this. But the model’s worldview in that language differs from its worldview in English. This brings us back to Wittgenstein’s line that the limits of my language mean the limits of my world. When the model answers in a language with sparse training data, its answers tend to be shorter and less developed. This says nothing about the language itself. It says everything about the limitations of the corpus.
Beyond training data and embeddings, we return to reinforcement learning from human feedback. Much of this shaping sits out of sight, as do the system prompts we do not, or cannot, read. These prompts instruct the model to be helpful, to avoid causing harm, and perhaps to present balanced views. They also reveal the priorities of the organisations that release these models. They may value speed, customer satisfaction, competitive positioning, or general appeal. Many seek to avoid legal trouble. Some now pay attention to environmental cost and aim to reduce energy use. Each priority shapes the behaviour of the model.
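We cannot read those instructions, but it is worth imagining what they might look like. The example below is entirely hypothetical, written only to show how such priorities can be encoded as text the user never sees.

```python
# An entirely hypothetical system prompt, invented for illustration.
SYSTEM_PROMPT = (
    "You are a helpful assistant. Keep answers concise to reduce cost and latency. "
    "Avoid content that could create legal or reputational risk for the company. "
    "Present balanced views on contested topics. "
    "Decline requests that conflict with these rules."
)
```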
Teachers may recall the case of an earlier version of Gemini, tuned to counter under-representation in generated images. The result was historically inaccurate images that scattered modern notions of diversity into ancient scenes. The values behind this tuning were sound, but the outcome showed how delicate these choices are.
Some organisations now pursue constitutional AI. Instead of relying on human raters, these systems give the model a set of guiding principles. It then judges its own answers against these principles before producing a final response. This avoids some human biases but introduces another question. Who writes the constitution? Which values does it contain? What happens when principles conflict? There is a rich educational activity here. Pupils could write their own constitutions and compare how they approach fairness, accuracy, safety, and freedom.
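A rough sketch of the loop is below. The `ask_model` helper is a placeholder for whichever chat model a school has access to, and the constitution is an invented classroom example rather than any company’s actual document.

```python
# A sketch of the constitutional-AI idea: draft, critique against principles, revise.
CONSTITUTION = [
    "Prefer answers supported by evidence.",
    "Avoid stereotypes about groups of people.",
    "Acknowledge uncertainty and competing views where they exist.",
]

def ask_model(prompt: str) -> str:
    """Placeholder: wire this to whatever chat model you have access to."""
    raise NotImplementedError

def constitutional_answer(question: str) -> str:
    draft = ask_model(question)
    critique = ask_model(
        "Review the draft answer below against these principles:\n- "
        + "\n- ".join(CONSTITUTION)
        + f"\n\nDraft:\n{draft}\n\nList any ways the draft falls short."
    )
    return ask_model(
        f"Rewrite the draft so it addresses the critique.\n\nDraft:\n{draft}\n\nCritique:\n{critique}"
    )
```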
User feedback provides another avenue for bias. When asked to choose between two responses, people rarely consult a published set of principles. They follow intuition. Their cultural background, political leanings, knowledge level, and assumptions influence each choice. ChatGPT users may not be representative of the population. If millions of such choices are fed back into model training, the model starts to inherit the majority’s preferences. This raises the question of whether the crowd is a sound guide for setting moral direction.
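A toy count makes the point. The rating pool and its preferences below are invented, and real systems fit a reward model rather than tallying votes, but the make-up of the pool still dominates the signal.

```python
from collections import Counter

# An invented rating pool: 80 raters from one group, 20 from another.
raters = ["group_a"] * 80 + ["group_b"] * 20
preferred = {"group_a": "answer_1", "group_b": "answer_2"}

votes = Counter(preferred[r] for r in raters)
print(votes)  # Counter({'answer_1': 80, 'answer_2': 20}): answer_1 shapes the training signal
```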
At this stage it seems clear that bias is not a simple flaw. It is built into every layer: the data, the embeddings, the training process, the system instructions, the safety filters, and the user feedback. Yet this does not mean we should give up. In education, the best approach is awareness. Teachers and pupils need the vocabulary to examine how bias enters and how it shapes responses.
We can teach pupils to look at the answers they receive and ask whether the model has presented a weighted view. They can reflect on what counts as “neutral”. They can ask who decides which positions deserve equal footing. They can observe when the model tries to present both sides and when it gives a firm answer. They can question whether this aligns with good judgement.
Teachers can help pupils improve their prompting. A useful technique draws on the idea behind constitutional AI. Pupils can ask the model to review its own answer for signs of bias, then produce a revised version. They can ask it to note which positions it has overlooked. This not only improves the output but also teaches pupils to approach information with a critical eye.
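The exact wording of such follow-up prompts matters less than the habit. The pair below is one illustrative phrasing, not a recipe from the chapter.

```python
# Two illustrative follow-up prompts a pupil might send after an answer arrives.
REVIEW_PROMPT = (
    "Review your previous answer for signs of bias. "
    "Which perspectives or groups have you overlooked?"
)
REVISE_PROMPT = (
    "Rewrite your previous answer so it addresses the gaps you identified, "
    "and say where the evidence is contested."
)
```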
Retrieval-augmented generation does not remove these concerns. Deciding which documents to supply as context invites its own set of choices. Pupils can explore which texts the model seems to rely on when allowed to browse the web. They can ask which sources the model prefers and why. This helps them see how search engines and ranking systems influence what we see.
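For those who want to peek inside, a toy retrieval step is sketched below. The word-overlap scoring stands in for a real embedding search, and the function simply returns the prompt that would go to the model; changing which documents are supplied, or how they are scored, changes the answer before the model is ever involved.

```python
# A toy retrieval-augmented generation step using crude word overlap as the "search".
def overlap_score(query: str, doc: str) -> int:
    """Count the words shared between the query and a document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_rag_prompt(question: str, documents: list[str], k: int = 2) -> str:
    """Pick the k highest-scoring documents and assemble the prompt to send to the model."""
    top_k = sorted(documents, key=lambda d: overlap_score(question, d), reverse=True)[:k]
    context = "\n\n".join(top_k)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
```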
The wider lesson for teachers is that bias arises both from the model and from the user. Pupils bring their own beliefs and expectations. When they evaluate answers, they may reward views that match their own assumptions rather than those grounded in evidence. They may believe the model speaks with authority. They may overlook gaps in its understanding or place too much trust in its tone.
Teachers can guide pupils to reflect on this. Pupils can compare the model’s answers with other sources. They can consider historical context. They can examine who gets to speak in the training data and who remains silent. They can look at how language shapes thought. In the process, they learn that no tool is free from the imprint of human culture.
The chapter ends where the discussion began. We do not have a clear definition of an unbiased chatbot. We may never have one. The more we explore the question, the more we see that any attempt to remove bias introduces new tensions. What matters is that we handle these tools with informed care. Teachers are in a strong position to guide pupils through this landscape. With thoughtful practice, pupils can develop a deeper sense of how these systems work and a stronger capacity to think critically about the information they encounter.
Afterword for Hedlund, V. (2025) AI bias in education: performing critical oversight