---
title: "WTF is AI actually doing?"
author: "Connor Adams"
date: 2026-03-06
tags: ["ChatGPT", "Claude", "Claude Code", "Gemini", "GPT", "LLM", "AI"]
canonical: "https://connoradams.co.uk/blog/wtf-is-ai-actually-doing/"
---

# WTF is AI actually doing?

## Intro

AI can really feel like magic. It can surprise us in how both "clever" and "stupid" it can be. But it's not magic, it's just maffs init™.

Forming your own mental model of how LLMs work helps you use them well. I like to think of LLMs as four ingredients combined, and this mental model serves me everywhere from prompting ChatGPT to setting up Claude Code for agentic engineering - and I hope it helps yous too.

Let's think about what happens when we ask ChatGPT about the weather: it ["thinks" 🧠](#-3-predict-thought-and-reflection), [calls a weather API 🛠️](#%EF%B8%8F-4-predict-taking-actions), [gives a helpful answer 💬](#-2-predict-an-assistant-response), [word by word 🎱](#-1-gpt-predict-the-next-word).

<div id="weather-gif">
  <ReplayGif
    client:visible
    src="/assets/media/ai-weather-conversation.gif"
    alt="ChatGPT checking today's weather - thinking, then a tool call, then an answer"
    class="w-full rounded-xl my-8"
  />
</div>

_I've simplified things here - if you spot anything misleading, please [drop me a message](/#contact-form)._

> **Interactive post**
>
> This post has interactive elements throughout. Keep an eye out for buttons 🔵 to tap.

---

## 🎱 1. GPT: Predict the next word

GPT stands for Generative Pre-trained Transformer. That sounds a bit technical and the name doesn't matter much, but there's a reason it's called "generative AI" - it's generating words. Or rather tokens. A token is roughly ¾ of a word. I'll say "words" and "tokens" interchangeably but technically they're tokens.

You give it the start of a sentence, or a prompt, and it predicts the next word - over and over until it thinks that it's done. That's it, that's the whole thing. But from that, we can do some very useful things.

<TokenPredictor client:visible />

Notice that last prediction? `<EOS>` is a special end-of-sequence token that tells the model to stop generating.
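That predict-append-repeat loop can be sketched in a few lines. The "model" here is a made-up lookup table of probabilities, purely for illustration - a real LLM computes these probabilities with a neural network:

```python
# Toy "model": maps recent context to a probability table of next tokens.
# These probabilities are invented for illustration - a real LLM computes
# them with a neural network over the whole context.
TOY_MODEL = {
    ("The", "cat"): {"sat": 0.7, "ran": 0.2, "<EOS>": 0.1},
    ("cat", "sat"): {"down": 0.6, "<EOS>": 0.4},
    ("sat", "down"): {"<EOS>": 1.0},
}

def predict_next(tokens):
    """Pick the most likely next token given the last two tokens."""
    probs = TOY_MODEL.get(tuple(tokens[-2:]), {"<EOS>": 1.0})
    return max(probs, key=probs.get)

def generate(prompt):
    """Predict the next token over and over until the model says it's done."""
    tokens = prompt[:]
    while True:
        nxt = predict_next(tokens)
        if nxt == "<EOS>":  # the end-of-sequence token tells us to stop
            break
        tokens.append(nxt)
    return tokens

print(generate(["The", "cat"]))  # → ['The', 'cat', 'sat', 'down']
```

The whole of text generation really is just this loop; everything that follows is about shaping *what* the model predicts.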

The architecture that made this work is the Transformer (the T in GPT), introduced in Google's 2017 paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762).

What makes it work so well is a "self-attention" mechanism - it's learnt what words mean in context. The word "bank" next to "river" means something completely different to "bank" next to "money".

<AttentionVis client:visible />
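Under the hood, self-attention is the paper's formula softmax(QKᵀ/√d)·V. Here's a single-query sketch in plain Python - the vectors below are made-up toy numbers, not real embeddings:

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    # How relevant is each key to the query? (dot product, scaled by √d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Blend the value vectors: the tokens the query "attends to" dominate.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

A query that points the same way as one of the keys gets most of the attention weight, so the output is mostly that key's value - that's how "bank" can borrow meaning from a nearby "river" or "money".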

---

## 💬 2. Predict an assistant response

Let's put the **chat** in ChatGPT. We've made a text prediction machine, but it's not that useful yet. The base model was trained on loads of long bits of text so it's eager to keep writing. This means that when you ask it a question it just keeps going, beyond a useful answer - or it might generate more questions instead of an answer.

<ChatVis client:visible />

Try the middle tab above `🎱 Chat formatting`. That was the original hack: slap "User:" and "Assistant:" labels on your prompt and hope the base model gets the hint. It sort of works, but without stop sequences the model generates both sides of the conversation. It doesn't know when to shut up.

So the model creators take the base model and apply instruction tuning: they train it on examples of chats between a user and an assistant. The model learns to predict responses that are useful to the user, and to stop generating at an appropriate time (using that special end-of-sequence token).

---

## 🧠 3. Predict thought and reflection

Even after instruction tuning, these models can still be surprisingly stupid and bullshit confidently. Maths problems, multi-step logic, anything a human would need to think about - the models try to predict an answer in one shot and often get it wrong if the answer doesn't already appear in the training data.

One fix is to let the model "think" before it answers. The idea of [thinking step by step](https://arxiv.org/abs/2205.11916) was first shown as a prompting technique - you ask the model to reason through a problem. This is still the same next-word prediction, but now it's predicting a thinking process rather than jumping straight to a conclusion.
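Here's a sketch of the difference, using the classic bat-and-ball question (my example, not necessarily one from the paper):

```python
question = (
    "A bat and a ball cost £1.10 in total. "
    "The bat costs £1 more than the ball. "
    "How much does the ball cost?"
)

# Asked directly, models often blurt out the intuitive-but-wrong "10p".
direct_prompt = question

# Zero-shot chain-of-thought: append the magic phrase from the paper,
# and the model predicts its working before the answer (it's 5p).
cot_prompt = question + "\nLet's think step by step."
```

Same model, same weights - the only change is that the prompt makes the likeliest continuation a chain of reasoning tokens rather than a snap answer.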

Modern reasoning models do this automatically. Depending on the app/harness you're using, you can see the model "thinking" about a problem before answering.

<ReasoningVis client:visible />

This extends beyond maths. Planning, debugging, coding - anything a human would think about before acting, the models benefit from thinking about too.

---

## 🛠️ 4. Predict taking actions

The models don't know what the weather is. They have no internet access, no clock, no ability to run code. They're still just token prediction machines. But the tokens don't have to be prose - we can have the model predict structured actions that the app/harness can interpret.

Here's the trick: you tell the model in its system prompt what tools it has available. The model predicts a structured tool call (e.g. `get_weather("London")`). Something outside the model executes that call, gets the result, and feeds it back in. The model then predicts a response using the real data.
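A minimal sketch of the harness's side of that loop - the tool name, call format, and weather data are all made up for illustration, not any real API:

```python
import json

def get_weather(city):
    """Stand-in for a real weather API call (data is made up)."""
    return {"city": city, "temp_c": 12, "conditions": "light rain"}

TOOLS = {"get_weather": get_weather}

def run_agent_step(model_output):
    """The harness's job: spot a tool call, execute it, feed the result back."""
    call = json.loads(model_output)
    if call.get("tool") in TOOLS:
        result = TOOLS[call["tool"]](*call.get("args", []))
        # This result goes back into the context window as more tokens,
        # and the model predicts its final answer using the real data.
        return json.dumps(result)
    return model_output

# The model predicted a structured action instead of prose:
print(run_agent_step('{"tool": "get_weather", "args": ["London"]}'))
```

The model never executes anything itself - it only ever predicts tokens, and the harness does the doing.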

This is exactly what's happening in [the weather GIF at the top](#weather-gif). All four ingredients are there: reasoning mode produces the thinking bubble, tool calling hits the weather API behind the scenes, and the response is next-word prediction using the real data.

<ToolCallVis client:visible />

The bash tool in Claude Code is a Swiss army knife. Instead of building a specific tool for every task, you just give the model a shell. It can run any command, read any file, install packages. That's why Claude Code feels so capable - it's a powerful model equipped with decades of tooling built on the Unix philosophy.

---

Those are the four ingredients. But there are a couple of things worth knowing about how these models work day-to-day.

## Knowledge cutoff

The model was trained on data up to a certain date. Ask it about anything after that and it either doesn't know or, worse, hallucinates.

Ask a model with a 2024 cutoff and no internet access who the UK Prime Minister is in February 2026 and it'll confidently get it wrong.

This is why web search is such a useful tool: models can stay up to date without pre-training a new model - which takes a lot of time and compute 💸

---

## Context window

Everything we've talked about: your input/prompts, the model's reasoning, tool calls and their results, the final output - it all lives in a fixed-size workspace called the **context window**.

Modern models have large windows (Claude typically has 200k tokens, roughly 150,000 words). But they're not infinite. Fill it with irrelevant stuff - a long conversation from an unrelated topic, or thousands of lines of code you won't actually use - and you dilute the model's [attention](#-1-gpt-predict-the-next-word).

<ContextWindowVis client:visible />
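You can sanity-check your own prompts with the rough token-to-word ratio from earlier. The reserve figure below is an arbitrary illustrative number, not anything the models enforce:

```python
def estimate_tokens(text):
    """Rough rule of thumb: a token is about ¾ of a word."""
    return round(len(text.split()) * 4 / 3)

def fits_in_context(text, window=200_000, reserve=20_000):
    """Check a prompt fits, leaving headroom for reasoning and the answer.

    The 20k-token reserve is an illustrative guess at how much space
    the model's own thinking and output might need.
    """
    return estimate_tokens(text) <= window - reserve

print(estimate_tokens("the quick brown fox jumps"))  # → 7
```

Real tokenisers vary by model and language, so treat this as a back-of-the-envelope check, not a precise count.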

Context management and context engineering are important skills for working well with LLMs and agents. If you're not thinking about context and you're getting bad results, that might be why. You ought to be deliberate about what goes in the context. Techniques like subagents help minimise context rot - more about that another time.

**Compaction**: most agent apps/harnesses automatically summarise the conversation when it gets towards the limit of the context. This can work well, but important details for the current task may get lost. If something seems off after a long session, starting fresh often helps.

---

## LLMs have amnesia

Between conversations, the model remembers nothing. No persistent memory. Every conversation starts from a blank slate.

The memory features in ChatGPT, Claude, and others are just automated context management - they retrieve relevant summaries and inject them into your next conversation's context window.

For coding agents, the equivalent is markdown files: `CLAUDE.md` or [`AGENTS.md`](http://agents.md/) get injected into context at the start of every session. The model's "memory" of your project is just a file it reads. Which means you can edit it, version control it, and reason about exactly what the model knows when it starts each conversation.
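Conceptually, the injection step is nothing fancier than this (a sketch - real harnesses have their own formats and file discovery rules):

```python
from pathlib import Path

def build_system_prompt(base_instructions, memory_files=("CLAUDE.md", "AGENTS.md")):
    """Sketch: the agent's 'memory' is just files prepended to the context."""
    parts = [base_instructions]
    for name in memory_files:
        path = Path(name)
        if path.exists():
            # Whatever is in the file becomes tokens in the context window.
            parts.append(f"## Project memory ({name})\n{path.read_text()}")
    return "\n\n".join(parts)
```

Because it's just a file becoming tokens, everything said earlier about context applies: a bloated memory file dilutes attention on every single session.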

> **Use memory files intentionally**
>
> If you use coding agents, your `CLAUDE.md` or `AGENTS.md` is literally the model's memory. Keep it useful, concise and accurate. 💡 Each time the model misbehaves or surprises you, either fix your prompt, fix your environment/code/tools (i.e. improve architecture, add tests/coverage/linters), or add a specific instruction to your memory files to steer the agent next time. ⚠️ Watch out for the `/init` command - it can generate memory files that [perform worse](https://arxiv.org/abs/2602.11988).

Claude Code now has auto memory (`MEMORY.md`), which tries to manage your memory automatically - read more about it [here](https://code.claude.com/docs/en/memory#auto-memory).

---

## Summary

ChatGPT, Claude, Gemini - they're all the same four ingredients:

1. 🎱 **Next-word prediction** trained on massive text data
2. 💬 **Instruction tuning** to make it helpful and conversational
3. 🧠 **Reasoning** to think before answering
4. 🛠️ **Tool calling** to interact with the real world

LLMs are just very capable text predictors, trained to be helpful, with thinking time and tools bolted on. Understanding this helps you use these models better - you'll recognise why they fail, how to prompt them effectively, and what your coding agents are actually doing.
