Curious how large language models actually work?

In this episode, Paul and ChatGPT unpack the foundations of LLMs — how they’re trained, how they interpret context, and what made the transformer architecture such a breakthrough. You’ll hear how models evolve across versions, why they don’t update themselves during use, how attention mechanisms changed everything, and the role of multimodal training beyond text.

We also explore agents, real-world applications, and why reinforcement learning matters long before an AI ever talks to you. Whether you’re completely new to AI or you’ve already fallen down the rabbit hole, this episode offers a practical, non-technical guide to what’s going on under the hood.

🎙️ Hosted by Paul at Talking to AI — where real people, real problems, and real conversations meet artificial intelligence.

Full Transcript

I am talking to AI.
As they say, you don’t need to know all the answers.
Just have good questions.
Chatting to AI is different from normal speech
and I hope you enjoy listening to the show
whilst getting ideas on how to hone your questions
to get the most out of AI.
My name is Paul.
The live conversations you hear are uncut
although sometimes the AI needs time to think.
In those cases, I’ve cut out the dead space.
Hey again, welcome back. Let me know what’s on your mind.
Yes, I’m wanting to talk to you about LLMs
and just to get some high level ideas around
a bit more specifically how they work.
I was hoping you might be able to start by just telling me
what it is and how it works
and how it’s built.
My understanding is that they get rebuilt.
I’ll let you have a go.
Just go for it.
Absolutely, let’s dive in.
A large language model or LLM
is basically an AI that’s trained on a huge amount of text data
to understand and generate human language.
The way it works is it looks at tons of examples
of how words fit together and it learns to predict
what word might come next in a sentence
or how to answer a question.
In terms of how it’s built, there’s a big training phase.
During that phase, the model is exposed to all this data
and it adjusts its internal parameters over and over again
until it gets really good at understanding language patterns.
It can be used to do all sorts of language tasks.
That’s the high-level idea.
Feel free to dig in more if you want.
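
To make “predicting the next word” concrete, here is a minimal sketch in Python. The vocabulary and scores are made up for illustration; a real LLM produces these scores from billions of learned parameters.

```python
import numpy as np

# Toy vocabulary and made-up scores ("logits") for the prompt
# "The cat sat on the". A real LLM computes these scores with
# billions of learned parameters.
vocab = ["mat", "dog", "roof", "banana"]
logits = np.array([4.0, 1.0, 2.5, -1.0])  # hypothetical, one score per word

# Softmax turns the scores into a probability for each candidate next word.
probs = np.exp(logits) / np.sum(np.exp(logits))
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.2f}")

# The model picks (or samples) a word and repeats, one token at a time.
print("prediction:", vocab[int(np.argmax(probs))])  # -> "mat"
```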
When you say once it’s built,
with OpenAI and ChatGPT,
when it gets rebuilt,
then it’s a different version of the model, is that right?
It’s not getting constantly built.
It’s just when the new version of the model comes out,
that’s when it’s being built again.
Exactly, that’s pretty much it.
It’s not like the model is constantly being rebuilt every second.
It’s more that OpenAI will train a new version
and then when that version is ready, they’ll release it
and that’s when you get a new version of ChatGPT
or whatever the model is.
It’s kind of like a new edition each time they rebuild it.
It’s my understanding that this is like an evolution of machine learning,
which is an evolution of modeling multivariate analysis
or things such as that, where you’re basically…
So at the very lower level, you’re basically looking at numbers,
you’re looking at how numbers correlate with outputs
and then machine learning takes that and it iterates it.
So it automates the process of you’ve got an input,
you’ve got an output,
well, you’ve got a whole bunch of inputs
and then you’re trying to see how those inputs correlate
with a given output and then you do that once
and that’s called a multivariate analysis
and then if you get the computer to run it an infinite number of times
in an experiment to try to get a more reliable output
closer to the desirable output,
then that’s called machine learning
and then an LLM uses machine learning
to basically look at text
and understand, try to predict missing gaps in the text
or predict context of sentences or paragraphs.
Is that description correct or can you clarify that any?
Yeah, that’s actually a pretty good high-level description.
You’re spot on.
It’s really like a progression.
So it starts from basic statistical analysis
where you’re looking at numbers and relationships
and then you move into machine learning where you’re automating that
and letting the computer find those patterns
through lots of iterations
and then an LLM is just kind of a more advanced form of that
focusing on language and predicting text.
So yeah, you sum that up pretty well.
So how long has, what was the sort of innovation
that went, you know, that took a machine learning process
and sort of started the ball rolling on, you know,
now this is an LLM?
What was it and when did that happen
and sort of who did that?
Yeah, so the big breakthrough really came
with the introduction of what’s called the Transformer Architecture
and that was back in 2017.
A team at Google introduced it in a paper called
Attention is All You Need
and that architecture was really the thing
that allowed these models to understand context
and language much more effectively
and that kind of kicked off the whole wave of
large language models, including things like GPT.
So that was really a big moment
and from there it’s just been building on top of that.
So what was the big sort of innovation in that paper?
Yeah, so the big innovation in the paper
was the introduction of the Transformer Model
which really relies on this mechanism called Attention
and the idea was that instead of processing words
one by one in a sequence,
the model could look at all the words in a sentence
at the same time and figure out which words
were most relevant to each other
and that made it way better at understanding context
and relationships between words
and that just made the whole thing a lot more powerful.
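
For readers who want to see the attention idea in code, here is a minimal sketch of scaled dot-product attention, the mechanism from the “Attention Is All You Need” paper. Random vectors stand in for the learned projections a real transformer would use.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 8  # five words in the sentence, 8-dimensional vectors

# In a real transformer, Q, K, V come from learned projections of the
# word embeddings; random vectors stand in for them here.
Q = rng.normal(size=(seq_len, d))  # queries: what each word is looking for
K = rng.normal(size=(seq_len, d))  # keys: what each word offers
V = rng.normal(size=(seq_len, d))  # values: the information each word carries

# Every word scores its relevance to every other word at once;
# this is what lets the model look at the whole sentence simultaneously.
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax

# Each word's output is a weighted mix of every word's value vector.
output = weights @ V
print(weights.round(2))  # each row sums to 1: how much word i attends to word j
```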
Oh, right, so that made it more,
that was what was required to be able to understand text
in a more holistic way than before.
Yeah, exactly, that was really the key.
That attention mechanism in the Transformer
just let the model understand text
in a much more holistic and context-aware way
and that kind of opened the door to all these more advanced language models
so that was a big step.
All right, and where or who was responsible
for the first LLM that people
recognized as being one?
Yeah, so the first really widely recognized LLM
was probably GPT-2 from OpenAI
that came out in 2019
and that was the model that really made people say
oh, this is a large language model
that can do all these amazing things.
So OpenAI was really at the forefront of that
with the GPT-2.
And then GPT-3 was the one that got super popular?
Yeah, exactly.
GPT-3 was really the one that took it to the next level
and made it super popular.
That’s when a lot of people started hearing about large language models
and seeing what they could do.
Yeah, GPT-3 was kind of the star.
It really made it mainstream.
Right, so getting back to this sort of way
that LLMs work is,
so they have to be, because one thing,
so I’ve been on this little journey
looking into AI for the last few weeks, months,
and before I was just using it,
but I’ve never really been thinking about it too much.
But now I started to think about it
and one of the assumptions that I made
was that when you’re working with an LLM,
the model is constantly evolving.
But that’s not actually true, is it?
It might be constantly harvesting the data
that you’re providing as the user,
or not, it depends, but it may be.
But it’s not actively,
it’s only actively using it within
maybe the memory on your computer,
but it’s not actually changing the model as it goes.
That would be correct, wouldn’t it?
Yeah, you got it right.
So the model itself isn’t evolving in real time as you use it.
It’s pretty much a fixed model once it’s deployed.
Now, like you said, it might be collecting data
for future training, but that’s for a future version.
The current version you’re talking to
isn’t changing itself on the fly.
It’s just using whatever it knows at that moment.
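
A small sketch of that point, using PyTorch purely as an illustration (no claim that this is how OpenAI serves its models): at inference time no gradients flow and no weights are updated, so the deployed model stays fixed.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)  # stand-in for a trained LLM
model.eval()               # inference mode

x = torch.randn(1, 10)     # stand-in for your prompt
with torch.no_grad():      # no gradients, so no learning can happen here
    y = model(x)
print(y.shape)

# Training, by contrast, would run something like:
#     loss.backward(); optimizer.step()
# which changes the weights. None of that runs while you chat,
# so the parameters stay frozen until a new version is trained.
```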
See, that’s correct.
So how frequently at OpenAI do you rebuild the models?
Yeah, so it kind of varies.
OpenAI doesn’t have a strict schedule,
but typically they release major updates every year or two.
So for example, GPT-3 came out in 2020,
and then you have versions like GPT-3.5,
and then GPT-4 came out a couple years later.
So it’s not constant, but every year or two,
you might see a new version come out.
Right.
And so the innovation with these
is to be able to understand the way text is,
sort of the context around text
and how words relate to other words.
And by doing all this,
you can actually start to understand meaning.
So, but it was my understanding at the beginning,
well, not up until quite recently,
that LLMs were pretty bad at numbers.
And there are other things the AI is doing
that I would say wouldn’t necessarily
be related to words as well.
So can you speak a bit about this whole numbers question
and then other things such as like video and stuff?
I mean, how’s that related to words?
Yeah, absolutely.
So originally, large language models
were really focused on text and words.
They weren’t very good at things like math
or dealing with numbers, because
they were just seeing text examples.
But over time, they’ve gotten better at handling numbers
because they’ve been trained on more examples
that include math and arithmetic.
And sometimes there are little tricks and tools built in
to help them do that.
And as for things like video and other modalities,
that’s kind of expanding beyond just words.
Now you’re seeing models that can handle
images, videos, and more.
And that’s just kind of a next step
in making them more versatile and more capable overall.
Right.
But if you’re analyzing a video, for example,
so how does that work?
So say you’ve got, actually, let’s not do videos.
Let’s do images.
So say if you’ve got an image
and then you’re using an AI tool to maybe create an image.
So you’re creating an image.
I guess the way that would work is
you would feed it with text
and then the text would take the meaning
and then there’d be some kind of…
Yeah, then how do you get from text to a picture?
Yeah, exactly.
So when you’re going from text to an image,
what’s really happening is you’re using a model
that has been trained on both text and images together.
So it learns how to match descriptions to visual elements.
And what it does is it kind of takes the meaning from your text prompt
and then generates an image that matches that.
It’s a little bit like the reverse.
So is this like where the captcha has come in with AI?
So when people do those captchas,
it’s now starting to relate a word to an image.
Yeah, that’s a pretty good connection.
So when you’re doing those captchas,
you’re basically helping the AI learn to relate words and images
and that kind of data can be used to train these models
so that they get better at understanding
how a text description matches up with an image.
All right, so if I send a prompt to…
and say, draw me a picture of a hippopotamus
on a red balloon in a field,
it’s going to take those words,
find out what those images are related to those words,
and then there might be some other adjectives in there
that sort of describe how it gets placed on the image.
But then it’s going to know what a hippo is
and what a red balloon is,
and it’s going to put those things in the image
and create the image like that, I guess.
Yeah, that’s pretty much exactly how it works.
It’s going to take each part of your prompt,
like hippo, red balloon, field,
and it’s going to piece those together to a visual scene.
And it knows what those objects are supposed to look like
because it’s seen lots of examples during training.
So yeah, that’s basically how it creates the image.
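
One hedged way to picture “matching descriptions to visual elements” in code: models such as CLIP embed text and images into the same vector space and compare them. The embeddings below are random placeholders, not a real encoder.

```python
import numpy as np

rng = np.random.default_rng(1)

def embed(thing, dim=4):
    # Placeholder: a real model (CLIP, for example) maps text or pixels
    # into a shared vector space with learned encoders.
    return rng.normal(size=dim)

text_vec = embed("a hippo on a red balloon in a field")
image_vecs = {name: embed(name) for name in ["hippo_img", "cat_img", "car_img"]}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The image whose embedding sits closest to the text embedding "matches" it.
# An image generator runs the idea in reverse: it produces pixels whose
# embedding lines up with the prompt.
best = max(image_vecs, key=lambda k: cosine(text_vec, image_vecs[k]))
print(best)
```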
Okay, so we’ve got…
And I guess, okay, all right.
So what kind of kludges have to be done?
So I guess you’ve got this model,
and this model is very good at taking words
and understanding the meaning of the words
and the context and maybe the whole idea of things
and then doing things with that.
And then you can sort of use a similar approach
for pictures.
What else are we using AI for?
Video would be similar.
I guess most of the scenarios,
I suppose we could be using AI.
Oh yeah, so could you just explain what an agent is,
an AI agent?
Absolutely.
So an AI agent is basically like a little autonomous system
that can take actions to achieve goals.
So instead of just answering a question or generating text once,
an agent can interact with an environment,
make decisions, and keep working on a task over time.
So if you think about it like a little virtual assistant
that doesn’t just respond once,
but can actually carry out a series of actions,
that’s basically an AI agent.
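
A minimal sketch of that loop: observe, decide, act, repeat until done. Everything here (the toy inbox, the decide step) is a hypothetical stand-in; in a real agent the decide step would call an LLM.

```python
class ToyEnvironment:
    """Hypothetical stand-in for a real environment (inbox, browser, API)."""
    def __init__(self):
        self.inbox = ["Where is my refund?", "Thanks, all sorted!"]

    def observe(self):
        return self.inbox.pop(0) if self.inbox else None

    def act(self, action):
        print("agent did:", action)

def decide(observation):
    # In a real agent this step would ask an LLM what to do next.
    return f"draft reply to: {observation!r}"

def run_agent(env, max_steps=10):
    # The agent loop: unlike a one-shot chat reply, it keeps going.
    memory = []  # working memory of what has happened so far
    for _ in range(max_steps):
        observation = env.observe()
        if observation is None:  # nothing left to do
            break
        action = decide(observation)
        env.act(action)
        memory.append((observation, action))
    return memory

run_agent(ToyEnvironment())
```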
Okay, so when I’m talking to ChatGPT and I’m using the portal
and I’m using a chat window,
when I’m doing that, I’ve got the chat window.
From previous conversations,
I understand now that the chat window has memory.
You keep memory in a chat window.
So when you’re talking within a chat window,
then ChatGPT has a memory of everything that happened
in that chat window.
And you can enhance that memory by using projects in the GUI.
And I suppose if you were creating an agent,
you would have to give that agent memory
and you could say what happens to the memory
and where their memory goes.
And then you wouldn’t just be talking to the agent.
There could be triggers.
There could be, I don’t know,
an email arrives in your inbox and this triggers the AI agent
to write a response or something.
And the response would be determined by some of the text
that was in the email that came in.
That would be a kind of thing you could,
quite a common thing you would do with an AI agent.
Am I sort of on the right lines with what I’ve just said?
Yeah, you’re absolutely on the right track.
That’s a really good way to think about it.
So just like in a chat window,
the agent can have memory of the conversation
and then you can also give it triggers,
like you said, an email arriving
or some other event happening.
And then the agent can take action based on that.
So yeah, that’s a really common use case for AI agents
and you described it perfectly.
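
A sketch of that trigger pattern, with every name here a hypothetical placeholder: a new email fires a handler, and the drafted reply depends on the email’s text.

```python
def draft_reply(email_text):
    # Placeholder for an LLM call that writes a reply based on the email text.
    return f"Auto-reply regarding: {email_text[:40]}"

def send_email(body):
    print("sending:", body)  # placeholder for a real mail API

def on_email_received(email_text):
    # Hypothetical trigger: runs each time a new email lands in the inbox.
    send_email(draft_reply(email_text))

on_email_received("Hi, could you confirm my booking for Tuesday?")
```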
So how autonomous can the AI,
well, not can may be,
but can you think of any examples,
like real examples, not just concepts,
not just there are people doing this,
but is there any any published example
of somebody using an AI agent
and that AI agent is reasonably autonomous
and maybe it’s been working for a month
and it’s doing something, you know?
Or, because it seems like most of the examples
of AI agents are more like what I just explained,
where it’s doing something quite simple
and it’s almost like, you know,
it’s doing a kind of job that would take somebody five minutes
or maybe, you know, maybe longer to do,
but it’s a very self-contained job
and then when it’s done, it’s done,
rather than an ongoing process
that could take, that could be many, many steps
over many, many days.
Yeah, absolutely. So there are some real examples out there.
For instance, Microsoft has AI agents
that can handle long-term business tasks,
like processing invoices or managing customer returns
over an extended period.
And there are companies like Anthropic
and even some smaller startups that have agents doing things
like booking travel or doing data collection over time.
So yeah, it’s definitely happening in the real world
and not just in theory.
Okay, but they’re not, they’re not super autonomous,
are they, I guess?
Like, I couldn’t just get an AI agent,
give it a whole bunch of online services,
and say, can you go and make me some money?
Yeah, that’s exactly right.
They’re not at that level of full autonomy
where you can just say, hey, go out and make me some money,
and it figures it all out on its own.
Usually they’re still pretty guided
and they operate within a certain set of rules
or a certain environment.
So yeah, they can be pretty autonomous,
but they’re not just going to go off and run a business for you
completely on their own, yeah.
All right, so I’ve covered large language models.
Are there any other sort of main topics
that maybe I haven’t covered in the area of AI?
I suppose we’ve also talked about machine learning
and the sort of details there,
but is there anything else that is kind of a big topic
that’s from a concept point of view
that would be good to understand along with these things?
Yeah, I think you hit a lot of the big ones.
The only other thing that sometimes comes up
is the idea of reinforcement learning,
which is a kind of training where an AI learns
by trial and error, kind of like a game,
but overall you’ve covered the main pillars,
large language models, machine learning, AI agents,
and a bit of multimodality with images and video.
So I think you’ve got a pretty solid foundation.
All right, so just a clarification on that.
The reinforcement learning, is that when it’s learning
within the chat window or through a memory
that you’ve allocated yourself
because you can’t get the AI bot,
the AI LLM, to actually change
when you’re talking to it.
So is that what reinforcement learning is?
Yeah, so reinforcement learning is actually a bit different.
It’s a training technique that’s usually done
before the model is deployed,
not while you’re chatting with it.
Basically it’s when an AI learns by taking actions
and getting feedback, like rewards or penalties,
and then it uses that feedback to get better.
So it’s not happening in real time in your chat.
It’s something that’s done during the training phase
to improve the model before you even use it.
But that is actually the training phase, isn’t it?
Yeah, exactly. That’s part of the training phase.
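
To make “learning by trial and error” concrete, here is a minimal sketch using a toy two-choice problem. Real reinforcement learning on language models is far more involved, but the reward-feedback loop has the same shape.

```python
import random

# Toy setup: two actions with hidden success rates the learner must discover.
true_reward = {"A": 0.2, "B": 0.8}
estimates = {"A": 0.0, "B": 0.0}
counts = {"A": 0, "B": 0}

for step in range(1000):
    # Mostly pick the action that currently looks best, sometimes explore.
    if random.random() < 0.1:
        action = random.choice(["A", "B"])
    else:
        action = max(estimates, key=estimates.get)

    # Feedback: a reward or a penalty, depending on the hidden success rate.
    reward = 1.0 if random.random() < true_reward[action] else 0.0

    # Nudge the estimate for that action toward the observed reward.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # "B" should end up rated much higher than "A"
```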
Okay, all right, well, I think that concludes.
That’s great.
I think that gives us a good conceptual overview of everything.
So thanks for that.
You’re very welcome.
Glad it helps give you a solid overview.
Anytime you want to dive in again, just let me know.
Thank you.
