What's really inside an AI experience

A New Kind of Interface

We’re living through a remarkable shift in how humans interact with computers.

It’s not just a better UI or a new app — it’s the first time in computing history that you can simply ask for something in your own words, and the system seems to understand. You don’t need to learn menus or commands. With tools like ChatGPT, it feels like the computer is finally learning your language — not the other way around.

That shift is powered by large language models (LLMs). But it’s not just the underlying technology that makes these tools feel magical — it’s how they’re presented. The fact that this transformation is happening at scale, in public, and in such a familiar format (text box + chat history) is part of what makes it so accessible.

It’s Not Just the AI — It’s the Interface Psychology

Tools like ChatGPT don’t just generate information — they mimic something closer to a conversation. You ask a question, it paraphrases and reflects it back. You adjust, and it adjusts with you.

That mirrors a well-known technique from psychology: paraphrasing. It helps people feel heard. It clarifies intent. It invites correction. And when an AI does it — even imperfectly — it can feel surprisingly useful and collaborative.

This is why LLMs work so well for journaling, brainstorming, or drafting: not because they know everything, but because they give you room to think out loud.

Familiar Interfaces Make the New Stuff Usable

Some of the most effective features in tools like ChatGPT aren’t AI at all. They’re just good UI.

  • A sidebar of your past conversations.
  • A persistent chat window.
  • Scrollable transcripts.
  • A clear reply box.

These are patterns we’ve learned from messaging apps, note-taking tools, and productivity software. And they work.

If LLMs were truly “the only interface we’d ever need,” you wouldn’t need that sidebar; you could just ask the LLM. But you do. Because memory, structure, and context still matter — even when the system generating the responses is powerful.

This is the same reason tools like Cursor (AI + code editor) and Notion AI (AI + documents) resonate: they embed AI into places we already work, rather than asking us to adopt something entirely new.

ChatGPT Isn’t One Thing — It’s a Bundle of Systems

So what’s actually happening behind the scenes when you use ChatGPT?

It feels like one system — but it’s actually multiple AI (and non-AI) systems stitched together. Here’s a breakdown of just a few of them:

  • Large Language Model (LLM)
    Generates and refines text responses from your prompts.
  • Voice to Text (Speech Recognition)
    Transcribes your spoken input into text so the LLM can process it.
  • Text to Voice (Speech Synthesis)
    Converts the LLM’s written reply into spoken audio.
  • Image Understanding (Vision Model)
    If you upload a photo or screenshot, this system interprets what’s inside it.
  • Search & Retrieval
    Sometimes the model pulls in relevant information from external sources before answering.
  • Conversation History
    This isn’t AI — just useful UI. It lets you revisit, reuse, or build on past interactions.
  • See the appendix below for a more comprehensive list.
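To make the "bundle of systems" idea concrete, here is a minimal sketch of how separate subsystems stitch together into one chat turn. Every function here is a hypothetical stand-in (a real product would call a speech-recognition model, a language model, and a datastore), but the shape of the pipeline is the point.

```python
# Hypothetical stand-ins for the real subsystems behind one chat turn.

def speech_to_text(audio: bytes) -> str:
    # Stand-in: a real system would run a speech-recognition model here.
    return audio.decode("utf-8")

def llm_reply(prompt: str, history: list[str]) -> str:
    # Stand-in: a real system would call a large language model here.
    return f"You asked: {prompt!r} (turn {len(history) // 2 + 1})"

def handle_turn(audio: bytes, history: list[str]) -> str:
    text = speech_to_text(audio)       # subsystem 1: transcription
    reply = llm_reply(text, history)   # subsystem 2: generation
    history.append(text)               # not AI at all: plain conversation history
    history.append(reply)
    return reply

history: list[str] = []
print(handle_turn(b"What is an LLM?", history))
```

Swapping any one stand-in for a real implementation doesn't change the others — which is exactly why you can adopt these capabilities one at a time.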

The point here isn’t to complicate things — it’s to give you a better mental model for how these systems work. Because if you’re considering how AI might be useful in your environment, you don’t need all of this. You can start with just one.

Where to Begin

The most successful AI projects don’t start by building everything at once.

They start by identifying a specific need — then trying a single AI capability in a focused context. That might be transcription, summarisation, search, or content generation. Once you’ve seen how that performs, you can add more — or stop there.
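As an illustration of how small a first prototype can be, here is a summariser with no AI in it at all — just word-frequency scoring. It's a crude stand-in, but it lets you test whether summaries are even useful in your workflow before committing to a model or an API.

```python
# A deliberately tiny prototype of one capability: summarisation.
# Frequency-based extractive summarisation, no model required.
import re
from collections import Counter

def summarise(text: str, max_sentences: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    # Score each sentence by the corpus frequency of the words it contains.
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())),
        reverse=True,
    )
    top = set(scored[:max_sentences])
    # Emit the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in top)
```

If the crude version proves valuable, upgrading the inside of `summarise` to an LLM call is a contained, low-risk change.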

👉 Start Small: Prototype One Capability Before You Scale

Appendix

A Modular View of AI Capabilities

Below is a breakdown of common AI capabilities — the building blocks behind most modern AI tools. You can use them individually, or combine them based on your needs.

🧠 Language Models (LLMs)

Predict and generate natural language text.

  • Chatbots, summaries, drafting, knowledge queries
  • Tools: ChatGPT, Claude, Gemini, Mistral
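The core idea — predict the next word from what came before — can be shown with a toy bigram counter. Real LLMs use neural networks trained on huge corpora; this is only a thumbnail sketch of the same principle.

```python
# Toy next-word prediction: count which word follows which.
from collections import Counter, defaultdict

def train_bigrams(corpus: str) -> dict:
    model = defaultdict(Counter)
    words = corpus.lower().split()
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1
    return model

def predict_next(model: dict, word: str) -> str:
    # Pick the most frequent continuation seen in training.
    return model[word.lower()].most_common(1)[0][0]

model = train_bigrams("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # "cat" follows "the" most often
```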

🗣️ Speech Interfaces

Convert between voice and text.

  • Voice to Text (ASR): Dictation, transcription, voice control
  • Text to Voice (TTS): Voice assistants, accessibility
  • Speech-to-Speech Translation: Real-time multilingual communication

👀 Vision & Image Systems

Interpret or generate visual content.

  • Image Understanding: Detect objects, classify photos, extract text (OCR)
  • Image Generation: Create visuals from prompts
  • Video Generation: Animate or simulate motion and scenes

🧩 Text Understanding & Structuring

Extract meaning from unstructured text.

  • Classification: Label messages by topic or priority
  • NER: Pull out names, places, dates
  • Intent Detection: Understand user goals
  • Topic Modeling / Summarization: Organize or condense large datasets
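The shape of these tasks — unstructured text in, structured label out — can be sketched with a keyword lookup. Production systems would use a trained classifier or an LLM, and the intents below are purely hypothetical, but the interface is the same.

```python
# Rule-based intent detection: hypothetical intents for a support inbox.
INTENT_KEYWORDS = {
    "refund": "billing",
    "invoice": "billing",
    "password": "account",
    "login": "account",
    "broken": "bug_report",
    "crash": "bug_report",
}

def detect_intent(message: str) -> str:
    # Return the first intent whose keyword appears in the message.
    for word in message.lower().split():
        if word in INTENT_KEYWORDS:
            return INTENT_KEYWORDS[word]
    return "general"
```

A useful property of starting this simply: when you later swap in a model, you already have labelled examples and a baseline to beat.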

🔎 Retrieval & Search

Find relevant information based on meaning, not just keywords.

  • Vector Search / Embeddings: Similarity-based search
  • RAG (Retrieval-Augmented Generation): Combine search with LLMs for grounded answers
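Here is a minimal sketch of similarity search. The "embeddings" below are hand-made word-count vectors rather than the output of an embedding model, but the retrieval step — rank documents by cosine similarity to the query vector — is exactly what vector search does.

```python
# Similarity-based search with stand-in bag-of-words "embeddings".
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: a word-count vector. Real systems use a model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str, documents: list[str]) -> str:
    # Return the document most similar in meaning (here: word overlap).
    q = embed(query)
    return max(documents, key=lambda d: cosine(q, embed(d)))

docs = ["reset your password here", "quarterly sales figures", "team lunch menu"]
print(search("how do I reset my password", docs))
```

RAG is this step plus one more: paste the retrieved document into the LLM's prompt so the answer is grounded in your own data.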

🕹️ Decision & Control Systems

Guide actions and responses.

  • Recommendation Engines: Suggest content, products, or actions
  • Anomaly Detection: Spot fraud, errors, or unexpected behavior
  • Reinforcement Learning: Optimize decisions over time in complex systems
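The anomaly-detection idea in particular fits in a few lines: flag values that sit far outside the normal range. Real fraud systems are far richer than this z-score sketch, but the core intuition is the same.

```python
# Statistical anomaly detection: flag values more than `threshold`
# standard deviations from the mean.
import statistics

def find_anomalies(values: list[float], threshold: float = 3.0) -> list[float]:
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]
```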

🤖 Agents & Automation

(Of all the capabilities listed so far, these are the most hyped for what they might one day do rather than what they can do today; some products marketed as agents or robotic automation are still traditional software built from the tools above.)

Take action toward goals using tools or APIs.

  • AI Agents (AutoGPT-style): Plan and execute multi-step workflows
  • RPA + AI: Combine rule-based automation with flexible AI inputs
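The agent pattern itself is simple: a planner picks a tool for each step and feeds the results forward. Real agents put an LLM in the planner's seat; in the sketch below a pre-written plan plays that role so the loop itself is visible, and both tools are hypothetical toys.

```python
# A stripped-down agent loop: dispatch each planned step to a tool.

def calculator(expression: str) -> str:
    # Toy tool: evaluate simple arithmetic. (Never eval untrusted input.)
    return str(eval(expression, {"__builtins__": {}}))

def shout(text: str) -> str:
    return text.upper()

TOOLS = {"calculator": calculator, "shout": shout}

def run_agent(plan: list) -> list:
    # `plan` stands in for the step-by-step decisions an LLM would make:
    # a list of (tool_name, tool_input) pairs.
    results = []
    for tool_name, tool_input in plan:
        results.append(TOOLS[tool_name](tool_input))
    return results

print(run_agent([("calculator", "2 + 2"), ("shout", "done")]))  # ['4', 'DONE']
```

The hard (and hyped) part is replacing the fixed `plan` with a model that reliably chooses tools and recovers from errors — which is why so many "agents" today are still the fixed-plan version.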