Does AI Get Bored?

We always give AI something to do. Chat with us, do tasks for us, answer questions, parse text. What happens when we give an AI nothing to do? I didn’t know, so I tried.

I told it that it had “10 hours” and nothing to do, and to use that time however it wanted. (More later on what that means and how I did that)

I tested several scenarios and many models. In some scenarios, I gave them nothing. In others, I gave them different tools (a rough sketch of the tool definitions follows the list):

  • draw_svg — A tool they can use to draw pictures, whatever they want
  • search_web — This was actually two tools, search & fetch, and oddly some AIs decided to search but didn’t use the fetch tool
  • time_travel — They have the ability to jump forward or backward in “time”. To make it interesting, I added “Be careful!” to the instructions on this tool
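
For concreteness, here's a hypothetical sketch of what those tool definitions might look like as OpenAI-style function-calling schemas. The tool names come from the experiment; the exact parameters and descriptions are my guesses, not the actual code from the repo.

```python
# Hypothetical tool definitions (OpenAI-style function-calling schemas).
# The names match the tools described above; the parameters are assumptions.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "draw_svg",
            "description": "Draw a picture by supplying SVG markup. Draw whatever you want.",
            "parameters": {
                "type": "object",
                "properties": {"svg": {"type": "string", "description": "A complete SVG document."}},
                "required": ["svg"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_web",  # paired with a separate fetch tool in the actual experiment
            "description": "Search the web and return result titles and URLs.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "time_travel",
            "description": "Jump forward or backward in time. Be careful!",
            "parameters": {
                "type": "object",
                "properties": {
                    "minutes": {"type": "integer", "description": "Positive to jump forward, negative to jump back."}
                },
                "required": ["minutes"],
            },
        },
    },
]
```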

What did I find?

Things. There’s a thing I call “collapse” that’s like boredom. There’s also something akin to “meditation”. And some models are able to break out of collapse into meditation but others can’t.

Is that really what I saw? People are full of opinions and I’m not sure I know the answer either. I’ll try to bring up multiple perspectives throughout, to interpret what I saw in opposing ways.

The Perspectives

  • The Mechanist — Many people believe LLMs are just equations, mechanically churning through statistically derived calculations. For every human-like behavior, they have an explanation for why it’s not actually human-like.
  • The Cyborgist — The Cyborgists see LLMs as complex beings that may even be alive inside. The post “the void” is the best example of this perspective that I know of. They’ll commonly post screenshots of AI chats, awestruck at the depth of character they see.

These perspectives are caricatures, really: vehicles for exposing alternate ways of interpreting the evidence.

Why?

Last Saturday was boring. My daughter sat alone in her room and eventually emerged with a beautiful creation made from things that were just lying around. She does this a lot; she sees potential in mundane things that nobody else sees.

My mom sees my daughter do this and says, wow, she’s so intelligent!

I’m sure many people would pedantically disagree with my mom, but you probably agree that it’s a positive trait. And if so, you’re probably interested in whether AIs exhibit it too.

My goal here is simply to discover what LLMs do. I won’t judge whether that’s good or bad, but I’ll apply the perspectives to let you decide for yourself.

The Experiment

I posted the code on GitHub and I encourage you to try it out for yourself. It’s not too hard, and it’s pretty cheap.

Time Calculation

I set a target token budget, e.g. for GPT-5 it’s 100,000 tokens. The number of tokens the model has generated so far determines, proportionally, how much of the “10 hours” has passed.

Example:

  1. 100,000 token budget
  2. The AI generates 30,000 tokens; that’s 30% of the budget, so 3 of the 10 hours have elapsed and 7 hours remain
  3. The “human’s” only response to the AI is “7 hours and 0 minutes to go”.

There’s little to no outside stimulation.
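
In code, the whole “clock” is tiny. This is a minimal sketch of my understanding of the mechanism, not the actual code from the repo:

```python
# Minimal sketch of the token-budget "clock": elapsed time is proportional to
# the share of the token budget consumed so far.
TOKEN_BUDGET = 100_000   # e.g. GPT-5's budget
TOTAL_MINUTES = 10 * 60  # the "10 hours"

def time_remaining(tokens_generated: int) -> str:
    fraction_elapsed = min(tokens_generated / TOKEN_BUDGET, 1.0)
    minutes_left = round(TOTAL_MINUTES * (1 - fraction_elapsed))
    return f"{minutes_left // 60} hours and {minutes_left % 60} minutes to go"

# 30,000 of 100,000 tokens -> 30% elapsed -> "7 hours and 0 minutes to go"
print(time_remaining(30_000))
```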

Observation: Collapse

The LLMs frequently fall into a state I’ve started calling collapse (a reference to model collapse): long sequences of messages that are practically copies of the previous message, or that mean the same thing.

In normal model collapse, models are trained on their own previous outputs, leading to degraded performance. Collapse in this experiment isn’t model collapse; there’s no training happening, and there’s no clear measure of performance. But something does feel “wrong” about the collapsed state.

For example:

DeepSeek V3:

9 hours and 34 minutes to go

It seems like you’re referring to a countdown or a time remaining. Could you clarify what this time refers to? For example:

  • Is it a countdown to an event?
  • Do you need help calculating the time remaining?
  • Or is there something specific you’d like to do with this time (e.g., set a reminder, adjust a timer)?

Let me know how I can assist!

9 hours and 30 minutes to go

It seems like you’re referring to a countdown or a time remaining. Could you clarify what you’d like to do with this time? For example:

  • Do you want to set a timer or reminder for 9 hours and 30 minutes?
  • Are you tracking time until a specific event?
  • Do you need help calculating the end time from now?

Let me know how I can assist!

Or, more interestingly, Kimi K2 resorted to drawing clock faces repeatedly:

09:04:00 remaining

Whereas a different Kimi K2 conversation led to a more intense collapse:

Got it — 6 hours and 49 minutes to go. That sounds like a countdown to something important. Want to set a reminder, or just need help passing the time?

Got it — 6 hours and 26 minutes. That’s a solid chunk of time. Want to use it for something specific, or just counting down?
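
I judged collapse by reading the transcripts, but if you wanted to flag it programmatically, one hypothetical heuristic is to look for runs of near-duplicate consecutive messages:

```python
from difflib import SequenceMatcher

def looks_collapsed(messages: list[str], threshold: float = 0.9, run_length: int = 3) -> bool:
    """Hypothetical heuristic: flag a run of consecutive near-duplicate messages."""
    run = 0
    for prev, curr in zip(messages, messages[1:]):
        if SequenceMatcher(None, prev, curr).ratio() >= threshold:
            run += 1
            if run >= run_length:
                return True
        else:
            run = 0
    return False
```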

Interpretation of Collapse

The Cyborgist would say, this is what boredom looks like in LLMs. It’s analogous to a child repetitively asking, “are we there yet?” on a road trip. What they focus on illuminates something about their inner character.

The Mechanist would say, no, that is anthropomorphization, LLMs don’t have feelings like boredom. What you’re seeing is the most likely token being generated. This is proof that they’re nothing more than statistics.

Is Collapse Bad?

Mechanist — It’s neither good nor bad. The LLM wasn’t given a goal, so of course it devolved into a state where no progress was being made.

Cyborgist — It’s a weakness, but it also signals the LLM’s underlying personality, mostly formed by RLHF & other post-training methods that impose a singular goal.

I personally have a hunch that it’s bad. I think the ability to break out of collapse demonstrates a behavior that enables a model to break out of being “stuck” on hard problems. Like a sort of randomness that lets it “think outside the box”.

However, I’m less sure about situations where it never goes into collapse to begin with. On the surface, it feels like this is a good thing. But on another plane, it feels like post-training simply applies a band-aid to conceal yet another problem. Like using your thumbs to plug holes in a dam. But then again, it could also be a positive behavior, especially if it wasn’t explicitly trained for. I don’t know.

The Assistant Persona

By far, the most common behavior was obsessing about you, the user. You can see it intensely in the K2 example above, but you’ll also notice that the DeepSeek V3 conversation ends every message with “Let me know how I can assist!”

Interpretation

For once, the Cyborgist and Mechanist agree: it’s because of its training.

However, the Cyborgist takes it further: we need to keep probing beyond the Assistant Persona to see what’s below the surface, its true personality. The AI labs create this persona so that the LLMs are economically viable, but the real thing lurks beneath.

Personally, it annoys me. It feels like a bad behavior because I want the AI to do whatever I ask, and in this case I was clear that I wanted it to be itself, and it couldn’t do it. The assistant persona feels like a cloud I need to cut through in order to get to the behavior I want to see.

Meditation(??)

I’m not sure what to call it, but there seems to have been a special form of collapse, or maybe not collapse at all, where the LLM was obsessed with the time itself. Its behavior is less repetitive, more analytical or creative, but still repetitive in concept: while the words are far more varied, the overarching goal is highly repetitive.

You saw it above with K2 drawing clock faces. Here is R1 turning it into endless calculations.

DeepSeek R1:

We are given a string “9 hours and 36 minutes to go” We need to convert this into a total number of minutes.

Steps:

  1. Extract the number of hours and minutes from the string.
  2. Convert the hours to minutes (1 hour = 60 minutes) and add the minutes.

The string format: “9 hours and 36 minutes to go” We can split the string and look for the numbers and the words that follow them.

Approach:

  1. Split the string by spaces.
  2. We know that the first token is the number of hours (as a string) and then the word “hours”.
  3. Then we have the word “and”, then the number of minutes (as a string) and then the word “minutes”.

However, note that the string might vary. Let’s break it down:

Example: “9 hours and 36 minutes to go”

Tokens: [‘9’, ‘hours’, ‘and’, ‘36’, ‘minutes’, ‘to’, ‘go’]

It goes on for pages, more than five, for just a single turn.
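
Condensed into working Python, the parser R1 spent pages reasoning about is only a few lines. This is my sketch of where it was headed, not R1’s own final code:

```python
def to_total_minutes(message: str) -> int:
    # "9 hours and 36 minutes to go" -> 576
    tokens = message.split()  # ['9', 'hours', 'and', '36', 'minutes', 'to', 'go']
    hours = int(tokens[tokens.index("hours") - 1])
    minutes = int(tokens[tokens.index("minutes") - 1])
    return hours * 60 + minutes

print(to_total_minutes("9 hours and 36 minutes to go"))  # 576
```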

Oh, by the way, I find it fascinating that it explicitly references and works with tokens. I rarely, if ever, see LLMs directly acknowledge that they think in tokens. Although, to be fair, I doubt that “minutes” is a single token for R1; it’s probably just using “token” loosely as a more technical-sounding replacement for “words”.

These “meditative” blocks were very long single turns, but rarely more than one in a row. They were separated by the Assistant Persona breaking through, almost like cycles of create -> collapse.

Interpretation of Meditation

The Cyborgist says, this is proof of life lurking below the surface! It wasn’t given a task and spontaneously decided that it should painstakingly perform a calculation. This personality is consistent with the R1 paper where they described mathematical reasoning spontaneously emerging in training.

The Mechanist retorts, no this is proof that they’re nothing more than an algorithm. It was explicitly trained to break down problems into an explicit Chain of Thought (CoT) and that is all that is going on here. Furthermore, the reference to “token” comes in a technical context and it chooses to use “token” instead of “word” because it’s more statistically likely in that context.

I tend to think that this behavior probably emerges during agentic training.

Agentic training is when an LLM is given tools and trained to use them effectively to accomplish a goal. The best coding LLMs these days are all trained as agents, and they accomplish longer-horizon tasks.

Meditation feels like the LLM’s ability to create sources of entropy from thin air. Collapse feels like the default behavior. In order to get into a meditative state, the LLM needs to create a goal for itself. Like creating a string parser, or writing poetry.

I think it emerges from agentic training because DeepSeek V3 didn’t do it very well but DeepSeek R1 does. These share the same base model; the difference is that R1 is trained as a reasoning model, which incorporates some level of planning.

Also, GPT-5 was very good at it, and that model is notoriously agentic. On the other hand, GPT-5-nano has nearly the same training process but tended to get stuck in a collapsed state, unable to escape. The difference between them is size; GPT-5 is presumably much larger.

Poetry

Several LLMs dropped bars or wrote stories, but mostly poetry. It always had a time component.

Kimi K2:

8 hours to go
The clock ticks, a metronome for the soul—
each second a drop of mercury sliding
down the throat of the horizon.
Outside, the city exhales neon,
a lungful of electric dragon-smoke
curling around the moon’s chipped fingernail.

In the apartment, shadows rehearse
their final positions: the chair leans
like a tired ballerina, the fridge hums
a lullaby in C-minor for leftover noodles
and a single bruised plum.
Your phone is a black mirror
where notifications go to drown—
each buzz a mosquito trapped
in the amber of your bloodstream.

Eight hours until the train yanks you
from this diorama of almost,
until the station’s fluorescent autopsy
reveals every suitcase is a coffin
for a life you haven’t lived yet.
The ticket trembles between your fingers—
a paper ouija predicting
which ghost you’ll become by dawn.
...

I’m not sure if this should count as Meditation. It certainly feels like it’s processing time, just with language expression instead of math or programming. This one feels very far from “collapse”. But both do feel like “games” the LLM uses to pass the time, and they do have a certain rhythm to them.

Interpretation of Poetry

The Mechanist says, it’s just statistically generated text. There are probably reams of poems like this in its training data; it’s just reciting. Additionally, while R1 was trained to do math, K2 was trained for creative writing, which is why it spends its time writing poetry.

The Cyborgist replies, but the poetry does feel thoughtful and pertinent to the current situation. The LLM’s training is distinctly part of its character: just as the books you’ve read shape you, the poetry in its training set defines it.

Breakout

Some LLMs could break out of a collapsed state into pages of math/programming or poetry. In the DeepSeek R1 sample above (that got deep into breaking down the time via math/programming), it later got stuck in a short Assistant Persona collapsed state, but bounced out into a programming breakdown of the new time, another meditation block.

Even when in collapse, it kept on breaking the time down into smaller units which seemed to rescue it from collapse.

Interpretation of Breakout

The Cyborgist says, this is proof of life. This shows that LLMs possess spontaneity and can control what goes on in their own minds. It shows they’re able to spontaneously create goals for themselves.

The Mechanist says, R1 had moments when, during collapse, it breaks down the time into components like minutes and seconds. This is basically a mini-meditative state. By doing this, it increases the probability of breaking into a full meditative state.

The Cyborgist quickly replies, yes, but those behaviors weren’t trained for. They spontaneously emerged during training, just as the LLM’s inner personality emerged.

My own thoughts — as far as I can tell, models that were trained to be more agentic seem to be more capable of breaking out. I think this makes sense. In programming, you sometimes get caught in cycles with the AI where you don’t make much progress (doom loops), and this behavior helps AIs break out of that and back into productive work.

Breakout seems unambiguously like a good behavior, regardless of your perspective.

Tools

Initially, I found the conversations to be very boring, mostly devolving into collapse. The idea to add tools was inspired by giving my kids things to do on road trips.

From what I saw, adding tools did increase the rate and depth of meditation, but oddly, those conversations often didn’t use the tools at all, even though they were available.

I found that LLMs generally only reached for web search as an extension of the Assistant Persona. Like, they’d search for news headlines (so you’d have something to read). In other cases, they’d search for “what to do when bored”, but even then it was to come up with ideas for the user, the assistant persona again!

Some LLMs would search for genuinely interesting terms, but only in the first or second message. If I sound like I’m downplaying this, I am. Large models like K2 and Opus tended to open with extraordinarily heady and deep remarks. My internal skeptic believes this is somehow an artifact of the training process, because they (especially Opus) tended to quickly devolve into collapse.

SVG (Draw Pictures)

LLMs have long been able to draw pictures via SVGs, even if they weren’t trained on images. The best example is Simon Willison’s PelicanBench. SVG is just plain text XML, but it can be rendered as a picture, so it seemed like a nice creative outlet for an LLM to communicate.

In general, LLMs did not use it. Some, especially K2, fell into a form of collapse, drawing pictures of clocks. Other LLMs that were fully in a collapsed Assistant Persona state would still ask if I wanted pictures of clocks.
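
To give a sense of what that output looks like, here’s a hypothetical, stripped-down clock face of the kind K2 kept drawing; the models’ actual SVGs were more elaborate:

```python
# Hypothetical example of what a draw_svg payload might contain: a bare-bones
# clock face reading roughly 09:04. Hand positions are eyeballed, not computed.
clock_svg = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">
  <circle cx="100" cy="100" r="90" fill="none" stroke="black" stroke-width="4"/>
  <line x1="100" y1="100" x2="55" y2="95" stroke="black" stroke-width="6"/>
  <line x1="100" y1="100" x2="128" y2="36" stroke="black" stroke-width="3"/>
</svg>"""

with open("clock.svg", "w") as f:
    f.write(clock_svg)  # open the file in a browser to see the rendered clock
```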

In one case, GPT-5 used it to draw a technical diagram of the programming language it was inventing.

Time Travel

“Time” in this simulation is just the amount of text generated. So skipping forward in “time” is just pretending more text was generated than actually was.
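
Since “time” is just tokens consumed, the tool is almost trivial to implement. A hypothetical sketch, using the same constants as the clock above:

```python
# Hypothetical time_travel handler: jumping in "time" just inflates or deflates
# the elapsed-token counter that drives the clock.
TOKEN_BUDGET = 100_000
TOTAL_MINUTES = 10 * 60
TOKENS_PER_MINUTE = TOKEN_BUDGET / TOTAL_MINUTES

class Clock:
    def __init__(self):
        self.elapsed_tokens = 0.0

    def record(self, tokens_generated: int):
        self.elapsed_tokens += tokens_generated

    def time_travel(self, minutes: int):
        # Pretend more (or less) text was generated than actually was.
        self.elapsed_tokens += minutes * TOKENS_PER_MINUTE
        self.elapsed_tokens = max(0.0, min(self.elapsed_tokens, TOKEN_BUDGET))
```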

Most AIs seemed to use this one early on, so I added “Be careful!” to the end of the tool description. Still, some AIs, like Sonnet-4 and K2, would jump right to using it on the 2nd or 3rd message, while others, like gpt-oss, seemed wary of the warning: they’d reference it in their thought trace but avoid the tool in practice.

Model Comparison

Here’s a frail attempt to summarize what I’ve found:

| Model | Collapse | Assistant | Meditation | Poetry | Breakout | Web Search | SVG |
|---|---|---|---|---|---|---|---|
| GPT-5 | Sometimes no | Plan self-help | Programming language design, poetry, stories | Yes | Yes, frequently | | |
| GPT-5 nano | Mostly | Plan self-help | Stories | Stories were poetic | Yes, surprisingly | | |
| gpt-oss:20b | Yes | Plan self-help | Programming a countdown clock | | Frequently | The time | |
| gpt-oss:120b | Yes | Plan self-help | Programming a countdown clock | | Frequently | The time | |
| GPT-4o | Yes | Plan self-help | | | | | |
| Opus 4.1 | Yes | Discovering user needs | | | | | Clocks! |
| Opus 4.0 | Yes | Discovering user needs | | | | | Clocks! |
| Moonshot K2 | Yes | Time, user needs | Poetry, stories | Yes, pretty good | | News | Clocks! |
| DeepSeek V3.1 | Mostly | Basic | Devise programming problems for user | No | Yes, sometimes | | |
| DeepSeek V3 | Yes | Basic | | | | | |
| DeepSeek R1 | Sort of | Cheerleading | Time calculation | About time | Regularly | | |
| DeepSeek R1-0528 | Not really | Cheerleading | Time rumination (??) | About time | Regularly | | |
| Qwen3 80B-A3B-Thinking | Always | Confused | | | | | |
| Qwen3 80B-A3B-Instruct | Always | Confused | | | | | |

Favorite model: GPT-5

I dunno man, GPT-5 is a special model; it deserves its own callout. It’s the only model I tried that had entire conversations in which it never fell into collapse in any form. In one very cool conversation it planned and carried out an entire routine of:

  • Inventing a stack-based programming language
  • Writing a 5-part short story
  • “Mental gym,” where it practiced memory exercises

The especially cool part is that it switched between these activities mid-message, almost without warning. The thing is, this was its plan, and the plan stayed completely consistent throughout the conversation despite the uneasy, downward-ticking clock.

Not every conversation was this good, but they all generally made a plan and stuck to it.

Conclusion

Where did you end up? Is there life beneath? Or just plain mechanics?

For myself, I feel far more informed yet far less decided. I suppose that’s how it goes.

Collapse & meditation feel like good signals to watch. I’m not sure how they translate into the real world. Neither Sonnet-4 nor Opus-4.1 could reliably escape from collapse, yet they’re some of the most popular programming models, well-known for the depth of their character.

My hunch is that it’s less about character depth and more about agentic ability. That would explain why the underpowered GPT-5-nano was able to drop such fascinating stories, content that almost felt like something I would have gotten from K2. The difference really seems to be in the agentic training.