A lot of people talk about ChatGPT as a creativity tool. This paper made me think of it more like a cognitive prosthetic with a hidden cost.
The paper's core claim is simple: ChatGPT can boost creative performance while you're using it, but that boost disappears when the tool is taken away. What remains is something stranger — people's ideas become more similar to each other, and that sameness persists even after they stop using the model.
That is a much more interesting result than "AI helps" or "AI hurts." It suggests a capability tradeoff that short-term benchmarks can easily miss. If you only measure whether people perform better with AI in the moment, the system looks like a success. If you care about diversity of thought, dependence, or whether users are actually developing stronger independent cognition, the picture changes.
What the paper asked
The authors were trying to answer a question that feels increasingly important: when ChatGPT improves creative performance, is that a lasting form of augmentation, or just a temporary lift?
A lot of the public conversation around generative AI assumes the stronger story: the tool helps you do better work, so over time it should also make you better at the work itself. This paper pushes hard against that assumption.
Instead, it asks whether the benefits persist once the tool is removed — and whether using the tool changes not just the quality of what people produce, but the shape of their thinking.
What they actually did
The study followed 61 college students in a pre-registered seven-day lab experiment, with a 30-day follow-up. One group used ChatGPT for creative tasks. The control group worked without it.
Across the study, participants produced thousands of ideas and hundreds of creative solutions. The point was not just to see whether ChatGPT users performed better while the model was available, but whether those gains stayed with them later.
That second question is what makes the paper interesting.
The result that matters most
During the days when participants had access to ChatGPT, their creative performance improved. But once the tool was removed, that advantage vanished. On the later "AI down" day, when ChatGPT users had to work without the tool, and again at the 30-day follow-up, performance dropped back toward baseline.
So the boost did not appear to become an internalized skill.
But the more surprising result is that the homogenization effect remained. The ideas produced by ChatGPT users became increasingly similar to one another, and that reduced diversity persisted even after the tool was gone.
That is the part I keep coming back to.
The performance gain was temporary. The sameness was not.
Why this is bigger than it looks
In practice, "homogeneity persists" means the model may not just be helping people create. It may be shaping the space of ideas they reach for.
If many users rely on the same system, and the system tends to surface statistically likely, polished, broadly acceptable outputs, then people may start converging on the same patterns. Not because they explicitly choose to imitate one another, but because they are all thinking with the same tool.
That matters far beyond a creativity study.
It matters for writing, learning, brainstorming, research, education, design, even political discourse. A system can increase each individual user's productivity while quietly compressing the variation across all of its users. And if that compression persists after the tool is removed, then the effect is not just on workflow. It is on cognition.
This is exactly the kind of tradeoff that standard evaluations miss. Most benchmark culture is built around immediate performance: faster, better, higher-scoring outputs. But a system can win on those metrics and still be doing something corrosive underneath — narrowing the range of human thought, creating dependence, or reducing the chance that people develop original approaches on their own.
If you only ask "did performance improve?", you miss the more important question: what kind of thinker does this tool produce over time?
Caveats — what not to overclaim
This does not mean "ChatGPT destroys creativity," and I do not think this paper supports that kind of sweeping claim.
The study used a relatively small sample of 61 college students over a fairly bounded time horizon: seven days in the lab plus a 30-day follow-up. That is enough to make the result interesting, but not enough to settle what happens across professions, age groups, or longer-term real-world use.
It also does not show that every AI interface will have the same effect. Different prompting strategies, different workflows, or tools explicitly designed to provoke divergence rather than smoothness could matter a lot.
So I would not read this as a universal verdict on AI and creativity.
I would read it as a warning shot.
My take
What makes this paper important to me is that it reframes the alignment question slightly.
We often talk about aligning AI systems to human intentions. But there is another dynamic here: humans may begin aligning themselves to the model.
Not politically or ideologically — at least not in this paper. Cognitively.
When a model becomes a default partner in creative or intellectual work, people may begin adapting to its patterns — its preferred framings, its average-case style, its smoothness. The danger is not only that the model outputs something wrong. It is that people start orbiting its center of gravity.
That has implications for AI safety, education, and model evaluation.
For safety, it suggests that one under-discussed risk is loss of cognitive diversity. If many people increasingly rely on the same underlying systems, society may become more coordinated in some ways, but also more brittle and less original.
For education, it raises a real question about what it means to "use AI as a study aid." If the performance boost disappears when the tool goes away, then we should be careful about confusing assisted output with learning.
For evaluation, it means we need better metrics than short-term task success. We should be measuring retention, independence, divergence, transfer, and whether a tool expands the user's thinking or merely scaffolds it temporarily.
That feels especially relevant right now, because we are deploying these systems into classrooms and workplaces long before we have good ways to measure their downstream cognitive effects.
AI may give people a temporary lift — but if we are not careful, it could leave us all thinking in narrower, more similar ways.