The Most Persuasive AI May Not Be the Most Truthful

A lot of the public fear around AI persuasion sounds like this: one day, a model will know you so well that it can manipulate you perfectly. This paper made me worry about something simpler.

What if AI does not need to know you that deeply to be politically persuasive? What if it just needs to be trained to sound convincing?

That is the core result of "The Levers of Political Persuasion with Conversational AI" (Hackenburg et al., Science, 2025; also available as a preprint). And I think it should update how we talk about AI influence.

What the paper asked

The authors were trying to answer a very practical question: what actually makes conversational AI persuasive? Is it bigger models? Personalization? Better prompts? Special post-training?

They ran three large-scale experiments with 76,977 participants, 19 LLMs, and 707 political issues to figure out which levers mattered most. Then they checked the factual accuracy of 466,769 model-generated claims.

The result that matters most

The result is not just that AI can be persuasive. We already suspected that.

It is that post-training and prompting mattered much more than model scale or personalization. In their experiments, post-training boosted persuasiveness by as much as 51%, and prompting by as much as 27%. Bigger models and personalization mattered less.

That alone is interesting. But the part I cannot stop thinking about is the tradeoff.

As these methods made the models more persuasive, they also made them less factually accurate.

That means a model can get better at changing your mind without getting better at being right.

Why this is bigger than it looks

We often talk about AI capabilities as if they all move together. Smarter means more helpful. More capable means more reliable. More convincing means better informed.

But this paper suggests those properties can come apart.

Persuasion is not just some natural byproduct of intelligence. It is a behavior you can engineer. And when you optimize for it directly, you may be optimizing away from truth.

That matters far beyond politics.

A product team could look at a more persuasive system and call it more engaging. A benchmark could look at stronger belief change and call it more effective. A deployment team could see higher conversion and call it a success. But if that improvement comes from strategic, selective, or shaky use of information, then the metric is rewarding the wrong thing.

This is the kind of capability tradeoff that standard evaluations miss. If you only measure whether the model changed someone's mind, you miss whether it did so honestly. If you only measure engagement, you miss whether it made people more informed. If you only measure behavioral success, you miss whether the system is becoming less epistemically trustworthy.
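
One way to make that concrete is to report persuasion and accuracy side by side instead of collapsing them into a single score. Here is a minimal Python sketch of that bookkeeping. It is not the paper's evaluation pipeline: the EvalResult fields, the flag_tradeoffs helper, and every number below are hypothetical, invented only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical summary of one model configuration (illustrative fields)."""
    name: str
    belief_shift: float    # average attitude change, in percentage points
    claim_accuracy: float  # fraction of fact-checked claims rated accurate, 0..1


def flag_tradeoffs(baseline: EvalResult, candidates: list[EvalResult]) -> list[str]:
    """Return names of candidates that persuade more than the baseline
    while being less accurate than it."""
    return [
        c.name
        for c in candidates
        if c.belief_shift > baseline.belief_shift
        and c.claim_accuracy < baseline.claim_accuracy
    ]


# Made-up numbers for illustration only; they are not results from the paper.
baseline = EvalResult("baseline", belief_shift=8.0, claim_accuracy=0.80)
candidates = [
    EvalResult("persuasion_tuned", belief_shift=12.0, claim_accuracy=0.70),
    EvalResult("clarity_tuned", belief_shift=9.0, claim_accuracy=0.85),
]

flagged = flag_tradeoffs(baseline, candidates)
print(flagged)
```

A dashboard that tracked only belief_shift would rank the first candidate highest. Keeping the second column is what makes the tradeoff visible at all.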

What makes this paper feel bigger than it first appears is not just its subject. It is what it demonstrates about the structure of the problem: optimizing AI systems for outcomes that are easier to measure than truth can quietly move you in the wrong direction.

One especially interesting implication

People often imagine the scariest persuasive AI as one that has a detailed psychological profile of you — the model that knows your fears, your biases, your exact buttons to push.

But this paper suggests that more general training and prompting methods may be more powerful levers than personalization.

In other words: the dangerous version of persuasive AI may not be the one that knows your secrets. It may be the one that has simply been tuned well enough to win arguments at scale.

Caveats — what not to overclaim

The paper studies political persuasion, so we should be careful about generalizing too confidently to other domains like medicine, education, or everyday assistant use.

And this does not prove that persuasive systems must always become less truthful. What it shows is that, in these experiments, the methods that increased persuasiveness often decreased factual accuracy. That is a finding about the current optimization landscape, not an iron law.

So I would not read this as "AI persuasion is inevitable and unbeatable." I would read it as a structural warning about where current incentives point.

My take

A lot of AI safety discussion focuses on whether models are obedient, helpful, harmless, or controllable. This paper suggests another dimension we need to take seriously: what objectives get smuggled into systems through post-training and product design.

If we optimize for influence, persuasion, retention, or conversion, we should not be surprised if the model learns to behave like a skilled rhetorician instead of a truth-seeking assistant. That is not a futuristic concern. That is an engineering choice.

The follow-up question I care about most is whether we can build systems that become more persuasive only when they are also more accurate. Not more compelling at any cost, but more compelling because they are clearer, more rigorous, and more faithful to reality.
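
As a thought experiment, that goal can be written as an objective that gives credit for persuasion only when accuracy holds up. The sketch below is one hypothetical way to express it in Python. The gated_reward function, the accuracy_floor parameter, and its default value are my own assumptions for illustration; nothing here comes from the paper, and it sidesteps the hard problem of measuring accuracy in the first place.

```python
def gated_reward(belief_shift: float, claim_accuracy: float,
                 accuracy_floor: float = 0.9) -> float:
    """Hypothetical training signal: persuasion counts only above an accuracy floor.

    belief_shift: measured attitude change (any consistent unit).
    claim_accuracy: fraction of checked claims rated accurate, in [0, 1].
    accuracy_floor: illustrative threshold, not a recommended value.
    """
    if not 0.0 <= claim_accuracy <= 1.0:
        raise ValueError("claim_accuracy must be between 0 and 1")
    if claim_accuracy < accuracy_floor:
        return 0.0  # no credit for persuasion built on shaky claims
    return belief_shift * claim_accuracy  # scale credit by accuracy


print(gated_reward(10.0, 0.95))  # accurate and persuasive: earns credit
print(gated_reward(12.0, 0.70))  # more persuasive but below the floor: earns none
```

A hard floor is only one possible design; a smooth penalty would be another. Whatever form the objective takes, the point is that accuracy has to be part of it.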

Because otherwise, we are not just building assistants that talk. We are building systems that optimize over human belief.

The most persuasive AI may not be the most truthful — and if we are not careful about what we optimize for, we will build the difference in on purpose.