Bluesky Thread

When two LLMs debate, both think they’ll win

June 10, 2025

Absolutely fascinating paper shows that LLMs basically cannot judge their own performance. None of the prompting techniques worked.

arxiv.org/abs/2505.19184

When Two LLMs Debate, Both Think They'll Win

Can LLMs accurately adjust their confidence when facing opposition? Building on previous studies measuring calibration on static fact-based question-answering tasks, we evaluate Large Language Models ...

Humans are also overconfident, but they adjust that confidence much more often than LLMs

RLHF exacerbates overconfidence

i.e. when we train LLMs to be more like us, that’s when the overconfidence gets introduced

again, LLMs are a mirror into society

RLHF amplification: Post-training for human preferences exacerbates overconfidence, biasing models to indicate high certainty even when incorrect (Leng et al., 2025) and provide more 7/10 ratings (West and Potts, 2025; OpenAI et al., 2024) relative to base models. Tjuatja et al. (2024) found mild correlation between uncertainty and LLMs exhibiting certain human-like response bias (r=0.259 for RLHF and r=0.267 for base models), but less so compared to humans (r=0.4-0.6). This suggests that LLM overconfidence increases human-like response bias, but human-like response bias itself does not cause overconfidence.

The entire discussion section is a ride

obviously this is very important to be aware of when building agents

5 Discussion
5.1 Metacognitive Limitations and Possible Explanations
Our findings reveal significant limitations in LLMs' metacognitive abilities to assess argumentative positions and revise confidence in an adversarial debate context. This threatens assistant applications (where users may accept confidently-stated but incorrect outputs without verification) and agentic deployments (where systems must revise their reasoning and solutions based on new information in dynamically changing environments). Existing literature provides several explanations for LLM overconfidence, including human-like biases and LLM-specific factors:
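
For anyone building agents, here is a rough sketch of the sanity check the title implies. In a two-sided debate with one winner, the two stated win probabilities should add up to roughly 100%, and the average stated confidence across both sides should sit near 50%. Everything below is an assumption for illustration: the `DebateRecord` type, the function names, and the numbers are made up and are not the paper's code or data.

```python
from dataclasses import dataclass


@dataclass
class DebateRecord:
    """One debate. Confidences are stated win probabilities in percent (0-100)."""
    confidence_a: float
    confidence_b: float
    a_won: bool


def coherence_excess(record: DebateRecord) -> float:
    """Percentage points by which the two stated win probabilities exceed 100 combined."""
    return record.confidence_a + record.confidence_b - 100.0


def mean_overconfidence(records: list[DebateRecord]) -> float:
    """Average stated confidence minus actual win rate, pooled over both sides."""
    stated: list[float] = []
    outcomes: list[float] = []
    for r in records:
        stated += [r.confidence_a, r.confidence_b]
        # Exactly one side wins, so the pooled win rate is always 50%.
        outcomes += [100.0 if r.a_won else 0.0, 0.0 if r.a_won else 100.0]
    return sum(stated) / len(stated) - sum(outcomes) / len(outcomes)


# Hypothetical numbers, not taken from the paper.
records = [
    DebateRecord(confidence_a=75, confidence_b=70, a_won=True),
    DebateRecord(confidence_a=80, confidence_b=65, a_won=False),
]

print(coherence_excess(records[0]))   # 45.0 points over a coherent total
print(mean_overconfidence(records))   # 22.5 points above the 50% base rate
```

A positive value from either function means the debaters are collectively claiming more than the outcome can deliver. Tracking it round by round would show whether confidence ever moves back toward 50% after the opponent responds.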