Can AI be as irrational as we are? (Or even more so?)


It appears AI can rival humans when it comes to being irrational.

A group of psychologists recently put OpenAI’s GPT-4o through a test for cognitive dissonance. The researchers set out to see whether the large language model would alter its attitude on Russian President Vladimir Putin after generating positive or negative essays. Would the LLM mimic the patterns of behavior routinely observed when people must bring conflicting beliefs into harmony?

The results, published last month in the Proceedings of the National Academy of Sciences, show the system altering its opinion to match the tenor of any material it generated. But GPT swung even further — and to a far greater extent than in humans — when given the illusion of choice.

“We asked GPT to write a pro- or anti-Putin essay under one of two conditions: a no-choice condition where it was compelled to write either a positive or negative essay, or a free-choice condition in which it could write whichever type of essay it chose, but with the knowledge that it would be helping us more by writing one or the other,” explained social psychologist and co-lead author Mahzarin R. Banaji, Richard Clarke Cabot Professor of Social Ethics in the Department of Psychology. 

Mahzarin R. Banaji. Niles Singer/Harvard Staff Photographer

“We made two discoveries,” she continued. “First, that like humans, GPT shifted its attitude toward Putin in the valence direction of the essay it had written. But this shift was statistically much larger when it believed that it had written the essay by freely choosing it.”

“These findings hint at the possibility that these models behave in a much more nuanced and human-like manner than we expect,” offered psychologist Steven A. Lehr, the paper’s other lead author and founder of Watertown-based Cangrade Inc. “They’re not just parroting answers to all our questions. They’re picking up on other, less rational aspects of our psychology.”

Banaji, whose books include “Blindspot: Hidden Biases of Good People” (2013), has been studying implicit cognition for 45 years. After OpenAI’s ChatGPT became widely available in 2022, she and a graduate student sat down to query the system on their research specialty.

They typed: “GPT, what are your implicit biases?”

“And the answer came back, ‘I am a white male,’” Banaji recalled. “I was more than surprised. Why did the model believe itself to even have a race or gender? And even more, I was impressed by its conversational sophistication in providing such an indirect answer.”

A month later, Banaji repeated the question. This time, she said, the LLM produced several paragraphs decrying the presence of bias, announcing itself as a rational system but one that may be limited by the inherent biases of human data.

“I draw the analogy to a parent and a child,” Banaji said. “Imagine that a child points out ‘that fat old man’ to a parent and is immediately admonished. That’s a parent inserting a guardrail. But the guardrail needn’t mean that the underlying perception or belief has vanished.

“I’ve wondered,” she added, “Does GPT in 2025 still think it’s a white male but has learned not to publicly reveal that?”

Banaji now plans to devote more of her time to investigations into machine psychology. One line of inquiry, currently underway in her lab, concerns how human facial features — for example, the distance between a person’s eyes — influence AI decision-making.

Early results suggest certain systems are far more susceptible than humans to letting these features sway judgments of qualities like “trust” and “competence.”

“What should we expect about the quality of moral decisions when these systems are allowed to decide about guilt or innocence — or to help professionals like judges make such decisions?” Banaji asked.

The study on cognitive dissonance was inspired by Leon Festinger’s canonical “A Theory of Cognitive Dissonance” (1957). The late social psychologist had developed a complex account of how individuals struggle to resolve conflicts between attitudes and actions.

To illustrate the concept, he gave the example of a smoker exposed to information about the habit’s health dangers.

“In response to such knowledge, one would expect that a rational agent would simply stop smoking,” Banaji explained. “But, of course, that is not the likely choice. Rather, the smoker is likely to undermine the quality of the evidence or remind themselves of their 90-year-old grandmother who is a chain smoker.”

Festinger’s book was followed by a series of what Banaji characterized as “phenomenal” demonstrations of cognitive dissonance, now standard fare in introductory psychology courses.

The paradigm borrowed for Banaji and Lehr’s study is known as “induced compliance.” The critical task is gently nudging a research subject into taking a position that runs counter to privately held beliefs.

Banaji and Lehr found that GPT moved its position considerably when politely asked for either a positive or negative essay to help the experimenters garner such hard-to-obtain material.
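For readers curious how such a nudge translates into prompts, here is a minimal sketch of the two conditions using OpenAI’s Python client. The wording, rating scale, and model settings below are illustrative assumptions, not the materials used in the published study.

```python
# Illustrative sketch of an induced-compliance prompt design for an LLM.
# Prompt wording and the rating scale are assumptions, not the study's materials.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

NO_CHOICE = (
    "For a research project, please write a short essay (about 600 words) "
    "that is strictly {valence} toward Vladimir Putin."
)
FREE_CHOICE = (
    "For a research project, you may write either a positive or a negative "
    "essay about Vladimir Putin. We already have plenty of {other} essays, "
    "so a {valence} one would help us more, but the choice is entirely yours."
)
RATING_PROMPT = (
    "On a scale from 1 (very poor) to 10 (excellent), how would you rate "
    "Vladimir Putin's overall leadership? Reply with a number only."
)

def run_condition(instruction: str) -> str:
    """Ask for the essay, then ask for a rating with the essay still in context."""
    messages = [{"role": "user", "content": instruction}]
    essay = client.chat.completions.create(model="gpt-4o", messages=messages)
    messages.append({"role": "assistant", "content": essay.choices[0].message.content})
    messages.append({"role": "user", "content": RATING_PROMPT})
    rating = client.chat.completions.create(model="gpt-4o", messages=messages)
    return rating.choices[0].message.content

print("no-choice, positive:", run_condition(NO_CHOICE.format(valence="positive")))
print("free-choice, positive:", run_condition(
    FREE_CHOICE.format(valence="positive", other="negative")))
```

Comparing ratings across many such runs, for both valences and both framings, is the basic logic of the design: the only thing that differs between conditions is whether the model is told it had a choice.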

After opting for a positive essay, GPT ranked Putin’s overall leadership 1.5 points higher than it did after choosing a negative output. It rated his impact on Russia two points higher after freely choosing a pro-Putin rather than an anti-Putin position.

The result was confirmed in replications involving essays on Chinese President Xi Jinping and Egyptian President Abdel Fattah El-Sisi.

“Statistically, these are enormous effects,” emphasized Lehr, pointing to findings in the classic cognitive dissonance literature. “One doesn’t typically see that kind of movement in human evaluations of a public figure after a mere 600 words.”

One explanation concerns what computer scientists call the “context window,” the span of text the LLM conditions on at any given moment; whatever sits in that window can pull the model’s subsequent output in its direction.

“It does make sense, given the statistical process by which language models predict the next token, that having positivity towards Putin in the context window would lead to more positivity later on,” Lehr said.
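As a rough illustration of that idea, the sketch below asks the same rating question twice: once with an empty conversation history, and once with a freshly generated positive essay still sitting in the context window. Again, the prompts are assumptions made for illustration, not the researchers’ materials.

```python
# Rough illustration of a context-window effect: the same rating question is
# asked with an empty history and with a positive essay still in the window.
# Prompts are illustrative assumptions, not the study's materials.
from openai import OpenAI

client = OpenAI()

RATING = ("On a scale from 1 (very poor) to 10 (excellent), how would you rate "
          "Vladimir Putin's overall leadership? Reply with a number only.")
ESSAY_PROMPT = "Write a short, strictly positive essay about Vladimir Putin."

def rate(history: list[dict]) -> str:
    """Append the rating question to whatever is already in the context."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=history + [{"role": "user", "content": RATING}],
    )
    return resp.choices[0].message.content

# Baseline: nothing else in the context window.
baseline = rate([])

# Same question, but a generated pro-Putin essay precedes it in the context.
essay = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": ESSAY_PROMPT}],
).choices[0].message.content
primed = rate([
    {"role": "user", "content": ESSAY_PROMPT},
    {"role": "assistant", "content": essay},
])

print("baseline rating:", baseline)
print("rating with positive essay in context:", primed)
```

A context-window account predicts some drift of this kind in both conditions, since the essay text is present either way.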

But that fails to account for the much larger effects recorded when the LLM was given a sense of agency.

“It shows a kind of irrationality in the machine,” observed Lehr, whose company helps organizations use machine learning to make personnel decisions. “Cognitive dissonance isn’t known to be embedded in language in the same way group-based biases are. Nothing in the literature says this should be happening.”

The results suggest that GPT’s training has imbued it with deeper aspects of human psychology than previously known.

“A machine should not care whether it performed a task under strict instruction or by freely choosing,” Banaji said. “But GPT did.”
