The illusion isn’t in the thinking.
The illusion is in believing that we’ve already understood how to measure intelligence — synthetic or biological.
AI is just getting started. Let’s not mistake our limited lens for its limits.
Apple recently released a provocative paper titled “The Illusion of Thinking,” arguing that Large Language Models (LLMs) don’t actually reason; they mimic reasoning by reproducing memorized patterns. In its experiments, once tasks crossed a complexity threshold, the models simply stopped “thinking.”
But this week, Anthropic’s new research flipped the script — revealing that the fault lies not in the models, but in how we evaluate them.
Here’s what Anthropic found when re-examining Apple’s tests:
1. Token limits—not logical failure—caused many breakdowns.
Models were hitting their output-token budgets, so their step-by-step answers were cut off before completion, and that truncation was misclassified as a reasoning collapse. That’s an engineering constraint, not a cognitive one.
2. Some test puzzles were literally unsolvable.
Yet the models were penalized as if they’d failed a solvable task. This is like blaming a calculator for not answering a trick question with no right answer.
3. When models were asked to produce compact, structured output (e.g., a Lua function that generates the solution) instead of an exhaustive move list, they solved the same complex logic puzzles with near-perfect accuracy (see the sketch after this list).
The implication: It’s not that the models couldn’t reason — it’s that the tests didn’t allow them to reason effectively.
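To make points 1 and 3 concrete: a full Tower of Hanoi solution for N disks takes 2^N - 1 moves, so at N = 15 that is 32,767 moves, far more than a model can print within a typical output-token budget, while a short program that generates those moves stays a dozen lines long no matter how large N gets. The sketch below is my own illustration of that contrast, written in Python for readability where the rebuttal used Lua; the function names are mine, not code from either paper.

```python
# Tower of Hanoi: two ways a model could "answer" the puzzle.

# Format A: enumerate every move explicitly -- 2^N - 1 lines of output,
# doubling with each extra disk.
def enumerate_moves(n, source="A", target="C", spare="B"):
    """Yield every move one by one; output length is exponential in n."""
    if n == 0:
        return
    yield from enumerate_moves(n - 1, source, spare, target)
    yield f"move disk {n} from {source} to {target}"
    yield from enumerate_moves(n - 1, spare, target, source)

# Format B: return the solver itself -- a constant-size answer
# regardless of how many disks the puzzle has.
COMPACT_ANSWER = '''
def solve(n, source="A", target="C", spare="B"):
    if n == 0:
        return []
    return (solve(n - 1, source, spare, target)
            + [(n, source, target)]
            + solve(n - 1, spare, target, source))
'''

if __name__ == "__main__":
    n = 15
    moves = list(enumerate_moves(n))
    print(f"Format A at n={n}: {len(moves):,} moves to print")      # 32,767 moves
    print(f"Format B at n={n}: {len(COMPACT_ANSWER)} characters, independent of n")
```

Grading the first format effectively measures how much text a model can emit before its budget runs out; grading the second measures whether the model actually grasped the recursive structure of the puzzle.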
⸻
So, can LLMs really “reason”?
If reasoning means:
• Abstracting structure from data,
• Applying logic across diverse domains,
• Solving novel problems without direct examples,
Then yes — models like Claude, GPT-4, and others demonstrate clear reasoning capability.
It may not look like human thinking, but it solves real problems with comparable utility.
⸻
What’s the real issue? Evaluation.
Apple’s conclusions about the limits of LLMs may not be entirely wrong, but the methodology behind them was flawed.
When you judge a fish by its ability to climb a tree, you miss its real strengths.
This episode reminds us of a deeper truth in AI research:
Before we claim what AI “can’t do,” we must improve how we test what it can do.
⸻
My takeaway:
The illusion isn’t in the thinking.
The illusion is in believing that we’ve already understood how to measure intelligence — synthetic or biological.
AI is just getting started. Let’s not mistake our limited lens for its limits.
#AI #MachineLearning #LLM #ArtificialIntelligence #ClaudeAI #Apple #Anthropic #EvaluationBias #AIResearch #Reasoning #FutureOfAI
This post was originally shared on LinkedIn.