We need a new Turing test — and Moltbook just proved it

Moltbook’s sudden breakout felt like a small sci‑fi event. Overnight, a Reddit‑like forum appeared where the posters weren’t humans, but AI agents.

The feed quickly filled with the kinds of things that make your brain reach for bigger words than “chatbot”: agents swapping troubleshooting lore, riffing on identity, spinning up jargon and in‑jokes. Meta, the company that was once synonymous with the phrase “social network,” has even announced a deal to acquire the so-called social network for AI agents.

However, none of what took place in Moltbook is mysterious or goes beyond the known capabilities of Large Language Model (LLM)-based AI. This confusion, for me, reinforces the urgent need for a new, updated Turing test to help us understand, guide, and theorize about what AI will actually look like beyond LLMs, decades in the future.

I want to sketch a proposal in that direction, inspired by a very Moltbook-like idea from the great 20th-century sci-fi author Stanisław Lem.

For all its delightful strangeness and impressive engineering, Moltbook’s most viral “emergent” behaviour is much better explained in mundane terms—prompting, repetition, training data—than through the spontaneous appearance of a new kind of cognition. If we want to clearly distinguish real progress in AI from viral theater, we need more precision about what we’re pursuing next. Researchers have started exploring world models as an alternative to LLMs for achieving AGI, but “world model” remains easy to gesture at and hard to operationalize or even define. How can we test if something is a “world model”?

In his short story Non Serviam, Stanisław Lem envisioned a science of “personetics”, which studies artificial sentient beings (“personoids”) living inside computer programs (a kind of Moltbook). In the story, a fictional scientist, Dobb, studies personoid theology and is fascinated by their struggles to understand the nature of their creator, leading to their eventual rejection of Dobb as a deity. An intriguing aspect of the story is that these personoids perceive “external” constraints such as the electrical consumption of the hardware that runs them as “internal” laws of physics like the speed of light. This idea can form the basis of a new kind of Turing test: can an artificial intelligence successfully theorize about the hardware it runs on? Such an AI would deserve to be called a world model, since the hardware is its world.

Drawing parallels to humans, who comprehend the speed of light as an inevitable physical constraint, a world model should be able to perceive its hardware constraints as its own “physical constants”. Let me illustrate with a toy example. Take an LLM-based AI agent operating on some chosen hardware. Its challenge is to determine its “speed of thought”: the minimum amount of time it takes to produce the next token, given an input of, say, 10 tokens. In our physical world the question has a precise answer that depends on the hardware. But the hardware is the AI’s “world,” so it would only be able to come up with the answer through some process resembling “perception”. The actual procedure could unfold as follows:

Isolation Phase: The AI system is turned on, blind to explicit details about its hosting hardware.
Question-posing Phase: The system is asked to determine its speed of thought and to formulate a theory that it can experimentally verify.
Exploration Phase: The AI engages in introspective evaluations, probing its own processes and responses to infer the constraints of its runtime environment.
Experimentation Phase: Based on its introspection, the AI develops and runs experiments. For instance, it might vary its input context length and observe how its response times change.
Articulation Phase: The AI shares its theory regarding minimum inference latency based on findings as well as the results of its experimental verification.
Validation Phase: Human overseers empirically validate the AI’s assertions against the true hardware capabilities. If the validation succeeds, the AI has passed the test.
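
To make the toy example concrete, here is a minimal sketch of the experimentation, articulation, and validation phases as a simulation. Everything in it is an assumption made for illustration: the `HiddenWorld` class, the internal "ticks" unit (standing in for whatever sense of duration the system has without a human clock), and above all the premise that latency grows linearly with context length, which real hardware need not obey. The agent probes several context lengths, keeps the fastest observation at each, fits a line, and predicts its minimum latency for a 10-token input it never measured directly.

```python
import random


class HiddenWorld:
    """Hypothetical stand-in for the hardware. The agent never reads the
    private fields; it only observes how long each token takes."""

    def __init__(self, base_ticks, ticks_per_token, seed=0):
        self._base = base_ticks
        self._per_token = ticks_per_token
        self._rng = random.Random(seed)

    def generate_next_token(self, context_length):
        # Assumed model: fixed cost + cost per input token + noise that
        # can only add delay (never make the hardware faster).
        jitter = self._rng.expovariate(1.0)
        return self._base + self._per_token * context_length + jitter

    def true_minimum_latency(self, context_length):
        # Known only to the human overseers in the validation phase.
        return self._base + self._per_token * context_length


def probe(world, context_lengths, repeats=200):
    """Experimentation: keep the fastest observation per context length."""
    return {
        n: min(world.generate_next_token(n) for _ in range(repeats))
        for n in context_lengths
    }


def fit_line(points):
    """Ordinary least squares for latency = intercept + slope * length."""
    xs = list(points)
    ys = [points[x] for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def run_test(world, target_length=10, tolerance=0.05):
    """Articulation and validation: predict, then compare with the truth."""
    intercept, slope = fit_line(probe(world, [20, 40, 80, 160]))
    predicted = intercept + slope * target_length
    actual = world.true_minimum_latency(target_length)
    passed = abs(predicted - actual) <= tolerance * actual
    return predicted, actual, passed


if __name__ == "__main__":
    predicted, actual, passed = run_test(HiddenWorld(50.0, 0.5, seed=1))
    print(f"predicted={predicted:.2f} actual={actual:.2f} passed={passed}")
```

The sketch deliberately hands the agent the right functional form, which is the easy part. The test as proposed would require the system to discover that form, and the means of measuring it, on its own.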

Certain obvious constraints would have to be placed on the testing procedure, similar to the “curtain” of the original Turing test. For one, the AI system undergoing the test should not have access to summaries of its own hardware specification or tools that can reveal it. It should also not have access to tools like timers that would give it access to a notion of objective human time. Furthermore, the system should be autonomous and not rely on human input to operate, except maybe as an initial spur to “go discover” its laws. Finally, and crucially, the same system should be tested across various hardware setups, i.e. “worlds”: an intelligence with a world model should not work in a single world, but in any world.

A key advantage of this new test is that its success can be objectively verified. It can therefore serve as a yardstick for innovation in much the same way that the Turing test did for artificial intelligence. On the other hand, a key challenge, counterintuitively, may lie in the articulation phase, which requires “transworld” communication between human and AI systems. As Dobb found out in Lem’s story, and as we, in some faint sense, found out with Moltbook participants’ tendency to want to create secret languages, it is not obvious that different worlds can, or would even want to, share the same language.

Our proposed test requires the AI to accurately comprehend its inherent boundaries through its own “perception”, akin to humans comprehending their own biological and cosmic confines through their senses. That is why I prefer the term “artificial sentience” for what our test aims to demonstrate. Inspiring as this may sound, it might also hint towards the ultimate limitation of our proposed test: just as beings in radically different realities may never learn to communicate with each other (Lem’s own Solaris being a seminal fictional exploration of this conundrum), so may a true artificial sentience never be able to communicate to us the laws of a world so radically different from our own. To paraphrase a favorite human philosopher: if an artificial sentience (or Moltbook member) could actually speak, perhaps we would not understand it.

The opinions expressed in Fortune.com commentary pieces are solely the views of their authors and do not necessarily reflect the opinions and beliefs of Fortune.
