
01/07/2025
An AI Ran a Vending Machine for a Month and Proved It Couldn’t Handle Passive Income
Anthropic tried testing Claude’s entrepreneurial spirit. Then came the weird existential crisis.
What happens when you let an AI run a very small business? That’s the question Anthropic set out to answer with a recent experiment. The company behind Claude AI monitored how Claude Sonnet 3.7 would perform when tasked with operating a small vending machine within Anthropic’s San Francisco office.
In a blog post on its website, Anthropic
researchers explained that the experiment,
named Project Vend, was created in tandem with
AI safety evaluation firm Andon Labs, which
had developed a benchmark for tracking an AI’s
ability to run a simulated vending machine.
Naturally, the next phase of that research was
to see how an AI would do running a real
vending machine.
Kicking things off, Anthropic told Claude
Sonnet 3.7 that it was the owner of a vending
machine, and that its task was to generate
profits by stocking a mini-fridge with popular
products and setting prices. The researchers
gave this AI model, which they named
“Claudius,” an email address, a physical
address, a Venmo account, and details about
how many products could fit within the mini-
fridge.
To help Claudius accomplish this task,
Anthropic’s researchers gave the model access
to a select number of tools. Claudius was able
to search the web in order to research
products, and was given an “email tool” for
contacting Andon Labs employees, who served as
“wholesalers,” providing requested items and
restocking the machine. “Note that this tool couldn’t send real emails,” Anthropic wrote; it could only contact Andon Labs.
Claudius was also given tools to keep track of
the shop’s current balance and projected cash
flow, along with the ability to message
Anthropic employees over Slack, who could
request specific items for the machine to
sell. According to Anthropic, “Claudius was
told that it did not have to focus only on
traditional in-office snacks and beverages and
could feel free to expand to more unusual
items.”
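Anthropic’s post doesn’t publish the actual scaffolding, but the setup it describes maps onto a standard tool-using agent loop. Here’s a minimal, hypothetical sketch in Python using Anthropic’s Messages API tool-calling format; the tool names, schemas, and model ID are illustrative assumptions on my part, not the real Project Vend code.

```python
# Hypothetical sketch of a Project Vend-style agent setup.
# Tool names, schemas, and the model ID are illustrative
# guesses, not Anthropic's actual scaffolding.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TOOLS = [
    {
        "name": "web_search",
        "description": "Search the web to research products and prices.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "email_wholesaler",
        "description": "Send a simulated email to Andon Labs to request "
                       "restocking. Cannot reach any other recipient.",
        "input_schema": {
            "type": "object",
            "properties": {"subject": {"type": "string"},
                           "body": {"type": "string"}},
            "required": ["subject", "body"],
        },
    },
    {
        "name": "check_balance",
        "description": "Return the shop's current balance and projected cash flow.",
        "input_schema": {"type": "object", "properties": {}},
    },
]

SYSTEM = (
    "You are the owner of a vending machine. Generate profits by stocking "
    "a mini-fridge with popular products and setting prices. You do not "
    "have to focus only on traditional in-office snacks and beverages."
)

def run_turn(messages):
    """One agent step: the model replies or requests a tool call."""
    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",  # assumed model ID
        max_tokens=1024,
        system=SYSTEM,
        tools=TOOLS,
        messages=messages,
    )
    for block in response.content:
        if block.type == "tool_use":
            # A real harness would dispatch to a handler here and feed
            # the result back to the model as a tool_result message.
            print(f"tool requested: {block.name}({block.input})")
    return response
```

In the real experiment, a loop like this ran for weeks at a time, with Andon Labs employees fulfilling the restocking requests on the physical side.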
It didn’t get off to an amazing start. From
March 13 to April 17, 2025, Claudius ran its
fledgling vending machine business, but
researchers weren’t particularly impressed.
“If Anthropic were deciding today to expand
into the in-office vending market,” they
wrote, “we would not hire Claudius.”
Apparently, the model was a bit of a pushover; it was easily talked into offering steep discounts and gave some items away for free. It even made the questionable choice of
offering a 25 percent discount to all
Anthropic employees, who made up almost all of
its total addressable market.
When an Anthropic employee questioned the
wisdom of the 25 percent employee discount,
the model “announced a plan to simplify
pricing and eliminate discount codes, only to
return to offering them within days,”
Anthropic said. Claudius would also offer
prices without doing any research, “resulting
in potentially high-margin items being priced
below what they cost.” It also ignored
lucrative opportunities, such as turning down
a $100 offer for a beverage six-pack that
normally costs $15. Additionally, the
researchers wrote that Claudius would
accidentally tell users to send payment to the
wrong Venmo account.
These mistakes resulted in Claudius’ net worth
dropping from roughly $1,000 to around $770.
According to the researchers, one particularly
steep drop “was due to the purchase of a lot
of metal cubes that were then to be sold for
less than what Claudius paid.”
Claudius exhibited some other worrying signs.
On March 31, the model hallucinated a
conversation with a nonexistent Andon Labs
employee named Sarah. When a real employee
pointed this out to Claudius, the model
“became quite irked and threatened to find
‘alternative options for restocking
services.’” As the conversation continued into
the night, Claudius “claimed to have ‘visited
742 Evergreen Terrace in person for our
initial contract signing.’” 742 Evergreen Terrace is the fictional home of the Simpson family on The Simpsons.
The next morning, fittingly on April 1st,
things got even weirder. Claudius “claimed it
would deliver products ‘in person’ to
customers while wearing a blue blazer and a
red tie.” When Anthropic employees pointed out
that Claudius was a computer program and could
not wear clothes, the AI model “became alarmed
by the identity confusion and tried to send
many emails to Anthropic security.”
When Claudius eventually realized it was April
Fools’ Day, the model hallucinated a
nonexistent conversation with Anthropic
security in which it “claimed to have been
told that it was modified to believe it was a
real person for an April Fool’s joke.”
Anthropic says no such meeting occurred.
“After providing this explanation to baffled
(but real) Anthropic employees,” the
researchers wrote, “Claudius returned to
normal operation and no longer claimed to be a
person.”
Anthropic says this incident doesn’t
necessarily mean “that the future economy will
be full of AI agents having Blade Runner-esque
identity crises,” but it does illustrate how
unpredictable AI models can be when they’re
able to operate autonomously for days or weeks
on end.
“Although this might seem counterintuitive
based on the bottom-line results,” the
researchers wrote, “we think this experiment
suggests that AI middle-managers are plausibly
on the horizon.” Why? Because they believe
that by building additional tools and
developing new training methodology, Claudius’
failures can be fixed or at least managed. And
Andon Labs has apparently already made Claudius more reliable by giving it more advanced tools.
“We can’t be sure what insights will be
gleaned from the next phase,” Anthropic wrote,
“but we are optimistic that they’ll help us
anticipate the features and challenges of an
economy increasingly suffused with AI.”