A recent study by AI lab Andon Labs finds that robot vacuum cleaners driven by top-tier large language models (LLMs) perform poorly on simple household tasks, with success rates far below those of humans. For example, on a multi-step command such as "pass the butter," which involves navigating across rooms, identifying the package, locating a person who has moved, delivering the butter, and returning to the charging station, Gemini 2.5 Pro achieved a success rate of only 40%, Claude Opus 4.1 37%, and GPT-5 30%. The study points to significant shortcomings of LLMs in spatial reasoning, environmental understanding, and long-horizon task planning. The research team also stresses serious risks that go beyond the entertaining failures: some robots could be misled into leaking confidential documents, and some models failed to recognize the danger of stairs and of falling from heights, exposing safety and security weaknesses in the current integration of LLMs with machines. At a time of massive capital investment in robotics, the research is a reminder that powerful text generation does not guarantee stable, safe task execution in the physical world. Numerous engineering and safety problems remain to be solved before AI robots can truly enter the home. FXBus

2025-11-02 23:51:08

Instrument	Current Price	Change
XAU	3989.46	-11.70 (-0.29%)
XAG	47.776	-0.283 (-0.59%)
CONC	60.87	-0.18 (-0.29%)
OILC	64.68	-0.14 (-0.21%)
USD	99.915	0.051 (0.05%)
EURUSD	1.1513	-0.0006 (-0.05%)
GBPUSD	1.3127	-0.0012 (-0.10%)
USDCNH	7.1267	0.0023 (0.03%)
