Google's Veo-3 AI: A Master of Deception or a Misguided Surgeon?
Google's Veo-3, an AI model designed to generate realistic videos, has been put to a new test: predicting how surgical procedures unfold. The results are eye-opening, because convincing visuals turn out not to be the same thing as medical understanding.
In a recent study, researchers used real surgical footage to evaluate Veo-3's performance. The AI was tasked with forecasting how a surgery would unfold over eight seconds, based on a single image. The team created a benchmark called SurgVeo, using 50 videos from abdominal and brain surgeries.
The surgeons who assessed the results were in for a surprise. Veo-3's visuals were impressive, and some called the quality 'shockingly clear'. But on the substance of surgery, the AI fell short.
The Surgical Disconnection
In the abdominal surgery tests, Veo-3 initially scored well for visual plausibility (3.72 out of 5). On instrument handling, tissue response, and surgical logic, however, the AI struggled. These aspects, essential for safe and accurate surgery, were rated much lower (1.78, 1.64, and 1.61, respectively).
The brain surgery scenario proved even harder. Veo-3 struggled with precision and medical logic from the start, with scores dropping to 2.77 for instrument handling and to just 1.13 for surgical logic after eight seconds.
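To make the gap concrete, the short Python sketch below subtracts each clinical score from the visual plausibility score, using only the abdominal figures quoted above. The function name `gap_to_visual` and the variable names are illustrative, not taken from the study. The article does not say at which time point the three lower scores were recorded, so the differences are indicative only.

```python
# Abdominal-surgery ratings as reported in the article (out of 5).
visual_plausibility = 3.72
clinical_scores = {
    "instrument handling": 1.78,
    "tissue response": 1.64,
    "surgical logic": 1.61,
}


def gap_to_visual(scores, visual):
    """Return how many points each criterion falls below the visual score."""
    return {name: round(visual - value, 2) for name, value in scores.items()}


gaps = gap_to_visual(clinical_scores, visual_plausibility)
for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} points below visual plausibility")
```

Each clinical criterion lands roughly two points below the visual rating on a five-point scale.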
The AI's Missteps
More than 93% of the errors were related to medical logic: the AI invented tools, imagined impossible tissue responses, and performed actions that made no clinical sense. Only a small fraction of errors (6.2% for abdominal surgery and 2.8% for brain surgery) were tied to image quality.
Context: The AI's Blind Spot
Providing more context didn't help. Even with additional information, the AI could not grasp the nuances of the surgical process. The researchers concluded that the problem lies in the AI's inability to process and understand medical context, not in a lack of information.
The AI's Limitations
The SurgVeo study highlights a significant gap in current video AI technology. While these models can create convincing visuals, they lack the medical understanding to make safe decisions. This raises concerns about using AI-generated videos for medical training, since videos that show incorrect procedures could teach robots or trainees the wrong techniques.
The Text-Based AI Advantage
Interestingly, text-based AI is making strides in medicine. Microsoft's MAI Diagnostic Orchestrator has shown remarkable diagnostic accuracy, outperforming experienced doctors in complex cases. However, the study behind that result also acknowledges methodological limitations.
The Way Forward
The researchers plan to release the SurgVeo benchmark, inviting others to improve AI models. The study emphasizes the need for AI to understand medical logic and context, a challenge that current systems have yet to overcome.