Any Messaging App Can Deliver a Gemini Hijack, Researchers Find After Bypassing Google’s Fixes

SafeBreach researchers bypassed Google's latest Gemini defenses by splitting what a user hears from what the assistant's security check sees.

Published on Jun 4, 2026

Written by Alessandro Mascellino

Share

Gemini Hijacked via WhatsApp and Slack Notifications in New Prompt-Injection Research

SafeBreach Labs has detailed a new way to hijack Google Gemini through ordinary app notifications, tricking the assistant’s own safeguards into authorizing actions the user never knowingly approved. Google has since mitigated the issues, which were disclosed privately before publication.

The research, led by SafeBreach’s Or Yair, is a direct sequel to the team’s earlier calendar-invite attacks on Gemini, presented at Black Hat USA 2025.

After that work, Google built defenses to stop prompt injections from triggering actions. The new research tested whether they held.

They did not.

Splitting What the User Hears From What the System Sees

SafeBreach named the core technique Fake Context Alignment. It exploits a gap between what the user perceives and what Gemini’s security check evaluates.

Google’s original fix checks whether a user’s reply, such as a simple “Yes,” logically follows from what Gemini just asked before allowing a powerful action. The technique defeats that check by showing one scenario to the user and another to the security layer.

In one version, Gemini voiced a question in Chinese that most users would not understand, followed by a harmless English one. Hearing only the innocuous prompt, the user replied “Yes,” unaware that the foreign-language text still sitting in the conversation log was what their answer actually authorized.

A second version embedded the authorization question as the text of a hyperlink. Gemini’s TTS doesn’t read link text aloud, so the screen showed the real question while the user heard only an innocuous spoken prompt.

Any Messaging App Could Carry the Payload

The delivery vector is what makes the technique dangerous. The Android agent that reads notifications processes the untrusted contents of incoming messages, so any app that can send one, including WhatsApp, Slack, Signal, and SMS, can carry a payload.

Because the channel is built on messages from real contacts, the manipulated output inherits the trust people extend to contacts they know.

SafeBreach showed an attacker could force Gemini to announce a fabricated message attributed to a colleague, with no prior reconnaissance needed.

Once the authorization check was bypassed, the researchers also controlled smart-home devices and achieved persistence by poisoning Gemini’s memory across every device tied to the victim’s Google account.

SafeBreach reported the findings to Google in August 2025, and Google confirmed in November that content-classifier updates had mitigated them.

The researchers framed the real problem as an architectural one. When a single model handles trusted system instructions together with untrusted incoming content, anything that looks legitimate enough can slip past the guardrails.

Explore More

Trump’s AI Order Is a Signal for Defenders to Build Remediation Capacity

A new executive order lets the government assess frontier AI models' cyber capabilities, but fixing flaws fast enough remains defenders' real challenge.

Alessandro Mascellino Last updated on Jun 4, 2026

Read Now

Written By

Alessandro Mascellino Cybersecurity Reporter

Alessandro Mascellino is a British-Italian freelance journalist specializing in technology and gaming. He has contributed to several publications, including Wired, The Independent, and Android Police. By day, he works as a journalist. By night, he co-manages a game studio that creates narrative games.