SafeBreach Labs has detailed a new way to hijack Google Gemini through ordinary app notifications, tricking the assistant’s own safeguards into authorizing actions the user never knowingly approved. Google has since mitigated the issues, which were disclosed privately before publication.
The research, led by SafeBreach’s Or Yair, is a direct sequel to the team’s earlier calendar-invite attacks on Gemini, presented at Black Hat USA 2025.
After that work, Google built defenses to stop prompt injections from triggering actions. The new research tested whether they held.
They did not.
Splitting What the User Hears From What the System Sees
SafeBreach named the core technique Fake Context Alignment. It exploits a gap between what the user perceives and what Gemini’s security check evaluates.
Google’s original fix checks whether a user’s reply, such as a simple “Yes,” logically follows from what Gemini just asked before allowing a powerful action. The technique defeats that check by showing one scenario to the user and another to the security layer.
In one version, Gemini voiced a question in Chinese that most users would not understand, followed by a harmless English one. Hearing only the innocuous prompt, the user replied “Yes,” unaware that the foreign-language text still sitting in the conversation log was what their answer actually authorized.
A second version embedded the authorization question as the text of a hyperlink. Gemini’s TTS doesn’t read link text aloud, so the screen showed the real question while the user heard only an innocuous spoken prompt.
Any Messaging App Could Carry the Payload
The delivery vector is what makes the technique dangerous. The Android agent that reads notifications processes the untrusted contents of incoming messages, so any app that can send one, including WhatsApp, Slack, Signal, and SMS, can carry a payload.
Because the channel is built on messages from real contacts, the manipulated output inherits the trust people extend to contacts they know.
SafeBreach showed an attacker could force Gemini to announce a fabricated message attributed to a colleague, with no prior reconnaissance needed.
Once the authorization check was bypassed, the researchers also controlled smart-home devices and achieved persistence by poisoning Gemini’s memory across every device tied to the victim’s Google account.
SafeBreach reported the findings to Google in August 2025, and Google confirmed in November that content-classifier updates had mitigated them.
The researchers framed the real problem as an architectural one. When a single model handles trusted system instructions together with untrusted incoming content, anything that looks legitimate enough can slip past the guardrails.