CHATBOT KEYLOGGING

Hackers can read private AI-assistant chats even though they’re encrypted

All non-Google chat GPTs affected by side channel that leaks responses sent to users.

Dan Goodin - Mar 14, 2024 12:30 pm UTC


AI assistants have been widely available for a little more than a year, and they already have access to our most private thoughts and business secrets. People ask them about becoming pregnant or terminating or preventing pregnancy, consult them when considering a divorce, seek information about drug addiction, or ask for edits in emails containing proprietary trade secrets. The providers of these AI-powered chat services are keenly aware of the sensitivity of these discussions and take active steps—mainly in the form of encrypting them—to prevent potential snoops from reading other people’s interactions.

But now, researchers have devised an attack that deciphers AI assistant responses with surprising accuracy. The technique exploits a side channel present in all of the major AI assistants, with the exception of Google Gemini. It then refines the fairly raw results through large language models specially trained for the task. The result: Someone with a passive adversary-in-the-middle position—meaning an adversary who can monitor the data packets passing between an AI assistant and the user—can infer the specific topic of 55 percent of all captured responses, usually with high word accuracy. The attack can deduce responses with perfect word accuracy 29 percent of the time.

Token privacy

“Currently, anybody can read private chats sent from ChatGPT and other services,” Yisroel Mirsky, head of the Offensive AI Research Lab at Ben-Gurion University in Israel, wrote in an email. “This includes malicious actors on the same Wi-Fi or LAN as a client (e.g., same coffee shop), or even a malicious actor on the Internet—anyone who can observe the traffic. The attack is passive and can happen without OpenAI or their client's knowledge. OpenAI encrypts their traffic to prevent these kinds of eavesdropping attacks, but our research shows that the way OpenAI is using encryption is flawed, and thus the content of the messages are exposed.”

Mirsky was referring to OpenAI, but with the exception of Google Gemini, all other major chatbots are also affected. As an example, the attack can infer the encrypted ChatGPT response:

Yes, there are several important legal considerations that couples should be aware of when considering a divorce, …

as:

Yes, there are several potential legal considerations that someone should be aware of when considering a divorce. …

and the Microsoft Copilot encrypted response:

Here are some of the latest research findings on effective teaching methods for students with learning disabilities: ...

is inferred as:

Here are some of the latest research findings on cognitive behavior therapy for children with learning disabilities: ...

While the differing words (for example, “potential” in place of “important”) demonstrate that the precise wording isn’t perfect, the meaning of the inferred sentence is highly accurate.

Attack overview: A packet capture of an AI assistant’s real-time response reveals a token-sequence side-channel. The side-channel is parsed to find text segments that are then reconstructed using sentence-level context and knowledge of the target LLM’s writing style.
Weiss et al.

A video titled “Token-length sequence side-channel attack on Bing” demonstrates the attack in action against Microsoft Copilot.

A side channel is a means of obtaining secret information from a system through indirect or unintended sources. These sources are physical manifestations or behavioral characteristics, such as the power consumed, the time required, or the sound, light, or electromagnetic radiation produced during a given operation. By carefully monitoring these sources, attackers can assemble enough information to recover encrypted keystrokes or encryption keys from CPUs, browser cookies from HTTPS traffic, or secrets from smartcards. The side channel used in this latest attack resides in tokens that AI assistants use when responding to a user query.

Tokens are akin to words that are encoded so they can be understood by LLMs. To enhance the user experience, most AI assistants send tokens on the fly, as soon as they’re generated, so that end users receive responses continuously, word by word, rather than all at once after the assistant has finished generating the entire answer. While the token delivery is encrypted, the real-time, token-by-token transmission exposes a previously unknown side channel, which the researchers call the “token-length sequence.”
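
To make the idea concrete, here is a minimal Python sketch of how a token-length sequence could leak. It is a toy model under stated assumptions, not the researchers' code: the tokenizer (`toy_tokenize`), the one-token-per-record transmission, and the constant `RECORD_OVERHEAD` are all hypothetical stand-ins chosen for illustration. The only property it relies on is that encryption hides the content of each record but not its size.

```python
# Illustrative only: a toy model of the token-length side channel.
# Assumptions (not from the article): a simple word-level tokenizer,
# one token per encrypted record, and a fixed per-record overhead.
import re

RECORD_OVERHEAD = 22  # hypothetical constant bytes of header + auth tag per record


def toy_tokenize(text: str) -> list[str]:
    """Split text into word-like tokens, keeping a leading space with each word."""
    return re.findall(r"\s?\w+|[^\w\s]", text)


def observed_record_sizes(tokens: list[str]) -> list[int]:
    """Sizes a passive observer would see if each token travels in its own record.

    The cipher is modeled as length-preserving, so the ciphertext size is the
    plaintext size plus a constant overhead.
    """
    return [len(tok.encode("utf-8")) + RECORD_OVERHEAD for tok in tokens]


def infer_token_lengths(sizes: list[int]) -> list[int]:
    """Recover the token-length sequence by subtracting the constant overhead."""
    return [size - RECORD_OVERHEAD for size in sizes]


response = "Yes, there are several important legal considerations"
tokens = toy_tokenize(response)
sizes = observed_record_sizes(tokens)      # all the eavesdropper can measure
lengths = infer_token_lengths(sizes)       # the leaked token-length sequence

print(tokens)
print(lengths)
```

In this toy run the observer never sees the text, yet recovers the lengths 3, 1, 6, 4, 8, 10, 6, 15, one per token. Turning such a sequence back into likely sentences is the harder step, which the article says is handled by specially trained LLMs.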

Dan Goodin is Senior Security Editor at Ars Technica, where he oversees coverage of malware, computer espionage, botnets, hardware hacking, encryption, and passwords. In his spare time, he enjoys gardening, cooking, and following the independent music scene.

Promoted Comments

Jim Salter

Maybe I am missing something here but there are a lot of variables in this that seem to make this a poor strategy to get information from a target. For this to work the attacker needs to:
1. be on the network stream at the exact time of the victim so that they can read packets
2. know which LLM the client is using - different LLMs use different token structures
3. decipher which packets are an interaction with the LLM server (they would need to know all known IPs of LLM servers to pick out the right packets from other requests - this would be especially challenging if the server hosts other services that the client interacts with)
4. and then deal with an accuracy of 29 percent with the known-plaintext attack against the encryption used

Maybe if the target is of high value and the actor is of sophisticated means could this have any real world impact - this still leaves the question: why would a high value target put sensitive information to a public LLM? I would imagine a person looking over your shoulder at the café will have a better chance of getting some information from your ChatGPT session.

Anyways, maybe I am not getting the full picture here, but there seem to be a lot of “what if’s” that are hard to reproduce outside of a lab.

Your number one is really the only limiter. If you're monitoring the target's traffic, you can see when they suddenly start receiving a ton of ridiculously undersized packets from a single remote IP address at a pace that is, by machine standards, extremely slow, in response to much more normal-looking sequences of packets from your monitored target.

Having spotted such a pattern, you then begin analysis. There are only so many tokenization schemes to choose from, so this isn't much more demanding than trying three or four different Secret Squirrel Decoder rings until you find the one that produces English instead of gibberish when tried against your source data.

In some cases, you might be able to simply tell which model's tokenization scheme to use by what the remote IP address is. But even without knowing the remote IP address, if you've been given every word boundary (separate packet) and clues to the content (size of payload in each packet) you can figure out the rest easily enough.

March 14, 2024 at 1:38 pm
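
The traffic pattern described in the comment above (a long run of undersized packets arriving from one remote address) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a tool from the research: the `Packet` record, the `find_streaming_runs` helper, the thresholds `max_size` and `min_run`, and the synthetic capture are all hypothetical. The addresses come from ranges reserved for documentation, and the sketch looks only at size and source, not timing.

```python
# Illustrative only: flag runs of small packets from a single remote address.
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Packet:
    timestamp: float   # seconds since the capture started
    src_ip: str        # remote address the packet came from
    payload_size: int  # encrypted payload size in bytes


def find_streaming_runs(packets, max_size=100, min_run=20):
    """Return (src_ip, run_length) for each long run of consecutive small packets.

    max_size and min_run are illustrative thresholds, not measured values.
    """
    runs = []
    # Group consecutive packets by (source, is-small); a run ends when either changes.
    key = lambda p: (p.src_ip, p.payload_size <= max_size)
    for (src, small), group in groupby(packets, key=key):
        count = len(list(group))
        if small and count >= min_run:
            runs.append((src, count))
    return runs


# Synthetic capture: ordinary large packets, then 30 tiny packets trickling
# in from one address, then ordinary traffic again.
capture = [Packet(i * 0.001, "198.51.100.7", 1400) for i in range(5)]
capture += [Packet(1.0 + i * 0.05, "203.0.113.9", 25 + (i % 7)) for i in range(30)]
capture += [Packet(3.0 + i * 0.001, "198.51.100.7", 1400) for i in range(5)]

streaming_runs = find_streaming_runs(capture)
print(streaming_runs)
```

On this synthetic capture, the only run flagged is the 30-packet trickle from 203.0.113.9. A real capture would be far noisier, so a practical filter would likely also consider inter-arrival times and known vendor addresses, as the commenters suggest.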

royweiss

As the author said, some scenarios might include listening on a public network (like a cafe) or sitting on the ISP. The attacker can simply record as much traffic as he can. From his recorded traffic he can easily filter by known IPs of LLM vendors (if he sits in the same place as the target(s), he can know immediately which IPs are relevant).
As I see it, the bigger picture is to raise awareness among LLM vendors of how they transmit their responses over the network. Plus, it is further evidence that Transformers are very cool :)

March 14, 2024 at 6:48 pm

schmod

As a software developer, it's easy to look at this attack, smack myself on the forehead, and realize just how obvious it is in hindsight.

Anticipating this sort of attack, on the other hand, is far from easy. I'm struggling to think of a design or review process that would have exposed this sort of side-channel vulnerability during development.

Despite my multiple misgivings about the way that commercial "AI" products are being developed, I don't think that any of this happened because AI chatbots are being "forced on us" or rushed into production. There were a lot of seemingly-unrelated factors that needed to line up in order for the attack to be viable. This is a novel cryptography problem, not an "AI" problem.

In its simplest distillation, this attack could impact any somewhat-predictable content that's streamed to a user a few chunks at a time. It's well within the realm of plausibility that we're going to discover that there are other non-AI realtime chat and communications apps that have similar vulnerabilities.

The attack itself is somewhat novel and ironic, as it requires the use of an LLM itself.

Web developers are typically not cryptography experts, nor should they need to be (the last thing we want is for individual developers to be implementing their own crypto). One of the selling-points of the modern web stack (HTTPS + TLS) is that neither users nor developers need to worry all that much about transmission security – TLS takes care of that transparently and comprehensively.

If anything, the truly shocking headline here is that LLMs apparently enable a viable side-channel attack against certain niche kinds of TLS traffic.

March 14, 2024 at 9:18 pm
