User Data Leaks via GIFs in Messaging Apps

by Dan Herlihy | Oct 28, 2020

This blog was written in October 2020. Current vulnerability status may be different since original posting.

An investigation into how Teams, Discord, and Signal handle Giphy integrations

When everyone is working from home, a well-timed GIF sent to coworkers can lighten the mood. So, when I arrived at SRA and found that they had disabled Teams’ Giphy search feature, I was disappointed. I asked a colleague why it was disabled and was told “we thought it could be a privacy issue.” Determined to prove that GIFs were harmless, I started examining the Teams/Giphy interaction. My findings led me to investigate how two other popular applications, Discord and Signal, handle user privacy with Giphy searches. Unfortunately, I found that it’s easy to leak user information when searching for the perfect GIF.

Summary

Here is a brief overview of the platforms investigated:

	Teams	Discord	Signal
Owned By	Microsoft	Discord Inc.	Signal Technology Foundation (non-profit)
Pricing	Per-user licensing with Microsoft 365	Free with premium options	Free
Target Audiences	Primarily business and education customers	"Anyone who could use a place to talk with their friends and communities"	People who like privacy

For actions performed on each platform, the following information is disclosed to Giphy:

	Teams	Discord	Signal
Search Results	Search term + ??	Search term + ??	Search term
Search Previews	IP address, tracking token	IP address, Discord channel, tracking token	Tracking token
Loading GIF Messages	IP address, tracking token	None	Tracking token

How Giphy Tracks Users

Here is an example URL returned from the Giphy Search API:

https://media0.giphy.com/media/Ju7l5y9osyymQ/giphy.gif?cid=de9bf95evmdivzh16orm7svyp9ticugu4abuyc3ty2df5y9i&rid=giphy.GIF

We’re going to focus on the “cid” parameter, which appears to be an analytics token. It is unmentioned in Giphy’s API documentation. Here is what I have found out about the cid parameter:

When you make a search request, every GIF returned will have the same cid.
The first 8 characters (“app id”) are consistent across every search made with the same API token. For example:
- Teams searches start with “de9bf95e”
- Discord searches start with “73b8f7b1”
- Signal (on Android) searches start with “c95d8580”
The remaining 40 characters (“search id”) vary based on the following factors:
- Search string
- Number of results requested
- Results offset
- Results rating
- Geographic region (not simply IP address)
- Time (duration unknown)
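
To make the structure concrete, here is a minimal Python sketch that splits the cid from the example URL above into its two parts. The 8-character split is based only on my observations, not on anything Giphy documents, and the function name `split_cid` is my own.

```python
from urllib.parse import urlsplit, parse_qs

# Observed prefix length of the "app id"; an assumption, not documented by Giphy.
APP_ID_LENGTH = 8


def split_cid(url: str) -> tuple[str, str]:
    """Return (app_id, search_id) taken from the cid query parameter of a URL."""
    query = parse_qs(urlsplit(url).query)
    cid = query["cid"][0]  # raises KeyError if the URL carries no cid
    return cid[:APP_ID_LENGTH], cid[APP_ID_LENGTH:]


example_url = (
    "https://media0.giphy.com/media/Ju7l5y9osyymQ/giphy.gif"
    "?cid=de9bf95evmdivzh16orm7svyp9ticugu4abuyc3ty2df5y9i&rid=giphy.GIF"
)
app_id, search_id = split_cid(example_url)
print(app_id)          # de9bf95e (the Teams prefix)
print(len(search_id))  # 40
```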

For example, take a search for “cats”. Every image returned in the results will have the same search id. If the same API key is used, subsequent searches from the same host will return the same search id for a while, though not indefinitely: the search id changes occasionally over time. A friend down the street in the same region but with a different IP would also get the same search id if they used the same API key. However, using a proxy to simulate requests from other countries confirmed that the search id does change with enough distance.

Since the cid parameter is returned in search results, it tends to be “sticky”. Unless it’s explicitly stripped out by someone, it will be included wherever you send the GIF URL and will persist in the message history.

Signal

Signal has documented its quest for user privacy with GIF searches in two blog posts. According to those posts and a review of their apps’ code on GitHub, here is how Signal apps interact with Giphy:

In the flow above, the user never communicates directly with Giphy’s servers; all communication is tunneled through Signal’s servers first. The URLs all still have the cid tracking parameter, but since no other data such as the user’s IP is ever revealed, the parameter has no effect on user privacy. Additionally, as mentioned in their blog posts, Signal ensures that the Signal servers can’t see any traffic passing through, giving you near-complete privacy.

How can Signal improve user privacy?

Signal has put a lot of thought into how to serve GIFs privately and it shows. There are no obvious privacy improvements they could make to the user flow. If Signal wanted to go above and beyond, they could make fake search and download requests from their servers to hide overall trends in user activity hours, but that has marginal benefits for a lot of extra bandwidth.

Discord

On Discord, the main built-in GIF picker widget uses Tenor, not Giphy. However, there is a built-in /giphy command that allows users to search for and send GIFs using Giphy, and that is what we will investigate. Here is a diagram of the requests made during a Giphy search on Discord as mapped using Burp:

Note that some of these requests can be WebSocket messages under normal circumstances, but I represent them all as synchronous requests for simplicity.

As you can see, the only time the user connects directly to the Giphy servers is step 6 when previews of all the GIFs are fetched during a user’s search. This is a screenshot of that preview request:

By making a request directly to Giphy, your IP address is exposed. The request also includes two other pieces of trackable information: the cid parameter, and a referer header. As discussed above, the cid parameter is based on your original search request, so including this parameter could allow Giphy to match your IP with your search requests even though the original search request went through Discord’s servers first. However, the larger data leak is the referer header. It includes the exact channel URL that you are currently talking in, whether it’s a public server or a DM. If you and your friends like to send GIFs to each other in servers and DMs, then Giphy could use this information to build a map of who you talk to, who is in which channels, etc.

Besides the preview requests, everything else in the flow respects your privacy. The search requests and the retrieval of full-size GIFs in messages are proxied through Discord’s servers. When a user sends a message with an embedded GIF, Discord appears to use the Giphy API to retrieve the full-resolution URLs. This is independent of any search request so the cid tracking parameter is rendered useless.

Additionally, when Discord reaches out to third-party servers to retrieve media, the request is very clean and does not include any tracking information:

How can Discord improve user privacy?

Strip the cid parameter from preview URLs before returning search results
Do not send referer headers when fetching GIF previews (or on any external request really)
Proxy GIF preview requests through the Discord servers like other external media
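
As a rough illustration of the first recommendation, here is a minimal Python sketch of stripping the cid parameter from a URL while leaving the rest of the query string alone. The helper name `strip_cid` is hypothetical, and I have no visibility into how Discord's backend actually handles these URLs.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_cid(url: str) -> str:
    """Return the URL with any cid query parameter removed."""
    parts = urlsplit(url)
    # Keep every query parameter except the cid tracking token.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "cid"
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


tracked = (
    "https://media0.giphy.com/media/Ju7l5y9osyymQ/giphy.gif"
    "?cid=de9bf95evmdivzh16orm7svyp9ticugu4abuyc3ty2df5y9i&rid=giphy.gif"
)
print(strip_cid(tracked))
```

Applied to search results before they are returned to the client, this would keep the token out of both preview requests and message history.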

Teams

Teams includes an on-by-default Giphy search feature that can be disabled at an organizational level. Here is a diagram of the requests made during a Giphy search on Teams as mapped using Burp:

The user connects directly to Giphy’s servers in steps 6, 8, and 14. This, combined with the cid tracking token, could lead to a large amount of data being leaked. Say, for example, you have a private chat with your friends at work. You search for a GIF that relates to a specific inside joke you share and then you send it to the group. Now, thanks to the cid token, when each of those people downloads your attached GIF, Giphy can identify which IP addresses belong to the people you’re talking to. Facebook, which acquired Giphy in May 2020, could then use this information along with its data collection from other sites to build a profile of who you talk to at work, what you talk about, how you feel, etc.
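
To show how little work that correlation would take, here is a small Python sketch that groups client IPs by the cid they requested. Everything in it is hypothetical: the log format is my own invention, the IPs come from documentation-only ranges, and the cid values are shortened placeholders. I have no knowledge of what Giphy actually logs or analyzes.

```python
from collections import defaultdict

# Hypothetical media-server log entries: (client IP, cid from the requested URL).
requests_seen = [
    ("192.0.2.10", "de9bf95e-search-A"),    # sender loads the search preview
    ("198.51.100.7", "de9bf95e-search-A"),  # first recipient loads the message
    ("203.0.113.42", "de9bf95e-search-A"),  # second recipient loads the message
    ("192.0.2.99", "de9bf95e-search-B"),    # unrelated search by someone else
]


def group_ips_by_cid(entries):
    """Map each cid to the set of client IPs that requested a URL carrying it."""
    groups = defaultdict(set)
    for ip, cid in entries:
        groups[cid].add(ip)
    return groups


groups = group_ips_by_cid(requests_seen)
for cid, ips in groups.items():
    print(cid, sorted(ips))
```

Because the search id is shared across a region for some period of time, a common search term would produce noisy groups. An unusual search, like the inside joke above, is what makes a group distinctive.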

Teams also allows custom search extensions. I was curious if I could make one to search Giphy more privately, so I built a prototype. Here is a diagram of requests made by a normal third-party extension in Teams:

With Teams third-party extensions, the user never directly interacts with media servers because in step 4, Microsoft turns all returned URLs into proxied URLs. This means that Microsoft treats its first-party feature differently from normal third-party extensions. Even more surprising is that Giphy URLs are explicitly exempted from proxying. For my third-party Giphy extension clone, when I return raw Giphy URLs, no proxying is applied. But when I return URLs on my own custom domain, they are proxied.

Here is when my custom extension returns raw Giphy URLs. There is no proxying.

And here is when it returns my own domain instead of Giphy. The URLs have been changed to proxied URLs.

It appears that Giphy has been given a free pass by Microsoft to bypass their image proxying altogether whether it’s from their built-in features or a third-party extension.

How can Teams improve user privacy?

Strip the cid parameter from returned Giphy URLs
Treat Giphy like a normal search extension and proxy their images
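
The second recommendation amounts to rewriting every external media URL so the client fetches it through a proxy, with no exemption for any particular host. Here is a minimal Python sketch of that idea. The proxy endpoint and URL format are hypothetical placeholders and are not the format Microsoft actually uses.

```python
from urllib.parse import quote, urlsplit

# Hypothetical proxy endpoint, used only for illustration.
PROXY_ENDPOINT = "https://media-proxy.example.com/fetch"
# Hosts that are already first-party and need no proxying (also hypothetical).
FIRST_PARTY_HOSTS = {"media-proxy.example.com"}


def to_proxied_url(url: str) -> str:
    """Rewrite an external media URL so that it is fetched via the proxy."""
    host = urlsplit(url).hostname or ""
    if host in FIRST_PARTY_HOSTS:
        return url
    # No special case for giphy.com: it is proxied like any other external host.
    return f"{PROXY_ENDPOINT}?url={quote(url, safe='')}"


print(to_proxied_url("https://media0.giphy.com/media/Ju7l5y9osyymQ/giphy.gif"))
```

With a rewrite like this, the media server only ever sees the proxy's IP address, never the user's.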

Conclusions

Everyone likes GIFs. Everyone wants privacy. But combining the two is no easy feat. Unintentional leaks of user data will occur unless you design applications with user privacy in mind from the beginning.

Discord and Teams both opaquely proxy the initial search request to Giphy. There is no way for us to know what additional information they are sending to Giphy when that request is forwarded. The Giphy API includes an optional “random_id” parameter that can be sent with search requests to personalize search results per user. I hope that Discord and Teams are not using this parameter and aren’t sending any additional information in headers.

Discord and Teams also have web clients. These can leak even more information because of cookies that Giphy may have set in your browser.

Signal, living up to its privacy-focused mission, has implemented a GIF search solution that preserves user privacy without sacrificing usability.

Discord proxies almost all requests, but the GIF previews still leak quite a lot of information that can be used to paint a picture of user habits and who they’re talking to.

Teams explicitly bypasses its proxies for Giphy media requests, despite already having a privacy-preserving flow for normal third-party integrations.

So, be careful where you search for cat GIFs. You never know who could be watching.
