The not-so-silent type: Vulnerabilities across keyboard apps reveal keystrokes to network eavesdroppers


Report by Jeffrey KnockelMona Wang, and Zoë Reichert: “Typing logographic languages such as Chinese is more difficult than typing alphabetic languages, where each letter can be represented by one key. There is no way to fit the tens of thousands of Chinese characters that exist onto a single keyboard. Despite this obvious challenge, technologies have developed which make typing in Chinese possible. To enable the input of Chinese characters, a writer will generally use a keyboard app with an “Input Method Editor” (IME). IMEs offer a variety of approaches to inputting Chinese characters, including via handwriting, voice, and optical character recognition (OCR). One popular phonetic input method is Zhuyin, and shape or stroke-based input methods such as Cangjie or Wubi are commonly used as well. However, used by nearly 76% of mainland Chinese keyboard users, the most popular way of typing in Chinese is the pinyin method, which is based on the pinyin romanization of Chinese characters.

All of the keyboard apps we analyze in this report fall into the category of input method editors (IMEs) that offer pinyin input. These keyboard apps are particularly interesting because they have grown to accommodate the challenge of allowing users to type Chinese characters quickly and easily. While many keyboard apps operate locally, solely within a user’s device, IME-based keyboard apps often have cloud features which enhance their functionality. Because of the complexities of predicting which characters a user may want to type next, especially in logographic languages like Chinese, IMEs often offer “cloud-based” prediction services which reach out over the network. Enabling “cloud-based” features in these apps means that longer strings of syllables that users type will be transmitted to servers elsewhere. As many have previously pointed out, “cloud-based” keyboards and input methods can function as vectors for surveillance and essentially behave as keyloggers. While the content of what users type is traveling from their device to the cloud, it is additionally vulnerable to network attackers if not properly secured. This report is not about how operators of cloud-based IMEs read users’ keystrokes, which is a phenomenon that has already been extensively studied and documented. This report is primarily concerned with the issue of protecting this sensitive data from network eavesdroppers…(More)”.