Anyone Brushing Off NSA Surveillance Because It’s ‘Just Metadata’ Doesn’t Know What Metadata Is

July 11th, 2013


One of the key themes to come out of the revelations about NSA surveillance is defenders of the programs claiming “it’s just metadata.” This is wrong on multiple levels. First of all, only some of the revealed programs involve “just metadata.” The so-called “business records” data is metadata, but other programs, such as PRISM, can also include actual content. But even if we were talking only about metadata, the idea that it is somehow no big deal, and that people have nothing to worry about, is ridiculous to anyone who knows even the slightest thing about metadata. In fact, anyone who says “it’s just metadata” in an attempt to minimize what’s happening is basically revealing that they haven’t the slightest clue what metadata is. Here are a few examples of why.

Just a few months ago, Nature’s Scientific Reports published a study on how much a little metadata can reveal, entitled “Unique in the Crowd: The privacy bounds of human mobility,” by Yves-Alexandre de Montjoye, Cesar A. Hidalgo, Michel Verleysen, and Vincent D. Blondel. The basic conclusion: metadata reveals a ton, and even “coarse datasets” provide almost no anonymity:

A simply anonymized dataset does not contain name, home address, phone number or other obvious identifier. Yet, if individual’s patterns are unique enough, outside information can be used to link the data back to an individual. For instance, in one study, a medical database was successfully combined with a voters list to extract the health record of the governor of Massachusetts [27]. In another, mobile phone data have been re-identified using users’ top locations [28]. Finally, part of the Netflix challenge dataset was re-identified using outside information from The Internet Movie Database [29].

All together, the ubiquity of mobility datasets, the uniqueness of human traces, and the information that can be inferred from them highlight the importance of understanding the privacy bounds of human mobility. We show that the uniqueness of human mobility traces is high and that mobility datasets are likely to be re-identifiable using information only on a few outside locations. Finally, we show that one formula determines the uniqueness of mobility traces providing mathematical bounds to the privacy of mobility data. The uniqueness of traces is found to decrease according to a power function with an exponent that scales linearly with the number of known spatio-temporal points. This implies that even coarse datasets provide little anonymity.
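To make the idea of “uniqueness” concrete, here is a rough sketch in Python of how one might test it on a toy dataset. This is not the study’s code or data: the traces, the user labels, and the function names (matching_users, estimate_uniqueness) are all made up for illustration, and the sketch only checks how often a handful of known (place, hour) points match exactly one person.

```python
import random


def matching_users(traces, known_points):
    """Return the users whose trace contains every known (place, hour) point."""
    known = set(known_points)
    return [user for user, trace in traces.items() if known <= trace]


def estimate_uniqueness(traces, p, trials=1000, seed=0):
    """Estimate how often p randomly chosen points single out one trace."""
    rng = random.Random(seed)
    users = sorted(traces)
    unique = 0
    for _ in range(trials):
        # Pick a person, then pretend an outsider knows p of their points.
        user = rng.choice(users)
        points = rng.sample(sorted(traces[user]), min(p, len(traces[user])))
        # The trace is "unique" if nobody else matches those points.
        if matching_users(traces, points) == [user]:
            unique += 1
    return unique / trials


# Hypothetical "anonymized" data: no names, just (antenna, hour) pairs.
traces = {
    "user_a": {("north", 8), ("center", 12), ("north", 19)},
    "user_b": {("north", 8), ("center", 12), ("south", 19)},
    "user_c": {("east", 8), ("center", 12), ("south", 19)},
}

if __name__ == "__main__":
    for p in (1, 2, 3):
        print(p, estimate_uniqueness(traces, p))
```

Even in this tiny made-up example, knowing where someone was at two particular hours is often enough to pick out their whole record, which is the same kind of outside-information matching the study describes.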

“Just metadata” isn’t “just” anything, other than a massive violation of basic privacy rights.

