de-anonymization [English]

InterPARES Definition

n. (also reidentification, n.) ~ A process to determine the original values of information in sanitized datasets.

General Notes

De-anonymization is typically accomplished by combining an anonymized, de-identified, obfuscated, or similar dataset with other datasets that share elements. Public (government) information made accessible through open-data initiatives increases the number of sources that can be cross-referenced to reveal information thought to have been protected in anonymized datasets. Additional datasets are not always required if the anonymization technique rests on unwarranted assumptions. For example, removing names, street addresses, and other personally identifiable information may still allow individuals to be identified through a combination of the remaining elements: birth date, sex, and ZIP code together can identify many individuals, even in large datasets (Sweeney, 2000, 2).
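The point about birth date, sex, and ZIP code can be made concrete with a uniqueness count. The following is a minimal sketch, assuming a hypothetical de-identified table whose field names (`birth_date`, `sex`, `zip`) and values are invented for illustration. It lists the records whose combination of those three quasi-identifiers occurs only once; anyone who knows those three facts about a person can single out that person's row. In k-anonymity terms, these rows have k = 1.

```python
from collections import Counter

# Hypothetical "anonymized" records (all values invented): names and street
# addresses removed, but birth date, sex, and ZIP code retained.
records = [
    {"birth_date": "1961-07-28", "sex": "F", "zip": "02138", "diagnosis": "asthma"},
    {"birth_date": "1975-03-02", "sex": "M", "zip": "02139", "diagnosis": "flu"},
    {"birth_date": "1975-03-02", "sex": "M", "zip": "02139", "diagnosis": "migraine"},
    {"birth_date": "1988-11-15", "sex": "F", "zip": "02140", "diagnosis": "fracture"},
]

# Fields that are not names but can still point to a person in combination.
QUASI_IDENTIFIERS = ("birth_date", "sex", "zip")


def quasi_key(record):
    """Project a record onto its quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def unique_records(rows):
    """Return the records whose quasi-identifier combination occurs exactly once."""
    counts = Counter(quasi_key(r) for r in rows)
    return [r for r in rows if counts[quasi_key(r)] == 1]


singled_out = unique_records(records)
print(len(singled_out), "of", len(records), "records are unique")
```

Here the first and last records stand alone, while the two records sharing a birth date, sex, and ZIP code shield each other. No outside dataset is needed to find which rows are exposed.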

Other Definitions


  • EPIC 2015 (†680): In each of the above cases [Netflix study and AOL release of user data], data was re-identified by combining two datasets with different types of information about an individual. One of the datasets contained anonymized information; the other contained outside information – generally available to the public – collected on a daily or routine basis (such as voter registration information), and which includes identifying information (e.g., name). The two datasets will usually have at least one type of information that is the same (e.g., birthdate), which links the anonymized information to an individual. By combining information from each of these datasets, researchers can uniquely identify individuals in the population. (†1557)
  • Ohm 2010 (†679 p. 1705): Reidentification combines datasets that were meant to be kept apart, and in doing so, gains power through accretion: Every successful reidentification, even one that reveals seemingly nonsensitive data like movie ratings, abets future reidentification. Accretive reidentification makes all of our secrets fundamentally easier to discover and reveal. (†1552)
  • Ohm 2010 (†679 p. 1707): Easy reidentification will topple the edifices of promise and expectation we have built upon anonymization. (†1553)
  • Ohm 2010 (†679 p. 1707): The reverse of anonymization is reidentification or deanonymization. A person, known in the scientific literature as an adversary, reidentifies anonymized data by linking anonymized records to outside information, hoping to discover the true identity of the data subjects. (†1554)
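The linkage described by EPIC and Ohm amounts to a join between an anonymized dataset and outside information on the fields the two share. The following is a minimal sketch, assuming both hypothetical datasets share `birth_date`, `sex`, and `zip`. The names, records, and the `reidentify` function are invented for illustration. A row is treated as re-identified only when exactly one outside record matches its linking values.

```python
# Hypothetical anonymized dataset: no names, but quasi-identifiers remain.
anonymized = [
    {"birth_date": "1961-07-28", "sex": "F", "zip": "02138", "diagnosis": "asthma"},
    {"birth_date": "1975-03-02", "sex": "M", "zip": "02139", "diagnosis": "flu"},
    {"birth_date": "1988-11-15", "sex": "F", "zip": "02140", "diagnosis": "fracture"},
]

# Hypothetical outside dataset (e.g., a voter roll) that includes names.
voter_roll = [
    {"name": "A. Example", "birth_date": "1961-07-28", "sex": "F", "zip": "02138"},
    {"name": "B. Example", "birth_date": "1975-03-02", "sex": "M", "zip": "02139"},
    {"name": "C. Example", "birth_date": "1975-03-02", "sex": "M", "zip": "02139"},
    {"name": "D. Example", "birth_date": "1990-01-01", "sex": "F", "zip": "02140"},
]

# Fields present in both datasets, used to link them.
LINK_FIELDS = ("birth_date", "sex", "zip")


def link_key(record):
    """Project a record onto the fields shared by both datasets."""
    return tuple(record[f] for f in LINK_FIELDS)


def reidentify(anon_rows, outside_rows):
    """Map names to sensitive values for rows with exactly one outside match."""
    # Index the outside data by linking values; several people may share a key.
    index = {}
    for row in outside_rows:
        index.setdefault(link_key(row), []).append(row["name"])

    matches = {}
    for row in anon_rows:
        candidates = index.get(link_key(row), [])
        if len(candidates) == 1:  # unique match: the row is re-identified
            matches[candidates[0]] = row["diagnosis"]
    return matches


print(reidentify(anonymized, voter_roll))
```

In this toy data, one row is re-identified, one stays ambiguous because two voters share its linking values, and one has no match at all. Each name recovered this way becomes outside information for the next linkage, which is the accretion Ohm describes.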