InterPARES Trust - Terminology - reidentification (English)

reidentification [English]

Syndetic Relationships

RT: de-anonymization

InterPARES Definition

n. ~ A process to determine the original values of information in anonymized datasets, reassociating individuals with their associated data.

Other Definitions

Health Informatics 2008 (†703 s.v. "3.18 de-identification"): General term for any process of removing the association between a set of identifying data and the data subject.
Wikipedia (†387 s.v. de-identification): The reverse process of defeating de-identification to identify individuals.

Citations

EPIC 2015 (†680): Re-identification is the process by which anonymized personal data is matched with its true owner. In order to protect the privacy interests of consumers, personal identifiers, such as name and social security number, are often removed from databases containing sensitive information. . . . Recently, however, computer scientists have revealed that this "anonymized" data can easily be re-identified, such that the sensitive information may be linked back to an individual. The re-identification process implicates privacy rights, because organizations will say that privacy obligations do not apply to information that is anonymized, but if the data is in fact personally identifiable, then privacy obligations should apply. (†1617)
Kesan, et al. 2013 (†720 p.442-443): Even if identifying information is removed, that does not necessarily solve the privacy problems. Reidentification science is a new field in computer science research that reattaches anonymized information to identified individuals. Researchers may reidentify a dataset by, for example, comparing two databases, one anonymized and one containing PII [Personally Identifiable Information] and some information fields in common with the anonymized database. The FTC has recently acknowledged that the distinction between PII and de-identified information is often blurred. Because of the ease with which data can be reidentified, Ohm suggests rejecting the concept of PII entirely, though Schwartz and Solove instead suggest a reunderstanding of what information should be considered PII. Considering the technological issues, theorists who urge government regulation should evaluate which approach to PII should be taken. Reidentification science should be examined by policymakers to determine whether the concept of PII should be expanded to include both identified and identifiable information. Schwartz and Solove propose a model in which information is considered identified when the person's identity is ascertained, identifiable when there is a nonremote possibility of future identification, and nonidentifiable when the risk of identification is remote and the information is not relatable to a person. (†1643)
Kesan, et al. 2013 (†720 p.465): The problem of reidentification raises additional issues because it can lead to anonymized, descriptive information about the consumer being reattached to the consumer's identity. While we would not recommend a regime that stifles innovation and academic creativity, a legal regime to protect PII in the cloud also needs some forward-looking provisions addressing the possibility that reidentification science could lead to threats to personal privacy in the future. These provisions, for example, might prohibit the use of public records for reidentification purposes unless the user certifies compliance with some form of privacy standard. (†1644)
Malin, et al. 2003 (†722 p.1): Consider online consumers, who have the IP addresses of their computers logged at each website visited. Many falsely believe they cannot be identified. The term “reidentification” refers to correctly relating seemingly anonymous data to explicitly identifying information (such as the name or address) of the person who is the subject of those data. Reidentification has historically been associated with data released from a single data holder. (†1646)
Murphy and Barton 2014 (†721 p.13): Advertisers, researchers, and users of data in many other industries have long argued that aggregating or de-identifying personal data can render it anonymous and thus allow unrestricted use without compromising individual data subject privacy. Until very recently, most regulators have accepted this argument as well in granting safe harbors or similar exceptions to data privacy regulations for data that has been anonymized. In the outsourcing and cloud-computing industry, customers have followed suit in routinely granting their service providers the right to use customer data so long as the service providers aggregate it with other data and remove personally identifiable data prior to disclosing it. In recent years, computer scientists have demonstrated that anonymized data can be “reidentified” by linking anonymized records to outside information. ... In each case, researchers found that seemingly anonymous data contained unique attributes and other clues that enabled them to reidentify it with individuals. Once a person has been identified, the effect is compounded as it becomes easier to associate more and more information with that person. The ease with which researchers can reidentify anonymized data has several implications in the outsourcing and cloud-based service industry. Among them: • Regulations generally define the “personal data” that they cover broadly as information that can be used to identify a person. With reidentification, seemingly innocuous information such as search queries and Netflix reviews could arguably fall within the definition of personal data and be subject to additional regulation. • Regulators are beginning to explicitly address new types of data (e.g., IP addresses, cookie identifiers). • Reidentification also may lead to increased liability. For example, if personal information collected by a company is disclosed by the company’s service provider and later reidentified, the company may face claims from its end users and possibly fines from regulators; the service provider may face claims from the company for failing to adequately anonymize the data. (†1645)
Sweeney 2000 (†678 p.2): Linking can be used to re-identify de-identified data. (†1642)
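The linking technique described in the citations above is illustrated here. Kesan et al. describe comparing an anonymized database with one containing PII and sharing some fields. Sweeney notes that linking can re-identify de-identified data, and Murphy and Barton point to unique attributes. The following is a minimal Python sketch under stated assumptions, not a method taken from any of the cited works. The toy records, the field names (`zip`, `birth_date`, `sex`, `diagnosis`, `name`) and the helper functions `key` and `link` are all hypothetical. The sketch uses an exact-match join on the shared fields and treats a record as reidentified only when its combination of shared values is unique in both datasets.

```python
from collections import defaultdict

# Hypothetical toy data: an "anonymized" release with names removed,
# and a public list that still carries names plus some fields in common.
anonymized = [
    {"zip": "02138", "birth_date": "1960-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1975-01-02", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1975-01-02", "sex": "M", "diagnosis": "diabetes"},
]
public = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1960-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1975-01-02", "sex": "M"},
]

# Assumed set of fields shared by both datasets (chosen for illustration only).
SHARED_FIELDS = ("zip", "birth_date", "sex")


def key(record):
    """Build the linking key from the shared fields of a record."""
    return tuple(record[field] for field in SHARED_FIELDS)


def link(anonymized, public):
    """Return (name, anonymized_record) pairs where the shared-field
    combination occurs exactly once in each dataset, i.e. the match is unique."""
    anon_by_key = defaultdict(list)
    for rec in anonymized:
        anon_by_key[key(rec)].append(rec)
    public_by_key = defaultdict(list)
    for person in public:
        public_by_key[key(person)].append(person)

    matches = []
    for k, people in public_by_key.items():
        records = anon_by_key.get(k, [])
        # Ambiguous keys (several candidates on either side) are skipped.
        if len(people) == 1 and len(records) == 1:
            matches.append((people[0]["name"], records[0]))
    return matches


for name, record in link(anonymized, public):
    print(name, "->", record["diagnosis"])
```

With this toy data only the first anonymized record is relinked to a name, because its shared-field combination is unique. The two records sharing Bob's combination stay ambiguous under this exact-match rule. Real linkage work may also use fuzzy or probabilistic matching, which this sketch does not attempt.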