TrustArc Blog

Can You Legally Do Analytics Under the GDPR?

July 17, 2017

by Gary LaFever, CEO of Anonos
Taking the “personal” out of Personal Data®

Many companies are not yet aware that they are, or soon will be, out of compliance when processing analytics or using historical databases under the GDPR. While many companies are understandably focused on conducting data inventories and data protection impact assessments, it is critical to note that inventories and assessments will not support the new legal bases required under the GDPR for processing data analytics or for using historical databases involving EU personal data.

An important aspect of the GDPR is the new requirement that “consent” must be specific and unambiguous to serve as a valid legal basis. In order for “consent” to serve as a lawful basis for processing personal data, it must be “freely given, specific, informed and an unambiguous indication of the data subject’s agreement to the processing of personal data relating to him or her.”[1] These GDPR requirements for specific and unambiguous consent are impossible to satisfy in the case of iterative data analytics, where successive analyses, correlations and computations cannot be described with specificity and unambiguity at the time of consent. In addition, the GDPR has no “grandfather” provision allowing for continued use of data collected using non-compliant consent prior to the effective date of the GDPR.

To lawfully process data analytics and to legally use historical databases containing EU personal data, new technical measures that support alternate (non-consent) GDPR-compliant legal bases are required. After May 25, 2018, companies that continue to rely on consent for analytics, AI and use of historical databases involving EU personal data will be noncompliant with GDPR requirements. They will therefore expose themselves, as well as their joint data controller and data processor partners,[2] to the risk of well-publicized fines of up to 4% of global turnover or EUR 20 million, whichever is greater. The good news is that two new technical requirements under the GDPR, Pseudonymisation and Data Protection by Default, help to satisfy alternate (non-consent) legal bases[3] for data analytics and use of historical databases involving EU personal data.
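To make the “whichever is greater” wording concrete, the fine ceiling can be sketched as a simple maximum. This is only an illustration of the arithmetic: the function name and the turnover figures are hypothetical, and the result is an upper bound, not a prediction of any actual penalty.

```python
def max_fine_eur(global_annual_turnover_eur: int) -> int:
    """Ceiling of the higher fine tier: the greater of EUR 20 million
    or 4% of global annual turnover (whole euros, rounded down)."""
    four_percent = global_annual_turnover_eur * 4 // 100
    return max(20_000_000, four_percent)


# Hypothetical turnover figures, chosen only to show both branches.
print(max_fine_eur(100_000_000))    # 4% is EUR 4 million, so the EUR 20 million figure applies
print(max_fine_eur(2_000_000_000))  # 4% is EUR 80 million, which exceeds EUR 20 million
```

For smaller companies the fixed EUR 20 million figure is the binding ceiling; the 4% figure only takes over once global turnover exceeds EUR 500 million.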

GDPR-Compliant Pseudonymisation

The GDPR embraces a new risk-based approach to data protection and shifts the primary burden of risk for inadequate data protection from individual data subjects to corporate data controllers and processors. Prior to the GDPR, the burden of risk was borne principally by data subjects because of limited recourse against data controllers and the lack of direct liability for data processors.

The GDPR recognizes that static (persistent), purportedly “anonymous” identifiers used to “tokenize” or replace identifiers are ineffective in protecting privacy. Due to increases in the volume, variety and velocity of data, combined with advances in technology, static identifiers are linked or readily linkable because of the Mosaic Effect,[4] leading to unauthorized re-identification of data subjects. Continued use of static identifiers by data controllers and processors inappropriately places the risk of unauthorized re-identification on data subjects. However, the GDPR encourages data controllers and processors to continue using personal data by implementing new technical measures to “Pseudonymise”[5] data to reduce the risk of unauthorized re-identification. GDPR-compliant Pseudonymisation requires separation of the information value of data from the means of linking the data to individuals. In contrast to static identifiers, which are subject to unauthorized relinking via the Mosaic Effect, dynamically changing Pseudonymous identifiers can satisfy requirements to separate the information value of personal data from the means of attributing the data back to individual data subjects.
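The difference between static and dynamically changing identifiers can be illustrated with a minimal Python sketch. It is not a description of any particular product or of a GDPR-mandated technique: the names `static_token` and `DynamicPseudonymiser` are hypothetical, the keyed hash stands in for any static tokenization scheme, and the in-memory lookup table stands in for what would in practice be a separately stored, access-controlled mapping.

```python
import hashlib
import hmac
import secrets
from typing import Dict


def static_token(identifier: str, key: bytes) -> str:
    """Static approach: the same input always yields the same token,
    so records carrying it can be joined across datasets."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


class DynamicPseudonymiser:
    """Hypothetical helper that issues a fresh random pseudonym per use."""

    def __init__(self) -> None:
        # Pseudonym -> identifier. Assumed to be held apart from the
        # analytics data, so the data alone cannot be attributed to a person.
        self._lookup: Dict[str, str] = {}

    def pseudonymise(self, identifier: str) -> str:
        pseudonym = secrets.token_hex(8)  # random, unrelated to the input
        self._lookup[pseudonym] = identifier
        return pseudonym

    def reidentify(self, pseudonym: str) -> str:
        """Authorized relinking; requires access to the lookup table."""
        return self._lookup[pseudonym]


key = secrets.token_bytes(32)
# The static token repeats, which is what makes cross-dataset linking possible.
print(static_token("alice@example.com", key) == static_token("alice@example.com", key))  # True

pseudonymiser = DynamicPseudonymiser()
first = pseudonymiser.pseudonymise("alice@example.com")
second = pseudonymiser.pseudonymise("alice@example.com")
print(first, second)                    # two unrelated random values
print(pseudonymiser.reidentify(first))  # alice@example.com
```

In this sketch the analytics data would carry only the random pseudonyms, while the ability to attribute a pseudonym back to a person depends entirely on controlled access to the lookup table.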

Data Protection by Default

The GDPR imposes a new mandate to provide Data Protection by Default,[6] which goes further than providing perimeter-only protection and is much more than merely “privacy by design.” It is the most stringent implementation of privacy by design. Data Protection by Default requires that data protection be applied at the earliest opportunity (e.g., by dynamically Pseudonymising data) and requires that steps be affirmatively taken to make use of personal data. This is in stark contrast to common practices prior to the GDPR, when the default was that data was available for use and affirmative steps had to be taken to protect the data. Data Protection by Default requires granular, context-sensitive control over data when it is in use so that only the data proportionally necessary at any given time, and only as required to support each authorized use, is made available.
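One way to picture this “protected unless affirmatively released” default is a per-purpose allowlist, sketched below in Python. The purposes, field names and the `release` function are all hypothetical and chosen only for illustration; a real deployment would also need authorization checks, auditing and Pseudonymisation of the released values.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical authorized purposes and the only fields each may see.
PURPOSE_FIELDS: Dict[str, FrozenSet[str]] = {
    "billing": frozenset({"customer_ref", "amount_eur"}),
    "trend_analytics": frozenset({"age_band", "region", "amount_eur"}),
}


def release(record: Dict[str, Any], purpose: str) -> Dict[str, Any]:
    """Return only the fields approved for the given purpose.

    The default is protection: an unknown purpose has an empty
    allowlist, so nothing is released."""
    allowed = PURPOSE_FIELDS.get(purpose, frozenset())
    return {field: value for field, value in record.items() if field in allowed}


record = {
    "customer_ref": "C-1001",
    "name": "Alice Example",
    "age_band": "30-39",
    "region": "Bavaria",
    "amount_eur": 42,
}

print(release(record, "trend_analytics"))  # {'age_band': '30-39', 'region': 'Bavaria', 'amount_eur': 42}
print(release(record, "marketing"))        # {}
```

The design choice to note is the direction of the default: fields such as `name` are never exposed unless a purpose explicitly lists them, rather than being exposed unless someone remembers to mask them.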

GDPR Technical Requirements and Data Stewardship

Prior to the GDPR, risks associated with not fully comprehending broad grants of consent were borne by individual data subjects. Under the GDPR, broad consent no longer provides a sufficient legal basis for data analytics or use of historical databases involving personal data. As a result, data controllers and processors must adopt new technical safeguards to satisfy an alternate legal basis. Complying with the new Pseudonymisation and Data Protection by Default requirements can help support alternate (non-consent) legal bases for analytics and use of historical databases.

Even in situations where a company is not required to comply with EU regulations, compliance with GDPR requirements for Pseudonymisation and Data Protection by Default is evidence of state-of-the-art initiatives to serve as a good steward of data, thereby engendering maximum trust with customers.

[1] See Recital 32 and Article 4(11).

[2] See Articles 26 and 82.

[3] See Articles 6(1)(b)-(f).

[4] The “Mosaic Effect” occurs when a person is indirectly identifiable due to a phenomenon referred to by the Article 29 Working Party as “unique combinations,” where, notwithstanding the lack of identifiers that directly single out a particular person, the person is still “identifiable” because that information may be combined with other pieces of information (whether the latter is retained by the data controller or not), enabling the individual to be distinguished from others. See .

[5] See Article 4(5).

[6] See Article 25.