The Information Commissioner's Office (ICO) said insurers, credit-checking businesses and those involved in recruitment are among the organisations that will need to explain the rationale behind automated decisions to the people affected by them under the new General Data Protection Regulation (GDPR).
Under the GDPR, data subjects will have a qualified right "not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her". A recital in the Regulation states that individuals subject to decision-making solely by automated processing should, among other things, be provided with "an explanation of the decision reached after such assessment".
In a recent discussion paper on big data and data protection, the ICO said: "Some have suggested this right can be easily circumvented as it is restricted to ‘solely’ automated processing and decisions that ‘significantly’ affect individuals. However, there are still many situations where the right is very likely to apply, such as credit applications, recruitment and insurance. In such circumstances, it may be difficult to provide a meaningful response to an individual exercising their right to an explanation. This is because … when computers learn and make decisions, they do so ‘without regard for human comprehension’."
"Big data organisations therefore need to exercise caution before relying on machine learning decisions that cannot be rationalised in human-understandable terms. If an insurance company cannot work out the intricate nuances that cause their online application system to turn some people away but accept others (however reasonable those underlying reasons may be), how can it hope to explain this to the individuals affected?" it said.
The ICO said businesses must also have measures in place to check that decisions taken by machine learning systems are not "producing discriminatory, erroneous or unjustified results".
"Detecting discriminatory decisions in hindsight will not be sufficient to comply with the accountability provisions of the GDPR," the ICO said. "Big data analysts will need to find ways to build discrimination detection into their machine learning systems to prevent such decisions being made in the first place."
The ICO recommended that organisations "implement innovative techniques to develop auditable machine learning algorithms. Internal and external audits should be undertaken with a view to explaining the rationale behind algorithmic decisions and checking for bias, discrimination and errors".
The ICO's paper is an updated version of a report the watchdog published in 2014 on big data and data protection. The new version includes a number of additions explaining how the GDPR applies to big data operations, including those involving artificial intelligence (AI) and machine learning. However, the ICO stressed that its paper is "not a complete guide to the relevant law" and is only "intended as a contribution to discussions on big data, AI and machine learning and not as a guidance document or a code of practice".
According to the ICO, it is "highly likely" that businesses will need to carry out a data protection impact assessment before processing personal data through big data applications under the GDPR.
It recommended that businesses "embed a privacy impact assessment framework into their big data processing activities to help identify privacy risks and assess the necessity and proportionality of a given project". It said the assessment should "involve input from all relevant parties including data analysts, compliance officers, board members and the public".
The ICO also said that businesses that provide big data analytics services to other organisations could be considered data controllers, and not just data processors, under the GDPR, even if the terms of their outsourcing contract state that they are data processors.
"When outsourcing big data analytics to other companies, careful consideration should be given to where control over the processing of personal data actually lies – this will have implications for compliance and liability," the ICO said. "If an organisation intends to conduct its big data outsourcing in a data controller-data processor relationship, it is important that the contract includes clear instructions about how the data can be used and the specific purposes for its processing."
"However, the existence of such a contract would not automatically mean that the company doing the data analysis is a data processor. If that company has enough freedom to use its expertise to decide what data to collect and how to apply its analytic techniques, it is likely to be a data controller as well," it said.
The ICO also said that the vast quantities of data often gathered from multiple sources can make it difficult for businesses to respond to subject access requests (SARs). It said organisations will face additional duties in handling SARs under the GDPR but that the complexity of data analytics operations "cannot be an excuse for disregarding legal obligations".
"The existence of the right of access compels organisations to practise good data management," the ICO said. "They need adequate metadata, the ability to query their data to find all the information they have on an individual, and knowledge of whether the data they are processing has been truly anonymised or whether it can still be linked to an individual."
The ICO predicted that more businesses would set up "a web portal that enables people to see the data held about them for marketing purposes and the sources of that data" to meet their obligations under the GDPR. It also said that "personal data stores" could help businesses meet their duties on data portability under the new Regulation.
"Personal data stores can offer individuals a degree of control over the re-use of their personal data across different services," the ICO said. "This can at least help to address the issues of fairness and lack of transparency that we have identified as potentially problematic in big data."