2024-2025 Global AI Trends Guide
In recent years, data-driven technologies have transformed the world around us: we have computers that can translate for us, facial recognition systems to unlock our phones and algorithms to identify certain cancers. However, we have also seen what happens when the data sets used to train those algorithms are not representative of all of us. This is data bias, and it has real-world consequences for many of us.
Listen to and download the podcast on Apple Podcasts, Google Podcasts and Spotify.
In recent years, data-driven technologies have transformed the world around us. We have computers that have learned to translate, facial recognition systems that unlock our phones and algorithms that can identify certain cancers. But we've also seen what happens when the data sets used to train the algorithms are not representative of all of us.

Data bias. Since it's created by humans, technology can reflect the biases, conscious or subconscious, of its creators, and sometimes those biases only become apparent after the technology is deployed. And that has real-world consequences for many of us.
One study in the US found that facial recognition software developed by multiple companies, the kind that’s used to unlock smartphones or check passports, made more mistakes with African-American or Asian faces than Caucasian ones. One case in 2020 demonstrated what an algorithm’s racial bias might mean when a faulty facial recognition match led to a black man in Michigan being arrested for a crime he didn't commit.
There's evidence of bias in relation to companies' hiring processes too. When AI is used to screen and evaluate job applicants, it has been found to discriminate against women and minorities. Recently, big brands like CVS, Deloitte and Meta have come together to work to detect and combat algorithmic bias. They're not going to wait until the law compels them to do it, although that might be on the way. Right now, data bias might not be a legal problem, but it's certainly an ethical one.
We spoke to Allison Holt Ryan and Christine Gateau, both partners at Hogan Lovells, about the way we use data and what it means in practice for companies to combat data bias.
Allison: I would hate to say that data bias is everywhere, but I think we can all fairly agree that big data is everywhere, in almost every facet of our lives. And everywhere you are using large data sets, it's going to increase the likelihood that there could be data bias, either in the data set itself or in how that data is being used. So I don't think I'm going to go so far as to say that data bias is everywhere, but I think the potential for data bias is everywhere.
Christine: Back in 2020 during the pandemic, a lot of school exams couldn't take place in person. One example that comes to mind is in the United Kingdom. The government decided that, instead of students sitting their A-level exams, the schools would come up with a score for each child based on an algorithm. What happened was that almost 40 percent of students received lower grades than they had anticipated, and there was an outcry and complaints from parents and students. The reason for the discrepancy between the grades given by the algorithm and the grades that would have been expected is that there was some bias in the algorithm. The algorithm took into consideration three types of data: first, a teacher assessment; second, a class ranking; and third, the performance of the best-performing students at the school. But apparently, even with those three sets of data taken into account, there was still a huge bias in the grades given by the algorithm. As you can imagine, the UK department which regulates the exams had to explain itself and, in the end, had to retract the grades given by the algorithm.
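To illustrate the mechanism, here is a deliberately simplified, hypothetical Python sketch, not the model the UK regulator actually used: it shows how anchoring each student's grade to their school's historical results can pull down a strong student who happens to attend a historically lower-performing school. All the weights and figures are illustrative assumptions.

```python
# Hypothetical sketch (not the actual grading model): blending an individual
# signal with the school's past performance systematically penalises strong
# students at historically lower-performing schools.

def predicted_grade(teacher_assessment: float,
                    class_rank_percentile: float,
                    school_historical_mean: float,
                    school_weight: float = 0.6) -> float:
    """Return a 0-100 score that mixes individual and school-level signals.

    teacher_assessment      -- teacher's predicted grade on a 0-100 scale
    class_rank_percentile   -- student's rank within the class, 0-100
    school_historical_mean  -- average grade the school achieved in prior years
    school_weight           -- how strongly the school's history anchors the result
    """
    individual_signal = 0.5 * teacher_assessment + 0.5 * class_rank_percentile
    return (1 - school_weight) * individual_signal + school_weight * school_historical_mean


# The same high-performing student, placed in two schools with different histories:
strong_student = dict(teacher_assessment=85, class_rank_percentile=95)
print(predicted_grade(**strong_student, school_historical_mean=80))  # 84.0: grade roughly preserved
print(predicted_grade(**strong_student, school_historical_mean=55))  # 69.0: pulled down by the school's history
```

The individual inputs never change between the two calls; only the school-level prior does, which is exactly the kind of bias that only becomes visible once results are compared across schools.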
Allison: We see two ways primarily, and obviously there may be more things to think about here, but really the first problem is with the underlying data itself. Here in the U.S., we have a saying ‘garbage in, garbage out.’ You put bad things in, you're likely to get bad things out. So if that initial data set that you're putting in has holes in it, is not representative of a full population or has bias in it already, that's going to be one place where bias is going to creep into the equation. The second place is in the algorithm that the company is using. Once you pull the data in, how you manipulate that data, how you choose to read that data, how the AI chooses to evaluate and interpret that data is the second place that we really can see bias creep in. And this is important because companies need to be vigilant, not just on what data set they're using, but also how they're interpreting that data and how they're using that data. And so it really creates two avenues for bias to enter the equation.
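As a concrete illustration of the kind of check a company might run on either side of that equation, here is a minimal, hypothetical sketch in Python. The column names, the toy data and the 0.8 threshold are assumptions for illustration, not a prescribed methodology; it simply compares selection rates across groups, which can surface problems in the training data or in the algorithm's output.

```python
# Minimal, hypothetical bias screen: assumes a pandas DataFrame with a
# protected-attribute column "group" and a binary outcome column "selected"
# (e.g. shortlisted by an AI screening tool, or labelled positive in the data).
import pandas as pd

def selection_rate_ratios(df: pd.DataFrame,
                          group_col: str = "group",
                          outcome_col: str = "selected") -> pd.Series:
    rates = df.groupby(group_col)[outcome_col].mean()  # selection rate per group
    return rates / rates.max()                         # ratio vs. the most-selected group

# Toy example: group B is selected at half the rate of group A.
df = pd.DataFrame({
    "group":    ["A"] * 100 + ["B"] * 100,
    "selected": [1] * 60 + [0] * 40 + [1] * 30 + [0] * 70,
})
print(selection_rate_ratios(df))
# A ratio well below 0.8 for any group (one common screening heuristic,
# sometimes called the "four-fifths rule") is a red flag worth investigating
# in both the underlying data and the algorithm interpreting it.
```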
Christine: What we found in our survey is that bias in data and programming is the second most important ethical issue. While it may be taken into consideration when a company invests in technology, establishing or publishing principles that govern how the data will be used is another story. So you need to make sure that the trust of customers, employees and other stakeholders is taken into consideration when deploying innovative technology.
Allison: The other thing we could surmise as to why businesses are not vetting that technology is that our economy is moving so quickly, and the advances made by certain companies are forcing the entire market to run at a breakneck pace. So having the ability to pause, to vet and to do the necessary testing on either the data or the algorithm may be something that companies don't feel they have time to do. And so you might have a bit of friction between the business folks who are trying to push this forward as quickly as possible and either your legal or ethics teams who are trying to do it in a way that confirms the validity of the data and the way it's being used. And that's an age-old problem, that friction between the business folks who want to get the job done and the legal or ethics teams who, it frequently feels, are trying to slow them down. So helping our clients navigate that in a way that is responsible but respectful of the business, I think, is one key thing that we're trying to focus on right now.
Christine: At the EU level, there is draft regulation which was presented by the European Commission in April 2021. The aim of that regulation is to establish harmonised rules on artificial intelligence. Earlier this year, when the French presidency of the European Union started, a new text was submitted, and the plan is for that text to be presented before the European Parliament before June of this year. Concretely, this draft regulation will classify uses of AI according to the risk involved. Some AI systems in particular have been singled out as high risk, for example human resources applications, credit allocation, medicine and justice, while other AI systems will be considered to be too risky and simply banned.
Coming back to the high-risk AI systems, those systems will have to comply with a set of obligations to ensure the good ethics of the AI, which means non-discrimination, respect for privacy, robustness of the technology and quality of the training data. There is going to be a legal obligation to take those ethical data issues into consideration when you are devising or putting an AI system on the market.
Another example is in France, with a law which dates back to 2021. The name of the law is the Bioethics law, and it exclusively regulates the field of medicine and medical research and the use of algorithms in that specific sector. For the moment, the legal provisions only concern the transparency and explainability of algorithms. So we are not yet at the point of a legal obligation to take data bias into consideration in an AI system, but at least you have the foundation for explanations around the algorithm, and you can expect that more stringent obligations might later derive from this law.
Allison: In the US, we are very reactive right now. It's not something that regulators are leading on out front; instead, they are responding to what they see happening in the marketplace. I have two examples. First, the Federal Trade Commission in the U.S. has become more active in analysing issues regarding data bias. They put out guidance in 2020 and then again in early 2021, and late last year they said they were going to put out some rules on this. We haven't seen those rules yet, but the rules will be made to ensure that there's not unfair discrimination in the use of data. So the FTC is becoming more and more active in this space.
Also, earlier this year, New York City passed a novel, first-of-its-kind law in the country to try to remedy some of the impacts in the employment context. It's a fairly straightforward law, but we don't know how it's going to play out just yet. The law requires that employers and employment agencies that use these AI tools conduct independent audits of the tools for bias and provide disclosures to candidates if those AI tools are being used to make employment decisions. The law doesn't go into effect until January 1, 2023, so stay tuned for what it's actually going to mean for the landscape of hiring law in New York. But those are just a couple of examples of what's happening in the U.S., and we expect this area to become more and more active.
Christine: It's important that companies discuss these issues as a group. Ethics is not going to be assigned to one specific function within the company; it must be a discussion with the entire business. It's important to have the views of all the various stakeholders on this issue: your customers, your employees and others. Having them take part in that conversation will also ensure that you increase the trust they have in the technology that you are developing or buying. They will know you have developed a clear framework and that all aspects of data bias have been taken into consideration. Also, when you invest in or develop a technology that raises ethical challenges, you might think about establishing and publishing the principles that govern how the data will be used.
If you are purchasing software that relies on a lot of data, you should ask the provider what they have done to eliminate those biases and make sure you study those materials. Again, if you are in the position of purchasing a technology, you might want to insert warranty assurances in the contract that the software does not contain biases and, obviously, conduct your own due diligence to check whether this is accurate.
Allison: And don’t forget the second part of the equation that we talked about. What about the technology, what about the software that you're using? Where is your data set coming from? How representative is that data set? And is the software you're either developing or buying accounting for potential biases in the underlying data itself? So that's another thing I would suggest. Make sure you're thinking about both sides of the equation, the data and then the software or AI that is interpreting that data.
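On the data side of that equation, one simple, hypothetical starting point is to compare the make-up of the training data against the population the system will actually serve. The sketch below is illustrative only; the group labels and figures are assumptions, and in practice the benchmark would come from census or customer data appropriate to the use case.

```python
# Hypothetical representativeness check: compare the demographic composition
# of a training set against the population the system will be used on.
import pandas as pd

# Assumed population shares (illustrative figures, not real statistics).
population_share = pd.Series({"group_a": 0.60, "group_b": 0.25, "group_c": 0.15})

# Toy training set that over-represents group_a and under-represents the others.
training_data = pd.DataFrame({
    "group": ["group_a"] * 800 + ["group_b"] * 150 + ["group_c"] * 50,
})
training_share = training_data["group"].value_counts(normalize=True)

# Representation ratio: values well below 1.0 flag under-represented groups
# whose error rates deserve extra scrutiny before and after deployment.
print((training_share / population_share).sort_values())
```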
Authored by Christine Gateau and Allison Holt Ryan.