Data Science ethics


Gov.uk blog: “If Tesco knows day-to-day how poorly the nation is, how can Government access  similar  insights so it can better plan health services? If Airbnb can give you a tailored service depending on your tastes, how can Government provide people with the right support to help them back into work in a way that is right for them? If companies are routinely using social media data to get feedback from their customers to improve their services, how can Government also use publicly available data to do the same?

Data science allows us to use new types of data and powerful tools to analyse this more quickly and more objectively than any human could. It can put us in the vanguard of policymaking – revealing new insights that leads to better and more tailored interventions. And  it can help reduce costs, freeing up resource to spend on more serious cases.

But some of these data uses and machine-learning techniques are new and still relatively untested in Government. Of course, we operate within legal frameworks such as the Data Protection Act and Intellectual Property law. These are flexible but don’t always talk explicitly about the new challenges data science throws up. For example, how are you to explain the decision making process of a deep learning black box algorithm? And if you were able to, how would you do so in plain English and not a row of 0s and 1s?

We want data scientists to feel confident to innovate with data, alongside  the policy makers and operational staff who make daily decisions on the data that the analysts provide –. That’s why we are creating an ethical framework which brings together the relevant parts of the law and ethical considerations into a simple document that helps Government officials decide what it can do and what it should do. We have a moral responsibility to maximise the use of data – which is never more apparent than after incidents of abuse or crime are left undetected – as well as to pay heed to the potential risks of these new tools. The guidelines are draft and not formal government policy, but we want to share them more widely in order to help iterate and improve them further….

So what’s in the framework? There is more detail in the fuller document, but it is based around six key principles:

  1. Start with a clear user need and public benefit: this will help you justify the level of data sensitivity and method you use
  2. Use the minimum level of data necessary to fulfill the public benefit: there are many techniques for doing so, such as de-identification, aggregation or querying against data
  3. Build robust data science models: the model is only as good as the data it contains and while machines are less biased than humans they can get it wrong. It’s critical to be clear about the confidence of the model and think through unintended consequences and biases contained within the data
  4. Be alert to public perceptions: put simply, what would a normal person on the street think about the project?
  5. Be as open and accountable as possible: Transparency is the antiseptic for unethical behavior. Aim to be as open as possible (with explanations in plain English), although in certain public protection cases the ability to be transparent will be constrained.
  6. Keep data safe and secure: this is not restricted to data science projects but we know that the public are most concerned about losing control of their data….(More)”