Data mining without discrimination
Databases help the police and judiciary to fight crime. Data mining, however, should not result in ethnic profiling or discrimination. This project researched how discrimination on the basis of data mining by the police and judiciary can be prevented. The researchers demonstrated that the algorithms currently being used lead to a number of discriminatory effects. Simply removing sensitive data, such as gender and ethnicity, did not solve the issue. Newly developed algorithms tested with real data, however, do prevent discrimination while still providing a realistic picture of the distribution of crime and the risk that people will commit an offence.
Part of this project focused on identifying the discriminatory effects of the data-mining algorithms currently used by the police and judiciary. It emerged that various discriminatory effects can indeed occur.
Another important finding was that simply removing sensitive data, such as gender and ethnicity, from the databases was insufficient to prevent discriminatory patterns. Even when these sensitive data were removed, profiling identified the same groups but on the basis of other, indirect properties, for instance having a certain postcode. This phenomenon is known as “redlining”. Removing the sensitive data can in fact exacerbate the situation, as the discrimination is still there but can no longer be recognised as such.
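The redlining effect can be illustrated with a minimal sketch. The records, postcodes and group labels below are invented for illustration only, and the "model" is a deliberately simple majority rule per postcode, not one of the algorithms used by the police or developed in this project. The model never reads the ethnicity field, yet the share of people it flags still differs sharply between the two groups, because postcode acts as a proxy.

```python
from collections import Counter, defaultdict

# Hypothetical toy records, invented purely for illustration:
# (postcode, ethnicity, flagged_in_historical_data)
RECORDS = [
    ("1011", "A", 1), ("1011", "A", 1), ("1011", "A", 0), ("1011", "B", 1),
    ("2022", "B", 0), ("2022", "B", 0), ("2022", "B", 1), ("2022", "A", 0),
]

def train_on_postcode(rows):
    """Learn the majority label per postcode; ethnicity is never read."""
    votes = defaultdict(Counter)
    for postcode, _ethnicity, label in rows:
        votes[postcode][label] += 1
    return {pc: counts.most_common(1)[0][0] for pc, counts in votes.items()}

def flag_rate_by_group(rows, model):
    """Share of each ethnic group that the postcode-only model flags."""
    flagged, total = Counter(), Counter()
    for postcode, ethnicity, _label in rows:
        total[ethnicity] += 1
        flagged[ethnicity] += model[postcode]
    return {group: flagged[group] / total[group] for group in total}

model = train_on_postcode(RECORDS)
rates = flag_rate_by_group(RECORDS, model)
print(model)  # rule learned from postcode alone
print(rates)  # flag rate per ethnic group
```

In this toy data the postcode-only rule flags three quarters of group A and one quarter of group B. The ethnicity column is needed only to measure the disparity afterwards, which is why removing it from the database also removes the means to detect the problem.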
New search algorithms
The purpose of this research was to determine to what extent legal and ethical guidelines can be incorporated into data mining algorithms, using real data provided by the police and judiciary. Transparency and accountability with respect to discrimination were important design criteria in this research project. Many of the newly developed search algorithms were shown to prevent discriminatory effects.
Although the new algorithms were able to prevent discriminatory effects, the disadvantage was that this decreased the reliability of the data mining results. Nonetheless, the new techniques can add value since they allow policymakers to find the optimal balance between non-discrimination and reliability.
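The balance between non-discrimination and reliability can be made concrete with a small sketch. Everything here is an assumption for illustration: the scored records are invented, accuracy stands in for "reliability", the gap in flag rates between two groups stands in for "discrimination", and the `shift` parameter is a hypothetical adjustment knob rather than the technique developed in the project.

```python
# Hypothetical toy data: (group, risk_score from 1 to 10, actual_outcome)
SCORED = [
    ("A", 9, 1), ("A", 8, 1), ("A", 7, 1), ("A", 3, 0),
    ("B", 6, 0), ("B", 5, 0), ("B", 4, 1), ("B", 2, 0),
]
BASE_THRESHOLD = 7  # assumed cut-off: scores at or above it are flagged

def evaluate(rows, shift):
    """Return (accuracy, flag-rate gap) when thresholds are moved apart by `shift`."""
    thresholds = {"A": BASE_THRESHOLD + shift, "B": BASE_THRESHOLD - shift}
    correct = 0
    flagged = {"A": 0, "B": 0}
    size = {"A": 0, "B": 0}
    for group, score, outcome in rows:
        prediction = int(score >= thresholds[group])
        correct += int(prediction == outcome)
        flagged[group] += prediction
        size[group] += 1
    accuracy = correct / len(rows)
    gap = abs(flagged["A"] / size["A"] - flagged["B"] / size["B"])
    return accuracy, gap

# Sweep the adjustment so both measures can be compared side by side.
results = [(shift, *evaluate(SCORED, shift)) for shift in range(3)]
for shift, accuracy, gap in results:
    print(f"shift={shift}  accuracy={accuracy:.3f}  gap={gap:.3f}")
```

On this toy data, moving from `shift=0` to `shift=1` narrows the gap in flag rates from 0.75 to 0.25 while accuracy falls from 0.875 to 0.625. A table of this kind is what a policymaker would need in order to choose a point on the trade-off.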
In order to act efficiently and effectively, the police and the judiciary need information about people who are preparing or committing a crime, or may do so at some time in the future. An increasing amount of information is available in large databases, and data mining techniques help the police and judiciary to prevent planned crimes.
However, these new techniques can also lead to ethnic profiling, which is legally prohibited and poses ethical problems. Project leader Bart Custers illustrated the problem in an opinion piece in the newspaper Trouw in June 2016.
“The fact that there is a statistical link between ethnicity and crime is evident in Dutch prisons, where certain (non-Western) ethnicities are over-represented. But that may in fact be the result of ethnic profiling: if the police focus on certain ethnicities, they are the ones that will enter the penal system more often. A causal relationship between ethnicity and crime is virtually indemonstrable, as a lot of criminological research is based on figures from the judiciary, and – if the police engaged in ethnic profiling – these figures will be distorted.”
Lawyers, computer scientists and other stakeholders cooperated closely to achieve the objectives of this project:
- The researchers analysed the relevant legislation and ethical values, such as non-discrimination and privacy.
- Computer scientists incorporated these findings into the new search algorithms they developed. The results were tested using real data provided by the police and judiciary.
- Subsequently, the test results were screened for potential ethical risks (such as the occurrence of discrimination and privacy violations).
What does a search algorithm do?
Large quantities of personal data are collected, stored and processed. These data are increasingly being analysed through automated systems, which work on the basis of search algorithms. Because of these “data mining” techniques, it is possible to detect various statistical trends and patterns which organisations can use to develop policies.
discrimination, data mining algorithms, ethnic profiling, transparency, privacy by design, police, crime prevention, selective sampling, redlining, data masking
Official project title: