NBER paper by J. Aislinn Bohren, Kareem Haggag, Alex Imas, Devin G. Pope: “Discrimination has been widely studied in economics and other disciplines. In addition to identifying evidence of discrimination, economists often categorize the source of discrimination as either taste-based or statistical. Categorizing discrimination in this way can be valuable for policy design and welfare analysis. We argue that a further categorization is important and needed. Specifically, in many situations economic agents may have inaccurate beliefs about the expected productivity or performance of a social group. This motivates our proposed distinction between accurate (based on correct beliefs) and inaccurate (based on incorrect beliefs) statistical discrimination. We do a thorough review of the discrimination literature and argue that this distinction is rarely discussed. Using an online experiment, we illustrate how to identify accurate versus inaccurate statistical discrimination. We show that ignoring this distinction – as is often the case in the discrimination literature – can lead to erroneous interpretations of the motives and implications of discriminatory behavior. In particular, when not explicitly accounted for, inaccurate statistical discrimination can be mistaken for taste-based discrimination, accurate statistical discrimination, or a combination of the two….(More)”.
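To make the accurate/inaccurate distinction concrete, here is a minimal sketch of a stylized accounting. It assumes that for two groups, A and B, we observe three things: actual performance, evaluators' stated beliefs about that performance, and evaluators' valuations (for example, what they would pay).

The function name `decompose_gap` and the three-way split are illustrative assumptions, not the authors' estimator:

- The true performance gap is the accurate statistical part.
- The distance between the believed gap and the true gap is the inaccurate statistical part.
- Whatever remains beyond beliefs is a residual that a taste-based story would have to explain.

```python
from statistics import mean


def decompose_gap(actual_a, actual_b, believed_a, believed_b, valuation_a, valuation_b):
    """Split a between-group gap in valuations into three stylized components.

    All inputs are lists of numbers on one common scale (e.g. expected task score).
    This is an illustrative accounting identity, not an estimation procedure.
    """
    true_gap = mean(actual_a) - mean(actual_b)            # gap that accurate beliefs would imply
    belief_gap = mean(believed_a) - mean(believed_b)      # gap evaluators think exists
    decision_gap = mean(valuation_a) - mean(valuation_b)  # gap in what evaluators actually do
    return {
        "accurate_statistical": true_gap,
        "inaccurate_statistical": belief_gap - true_gap,
        "residual_taste": decision_gap - belief_gap,
        "total": decision_gap,
    }


# Hypothetical numbers: the groups perform identically, but evaluators believe otherwise.
parts = decompose_gap(
    actual_a=[10, 12], actual_b=[10, 12],        # both groups average 11
    believed_a=[12, 12], believed_b=[10, 10],    # evaluators believe A is 2 points better
    valuation_a=[12, 14], valuation_b=[10, 10],  # evaluators value A 3 points higher
)
print(parts)  # {'accurate_statistical': 0, 'inaccurate_statistical': 2, 'residual_taste': 1, 'total': 3}
```

With these toy numbers, two naive readings go wrong in opposite directions:

- A researcher who measured performance but not beliefs would attribute the whole 3-point gap to taste.
- A researcher who took beliefs as accurate would label 2 points as accurate statistical discrimination.

Measuring beliefs separately shows that those 2 points come from mistaken beliefs.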
How to use data for good — 5 priorities and a roadmap
Stefaan Verhulst at Apolitical: “…While the overarching message emerging from these case studies was promising, several barriers were identified that, if not addressed systematically, could undermine the potential of data science to address critical public needs and limit the opportunity to scale the practice more broadly.
Below we summarise the five priorities that emerged through the workshop for the field moving forward.
1. Become People-Centric
Much of the data currently used for drawing insights involve or are generated by people.
These insights have the potential to impact people’s lives in many positive and negative ways. Yet, the people and the communities represented in this data are largely absent when practitioners design and develop data for social good initiatives.
To ensure data is a force for positive social transformation (i.e., that it addresses real people’s needs and affects their lives in a beneficial way), we need to experiment with new ways to engage people at the design, implementation, and review stages of data initiatives, beyond simply asking for their consent.
As we explain in our People-Led Innovation methodology, different segments of people can play multiple roles ranging from co-creation to commenting, reviewing and providing additional datasets.
The key is to ensure their needs are front and center, and that data science for social good initiatives seek to address questions related to real problems that matter to society at large (a key concern that led The GovLab to launch the 100 Questions Initiative).
2. Establish Data About the Use of Data (for Social Good)
Many data for social good initiatives remain fledgling.
As currently designed, the field often struggles to translate sound data projects into positive change. As a result, many potential stakeholders (private sector and government “owners” of data as well as public beneficiaries) remain unsure about the value of using data for social good, especially against the background of high risks and transaction costs.
The field needs to overcome such limitations if data insights and their benefits are to spread. For that, we need hard evidence about data’s positive impact. Ironically, the field is held back by an absence of good data on the use of data: a lack of reliable empirical evidence that could guide new initiatives.
The field needs to prioritise developing a far more solid evidence base and “business case” to move data for social good from a good idea to reality.
3. Develop End-to-End Data Initiatives
Too often, data for social good initiatives focus on the “data-to-knowledge” pipeline without addressing how to move “knowledge into action.”
As such, the impact remains limited and many efforts never reach an audience that can actually act upon the insights generated. Without becoming more sophisticated in our efforts to provide end-to-end projects and taking “data from knowledge to action,” the positive impact of data will be limited….
4. Invest in Common Trust and Data Steward Mechanisms
For data for social good initiatives (including data collaboratives) to flourish and scale, there must be substantial trust among all parties involved and among the public at large.
Establishing such a platform of trust requires each actor to invest in developing essential trust mechanisms such as data governance structures, contracts, and dispute resolution methods. Today, designing and establishing these mechanisms takes tremendous time, energy, and expertise. These high transaction costs result from the lack of common templates and the need to design governance structures from scratch each time…
5. Build Bridges Across Cultures
As C.P. Snow famously described in his lecture “The Two Cultures and the Scientific Revolution,” we must bridge the “two cultures” of science and humanism if we are to solve the world’s problems….
To implement these five priorities, we will need experimentation at the operational but also the institutional level. This involves establishing “data stewards” within organisations who can accelerate data for social good initiatives in a responsible manner, integrating the five priorities above….(More)”
We should extend EU bank data sharing to all sectors
Carlos Torres Vila in the Financial Times: “Data is now driving the global economy — just look at the list of the world’s most valuable companies. They collect and exploit the information that users generate through billions of online interactions taking place every day.
But companies are hoarding data too, preventing others, including the users to whom the data relates, from accessing and using it. This is true of traditional groups such as banks, telcos and utilities, as well as the large digital enterprises that rely on “proprietary” data.
Global and national regulators must address this problem by forcing companies to give users an easy way to share their own data, if they so choose. This is the logical consequence of personal data belonging to users. There is also the potential for enormous socio-economic benefits if we can create consent-based free data flows.
We need data-sharing across companies in all sectors in a real-time, standardised way, not at a speed and in a format dictated by the companies that stockpile user data. These new rules should apply to all electronic data generated by users, whether provided directly or observed during their online interactions with any provider, across geographic borders and in any sector. This could include everything from geolocation history and electricity consumption to recent web searches, pension information or even most recently played songs.
This won’t be easy to achieve in practice, but the good news is that we already have a framework that could be the model for a broader solution. The UK’s Open Banking system provides a tantalising glimpse of what may be possible. In Europe, the regulation known as the Payment Services Directive 2 allows banking customers to share data about their transactions with multiple providers via secure, structured IT interfaces. We are already seeing this unlock new business models and drive competition in digital financial services. But these rules do not go far enough — they only apply to payments history, and that isn’t enough to push forward a data-driven economic revolution across other sectors of the economy.
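The core mechanism the column calls for is consent-gated sharing of a user's own data in one common format. Here is a minimal sketch of that idea under loud assumptions: the `ConsentRegistry` class, the `export_user_data` function and the record fields are all hypothetical. They are not the PSD2 or UK Open Banking interfaces, which define their own authorisation flows and message formats. The sketch also ignores authentication, transport security and real-time delivery.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRegistry:
    """Hypothetical record of which data categories each user lets each recipient access."""
    grants: dict = field(default_factory=dict)  # (user_id, recipient) -> set of categories

    def grant(self, user_id: str, recipient: str, category: str) -> None:
        self.grants.setdefault((user_id, recipient), set()).add(category)

    def revoke(self, user_id: str, recipient: str, category: str) -> None:
        self.grants.get((user_id, recipient), set()).discard(category)

    def allows(self, user_id: str, recipient: str, category: str) -> bool:
        return category in self.grants.get((user_id, recipient), set())


def export_user_data(store, consents, user_id, recipient, category):
    """Return a user's records in one provider-neutral format, only if the user consented."""
    if not consents.allows(user_id, recipient, category):
        raise PermissionError(f"{user_id} has not consented to share {category} with {recipient}")
    # Every provider would emit these same field names, whatever its internal storage looks like.
    return [
        {"user_id": user_id, "category": category, "timestamp": ts, "value": value}
        for ts, value in store.get((user_id, category), [])
    ]


# Usage: a utility holds electricity readings; the user opts in to share them with an app.
store = {("u1", "electricity_kwh"): [("2019-06-01", 7.5), ("2019-06-02", 6.9)]}
consents = ConsentRegistry()
consents.grant("u1", "energy-app", "electricity_kwh")
print(export_user_data(store, consents, "u1", "energy-app", "electricity_kwh"))
```

The design choice worth noting is that consent is checked per user, per recipient and per data category. That mirrors the column's point that sharing should happen only "if they so choose" and should extend beyond payments history.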
We need a global framework with common rules across regions and sectors. This has already happened in financial services: after the 2008 financial crisis, the G20 strengthened global banking standards and created the Financial Stability Board. The rules, while not perfect, have delivered uniformity, which has strengthened the system.
We need a similar global push for common rules on the use of data. While it will be difficult to achieve consensus on data, and undoubtedly more difficult still to implement and enforce it, I believe that now is the time to decide what we want. The involvement of the G20 in setting up global standards will be essential to realising the potential that data has to deliver a better world for all of us. There will be complaints about the cost of implementation. I know first-hand how expensive it can be to simultaneously open up and protect sensitive core systems.
The alternative is siloed data that holds back innovation. There will also be justified concerns that easier data sharing could lead to new user risks. Security must be a non-negotiable principle in designing intercompany interfaces and protecting access to sensitive data. But Open Banking shows that these challenges are resolvable. …(More)”.
France Bans Judge Analytics, 5 Years In Prison For Rule Breakers
Artificial Lawyer: “In a startling intervention that seeks to limit the emerging litigation analytics and prediction sector, the French Government has banned the publication of statistical information about judges’ decisions – with a five year prison sentence set as the maximum punishment for anyone who breaks the new law.
Owners of legal tech companies focused on litigation analytics are the most likely to suffer from this new measure.
The new law, encoded in Article 33 of the Justice Reform Act, is aimed at preventing anyone – but especially legal tech companies focused on litigation prediction and analytics – from publicly revealing the pattern of judges’ behaviour in relation to court decisions.
A key passage of the new law states:
‘The identity data of magistrates and members of the judiciary cannot be reused with the purpose or effect of evaluating, analysing, comparing or predicting their actual or alleged professional practices.’
As far as Artificial Lawyer understands, this is the very first example of such a ban anywhere in the world.
Insiders in France told Artificial Lawyer that the new law is a direct result of an earlier effort to make all case law easily accessible to the general public, which was seen at the time as improving access to justice and a big step forward for transparency in the justice sector.
However, judges in France had not reckoned on NLP and machine learning companies taking the public data and using it to model how certain judges behave in relation to particular types of legal matter or argument, or how they compare to other judges.
In short, they didn’t like how the pattern of their decisions, now relatively easy to model, was potentially open for all to see.
Unlike in the US and the UK, where judges appear to have accepted the fait accompli of legal AI companies analysing their decisions in extreme detail and then creating models as to how they may behave in the future, French judges have decided to stamp it out….(More)”.
The Landscape of Open Data Policies
Apograf: “Open Access (OA) publishing has a long history, going back to the early 1990s, and was born with the explicit intention of improving access to scholarly literature. The internet has played a pivotal role in garnering support for free and reusable research publications, as well as for stronger and more democratic peer-review systems, ones that are not bogged down by the restrictions of influential publishing platforms….
Looking back, looking forward
Launched in 1991, arXiv.org was a pioneering platform in this regard, a telling example of how researchers could cooperate to publish academic papers for free and in full view of the public. Though it has limitations (papers are curated by moderators and are not peer-reviewed), arXiv demonstrates how technology can be used to overcome some of the incentive and distribution problems that scientific research had long been subjected to.
The scientific community has itself assumed the mantle to this end: the Budapest Open Access Initiative (BOAI) and the Berlin Declaration on Open Access Initiative, launched in 2002 and 2003 respectively, are considered landmark movements in the push for unrestricted access to scientific research. While mostly symbolic, the effort highlighted the growing desire to solve the problems plaguing the space through technology.
The BOAI manifesto begins with a statement that is an encapsulation of the movement’s purpose,
“An old tradition and a new technology have converged to make possible an unprecedented public good. The old tradition is the willingness of scientists and scholars to publish the fruits of their research in scholarly journals without payment, for the sake of inquiry and knowledge. The new technology is the internet. The public good they make possible is the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it by all scientists, scholars, teachers, students, and other curious minds.”
Plan S is a more recent attempt to make publicly funded research available to all. Launched by Science Europe in September 2018, Plan S (short for ‘Shock’) has energized the research community with its resolution to make access to publicly funded knowledge a right for everyone and to dissolve the profit-driven ecosystem of research publication. Members of the European Union have vowed to achieve this by 2020.
Plan S has been supported by governments outside Europe as well. China has thrown itself behind it, and the state of California has enacted a law that requires open access to research one year after publication. It is, of course, not without its challenges: advocacy and ensuring that publishing is not restricted to a few venues are two such obstacles. However, the organization behind the guidelines, cOAlition S, has agreed to make them more flexible.
The emergence of this trend is not without its difficulties, however, and numerous obstacles continue to hinder the dissemination of information in a manner that is truly transparent and public. Chief among these are the many gates that continue to keep research something of an exclusive property, as well as the fact that the infrastructure and development for such systems are short on funding and staff….(More)”.
How Can We Overcome the Challenge of Biased and Incomplete Data?
Knowledge@Wharton: “Data analytics and artificial intelligence are transforming our lives. Be it in health care, in banking and financial services, or in times of humanitarian crises — data determine the way decisions are made. But often, the way data is collected and measured can result in biased and incomplete information, and this can significantly impact outcomes.
In a conversation with Knowledge@Wharton at the SWIFT Institute Conference on the Impact of Artificial Intelligence and Machine Learning in the Financial Services Industry, Alexandra Olteanu, a post-doctoral researcher at Microsoft Research, U.S. and Canada, discussed the ethical and human considerations in data collection and artificial intelligence, and how we can work towards removing biases….
….Knowledge@Wharton: Bias is a big issue when you’re dealing with humanitarian crises, because it can influence who gets help and who doesn’t. When you translate that into the business world, especially in financial services, what implications do you see for algorithmic bias? What might be some of the consequences?
Olteanu: A good example comes from a new law in New York State under which insurance companies can now use social media to decide the level of your premiums. But they could in fact end up using incomplete information. For instance, you might be buying your vegetables from the supermarket or a farmers’ market, but these retailers might not be tracking you on social media. So nobody knows that you are eating vegetables. On the other hand, a bakery that you visit might post something when you buy from there. Based on this, the insurance companies may conclude that you only eat cookies all the time. This shows how even incomplete data can affect you….(More)”.
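Olteanu's example is a case of selection bias: what gets observed is not a random sample of what a person actually does. The sketch below uses a made-up purchase log; the field names and counts are hypothetical. It computes the share of bakery purchases twice, once from everything the person buys and once from only the purchases that show up online.

```python
# Hypothetical purchase log: 20 vegetable purchases that never appear online,
# and 2 bakery purchases that the bakery posts about on social media.
purchases = (
    [{"category": "vegetables", "posted_online": False}] * 20
    + [{"category": "cookies", "posted_online": True}] * 2
)


def share(purchases, category, observed_only=False):
    """Fraction of purchases in `category`; optionally use only purchases visible online."""
    pool = [p for p in purchases if p["posted_online"]] if observed_only else purchases
    if not pool:
        return None  # nothing observed, so no estimate is possible
    return sum(p["category"] == category for p in pool) / len(pool)


print(share(purchases, "cookies"))                      # true share: 2 of 22 purchases
print(share(purchases, "cookies", observed_only=True))  # what an observer of social media sees
```

The person's diet has not changed between the two calls. Only the filter on what was recorded has changed, and that filter alone turns a small share of cookies into "only eats cookies".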
EU countries and car manufacturers to share information to improve road safety
Press Release: “EU member states, car manufacturers and navigation systems suppliers will share information on road conditions with the aim of improving road safety. Cora van Nieuwenhuizen, Minister of Infrastructure and Water Management, agreed this today with four other EU countries during the ITS European Congress in Eindhoven. These agreements mean that millions of motorists in the Netherlands will have access to more information on unsafe road conditions along their route.
The data on road conditions that is registered by modern cars is steadily improving. This includes, for instance, information on iciness, wrong-way drivers and breakdowns in emergency lanes. This kind of data can be instantly shared with road authorities and other vehicles following the same route. Drivers can then adapt their driving behaviour appropriately so that accidents and delays are prevented….
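As a rough illustration of routing shared reports to "other vehicles following the same route", the sketch below makes two assumptions: hazards are reported per road segment, and each vehicle knows the segments on its planned route. The `HazardReport` type, the segment IDs and the 15-minute freshness window are hypothetical choices for this example. They are not details of the partnership's actual data exchange.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HazardReport:
    segment_id: str     # hypothetical road-segment identifier
    kind: str           # e.g. "ice", "wrong-way driver", "breakdown in emergency lane"
    reported_at: float  # seconds since epoch


def hazards_on_route(reports, route_segments, now, max_age_s=900.0):
    """Return recent hazard reports on the planned route, ordered as the driver will reach them."""
    position = {seg: i for i, seg in enumerate(route_segments)}
    relevant = [
        r for r in reports
        if r.segment_id in position and now - r.reported_at <= max_age_s  # on route and still fresh
    ]
    return sorted(relevant, key=lambda r: position[r.segment_id])


reports = [
    HazardReport("s3", "ice", 1000.0),
    HazardReport("s1", "breakdown in emergency lane", 1000.0),
    HazardReport("s9", "ice", 1000.0),               # not on this route
    HazardReport("s2", "wrong-way driver", 0.0),     # too old to still be relevant
]
for r in hazards_on_route(reports, ["s1", "s2", "s3"], now=1500.0):
    print(r.segment_id, r.kind)
```

Ordering by position along the route, rather than by report time, puts the nearest hazard first. That is the one the driver needs to adapt to soonest.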
The partnership was announced today at the ITS European Congress, the largest European event in the fields of smart mobility and the digitalisation of transport. Among other things, various demonstrations were given on how sharing this type of data contributes to road safety. In the year ahead, the car manufacturers BMW, Volvo, Ford and Daimler, the EU member states Germany, the Netherlands, Finland, Spain and Luxembourg, and navigation system suppliers TomTom and HERE will be sharing data. This means that millions of motorists across the whole of Europe will receive road safety information in their car. Talks on participating in the partnership are also being conducted with other European countries and companies.
ADAS
At the ITS congress, Minister Van Nieuwenhuizen and several dozen parties today also signed an agreement on raising awareness of advanced driver assistance systems (ADAS) and their safe use. Examples of ADAS include automatic braking systems, and blind spot detection and lane keeping systems. Using these driver assistance systems correctly makes driving a car safer and more sustainable. The agreement therefore also includes the launch of the online platform “slimonderweg.nl” where road users can share information on the benefits and risks of ADAS.
Minister Van Nieuwenhuizen: “Motorists are often unaware of all the capabilities modern cars offer. Yet correctly using driver assistance systems really can increase road safety. From today, dozens of parties are going to start working on raising awareness of ADAS and improving and encouraging the safe use of such systems so that more motorists can benefit from them.”
Connected Transport Corridors
Today at the congress, progress was also made regarding the transport of goods. For example, at the end of this year lorries on three transport corridors in our country will be sharing logistics data. This involves more than just information on environmental zones, availability of parking, recommended speeds and predicted arrival times at terminals. Other new technologies will be used in practice on a large scale, including prioritisation at smart traffic lights and driving in convoy. Preparatory work on the corridors around Amsterdam and Rotterdam and in the southern Netherlands has started…..(More)”.
The Right to the Datafied City: Interfacing the Urban Data Commons
Chapter by Michiel de Lange in The Right to the Smart City: “The current datafication of cities raises questions about what Lefebvre and many after him have called “the right to the city.” In this contribution, I investigate how the use of data for civic purposes may strengthen the “right to the datafied city,” that is, the degree to which different people engage and participate in shaping urban life and culture, and experience a sense of ownership. The notion of the commons acts as the prism to see how data may serve to foster this participatory “smart citizenship” around collective issues. This contribution critically engages with recent attempts to theorize the city as a commons. Instead of seeing the city as a whole as a commons, it proposes a more fine-grained perspective of the “commons-as-interface.” The “commons-as-interface,” it is argued, productively connects urban data to the human-level political agency implied by “the right to the city” through processes of translation and collectivization. The term is applied to three short case studies, to analyze how these processes engender a “right to the datafied city.” The contribution ends by considering the connections between two seemingly opposed discourses about the role of data in the smart city – the cybernetic view versus a humanist view. It is suggested that the commons-as-interface allows for more detailed investigations of mediation processes between data, human actors, and urban issues….(More)”.
107 Years Later, The Titanic Sinking Helps Train Problem-Solving AI
Kiona N. Smith at Forbes: “What could the 107-year-old tragedy of the Titanic possibly have to do with modern problems like sustainable agriculture, human trafficking, or health insurance premiums? Data turns out to be the common thread. The modern world, for better or worse, increasingly turns to algorithms to look for patterns in the data and make predictions based on those patterns. And the basic methods are the same whether the question they’re trying to answer is “Would this person survive the Titanic sinking?” or “What are the most likely routes for human trafficking?”
An Enduring Problem
Predicting survival at sea based on the Titanic dataset is a standard practice problem for aspiring data scientists and programmers. Here’s the basic challenge: feed your algorithm a portion of the Titanic passenger list, which includes some basic variables describing each passenger and their fate. From that data, the algorithm (if you’ve programmed it well) should be able to draw some conclusions about which variables made a person more likely to live or die on that cold April night in 1912. To test its success, you then give the algorithm the rest of the passenger list (minus the outcomes) and see how well it predicts their fates.
Online communities like Kaggle.com have held competitions to see who can develop the algorithm that predicts survival most accurately, and it’s also a common problem presented to university classes. The passenger list is big enough to be useful, but small enough to be manageable. There’s a simple set of outcomes (life or death) and around a dozen variables to work with, so the problem is simple enough for beginners to tackle but just complex enough to be interesting. And because the Titanic’s story is so famous, even more than a century later, the problem still resonates.
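The hold-out procedure described above (fit on part of the passenger list, then predict the rest) can be sketched with the standard library alone. Two caveats apply. The rows below are made-up toy records in the shape of the passenger list, not the real dataset. The model is a deliberately simple stand-in that predicts the majority outcome of each sex and passenger-class group seen in training; it is not what Kaggle entrants or SparkBeyond use.

```python
import random
from collections import defaultdict

# Made-up toy rows shaped like the passenger list (NOT real Titanic data).
# Each row: (sex, passenger_class, survived) with survived in {0, 1}.
TOY_ROWS = [
    ("female", 1, 1), ("female", 1, 1), ("female", 2, 1), ("female", 3, 0),
    ("female", 3, 1), ("male", 1, 1), ("male", 1, 0), ("male", 2, 0),
    ("male", 3, 0), ("male", 3, 0), ("male", 3, 0), ("female", 2, 1),
]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out the last `test_fraction` of them for testing."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def fit(train):
    """Learn the majority outcome per (sex, class) group; ties predict death (0)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [deaths, survivals]
    for sex, pclass, survived in train:
        counts[(sex, pclass)][survived] += 1
    overall = int(sum(r[2] for r in train) * 2 > len(train))  # fallback for unseen groups
    rules = {group: int(c[1] > c[0]) for group, c in counts.items()}
    return rules, overall


def predict(model, sex, pclass):
    rules, overall = model
    return rules.get((sex, pclass), overall)


def accuracy(model, test):
    """Share of held-out passengers whose outcome the model predicts correctly."""
    hits = sum(predict(model, sex, pclass) == survived for sex, pclass, survived in test)
    return hits / len(test)


train, test = train_test_split(TOY_ROWS)
model = fit(train)
print(f"held-out accuracy: {accuracy(model, test):.2f}")
```

The essential step is that `accuracy` is computed only on rows the model never saw in `fit`. That is what makes the score an honest estimate of how well the learned patterns generalise.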
“It’s interesting to see that even in such a simple problem as the Titanic, there are nuggets,” said Sagie Davidovich, Co-Founder & CEO of SparkBeyond, who used the Titanic problem as an early test for SparkBeyond’s AI platform and still uses it as a way to demonstrate the technology to prospective customers….(More)”.
Opening Data for Global Health
Chapter by Matt Laessig, Bryon Jacob and Carla AbouZahr in The Palgrave Handbook of Global Health Data Methods for Policy and Practice: “…provide best practices for organizations to adopt to disseminate data openly for others to use. They describe development of the open data movement and its rapid adoption by governments, non-governmental organizations, and research groups. The authors provide examples from the health sector—an early adopter—but acknowledge concerns specific to health relating to informed consent, intellectual property, and ownership of personal data. Drawing on their considerable contributions to the open data movement, Laessig and Jacob share their Open Data Progression Model. They describe six stages to make data open: from data collection, documentation of the data, opening the data, engaging the community of users, making the data interoperable, to finally linking the data….(More)”