Open Data and the Private Sector


Chapter by Joel Gurin, Carla Bonini and Stefaan Verhulst in State of Open Data: “The open data movement launched a decade ago with a focus on transparency, good governance, and citizen participation. As other chapters in this collection have documented in detail, those critical uses of open data have remained paramount and are continuing to grow in importance at a time of fake news and increased secrecy. But the value of open data extends beyond transparency and accountability – open data is also an important resource for business and economic growth.

The past several years have seen an increased focus on the value of open data to the private sector. In 2012, the Open Data Institute (ODI) was founded in the United Kingdom (UK) and backed with GBP 10 million by the UK government to maximise the value of open data in business and government. A year later, McKinsey released a report suggesting open data could help unlock USD 3 to 5 trillion in economic value annually. At around the same time, Monsanto acquired the Climate Corporation, a digital agriculture company that leverages open data to inform farmers, for approximately USD 1.1 billion. In 2014, the GovLab launched the Open Data 500, the first national study of businesses using open government data (now in six countries), and, in 2015, Open Data for Development (OD4D) launched the Open Data Impact Map, which today contains more than 1,100 examples of private sector companies using open data. The potential business applications of open data continue to be a priority for many governments around the world as they plan and develop their data programmes.

The use of open data has become part of the broader business practice of using data and data science to inform business decisions, ranging from launching new products and services to optimising processes and outsmarting the competition. In this chapter, we take stock of the state of open data and the private sector by analysing how the private sector both leverages and contributes to the open data ecosystem….(More)”.

Africa must reap the benefits of its own data


Tshilidzi Marwala at Business Insider: “Twenty-two years ago, when I was a doctoral student in artificial intelligence (AI) at the University of Cambridge, I had to create all the AI algorithms I needed to understand the complex phenomena related to this field.

For starters, AI is computer software that performs intelligent tasks that would normally require human beings, while an algorithm is a set of rules that instructs a computer to execute specific tasks. In that era, the ability to create AI algorithms was more important than the ability to acquire and use data.

Google has created an open-source library called TensorFlow, which contains implementations of AI algorithms that have already been developed. In this way, Google encourages people to develop applications (apps) using its software, with the payoff being that Google will collect data on any individual using the apps developed with TensorFlow.
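To make this concrete, here is a minimal sketch of the kind of ready-made machinery TensorFlow provides: a gradient-descent loop that learns a simple relationship from toy data. The example is purely illustrative and not from the article:

```python
import tensorflow as tf

# A toy least-squares problem: learn w such that y = w * x.
# The true relationship in this synthetic data is y = 2x.
w = tf.Variable(0.0)
x = tf.constant([1.0, 2.0, 3.0])
y = tf.constant([2.0, 4.0, 6.0])

opt = tf.keras.optimizers.SGD(learning_rate=0.05)
for _ in range(100):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((w * x - y) ** 2)
    # TensorFlow computes the gradient and applies the update step;
    # the programmer never writes the algorithm from scratch.
    opt.apply_gradients([(tape.gradient(loss, w), w)])

print(float(w))  # approaches 2.0
```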

Today, an AI algorithm is not a competitive advantage but data is. The World Economic Forum calls data the new “oxygen”, while Chinese AI specialist Kai-Fu Lee calls it the new “oil”.

Africa’s population is increasing faster than that of any other region in the world. The continent has a population of 1.3-billion people and a total nominal GDP of $2.3-trillion. This increase in population is in effect an increase in data, and if data is the new oil, it is akin to an increase in oil reserves.

Even oil-rich countries such as Saudi Arabia do not experience increases in their oil reserves. How do we as Africans take advantage of this huge amount of data?

There are two categories of data in Africa: heritage and personal. Heritage data resides in society, whereas personal data resides in individuals. Heritage data includes data gathered from our languages, emotions and accents. Personal data includes health, facial and fingerprint data.

Facebook, Amazon, Apple, Netflix and Google are data companies. They sell data to advertisers, banks and political parties, among others. For example, the controversial company Cambridge Analytica harvested Facebook data in an attempt to influence the US presidential election, which potentially contributed to Donald Trump’s victory in 2016.

Google collects language data to build Google Translate, an application that translates from one language to another. The app claims to cover African languages such as Zulu, Yoruba and Swahili, but it is less effective in handling African languages than in handling European and Asian languages.

Now, how do we capitalise on our language heritage to create economic value? We need to build our own language database and create our own versions of Google Translate.

An important area is the creation of an African emotion database. Different cultures exhibit emotions differently, and emotion data is very important in areas such as the safety of cars and aeroplanes. If we can build a system that reads pilots’ emotions, we could establish whether a pilot is in a good state of mind to operate an aircraft, which would increase safety.

To capitalise on the African emotion database, we should create a data bank that captures the emotions of African people in various parts of the continent, and then use this database to create AI apps that read people’s emotions. Mercedes-Benz has already implemented “Attention Assist”, which alerts drivers to fatigue.
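As a hedged illustration of what an app built on such a data bank might look like, here is a TensorFlow/Keras sketch. The directory layout, image size, and emotion labels are assumptions, since no such dataset yet exists:

```python
import tensorflow as tf

# Assumed layout: african_emotions/<emotion_label>/<image>.jpg,
# with hypothetical labels such as happy/, sad/, angry/, neutral/.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "african_emotions", image_size=(96, 96), batch_size=32)
num_classes = len(train_ds.class_names)

# A small convolutional classifier over the emotion labels.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(96, 96, 3)),
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```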

Another important area is the creation of an African health database. AI algorithms can diagnose some diseases as accurately as human doctors, or better. However, these algorithms depend on the availability of data. To capitalise on this, we need to collect such data and use it to build algorithms that will be able to augment medical care….(More)”.

Beyond Bias: Re-Imagining the Terms of ‘Ethical AI’ in Criminal Law


Paper by Chelsea Barabas: “Data-driven decision-making regimes, often branded as “artificial intelligence,” are rapidly proliferating across the US criminal justice system as a means of predicting and managing the risk of crime and addressing accusations of discriminatory practices. These data regimes have come under increased scrutiny, as critics point out the myriad ways that they can reproduce or even amplify pre-existing biases in the criminal justice system. This essay examines contemporary debates regarding the use of “artificial intelligence” as a vehicle for criminal justice reform by closely examining two general approaches to what has been widely branded as “algorithmic fairness” in criminal law: 1) the development of formal fairness criteria and accuracy measures that illustrate the trade-offs of different algorithmic interventions, and 2) the development of “best practices” and managerialist standards for maintaining a baseline of accuracy, transparency and validity in these systems.

The essay argues that attempts to render AI-branded tools more accurate by addressing narrow notions of “bias” miss the deeper methodological and epistemological issues regarding the fairness of these tools. The key question is whether predictive tools reflect and reinforce the punitive practices that drive disparate outcomes, and how data regimes interact with penal ideology to naturalize these practices. The article concludes by calling for an abolitionist understanding of the role and function of the carceral state in order to fundamentally reformulate the questions we ask, the way we characterize existing data, and how we identify and fill gaps in the existing data regimes of the carceral state….(More)”
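To illustrate the first approach the paper examines (formal fairness criteria that expose trade-offs), here is a toy sketch comparing false positive rates across two groups. The data, threshold, and metric choice are illustrative assumptions, not Barabas’ own analysis:

```python
import numpy as np

# Toy risk-assessment data: two demographic groups, a model score,
# and an observed outcome. All values are synthetic.
rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, n)        # 0 or 1: group membership
risk_score = rng.random(n)           # predicted risk in [0, 1]
reoffended = rng.random(n) < 0.3     # observed outcome

threshold = 0.5
flagged = risk_score >= threshold    # the "high risk" decision

for g in (0, 1):
    mask = group == g
    # False positive rate: flagged as high risk, but did not reoffend.
    fp = np.sum(flagged & ~reoffended & mask)
    negatives = np.sum(~reoffended & mask)
    print(f"group {g}: FPR = {fp / negatives:.2f}")

# Equalising FPR across groups is one formal criterion; it can
# conflict with others (e.g. calibration), which is precisely the
# kind of trade-off these debates centre on.
```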

MegaPixels


About: “…MegaPixels is an art and research project first launched in 2017 for an installation at Tactical Technology Collective’s GlassRoom about face recognition datasets. In 2018, MegaPixels was extended to cover pedestrian analysis datasets for a commission by the Elevate Arts festival in Austria. Since then, MegaPixels has evolved into a large-scale interrogation of hundreds of publicly available face and person analysis datasets, the first of which launched on this site in April 2019.

MegaPixels aims to provide a critical perspective on machine learning image datasets, one that might otherwise escape academia and the industry-funded artificial intelligence think tanks that are often supported by several of the same technology companies that created the datasets presented on this site.

MegaPixels is an independent project, designed as a public resource for educators, students, journalists, and researchers. Each dataset presented on this site undergoes a thorough review of its images, intent, and funding sources. Though its goals are similar to those of an academic publication, MegaPixels is a website-first research project, with an academic publication to follow.

One of the main focuses of the dataset investigations presented on this site is to uncover where funding originated. Because of our emphasis on other researchers’ funding sources, it is important that we are transparent about our own….(More)”.

Principles and Policies for “Data Free Flow With Trust”


Paper by Nigel Cory, Robert D. Atkinson, and Daniel Castro: “Just as there was a set of institutions, agreements, and principles that emerged out of Bretton Woods in the aftermath of World War II to manage global economic issues, the countries that value the role of an open, competitive, and rules-based global digital economy need to come together to enact new global rules and norms to manage a key driver of today’s global economy: data. Japanese Prime Minister Abe’s new initiative for “data free flow with trust,” combined with Japan’s hosting of the G20 and leading role in e-commerce negotiations at the World Trade Organization (WTO), provides a valuable opportunity for many of the world’s leading digital economies (Australia, the United States, and the European Union, among others) to rectify the gradual drift toward a fragmented and less-productive global digital economy. Prime Minister Abe is right in proclaiming, “We have yet to catch up with the new reality, in which data drives everything, where the D.F.F.T., the Data Free Flow with Trust, should top the agenda in our new economy,” and right in his call “to rebuild trust toward the system for international trade. That should be a system that is fair, transparent, and effective in protecting IP and also in such areas as e-commerce.”

The central premise of this effort should be a recognition that data and data-driven innovation are a force for good. Across society, data innovation—the use of data to create value—is creating more productive and innovative economies, more transparent and responsive governments, and better social outcomes (improved health care, safer and smarter cities, etc.). But to maximize the innovative and productivity benefits of data, countries that support an open, rules-based global trading system need to agree on core principles and enact common rules. The benefits of a rules-based and competitive global digital economy are at risk, as a diverse range of countries in various stages of political and economic development have policy regimes that undermine core processes, especially the flow of data and its associated legal responsibilities; the use of encryption to protect data and digital activities and technologies; and the blocking of data constituting illegal, pirated content….(More)”.

A Symphony, Not a Solo: How Collective Management Organisations Can Embrace Innovation and Drive Data Sharing in the Music Industry


Paper by David Osimo, Laia Pujol Priego, Turo Pekari and Ano Sirppiniemi: “…data is becoming a fundamental source of competitive advantage in music, just as in other sectors, and streaming services in particular are generating large volumes of new data offering unique insights into customer taste and behavior. (As the Financial Times recently put it, the music industry is having its “moneyball” moment.) But how are the different players getting ready for this change?

This policy brief aims to look at the question from the perspective of collective management organisations (CMOs), which are charged with redistributing royalties from music users to music rightsholders (such as musical authors and publishers).

The paper is divided into three sections. Part I will look at the current positioning of CMOs in this new data-intensive ecosystem. Part II will discuss how greater data sharing and reuse can maximize innovation, comparing the music industry with other industries. Part III will make policy and business-model reform recommendations for CMOs to stimulate data-driven innovation, internally and in the industry as a whole….(More)”

Airbnb and New York City Reach a Truce on Home-Sharing Data


Paris Martineau at Wired: “For much of the past decade, Airbnb and New York City have been embroiled in a high-profile feud. Airbnb wants legitimacy in its biggest market. City officials want to limit home-sharing platforms, which they argue exacerbate the city’s housing crisis and pose safety risks by allowing people to transform homes into illegal hotels.

Despite years of lawsuits, countersuits, lobbying campaigns, and failed attempts at legislation, progress on resolving the dispute has been incremental at best. The same could be said for many cities around the nation, as local government officials struggle to come to grips with the increasing popularity of short-term rental platforms like Airbnb, HomeAway, and VRBO in high-tourism areas.

In New York last week, there were two notable breaks in the logjam. On May 14, Airbnb agreed to give city officials partially anonymized host and reservation data for more than 17,000 listings. Two days later, a judge ordered Airbnb to turn over more detailed and nonanonymized information on dozens of hosts and hundreds of guests who have listed or stayed in more than a dozen buildings in Manhattan, Brooklyn, and Queens in the past seven years.

In both cases, the information will be used by investigators with the Mayor’s Office of Special Enforcement to identify hosts and property owners who may have broken the city’s notoriously strict short-term rental laws by converting residences into de facto hotels by listing them on Airbnb.

City officials originally subpoenaed Airbnb for the data—not anonymized—on the more than 17,000 listings in February. Mayor Bill de Blasio called the move an effort to force the company to “come clean about what they’re actually doing in this city.” The agreement outlining the data sharing was signed as a compromise on May 14, according to court records.

In addition to the 17,000 listings identified by the city, Airbnb will also share data on every listing rented through its platform between January 1, 2018, and February 18, 2019, that could have potentially violated New York’s short-term rental laws. The city prohibits rentals of an entire apartment or home for less than 30 days without the owner present in the unit, making many stays traditionally associated with services like Airbnb, HomeAway, and VRBO illegal. Only up to two guests are permitted in the short-term rental of an apartment or room, and they must be given “free and unobstructed access to every room and to each exit within the apartment,” meaning hosts can’t get around the ban on whole-apartment rentals by renting out three separate private rooms at once….(More)”.
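As a rough sketch of the rule check investigators might apply to the shared data, consider the following. The field names are hypothetical; the thresholds come from the rules described above:

```python
from dataclasses import dataclass

@dataclass
class Stay:
    entire_home: bool   # whole apartment/home rather than a room
    nights: int
    host_present: bool
    guests: int

def potentially_violates_ny_rules(stay: Stay) -> bool:
    # Entire-home stays under 30 days require the owner present.
    short_whole_home = (stay.entire_home and stay.nights < 30
                        and not stay.host_present)
    # Short-term rentals are capped at two guests.
    too_many_guests = stay.guests > 2
    return short_whole_home or too_many_guests

print(potentially_violates_ny_rules(Stay(True, 3, False, 2)))   # True
print(potentially_violates_ny_rules(Stay(False, 3, True, 4)))   # True
print(potentially_violates_ny_rules(Stay(True, 45, False, 2)))  # False
```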

What can we learn from billions of food purchases derived from fidelity cards?


Daniele Quercia at Medium: “By combining 1.6B food item purchases with 1.1B medical prescriptions for the entire city of London over one year, we discovered that, when it comes to predicting health outcomes, socio-economic conditions matter less than previous research has suggested: despite being lower-income, certain areas are healthy, and that is because of what their residents eat!

This result comes from our latest project, “Poor but Healthy”, which was published this month in EPJ Data Science (a Springer journal), and it comes with @tobi_vierzwo’s stunningly beautiful map of London, which I invite all of you to explore.

Why are we interested in urban health? In our cities, food is cheap and exercise discretionary, and our health pays the toll. Half of European citizens will be obese by 2050, and obesity and its related diseases are likely to reach crisis proportions. In this project, we set out to show that the fidelity cards of grocery stores represent a treasure trove of health data — they can be used not only to (e)mail discount coupons to customers but also to effectively track a neighbourhood’s health in real time, for an entire city or even an entire country.

In research circles, the impact of eating habits on people’s health has mostly been studied using dietary surveys, which are costly and of limited scale.

To complement these surveys, we have recently resorted to grocery fidelity cards. We analyzed the anonymized records of 1.6B grocery items purchased by 1.6M grocery store customers in London over one whole year, and combined them with 1.1B medical prescriptions.

In so doing, we found that, as one might expect, the “trick” to not being associated with chronic diseases is eating less of what we instinctively like (e.g., sugar, carbohydrates), balancing all the nutrients, and avoiding the (big) quantities that are readily available. These results come as no surprise, yet they speak to the validity of using fidelity cards to capture health outcomes…(More)”.
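As a hedged sketch of the area-level analysis that such fidelity-card data enables, consider the toy example below. The column names and figures are invented stand-ins for the real (and far richer) datasets:

```python
import pandas as pd

# Hypothetical per-item purchase records, already anonymised and
# reduced to area-level fields (all names and values are stand-ins).
purchases = pd.DataFrame({
    "area":    ["A", "A", "B", "B", "C"],
    "sugar_g": [120, 80, 300, 250, 90],
    "total_g": [1000, 900, 1100, 1000, 950],
})
# Hypothetical prescription rates per area (e.g. diabetes drugs).
prescriptions = pd.DataFrame({
    "area":      ["A", "B", "C"],
    "rx_per_1k": [12.0, 31.0, 10.5],
})

# Share of purchased grams that are sugar, aggregated per area.
agg = purchases.groupby("area")[["sugar_g", "total_g"]].sum()
agg["sugar_share"] = agg["sugar_g"] / agg["total_g"]

# Join diet composition with prescription rates and correlate.
merged = agg.reset_index().merge(prescriptions, on="area")
print(merged[["area", "sugar_share", "rx_per_1k"]])
print("correlation:", merged["sugar_share"].corr(merged["rx_per_1k"]))
```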


Data to the rescue


Podcast by Kenneth Cukier: “Access to the right data can be as valuable in humanitarian crises as water or medical care, but it can also be dangerous. Misused or in the wrong hands, the same information can put already vulnerable people at further risk. Kenneth Cukier hosts this special edition of Babbage examining how humanitarian organisations use data and what they can learn from the profit-making tech industry. This episode was recorded live from Wilton Park, in collaboration with the United Nations OCHA Centre for Humanitarian Data…(More)”.

Privacy and Identity in a Networked Society: Refining Privacy Impact Assessment


Book by Stefan Strauß: “This book offers an analysis of privacy impacts resulting from and reinforced by technology and discusses fundamental risks and challenges of protecting privacy in the digital age.

Privacy is among the most endangered “species” in our networked society: personal information is processed for various purposes beyond our control. Ultimately, this affects the natural interplay between privacy, personal identity and identification. This book investigates that interplay from a systemic, socio-technical perspective by combining research from the social and computer sciences. It sheds light on the basic functions of privacy, their relation to identity, and how they alter with digital identification practices. The analysis reveals a general privacy control dilemma of (digital) identification shaped by several interrelated socio-political, economic and technical factors. Uncontrolled increases in the identification modalities inherent to digital technology reinforce this dilemma and benefit surveillance practices, thereby complicating the detection of privacy risks and the creation of appropriate safeguards.

Easing this problem requires a novel approach to privacy impact assessment (PIA), and this book proposes an alternative PIA framework which, at its core, comprises a basic typology of (personally and technically) identifiable information. This approach contributes to the theoretical and practical understanding of privacy impacts and thus to the development of more effective protection standards….(More)”.
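As a rough, hypothetical sketch of how such a typology might be operationalised in practice (the categories and prioritisation rule here are illustrative, not the author’s exact framework):

```python
from dataclasses import dataclass
from enum import Enum, auto

class InfoKind(Enum):
    PERSONAL = auto()   # e.g. name, date of birth, health record
    TECHNICAL = auto()  # e.g. IP address, device ID, cookie

@dataclass
class DataItem:
    name: str
    kind: InfoKind
    directly_identifying: bool  # identifies a person on its own?

def pia_review_queue(items):
    """Order items so the most privacy-relevant are assessed first:
    directly identifying items, then technical identifiers, which are
    easy to overlook yet enable identification."""
    return sorted(items, key=lambda i: (not i.directly_identifying,
                                        i.kind is not InfoKind.TECHNICAL))

inventory = [
    DataItem("postcode", InfoKind.PERSONAL, False),
    DataItem("device ID", InfoKind.TECHNICAL, False),
    DataItem("full name", InfoKind.PERSONAL, True),
]
for item in pia_review_queue(inventory):
    print(item.name, item.kind.name)
```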