How Tech Giants Are Devising Real Ethics for Artificial Intelligence


For years, science-fiction moviemakers have been making us fear the bad things that artificially intelligent machines might do to their human creators. But for the next decade or two, our biggest concern is more likely to be that robots will take away our jobs or bump into us on the highway.

Now five of the world’s largest tech companies are trying to create a standard of ethics around the creation of artificial intelligence. While science fiction has focused on the existential threat of A.I. to humans, researchers at Google’s parent company, Alphabet, and those from Amazon, Facebook, IBM and Microsoft have been meeting to discuss more tangible issues, such as the impact of A.I. on jobs, transportation and even warfare.

Tech companies have long overpromised what artificially intelligent machines can do. In recent years, however, the A.I. field has made rapid advances in a range of areas, from self-driving cars and machines that understand speech, like Amazon’s Echo device, to a new generation of weapons systems that threaten to automate combat.

The specifics of what the industry group will do or say — even its name — have yet to be hashed out. But the basic intention is clear: to ensure that A.I. research is focused on benefiting people, not hurting them, according to four people involved in the creation of the industry partnership who are not authorized to speak about it publicly.

The importance of the industry effort is underscored in a report issued on Thursday by a Stanford University group funded by Eric Horvitz, a Microsoft researcher who is one of the executives in the industry discussions. The Stanford project, called the One Hundred Year Study on Artificial Intelligence, lays out a plan to produce a detailed report on the impact of A.I. on society every five years for the next century….The Stanford report attempts to define the issues that citizens of a typical North American city will face in computers and robotic systems that mimic human capabilities. The authors explore eight aspects of modern life, including health care, education, entertainment and employment, but specifically do not look at the issue of warfare…(More)”

The risks of relying on robots for fairer staff recruitment


Sarah O’Connor at the Financial Times: “Robots are not just taking people’s jobs away, they are beginning to hand them out, too. Go to any recruitment industry event and you will find the air is thick with terms like “machine learning”, “big data” and “predictive analytics”.

The argument for using these tools in recruitment is simple. Robo-recruiters can sift through thousands of job candidates far more efficiently than humans. They can also do it more fairly. Since they do not harbour conscious or unconscious human biases, they will recruit a more diverse and meritocratic workforce.

This is a seductive idea but it is also dangerous. Algorithms are not inherently neutral just because they see the world in zeros and ones.

For a start, any machine learning algorithm is only as good as the training data from which it learns. Take the PhD thesis of academic researcher Colin Lee, released to the press this year. He analysed data on the success or failure of 441,769 job applications and built a model that could predict with 70 to 80 per cent accuracy which candidates would be invited to interview. The press release plugged this algorithm as a potential tool to screen a large number of CVs while avoiding “human error and unconscious bias”.

But a model like this would absorb any human biases at work in the original recruitment decisions. For example, the research found that age was the biggest predictor of being invited to interview, with the youngest and the oldest applicants least likely to be successful. You might think it fair enough that inexperienced youngsters do badly, but the routine rejection of older candidates seems like something to investigate rather than codify and perpetuate. Mr Lee acknowledges these problems and suggests it would be better to strip the CVs of attributes such as gender, age and ethnicity before using them….(More)”
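Stripping protected attributes is easy to express in code, though, as Mr Lee acknowledges, the historical labels themselves may still carry bias through proxy variables. A minimal sketch of the idea, with synthetic data and hypothetical column names:

```python
# Hypothetical sketch: training a CV-screening model with protected
# attributes removed from the features before fitting.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

applications = pd.DataFrame({
    "years_experience": [1, 5, 12, 3, 8, 20, 2, 7],
    "num_prior_roles":  [1, 3, 6, 2, 4, 9, 1, 3],
    "age":              [22, 29, 41, 25, 33, 58, 23, 31],          # protected
    "gender":           ["f", "m", "f", "m", "f", "m", "f", "m"],  # protected
    "invited":          [0, 1, 1, 0, 1, 0, 0, 1],  # historical outcome
})

PROTECTED = ["age", "gender"]  # plus ethnicity, where recorded
X = applications.drop(columns=PROTECTED + ["invited"])
y = applications["invited"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(model.score(X_test, y_test))

# Caveat: the labels y were produced by (possibly biased) human
# decisions, so the model can still absorb bias through features
# correlated with the dropped columns (postcodes, graduation years).
```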

Technology Is Monitoring the Urban Landscape


Big City is watching you.

It will do it with camera-equipped drones that inspect municipal powerlines and robotic cars that know where people go. Sensor-laden streetlights will change brightness based on danger levels. Technologists and urban planners are working on a major transformation of urban landscapes over the next few decades.

Much of it involves the close monitoring of things and people, thanks to digital technology. To the extent that this makes people’s lives easier, the planners say, they will probably like it. But troubling and knotty questions of privacy and control remain.

A White House report published in February identified advances in transportation, energy and manufacturing, among other developments, that will bring on what it termed “a new era of change.”

Much of the change will also come from the private sector, which is moving faster to reach city dwellers, and is more skilled in collecting and responding to data. That is leading cities everywhere to work more closely than ever with private companies, which may have different priorities than the government.

One of the biggest changes that will hit a digitally aware city, it is widely agreed, is the seemingly prosaic issue of parking. Space given to parking is expected to shrink by half or more, as self-driving cars and drone deliveries lead an overall shift in connected urban transport. That will change or eliminate acres of urban space occupied by raised and underground parking structures.

Shared vehicles are not parked as much, and with more automation, they will know where parking spaces are available, eliminating the need to drive in search of a space.

“Office complexes won’t need parking lots with twice the footprint of their buildings,” said Sebastian Thrun, who led Google’s self-driving car project in its early days and now runs Udacity, an online learning company. “When we started on self-driving cars, we talked all the time about cutting the number of cars in a city by a factor of three,” or a two-thirds reduction.

In addition, police, fire, and even library services will seek greater responsiveness by tracking their own assets, and partly by looking at things like social media. Later, technologies like three-dimensional printing, new materials and robotic construction and demolition will be able to reshape skylines in a matter of weeks.

At least that is the plan. So much change afoot creates confusion….

The new techno-optimism is focused on big data and artificial intelligence. “Futurists used to think everyone would have their own plane,” said Erick Guerra, a professor of city and regional planning at the University of Pennsylvania. “We never have a good understanding of how things will actually turn out.”

He recently surveyed the 25 largest metropolitan planning organizations in the country and found that almost none have solid plans for modernizing their infrastructure. That may be the right way to approach the challenges of cities full of robots, but so far most clues are coming from companies that also sell the technology.

 “There’s a great deal of uncertainty, and a competition to show they’re low on regulation,” Mr. Guerra said. “There is too much potential money for new technology to be regulated out.”

The big tech companies say they are not interested in imposing the sweeping “smart city” projects they used to push, in part because things are changing too quickly. But they still want to build big, and they view digital surveillance as an essential component…(More)”

Can mobile usage predict illiteracy in a developing country?


Pål Sundsøy at arXiv: “The present study provides the first evidence that illiteracy can be reliably predicted from standard mobile phone logs. By deriving a broad set of mobile phone indicators reflecting users’ financial, social and mobility patterns we show how supervised machine learning can be used to predict individual illiteracy in an Asian developing country, externally validated against a large-scale survey. On average the model performs 10 times better than random guessing with a 70% accuracy. Further, we show how individual illiteracy can be aggregated and mapped geographically at cell tower resolution. Geographical mapping of illiteracy is crucial to know where the illiterate people are, and where to put in resources. In underdeveloped countries such mappings are often based on out-dated household surveys with low spatial and temporal resolution. One in five people worldwide struggle with illiteracy, and it is estimated that illiteracy costs the global economy more than 1 trillion dollars each year. These results potentially enable cost-effective, questionnaire-free investigation of illiteracy-related questions on an unprecedented scale…(More)”.
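The paper’s pipeline (derive phone-usage indicators, train a supervised classifier, then aggregate predictions per cell tower) can be gestured at with a toy sketch; the features and data below are synthetic stand-ins, not the study’s:

```python
# Sketch of the general approach (not the paper's actual code):
# supervised classification of illiteracy from phone-usage features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
# Hypothetical per-user indicators of the kind the paper derives:
# top-up amounts (financial), call-partner count (social),
# radius of gyration (mobility), share of outgoing texts.
X = rng.normal(size=(n, 4))
y = (X[:, 3] + 0.5 * rng.normal(size=n) < -0.5).astype(int)  # 1 = illiterate

clf = GradientBoostingClassifier(random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # held-out accuracy

# Individual predictions can then be averaged per home cell tower to
# map estimated illiteracy rates geographically, as the paper does.
```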

Enablers for Smart Cities


Book by Amal El Fallah Seghrouchni, Fuyuki Ishikawa, Laurent Hérault, and Hideyuki Tokuda: “Smart cities are a new vision for urban development.  They integrate information and communication technology infrastructures – in the domains of artificial intelligence, distributed and cloud computing, and sensor networks – into a city, to facilitate quality of life for its citizens and sustainable growth.  This book explores various concepts for the development of these new technologies (including agent-oriented programming, broadband infrastructures, wireless sensor networks, Internet-based networked applications, open data and open platforms), and how they can provide smart services and enablers in a range of public domains.

The most significant research, both established and emerging, is brought together to enable academics and practitioners to investigate the possibilities of smart cities, and to generate the knowledge and solutions required to develop and maintain them…(More)”

What Governments Can Learn From Airbnb And the Sharing Economy


In Fortune: “….Despite some regulators’ fears, the sharing economy may not result in the decline of regulation but rather in its opposite, providing a basis upon which society can develop more rational, ethical, and participatory models of regulation. But what regulation looks like, as well as who actually creates and enforces the regulation, is also bound to change.

There are three emerging models – peer regulation, self-regulatory organizations, and data-driven delegation – that promise a regulatory future for the sharing economy best aligned with society’s interests. In the adapted book excerpt that follows, I explain how the third of these approaches, of delegating enforcement of regulations to companies that store critical data on consumers, can help mitigate some of the biases Airbnb guests may face, and why this is a superior alternative to the “open data” approach of transferring consumer information to cities and state regulators.

Consider a different problem — of collecting hotel occupancy taxes from hundreds of thousands of Airbnb hosts rather than from a handful of corporate hotel chains. The delegation of tax collection to Airbnb, something a growing number of cities are experimenting with, has a number of advantages. It is likely to yield higher tax revenues and greater compliance than a system where hosts are required to register directly with the government, which is something occasional hosts seem reluctant to do. It also sidesteps privacy concerns resulting from mandates that digital platforms like Airbnb turn over detailed user data to the government. There is also significant opportunity for the platform to build credibility as it starts to take on quasi-governmental roles like this.

There is yet another advantage, and the one I believe will be the most significant in the long run. It asks a platform to leverage its data to ensure compliance with a set of laws in a manner geared towards delegating responsibility to the platform. You might say that the task in question here — computing tax owed, collecting, and remitting it — is technologically trivial. True. But I like this structure because of the potential it represents. It could be a precursor for much more exciting delegated possibilities.
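To make the point about triviality concrete, here is a toy version of the delegated task; the tax rates and booking fields are invented for illustration:

```python
# Toy illustration of the "technologically trivial" task: compute
# occupancy tax per booking and aggregate remittances per city.
from collections import defaultdict

TAX_RATES = {"San Francisco": 0.14, "Portland": 0.115}  # illustrative

bookings = [
    {"host": "h1", "city": "San Francisco", "nightly": 120.0, "nights": 3},
    {"host": "h2", "city": "Portland", "nightly": 80.0, "nights": 2},
    {"host": "h1", "city": "San Francisco", "nightly": 150.0, "nights": 1},
]

remittance = defaultdict(float)
for b in bookings:
    gross = b["nightly"] * b["nights"]
    remittance[b["city"]] += gross * TAX_RATES[b["city"]]

for city, owed in remittance.items():
    print(f"{city}: ${owed:,.2f} to remit")
```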

For a couple of decades now, companies of different kinds have been mining the large sets of “data trails” customers provide through their digital interactions. This generates insights of business and social importance. One such effort we are all familiar with is credit card fraud detection. When an unusual pattern of activity is detected, you get a call from your bank’s security team. Sometimes your card is blocked temporarily. The enthusiasm of these digital security systems is sometimes a nuisance, but it stems from your credit card company using sophisticated machine learning techniques to identify patterns that prior experience has told it are associated with a stolen card. It saves billions of dollars in taxpayer and corporate funds by detecting and blocking fraudulent activity swiftly.
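Banks’ production systems are proprietary and far more elaborate, but the underlying idea, flagging transactions that deviate from a cardholder’s learned patterns, can be sketched with an off-the-shelf anomaly detector:

```python
# Minimal sketch of pattern-based fraud flagging: unsupervised
# anomaly detection over simple per-transaction features.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# Features per transaction: amount, hour of day, km from home.
normal = np.column_stack([
    rng.gamma(2.0, 30.0, 500),    # typical amounts
    rng.normal(14, 4, 500) % 24,  # daytime-centred hours
    rng.exponential(5.0, 500),    # usually close to home
])
suspicious = np.array([[2400.0, 3.0, 8000.0]])  # large, 3 a.m., far away

detector = IsolationForest(random_state=0).fit(normal)
print(detector.predict(suspicious))  # -1 means "anomalous": flag or block
```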

A more recent visible example of the power of mining large data sets of customer interaction came in 2008, when Google engineers announced that they could predict flu outbreaks using data collected from Google searches, and track the spread of flu outbreaks in real time, providing information that was well ahead of the information available using the Centers for Disease Control and Prevention’s (CDC) own tracking systems. The Google system’s performance deteriorated after a couple of years, but its impact on public perception of what might be possible using “big data” was immense.

It seems highly unlikely that such a system would have emerged if Google had been asked to hand over anonymized search data to the CDC. In fact, there would have probably been widespread public backlash to this on privacy grounds. Besides, the reason why this capability emerged organically from within Google is partly as a consequence of Google having one of the highest concentrations of computer science and machine learning talent in the world.

Similar approaches hold great promise as a regulatory approach for sharing economy platforms. Consider the issue of discriminatory practices. There has long been anecdotal evidence that some yellow cabs in New York discriminate against some nonwhite passengers. There have been similar concerns that such behavior may start to manifest on ridesharing platforms and in other peer-to-peer markets for accommodation and labor services.

For example, a 2014 study by Benjamin Edelman and Michael Luca of Harvard suggested that African American hosts might have lower pricing power than white hosts on Airbnb. While the study did not conclusively establish that the difference is due to guests discriminating against African American hosts, a follow-up study suggested that guests with “distinctively African American names” were less likely to receive favorable responses for their requests to Airbnb hosts. This research raises a red flag about the need for vigilance as the lines between personal and professional blur.

One solution would be to apply machine-learning techniques to identify patterns associated with discriminatory behavior. No doubt, many platforms are already using such systems….(More)”
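Before any machine learning, a platform could run a simpler statistical screen for disparities in outcomes across groups. A sketch with hypothetical counts:

```python
# Sketch of a first-pass disparity screen: test whether booking
# acceptance rates differ across guest groups more than chance allows.
# The field layout and counts are hypothetical.
from scipy.stats import chi2_contingency

# rows: guest group A, guest group B; columns: accepted, declined
table = [[640, 360],
         [520, 480]]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"p = {p_value:.4g}")
if p_value < 0.01:
    print("Disparity unlikely to be chance; audit the hosts involved.")
```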

What is Artificial Intelligence?


Report by Mike Loukides and Ben Lorica: “Defining artificial intelligence isn’t just difficult; it’s impossible, not the least because we don’t really understand human intelligence. Paradoxically, advances in AI will help more to define what human intelligence isn’t than what artificial intelligence is.

But whatever AI is, we’ve clearly made a lot of progress in the past few years, in areas ranging from computer vision to game playing. AI is making the transition from a research topic to the early stages of enterprise adoption. Companies such as Google and Facebook have placed huge bets on AI and are already using it in their products. But Google and Facebook are only the beginning: over the next decade, we’ll see AI steadily creep into one product after another. We’ll be communicating with bots, rather than scripted robo-dialers, and not realizing that they aren’t human. We’ll be relying on cars to plan routes and respond to road hazards. It’s a good bet that in the next decades, some features of AI will be incorporated into every application that we touch and that we won’t be able to do anything without touching an application.

Given that our future will inevitably be tied up with AI, it’s imperative that we ask: Where are we now? What is the state of AI? And where are we heading?

Capabilities and Limitations Today

Descriptions of AI span several axes: strength (how intelligent is it?), breadth (does it solve a narrowly defined problem, or is it general?), training (how does it learn?), capabilities (what kinds of problems are we asking it to solve?), and autonomy (are AIs assistive technologies, or do they act on their own?). Each of these axes is a spectrum, and each point in this many-dimensional space represents a different way of understanding the goals and capabilities of an AI system.
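One way to read the taxonomy is as a record type: every system occupies a point along each axis. A toy encoding, where the field names follow the report’s axes but the example values are mine:

```python
# Toy encoding of the report's five axes; the example values are mine.
from dataclasses import dataclass

@dataclass
class AIProfile:
    strength: str      # how capable within its domain
    breadth: str       # "narrow" .. "general"
    training: str      # how it learns
    capabilities: str  # what problems it is asked to solve
    autonomy: str      # "assistive" .. "autonomous"

alphago = AIProfile(
    strength="superhuman at Go",
    breadth="narrow",
    training="supervised on expert games, then self-play",
    capabilities="board-game move selection",
    autonomy="assistive (humans place the stones)",
)
print(alphago)
```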

On the strength axis, it’s very easy to look at the results of the last 20 years and realize that we’ve made some extremely powerful programs. Deep Blue beat Garry Kasparov in chess; Watson beat the best Jeopardy champions of all time; AlphaGo beat Lee Sedol, arguably the world’s best Go player. But all of these successes are limited. Deep Blue, Watson, and AlphaGo were all highly specialized, single-purpose machines that did one thing extremely well. Deep Blue and Watson can’t play Go, and AlphaGo can’t play chess or Jeopardy, even on a basic level. Their intelligence is very narrow, and can’t be generalized. A lot of work has gone into using Watson for applications such as medical diagnosis, but it’s still fundamentally a question-and-answer machine that must be tuned for a specific domain. Deep Blue has a lot of specialized knowledge about chess strategy and an encyclopedic knowledge of openings. AlphaGo was built with a more general architecture, but a lot of hand-crafted knowledge still made its way into the code. I don’t mean to trivialize or undervalue their accomplishments, but it’s important to realize what they haven’t done.

We haven’t yet created an artificial general intelligence that can solve a multiplicity of different kinds of problems. We still don’t have a machine that can listen to recordings of humans for a year or two, and start speaking. While AlphaGo “learned” to play Go by analyzing thousands of games, and then playing thousands more against itself, the same software couldn’t be used to master chess. The same general approach? Probably. But our best current efforts are far from a general intelligence that is flexible enough to learn without supervision, or flexible enough to choose what it wants to learn, whether that’s playing board games or designing PC boards.

Toward General Intelligence

How do we get from narrow, domain-specific intelligence to more general intelligence? By “general intelligence,” we don’t necessarily mean human intelligence; but we do want machines that can solve different kinds of problems without being programmed with domain-specific knowledge. We want machines that can make human judgments and decisions. That doesn’t necessarily mean that AI systems will implement concepts like creativity, intuition, or instinct, which may have no digital analogs. A general intelligence would have the ability to follow multiple pursuits and to adapt to unexpected situations. And a general AI would undoubtedly implement concepts like “justice” and “fairness”: we’re already talking about the impact of AI on the legal system….

It’s easier to think of super-intelligence as a matter of scale. If we can create “general intelligence,” it’s easy to assume that it could quickly become thousands of times more powerful than human intelligence. Or, more precisely: either general intelligence will be significantly slower than human thought, and it will be difficult to speed it up either through hardware or software; or it will speed up quickly, through massive parallelism and hardware improvements. We’ll go from thousand-core GPUs to trillions of cores on thousands of chips, with data streaming in from billions of sensors. In the first case, when speedups are slow, general intelligence might not be all that interesting (though it will have been a great ride for the researchers). In the second case, the ramp-up will be very steep and very fast….(More) (Full Report)”

This text-message hotline can predict your risk of depression or stress


Clinton Nguyen for TechInsider: “When counselors are helping someone in the midst of an emotional crisis, they must not only know how to talk – they also must be willing to text.

Crisis Text Line, a non-profit text-message-based counseling service, operates a hotline for people who find it safer or easier to text about their problems than make a phone call or send an instant message. Over 1,500 volunteers are on hand 24/7 to lend support about problems including bullying, isolation, suicidal thoughts, bereavement, self-harm, or even just stress.

But in addition to providing a new outlet for those who prefer to communicate by text, the service is gathering a wellspring of anonymized data.

“We look for patterns in historical conversations that end up being higher risk for self harm and suicide attempts,” Liz Eddy, a Crisis Text Line spokesperson, tells Tech Insider. “By grounding in historical data, we can predict the risk of new texters coming in.”

According to Fortune, the organization is using machine learning to prioritize higher-risk individuals for quicker and more effective responses. But Crisis Text Line is also wielding the data it gathers in other ways – the company has published a page of trends that tells the public which hours or days people are more likely to be affected by certain issues, as well as which US states are most affected by specific crises or psychological states.
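Crisis Text Line has not published its models, but the general shape of text-based risk triage, scoring each incoming message and serving the riskiest first, can be sketched in a few lines:

```python
# Minimal sketch of text-based risk triage (the organization's actual
# models are not public): score incoming messages, serve riskiest first.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented training set; a real system would learn from millions
# of anonymized historical conversations with known outcomes.
texts = ["i want to disappear forever", "exams are stressing me out",
         "i have the pills in my hand", "my friends ignored me today",
         "i can't do this anymore", "rough day at school"]
high_risk = [1, 0, 1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, high_risk)

queue = ["everyone would be better off without me", "i'm a bit stressed"]
scores = model.predict_proba(queue)[:, 1]
for item in sorted(zip(scores, queue), reverse=True):
    print(item)  # highest estimated risk answered first
```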

According to the data, residents of Alaska reach out to the Text Line for LGBTQ issues more than those in other states, and Maine is one of the most stressed out states. Physical abuse is most commonly reported in North Dakota and Wyoming, while depression is more prevalent in texters from Kentucky and West Virginia.

The research comes at an especially critical time. According to studies from the National Center for Health Statistics, US suicide rates have surged to a 30-year high. The study noted a rise in suicide rates for all demographics except black men over the age of 75. Alarmingly, the suicide rate among 10- to 14-year-old girls has tripled since 1999….(More)”

Is artificial intelligence key to dengue prevention?


BreakDengue: “Dengue fever outbreaks are increasing in both frequency and magnitude. Not only that, the number of countries that could potentially be affected by the disease is growing all the time.

This growth has led to renewed efforts to address the disease, and a pioneering Malaysian researcher was recently recognized for his efforts to harness the power of big data and artificial intelligence to accurately predict dengue outbreaks.

Dr. Dhesi Baha Raja received the Pistoia Alliance Life Science Award at King’s College London in April of this year, for developing a disease prediction platform that employs technology and data to give people prior warning of when disease outbreaks occur. The medical doctor and epidemiologist has spent years working to develop AIME (Artificial Intelligence in Medical Epidemiology)…

It relies on a complex algorithm that analyses a wide range of data collected by local governments as well as satellite image recognition systems. Over 20 variables, such as weather, wind speed, wind direction, thunderstorms, solar radiation and rainfall schedule, are included and analyzed. Population models and geographical terrain are also included. The ultimate result of this intersection between epidemiology, public health and technology is a map, which clearly illustrates the probability and location of the next dengue outbreak.

The ground-breaking platform can predict dengue fever outbreaks up to two or three months in advance, with an accuracy approaching 88.7 per cent and within a 400m radius. Dr. Dhesi has just returned from Rio de Janeiro, where the platform was employed in a bid to fight dengue in advance of this summer’s Olympics. In Brazil, its perceived accuracy was around 84 per cent, whereas in Malaysia it was over 88 per cent – giving it an average accuracy of 86.37 per cent.
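AIME itself is not open source, but the general shape described above, a classifier that turns per-location variables into an outbreak probability for mapping, might be sketched as follows (all features and data here are synthetic):

```python
# Sketch of the general shape of such a system (not AIME's code):
# a classifier turns per-cell variables into an outbreak probability.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
n = 800
# Hypothetical per-cell variables: rainfall, temperature, wind speed,
# solar radiation, population density (the real system uses 20+).
X = rng.normal(size=(n, 5))
y = ((0.8 * X[:, 0] + 0.5 * X[:, 4] + rng.normal(0, 0.5, n)) > 1).astype(int)

model = RandomForestClassifier(random_state=0).fit(X[:600], y[:600])
cells = X[600:]
risk = model.predict_proba(cells)[:, 1]  # P(outbreak) per 400m cell
print("highest-risk cell:", risk.argmax(), "p =", round(risk.max(), 2))
```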

The web-based application has been tested in two states within Malaysia, Kuala Lumpur, and Selangor, and the first ever mobile app is due to be deployed across Malaysia soon. Once its capability is adequately tested there, it will be rolled out globally. Dr. Dhesi’s team are working closely with mobile digital service provider Webe on this.

By making the app free to download, this will ensure the service becomes accessible to all, Dr Dhesi explains.
“With the web-based application, this could only be used by public health officials and agencies. We recognized the need for us to democratize this health service to the community, and the only way to do this is to provide the community with the mobile app.”
This will also enable the gathering of even greater knowledge on the possibility of dengue outbreaks in high-risk areas, as well as monitoring the changing risks as people move to different areas, he adds….(More)”

Selected Readings on Data Collaboratives


By Neil Britto, David Sangokoya, Iryna Susha, Stefaan Verhulst and Andrew Young

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data collaboratives was originally published in 2017.

The term data collaborative refers to a new form of collaboration, beyond the public-private partnership model, in which participants from different sectors (including private companies, research institutions, and government agencies) can exchange data to help solve public problems. Several of society’s greatest challenges — from addressing climate change to public health to job creation to improving the lives of children — require greater access to data, more collaboration between public- and private-sector entities, and an increased ability to analyze datasets. In the coming months and years, data collaboratives will be essential vehicles for harnessing the vast stores of privately held data toward the public good.

Selected Reading List (in alphabetical order)

Annotated Selected Readings List (in alphabetical order)

Agaba, G., Akindès, F., Bengtsson, L., Cowls, J., Ganesh, M., Hoffman, N., . . . Meissner, F. “Big Data and Positive Social Change in the Developing World: A White Paper for Practitioners and Researchers.” 2014. http://bit.ly/25RRC6N.

  • This white paper, produced by “a group of activists, researchers and data experts,” explores the potential of big data to improve development outcomes and spur positive social change in low- and middle-income countries. Using examples, the authors discuss four areas in which the use of big data can impact development efforts:
    • Advocating and facilitating by “open[ing] up new public spaces for discussion and awareness building”;
    • Describing and predicting through the detection of “new correlations and the surfac[ing] of new questions”;
    • Facilitating information exchange through “multiple feedback loops which feed into both research and action,” and
    • Promoting accountability and transparency, especially as a byproduct of crowdsourcing efforts aimed at “aggregat[ing] and analyz[ing] information in real time.”
  • The authors argue that in order to maximize the potential of big data’s use in development, “there is a case to be made for building a data commons for private/public data, and for setting up new and more appropriate ethical guidelines.”
  • They also identify a number of challenges, especially when leveraging data made accessible from a number of sources, including private sector entities, such as:
    • Lack of general data literacy;
    • Lack of open learning environments and repositories;
    • Lack of resources, capacity and access;
    • Challenges of sensitivity and risk perception with regard to using data;
    • Storage and computing capacity; and
    • Externally validating data sources for comparison and verification.

Ansell, C. and Gash, A. “Collaborative Governance in Theory and Practice.” Journal of Public Administration Research and Theory 18 (4), 2008. http://bit.ly/1RZgsI5.

  • This article describes collaborative arrangements that include public and private organizations working together and proposes a model for understanding an emergent form of public-private interaction informed by 137 diverse cases of collaborative governance.
  • The article suggests factors significant to successful partnering processes and outcomes include:
    • Shared understanding of challenges,
    • Trust building processes,
    • The importance of recognizing seemingly modest progress, and
    • Strong indicators of commitment to the partnership’s aspirations and process.
  • The authors provide a “contingency theory model” that specifies relationships between different variables that influence outcomes of collaborative governance initiatives. Three “core contingencies” for successful collaborative governance initiatives identified by the authors are:
    • Time (e.g., decision making time afforded to the collaboration);
    • Interdependence (e.g., a high degree of interdependence can mitigate negative effects of low trust); and
    • Trust (e.g., a higher level of trust indicates a higher probability of success).

Ballivian A, Hoffman W. “Public-Private Partnerships for Data: Issues Paper for Data Revolution Consultation.” World Bank, 2015. Available from: http://bit.ly/1ENvmRJ

  • This World Bank report provides a background document on forming public-private partnerships for data in order to inform the UN’s Independent Expert Advisory Group (IEAG) on sustaining a “data revolution” in sustainable development.
  • The report highlights the critical position of private companies within the data value chain and reflects on key elements of a sustainable data PPP: “common objectives across all impacted stakeholders, alignment of incentives, and sharing of risks.” In addition, the report describes the risks and incentives of public and private actors, and the principles needed to “build the legal, cultural, technological and economic infrastructures to enable the balancing of competing interests.” These principles include understanding; experimentation; adaptability; balance; persuasion and compulsion; risk management; and governance.
  • Examples of data collaboratives cited in the report include HP Earth Insights, Orange Data for Development Challenges, Amazon Web Services, IBM Smart Cities Initiative, and the Governance Lab’s Open Data 500.

Brack, Matthew, and Tito Castillo. “Data Sharing for Public Health: Key Lessons from Other Sectors.” Chatham House, Centre on Global Health Security. April 2015. Available from: http://bit.ly/1DHFGVl

  • The Chatham House report provides an overview on public health surveillance data sharing, highlighting the benefits and challenges of shared health data and the complexity in adapting technical solutions from other sectors for public health.
  • The report describes data sharing processes from several perspectives, including in-depth case studies of actual data sharing in practice at the individual, organizational and sector levels. Among the key lessons for public health data sharing, the report strongly highlights the need to harness momentum for action and maintain collaborative engagement: “Successful data sharing communities are highly collaborative. Collaboration holds the key to producing and abiding by community standards, and building and maintaining productive networks, and is by definition the essence of data sharing itself. Time should be invested in establishing and sustaining collaboration with all stakeholders concerned with public health surveillance data sharing.”
  • Examples of data collaboratives include H3Africa (a collaboration between NIH and Wellcome Trust) and NHS England’s care.data programme.

de Montjoye, Yves-Alexandre, Jake Kendall, and Cameron F. Kerry. “Enabling Humanitarian Use of Mobile Phone Data.” The Brookings Institution, Issues in Technology Innovation. November 2014. Available from: http://brook.gs/1JxVpxp

  • Using Ebola as a case study, the authors describe the value of using private telecom data for uncovering “valuable insights into understanding the spread of infectious diseases as well as strategies to micro-target outreach and drive uptake of health-seeking behavior.”
  • The authors highlight the absence of a common legal and standards framework for “sharing mobile phone data in privacy-conscientious ways” and recommend “engaging companies, NGOs, researchers, privacy experts, and governments to agree on a set of best practices for new privacy-conscientious metadata sharing models.”

Eckartz, Silja M., Hofman, Wout J., Van Veenstra, Anne Fleur. “A decision model for data sharing.” Vol. 8653 LNCS. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2014. http://bit.ly/21cGWfw.

  • This paper proposes a decision model for data sharing of public and private data based on literature review and three case studies in the logistics sector.
  • The authors identify five categories of barriers to data sharing and offer a decision model for identifying potential interventions to overcome each barrier:
    • Ownership. Possible interventions likely require improving trust among those who own the data through, for example, involvement and support from higher management.
    • Privacy. Interventions include “anonymization by filtering of sensitive information and aggregation of data,” and access control mechanisms built around identity management and regulated access (a generic sketch follows this list).
    • Economic. Interventions include a model where data is shared only with a few trusted organizations, and yield management mechanisms to ensure negative financial consequences are avoided.
    • Data quality. Interventions include identifying additional data sources that could improve the completeness of datasets, and efforts to improve metadata.
    • Technical. Interventions include making data available in structured formats and publishing data according to widely agreed upon data standards.
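The filtering-and-aggregation intervention the authors quote has a compact generic expression. A minimal sketch, with invented column names and an invented k=5 suppression threshold:

```python
# Generic sketch of the privacy interventions described above:
# drop direct identifiers, aggregate, and suppress small groups.
# Column names and the k=5 threshold are invented for illustration.
import pandas as pd

shipments = pd.DataFrame({
    "customer_name": ["a", "b", "c", "d", "e", "f"],  # direct identifier
    "route":  ["R1", "R1", "R1", "R1", "R1", "R2"],
    "weight": [10.0, 12.5, 9.0, 11.0, 10.5, 30.0],
})

K = 5  # minimum group size before a summary row may be released
deidentified = shipments.drop(columns=["customer_name"])
summary = deidentified.groupby("route").agg(
    n=("weight", "size"), mean_weight=("weight", "mean"))
releasable = summary[summary["n"] >= K]  # suppress small cells
print(releasable)  # R2, with a single shipment, is withheld
```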

Hoffman, Sharona and Podgurski, Andy. “The Use and Misuse of Biomedical Data: Is Bigger Really Better?” American Journal of Law & Medicine 497, 2013. http://bit.ly/1syMS7J.

  • This journal article explores the benefits and, in particular, the risks related to large-scale biomedical databases bringing together health information from a diversity of sources across sectors. Some data collaboratives examined in the piece include:
    • MedMining – a company that extracts EHR data, de-identifies it, and offers it to researchers. The data sets that MedMining delivers to its customers include ‘lab results, vital signs, medications, procedures, diagnoses, lifestyle data, and detailed costs’ from inpatient and outpatient facilities.
    • Explorys has formed a large healthcare database derived from financial, administrative, and medical records. It has partnered with major healthcare organizations such as the Cleveland Clinic Foundation and Summa Health System to aggregate and standardize health information from ten million patients and over thirty billion clinical events.
  • Hoffman and Podgurski note that biomedical databases have many potential uses, with those likely to benefit including: “researchers, regulators, public health officials, commercial entities, lawyers,” as well as “healthcare providers who conduct quality assessment and improvement activities,” regulatory monitoring entities like the FDA, and “litigants in tort cases to develop evidence concerning causation and harm.”
  • They argue, however, that risks arise because:
    • The data contained in biomedical databases is surprisingly likely to be incorrect or incomplete;
    • Systemic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of research results, especially in answering causal questions; and
    • Data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policymakers.

Krumholz, Harlan M., et al. “Sea Change in Open Science and Data Sharing Leadership by Industry.” Circulation: Cardiovascular Quality and Outcomes 7.4. 2014. 499-504. http://1.usa.gov/1J6q7KJ

  • This article provides a comprehensive overview of industry-led efforts and cross-sector collaborations in data sharing by pharmaceutical companies to inform clinical practice.
  • The article details the types of data being shared and the early activities of GlaxoSmithKline (“in coordination with other companies such as Roche and ViiV”); Medtronic and the Yale University Open Data Access Project; and Janssen Pharmaceuticals (Johnson & Johnson). The article also describes the range of involvement in data sharing among pharmaceutical companies including Pfizer, Novartis, Bayer, AbbVie, Eli Lilly, AstraZeneca, and Bristol-Myers Squibb.

Mann, Gideon. “Private Data and the Public Good.” Medium. May 17, 2016. http://bit.ly/1OgOY68.

  • This Medium post from Gideon Mann, the Head of Data Science at Bloomberg, shares his prepared remarks given at a lecture at the City College of New York. Mann argues for the potential benefits of increasing access to private sector data, both to improve research and academic inquiry and also to help solve practical, real-world problems. He also describes a number of initiatives underway at Bloomberg along these lines.
  • Mann argues that data generated at private companies “could enable amazing discoveries and research,” but is often inaccessible to those who could put it to those uses. Beyond research, he notes that corporate data could, for instance, benefit:
    • Public health – including suicide prevention, addiction counseling and mental health monitoring.
    • Legal and ethical questions – especially as they relate to “the role algorithms have in decisions about our lives,” such as credit checks and resume screening.
  • Mann recognizes the privacy challenges inherent in private sector data sharing, but argues that it is a common misconception that the only two choices are “complete privacy or complete disclosure.” He believes that flexible frameworks for differential privacy could open up new opportunities for responsibly leveraging data collaboratives.
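Differential privacy is a concrete, well-studied framework rather than a slogan: a query result is perturbed with noise calibrated to how much one person’s data can change it. A minimal sketch of the standard Laplace mechanism, with illustrative parameters:

```python
# Minimal Laplace mechanism: release a count with epsilon-differential
# privacy by adding noise scaled to sensitivity / epsilon.
import numpy as np

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0,
             rng=np.random.default_rng()) -> float:
    """One person joining or leaving changes a count by at most 1, so
    sensitivity is 1; smaller epsilon means stronger privacy."""
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

print(dp_count(10_452, epsilon=0.1))  # noisy count, safer to publish
```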

Pastor Escuredo, D., Morales-Guzmán, A., et al. “Flooding through the Lens of Mobile Phone Activity.” IEEE Global Humanitarian Technology Conference, GHTC 2014. Available from: http://bit.ly/1OzK2bK

  • This report describes the use of mobile data to understand the impact of disasters and improve disaster management. The study was conducted in the Mexican state of Tabasco in 2009 by a multidisciplinary, multi-stakeholder consortium involving the UN World Food Programme (WFP), Telefonica Research, Technical University of Madrid (UPM), Digital Strategy Coordination Office of the President of Mexico, and UN Global Pulse.
  • Telefonica Research, a division of the major Latin American telecommunications company, provided call detail records covering flood-affected areas for nine months. This data was combined with “remote sensing data (satellite images), rainfall data, census and civil protection data.” The results demonstrated that “analysing mobile activity during floods could be used to potentially locate damaged areas, efficiently assess needs and allocate resources (for example, sending supplies to affected areas).”
  • In addition to the results, the study highlighted “the value of a public-private partnership on using mobile data to accurately indicate flooding impacts in Tabasco, thus improving early warning and crisis management.”

* Perkmann, M. and Schildt, H. “Open data partnerships between firms and universities: The role of boundary organizations.” Research Policy, 44(5), 2015. http://bit.ly/25RRJ2c

  • This paper discusses the concept of a “boundary organization” in relation to industry-academic partnerships driven by data. Boundary organizations perform mediated revealing, allowing firms to disclose their research problems to a broad audience of innovators and simultaneously minimize the risk that this information would be adversely used by competitors.
  • The authors identify two especially important challenges for private firms to enter open data or participate in data collaboratives with the academic research community that could be addressed through more involvement from boundary organizations:
    • First is a challenge of maintaining competitive advantage. The authors note that, “the more a firm attempts to align the efforts in an open data research programme with its R&D priorities, the more it will have to reveal about the problems it is addressing within its proprietary R&D.”
    • The second involves the misalignment of incentives between the private and academic fields. Perkmann and Schildt argue that a firm seeking to build collaborations around its opened data “will have to provide suitable incentives that are aligned with academic scientists’ desire to be rewarded for their work within their respective communities.”

Robin, N., Klein, T., & Jütting, J. “Public-Private Partnerships for Statistics: Lessons Learned, Future Steps.” OECD. 2016. http://bit.ly/24FLYlD.

  • This working paper acknowledges the growing body of work on how different types of data (e.g., telecom data, social media, sensors and geospatial data, etc.) can address data gaps relevant to National Statistical Offices (NSOs).
  • Four models of public-private interaction for statistics are described: in-house production of statistics by a data provider for a national statistics office (NSO), transfer of data sets to NSOs from private entities, transfer of data to a third-party provider to manage the NSO and private-entity data, and the outsourcing of NSO functions.
  • The paper highlights challenges to public-private partnerships involving data (e.g., technical challenges, data confidentiality, risks, limited incentives for participation), suggests that deliberate and highly structured approaches to such partnerships require enforceable contracts, and emphasizes both the trade-off between data specificity and accessibility and the importance of pricing mechanisms that reflect the capacity and capability of national statistical offices.
  • Case studies referenced in the paper include:
    • A mobile network operator’s (MNO Telefonica) in-house analysis of call detail records;
    • A third-party data provider and steward of travel statistics (Positium);
    • The Data for Development (D4D) challenge organized by MNO Orange; and
    • Statistics Netherlands’ use of social media to predict consumer confidence.

Stuart, Elizabeth, Samman, Emma, Avis, William, Berliner, Tom. “The data revolution: finding the missing millions.” Overseas Development Institute, 2015. Available from: http://bit.ly/1bPKOjw

  • The authors of this report highlight the need for good quality, relevant, accessible and timely data for governments to extend services into underrepresented communities and implement policies towards a sustainable “data revolution.”
  • The solutions proposed in this recent report from the Overseas Development Institute focus on capacity-building activities of national statistical offices (NSOs), alternative sources of data (including shared corporate data) to address gaps, and building strong data management systems.

Taylor, L., & Schroeder, R. “Is bigger better? The emergence of big data as a tool for international development policy.” GeoJournal, 80(4). 2015. 503-518. http://bit.ly/1RZgSy4.

  • This journal article describes how privately held data – namely “digital traces” of consumer activity – “are becoming seen by policymakers and researchers as a potential solution to the lack of reliable statistical data on lower-income countries.”
  • They focus especially on three categories of data collaborative use cases:
    • Mobile data as a predictive tool for issues such as human mobility and economic activity;
    • Use of mobile data to inform humanitarian response to crises; and
    • Use of born-digital web data as a tool for predicting economic trends, and the implications these have for LMICs.
  • They note, however, that a number of challenges and drawbacks exist for these types of use cases, including:
    • Access to private data sources often must be negotiated or bought, “which potentially means substituting negotiations with corporations for those with national statistical offices;”
    • The meaning of such data is not always simple or stable, and local knowledge is needed to understand how people are using the technologies in question;
    • Bias in proprietary data can be hard to understand and quantify;
    • Lack of privacy frameworks; and
    • Power asymmetries, wherein “LMIC citizens are unwittingly placed in a panopticon staffed by international researchers, with no way out and no legal recourse.”

van Panhuis, Willem G., Proma Paul, Claudia Emerson, John Grefenstette, Richard Wilder, Abraham J. Herbst, David Heymann, and Donald S. Burke. “A systematic review of barriers to data sharing in public health.” BMC public health 14, no. 1 (2014): 1144. Available from: http://bit.ly/1JOBruO

  • The authors of this report provide a “systematic literature review of potential barriers to public health data sharing.” These twenty potential barriers are classified in six categories: “technical, motivational, economic, political, legal and ethical.” In this taxonomy, “the first three categories are deeply rooted in well-known challenges of health information systems for which structural solutions have yet to be found; the last three have solutions that lie in an international dialogue aimed at generating consensus on policies and instruments for data sharing.”
  • The authors suggest the need for a “systematic framework of barriers to data sharing in public health” in order to accelerate access and use of data for public good.

Verhulst, Stefaan and Sangokoya, David. “Mapping the Next Frontier of Open Data: Corporate Data Sharing.” In: Gasser, Urs and Zittrain, Jonathan and Faris, Robert and Heacock Jones, Rebekah, “Internet Monitor 2014: Reflections on the Digital World: Platforms, Policy, Privacy, and Public Discourse (December 15, 2014).” Berkman Center Research Publication No. 2014-17. http://bit.ly/1GC12a2

  • This essay describes a taxonomy of current corporate data sharing practices for public good: research partnerships; prizes and challenges; trusted intermediaries; application programming interfaces (APIs); intelligence products; and corporate data cooperatives or pooling.
  • Examples of data collaboratives include: Yelp Dataset Challenge, the Digital Ecologies Research Partnership, BBVA Innova Challenge, Telecom Italia’s Big Data Challenge, NIH’s Accelerating Medicines Partnership and the White House’s Climate Data Partnerships.
  • The authors highlight important questions to consider towards a more comprehensive mapping of these activities.

Verhulst, Stefaan and Sangokoya, David, 2015. “Data Collaboratives: Exchanging Data to Improve People’s Lives.” Medium. Available from: http://bit.ly/1JOBDdy

  • The essay refers to data collaboratives as a new form of collaboration involving participants from different sectors exchanging data to help solve public problems. These forms of collaborations can improve people’s lives through data-driven decision-making; information exchange and coordination; and shared standards and frameworks for multi-actor, multi-sector participation.
  • The essay cites four activities that are critical to accelerating data collaboratives: documenting value and measuring impact; matching public demand and corporate supply of data in a trusted way; training and convening data providers and users; experimenting and scaling existing initiatives.
  • Examples of data collaboratives include NIH’s Precision Medicine Initiative; the Mobile Data, Environmental Extremes and Population (MDEEP) Project; and Twitter-MIT’s Laboratory for Social Machines.

Verhulst, Stefaan, Susha, Iryna, Kostura, Alexander. “Data Collaboratives: Matching Supply of (Corporate) Data to Solve Public Problems.” Medium. February 24, 2016. http://bit.ly/1ZEp2Sr.

  • This piece articulates a set of key lessons learned during a session at the International Data Responsibility Conference focused on identifying emerging practices, opportunities and challenges confronting data collaboratives.
  • The authors list a number of privately held data sources that could create positive public impacts if made more accessible in a collaborative manner, including:
    • Data for early warning systems to help mitigate the effects of natural disasters;
    • Data to help understand human behavior as it relates to nutrition and livelihoods in developing countries;
    • Data to monitor compliance with weapons treaties;
    • Data to more accurately measure progress related to the UN Sustainable Development Goals.
  • To the end of identifying and expanding on emerging practice in the space, the authors describe a number of current data collaborative experiments, including:
    • Trusted Intermediaries: Statistics Netherlands partnered with Vodafone to analyze mobile call data records in order to better understand mobility patterns and inform urban planning.
    • Prizes and Challenges: Orange Telecom, which has been a leader in this type of Data Collaboration, provided several examples of the company’s initiatives, such as the use of call data records to track the spread of malaria as well as their experience with Challenge 4 Development.
    • Research partnerships: The Data for Climate Action project is an ongoing large-scale initiative incentivizing companies to share their data to help researchers answer particular scientific questions related to climate change and adaptation.
    • Sharing intelligence products: JPMorgan Chase shares macroeconomic insights gained by leveraging its data through the newly established JPMorgan Chase Institute.
  • In order to capitalize on the opportunities provided by data collaboratives, a number of needs were identified:
    • A responsible data framework;
    • Increased insight into different business models that may facilitate the sharing of data;
    • Capacity to tap into the potential value of data;
    • Transparent stock of available data supply; and
    • Mapping emerging practices and models of sharing.

Vogel, N., Theisen, C., Leidig, J. P., Scripps, J., Graham, D. H., & Wolffe, G. “Mining mobile datasets to enable the fine-grained stochastic simulation of Ebola diffusion.” Paper presented at the Procedia Computer Science. 2015. http://bit.ly/1TZDroF.

  • The paper presents a research study conducted on the basis of the mobile call records shared with researchers in the framework of the Data for Development Challenge by the mobile operator Orange.
  • The study discusses the data analysis approach in relation to developing a simulation of Ebola diffusion built around “the interactions of multi-scale models, including viral loads (at the cellular level), disease progression (at the individual person level), disease propagation (at the workplace and family level), societal changes in migration and travel movements (at the population level), and mitigating interventions (at the abstract government policy level).”
  • The authors argue that the use of their population, mobility, and simulation models provide more accurate simulation details in comparison to high-level analytical predictions and that the D4D mobile datasets provide high-resolution information useful for modeling developing regions and hard to reach locations.

Welle Donker, F., van Loenen, B., & Bregt, A. K. “Open Data and Beyond.” ISPRS International Journal of Geo-Information, 5(4). 2016. http://bit.ly/22YtugY.

  • This research has developed a monitoring framework to assess the effects of open (private) data using a case study of the Dutch energy network administrator Liander.
  • Focusing on the potential impacts of open private energy data – beyond ‘smart disclosure’ where citizens are given information only about their own energy usage – the authors identify three attainable strategic goals:
    • Continuously optimize performance on services, security of supply, and costs;
    • Improve management of energy flows and insight into energy consumption;
    • Help customers save energy and switch over to renewable energy sources.
  • The authors propose a seven-step framework for assessing the impacts of Liander data, in particular, and open private data more generally:
    • Develop a performance framework to describe what the program is about, description of the organization’s mission and strategic goals;
    • Identify the most important elements, or key performance areas which are most critical to understanding and assessing your program’s success;
    • Select the most appropriate performance measures;
    • Determine the gaps between what information you need and what is available;
    • Develop and implement a measurement strategy to address the gaps;
    • Develop a performance report which highlights what you have accomplished and what you have learned;
    • Learn from your experiences and refine your approach as required.
  • While the authors note that the true impacts of this open private data will likely not come into view in the short term, they argue that, “Liander has successfully demonstrated that private energy companies can release open data, and has successfully championed the other Dutch network administrators to follow suit.”

World Economic Forum, 2015. “Data-driven development: pathways for progress.” Geneva: World Economic Forum. http://bit.ly/1JOBS8u

  • This report captures an overview of the existing data deficit and the value and impact of big data for sustainable development.
  • The authors of the report focus on four main priorities towards a sustainable data revolution: commercial incentives and trusted agreements with public- and private-sector actors; the development of shared policy frameworks, legal protections and impact assessments; capacity-building activities at the institutional, community, local and individual level; and lastly, recognizing individuals as both producers and consumers of data.