OpenAI won’t benefit humanity without data-sharing


 at the Guardian: “There is a common misconception about what drives the digital-intelligence revolution. People seem to have the idea that artificial intelligence researchers are directly programming an intelligence; telling it what to do and how to react. There is also the belief that when we interact with this intelligence we are processed by an “algorithm” – one that is subject to the whims of the designer and encodes his or her prejudices.

OpenAI, a new non-profit artificial intelligence company that was founded on Friday, wants to develop digital intelligence that will benefit humanity. By sharing its sentient algorithms with all, the venture, backed by a host of Silicon Valley billionaires, including Elon Musk and Peter Thiel, wants to avoid the existential risks associated with the technology.

OpenAI’s launch announcement was timed to coincide with this year’s Neural Information Processing Systems conference: the main academic outlet for scientific advances in machine learning, which I chaired. Machine learning is the technology that underpins the new generation of AI breakthroughs.

One of OpenAI’s main ideas is to collaborate openly, publishing code and papers. This is admirable and the wider community is already excited by what the company could achieve.

OpenAI is not the first company to target digital intelligence, and certainly not the first to publish code and papers. Both Facebook and Google have already shared code. They were also present at the same conference. All three companies hosted parties with open bars, aiming to entice the latest and brightest minds.

However, the way machine learning works means that making algorithms available isn’t necessarily as useful as one might think. A machine-learning algorithm is subtly different from popular perception.

Just as in baking we don’t have control over how the cake will emerge from the oven, in machine learning we don’t control every decision that the computer will make. In machine learning the quality of the ingredients, the quality of the data provided, has a massive impact on the intelligence that is produced.

For intelligent decision-making the recipe needs to be carefully applied to the data: this is the process we refer to as learning. The result is the combination of our data and the recipe. We need both to make predictions.

By sharing their algorithms, Facebook and Google are merely sharing the recipe. Someone has to provide the eggs and flour and provide the baking facilities (which in Google and Facebook’s case are vast data-computation facilities, often located near hydroelectric power stations for cheaper electricity).

So even before they start, an open question for OpenAI is how will it ensure it has access to the data on the necessary scale to make progress?…(More)”
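
The recipe analogy in the excerpt above can be made concrete with a minimal sketch. It assumes the simplest possible "recipe" (ordinary least squares for a straight line) and two made-up datasets; the function names and numbers are illustrative and are not taken from the article. The same code applied to different data produces different models, which is why sharing the algorithm alone does not share the resulting intelligence.

```python
def fit_line(points):
    """The 'recipe': ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a learned model (the baked 'cake') to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical 'ingredients': two small made-up datasets.
data_a = [(1, 2), (2, 4), (3, 6)]  # roughly y = 2x
data_b = [(1, 5), (2, 4), (3, 3)]  # roughly y = 6 - x

# Same recipe, different data, different models.
model_a = fit_line(data_a)
model_b = fit_line(data_b)

print(predict(model_a, 4), predict(model_b, 4))
```

Anyone can run `fit_line`, but only the holder of `data_a` can reproduce `model_a`.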

The Moral Failure of Computer Scientists


Kaveh Waddell at the Atlantic: “Computer scientists and cryptographers occupy some of the ivory tower’s highest floors. Among academics, their work is prestigious and celebrated. To the average observer, much of it is too technical to comprehend. The field’s problems can sometimes seem remote from reality.

But computer science has quite a bit to do with reality. Its practitioners devise the surveillance systems that watch over nearly every space, public or otherwise—and they design the tools that allow for privacy in the digital realm. Computer science is political, by its very nature.

That’s at least according to Phillip Rogaway, a professor of computer science at the University of California, Davis, who has helped create some of the most important tools that secure the Internet today. Last week, Rogaway took his case directly to a roomful of cryptographers at a conference in Auckland, New Zealand. He accused them of a moral failure: By allowing the government to construct a massive surveillance apparatus, the field had abused the public trust. Rogaway said the scientists had a duty to pursue social good in their work.
He likened the danger posed by modern governments’ growing surveillance capabilities to the threat of nuclear warfare in the 1950s, and called upon scientists to step up and speak out today, as they did then.

I spoke to Rogaway about why cryptographers fail to see their work in moral terms, and the emerging link between encryption and terrorism in the national conversation. A transcript of our conversation appears below, lightly edited for concision and clarity….(More)”

Stretching science: why emotional intelligence is key to tackling climate change


Faith Kearns at the Conversation: “…some environmental challenges are increasingly taking on characteristics of intractable conflicts, which may remain unresolved despite good faith efforts.

In the case of climate change, conflicts ranging from debates over how to lower emissions to denialism are obvious and ongoing – the science community has often approached them as something to be defeated or ignored.

While some people love it and others hate it, conflict is often an indicator that something important is happening; we generally don’t fight about things we don’t care about.

Working with conflict is a challenging proposition, in part because while it manifests in interactions with others, much of the real effort comes in dealing with our own internal conflicts.

However, beginning to accept and even value conflict as a necessary part of large-scale societal transformation has the potential to generate new approaches to climate change engagement. For example, understanding that in some cases denial by another person is protective may lead to new approaches to engagement.

As we connect more deeply with conflict, we may come to see it not as a flame to be fanned or put out, but as a resource.

A relational approach to climate change

Indeed, because of the emotion and conflict involved, the concept of a relational approach is one that offers a great deal of promise in the climate change arena. It is, however, vastly underexplored.

Relationship-centered approaches have been taken up in law, medicine, and psychology.

A common thread among these fields is a shift from expert-driven to more collaborative modes of working together. Navigating the personal and emotional elements of this kind of work asks quite a bit more of practitioners than subject-matter expertise.

In medicine, for example, relationship-centered care is a framework examining how relationships – between patients and clinicians, among clinicians, and even with broader communities – impact health care. It recognizes that care may go well beyond technical competency.

This kind of framework can demonstrate how a relational approach is different from more colloquial understandings of relationships; it can be a way to intentionally and transparently attend to conflict and power dynamics as they arise.

Although this is a simplified view of relational work, many would argue that an emphasis on emergent and transformative properties of relationships has been revolutionary. And one of the key challenges, and opportunities, of a relationship-centered approach to climate work is that we truly have no idea what the outcomes will be.

We have long tried to motivate action around climate change by decreasing scientific uncertainty, so introducing social uncertainty feels risky. At the same time it can be a relief because, in working together, nobody has to have the answer.

Learning to be comfortable with discomfort

A relational approach to climate change may sound basic to some, and complicated to others. In either case, it can be useful to know there is evidence that skillful relational capacity can be taught and learned.

The medical and legal communities have been developing relationship-centered training for years.

It is clear that relational skills and capacities like conflict resolution, empathy, and compassion can be enhanced through practices including active listening and self-reflection. Although it may seem an odd fit, climate change invites an ability to work together in new ways that include acknowledging and working with the strong emotions involved.

With a relationship-centered approach, climate change issues become less about particular solutions, and more about transforming how we work together. It is both risky and revolutionary in that it asks us to take a giant leap into trusting not just scientific information, but each other….(More)”

China’s Biggest Polluters Face Wrath of Data-Wielding Citizens


Bloomberg News: “Besides facing hefty fines, criminal punishments and the possibility of closing, the worst emitters in China risk additional public anger as new smartphone applications and lower-cost monitoring devices widen access to data on pollution sources.

The Blue Map app, developed by the Institute of Public & Environmental Affairs with support from the SEE Foundation and the Alibaba Foundation, provides pollution data from more than 3,000 large coal-power, steel, cement and petrochemical production plants. Origins Technology Ltd. in July began sale of the Laser Egg, a palm-sized air quality monitor used to track indoor and outdoor air quality by measuring fine particulate matter in the air.

“Letting people know the sources of regional pollution will help the push for control over emissions of every chimney,” said Ma Jun, the founder and director of the Beijing-based IPE.

The phone map and Laser Egg are the latest levers in prying control over information on air quality from the hands of the few to the many, and they’re beginning to weigh on how officials respond to the issue. Numerous smartphone applications, including those developed by SINA Corp. and Moji Fengyun (Beijing) Software Technology Development Co., now provide people in China with real-time access to air quality readings, essentially democratizing what was once an information pipeline available only to the government.

“China’s continuing struggle to control and reduce air pollution exemplifies the government’s fear that lifestyle issues will mutate into demands for political change,” said Mary Gallagher, an associate professor of political science at the University of Michigan.

Even the government is getting in on the act. The Ministry of Environmental Protection rolled out a smartphone application called “Nationwide Air Quality” with the help of Wuhan Juzheng Environmental Science & Technology Co. at the end of 2013.

“As citizens know more about air pollution, more pressure will be put on the government,” said Xu Qinxiang, a technology manager at Wuhan Juzheng. “This will urge the government to control pollutant sources and upgrade heavy industries.”

Sources of air quality data come from the China National Environment Monitoring Center, local environmental protection bureaus and non-Chinese sources such as the U.S. Embassy’s website in Beijing, Xu said.

Air quality is a controversial subject in China. Since 2012, the public has pushed the government to move more quickly than planned to begin releasing data measuring pollution levels — especially of PM2.5, the particulates most harmful to human health.

The reading was 267 micrograms per cubic meter at 10 a.m. Monday near Tiananmen Square, according to the Beijing Municipal Environmental Monitoring Center. The World Health Organization cautions against 24-hour exposure to concentrations higher than 25.

The availability of data appears to be filling a need, especially with the arrival of colder temperatures and the associated smog that blanketed Beijing and northern China recently….

“With more disclosure of the data, everyone becomes more sensitive, hoping the government can do something,” Li Yajuan, a 27-year-old office secretary, said in an interview in Beijing’s Fuchengmen area. “It’s our own living environment after all.”

Efforts to make products linked to air data continue. IBM has been developing artificial intelligence to help fight Beijing’s toxic air pollution, and plans to work with other municipalities in China and India on similar projects to manage air quality….(More)”

Engaging Citizens: A Review of Eight Approaches to Civic Engagement


 at User Experience: “Twenty years ago, Robert Putnam wrote about the rise of “bowling alone,” a metaphor for people participating in activities as individuals instead of groups that can lead to community. This led to the decline in social capital in America. The problem of individual participation as opposed to community building has become an even bigger problem since the invention of smartphones, the Internet as the source of all information, social networking, and asynchronous entertainment. We never need to talk to anyone anymore and it often feels like an imposition when we ask for an answer we know we could find online.

Putnam posited that the decline in social capital is a cause for decline in civic engagement and participation in democracy. If we aren’t engaged socially with the people around us, we don’t have as much incentive to care about what is going on that might affect them. Local elections have low voter turnout in part because people aren’t aware of or engaged in local issues.

In an attempt to chip away at this problem, platforms that attempt to encourage people to engage in civic life with government and local communities have been popping up. But how well do they actually engage people? These platforms are often criticized for producing “slacktivists” who are applying the minimum amount of effort possible and not really effecting change. Several of these platforms were evaluated to see how they work and to determine how well they actually promote civic engagement.

Measuring Civic Engagement

Code for America is an organization that works to increase engagement with local governments by putting together “brigades” of local volunteers to solve local problems using technology. They have developed an Engagement Standard that attempts to measure how well a government enables citizens to engage in civic life.

Elements of Code for America’s Engagement Standard include:

  • Reach: Defining the constituency you are trying to reach, with an emphasis on identifying those whose voices aren’t already represented.
  • Channels: Making use of a diversity of spaces, both online and off, that meet people where they are.
  • Information: Providing relevant information that is easy to find and understand, and speaking with an authentic voice.
  • Productive Actions: Identifying clear, concrete, and meaningful actions residents can take to reach desired outcomes.
  • Feedback Loops: Making sure the public understands the productive impact of their participation and that their actions have value.

These elements form a funnel, shown in Figure 1, which starts with reaching the right audience and ends with providing feedback to that audience on the effects of their actions. Platforms that have low engagement tend to get stuck at the top of the funnel and platforms that foster more engagement meet all the standards in the funnel.

Figure 1. Funnel of engagement showing reach, channels, information, productive actions, and feedback loops.

Eight Approaches to Civic Engagement

Each of the platforms described below attempts to engage citizens in civic actions. However, each platform has a different approach….(More)”

The Quest for Good Governance


New book by Alina Mungiu-Pippidi: “Why do some societies manage to control corruption so that it manifests itself only occasionally, while other societies remain systemically corrupt? This book is about how societies reach that point when integrity becomes the norm and corruption the exception in regard to how public affairs are run and public resources are allocated. It primarily asks what lessons we have learned from historical and contemporary experiences in developing corruption control, which can aid policy-makers and civil societies in steering and expediting this process. Few states now remain without either an anticorruption agency or an Ombudsman, yet no statistical evidence can be found that they actually induce progress. Using both historical and contemporary studies and easy-to-understand statistics, Alina Mungiu-Pippidi looks at how to diagnose, measure and change governance so that those entrusted with power and authority manage to defend public resources….(More)”

Opening up government data for public benefit


Keiran Hardy at the Mandarin (Australia): “…This post explains the open data movement and considers the benefits and risks of releasing government data as open data. It then outlines the steps taken by the Labor and Liberal governments in accordance with this trend. It argues that the Prime Minister’s task, while admirably intentioned, is likely to prove difficult due to ongoing challenges surrounding the requirements of privacy law and a public service culture that remains reluctant to release government data into the public domain….

A key purpose of releasing government data is to improve the effectiveness and efficiency of services delivered by the government. For example, data on crops, weather and geography might be analysed to improve current approaches to farming and industry, or data on hospital admissions might be analysed alongside demographic and census data to improve the efficiency of health services in areas of need. It has been estimated that such innovation based on open data could benefit the Australian economy by up to $16 billion per year.

Another core benefit is that the open data movement is making gains in transparency and accountability, as a greater proportion of government decisions and operations are being shared with the public. These democratic values are made clear in the OGP’s Open Government Declaration, which aims to make governments ‘more open, accountable, and responsive to citizens’.

Open data can also improve democratic participation by allowing citizens to contribute to policy innovation. Events like GovHack, an annual Australian competition in which government, industry and the general public collaborate to find new uses for open government data, epitomise a growing trend towards service delivery informed by user input. The winner of the “Best Policy Insights Hack” at GovHack 2015 developed a software program for analysing which suburbs are best placed for rooftop solar investment.

At the same time, the release of government data poses significant risks to the privacy of Australian citizens. Much of the open data currently available is spatial (geographic or satellite) data, which is relatively unproblematic to post online as it poses minimal privacy risks. However, for the full benefits of open data to be gained, these kinds of data need to be supplemented with information on welfare payments, hospital admission rates and other potentially sensitive areas which could drive policy innovation.

Policy data in these areas would be de-identified — that is, all names, addresses and other obvious identifying information would be removed so that only aggregate or statistical data remains. However, debates continue as to the reliability of de-identification techniques, as there have been prominent examples of individuals being re-identified by cross-referencing datasets….

With regard to open data, a culture resistant to releasing government information appears to be driven by several similar factors, including:

  • A generational preference amongst public service management for maintaining secrecy of information, whereas younger generations expect that data should be made freely available;
  • Concerns about the quality or accuracy of information being released;
  • Fear that mistakes or misconduct on behalf of government employees might be exposed;
  • Limited understanding of the benefits that can be gained from open data; and
  • A lack of leadership to help drive the open data movement.

If open data policies have a similar effect on public service culture as FOI legislation, it may be that open data policies in fact hinder transparency by having a chilling effect on government decision-making for fear of what might be exposed….

These legal and cultural hurdles will pose ongoing challenges for the Turnbull government in seeking to release greater amounts of government data as open data….(More)”
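
The re-identification risk raised in the excerpt above (individuals being re-identified by cross-referencing datasets) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the field names, the choice of quasi-identifiers, and both tiny datasets are invented and do not come from the article or from any real release. The sketch joins a "de-identified" table to a public one on shared attributes and reports records that match exactly one named person.

```python
from collections import defaultdict

# Assumption: attributes left in the released data that also appear elsewhere.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def reidentify(deidentified, public):
    """Return (name, diagnosis) pairs for records matching exactly one person."""
    index = defaultdict(list)
    for person in public:
        key = tuple(person[field] for field in QUASI_IDENTIFIERS)
        index[key].append(person["name"])

    matches = []
    for record in deidentified:
        key = tuple(record[field] for field in QUASI_IDENTIFIERS)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # a unique match defeats the de-identification
            matches.append((candidates[0], record["diagnosis"]))
    return matches


# Hypothetical released data: names removed, quasi-identifiers kept.
hospital = [
    {"postcode": "2000", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "2000", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical public register containing names.
register = [
    {"name": "Person A", "postcode": "2000", "birth_year": 1980, "sex": "F"},
    {"name": "Person B", "postcode": "2000", "birth_year": 1975, "sex": "M"},
    {"name": "Person C", "postcode": "2000", "birth_year": 1975, "sex": "M"},
]

print(reidentify(hospital, register))
```

In this made-up data the first record is unique on the three attributes and is re-identified, while the second is shared by two people and stays ambiguous. That difference is the intuition behind checking how many people share each combination of attributes before release.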

Big Data Before the Web


Evan Hepler-Smith in the Wall Street Journal: “Sometime in the early 1950s, on a reservation in Wisconsin, a Menominee Indian man looked at an ink blot. An anthropologist recorded the man’s reaction according to a standard Rorschach-test protocol. The researcher submitted a copy of these notes to an enormous cache of records collected over the course of decades by American social scientists working among various “societies ‘other than our own.’ ” This entire collection of social-scientific data was photographed and printed in arrays of microscopic images on 3-by-5-inch cards. Sets of these cards were shipped to research libraries around the world. They gathered dust.

In the results of this Rorschach test, the anthropologist saw evidence of a culture eroded by modernity. Sixty years later, these documents also testify to the aspirations and fate of the social-scientific project for which they were generated. Deep within this forgotten Ozymandian card file sits the Menominee man’s reaction to Rorschach card VI: “It is like a dead planet. It seems to tell the story of a people once great who have lost . . . like something happened. All that’s left is the symbol.”

In “Database of Dreams: The Lost Quest to Catalog Humanity,” Rebecca Lemov delves into the ambitious efforts of mid-20th-century social scientists to build a “capacious and reliable science of the varieties of the human being” by generating an archive of human experience through interviews and tests and by storing the information on the high-tech media of the day.

For these psychologists and anthropologists, the key to a universal human science lay in studying members of cultures in transition between traditional and modern ways of life and in rendering their individuality as data. Interweaving stories of social scientists, Native American research subjects and information technologies, Ms. Lemov presents a compelling account of “what ‘humanness’ came to mean in an age of rapid change in technological and social conditions.” Ms. Lemov, an associate professor of the history of science at Harvard University, follows two contrasting threads through a story that she calls “a parable for our time.” She shows, first, how collecting data about human experience shapes human experience and, second, how a high-tech data repository of the 1950s became, as she puts it, a “data ruin.”…(More)” – See also: Database of Dreams: The Lost Quest to Catalog Humanity

OpenFDA: an innovative platform providing access to a wealth of FDA’s publicly available data


Paper by Taha A Kass-Hout et al in JAMIA: “The objective of openFDA is to facilitate access and use of big important Food and Drug Administration public datasets by developers, researchers, and the public through harmonization of data across disparate FDA datasets provided via application programming interfaces (APIs).

Materials and Methods: Using cutting-edge technologies deployed on FDA’s new public cloud computing infrastructure, openFDA provides open data for easier, faster (over 300 requests per second per process), and better access to FDA datasets; open source code and documentation shared on GitHub for open community contributions of examples, apps and ideas; and infrastructure that can be adopted for other public health big data challenges.

Results: Since its launch on June 2, 2014, openFDA has developed four APIs for drug and device adverse events, recall information for all FDA-regulated products, and drug labeling. There have been more than 20 million API calls (more than half from outside the United States), 6000 registered users, 20,000 connected Internet Protocol addresses, and dozens of new software (mobile or web) apps developed. A case study demonstrates a use of openFDA data to understand an apparent association of a drug with an adverse event.

Conclusion: With easier and faster access to these datasets, consumers worldwide can learn more about FDA-regulated products…(More)”
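
As a rough illustration of what access through the APIs described above can look like, here is a minimal sketch that builds a query against the drug adverse event endpoint. The endpoint URL, the `search`, `count` and `limit` parameters, and the two field names are taken from openFDA's public documentation as best understood at the time of writing and should be treated as assumptions to verify there; the helper names `build_query` and `fetch` are hypothetical. The sketch only constructs the URL, and `fetch` is defined but never called, so nothing is sent over the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the openFDA drug adverse event endpoint.
BASE_URL = "https://api.fda.gov/drug/event.json"


def build_query(drug_name, count_field=None, limit=None):
    """Build a query URL for adverse event reports mentioning a drug."""
    # Assumption: field name for the reported product in each report.
    params = {"search": f'patient.drug.medicinalproduct:"{drug_name}"'}
    if count_field is not None:
        params["count"] = count_field  # tally reports by this field
    if limit is not None:
        params["limit"] = str(limit)   # cap the number of results returned
    return f"{BASE_URL}?{urlencode(params)}"


def fetch(url):
    """Send the request and decode the JSON body (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


# Example: count reported reactions for a drug (URL only, no request sent).
url = build_query("aspirin", count_field="patient.reaction.reactionmeddrapt.exact")
print(url)
```

Calling `fetch(url)` would return the decoded JSON response. Counts like these describe reports, so on their own they suggest an apparent association and not a causal one.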

Data Science ethics


Gov.uk blog: “If Tesco knows day-to-day how poorly the nation is, how can Government access similar insights so it can better plan health services? If Airbnb can give you a tailored service depending on your tastes, how can Government provide people with the right support to help them back into work in a way that is right for them? If companies are routinely using social media data to get feedback from their customers to improve their services, how can Government also use publicly available data to do the same?

Data science allows us to use new types of data and powerful tools to analyse this more quickly and more objectively than any human could. It can put us in the vanguard of policymaking – revealing new insights that lead to better and more tailored interventions. And it can help reduce costs, freeing up resource to spend on more serious cases.

But some of these data uses and machine-learning techniques are new and still relatively untested in Government. Of course, we operate within legal frameworks such as the Data Protection Act and Intellectual Property law. These are flexible but don’t always talk explicitly about the new challenges data science throws up. For example, how are you to explain the decision making process of a deep learning black box algorithm? And if you were able to, how would you do so in plain English and not a row of 0s and 1s?

We want data scientists to feel confident to innovate with data, alongside the policy makers and operational staff who make daily decisions on the data that the analysts provide. That’s why we are creating an ethical framework which brings together the relevant parts of the law and ethical considerations into a simple document that helps Government officials decide what it can do and what it should do. We have a moral responsibility to maximise the use of data – which is never more apparent than after incidents of abuse or crime are left undetected – as well as to pay heed to the potential risks of these new tools. The guidelines are draft and not formal government policy, but we want to share them more widely in order to help iterate and improve them further….

So what’s in the framework? There is more detail in the fuller document, but it is based around six key principles:

  1. Start with a clear user need and public benefit: this will help you justify the level of data sensitivity and method you use
  2. Use the minimum level of data necessary to fulfill the public benefit: there are many techniques for doing so, such as de-identification, aggregation or querying against data
  3. Build robust data science models: the model is only as good as the data it contains and while machines are less biased than humans they can get it wrong. It’s critical to be clear about the confidence of the model and think through unintended consequences and biases contained within the data
  4. Be alert to public perceptions: put simply, what would a normal person on the street think about the project?
  5. Be as open and accountable as possible: Transparency is the antiseptic for unethical behavior. Aim to be as open as possible (with explanations in plain English), although in certain public protection cases the ability to be transparent will be constrained.
  6. Keep data safe and secure: this is not restricted to data science projects but we know that the public are most concerned about losing control of their data….(More)”
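
The second principle above names aggregation as one way to use the minimum level of data. A minimal sketch of that idea follows, under stated assumptions: the function name, the `region` field, the sample records and the threshold of five are all invented for illustration and are not part of the framework. It publishes only group counts, and withholds any count small enough to point at individuals.

```python
from collections import Counter


def aggregate_with_suppression(records, field, min_count=5):
    """Count records per value of `field`; hide counts below `min_count`."""
    counts = Counter(record[field] for record in records)
    # None marks a suppressed cell: the group exists but its size is withheld.
    return {
        value: (n if n >= min_count else None)
        for value, n in counts.items()
    }


# Hypothetical record-level data that would not itself be published.
records = [{"region": "North"}] * 6 + [{"region": "South"}] * 2

print(aggregate_with_suppression(records, "region"))
```

With this made-up data the North count of 6 is released and the South count of 2 is suppressed. The right threshold is a policy choice and the value here is only a placeholder.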