Linux Foundation Debuts Community Data License Agreement


Press Release: “The Linux Foundation, the nonprofit advancing professional open source management for mass collaboration, today announced the Community Data License Agreement (CDLA) family of open data agreements. In an era of expansive and often underused data, the CDLA licenses are an effort to define a licensing framework to support collaborative communities built around curating and sharing “open” data.

Inspired by the collaborative software development models of open source software, the CDLA licenses are designed to enable individuals and organizations of all types to share data as easily as they currently share open source software code. Soundly drafted licensing models can help people form communities to assemble, curate and maintain vast amounts of data, measured in petabytes and exabytes, to bring new value to communities of all types, to build new business opportunities and to power new applications that promise to enhance safety and services.

The growth of big data analytics, machine learning and artificial intelligence (AI) technologies has allowed people to extract unprecedented levels of insight from data. Now the challenge is to assemble the critical mass of data for those tools to analyze. The CDLA licenses are designed to help governments, academic institutions, businesses and other organizations open up and share data, with the goal of creating communities that curate and share data openly.

For instance, if automakers, suppliers and civil infrastructure services can share data, they may be able to improve safety, decrease energy consumption and improve predictive maintenance. Self-driving cars are heavily dependent on AI systems for navigation, and need massive volumes of data to function properly. Once on the road, they can generate nearly a gigabyte of data every second. For the average car, that means two petabytes of sensor, audio, video and other data each year.

Similarly, climate modeling can integrate measurements captured by government agencies with simulation data from other organizations and then use machine learning systems to look for patterns in the information. It’s estimated that a single model can yield a petabyte of data, a volume that challenges standard computer algorithms, but is useful for machine learning systems. This knowledge may help improve agriculture or aid in studying extreme weather patterns.

And if government agencies share aggregated data on building permits, school enrollment figures, and sewer and water usage, their citizens benefit from the ability of commercial entities to anticipate future needs and respond with infrastructure and facilities that are ready ahead of demand.

“An open data license is essential for the frictionless sharing of the data that powers both critical technologies and societal benefits,” said Jim Zemlin, Executive Director of The Linux Foundation. “The success of open source software provides a powerful example of what can be accomplished when people come together around a resource and advance it for the common good. The CDLA licenses are a key step in that direction and will encourage the continued growth of applications and infrastructure.”…(More)”.

A Brief History of Living Labs: From Scattered Initiatives to Global Movement


Paper by Seppo Leminen, Veli-Pekka Niitamo, and Mika Westerlund presented at the Open Living Labs Day Conference: “This paper analyses the emergence of living labs based on a literature review and interviews with early living labs experts. Our study contributes to the growing literature on living labs by analysing the emergence of living labs from the perspectives of (i) early living lab pioneers, (ii) early living lab activities in Europe and especially Nokia Corporation, (iii) framework programs of the European Union supporting the development of living labs, (iv) emergence of national living lab networks, and (v) emergence of the European Network of Living Labs (ENoLL). Moreover, the paper highlights major events in the emergence of the living lab movement and labels three consecutive phases of the global living lab movement as (i) toward a new paradigm, (ii) practical experiences, and (iii) professional living labs….(More)”.

Open Space: The Global Effort for Open Access to Environmental Satellite Data


Book by Mariel Borowitz: “Key to understanding and addressing climate change is continuous and precise monitoring of environmental conditions. Satellites play an important role in collecting climate data, offering comprehensive global coverage that can’t be matched by in situ observation. And yet, as Mariel Borowitz shows in this book, much satellite data is not freely available but restricted; this remains true despite the data-sharing advocacy of international organizations and a global open data movement. Borowitz examines policies governing the sharing of environmental satellite data, offering a model of data-sharing policy development and applying it in case studies from the United States, Europe, and Japan—countries responsible for nearly half of the unclassified government Earth observation satellites.

Borowitz develops a model that centers on the government agency as the primary actor while taking into account the roles of such outside actors as other government officials and non-governmental actors, as well as the economic, security, and normative attributes of the data itself. The case studies include the U.S. National Aeronautics and Space Administration (NASA), the National Oceanic and Atmospheric Administration (NOAA), and the United States Geological Survey (USGS); the European Space Agency (ESA) and the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT); and the Japan Aerospace Exploration Agency (JAXA) and the Japan Meteorological Agency (JMA). Finally, she considers the policy implications of her findings for the future and provides recommendations on how to increase global sharing of satellite data….(More)”.

The Unexamined Algorithm Is Not Worth Using


Ruben Mancha & Haslina Ali at Stanford Social Innovation Review: “In 1983, at the height of the Cold War, just one man stood between an algorithm and the outbreak of nuclear war. Stanislav Petrov, a colonel of the Soviet Air Defence Forces, was on duty in a secret command center when early-warning alarms went off indicating the launch of intercontinental ballistic missiles from an American base. The systems reported that the alarm was of the highest possible reliability. Petrov’s role was to advise his superiors on the veracity of the alarm that, in turn, would affect their decision to launch a retaliatory nuclear attack. Instead of trusting the algorithm, Petrov went with his gut and reported that the alarm was a malfunction. He turned out to be right.

This historical nugget represents an extreme example of the effect that algorithms have on our lives. The detection algorithm, it turns out, mistook the sun’s reflection for a missile launch. It is a sobering thought that a poorly designed or malfunctioning algorithm could have changed the course of history and resulted in millions of deaths….

We offer five recommendations to guide the ethical development and evaluation of algorithms used in your organization:

  1. Consider ethical outcomes first, speed and efficiency second. Organizations seeking speed and efficiency through algorithmic automation should remember that customer value comes through higher strategic speed, not higher operational speed. When implementing algorithms, organizations should never forget their ultimate goal is creating customer value, and fast yet potentially unethical algorithms defile that objective.
  2. Make ethical guiding principles salient to your organization. Your organization should reflect on the ethical principles guiding it and convey them clearly to employees, business partners, and customers. A corporate social responsibility framework is a good starting point for any organization ready to articulate its ethical principles.
  3. Employ programmers well versed in ethics. The computer engineers responsible for designing and programming algorithms should understand the ethical implications of the products of their work. While some ethical decisions may seem intuitive (such as do not use an algorithm to steal data from a user’s computer), most are not. The study of ethics and the practice of ethical inquiry should be part of every coding project.
  4. Interrogate your algorithms against your organization’s ethical standards. Through careful evaluation of your algorithms’ behavior and outcomes, your organization can identify the circumstances, real or simulated, in which they fall short of those standards (see the sketch after this list).
  5. Engage your stakeholders. Transparently share with your customers, employees, and business partners details about the processes and outcomes of your algorithms. Stakeholders can help you identify and address ethical gaps….(More)”.
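
As a hedged illustration of recommendation 4, the sketch below shows one simple kind of outcome audit: comparing an algorithm’s favorable-decision rates across demographic groups and flagging large gaps. The group labels, synthetic decisions, and the 80% threshold (the “four-fifths” rule of thumb) are assumptions chosen for this example, not guidance from the authors.

```python
# Minimal sketch of an outcome audit: compare an algorithm's favorable-decision
# rates across demographic groups and flag large disparities (illustrative only).
from collections import defaultdict

def disparate_impact_ratio(decisions):
    """decisions: iterable of (group, outcome) pairs; outcome is 1 (favorable) or 0."""
    totals, favorable = defaultdict(int), defaultdict(int)
    for group, outcome in decisions:
        totals[group] += 1
        favorable[group] += outcome
    rates = {g: favorable[g] / totals[g] for g in totals}
    # Ratio of the lowest to the highest favorable-outcome rate across groups.
    return min(rates.values()) / max(rates.values()), rates

if __name__ == "__main__":
    # Hypothetical decisions produced by an algorithm under review.
    decisions = ([("group_a", 1)] * 80 + [("group_a", 0)] * 20
                 + [("group_b", 1)] * 50 + [("group_b", 0)] * 50)
    ratio, rates = disparate_impact_ratio(decisions)
    print(rates, ratio)
    if ratio < 0.8:  # the "four-fifths" rule of thumb used in some fairness audits
        print("Potential disparate impact: review against your ethical standards.")
```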

Data for Development


New Report by the OECD: “The 2017 volume of the Development Co-operation Report focuses on Data for Development. “Big Data” and “the Internet of Things” are more than buzzwords: the data revolution is transforming the way that economies and societies are functioning across the planet. The Sustainable Development Goals along with the data revolution are opportunities that should not be missed: more and better data can help boost inclusive growth, fight inequalities and combat climate change. These data are also essential to measure and monitor progress against the Sustainable Development Goals.

The value of data in enabling development is uncontested. Yet, there continue to be worrying gaps in basic data about people and the planet and weak capacity in developing countries to produce the data that policy makers need to deliver reforms and policies that achieve real, visible and long-lasting development results. At the same time, investing in building statistical capacity – which represented about 0.30% of ODA in 2015 – is not a priority for most providers of development assistance.

There is a need for stronger political leadership, greater investment and more collective action to bridge the data divide for development. With the unfolding data revolution, developing countries and donors have a unique chance to act now to boost data production and use for the benefit of citizens. This report sets out priority actions and good practices that will help policy makers and providers of development assistance to bridge the global data divide, notably by strengthening statistical systems in developing countries to produce better data for better policies and better lives….(More)”

Fraud Data Analytics Tools and Techniques in Big Data Era


Paper by Sara Makki et al: “Fraudulent activities (e.g., suspicious credit card transactions, financial reporting fraud, and money laundering) are critical concerns to various entities including banks, insurance companies, and public service organizations. Typically, these activities lead to detrimental effects on the victims, such as financial loss. Over the years, fraud analysis techniques have undergone rigorous development. Lately, however, the advent of Big Data has vigorously advanced these techniques, as Big Data creates extensive opportunities to combat financial fraud. Given the massive amount of data that investigators need to sift through, integrating massive volumes of data from multiple heterogeneous sources (e.g., social media, blogs) to find fraudulent patterns is emerging as a feasible approach….(More)”.
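
One widely used technique in this space (an illustration, not a method taken from the paper) is unsupervised anomaly detection over transaction features. The sketch below uses scikit-learn’s IsolationForest on synthetic data; the feature choices, values, and contamination rate are assumptions made for the example.

```python
# Hedged sketch: flagging unusual transactions with an Isolation Forest.
# The features and figures here are synthetic; a real pipeline would first
# integrate many heterogeneous sources (transactions, logs, social signals).
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Columns: transaction amount, hour of day, transactions in the last 24 hours.
normal = np.column_stack([
    rng.normal(60, 20, 1000),      # typical amounts
    rng.integers(8, 22, 1000),     # daytime activity
    rng.poisson(3, 1000),          # a few transactions per day
])
suspicious = np.array([[5000, 3, 40], [3500, 2, 55]])  # large, nocturnal, bursty
X = np.vstack([normal, suspicious])

model = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = model.predict(X)  # -1 marks rows the model considers anomalous
print("Flagged row indices:", np.where(flags == -1)[0])
```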

Crowdsourced Morality Could Determine the Ethics of Artificial Intelligence


Dom Galeon in Futurism: “As artificial intelligence (AI) development progresses, experts have begun considering how best to give an AI system an ethical or moral backbone. A popular idea is to teach AI to behave ethically by learning from decisions made by the average person.

To test this assumption, researchers from MIT created the Moral Machine. Visitors to the website were asked to make choices regarding what an autonomous vehicle should do when faced with rather gruesome scenarios. For example, if a driverless car was being forced toward pedestrians, should it run over three adults to spare two children? Save a pregnant woman at the expense of an elderly man?

The Moral Machine was able to collect a huge swath of this data from random people, so Ariel Procaccia from Carnegie Mellon University’s computer science department decided to put that data to work.

In a new study published online, he and Iyad Rahwan — one of the researchers behind the Moral Machine — taught an AI using the Moral Machine’s dataset. Then, they asked the system to predict how humans would want a self-driving car to react in similar but previously untested scenarios….
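
As a rough, hypothetical sketch of how preferences might be learned from such pairwise choices (this is not the authors’ actual method, and the scenario features, data, and labels below are invented for illustration), one could fit a simple logistic model on the feature differences between the two outcomes:

```python
# Rough sketch (not the authors' method): predicting which of two outcomes
# people prefer, from pairwise choices, using a logistic model over invented
# scenario features. Everything below is hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each outcome is summarized by made-up features: (adults spared, children spared).
# A pair (a, b, label) records that respondents preferred a over b when label == 1.
pairs = [
    ((3, 0), (0, 2), 0),   # sparing two children preferred over three adults
    ((1, 0), (0, 1), 0),   # sparing one child preferred over one adult
    ((2, 0), (1, 0), 1),   # sparing more people preferred, all else equal
]
# Encode each comparison as the feature difference a - b.
X = np.array([np.subtract(a, b) for a, b, _ in pairs])
y = np.array([label for _, _, label in pairs])

model = LogisticRegression().fit(X, y)

# Predict the preferred outcome in a new, previously untested scenario.
a, b = (2, 1), (3, 0)
p_a_preferred = model.predict_proba([np.subtract(a, b)])[0, 1]
print(f"Estimated probability that people prefer outcome a: {p_a_preferred:.2f}")
```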

This idea of having to choose between two morally problematic outcomes isn’t new. Ethicists even have a name for it: the double-effect. However, having to apply the concept to an artificially intelligent system is something humankind has never had to do before, and numerous experts have shared their opinions on how best to go about it.

OpenAI co-chairman Elon Musk believes that creating an ethical AI is a matter of coming up with clear guidelines or policies to govern development, and governments and institutions are slowly heeding Musk’s call. Germany, for example, crafted the world’s first ethical guidelines for self-driving cars. Meanwhile, Google parent company Alphabet’s AI DeepMind now has an ethics and society unit.

Other experts, including a team of researchers from Duke University, think that the best way to move forward is to create a “general framework” that describes how AI will make ethical decisions….(More)”.

Using Facebook data as a real-time census


Phys.org: “Determining how many people live in Seattle, perhaps of a certain age, perhaps from a specific country, is the sort of question that finds its answer in the census, a massive data dump for places across the country.

But just how fresh is that data? After all, the census is updated once a decade, and the U.S. Census Bureau’s smaller but more detailed American Community Survey, annually. There’s also a delay between when data are collected and when they are published. (The release of data for 2016 started gradually in September 2017.)

Enter Facebook, which, with some caveats, can serve as an even more current source of demographic data, especially about migrants. That’s the conclusion of a study led by Emilio Zagheni, associate professor of sociology at the University of Washington, published Oct. 11 in Population and Development Review. The study is believed to be the first to demonstrate how present-day migration statistics can be obtained by compiling the same data that advertisers use to target their audience on Facebook, and by combining that source with information from the Census Bureau.

Migration indicates a variety of political and economic trends and is a major driver of population change, Zagheni said. As researchers further explore the increasing number of databases produced for advertisers, Zagheni argues, social scientists could leverage Facebook, LinkedIn and Twitter more often to glean information on geography, mobility, behavior and employment. And while there are some limits to the data – each platform is a self-selected, self-reporting segment of the population – the number of migrants according to Facebook could supplement the official numbers logged by the U.S. Census Bureau, Zagheni said….(Full Paper)”.
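
A hedged sketch of the general idea, not the paper’s model: calibrate Facebook advertising-audience counts of migrants against census benchmarks where both exist, then use that calibration to read fresh Facebook counts as rough, up-to-date estimates. The state names and all figures below are made up for illustration.

```python
# Hedged sketch: calibrating Facebook audience counts against census benchmarks.
# All numbers are hypothetical; the paper's actual estimation is more involved.

# Hypothetical counts of foreign-born residents by state, from the ACS (census)
# and from Facebook's advertiser audience estimates.
acs      = {"WA": 1_000_000, "OR": 420_000, "ID": 95_000}
facebook = {"WA":   780_000, "OR": 330_000, "ID": 70_000}

# Average the Facebook-to-census ratio where both sources exist: this captures
# how much Facebook under- or over-represents the census benchmark.
ratios = [facebook[s] / acs[s] for s in acs]
bias = sum(ratios) / len(ratios)

# Apply the correction to an area with a fresh Facebook figure but stale census data.
fb_new_estimate = 150_000          # hypothetical current Facebook audience count
calibrated = fb_new_estimate / bias
print(f"Bias factor: {bias:.2f}, calibrated estimate: {calibrated:,.0f}")
```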

When Cartography Meets Disaster Relief


Mimi Kirk at CityLab: “Almost three weeks after Hurricane Maria hit Puerto Rico, the island is in a grim state. Fewer than 15 percent of residents have power, and much of the island has no clean drinking water. Delivery of food and other necessities, especially to remote areas, has been hampered by a variety of ills, including a lack of cellular service, washed-out roads, additional rainfall, and what analysts and Puerto Ricans say is a slow and insufficient response from the U.S. government.

Another issue slowing recovery? Maps—or lack of them. While pre-Maria maps of Puerto Rico were fairly complete, their level of detail was nowhere near that of other parts of the United States. Platforms such as Google Maps are more comprehensive on the mainland than on the island, explains Juan Saldarriaga, a research scholar at the Center for Spatial Research at Columbia University. This is because companies like Google often create maps for financial reasons, selling them to advertisers or as navigation devices, so areas that have less economic activity are given less attention.

This lack of detail impedes recovery efforts: Without basic information on the location of buildings, for instance, rescue workers don’t know how many people were living in an area before the hurricane struck—and thus how much aid is needed.

Crowdsourced mapping can help. Saldarriaga recently organized a “mapathon” at Columbia, in which volunteers examined satellite imagery of Puerto Rico and added missing buildings, roads, bridges, and other landmarks in the open-source platform OpenStreetMap. While some universities and other groups are hosting similar events, anyone with an internet connection and computer can participate.

Saldarriaga and his co-organizers collaborated with Humanitarian OpenStreetMap Team (HOT), a nonprofit that works to create crowdsourced maps for aid and development work. Volunteers like Saldarriaga largely drive HOT’s “crisis mapping” projects, the first of which occurred in 2010 after Haiti’s earthquake…(More)”.

Open mapping from the ground up: learning from Map Kibera


Report by Erica Hagen for Making All Voices Count: “In Nairobi in 2009, 13 young residents of the informal settlement of Kibera mapped their community using OpenStreetMap, an online mapping platform. This was the start of Map Kibera, and eight years of ongoing work to date on digital mapping, citizen media and open data.

In this paper, Erica Hagen – one of the initiators of Map Kibera – reflects on the trajectory of this work. Through research interviews with Map Kibera staff, participants and clients, and users of the data and maps the project has produced, she digs into what it means for citizens to map their communities, and examines the impact of open local information on members of the community.

The paper begins by situating the research and Map Kibera in selected literature on transparency, accountability and mapping. It then presents three case studies of mapping in Kibera – in the education, security and water sectors – discussing evidence about the effects not only on project participants, but also on governmental and non-governmental actors in each of the three sectors.

It concludes that open, community-based data collection can lead to greater trust, which is sorely lacking in marginalised places. In large-scale data gathering, it is often unclear to those involved why the data is needed or what will be done with it. But the experience of Map Kibera shows that by starting from the ground up and sharing open data widely, it is possible to achieve strong sector-wide ramifications beyond the scope of the initial project, including increased resources and targeting by government and NGOs. While debates continue over the best way to truly engage citizens in the ‘data revolution’ and tracking the Sustainable Development Goals, the research here shows that engaging people fully in the information value chain can be the missing link between data as a measurement tool, and information having an impact on social development….(More)”.