Small Data for Big Impact


Liz Luckett at the Stanford Social Innovation Review: “As an investor in data-driven companies, I’ve been thinking a lot about my grandfather—a baker, a small business owner, and, I now realize, a pioneering data scientist. Without much more than pencil, paper, and extraordinarily deep knowledge of his customers in Washington Heights, Manhattan, he bought, sold, and managed inventory while also managing risk. His community was poor, but his business prospered. This was not because of what we celebrate today as the power and predictive promise of big data, but rather because of what I call small data: nuanced market insights that come through regular and trusted interactions.

Big data takes into account volumes of information from largely electronic sources—such as credit cards, pay stubs, test scores—and segments people into groups. As a result, people participating in the formalized economy benefit from big data. But people who are paid in cash and have no recognized accolades, such as higher education, are left out. Small data captures those insights to address this market failure. My grandfather, for example, had critical customer information he carefully gathered over the years: who could pay now, who needed a few days more, and which tabs to close. If he had access to a big data algorithm, it likely would have told him all his clients were unlikely to repay him, based on the fact that they were low income (vs. high income) and low education level (vs. college degree). Today, I worry that in our enthusiasm for big data and aggregated predictions, we often lose the critical insights we can gain from small data, because we don’t collect it. In the process, we are missing vital opportunities to both make money and create economic empowerment.

We won’t solve this problem of big data by returning to my grandfather’s shop floor. What we need is more and better data—a small data movement to supply vital missing links in marketplaces and supply chains the world over. What are the proxies that allow large companies to discern who among the low income are good customers in the absence of a shopkeeper? At The Social Entrepreneurs’ Fund (TSEF), we are profitably investing in a new breed of data company: enterprises that are intentionally and responsibly serving low-income communities, and generating new and unique insights about the behavior of individuals in the process. The small data they collect is becoming increasingly useful to other partners, including corporations that are willing to pay for it. It is a kind of dual market opportunity that for the first time makes it economically advantageous for these companies to reach the poor. We are betting on small data to transform opportunities and quality of life for the underserved, tap into markets that were once seen as too risky or too costly to reach, and earn significant returns for investors….(More)”.

World’s biggest city database shines light on our increasingly urbanised planet


EU Joint Research Centre (JRC): “The JRC has launched a new tool with data on all 10,000 urban centres scattered across the globe. It is the largest and most comprehensive database on cities ever published.

With data derived from the JRC’s Global Human Settlement Layer (GHSL), researchers have discovered that the world has become even more urbanised than previously thought.

Populations in urban areas doubled in Africa and grew by 1.1 billion in Asia between 1990 and 2015.

Globally, more than 400 cities have a population between 1 and 5 million. More than 40 cities have 5 to 10 million people, and there are 32 ‘megacities’ with more than 10 million inhabitants.

There are some promising signs for the environment: cities became 25% greener between 2000 and 2015. And although air pollution in urban centres had been increasing since 1990, the trend was reversed between 2000 and 2015.

With every high-density area of at least 50,000 inhabitants covered, the city centres database shows growth in population and built-up areas over the past 40 years. Environmental factors tracked include:

  • ‘Greenness’: the estimated amount of healthy vegetation in the city centre
  • Soil sealing: the covering of the soil surface with materials like concrete and stone, as a result of new buildings, roads and other public and private spaces
  • Air pollution: the level of polluting particles such as PM2.5 in the air
  • Vicinity to protected areas: the percentage of natural protected space within 30 km of the city centre’s border
  • Disaster risk: exposure of population and buildings in low-lying areas and on steep slopes

The data is free to access and open to everyone. It applies big data analytics and a global, people-based definition of cities, providing support to monitor global urbanisation and the 2030 Sustainable Development Agenda.

The information gained from the GHSL is used to produce population density and settlement maps, drawing on satellite, census and local geographic information….(More)”.
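
As a rough illustration of how the open database might be explored, the sketch below shows how the population-band breakdown cited above could be computed from a tabular export of the urban centres data. It is a minimal sketch only: the file name and column name are assumptions for illustration, not the actual GHSL schema.

```python
# Minimal sketch: counting urban centres by population band from a
# hypothetical CSV export of the JRC Urban Centre Database.
# "ghs_urban_centre_database.csv" and the column "population_2015"
# are assumed names, not the official GHSL schema.
import pandas as pd

cities = pd.read_csv("ghs_urban_centre_database.csv")

bands = pd.cut(
    cities["population_2015"],
    bins=[1e6, 5e6, 10e6, float("inf")],
    labels=["1-5 million", "5-10 million", "megacity (>10 million)"],
)
print(bands.value_counts().sort_index())
```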

Republics of Makers: From the Digital Commons to a Flat Marginal Cost Society


Mario Carpo at eFlux: “…as the costs of electronic computation have been steadily decreasing for the last forty years at least, many have recently come to the conclusion that, for most practical purposes, the cost of computation is asymptotically tending to zero. Indeed, the current notion of Big Data is based on the assumption that an almost unlimited amount of digital data will soon be available at almost no cost, and similar premises have further fueled the expectation of a forthcoming “zero marginal costs society”: a society where, except for some upfront and overhead costs (the costs of building and maintaining some facilities), many goods and services will be free for all. And indeed, against all odds, an almost zero marginal cost society is already a reality in the case of many services based on the production and delivery of electricity: from the recording, transmission, and processing of electrically encoded digital information (bits) to the production and consumption of electrical power itself. Using renewable energies (solar, wind, hydro), the generation of electrical power is free, except for the cost of building and maintaining installations and infrastructure. And given the recent progress in the micro-management of intelligent electrical grids, it is easy to imagine that in the near future the cost of servicing a network of very small, local hydro-electric generators, for example, could easily be devolved to local communities of prosumers who would take care of those installations as they tend to their living environment, on an almost voluntary, communal basis. This was already often the case during the early stages of electrification, before the rise of AC (alternating current, which, unlike DC, or direct current, could be carried over long distances): AC became the industry’s choice only after Galileo Ferraris’s and Nikola Tesla’s developments in AC technologies in the 1880s.

Likewise, at the micro-scale of the electronic production and processing of bits and bytes of information, the Open Source movement and the phenomenal surge of some crowdsourced digital media (including some so-called social media) in the first decade of the twenty-first century have already proven that a collaborative, zero cost business model can effectively compete with products priced for profit on a traditional marketplace. As the success of Wikipedia, Linux, or Firefox proves, many are happy to volunteer their time and labor for free when all can profit from the collective work of an entire community without having to pay for it. This is now technically possible precisely because the fixed costs of building, maintaining, and delivering these services are very small; hence, from the point of view of the end-user, negligible.

Yet, regardless of the fixed costs of the infrastructure, content—even user-generated content—has costs, albeit for the time being these are mostly hidden, voluntarily borne, or inadvertently absorbed by the prosumers themselves. For example, the wisdom of Wikipedia is not really a wisdom of crowds: most Wikipedia entries are de facto curated by fairly traditional scholar communities, and these communities can contribute their expertise for free only because their work has already been paid for by others—often by universities. In this sense, Wikipedia is only piggybacking on someone else’s research investments (but multiplying their outreach, which is one reason for its success). Ditto for most Open Source software, as training a software engineer, coder, or hacker takes time and money—an investment for future returns that in many countries around the world is still borne, at least in part, by public institutions….(More)”.

Big Data, Thick Mediation, and Representational Opacity


Rafael Alvarado and Paul Humphreys in the New Literary History: “In 2008, the phrase “big data” shifted in meaning. It turned from referring to a problem and an opportunity for organizations with very large data sets to being the talisman for an emerging economic and cultural order that is both celebrated and feared for its deep and pervasive effects on the human condition. Economically, the phrase now denotes a data-mediated form of commerce exemplified by Google. Culturally, the phrase stands for a new form of knowledge and knowledge production. In this essay, we explore the connection between these two implicit meanings, considered as dimensions of a real social and scientific transformation with observable properties. We develop three central concepts: the datasphere, thick mediation, and representational opacity. These concepts provide a theoretical framework for making sense of how the economic and cultural dimensions interact to produce a set of effects, problems, and opportunities, not all of which have been addressed by big data’s critics and advocates….(More)”.

Earth Observation Open Science and Innovation


Open Access book edited by Pierre-Philippe Mathieu and Christoph Aubrecht: “Over the past decades, rapid developments in digital and sensing technologies, such as the Cloud, Web and Internet of Things, have dramatically changed the way we live and work. The digital transformation is revolutionizing our ability to monitor our planet and transforming the way we access, process and exploit Earth Observation data from satellites.

This book reviews these megatrends and their implications for the Earth Observation community as well as the wider data economy. It provides insight into new paradigms of Open Science and Innovation applied to space data, which are characterized by openness, access to large volumes of complex data, wide availability of new community tools, new techniques for big data analytics such as Artificial Intelligence, unprecedented levels of computing power, and new types of collaboration among researchers, innovators, entrepreneurs and citizen scientists. In addition, this book aims to provide readers with some reflections on the future of Earth Observation, highlighting through a series of use cases not just the new opportunities created by the New Space revolution, but also the new challenges that must be addressed in order to make the most of the large volume of complex and diverse data delivered by the new generation of satellites….(More)”.

Can scientists learn to make ‘nature forecasts’ just as we forecast the weather?


At The Conversation: “We all take weather forecasts for granted, so why isn’t there a ‘nature forecast’ to answer these questions? Enter the new scientific field of ecological forecasting. Ecologists have long sought to understand the natural world, but only recently have they begun to think systematically about forecasting.

Much of the current research in ecological forecasting is focused on long-term projections. It considers questions that play out over decades to centuries, such as how species may shift their ranges in response to climate change, or whether forests will continue to take up carbon dioxide from the atmosphere.

However, in a new article that I co-authored with 18 other scientists from universities, private research institutes and the U.S. Geological Survey, we argue that focusing on near-term forecasts over spans of days, seasons and years will help us better understand, manage and conserve ecosystems. Developing this ability would be a win-win for both science and society….

Big data is driving many of the advances in ecological forecasting. Today ecologists have orders of magnitude more data than they did just a decade ago, thanks to sustained public funding for basic science and environmental monitoring. This investment has given us better sensors, satellites and organizations such as the National Ecological Observatory Network, which collects high-quality data from 81 field sites across the United States and Puerto Rico. At the same time, cultural shifts across funding agencies, research networks and journals have made that data more open and available.

Digital technologies make it possible to access this information more quickly than in the past. Field notebooks have given way to tablets and cell networks that can stream new data into supercomputers in real time. Computing advances allow us to build better models and use more sophisticated statistical methods to produce forecasts….(More)”.
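
To make the idea of a near-term “nature forecast” concrete, here is a minimal sketch under stated assumptions: a synthetic time series stands in for sensor data, and a simple lag-1 autoregressive model run as an ensemble propagates uncertainty forward, in place of the far richer models the article alludes to.

```python
# Minimal sketch of a near-term ecological forecast: fit a lag-1
# autoregressive model to a (synthetic) observation series, then run an
# ensemble forward so that forecast uncertainty widens with lead time.
import numpy as np

rng = np.random.default_rng(42)

# Synthetic weekly observations of an ecological variable (arbitrary units)
obs = 0.5 + 0.1 * np.sin(np.linspace(0, 3 * np.pi, 60)) + rng.normal(0, 0.02, 60)

# Fit AR(1): x[t] = intercept + slope * x[t-1] + noise
slope, intercept = np.polyfit(obs[:-1], obs[1:], 1)
resid_sd = np.std(obs[1:] - (intercept + slope * obs[:-1]))

# Ensemble forecast for the next 8 weeks
n_ens, horizon = 500, 8
ens = np.full(n_ens, obs[-1])
for week in range(1, horizon + 1):
    ens = intercept + slope * ens + rng.normal(0, resid_sd, n_ens)
    lo, hi = np.percentile(ens, [2.5, 97.5])
    print(f"week +{week}: mean={ens.mean():.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```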

Rights-Based and Tech-Driven: Open Data, Freedom of Information, and the Future of Government Transparency


Beth Noveck at the Yale Human Rights and Development Journal: “Open data policy mandates that government proactively publish its data online for the public to reuse. It is a radically different approach to transparency than traditional right-to-know strategies as embodied in Freedom of Information Act (FOIA) legislation in that it involves ex ante rather than ex post disclosure of whole datasets. Although both open data and FOIA deal with information sharing, the normative essence of open data is participation rather than litigation. By fostering public engagement, open data shifts the relationship between state and citizen from a monitorial to a collaborative one, centered around using information to solve problems together. This Essay explores the theory and practice of open data in comparison to FOIA and highlights its uses as a tool for advancing human rights, saving lives, and strengthening democracy. Although open data undoubtedly builds upon the fifty-year legal tradition of the right to know about the workings of one’s government, open data does more than advance government accountability. Rather, it is a distinctly twenty-first century governing practice borne out of the potential of big data to help solve society’s biggest problems. Thus, this Essay charts a thoughtful path toward a twenty-first century transparency regime that takes advantage of and blends the strengths of open data’s collaborative and innovation-centric approach and the adversarial and monitorial tactics of freedom of information regimes….(More)”.

Algorithms show potential in measuring diagnostic errors using big data


Greg Slabodkin at Information Management: “While the problem of diagnostic errors is widespread in medicine, with an estimated 12 million Americans affected annually, a new approach to quantifying and monitoring these errors has the potential to prevent serious patient injuries, including disability or death.

“The single biggest impediment to making progress is the lack of operational measures of diagnostic errors,” says David Newman-Toker, MD, director of the Johns Hopkins Armstrong Institute Center for Diagnostic Excellence. “It’s very difficult to measure because we haven’t had the tools to look for it in a systematic way. And most of the methods that look for diagnostic errors involve training people to do labor-intensive chart reviews.”

However, a new method—called the Symptom-Disease Pair Analysis of Diagnostic Error (SPADE)—uncovers misdiagnosis-related harms using specific algorithms and big data. The automated approach could replace labor-intensive reviews of medical records by hospital staff, which researchers contend are limited by poor clinical documentation, low reliability and inherent bias.

According to Newman-Toker, SPADE utilizes statistical analyses to identify critical patterns that measure the rate of diagnostic error by analyzing large, existing clinical and claims datasets containing hundreds of thousands of patient visits. Specifically, algorithms are leveraged to look for common symptoms prompting a physician visit and then pair them with one or more diseases that could be misdiagnosed in those clinical contexts….(More)”.
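
The excerpt stops short of showing what a symptom-disease pair analysis looks like in practice, so here is a minimal sketch of the underlying idea rather than the published SPADE method: on a hypothetical claims table, count how often an initial visit for a symptom (dizziness is used purely as an example) is followed shortly afterwards by an admission for a disease it can herald (stroke). The table, column names and 30-day window are all assumptions for illustration.

```python
# Minimal sketch of a symptom-disease pair count on a hypothetical claims
# table. This illustrates the general idea behind SPADE-style analysis, not
# the published method; all names, codes and the 30-day window are assumed.
import pandas as pd

claims = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 3, 4],
    "visit_date": pd.to_datetime([
        "2024-01-02", "2024-01-20", "2024-02-01",
        "2024-03-05", "2024-06-10", "2024-04-01",
    ]),
    "diagnosis": ["dizziness", "stroke", "dizziness",
                  "dizziness", "stroke", "migraine"],
})

symptom_visits = claims[claims["diagnosis"] == "dizziness"]
disease_visits = claims[claims["diagnosis"] == "stroke"]

# Pair each symptom visit with any later disease admission for the same patient
pairs = symptom_visits.merge(disease_visits, on="patient_id", suffixes=("_sym", "_dis"))
in_window = (pairs["visit_date_dis"] - pairs["visit_date_sym"]).dt.days.between(1, 30)

n_symptom = symptom_visits["patient_id"].nunique()
n_flagged = pairs.loc[in_window, "patient_id"].nunique()
print(f"{n_flagged} of {n_symptom} dizziness patients had a stroke admission within 30 days")
```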

After Big Data: The Coming Age of “Big Indicators”


Andrew Zolli at the Stanford Social Innovation Review: “Consider, for a moment, some of the most pernicious challenges facing humanity today: the increasing prevalence of natural disasters; the systemic overfishing of the world’s oceans; the clear-cutting of primeval forests; the maddening persistence of poverty; and above all, the accelerating effects of global climate change.

Each item in this dark litany inflicts suffering on the world in its own, awful way. Yet as a group, they share some common characteristics. Each problem is messy, with lots of moving parts. Each is riddled with perverse incentives, which can lead local actors to behave in a way that is not in the common interest. Each is opaque, with dynamics that are only partially understood, even by experts; each can, as a result, often be made worse by seemingly rational and well-intentioned interventions. When things do go wrong, each has consequences that diverge dramatically from our day-to-day experiences, making their full effects hard to imagine, predict, and rehearse. And each is global in scale, raising questions about who has the legal obligation to act—and creating incentives for leaders to disavow responsibility (and sometimes even question the legitimacy of the problem itself).

With dynamics like these, it’s little wonder systems theorists label these kinds of problems “wicked” or even “super wicked.” It’s even less surprising that these challenges remain, by and large, externalities to the global system—inadequately measured, perennially underinvested in, and poorly accounted for—until their consequences spill disastrously and expensively into view.

For real progress to occur, we’ve got to move these externalities into the global system, so that we can fully assess their costs, and so that we can sufficiently incentivize and reward stakeholders for addressing them and penalize them if they don’t. And that’s going to require a revolution in measurement, reporting, and financial instrumentation—the mechanisms by which we connect global problems with the resources required to address them at scale.

Thankfully, just such a revolution is under way.

It’s a complex story with several moving parts, but it begins with important new developments in three critical areas of technology: remote sensing and big data, artificial intelligence, and cloud computing.

Remote sensing and big data allow us to collect unprecedented streams of observations about our planet and our impacts upon it, and dramatic advances in AI enable us to extract the deeper meaning and patterns contained in those vast data streams. The rise of the cloud empowers anyone with an Internet connection to access and interact with these insights, at a fraction of the traditional cost.

In the years to come, these technologies will shift much of the current conversation focused on big data to one focused on “big indicators”—highly detailed, continuously produced, global indicators that track change in the health of the Earth’s most important systems, in real time. Big indicators will form an important mechanism for guiding human action, allow us to track the impact of our collective actions and interventions as never before, enable better and more timely decisions, transform reporting, and empower new kinds of policy and financing instruments. In short, they will reshape how we tackle a number of global problems, and everyone—especially nonprofits, NGOs, and actors within the social and environmental sectors—will play a role in shaping and using them….(More)”.
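
To ground the notion of a “big indicator”, here is a minimal sketch under stated assumptions: city-scale greenness is approximated with NDVI, a standard vegetation index computed as (NIR - Red) / (NIR + Red), and synthetic band values stand in for real satellite imagery. The aggregation and trend step shows how a raw data stream could be distilled into one continuously updated indicator value.

```python
# Minimal sketch of a "big indicator": a city-scale greenness index derived
# from (synthetic) satellite bands via NDVI = (NIR - Red) / (NIR + Red),
# aggregated per scene and summarized with a simple trend estimate.
import numpy as np

rng = np.random.default_rng(0)
n_scenes, height, width = 24, 100, 100  # two years of monthly scenes

greenness = []
for month in range(n_scenes):
    # Placeholder reflectance values in lieu of real imagery
    red = rng.uniform(0.05, 0.25, size=(height, width))
    nir = rng.uniform(0.20, 0.60, size=(height, width)) + 0.002 * month
    ndvi = (nir - red) / (nir + red)
    greenness.append(ndvi.mean())  # one indicator value per scene

# Distill the stream into reportable figures: a mean level and a trend
slope, intercept = np.polyfit(np.arange(n_scenes), greenness, 1)
print(f"mean greenness (NDVI): {np.mean(greenness):.3f}")
print(f"estimated trend: {slope:+.4f} NDVI units per month")
```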

Urban Big Data: City Management and Real Estate Markets


Report by Richard Barkham, Sheharyar Bokhari and Albert Saiz: “In this report, we discuss recent trends in the application of urban big data and their impact on real estate markets. We expect such technologies to improve quality of life and the productivity of cities over the long run.

We forecast that smart city technologies will reinforce the primacy of the most successful global metropolises at least for a decade or more. A few select metropolises in emerging countries may also leverage these technologies to leapfrog on the provision of local public services.

In the long run, all cities throughout the urban system will end up adopting successful and cost-effective smart city initiatives. Nevertheless, smaller-scale interventions are likely to crop up everywhere, even in the short run. Such targeted programs are more likely to improve conditions in blighted or relatively deprived neighborhoods, which could generate gentrification and higher valuations there. It is unclear whether urban information systems will have a centralizing or suburbanizing impact. They are likely to make denser urban centers more attractive, but they are also bound to make suburban or exurban locations more accessible…(More)”.