How Big Data is Helping to Tackle Climate Change


Bernard Marr at DataInformed: “Climate scientists have been gathering a great deal of data for a long time, but the analytics technology needed to make full use of it has caught up only recently. Now that cloud, distributed storage, and massive amounts of processing power are affordable for almost everyone, those data sets are being put to use. On top of that, the growing number of Internet of Things devices we carry around is adding to the amount of data we collect. And the rise of social media means more and more people are reporting environmental data and uploading photos and videos of their environment, which also can be analyzed for clues.

Perhaps one of the most ambitious projects that employ big data to study the environment is Microsoft’s Madingley, which is being developed with the intention of creating a simulation of all life on Earth. The project already provides a working simulation of the global carbon cycle, and it is hoped that, eventually, everything from deforestation to animal migration, pollution, and overfishing will be modeled in a real-time “virtual biosphere.” Just a few years ago, the idea of a simulation of the entire planet’s ecosphere would have seemed like ridiculous, pie-in-the-sky thinking. But today it’s something into which one of the world’s biggest companies is pouring serious money. Microsoft is doing this because it believes that analytical technology has finally caught up with the ability to collect and store data.

Another data giant that is developing tools to facilitate analysis of climate and ecological data is EMC. Working with scientists at Acadia National Park in Maine, the company has developed platforms to pull in crowd-sourced data from citizen science portals such as eBird and iNaturalist. This allows park administrators to monitor the impact of climate change on wildlife populations as well as to plan and implement conservation strategies.

Last year, the United Nations, under its Global Pulse data analytics initiative, launched the Big Data Climate Challenge, a competition aimed at promoting innovative, data-driven climate change projects. Among the first to receive recognition under the program is Global Forest Watch, which combines satellite imagery, crowd-sourced witness accounts, and public datasets to track deforestation around the world, a practice believed to be a leading man-made cause of climate change. The project has been promoted as a way for ethical businesses to ensure that their supply chain is not complicit in deforestation.

Other initiatives operate at a more personal level, for example by analyzing the transit routes available for an individual journey in Google Maps and recommending routes based on the carbon emissions of each option.
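To make that route-comparison idea concrete, here is a minimal sketch, assuming made-up per-mode emission factors and a pre-fetched list of candidate routes rather than the real Google Maps API; all names and numbers are illustrative only.

```python
# Hypothetical sketch: rank candidate routes for one journey by estimated CO2.
# Emission factors (kg CO2e per passenger-km) are placeholder values, and the
# route list stands in for data you would fetch from a mapping service.

EMISSION_FACTORS = {   # assumed values, kg CO2e per passenger-km
    "car": 0.19,
    "bus": 0.10,
    "rail": 0.04,
    "bike": 0.0,
}

def estimate_emissions(route):
    """Sum emissions over each (mode, distance_km) leg of a multi-modal route."""
    return sum(leg_km * EMISSION_FACTORS[mode] for mode, leg_km in route["legs"])

def recommend(routes):
    """Return candidate routes sorted from lowest to highest estimated CO2."""
    return sorted(routes, key=estimate_emissions)

if __name__ == "__main__":
    candidates = [
        {"name": "drive",       "legs": [("car", 12.0)]},
        {"name": "bus + walk",  "legs": [("bus", 10.5)]},
        {"name": "rail + bike", "legs": [("rail", 9.0), ("bike", 2.5)]},
    ]
    for r in recommend(candidates):
        print(f'{r["name"]}: {estimate_emissions(r):.2f} kg CO2e')
```

A production system would pull distances and modes from a routing API and use published emission factors; the ranking logic itself stays this simple.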

The idea of “smart cities” is central to the concept of the Internet of Things – the idea that everyday objects and tools are becoming increasingly connected, interactive, and intelligent, and capable of communicating with each other independently of humans. Many of the ideas put forward by smart-city pioneers are grounded in climate awareness, such as reducing carbon dioxide emissions and energy waste across urban areas. Smart metering allows utility companies to increase or restrict the flow of electricity, gas, or water to reduce waste and ensure adequate supply at peak periods. Public transport can be efficiently planned to avoid wasted journeys and provide a reliable service that will encourage citizens to leave their cars at home.

These examples raise an important point: It’s apparent that data – big or small – can tell us if, how, and why climate change is happening. But, of course, this is only really valuable to us if it also can tell us what we can do about it. Some projects, such as Weathersafe, which helps coffee growers adapt to changing weather patterns and soil conditions, are designed to help humans deal with climate change. Others are designed to tackle the problem at the root, by highlighting the factors that cause it in the first place and showing us how we can change our behavior to minimize damage….(More)”

Data protection in a big data society. Ideas for a future regulation


Paper by Alessandro Mantelero and Giuseppe Vaciago at Digital Investigation: “Big data society has changed the traditional forms of data analysis and created a new predictive approach to knowledge and investigation. In this light, it is necessary to consider the impact of this new paradigm on the traditional notion of data protection and its regulation.

Focussing on the individual and communal dimension of data use, encompassing digital investigations, the authors outline the challenges that big data poses for individual information self-determination, reasonable suspicion and collective interests. The article therefore puts forward some innovative proposals that may update the existing data protection legal framework and help make it responsive to today’s algorithmic society….(More)”

The promise and perils of predictive policing based on big data


H. V. Jagadish in the Conversation: “Police departments, like everyone else, would like to be more effective while spending less. Given the tremendous attention to big data in recent years, and the value it has provided in fields ranging from astronomy to medicine, it should be no surprise that police departments are using data analysis to inform deployment of scarce resources. Enter the era of what is called “predictive policing.”

Some form of predictive policing is likely now in force in a city near you. Memphis was an early adopter. Cities from Minneapolis to Miami have embraced predictive policing. Time magazine named predictive policing (with particular reference to the city of Santa Cruz) one of the 50 best inventions of 2011. New York City Police Commissioner William Bratton recently said that predictive policing is “the wave of the future.”

The term “predictive policing” suggests that the police can anticipate a crime and be there to stop it before it happens and/or apprehend the culprits right away. As the Los Angeles Times points out, it depends on “sophisticated computer analysis of information about previous crimes, to predict where and when crimes will occur.”

At a very basic level, it’s easy for anyone to read a crime map and identify neighborhoods with higher crime rates. It’s also easy to recognize that burglars tend to target businesses at night, when they are unoccupied, and to target homes during the day, when residents are away at work. The challenge is to take a combination of dozens of such factors to determine where crimes are more likely to happen and who is more likely to commit them. Predictive policing algorithms are getting increasingly good at such analysis. Indeed, such was the premise of the movie Minority Report, in which the police can arrest and convict murderers before they commit their crime.
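As a hedged illustration of what “combining dozens of factors” can look like in its simplest form, the toy score below pushes a handful of invented features through a logistic function. The feature names and weights are made up; real systems learn them from historical crime records and use far richer inputs.

```python
import math

# Illustrative only: a toy logistic score that combines a few neighborhood and
# time-of-day factors into a burglary-risk probability.

WEIGHTS = {
    "bias": -3.0,
    "prior_burglaries_30d": 0.35,   # recent burglaries nearby
    "is_business_district": 0.8,
    "is_night": 0.9,                # businesses tend to be hit at night...
    "is_workday_daytime": 0.6,      # ...homes during the working day
}

def risk_probability(features):
    """Weighted sum of factors pushed through a logistic function -> (0, 1)."""
    z = WEIGHTS["bias"] + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

block_tonight = {"prior_burglaries_30d": 4, "is_business_district": 1,
                 "is_night": 1, "is_workday_daytime": 0}
print(round(risk_probability(block_tonight), 2))   # ≈ 0.52: elevated, far from certain
```

Note that even a strongly flagged block only gets an elevated probability, which is exactly the “likely, but not certain” point made below.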

Predicting a crime with certainty is something that science fiction can have a field day with. But as a data scientist, I can assure you that in reality we can come nowhere close to certainty, even with advanced technology. To begin with, predictions can be only as good as the input data, and quite often these input data have errors.

But even with perfect, error-free input data and unbiased processing, ultimately what the algorithms are determining are correlations. Even if we have perfect knowledge of your troubled childhood, your socializing with gang members, your lack of steady employment, your wacko posts on social media and your recent gun purchases, all that the best algorithm can do is to say it is likely, but not certain, that you will commit a violent crime. After all, to believe such predictions as guaranteed is to deny free will….

What data can do is give us probabilities, rather than certainty. Good data coupled with good analysis can give us very good estimates of probability. If you sum probabilities over many instances, you can usually get a robust estimate of the total.

For example, data analysis can provide a probability that a particular house will be broken into on a particular day based on historical records for similar houses in that neighborhood on similar days. An insurance company may add this up over all days in a year to decide how much to charge for insuring that house….(More)”
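A quick back-of-the-envelope version of that insurance arithmetic, with entirely hypothetical probabilities and loss figures, might look like this:

```python
# Hypothetical numbers: daily break-in probabilities summed over a year give the
# expected number of break-ins, which an insurer can then price against.

p_weekday, p_weekend = 0.0002, 0.0004              # assumed daily probabilities
expected_break_ins = 261 * p_weekday + 104 * p_weekend   # 365 days, split by type
average_loss = 4_000                                # assumed average claim, dollars

expected_annual_loss = expected_break_ins * average_loss
print(round(expected_break_ins, 3))   # ≈ 0.094 expected break-ins per year
print(round(expected_annual_loss))    # ≈ $375 expected loss, before overhead and margin
```

The point is only that summing small per-day probabilities yields a stable expected count, even though no single day’s prediction comes close to certainty.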

Questioning Smart Urbanism: Is Data-Driven Governance a Panacea?


At the Chicago Policy Review: “In the era of data explosion, urban planners are increasingly relying on real-time, streaming data generated by “smart” devices to assist with city management. “Smart cities,” referring to cities that implement pervasive and ubiquitous computing in urban planning, are widely discussed in academia, business, and government. These cities are characterized not only by their use of technology but also by their innovation-driven economies and collaborative, data-driven city governance. Smart urbanism can seem like an effective strategy to create more efficient, sustainable, productive, and open cities. However, there are emerging concerns about the potential risks in the long-term development of smart cities, including the questionable political neutrality of big data, technocratic governance, technological lock-ins, data and network security, and privacy risks.

In a study entitled, “The Real-Time City? Big Data and Smart Urbanism,” Rob Kitchin provides a critical reflection on the potential negative effects of data-driven city governance on social development—a topic he claims deserves greater governmental, academic, and social attention.

In contrast to traditional datasets that rely on samples or are aggregated to a coarse scale, “big data” is huge in volume, high in velocity, and diverse in variety. Since the early 2000s, there has been explosive growth in data volume due to the rapid development and implementation of technology infrastructure, including networks, information management, and data storage. Big data can be generated from directed, automated, and volunteered sources. Automated data generation is of particular interest to urban planners. One example Kitchin cites is urban sensor networks, which allow city governments to monitor the movements and statuses of individuals, materials, and structures throughout the urban environment by analyzing real-time data.

With the huge amount of streaming data collected by smart infrastructure, many city governments use real-time analysis to manage different aspects of city operations. There has been a recent trend in centralizing data streams into a single hub, integrating all kinds of surveillance and analytics. These one-stop data centers make it easier for analysts to cross-reference data, spot patterns, identify problems, and allocate resources. The data are also often accessible by field workers via operations platforms. In London and some other cities, real-time data are visualized on “city dashboards” and communicated to citizens, providing convenient access to city information.

However, the real-time city is not a flawless solution to all the problems faced by city managers. The primary concern is the politics of big, urban data. Although raw data are often perceived as neutral and objective, no data are free of bias; the collection of data is a subjective process that can be shaped by various confounding factors. The presentation of data can also be manipulated to answer a specific question or enact a particular political vision….(More)”

Build digital democracy


Dirk Helbing & Evangelos Pournaras in Nature: “Fridges, coffee machines, toothbrushes, phones and smart devices are all now equipped with communicating sensors. In ten years, 150 billion ‘things’ will connect with each other and with billions of people. The ‘Internet of Things’ will generate data volumes that double every 12 hours rather than every 12 months, as is the case now.

Blinded by information, we need ‘digital sunglasses’. Whoever builds the filters to monetize this information determines what we see — Google and Facebook, for example. Many choices that people consider their own are already determined by algorithms. Such remote control weakens responsible, self-determined decision-making and thus society too.

The European Court of Justice’s ruling on 6 October that countries and companies must comply with European data-protection laws when transferring data outside the European Union demonstrates that a new digital paradigm is overdue. To ensure that no government, company or person with sole control of digital filters can manipulate our decisions, we need information systems that are transparent, trustworthy and user-controlled. Each of us must be able to choose, modify and build our own tools for winnowing information.

With this in mind, our research team at the Swiss Federal Institute of Technology in Zurich (ETH Zurich), alongside international partners, has started to create a distributed, privacy-preserving ‘digital nervous system’ called Nervousnet. Nervousnet uses the sensor networks that make up the Internet of Things, including those in smartphones, to measure the world around us and to build a collective ‘data commons’. The many challenges ahead will be best solved using an open, participatory platform, an approach that has proved successful for projects such as Wikipedia and the open-source operating system Linux.

A wise king?

The science of human decision-making is far from understood. Yet our habits, routines and social interactions are surprisingly predictable. Our behaviour is increasingly steered by personalized advertisements and search results, recommendation systems and emotion-tracking technologies. Thousands of pieces of metadata have been collected about every one of us (see go.nature.com/stoqsu). Companies and governments can increasingly manipulate our decisions, behaviour and feelings [1].

Many policymakers believe that personal data may be used to ‘nudge’ people to make healthier and environmentally friendly decisions. Yet the same technology may also promote nationalism, fuel hate against minorities or skew election outcomes [2] if ethical scrutiny, transparency and democratic control are lacking — as they are in most private companies and institutions that use ‘big data’. The combination of nudging with big data about everyone’s behaviour, feelings and interests (‘big nudging’, if you will) could eventually create close to totalitarian power.

Countries have long experimented with using data to run their societies. In the 1970s, Chilean President Salvador Allende created computer networks to optimize industrial productivity [3]. Today, Singapore considers itself a data-driven ‘social laboratory’ [4] and other countries seem keen to copy this model.

The Chinese government has begun rating the behaviour of its citizens [5]. Loans, jobs and travel visas will depend on an individual’s ‘citizen score’, their web history and political opinion. Meanwhile, Baidu — the Chinese equivalent of Google — is joining forces with the military for the ‘China brain project’, using ‘deep learning’ artificial-intelligence algorithms to predict the behaviour of people on the basis of their Internet activity [6].

The intentions may be good: it is hoped that big data can improve governance by overcoming irrationality and partisan interests. But the situation also evokes the warning of the eighteenth-century philosopher Immanuel Kant, that the “sovereign acting … to make the people happy according to his notions … becomes a despot”. It is for this reason that the US Declaration of Independence emphasizes the pursuit of happiness of individuals.

Ruling like a ‘benevolent dictator’ or ‘wise king’ cannot work because there is no way to determine a single metric or goal that a leader should maximize. Should it be gross domestic product per capita or sustainability, power or peace, average life span or happiness, or something else?

Better is pluralism. It hedges risks, promotes innovation, collective intelligence and well-being. Approaching complex problems from varied perspectives also helps people to cope with rare and extreme events that are costly for society — such as natural disasters, blackouts or financial meltdowns.

Centralized, top-down control of data has various flaws. First, it will inevitably become corrupted or hacked by extremists or criminals. Second, owing to limitations in data-transmission rates and processing power, top-down solutions often fail to address local needs. Third, manipulating the search for information and intervening in individual choices undermines ‘collective intelligence’ [7]. Fourth, personalized information creates ‘filter bubbles’ [8]. People are exposed less to other opinions, which can increase polarization and conflict [9].

Fifth, reducing pluralism is as bad as losing biodiversity, because our economies and societies are like ecosystems with millions of interdependencies. Historically, a reduction in diversity has often led to political instability, collapse or war. Finally, by altering the cultural cues that guide people’s decisions, everyday decision-making is disrupted, which undermines rather than bolsters social stability and order.

Big data should be used to solve the world’s problems, not for illegitimate manipulation. But the assumption that ‘more data equals more knowledge, power and success’ does not hold. Although we have never had so much information, we face ever more global threats, including climate change, unstable peace and socio-economic fragility, and political satisfaction is low worldwide. About 50% of today’s jobs will be lost in the next two decades as computers and robots take over tasks. But will we see the macroeconomic benefits that would justify such large-scale ‘creative destruction’? And how can we reinvent half of our economy?

The digital revolution will mainly benefit countries that achieve a ‘win–win–win’ situation for business, politics and citizens alike [10]. To mobilize the ideas, skills and resources of all, we must build information systems capable of bringing diverse knowledge and ideas together. Online deliberation platforms and reconfigurable networks of smart human minds and artificially intelligent systems can now be used to produce collective intelligence that can cope with the diverse and complex challenges surrounding us….(More)” See the Nervousnet project.

The Transformation of Human Rights Fact-Finding


Book edited by Philip Alston and Sarah Knuckey: “Fact-finding is at the heart of human rights advocacy, and is often at the center of international controversies about alleged government abuses. In recent years, human rights fact-finding has greatly proliferated and become more sophisticated and complex, while also being subjected to stronger scrutiny from governments. Nevertheless, despite the prominence of fact-finding, it remains strikingly under-studied and under-theorized. Too little has been done to bring forth the assumptions, methodologies, and techniques of this rapidly developing field, or to open human rights fact-finding to critical and constructive scrutiny.

The Transformation of Human Rights Fact-Finding offers a multidisciplinary approach to the study of fact-finding with rigorous and critical analysis of the field of practice, while providing a range of accounts of what actually happens. It deepens the study and practice of human rights investigations, and fosters fact-finding as a discretely studied topic, while mapping crucial transformations in the field. The contributions to this book are the result of a major international conference organized by New York University Law School’s Center for Human Rights and Global Justice. Engaging the expertise and experience of the editors and contributing authors, it offers a broad approach encompassing contemporary issues and analysis across the human rights spectrum in law, international relations, and critical theory. This book addresses the major areas of human rights fact-finding such as victim and witness issues; fact-finding for advocacy, enforcement, and litigation; the role of interdisciplinary expertise and methodologies; crowd sourcing, social media, and big data; and international guidelines for fact-finding….(More)”

Privacy in a Digital, Networked World: Technologies, Implications and Solutions


Book edited by Sherali Zeadally and Mohamad Badra: “This comprehensive textbook/reference presents a focused review of the state of the art in privacy research, encompassing a range of diverse topics. The first book of its kind designed specifically to cater to courses on privacy, this authoritative volume provides technical, legal, and ethical perspectives on privacy issues from a global selection of renowned experts. Features: examines privacy issues relating to databases, P2P networks, big data technologies, social networks, and digital information networks; describes the challenges of addressing privacy concerns in various areas; reviews topics of privacy in electronic health systems, smart grid technology, vehicular ad-hoc networks, mobile devices, location-based systems, and crowdsourcing platforms; investigates approaches for protecting privacy in cloud applications; discusses the regulation of personal information disclosure and the privacy of individuals; presents the tools and the evidence to better understand consumers’ privacy behaviors….(More)”

New flu tracker uses Google search data better than Google


At Ars Technica: “With big data comes big noise. Google learned this lesson the hard way with its now kaput Google Flu Trends. The online tracker, which used Internet search data to predict real-life flu outbreaks, emerged amid fanfare in 2008. Then it met a quiet death this August after repeatedly coughing up bad estimates.

But big Internet data isn’t out of the disease tracking scene yet.

With hubris firmly in check, a team of Harvard researchers has come up with a way to tame the unruly data, combine it with other data sets, and continually calibrate it to track flu outbreaks with less error. Their new model, published Monday in the Proceedings of the National Academy of Sciences, outperforms Google Flu Trends and other models with at least double the accuracy. If the model holds up in coming flu seasons, it could reinstate some optimism in using big data to monitor disease and herald a wave of more accurate second-generation models.

Big data has a lot of potential, Samuel Kou, a statistics professor at Harvard University and coauthor on the new study, told Ars. It’s just a question of using the right analytics, he said.

Kou and his colleagues built on Google’s flu tracking model for their new version, called ARGO (AutoRegression with GOogle search data). Google Flu Trends basically relied on trends in Internet search terms, such as headache and chills, to estimate the number of flu cases. Those search terms were correlated with flu outbreak data collected by the Centers for Disease Control and Prevention. The CDC’s data relies on clinical reports from around the country. But compiling and analyzing that data can be slow, leading to a lag time of one to three weeks. The Google data, on the other hand, offered near real-time tracking for health experts to manage and prepare for outbreaks.

At first, Google’s tracker appeared to be pretty good, matching the CDC’s late-breaking data somewhat closely. But two notable stumbles led to its ultimate downfall: an underestimate of the 2009 H1N1 swine flu outbreak and an alarming overestimate (almost double the real numbers) of the 2012-2013 flu season’s cases…. For ARGO, he and colleagues took the trend data and then designed a model that could self-correct for changes in how people search. The model has a two-year sliding window in which it re-calibrates current search term trends with the CDC’s historical flu data (the gold standard for flu data). They also made sure to exclude winter search terms, such as March Madness and the Oscars, so they didn’t get accidentally correlated with seasonal flu trends. Last, they incorporated data on the historical seasonality of flu.
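A rough sketch of that recipe (not the authors’ code) is a weekly-refit regression of the current flu level on lagged CDC values plus the week’s search volumes, restricted to a two-year training window. The function below assumes toy inputs and omits the seasonality terms and search-term screening described in the paper.

```python
import numpy as np

def argo_style_nowcast(cdc_history, search_history, current_searches,
                       lags=3, window=104):
    """Hypothetical ARGO-style nowcast (a sketch, not the published model).

    cdc_history:      past weekly CDC flu levels, length T
    search_history:   T x n_terms array of search volumes for those same weeks
    current_searches: this week's n_terms search volumes (CDC value not yet out)
    """
    cdc = np.asarray(cdc_history, dtype=float)
    searches = np.asarray(search_history, dtype=float)
    T = len(cdc)
    start = max(lags, T - window)          # two-year sliding training window
    rows = [np.concatenate([cdc[w - lags:w], searches[w]]) for w in range(start, T)]
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    coef, *_ = np.linalg.lstsq(X, cdc[start:T], rcond=None)   # refit every week
    x_now = np.concatenate([[1.0], cdc[T - lags:T], np.asarray(current_searches, dtype=float)])
    return float(x_now @ coef)
```

Refitting on a rolling window is what lets such a model “self-correct” as search behaviour drifts; the published version also handles seasonality and screens out spuriously correlated terms.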

The result was a model that significantly out-competed the Google Flu Trends estimates for the period from March 29, 2009, to July 11, 2015. ARGO also beat out other models, including one based on current and historical CDC data….(More)”

See also Proceedings of the National Academy of Sciences, 2015. DOI: 10.1073/pnas.1515373112

How smartphones are solving one of China’s biggest mysteries


Ana Swanson at the Washington Post: “For decades, China has been engaged in a building boom of a scale that is hard to wrap your mind around. In the last three decades, 260 million people have moved from the countryside to Chinese cities — equivalent to around 80 percent of the population of the U.S. To make room for all of those people, the size of China’s built-up urban areas nearly quintupled between 1984 and 2010.

Much of that development has benefited people’s lives, but some has not. In a breathless rush to boost growth and development, some urban areas have built vast, unused real estate projects — China’s infamous “ghost cities.” These eerie, shining developments are complete except for one thing: people to live in them.

China’s ghost cities have sparked a lot of debate over the last few years. Some argue that the developments are evidence of the waste in top-down planning, or the result of too much cheap funding for businesses. Some blame the lack of other good places for average people to invest their money, or the desire of local officials to make a quick buck — land sales generate a lot of revenue for China’s local governments.

Others say the idea of ghost cities has been overblown. They espouse a “build it and they will come” philosophy, pointing out that, with time, some ghost cities fill up and turn into vibrant communities.

It’s been hard to evaluate these claims, since most of the research on ghost cities has been anecdotal. Even the most rigorous research methods leave a lot to be desired — for example, investment research firms sending poor junior employees out to remote locations to count how many lights are turned on in buildings at night.

Now new research from Baidu, one of China’s biggest technology companies, provides one of the first systematic looks at Chinese ghost cities. Researchers from Baidu’s Big Data Lab and Peking University in Beijing used the kind of location data gathered by mobile phones and GPS receivers to track how people moved in and out of suspected ghost cities, in real time and on a national scale, over a period of six months. You can see the interactive project here.

Google has been blocked in China for years, and Baidu dominates the market in terms of search, mobile maps and other offerings. That gave the researchers a huge database to work with — 770 million users, a hefty chunk of China’s 1.36 billion people.

To identify potential ghost cities, the researchers created an algorithm that identifies urban areas with relatively sparse populations. They define a ghost city as an urban region with a population density of fewer than 5,000 people per square kilometer – about half the density recommended by the Chinese Ministry of Housing and Urban-Rural Development….(More)”
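To make that density rule concrete, here is a minimal sketch, assuming you already have per-area population estimates (in practice derived from aggregated, anonymized location data); the district names and figures are invented.

```python
# Illustrative sketch of the density rule described above: flag urban areas whose
# estimated residential density falls below 5,000 people per square kilometre.

GHOST_THRESHOLD = 5_000   # people per square kilometre

def flag_ghost_areas(areas):
    """areas: iterable of dicts with 'population' and 'area_km2'.
    Returns the subset whose density is below the ghost-city threshold."""
    flagged = []
    for area in areas:
        density = area["population"] / area["area_km2"]
        if density < GHOST_THRESHOLD:
            flagged.append({**area, "density": density})
    return flagged

example = [
    {"name": "new district A", "population": 36_000,  "area_km2": 12.0},   # 3,000 / km2
    {"name": "old town core",  "population": 180_000, "area_km2": 15.0},   # 12,000 / km2
]
print([a["name"] for a in flag_ghost_areas(example)])   # ['new district A']
```

The hard part in the real study is estimating those populations from phone-location traces in the first place; the thresholding step itself is this simple.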

Mobile data: Made to measure


Neil Savage in Nature: “For decades, doctors around the world have been using a simple test to measure the cardiovascular health of patients. They ask them to walk on a hard, flat surface and see how much distance they cover in six minutes. This test has been used to predict the survival rates of lung transplant candidates, to measure the progression of muscular dystrophy, and to assess overall cardiovascular fitness.

The walk test has been studied in many trials, but even the biggest rarely top a thousand participants. Yet when Euan Ashley launched a cardiovascular study in March 2015, he collected test results from 6,000 people in the first two weeks. “That’s a remarkable number,” says Ashley, a geneticist who heads Stanford University’s Center for Inherited Cardiovascular Disease. “We’re used to dealing with a few hundred patients, if we’re lucky.”

Numbers on that scale, he hopes, will tell him a lot more about the relationship between physical activity and heart health. The reason they can be achieved is that millions of people now have smartphones and fitness trackers with sensors that can record all sorts of physical activity. Health researchers are studying such devices to figure out what sort of data they can collect, how reliable those data are, and what they might learn when they analyse measurements of all sorts of day-to-day activities from many tens of thousands of people and apply big-data algorithms to the readings.

By July, more than 40,000 people in the United States had signed up to participate in Ashley’s study, which uses an iPhone application called MyHeart Counts. He expects the numbers to surge as the app becomes more widely available around the world. The study — designed by scientists, approved by institutional review boards, and requiring informed consent — asks participants to answer questions about their health and risk factors, and to use their phone’s motion sensors to collect data about their activities for seven days. They also do a six-minute walk test, and the phone measures the distance they cover. If their own doctors have ordered blood tests, users can enter information such as cholesterol or glucose measurements. Every three months, the app checks back to update their data.
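As a hedged sketch of how a phone might approximate the distance covered in a six-minute walk test, the snippet below simply sums great-circle distances between successive location fixes; the actual app presumably relies on the phone’s motion sensors and more careful filtering, so treat this as an illustration only.

```python
import math

EARTH_RADIUS_M = 6_371_000

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def walk_distance_m(samples):
    """samples: list of (lat, lon) fixes recorded during the six-minute test."""
    return sum(haversine_m(*a, *b) for a, b in zip(samples, samples[1:]))

track = [(37.4275, -122.1697), (37.4279, -122.1693), (37.4284, -122.1690)]
print(round(walk_distance_m(track)))   # ≈ 118 metres for this toy track
```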

Physicians know that physical activity is a strong predictor of long-term heart health, Ashley says. But it is less clear what kind of activity is best, or whether different groups of people do better with different types of exercise. MyHeart Counts may open a window on such questions. “We can start to look at subgroups and find differences,” he says.

“You can take pretty noisy data, but if you have enough of it, you can find a signal.”

It is the volume of the data that makes such studies possible. In traditional studies, there may not be enough data to find statistically significant results for such subgroups. And rare events may not occur in the smaller samples, or may produce a signal so weak that it is lost in statistical noise. Big data can overcome those problems, and if the data set is big enough, small errors can be smoothed out. “You can take pretty noisy data, but if you have enough of it, you can find a signal,” Ashley says….(More)”.
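The “signal from noisy data” point can be illustrated with a toy simulation (all numbers invented): each individual reading is dominated by noise, but the average over many readings converges on the underlying value, since the standard error shrinks roughly as one over the square root of the sample size.

```python
import random

random.seed(0)
TRUE_RATE = 0.03                                  # assumed true underlying value

def noisy_estimate():
    return TRUE_RATE + random.gauss(0, 0.05)      # each reading is mostly noise

for n in (10, 1_000, 100_000):
    avg = sum(noisy_estimate() for _ in range(n)) / n
    print(f"n={n:>7}: average estimate = {avg:.4f}")
# Expect the n=10 average to wander; the n=100,000 average lands very near 0.03.
```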