Crowdsourced social media data for disaster management: Lessons from the project

R.I.Ogie, R.J.Clarke, H.Forehead and P.Perez in Computers, Environment and Urban Systems: “The application of crowdsourced social media data in flood mapping and other disaster management initiatives is a burgeoning field of research, but not one that is without challenges. In identifying these challenges and in making appropriate recommendations for future direction, it is vital that we learn from the past by taking a constructively critical appraisal of highly-praised projects in this field, which through real-world implementations have pioneered the use of crowdsourced geospatial data in modern disaster management. These real-world applications represent natural experiments, each with myriads of lessons that cannot be easily gained from computer-confined simulations.

This paper reports on lessons learnt from a 3-year implementation of a highly-praised project- the project. The lessons presented derive from the key success factors and the challenges associated with the project. To contribute in addressing some of the identified challenges, desirable characteristics of future social media-based disaster mapping systems are discussed. It is envisaged that the lessons and insights shared in this study will prove invaluable within the broader context of designing socio-technical systems for crowdsourcing and harnessing disaster-related information….(More)”.

To turn the open data revolution from idea to reality, we need more evidence

Stefaan Verhulst at apolitical: “The idea that we are living in a data age — one characterised by unprecedented amounts of information with unprecedented potential — has  become mainstream. We regularly read “data is the new oil,” or “data is the most valuable commodity in the global economy.”

Doubtlessly, there is truth in these statements. But a major, often unacknowledged problem is how much data remains inaccessible, hidden in siloes and behind walls.

For close to a decade, the technology and public interest community has pushed the idea of open data. At its core, open data represents a new paradigm of information and information access.

Rooted in notions of an information commons — developed by scholars like Nobel Prize winner Elinor Ostrom — and borrowing from the language of open source, open data begins from the premise that data collected from the public, often using public funds or publicly funded infrastructure, should also belong to the public — or at least, be made broadly accessible to those pursuing public-interest goals.

The open data movement has reached significant milestones in its short history. An ever-increasing number of governments across both developed and developing economies have released large datasets for the public’s benefit….

Similarly, a growing number of private companies have “Data Collaboratives” leveraging their data — with various degrees of limitations — to serve the public interest.

Despite such initiatives, many open data projects (and data collaboratives) remain fledgling. The field has trouble scaling projects beyond initial pilots. In addition, many potential stakeholders — private sector and government “owners” of data, as well as public beneficiaries — remain sceptical of open data’s value. Such limitations need to be overcome if open data and its benefits are to spread. We need hard evidence of its impact.

Ironically, the field is held back by an absence of good data on open data — that is, a lack of reliable empirical evidence that could guide new initiatives.

At the GovLab, a do-tank at New York University, we study the impact of open data. One of our overarching conclusions is that we need a far more solid evidence base to move open data from being a good idea to reality.

What do we know? Several initiatives undertaken at the GovLab offer insight. Our ODImpactwebsite now includes more than 35 detailed case studies of open government data projects. These examples provide powerful evidence not only that open data can work but also about howit works….

We have also launched an Open Data Periodic Table to better understand what conditions predispose an open data project toward success or failure. For example, having a clear problem definition, as well as the capacity and culture to carry out open data projects, are vital. Successful projects also build cross-sector partnerships around open data and its potential uses and establish practices to assess and mitigate risks, and have transparent and responsive governance structures….(More)”.

Google is using AI to predict floods in India and warn users

James Vincent at The Verge: “For years Google has warned users about natural disasters by incorporating alerts from government agencies like FEMA into apps like Maps and Search. Now, the company is making predictions of its own. As part of a partnership with the Central Water Commission of India, Google will now alert users in the country about impending floods. The service is only currently available in the Patna region, with the first alert going out earlier this month.

As Google’s engineering VP Yossi Matias outlines in a blog post, these predictions are being made using a combination of machine learning, rainfall records, and flood simulations.

“A variety of elements — from historical events, to river level readings, to the terrain and elevation of a specific area — feed into our models,” writes Matias. “With this information, we’ve created river flood forecasting models that can more accurately predict not only when and where a flood might occur, but the severity of the event as well.”

The US tech giant announced its partnership with the Central Water Commission back in June. The two organizations agreed to share technical expertise and data to work on the predictions, with the Commission calling the collaboration a “milestone in flood management and in mitigating the flood losses.” Such warnings are particularly important in India, where 20 percent of the world’s flood-related fatalities are estimated to occur….(More)”.

The rush for data risks growing the North-South divide

Laura Mann and Gianluca Lazzolino at SciDevNet: “Across the world, tech firms and software developers are embedding digital platforms into humanitarian and commercial infrastructures. There’s Jembi and Hello Doctor for the healthcare sector, for example; SASSA and Tamween for social policy; and M-farmi-CowEsoko among many others for agriculture.

While such systems proliferate, it is time we asked some tough questions about who is controlling this data, and for whose benefit. There is a danger that ‘platformisation’ widens the knowledge gap between firms and scientists in poorer countries and those in more advanced economies.

Digital platforms serve three purposes. They improve interactions between service providers and users; gather transactional data about those users; and nudge them towards behaviours, activities and products considered ‘virtuous’, profitable, or valued — often because they generate more data. This data  can be extremely valuable to policy-makers interested in developing interventions, to researchers exploring socio-economic trends and to businesses seeking new markets.

But the development and use of these platforms are not always benign.

Knowledge and power

Digital technologies are knowledge technologies because they record the personal information, assets, behaviour and networks of the people that use them.

Knowledge has a somewhat gentle image of a global good shared openly and evenly across the world. But in reality, it is competitive.
Simply put, knowledge shapes economic rivalry between rich and poor countries. It influences who has power over the rules of the economic game, and it does this in three key ways.

First, firms can use knowledge and technology to become more efficient and competitive in what they do. For example, a farmer can choose to buy technologically enhanced seeds, inputs such as fertilisers, and tools to process their crop.

This technology transfer is not automatic — the farmer must first invest time to learn how to use these tools.  In this sense, economic competition between nations is partly about how well-equipped their people are in using technology effectively.

The second key way in which knowledge impacts global economic competition depends on looking at development as a shift from cut-throat commodity production towards activities that bring higher profits and wages.

In farming, for example, development means moving out of crop production alone into a position of having more control over agricultural inputs, and more involvement in distributing or marketing agricultural goods and services….(More)”.

The New York City Business Atlas: Leveling the Playing Field for Small Businesses with Open Data

Chapter by Stefaan Verhulst and Andrew Young in Smarter New York City:How City Agencies Innovate. Edited by André Corrêa d’Almeida: “While retail entrepreneurs, particularly those operating in the small-business space, are experts in their respective trades, they often lack access to high-quality information about social, environmental, and economic conditions in the neighborhoods where they operate or are considering operating.

The New York City Business Atlas, conceived by the Mayor’s Office of Data Analytics (MODA) and the Department of Small Business Services, is designed to alleviate that information gap by providing a public web-based tool that gives small businesses access to high-quality data to help them decide where to establish a new business or expand an existing one. e tool brings together a diversity of data, including business-fling data from the Department of Consumer Affairs, sales-tax data from the Department of Finance, demographic data from the census, and traffic data from Placemeter, a New York City startup focusing on real-time traffic information.

The initial iteration of the Business Atlas made useful and previously inaccessible data available to small-business owners and entrepreneurs in an innovative manner. After a few years, however, it became clear that the tool was not experiencing the level of use or creating the level of demonstrable impact anticipated. Rather than continuing down the same path or abandoning the effort entirely, MODA pivoted to a new approach, moving from the Business Atlas as a single information-providing tool to the Business Atlas as a suite of capabilities aimed at bolstering New York’s small-business community.

Through problem- and user-centered efforts, the Business Atlas is now making important insights available to stakeholders who can put it to meaningful use—from how long it takes to open a restaurant in the city to which areas are most in need of education and outreach to improve their code compliance. This chapter considers the open data environment from which the Business Atlas was launched, details the initial version of the Business Atlas and the lessons it generated and describes the pivot to this new approach….(More)”.

Ethics & Algorithms Toolkit

Toolkit: “Government leaders and staff who leverage algorithms are facing increasing pressure from the public, the media, and academic institutions to be more transparent and accountable about their use. Every day, stories come out describing the unintended or undesirable consequences of algorithms. Governments have not had the tools they need to understand and manage this new class of risk.

GovEx, the City and County of San Francisco, Harvard DataSmart, and Data Community DC have collaborated on a practical toolkit for cities to use to help them understand the implications of using an algorithm, clearly articulate the potential risks, and identify ways to mitigate them….We developed this because:

  • We saw a gap. There are many calls to arms and lots of policy papers, one of which was a DataSF research paper, but nothing practitioner-facing with a repeatable, manageable process.
  • We wanted an approach which governments are already familiar with: risk management. By identifing and quantifying levels of risk, we can recommend specific mitigations.. …(More)”.

Urban Science: Putting the “Smart” in Smart Cities

Introduction to Special Issue on Urban Modeling and Simulation by Shade T. Shutters: “Increased use of sensors and social data collection methods have provided cites with unprecedented amounts of data. Yet, data alone is no guarantee that cities will make smarter decisions and many of what we call smart cities would be more accurately described as data-driven cities.

Parallel advances in theory are needed to make sense of those novel data streams and computationally intensive decision support models are needed to guide decision makers through the avalanche of new data. Fortunately, extraordinary increases in computational ability and data availability in the last two decades have led to revolutionary advances in the simulation and modeling of complex systems.

Techniques, such as agent-based modeling and systems dynamic modeling, have taken advantage of these advances to make major contributions to diverse disciplines such as personalized medicine, computational chemistry, social dynamics, or behavioral economics. Urban systems, with dynamic webs of interacting human, institutional, environmental, and physical systems, are particularly suited to the application of these advanced modeling and simulation techniques. Contributions to this special issue highlight the use of such techniques and are particularly timely as an emerging science of cities begins to crystallize….(More)”.

Making Wage Data Work: Creating a Federal Resource for Evidence and Transparency

Christina Pena at the National Skills Coalition: “Administrative data on employment and earnings, commonly referred to as wage data or wage records, can be used to assess the labor market outcomes of workforce, education, and other programs, providing policymakers, administrators, researchers, and the public with valuable information. However, there is no single readily accessible federal source of wage data which covers all workers. Noting the importance of employment and earnings data to decision makers, the Commission on Evidence-Based Policymaking called for the creation of a single federal source of wage data for statistical purposes and evaluation. They recommended three options for further exploration: expanding access to systems that already exist at the U.S. Census Bureau or the U.S. Department of Health and Human Services (HHS), or creating a new database at the U.S. Department of Labor (DOL).

This paper reviews current coverage and allowable uses, as well as federal and state actions required to make each option viable as a single federal source of wage data that can be accessed by government agencies and authorized researchers. Congress and the President, in conjunction with relevant federal and state agencies, should develop one or more of those options to improve wage information for multiple purposes. Although not assessed in the following review, financial as well as privacy and security considerations would influence the viability of each scenario. Moreover, if a system like the Commission-recommended National Secure Data Service for sharing data between agencies comes to fruition, then a wage system might require additional changes to work with the new service….(More)”

Uninformed Consent

Leslie K. John at Harvard Business Review: “…People are bad at making decisions about their private data. They misunderstand both costs and benefits. Moreover, natural human biases interfere with their judgment. And whether by design or accident, major platform companies and data aggregators have structured their products and services to exploit those biases, often in subtle ways.

Impatience. People tend to overvalue immediate costs and benefits and underweight those that will occur in the future. They want $9 today rather than $10 tomorrow. On the internet, this tendency manifests itself in a willingness to reveal personal information for trivial rewards. Free quizzes and surveys are prime examples. …

The endowment effect. In theory people should be willing to pay the same amount to buy a good as they’d demand when selling it. In reality, people typically value a goodless when they have to buy it. A similar dynamic can be seen when people make decisions about privacy….

Illusion of control. People share a misapprehension that they can control chance processes. This explains why, for example, study subjects valued lottery tickets that they had personally selected more than tickets that had been randomly handed to them. People also confuse the superficial trappings of control with real control….

Desire for disclosure. This is not a decision-making bias. Rather, humans have what appears to be an innate desire, or even need, to share with others. After all, that’s how we forge relationships — and we’re inherently social creatures…

False sense of boundaries. In off-line contexts, people naturally understand and comply with social norms about discretion and interpersonal communication. Though we may be tempted to gossip about someone, the norm “don’t talk behind people’s backs” usually checks that urge. Most of us would never tell a trusted confidant our secrets when others are within earshot. And people’s reactions in the moment can make us quickly scale back if we disclose something inappropriate….(More)”.

United Nations accidentally exposed passwords and sensitive information to the whole internet

Micah Lee at The Intercept: “The United Nations accidentally published passwords, internal documents, and technical details about websites when it misconfigured popular project management service Trello, issue tracking app Jira, and office suite Google Docs.

The mistakes made sensitive material available online to anyone with the proper link, rather than only to specific users who should have access. Affected data included credentials for a U.N. file server, the video conferencing system at the U.N.’s language school, and a web development environment for the U.N.’s Office for the Coordination of Humanitarian Affairs. Security researcher Kushagra Pathak discovered the accidental leak and notified the U.N. about what he found a little over a month ago. As of today, much of the material appears to have been taken down.

In an online chat, Pathak said he found the sensitive information by running searches on Google. The searches, in turn, produced public Trello pages, some of which contained links to the public Google Docs and Jira pages.

Trello projects are organized into “boards” that contain lists of tasks called “cards.” Boards can be public or private. After finding one public Trello board run by the U.N., Pathak found additional public U.N. boards by using “tricks like by checking if the users of one Trello board are also active on some other boards and so on.” One U.N. Trello board contained links to an issue tracker hosted on Jira, which itself contained even more sensitive information. Pathak also discovered links to documents hosted on Google Docs and Google Drive that were configured to be accessible to anyone who knew their web addresses. Some of these documents contained passwords….Here is just some of the sensitive information that the U.N. accidentally made accessible to anyone who Googled for it:

  • A social media team promoting the U.N.’s “peace and security” efforts published credentials to access a U.N. remote file access, or FTP, server in a Trello card coordinating promotion of the International Day of United Nations Peacekeepers. It is not clear what information was on the server; Pathak said he did not connect to it.
  • The U.N.’s Language and Communication Programme, which offers language courses at U.N. Headquarters in New York City, published credentials for a Google account and a Vimeo account. The program also exposed, on a publicly visible Trello board, credentials for a test environment for a human resources web app. It also made public a Google Docs spreadsheet, linked from a public Trello board, that included a detailed meeting schedule for 2018, along with passwords to remotely access the program’s video conference system to join these meetings.
  • One public Trello board used by the developers of Humanitarian Response and ReliefWeb, both websites run by the U.N.’s Office for the Coordination of Humanitarian Affairs, included sensitive information like internal task lists and meeting notes. One public card from the board had a PDF, marked “for internal use only,” that contained a map of all U.N. buildings in New York City. …(More)”.