Using Public Data From Different Sources


Chapter by Yair Cohen in Maximizing Social Science Research Through Publicly Accessible Data Sets, book edited by S. Marshall Perry: “The United States federal government agencies as well as state agencies are liberating their data through web portals. Web portals like data.gov, census.gov, healthdata.gov, ed.gov and many others on the state level provide great opportunity for researchers of all fields. This chapter shows the challenges and the opportunities that lie in merging data from different public sources. The researcher collected and merged data from the following datasets: NYSED school report card, NYSED Fiscal Profile Reporting System, Civil Rights Data Collection, and Census 2010 School District Demographics System. The challenges include data validation, data cleaning, flattening data for easy reporting, and merging datasets based on text fields….(More)”.
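Merging datasets on text fields usually means reducing each name to a comparable key before joining. The following is a minimal sketch of that idea, not the chapter's actual method: the function names, the list of dropped suffix tokens, and the sample rows are all hypothetical and chosen only for illustration.

```python
import re
import unicodedata

# Tokens assumed to vary between sources; this list is illustrative only.
DROPPED_TOKENS = {"csd", "ufsd", "sd", "school", "district", "central", "union", "free"}


def normalize_name(name: str) -> str:
    """Reduce a free-text district name to a comparable key."""
    # Strip accents, lowercase, and replace punctuation with spaces.
    text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    # Drop suffix tokens that differ between sources and collapse whitespace.
    tokens = [tok for tok in text.split() if tok not in DROPPED_TOKENS]
    return " ".join(tokens)


def merge_on_name(left, right, key="district"):
    """Join two lists of dicts on the normalized text key.

    Returns (merged, unmatched). Rows from `left` with zero or several
    candidates in `right` go to `unmatched` for manual review.
    """
    index = {}
    for row in right:
        index.setdefault(normalize_name(row[key]), []).append(row)

    merged, unmatched = [], []
    for row in left:
        matches = index.get(normalize_name(row[key]), [])
        if len(matches) == 1:
            # Values from `left` win when both rows share a column name.
            merged.append({**matches[0], **row})
        else:
            unmatched.append(row)
    return merged, unmatched


# Made-up sample rows, not taken from any of the datasets named above.
report_card = [
    {"district": "Example Central School District", "enrollment": 1200},
    {"district": "Sample-Town UFSD", "enrollment": 800},
]
fiscal = [
    {"district": "EXAMPLE CSD", "spending_per_pupil": 15000},
    {"district": "Other City SD", "spending_per_pupil": 18000},
]

merged, unmatched = merge_on_name(report_card, fiscal)
```

Keeping the unmatched rows separate, rather than discarding them, supports the validation step: every row that fails to join can be inspected by hand.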

 

Using big data to predict suicide risk among Canadian youth


SAS Insights: “Suicide is the second leading cause of death among youth in Canada, according to Statistics Canada, accounting for one-fifth of deaths of people under the age of 25 in 2011. The Canadian Mental Health Association states that among 15 – 24 year olds the number is an even more frightening 24 percent – the third highest in the industrialized world. Yet despite these disturbing statistics, the signals that an individual plans on self-injury or suicide are hard to isolate….

Team members …collected 2.3 million tweets and used text mining software to identify 1.1 million of them as likely to have been authored by 13 to 17 year olds in Canada by building a machine learning model to predict age, based on the open source PAN author profiling dataset. Their analysis made use of natural language processing, predictive modelling, text mining, and data visualization….

However, there were challenges. Ages are not revealed on Twitter, so the team had to figure out how to tease out the data for 13 – 17 year olds in Canada. “We had a text data set, and we created a model to identify if people were in that age group based on how they talked in their tweets,” Soehl said. “From there, we picked some specific buzzwords and created topics around them, and our software mined those tweets to collect the people.”

Another issue was the restrictions Twitter places on pulling data, though Soehl believes that once this analysis becomes an established solution, Twitter may work with researchers to expedite the process. “Now that we’ve shown it’s possible, there are a lot of places we can go with it,” said Soehl. “Once you know your path and figure out what’s going to be valuable, things come together quickly.”

The team looked at the percentage of people in the group who were talking about depression or suicide, and what they were talking about. Horne said that when SAS’ work went in front of a Canadian audience working in health care, they said that it definitely filled a gap in their data — and that was the validation he’d been looking for. The team also won $10,000 for creating the best answer to this question (the team donated the award money to two mental health charities: Mind Your Mind and Rise Asset Development).

What’s next?

That doesn’t mean the work is done, said Jos Polfliet. “We’re just scraping the surface of what can be done with the information.” Another way to use the results is to look at patterns and trends….(More)”

Privacy and Outrage


Paper by Jordan M. Blanke: “Technology has dramatically altered virtually every aspect of our life in recent years. While technology has always driven change, it seems that these changes are occurring more rapidly and more extensively than ever before. Society and its laws will evolve; but it is not always an easy process. Privacy has changed dramatically in our data-driven world – and continues to change daily. It has always been difficult to define exactly what privacy is, and therefore, it is even more difficult to propose what it should become. As the meaning of privacy often varies from person to person, it is difficult to establish a one-size-fits-all concept. This paper explores some of the historical, legal and ethical development of privacy, discusses how some of the normative values of privacy may survive or change, and examines how outrage has been – and will continue to be – a driver of such change….(More)”.

Advancing Urban Health and Wellbeing Through Collective and Artificial Intelligence: A Systems Approach 3.0


Policy brief by Franz Gatzweiler: “Many problems of urban health and wellbeing, such as pollution, obesity, ageing, mental health, cardiovascular diseases, infectious diseases, inequality and poverty (WHO 2016), are highly complex and beyond the reach of individual problem solving capabilities. Biodiversity loss, climate change, and urban health problems emerge at aggregate scales and are unpredictable. They are the consequence of complex interactions between many individual agents and their environments across urban sectors and scales. Another challenge of complex urban health problems is the knowledge approach we apply to understand and solve them. We are challenged to create a new, innovative knowledge approach to understand and solve the problems of urban health. The positivist approach of separating cause from effect, or observer from observed, is insufficient when human agents are both part of the problem and the solution.

Problems emerging from complexity can only be solved collectively by applying rules which govern complexity. For example, the law of requisite variety (Ashby 1960) tells us that we need as much variety in our problem-solving toolbox as there are different types of problems to be solved, and we need to address these problems at the respective scale. No individual has the intelligence to solve emergent problems of urban health alone….

  • Complex problems of urban health and wellbeing cause millions of premature deaths annually and are beyond the reach of individual problem-solving capabilities.
  • Collective and artificial intelligence (CI+AI) working together can address the complex challenges of urban health
  • The systems approach (SA) is an adaptive, intelligent and intelligence-creating, “data-metabolic” mechanism for solving such complex challenges
  • Design principles have been identified to successfully create CI and AI. Data metabolic costs are the limiting factor.
  • A call for collaborative action to build an “urban brain” by means of next generation systems approaches is required to save lives in the face of failure to tackle complex urban health challenges….(More)”.

Disaster recovery’s essential tool: Data


Amy Liu and Allison Plyer at Brookings: “To recover from a disaster on the scale of Harvey and Irma requires a massive coordinated effort. Federal, state and local governments must lead. Philanthropy, nonprofits and the private sector will be key partners. Residents will voice their views, through community planning meetings and other venues, on how best to spend disaster-recovery dollars. With so many stakeholders and rebuilding needs, the process of restoring neighborhoods and economic activity will become emotionally and politically charged. As Brock Long, administrator of the Federal Emergency Management Agency, has already warned in Texas: “This is going to be a frustrating and painful process.”

For public officials to effectively steer a recovery process and for citizens to trust in the effort, reliable, transparent information will be essential. Leaders and the public need a shared understanding of the scale and extent of the damage and which households, businesses and neighborhoods have been affected. This is not a one-time effort. Data must be collected and issued regularly over months and years to match the duration of the rebuilding effort.

Without this information, it will be nearly impossible to estimate the nature of aid required, determine how best to deploy resources, prioritize spending and monitor progress. Rebuilding processes are chaotic, with emotions high over multiple, competing priorities. Credible public information organized in one place can help to neutralize misconceptions, put every need in context and depoliticize decision-making. Most importantly, data on recovery needs also can enable citizen involvement and allow residents to hold public leaders accountable for progress.

We know this first-hand from our experience in New Orleans, where the Brookings Institution and the New Orleans Data Center teamed up to produce what became the New Orleans Index following Hurricane Katrina in 2005. We set out to help the public and decision-makers understand the level of outstanding damage in New Orleans and the region and to monitor the extent to which the city was bouncing back….(More)”.

Storm Crowds: Evidence from Zooniverse on Crowd Contribution Design


Paper by Sandra Barbosu and Joshua S. Gans: “Crowdsourcing – a collaborative form of content production based on the contributions of large groups of individuals – has proliferated in the past decade. As a result, recent research seeks to understand the factors that affect its sustainability. Prior studies have highlighted the importance of volunteers’ prosocial motivations, the sense of belonging to a community, and symbolic rewards within crowdsourcing websites. One factor that has received limited attention in the existing literature is how the design of crowdsourcing platforms affects their sustainability.

We study whether the design element – particularly, the divisibility of contributions (i.e. whether contributing tasks are bundled together or can be carried out separately) – is a factor that affects the level and quality of crowdsourcing contributions. We investigate this in the context of Zooniverse, the world’s largest crowdsourced science site, in which volunteers contribute to scientific research by performing data processing tasks. Our choice of empirical setting is motivated by the fact that one of the Zooniverse projects, Cyclone Center, underwent a format change that decreased the divisibility of contributions, by bundling together two tasks that were previously separate. We refer to contributions for which both tasks were done as complete, and contributions for which only one task was done as incomplete. In this context, we develop a theoretical model that predicts (i) a positive relationship between contribution divisibility and the total number of contributions (i.e. complete and incomplete) per volunteer, (ii) an ambiguous relationship between contribution divisibility and the number of complete contributions per volunteer, and (iii) an ambiguous relationship between contribution divisibility and the value of complete contributions. We test these predictions empirically by exploiting the format change in Cyclone Center.

We find that after the format change, which decreased contribution divisibility, (i) the total number of contributions per volunteer decreased, (ii) the number of complete contributions made by anonymous volunteers increased, while that made by registered volunteers remained unchanged, and (iii) the value of complete contributions increased because anonymous volunteers, who increased their number of complete contributions, contributed high quality contributions. Our results have strategic implications for crowdsourcing platforms because they suggest that the design of crowdsourcing platforms, specifically the divisibility of contributions, is a factor that matters for their sustainability….(More)”

Google Gets Serious About Mapping Wheelchair Accessibility


Linda Poon at CityLab: “If there’s one thing Google’s got at its disposal, it’s a global army of avid map users. Now the company is leveraging that power to make its Maps feature more useful for people with mobility challenges—a group that often gets overlooked in the world of transit and urban innovation.

Google Maps already indicates if a location is wheelchair accessible—a result of a personal project by one of its employees—but its latest campaign will crowdsource data from its 30 million Local Guides worldwide, who contribute tips and photos about neighborhood establishments in exchange for points and small prizes like extra digital storage space. The company is calling on them to answer five simple questions—like whether a building has accessible entrances or bathrooms—when they submit a review for a location. In the coming weeks, Google will host workshops and “geowalks” specifically focused on mobility across seven cities, from New York City and London to Tokyo and Surabaya, Indonesia.

“The [users] have multiple motivations, and one is wanting to help their own community get around,” says Laura Slabin, Google’s director of local content and community. “So we’re leveraging the fact that people are motivated by altruism.”

But as simple as the questions seem—Is there wheelchair-accessible seating? or Is there a wheelchair-accessible elevator?—answering them requires careful attention to detail. That’s why Google even sent out a nifty tip sheet to help its physically abled members answer those questions….(More)”.

Why Information Matters


Essay by Luciano Floridi in Special Issue of Atlantis on Information, Matter and Life: “…As information technologies come to affect all areas of life, they are becoming implicated in our most important problems — their causes, effects, and solutions, the scientific investigations aimed at explaining them, the concepts created to understand them, the means of discussing them, and even, as in the case of Bill Gates, the wealth required to tackle them.

Furthermore, information technologies don’t just modify how we act in the world; they also profoundly affect how we understand the world, how we relate to it, how we see ourselves, how we interact with each other, and how our hopes for a better future are shaped. All these are old philosophical issues, of course, but we must now consider them anew, with the concept of information as a central concern.

This means that if philosophers are to help enable humanity to make sense of our world and to improve it responsibly, information needs to be a significant field of philosophical study. Among our mundane and technical concepts, information is currently not only one of the most important and widely used, but also one of the least understood. We need a philosophy of information.

How to Ask a Question

In the fall of 1999, NASA lost radio contact with its Mars Climate Orbiter, a $125 million weather satellite that had been launched the year before. In a maneuver to enter the spacecraft into orbit around Mars, the trajectory had put the spacecraft far closer to Mars than planned, so that it directly entered the planet’s atmosphere, where it probably disintegrated. The reason for this unhappy event was that for a particular software file, the Lockheed Martin engineering team had used English (imperial) units of measurement instead of the metric units specified by the agency, whose trajectory modelers assumed the data they were looking at was provided in metric.

This incident illustrates a simple lesson: successful cooperation depends on an agreement between all parties that the information being exchanged is fixed at a specified level. Wrongly assuming that everyone will follow the rules that specify the level — for example, that impulse will be expressed not as pound-seconds (the English unit) but as newton-seconds (the metric unit) — can lead to costly mistakes. Even though this principle may seem obvious, it is one of the most valuable contributions that philosophy can offer to our understanding of information. This is because, as we will see, failing to specify a level at which we ask a given philosophical question can be the reason for deep confusions and useless answers. Another simple example will help to illustrate the problem…(More)”
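The unit mismatch described in the excerpt above can be made concrete with a short sketch in which every impulse value carries its unit explicitly, so a consumer converts rather than assumes. This is an illustration only, not a reconstruction of the mission software: the `Impulse` class and its unit labels are hypothetical. The conversion factor follows from the definition of the pound-force (1 lbf = 4.4482216152605 N).

```python
from dataclasses import dataclass

# 1 lbf = 4.4482216152605 N by definition, so 1 lbf*s = 4.4482216152605 N*s.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse value tagged with its unit (hypothetical illustration)."""

    value: float
    unit: str  # either "N*s" (metric) or "lbf*s" (English)

    def to_newton_seconds(self) -> float:
        """Return the value in newton-seconds, converting if needed."""
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        # Refuse to guess: an unlabeled or unknown unit is an error.
        raise ValueError(f"unknown impulse unit: {self.unit!r}")


# A value reported in pound-seconds differs by a factor of about 4.45
# from the same number read as newton-seconds.
reported = Impulse(1.0, "lbf*s")
in_metric = reported.to_newton_seconds()
```

Raising an error on an unknown unit, instead of falling back to a default, is the design choice that matters here: the failure the essay describes came from a silent assumption about which unit was in use.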

Reclaiming personal data for the common good


Theo Bass at Nesta: “…The argument of our new report for DECODE is that more of the social value of personal data can be discovered by tools and platforms that give people the power to decide how their data is used. We need to flip the current model on its head, giving people back full control and respecting our data protection and fundamental rights framework.

The report describes how this might pave the way for a fairer distribution of the value generated by data, while opening up new use-cases that are valuable to government, society and individuals themselves. In order to achieve this vision, the DECODE project will develop and test the following:

Flexible rules to give people full control: There is currently a lack of technical and legal norms that would allow people to control and share data on their own terms. If this were possible, then people might be able to share their data for the public good, or publish it as anonymised open data under specific conditions, or for specific use-cases (say, non-commercial purposes). DECODE is working with the Making Sense project and Barcelona City Council to assist local communities with new forms of citizen sensing. The pilots will tackle the challenges of collating, storing and sharing data anonymously to influence policy on the city’s digital democracy platform Decidim (part of the D-CENT toolkit).

Trusted platforms to realise the collective value of data: Much of the opportunity will only be realised where individuals are able to pool their data together to leverage its potential economic and social value. Platform cooperatives offer a feasible model, highlighting the potential of digital technologies to help members collectively govern themselves. Effective data sharing has to be underpinned by high levels of user trust, and platform co-ops achieve this by embedding openness, respect for individual users’ privacy, and democratic participation over how decisions are made. DECODE is working with two platform co-ops – a neighbourhood social networking site called Gebied Online; and a democratic alternative to Airbnb in Amsterdam called FairBnB – to test new privacy-preserving features and granular data sharing options….(More)”

A Guide to Tactical Data Engagement


Sunlight Foundation Press Release: “A Guide to Tactical Data Engagement is a brand new resource released today designed to help city leaders and residents collaborate on increasing the social impact of open government data. Based on the core concepts of human-centered design and tactical urbanism, this approach challenges city halls to make open data programs more transparent, accountable, and participatory by actively helping residents use open government data to improve their communities.

The new guide outlines a four-step process to help readers complete a resident-informed project, product, or tool that addresses a specific community need:

  • Find a focus area by observing the community
  • Refine use cases by interviewing stakeholders
  • Design a plan by coordinating with target users
  • Implement an intervention by collaborating with actual users

Readers can carry out each of these steps “tactically” — using lightweight, adaptable, and inexpensive tactics that can realistically fit within a city hall’s or community’s unique constraints and capacities. The tactics are drawn from examples of good resident engagement around the country, and ensure that in every step, residents are collaborators in determining promising opportunities for impact through the community use of open government data.

Read the new guide to see the full process, including specific ideas at each step of the way to help your community come together and use open data to solve problems, together….”