There aren’t any rules on how social scientists use private data. Here’s why we need them.


 at SSRC: “The politics of social science access to data are shifting rapidly in the United States as in other developed countries. It used to be that states were the most important source of data on their citizens, economy, and society. States needed to collect and aggregate large amounts of information for their own purposes. They gathered this directly—e.g., through censuses of individuals and firms—and also constructed relevant indicators. Sometimes state agencies helped to fund social science projects in data gathering, such as the National Science Foundation’s funding of the American National Election Survey over decades. While scholars such as James Scott and John Brewer disagreed about the benefits of state data gathering, they recognized the state’s primary role.

In this world, the politics of access to data were often the politics of engaging with the state. Sometimes the state was reluctant to provide information, either for ethical reasons (e.g. the privacy of its citizens) or self-interest. However, democratic states did typically provide access to standard statistical series and the like, and where they did not, scholars could bring pressure to bear on them. This led to well-understood rules about the common availability of standard data for many research questions and built the foundations for standard academic practices. It was relatively easy for scholars to criticize each other’s work when they were drawing on common sources. This had costs—scholars tended to ask the kinds of questions that readily available data allowed them to ask—but also significant benefits. In particular, it made research more easily reproducible.

We are now moving to a very different world. On the one hand, open data initiatives in government are making more data available than in the past (albeit often without much in the way of background resources or documentation). On the other, for many research purposes, large firms such as Google or Facebook (or even Apple) have much better data than the government. The new universe of private data is reshaping social science research in some ways that are still poorly understood. Here are some of the issues that we need to think about:…(More)”

Designing an Active, Healthier City


Meera Senthilingam in the New York Times: “Despite a firm reputation for being walkers, New Yorkers have an obesity epidemic on their hands. Lee Altman, a former employee of New York City’s Department of Design and Construction, explains it this way: “We did a very good job at designing physical activity out of our daily lives.”

According to the city’s health department, more than half of the city’s adult population is either overweight (34 percent) or obese (22 percent), and the convenience of their environment has contributed to this. “Everything is dependent on a car, elevator; you sit in front of a computer,” said Altman, “not moving around a lot.”

This is not just a New York phenomenon. Mass urbanization has caused populations the world over to reduce the amount of time they spend moving their bodies. But the root of the problem runs deep in a city’s infrastructure.

Safety, graffiti, proximity to a park, and even the appeal of stairwells all play roles in whether someone chooses to be active or not. But only recently have urban developers begun giving enough priority to these factors.

Planners in New York have now begun employing a method known as “active design” to solve the problem. The approach is part of a global movement to get urbanites onto their streets and enjoying their surroundings on foot, bike or public transport.

“We can impact public health and improve health outcomes through the way that we design,” said Altman, a former active design coordinator for New York City. She now lectures as an adjunct assistant professor in Columbia University’s urban design program.

“The communities that have the least access to well-maintained sidewalks and parks have the highest risk of obesity and chronic disease,” said Joanna Frank, executive director of the nonprofit Center for Active Design; her work focuses on creating guidelines and reports, so that developers and planners are aware, for example, that people have been “less likely to walk down streets, less likely to bike, if they didn’t feel safe, or if the infrastructure wasn’t complete, so you couldn’t get to your destination.”

Even adding items as straightforward as benches and lighting to a streetscape can greatly increase the likelihood of someone’s choosing to walk, she said.

This may seem obvious, but without evidence its importance could be overlooked. “We’ve now established that’s actually the case,” said Frank.

How can things change? According to Frank, four areas are critical: transportation, recreation, buildings and access to food….(More)”

Data as a Means, Not an End: A Brief Case Study


Tracie Neuhaus & Jarasa Kanok  in the Stanford Social Innovation Review: “In 2014, City Year—the well-known national education nonprofit that leverages young adults in national service to help students and schools succeed—was outgrowing the methods it used for collecting, managing, and using performance data. As the organization established its strategy for long-term impact, leaders identified a business problem: The current system for data collection and use would need to evolve to address the more-complex challenges the organization was undertaking. Staff throughout the organization were citing pain points one might expect, including onerous manual data collection, and long lag times to get much-needed data and reports on student attendance, grades, and academic and social-emotional assessments. After digging deeper, leaders realized they couldn’t fix the organization’s challenges with technology or improved methods without first addressing more fundamental issues. They saw City Year lacked a common “language” for the data it collected and used. Staff varied widely in their levels of data literacy, as did the scope of data-sharing agreements with the 27 urban school districts where City Year was working at the time. What’s more, its evaluation group had gradually become a default clearinghouse for a wide variety of service requests from across the organization that the group was neither designed nor staffed to address. The situation was much more complex than it appeared.

With significant technology roadmap decisions looming, City Year engaged with us to help it develop its data strategy. Together we came to realize that these symptoms were reflective of a single issue, one that exists in many organizations: City Year’s focus on data wasn’t targeted to address the very different kinds of decisions that each staff member—from the front office to the front lines—needed to make. …

Many of us in the social sector have probably seen elements of this dynamic. Many organizations create impact reports designed to satisfy external demands from donors, but these reports have little relevance to the operational or strategic choices the organizations face every day, much less address harder-to-measure, system-level outcomes. As a result, over time and in the face of constrained resources, measurement is relegated to a compliance activity, disconnected from identifying and collecting the information that directly enables individuals within the organization to drive impact. Gathering data becomes an end in itself, rather than a means of enabling ground-level work and learning how to improve the organization’s impact.

Overcoming this all-too-common “measurement drift” requires that we challenge the underlying orthodoxies that drive it and reorient measurement activities around one simple premise: Data should support better decision-making. This enables organizations to not only shed a significant burden of unproductive activity, but also drive themselves to new heights of performance.

In the case of City Year, leaders realized that to really take advantage of existing technology platforms, they needed a broader mindset shift….(More)”

Research in the Crowdsourcing Age, a Case Study


Report by  (Pew): “How scholars, companies and workers are using Mechanical Turk, a ‘gig economy’ platform, for tasks computers can’t handle

How Mechanical Turk Works

Digital age platforms are providing researchers the ability to outsource portions of their work – not just to increasingly intelligent machines, but also to a relatively low-cost online labor force comprised of humans. These so-called “online outsourcing” services help employers connect with a global pool of free-agent workers who are willing to complete a variety of specialized or repetitive tasks.

Because it provides access to large numbers of workers at relatively low cost, online outsourcing holds a particular appeal for academics and nonprofit research organizations – many of whom have limited resources compared with corporate America. For instance, Pew Research Center has experimented with using these services to perform tasks such as classifying documents and collecting website URLs. And a Google search of scholarly academic literature shows that more than 800 studies – ranging from medical research to social science – were published using data from one such platform, Amazon’s Mechanical Turk, in 2015 alone.
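For readers curious about the mechanics, here is a minimal sketch of how a requester might post a classification task programmatically, using the boto3 Mechanical Turk client against Amazon’s requester sandbox. The task title, reward, and HTML form below are invented for illustration and are not drawn from the Pew study.

```python
# A minimal, illustrative sketch of posting a document-classification task
# ("HIT") to the Mechanical Turk requester sandbox with boto3. The title,
# reward, and HTML form are placeholders, not Pew's actual task design.
import boto3

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# HTMLQuestion payload: one radio-button question per document. In a real
# task, a small script would populate assignmentId; it is omitted here.
html_question = """
<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
    <html><body>
      <form action="https://www.mturk.com/mturk/externalSubmit" method="post">
        <p>Is the following passage about government transparency?</p>
        <p>"The city council published usage statistics for its open data portal..."</p>
        <label><input type="radio" name="label" value="yes"> Yes</label>
        <label><input type="radio" name="label" value="no"> No</label>
        <input type="hidden" name="assignmentId" value="">
        <input type="submit" value="Submit">
      </form>
    </body></html>
  ]]></HTMLContent>
  <FrameHeight>400</FrameHeight>
</HTMLQuestion>
"""

response = mturk.create_hit(
    Title="Classify a short document (illustrative example)",
    Description="Read one paragraph and choose the category that fits best.",
    Keywords="classification, text, research",
    Reward="0.05",                    # payment per assignment, in USD
    MaxAssignments=3,                 # workers asked to label each document
    LifetimeInSeconds=3 * 24 * 3600,  # how long the task stays listed
    AssignmentDurationInSeconds=600,  # time allotted to each worker
    Question=html_question,
)
print("Created HIT:", response["HIT"]["HITId"])
```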

The rise of these platforms has also generated considerable commentary about the so-called “gig economy” and the possible impact it will have on traditional notions about the nature of work, the structure of compensation and the “social contract” between firms and workers. Pew Research Center recently explored some of the policy and employment implications of these new platforms in a national survey of Americans.

Proponents say this technology-driven innovation can offer employers – whether companies or academics – the ability to control costs by relying on a global workforce that is available 24 hours a day to perform relatively inexpensive tasks. They also argue that these arrangements offer workers the flexibility to work when and where they want to. On the other hand, some critics worry this type of arrangement does not give employees the same type of protections offered in more traditional work environments – while others have raised concerns about the quality and consistency of data collected in this manner.

A recent report from the World Bank found that the online outsourcing industry generated roughly $2 billion in 2013 and involved 48 million registered workers (though only 10% of them were considered “active”). By 2020, the report predicted, the industry will generate between $15 billion and $25 billion.

Amazon’s Mechanical Turk is one of the largest outsourcing platforms in the United States and has become particularly popular in the social science research community as a way to conduct inexpensive surveys and experiments. The platform has also become an emblem of the way that the internet enables new businesses and social structures to arise.

In light of its widespread use by the research community and overall prominence within the emerging world of online outsourcing, Pew Research Center conducted a detailed case study examining the Mechanical Turk platform in late 2015 and early 2016. The study utilizes three different research methodologies to examine various aspects of the Mechanical Turk ecosystem. These include human content analysis of the platform, a canvassing of Mechanical Turk workers and an analysis of third party data.

The first goal of this research was to understand who uses the Mechanical Turk platform for research or business purposes, why they use it and who completes the work assignments posted there. To evaluate these issues, Pew Research Center performed a content analysis of the tasks posted on the site during the week of Dec. 7-11, 2015.

A second goal was to examine the demographics and experiences of the workers who complete the tasks appearing on the site. This is relevant not just to fellow researchers who might be interested in using the platform, but also as a snapshot of one set of “gig economy” workers. To address these questions, Pew Research Center administered a nonprobability online survey of Turkers from Feb. 9-25, 2016, by posting a task on Mechanical Turk that rewarded workers for answering questions about their demographics and work habits. The sample of 3,370 workers contains any number of interesting findings, but it has its limits. This canvassing emerges from an opt-in sample of those who were active on MTurk during this particular period, who saw our survey and who had the time and interest to respond. It does not represent all active Turkers in this period or, more broadly, all workers on MTurk.

Finally, this report uses data collected by the online tool mturk-tracker, which is run by Dr. Panagiotis G. Ipeirotis of the New York University Stern School of Business, to examine the amount of activity occurring on the site. The mturk-tracker data are publicly available online, though the insights presented here have not been previously published elsewhere….(More)”

What is Artificial Intelligence?


Report by Mike Loukides and Ben Lorica: “Defining artificial intelligence isn’t just difficult; it’s impossible, not the least because we don’t really understand human intelligence. Paradoxically, advances in AI will help more to define what human intelligence isn’t than what artificial intelligence is.

But whatever AI is, we’ve clearly made a lot of progress in the past few years, in areas ranging from computer vision to game playing. AI is making the transition from a research topic to the early stages of enterprise adoption. Companies such as Google and Facebook have placed huge bets on AI and are already using it in their products. But Google and Facebook are only the beginning: over the next decade, we’ll see AI steadily creep into one product after another. We’ll be communicating with bots, rather than scripted robo-dialers, and not realizing that they aren’t human. We’ll be relying on cars to plan routes and respond to road hazards. It’s a good bet that in the next decades, some features of AI will be incorporated into every application that we touch and that we won’t be able to do anything without touching an application.

Given that our future will inevitably be tied up with AI, it’s imperative that we ask: Where are we now? What is the state of AI? And where are we heading?

Capabilities and Limitations Today

Descriptions of AI span several axes: strength (how intelligent is it?), breadth (does it solve a narrowly defined problem, or is it general?), training (how does it learn?), capabilities (what kinds of problems are we asking it to solve?), and autonomy (are AIs assistive technologies, or do they act on their own?). Each of these axes is a spectrum, and each point in this many-dimensional space represents a different way of understanding the goals and capabilities of an AI system.

On the strength axis, it’s very easy to look at the results of the last 20 years and realize that we’ve made some extremely powerful programs. Deep Blue beat Garry Kasparov in chess; Watson beat the best Jeopardy champions of all time; AlphaGo beat Lee Sedol, arguably the world’s best Go player. But all of these successes are limited. Deep Blue, Watson, and AlphaGo were all highly specialized, single-purpose machines that did one thing extremely well. Deep Blue and Watson can’t play Go, and AlphaGo can’t play chess or Jeopardy, even on a basic level. Their intelligence is very narrow, and can’t be generalized. A lot of work has gone into using Watson for applications such as medical diagnosis, but it’s still fundamentally a question-and-answer machine that must be tuned for a specific domain. Deep Blue has a lot of specialized knowledge about chess strategy and an encyclopedic knowledge of openings. AlphaGo was built with a more general architecture, but a lot of hand-crafted knowledge still made its way into the code. I don’t mean to trivialize or undervalue their accomplishments, but it’s important to realize what they haven’t done.

We haven’t yet created an artificial general intelligence that can solve a multiplicity of different kinds of problems. We still don’t have a machine that can listen to recordings of humans for a year or two, and start speaking. While AlphaGo “learned” to play Go by analyzing thousands of games, and then playing thousands more against itself, the same software couldn’t be used to master chess. The same general approach? Probably. But our best current efforts are far from a general intelligence that is flexible enough to learn without supervision, or flexible enough to choose what it wants to learn, whether that’s playing board games or designing PC boards.

Toward General Intelligence

How do we get from narrow, domain-specific intelligence to more general intelligence? By “general intelligence,” we don’t necessarily mean human intelligence; but we do want machines that can solve different kinds of problems without being programmed with domain-specific knowledge. We want machines that can make human judgments and decisions. That doesn’t necessarily mean that AI systems will implement concepts like creativity, intuition, or instinct, which may have no digital analogs. A general intelligence would have the ability to follow multiple pursuits and to adapt to unexpected situations. And a general AI would undoubtedly implement concepts like “justice” and “fairness”: we’re already talking about the impact of AI on the legal system….

It’s easier to think of super-intelligence as a matter of scale. If we can create “general intelligence,” it’s easy to assume that it could quickly become thousands of times more powerful than human intelligence. Or, more precisely: either general intelligence will be significantly slower than human thought, and it will be difficult to speed it up either through hardware or software; or it will speed up quickly, through massive parallelism and hardware improvements. We’ll go from thousand-core GPUs to trillions of cores on thousands of chips, with data streaming in from billions of sensors. In the first case, when speedups are slow, general intelligence might not be all that interesting (though it will have been a great ride for the researchers). In the second case, the ramp-up will be very steep and very fast….(More) (Full Report)”

Solving All the Wrong Problems


Allison Arieff in the New York Times: “Every day, innovative companies promise to make the world a better place. Are they succeeding? Here is just a sampling of the products, apps and services that have come across my radar in the last few weeks:

A service that sends someone to fill your car with gas.

A service that sends a valet on a scooter to you, wherever you are, to park your car.

A service that will film anything you desire with a drone….

We are overloaded daily with new discoveries, patents and inventions all promising a better life, but that better life has not been forthcoming for most. In fact, the bulk of the above list targets a very specific (and tiny!) slice of the population. As one colleague in tech explained it to me recently, for most people working on such projects, the goal is basically to provide for themselves everything that their mothers no longer do….When everything is characterized as “world-changing,” is anything?

Clay Tarver, a writer and producer for the painfully on-point HBO comedy “Silicon Valley,” said in a recent New Yorker article: “I’ve been told that, at some of the big companies, the P.R. departments have ordered their employees to stop saying ‘We’re making the world a better place,’ specifically because we have made fun of that phrase so mercilessly. So I guess, at the very least, we’re making the world a better place by making these people stop saying they’re making the world a better place.”

O.K., that’s a start. But the impulse to conflate toothbrush delivery with Nobel Prize-worthy good works is not just a bit cultish, it’s currently a wildfire burning through the so-called innovation sector. Products and services are designed to “disrupt” market sectors (a.k.a. bringing to market things no one really needs) more than to solve actual problems, especially those problems experienced by what the writer C. Z. Nnaemeka has described as “the unexotic underclass” — single mothers, the white rural poor, veterans, out-of-work Americans over 50 — who, she explains, have the “misfortune of being insufficiently interesting.”

If the most fundamental definition of design is to solve problems, why are so many people devoting so much energy to solving problems that don’t really exist? How can we get more people to look beyond their own lived experience?

In “Design: The Invention of Desire,” a thoughtful and necessary new book by the designer and theorist Jessica Helfand, the author brings to light an amazing kernel: “hack,” a term so beloved in Silicon Valley that it’s painted on the courtyard of the Facebook campus and is visible from planes flying overhead, is also prison slang for “horse’s ass carrying keys.”

To “hack” is to cut, to gash, to break. It proceeds from the belief that nothing is worth saving, that everything needs fixing. But is that really the case? Are we fixing the right things? Are we breaking the wrong ones? Is it necessary to start from scratch every time?…

Ms. Helfand calls for a deeper embrace of personal vigilance: “Design may provide the map,” she writes, “but the moral compass that guides our personal choices resides permanently within us all.”

Can we reset that moral compass? Maybe we can start by not being a bunch of hacks….(More)”

Bridging data gaps for policymaking: crowdsourcing and big data for development


 for the DevPolicyBlog: “…By far the biggest innovation in data collection is the ability to access and analyse (in a meaningful way) user-generated data. This is data that is generated from forums, blogs, and social networking sites, where users purposefully contribute information and content in a public way, but also from everyday activities that inadvertently or passively provide data to those that are able to collect it.

User-generated data can help identify user views and behaviour to inform policy in a timely way rather than just relying on traditional data collection techniques (census, household surveys, stakeholder forums, focus groups, etc.), which are often cumbersome, very costly, untimely, and in many cases require some form of approval or support by government.

It might seem at first that user-generated data has limited usefulness in a development context due to the importance of the internet in generating this data combined with limited internet availability in many places. However, U-Report is one example of being able to access user-generated data independent of the internet.

U-Report was initiated by UNICEF Uganda in 2011 and is a free SMS-based platform where Ugandans are able to register as “U-Reporters” and on a weekly basis give their views on topical issues (mostly related to health, education, and access to social services) or participate in opinion polls. As an example, Figure 1 shows the result from a U-Report poll on whether polio vaccinators came to U-Reporter houses to immunise all children under 5 in Uganda, broken down by districts. Presently, there are more than 300,000 U-Reporters in Uganda and more than one million U-Reporters across 24 countries that now have U-Report. As an indication of its potential impact on policymaking, UNICEF claims that every Member of Parliament in Uganda is signed up to receive U-Report statistics.

Figure 1: U-Report Uganda poll results
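As a rough illustration of how poll responses like these could be rolled up into the district-level percentages shown in Figure 1, here is a minimal sketch using invented sample records; it is not U-Report’s actual pipeline, which is not described in this excerpt.

```python
# Purely illustrative: rolling up SMS poll answers ("yes"/"no") into the
# district-level percentages a U-Report style dashboard would show. The
# sample records below are invented, not actual U-Report data.
import pandas as pd

responses = pd.DataFrame(
    [
        {"district": "Gulu", "answer": "yes"},
        {"district": "Gulu", "answer": "no"},
        {"district": "Kampala", "answer": "yes"},
        {"district": "Kampala", "answer": "yes"},
        {"district": "Mbarara", "answer": "no"},
    ]
)

pct_yes = (
    responses.assign(is_yes=responses["answer"].eq("yes"))
    .groupby("district")["is_yes"]
    .mean()
    .mul(100)
    .round(1)
)
print(pct_yes)  # share of respondents answering "yes" in each district
```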

U-Report and other platforms such as Ushahidi (which supports, for example, I PAID A BRIBE, Watertracker, election monitoring, and crowdmapping) facilitate crowdsourcing of data where users contribute data for a specific purpose. In contrast, “big data” is a broader concept because the purpose of using the data is generally independent of the reasons why the data was generated in the first place.

Big data for development is a new phrase that we will probably hear a lot more (see here [pdf] and here). The United Nations Global Pulse, for example, supports a number of innovation labs which work on projects that aim to discover new ways in which data can help better decision-making. Many forms of “big data” are unstructured (free-form and text-based rather than table- or spreadsheet-based) and so a number of analytical techniques are required to make sense of the data before it can be used.

Measures of Twitter activity, for example, can be a real-time indicator of food price crises in Indonesia [pdf] (see Figure 2 below which shows the relationship between food-related tweet volume and food inflation: note that the large volume of tweets in the grey highlighted area is associated with policy debate on cutting the fuel subsidy rate) or provide a better understanding of the drivers of immunisation awareness. In these examples, researchers “text-mine” Twitter feeds by extracting tweets related to topics of interest and categorising text based on measures of sentiment (positive, negative, anger, joy, confusion, etc.) to better understand opinions and how they relate to the topic of interest. For example, Figure 3 shows the sentiment of tweets related to vaccination in Kenya over time and the dates of important vaccination related events.

Figure 2: Plot of monthly food-related tweet volume and official food price statistics

Figure 3: Sentiment of vaccine-related tweets in Kenya
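To give a flavor of the text-mining workflow described above, here is a minimal sketch that filters tweets by topic keywords and scores them with NLTK’s VADER analyzer. VADER only produces positive/negative/neutral scores, so the richer emotion categories mentioned above (anger, joy, confusion) would require a dedicated emotion lexicon; the sample tweets and keyword list are invented.

```python
# Minimal sketch: keyword-based topic filtering plus sentiment scoring.
# VADER yields only positive/negative/neutral/compound scores; the example
# tweets and the keyword list are invented for illustration.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

tweets = [
    "Rice prices up again this week, families struggling to cope",
    "Glad the market finally has affordable vegetables today",
    "Fuel subsidy debate is pushing food prices everywhere",
]
food_keywords = {"rice", "food", "market", "vegetables", "prices"}

for text in tweets:
    # Keep only tweets mentioning at least one topic keyword.
    if not food_keywords & set(text.lower().split()):
        continue
    c = analyzer.polarity_scores(text)["compound"]
    label = "positive" if c >= 0.05 else "negative" if c <= -0.05 else "neutral"
    print(f"{label:>8}  {c:+.2f}  {text}")
```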

Another big data example is the use of mobile phone usage to monitor the movement of populations in Senegal in 2013. The data can help to identify changes in the mobility patterns of vulnerable population groups and thereby provide an early warning system to inform humanitarian response effort.
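A simplified sketch of the kind of aggregation such mobility analyses rest on: counting distinct subscribers observed per region per day from call-detail-style records, so that sudden shifts in the counts flag unusual movement. All records and region names below are invented.

```python
# Simplified illustration: daily counts of distinct subscribers seen per
# region, the basic aggregate behind mobility monitoring from call records.
# All records and region names below are invented.
import pandas as pd

cdr = pd.DataFrame(
    [
        {"subscriber": "a1", "region": "Dakar", "timestamp": "2013-03-01 08:12"},
        {"subscriber": "a1", "region": "Thies", "timestamp": "2013-03-02 19:40"},
        {"subscriber": "b2", "region": "Dakar", "timestamp": "2013-03-01 13:05"},
        {"subscriber": "c3", "region": "Thies", "timestamp": "2013-03-02 07:55"},
        {"subscriber": "c3", "region": "Thies", "timestamp": "2013-03-02 21:10"},
    ]
)
cdr["day"] = pd.to_datetime(cdr["timestamp"]).dt.date

daily_presence = (
    cdr.groupby(["day", "region"])["subscriber"]
    .nunique()
    .rename("unique_subscribers")
    .reset_index()
)
print(daily_presence)  # day-over-day changes hint at population movement
```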

The development of mobile banking too offers the potential for the generation of a staggering amount of data relevant for development research and informing policy decisions. However, it also highlights the public good nature of data collected by public and private sector institutions and the reliance that researchers have on them to access the data. Building trust and a reputation for being able to manage privacy and commercial issues will be a major challenge for researchers in this regard….(More)”

Reforms to improve U.S. government accountability


Alexander B. Howard and Patrice McDermott in Science: “Five decades after the United States first enacted the Freedom of Information Act (FOIA), Congress has voted to make the first major reforms to the statute since 2007. President Lyndon Johnson signed the first FOIA on 4 July 1966, enshrining in law the public’s right to access to information from executive branch government agencies. Scientists and others around the world can use the FOIA to learn what the U.S. government has done in its policies and practices. Proposed reforms should be a net benefit to public understanding of the scientific process and knowledge, by increasing the access of scientists to archival materials and reducing the likelihood of science and scientists being suppressed by official secrecy or bureaucracy.

Although the FOIA has been important for accountability, reform is sorely needed. An analysis of the 15 federal government agencies that received the most FOIA requests found poor to abysmal compliance rates (1, 2). In 2016, the Associated Press found that the Obama Administration had set a new record for unfulfilled FOIA requests (3). Although that has to be considered in the context of a rise in request volume without commensurate increases in resources to address them, researchers have found that most agencies simply ignore routine requests for travel schedules (4). An audit of 165 federal government agencies found that only 40% complied with the E-FOIA Act of 1996; just 67 of them had online libraries that were regularly updated with a substantial number of documents released under FOIA (5).

In the face of growing concerns about compliance, FOIA reform was one of the few recent instances of bicameral bipartisanship in Congress, with the House and Senate each passing bills this spring with broad support. Now that Congress has moved to send the Senate bill on to the president to sign into law, implementation of specific provisions will bear close scrutiny, including the potential impact of disclosure upon scientists who work in or with government agencies (6). Proposed revisions to the FOIA statute would improve how government discloses information to the public, while leaving intact exemptions for privacy, proprietary information, deliberative documents, and national security.

Features of Reforms

One of the major reforms in the House and Senate bills was to codify the “presumption of openness” outlined by President Obama the day after he took office in January 2009 when he declared that FOIA should be administered with a clear presumption: In the face of doubt, “openness” would prevail. This presumption of openness was affirmed by U.S. Attorney General Holder in March 2009. Although these declarations have had limited effect in the agencies (as described above), codifying these reforms into law is crucial not only to ensure that this remains executive branch policy after this president leaves office but also to provide requesters with legal force beyond an executive order….(More)”

Intermediation in Open Development


Katherine M. A. Reilly and Juan P. Alperin at Global Media Journal: “Open Development (OD) is a subset of ICT4D that studies the potential of IT-enabled openness to support social change among poor or marginalized populations. Early OD work examined the potential of IT-enabled openness to decentralize power and enable public engagement by disintermediating knowledge production and dissemination. However, in practice, intermediaries have emerged to facilitate open data and related knowledge production activities in development processes. We identify five models of intermediation in OD work (decentralized, arterial, ecosystem, bridging, and communities of practice) and examine the implications of each for stewardship of open processes. We conclude that studying OD through these five forms of intermediation is a productive way of understanding whether and how different patterns of knowledge stewardship influence development outcomes. We also offer suggestions for future research that can improve our understanding of how to sustain openness, facilitate public engagement, and ensure that intermediation contributes to open development….(More)”

Open data for transit app developers


Springwise: “Creating good transit apps can be difficult, given the vast amount of city (and worldwide) data app builders need to have access to. Aiming to address this, Transitland is an open platform that aggregates publicly available transport information from around the world.

The startup cleans the data sets, making them easy to use, and adds them to Mapzen, an open source mapping platform. Mapzen Turn-by-Turn is the platform’s transport planning service that, following its latest expansion, now contains data from more than 200 regions around the world on every continent except Antarctica. Transitland encourages anyone interested in transport, data and mapping to get involved, from adding data streams to sharing new apps and analyses. Mapzen Turn-by-Turn also manages all licensing related to use of the data, leaving developers free to discover and build. The platform is available to use for free.
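For developers who want a feel for the workflow, something like the following could pull route data for a bounding box from Transitland’s public REST API. The endpoint path and parameter names here are assumptions based on the v1 API as publicly documented around this time, so they should be checked against the current Transitland docs before use.

```python
# Hedged sketch: querying Transitland's public API for transit routes inside
# a bounding box. The endpoint path and parameters reflect the v1 REST API
# as documented around this time and may have changed; verify against the
# current Transitland documentation before relying on them.
import requests

BASE_URL = "https://transit.land/api/v1/routes"  # assumed v1 endpoint
params = {
    # bounding box as min_lon,min_lat,max_lon,max_lat (central San Francisco)
    "bbox": "-122.45,37.75,-122.39,37.80",
    "per_page": 5,
}

resp = requests.get(BASE_URL, params=params, timeout=30)
resp.raise_for_status()

for route in resp.json().get("routes", []):
    print(route.get("onestop_id"), "-", route.get("name"))
```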

We have seen a platform enable data sharing to help local communities and governments work better together, as well as a startup that visualizes government data so that it is easy to use for entrepreneurs….(More)”