OSoMe: The IUNI observatory on social media


Clayton A. Davis et al. at PeerJ Preprints: “The study of social phenomena is becoming increasingly reliant on big data from online social networks. Broad access to social media data, however, requires software development skills that not all researchers possess. Here we present the IUNI Observatory on Social Media, an open analytics platform designed to facilitate computational social science. The system leverages a historical, ongoing collection of over 70 billion public messages from Twitter. We illustrate a number of interactive open-source tools to retrieve, visualize, and analyze derived data from this collection. The Observatory, now available at osome.iuni.iu.edu, is the result of a large, six-year collaborative effort coordinated by the Indiana University Network Science Institute….(More)”

Design principles for engaging and retaining virtual citizen scientists


Dara M. Wald, Justin Longo, and A. R. Dobell at Conservation Biology: “Citizen science initiatives encourage volunteer participants to collect and interpret data and contribute to formal scientific projects. The growth of virtual citizen science (VCS), facilitated through websites and mobile applications since the mid-2000s, has been driven by a combination of software innovations and mobile technologies, growing scientific data flows without commensurate increases in resources to handle them, and the desire of internet-connected participants to contribute to collective outputs. However, the increasing availability of internet-based activities requires individual VCS projects to compete for the attention of volunteers and promote their long-term retention. We examined program and platform design principles that might allow VCS initiatives to compete more effectively for volunteers, increase productivity of project participants, and retain contributors over time. We surveyed key personnel engaged in managing a sample of VCS projects to identify the principles and practices they pursued for these purposes and led a team in a heuristic evaluation of volunteer engagement, website or application usability, and participant retention. We received 40 completed survey responses (33% response rate) and completed a heuristic evaluation of 20 VCS program sites. The majority of the VCS programs focused on scientific outcomes, whereas the educational and social benefits of program participation, variables that are consistently ranked as important for volunteer engagement and retention, were incidental. Evaluators indicated usability, across most of the VCS program sites, was higher and less variable than the ratings for participant engagement and retention.
In the context of growing competition for the attention of internet volunteers, increased attention to the motivations of virtual citizen scientists may help VCS programs sustain the necessary engagement and retention of their volunteers….(More)”

Big Risks, Big Opportunities: the Intersection of Big Data and Civil Rights


Latest White House report on Big Data charts pathways for fairness and opportunity but also cautions against re-encoding bias and discrimination into algorithmic systems: “Advertisements tailored to reflect previous purchasing decisions; targeted job postings based on your degree and social networks; reams of data informing predictions around college admissions and financial aid. Need a loan? There’s an app for that.

As technology advances and our economic, social, and civic lives become increasingly digital, we are faced with ethical questions of great consequence. Big data and associated technologies create enormous new opportunities to revisit assumptions and instead make data-driven decisions. Properly harnessed, big data can be a tool for overcoming longstanding bias and rooting out discrimination.

The era of big data is also full of risk. The algorithmic systems that turn data into information are not infallible—they rely on the imperfect inputs, logic, probability, and people who design them. Predictors of success can become barriers to entry; careful marketing can be rooted in stereotype. Without deliberate care, these innovations can easily hardwire discrimination, reinforce bias, and mask opportunity.

Because technological innovation presents both great opportunity and great risk, the White House has released several reports on “big data” intended to prompt conversation and advance these important issues. The topics of previous reports on data analytics included privacy, prices in the marketplace, and consumer protection laws. Today, we are announcing the latest report on big data, one centered on algorithmic systems, opportunity, and civil rights.

The first big data report warned of “the potential of encoding discrimination in automated decisions”—that is, discrimination may “be the inadvertent outcome of the way big data technologies are structured and used.” A commitment to understanding these risks and harnessing technology for good prompted us to specifically examine the intersection between big data and civil rights.

Using case studies on credit lending, employment, higher education, and criminal justice, the report we are releasing today illustrates how big data techniques can be used to detect bias and prevent discrimination. It also demonstrates the risks involved, particularly how technologies can deliberately or inadvertently perpetuate, exacerbate, or mask discrimination.

The purpose of the report is not to offer remedies to the issues it raises, but rather to identify these issues and prompt conversation, research—and action—among technologists, academics, policy makers, and citizens, alike.

The report includes a number of recommendations for advancing work in this nascent field of data and ethics. These include investing in research, broadening and diversifying technical leadership, cross-training, and expanded literacy on data discrimination, bolstering accountability, and creating standards for use within both the government and the private sector. It also calls on computer and data science programs and professionals to promote fairness and opportunity as part of an overall commitment to the responsible and ethical use of data.

Big data is here to stay; the question is how it will be used: to advance civil rights and opportunity, or to undermine them….(More)”

Citizen scientists aid Ecuador earthquake relief


Mark Zastrow at Nature: “After a magnitude-7.8 earthquake struck Ecuador’s Pacific coast on 16 April, a new ally joined the international relief effort: a citizen-science network called Zooniverse.

On 25 April, Zooniverse launched a website that asks volunteers to analyse rapidly-snapped satellite imagery of the disaster, which led to more than 650 reported deaths and 16,000 injuries. The aim is to help relief workers on the ground to find the most heavily damaged regions and identify which roads are passable.

Several crisis-mapping programmes with thousands of volunteers already exist — but it can take days to train satellites on the damaged region and to transmit data to humanitarian organizations, and results have not always proven useful. The Ecuador quake marked the first live public test for an effort dubbed the Planetary Response Network (PRN), which promises to be both more nimble than previous efforts, and to use more rigorous machine-learning algorithms to evaluate the quality of crowd-sourced analyses.

The network relies on imagery from the satellite company Planet Labs in San Francisco, California, which uses an array of shoebox-sized satellites to map the planet. In order to speed up the crowd-sourced process, it uses the Zooniverse platform to distribute the tasks of spotting features in satellite images. Machine-learning algorithms employed by a team at the University of Oxford, UK, then classify the reliability of each volunteer’s analysis and weight their contributions accordingly.

Rapid-fire data

Within 2 hours of the Ecuador test project going live with a first set of 1,300 images, each photo had been checked at least 20 times. “It was one of the fastest responses I’ve seen,” says Brooke Simmons, an astronomer at the University of California, San Diego, who leads the image processing. Steven Reece, who heads the Oxford team’s machine-learning effort, says that results — a “heat map” of damage with possible road blockages — were ready in another two hours.

In all, more than 2,800 Zooniverse users contributed to analysing roughly 25,000 square kilometres of imagery centred around the coastal cities of Pedernales and Bahia de Caraquez. That is where the London-based relief organization Rescue Global — which requested the analysis the day after the earthquake — currently has relief teams on the ground, including search dogs and medical units….(More)”
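
The reliability weighting described in the Nature piece can be illustrated with a small sketch. The article does not describe the Oxford team's actual algorithm, so everything here is an assumption made for illustration: the volunteer names, the reliability scores, and the simple rule of summing weights per label are all hypothetical.

```python
from collections import defaultdict


def weighted_consensus(labels, reliability, default_weight=0.5):
    """Pick, for each image, the label with the largest total reliability weight.

    labels: iterable of (volunteer, image_id, label) tuples.
    reliability: dict mapping volunteer -> weight (hypothetical scores in [0, 1]).
    """
    scores = defaultdict(lambda: defaultdict(float))
    for volunteer, image_id, label in labels:
        # Volunteers without a known score fall back to a neutral weight.
        scores[image_id][label] += reliability.get(volunteer, default_weight)
    return {image: max(votes, key=votes.get) for image, votes in scores.items()}


# Hypothetical classifications of one satellite image.
labels = [
    ("ana", "img1", "damaged"),
    ("ben", "img1", "intact"),
    ("cy", "img1", "intact"),
]
reliability = {"ana": 0.95, "ben": 0.40, "cy": 0.45}

consensus = weighted_consensus(labels, reliability)
print(consensus)
```

In this toy case a plain majority vote would label the image "intact", while the weighted vote sides with the single, more reliable volunteer. A real system would also have to estimate the reliability scores themselves, which this sketch takes as given.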

NEW Platform for Sharing Research on Opening Governance: The Open Governance Research Exchange (OGRX)


Andrew Young: “Today, The GovLab, in collaboration with founding partners mySociety and the World Bank’s Digital Engagement Evaluation Team, is launching the Open Governance Research Exchange (OGRX), a new platform for sharing research and findings on innovations in governance.

From crowdsourcing to nudges to open data to participatory budgeting, more open and innovative ways to tackle society’s problems and make public institutions more effective are emerging. Yet little is known about what innovations actually work, when, why, for whom and under what conditions.

And anyone seeking existing research is confronted with sources that are widely dispersed across disciplines, often locked behind pay walls, and hard to search because of the absence of established taxonomies. As the demand to confront problems in new ways grows so too does the urgency for making learning about governance innovations more accessible.

As part of GovLab’s broader effort to move from “faith-based interventions” toward more “evidence-based interventions,” OGRX curates and makes accessible the most diverse and up-to-date collection of findings on innovating governance. At launch, the site features over 350 publications spanning a diversity of governance innovation areas, including but not limited to:

Visit ogrx.org to explore the latest research findings, submit your own work for inclusion on the platform, and share knowledge with others interested in using science and technology to improve the way we govern. (More)”

How Big Data Creates False Confidence


Jesse Dunietz at Nautilus: “…A feverish push for “big data” analysis has swept through biology, linguistics, finance, and every field in between. Although no one can quite agree how to define it, the general idea is to find datasets so enormous that they can reveal patterns invisible to conventional inquiry. The data are often generated by millions of real-world user actions, such as tweets or credit-card purchases, and they can take thousands of computers to collect, store, and analyze. To many companies and researchers, though, the investment is worth it because the patterns can unlock information about anything from genetic disorders to tomorrow’s stock prices.

But there’s a problem: It’s tempting to think that with such an incredible volume of data behind them, studies relying on big data couldn’t be wrong. But the bigness of the data can imbue the results with a false sense of certainty. Many of them are probably bogus—and the reasons why should give us pause about any research that blindly trusts big data.

In the case of language and culture, big data showed up in a big way in 2011, when Google released its Ngrams tool. Announced with fanfare in the journal Science, Google Ngrams allowed users to search for short phrases in Google’s database of scanned books (about 4 percent of all books ever published!) and see how the frequency of those phrases has shifted over time. The paper’s authors heralded the advent of “culturomics,” the study of culture based on reams of data and, since then, Google Ngrams has been, well, largely an endless source of entertainment, but also a goldmine for linguists, psychologists, and sociologists. They’ve scoured its millions of books to show that, for instance, yes, Americans are becoming more individualistic; that we’re “forgetting our past faster with each passing year”; and that moral ideals are disappearing from our cultural consciousness.

WE’RE LOSING HOPE: An Ngrams chart for the word “hope,” one of many intriguing plots found by xkcd author Randall Munroe. If Ngrams really does reflect our culture, we may be headed for a dark place.

The problems start with the way the Ngrams corpus was constructed. In a study published last October, three University of Vermont researchers pointed out that, in general, Google Books includes one copy of every book. This makes perfect sense for its original purpose: to expose the contents of those books to Google’s powerful search technology. From the angle of sociological research, though, it makes the corpus dangerously skewed….

Even once you get past the data sources, there’s still the thorny issue of interpretation. Sure, words like “character” and “dignity” might decline over the decades. But does that mean that people care about morality less? Not so fast, cautions Ted Underwood, an English professor at the University of Illinois, Urbana-Champaign. Conceptions of morality at the turn of the last century likely differed sharply from ours, he argues, and “dignity” might have been popular for non-moral reasons. So any conclusions we draw by projecting current associations backward are suspect.

Of course, none of this is news to statisticians and linguists. Data and interpretation are their bread and butter. What’s different about Google Ngrams, though, is the temptation to let the sheer volume of data blind us to the ways we can be misled.

This temptation isn’t unique to Ngrams studies; similar errors undermine all sorts of big data projects. Consider, for instance, the case of Google Flu Trends (GFT). Released in 2008, GFT would count words like “fever” and “cough” in millions of Google search queries, using them to “nowcast” how many people had the flu. With those estimates, public health officials could act two weeks before the Centers for Disease Control could calculate the true numbers from doctors’ reports.

Initially, GFT was claimed to be 97 percent accurate. But as a study out of Northeastern University documents, that accuracy was a fluke. First, GFT completely missed the “swine flu” pandemic in the spring and summer of 2009. (It turned out that GFT was largely predicting winter.) Then, the system began to overestimate flu cases. In fact, it overshot the peak 2013 numbers by a whopping 140 percent. Eventually, Google just retired the program altogether.

So what went wrong? As with Ngrams, people didn’t carefully consider the sources and interpretation of their data. The data source, Google searches, was not a static beast. When Google started auto-completing queries, users started just accepting the suggested keywords, distorting the searches GFT saw. On the interpretation side, GFT’s engineers initially let GFT take the data at face value; almost any search term was treated as a potential flu indicator. With millions of search terms, GFT was practically guaranteed to over-interpret seasonal words like “snow” as evidence of flu.

But when big data isn’t seen as a panacea, it can be transformative. Several groups, like Columbia University researcher Jeffrey Shaman’s, for example, have outperformed the flu predictions of both the CDC and GFT by using the former to compensate for the skew of the latter. “Shaman’s team tested their model against actual flu activity that had already occurred during the season,” according to the CDC. By taking the immediate past into consideration, Shaman and his team fine-tuned their mathematical model to better predict the future. All it takes is for teams to critically assess their assumptions about their data….(More)”
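
The claim that, with millions of search terms, some will look like flu indicators by chance can be checked with a toy simulation. This is only a sketch under stated assumptions: the "flu" series and the "search term" series are pure synthetic noise, the sizes (10 weeks, 2,000 terms) are arbitrary, and nothing here reproduces GFT's actual model.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
weeks = 10

# Synthetic weekly flu counts: random noise with no real signal in it.
flu = [rng.gauss(0, 1) for _ in range(weeks)]

# 2,000 synthetic "search terms", each also pure noise and unrelated to flu.
terms = [[rng.gauss(0, 1) for _ in range(weeks)] for _ in range(2000)]

# Screening every term and keeping the best match still finds a strong one.
best_r = max(abs(pearson(term, flu)) for term in terms)
print(f"strongest correlation among unrelated terms: {best_r:.2f}")
```

Because every series is unrelated to the target by construction, any strong correlation the screen turns up is a fluke, which is the kind of over-interpretation the excerpt attributes to treating almost any search term as a potential indicator.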

Science to the People


David Lang on how citizen science bridges the gap between science and society: “It’s hard to find a silver lining in the water crisis in Flint, Michigan. The striking images of jugs of brown water being held high in protest are a symbol of institutional failure on a grand scale. It’s a disaster. But even as questions of accountability and remedy remain unanswered, there is already one lesson we can take away: Citizen science can be used as a powerful tool to build (or rebuild) the public’s trust in science.

Because the other striking image from Flint is this: Citizen-scientists sampling and testing their own water, from their homes and neighborhoods, and reporting the results as scientific data. Dr. Marc Edwards is the Virginia Tech civil engineering professor who led the investigation into the lead levels in Flint’s water supply, and in a February 2016 interview with The Chronicle of Higher Education, he gave an important answer about the methods his team used to obtain the data: “Normal people really appreciate good science that’s done in their interest. They stepped forward as citizen-scientists to explore what was happening to them and to their community, we provided some funding and the technical and analytical expertise, and they did all the work. I think that work speaks for itself.”

It’s a subtle but important message: The community is rising up and rallying by using science, not by reacting to it. Other scientists trying to highlight important issues and influence public opinion would do well to take note, because there’s a disconnect between what science reports and what the general public chooses to believe. For instance, 97 percent of scientists agree that the world’s climate is warming, likely due to human activities. Yet only 70 percent of Americans believe that global warming is real. Many of the most important issues of our time have the same, growing gap between scientific and societal consensus: genetically modified foods, evolution, vaccines are often widely distrusted or disputed despite strong, positive scientific evidence…..

The good news is that we’re learning. Citizen science, the growing trend of involving non-professional scientists in the process of discovery, is proving to be a supremely effective tool. It now includes far more than birders and backyard astronomers, its first amateur champions. Over the past few years, the discipline has been gaining traction and popularity in academic circles too. Involving groups of amateur volunteers is now a proven strategy for collecting data over large geographic areas or over long periods of time. Online platforms like Zooniverse have shown that even an untrained human eye can spot anomalies in everything from wildebeest migrations to Martian surfaces. For certain types of research, citizen science just works.

While a long list of peer-reviewed papers now backs up the efficacy of citizen science, and a series of papers has shown its positive impact on students’ view of science, we’re just beginning to understand the impact of that participation on the wider perception of science. Truthfully, for now, most of what we know so far about its public impact is anecdotal, as in the work in Flint, or even on our online platform for explorers, OpenExplorer….

It makes sense that citizen science should affect public perception of science. The difference between “here are the results of a study” and “please help us in the process of discovery” is profound. It’s the difference between a rote learning moment and an immersive experience. And even if not everyone is getting involved, the fact that this is possible and that some members of a community are engaging makes science instantly more relatable. It creates what Tim O’Reilly calls an “architecture of participation.” Citizen scientists create the best interface for convincing the rest of the populace.

A recent article in Nature argued that the DIY biology community was, in fact, ahead of the scientific establishment in terms of proactively thinking about the safety and ethics of rapidly advancing biotechnology tools. They had to be. For those people opening up community labs so that anyone can come and participate, public health issues can’t be pushed aside or dealt with later. After all, they are the public that will be affected….(More)”

Juries as Problem Solving Institutions


Series of interviews on Collective Problem Solving by Henry Farrell: “Over the last two years, a group of scholars from disciplines including political science, political theory, cognitive psychology, information science, statistics and computer science have met under the auspices of the MacArthur Foundation Research Network on Opening Governance. The goal of these meetings has been to bring the insights of different disciplines to bear on fundamental problems of collective problem solving. How do we best solve collective problems? How should we study and think about collective intelligence? How can we apply insights to real world problems? A wide body of work leads us to believe that complex problems are most likely to be solved when people with different viewpoints and sets of skills come together. This means that we can expect that the science of collective problem solving too will be improved when people from diverse disciplinary perspectives work together to generate new insights on shared problems.

Political theorists are beginning to think in different ways about institutions such as juries. Here, the crucial insights will involve how these institutions can address the traditional concerns of political theory, such as justice and recognition, while also solving the complex problem of figuring out how best to resolve disputes, and establishing the guilt or innocence of parties in criminal cases.

Melissa Schwartzberg is an associate professor of political science at New York University, working on the political theory of democratic decision making. I asked her a series of questions about the jury as a problem-solving institution.

Henry: Are there any general ways for figuring out the kinds of issues that juries (based on random selection of citizens and some voting rule) are good at deciding on, and the issues that they might have problems with?

Melissa: This is a difficult question, in part because we don’t have unmediated access to the “true state of the world”: our evidence about jury competence essentially derives from the correlation of jury verdicts with what the judge would have rendered, but obviously that doesn’t mean that the judge was correct. One way around the question is to ask instead what, historically, have been the reasons why we would wish to assign judgment to laypersons: what the “jury of one’s peers” signifies. Placing a body of ordinary citizens between the state and the accused serves as an important protective device, so the use of the jury is quite clearly not all about judgment. But there is a long history of thinking that juries have special access to local knowledge – the established norms, practices, and expectations of a community, but in early periods knowledge of the parties and the alleged crime – that helps to shed light on why we still think “vicinage” is important…..(More)”

Accountable Algorithms


Paper by Joshua A. Kroll et al: “Many important decisions historically made by people are now made by computers. Algorithms count votes, approve loan and credit card applications, target citizens or neighborhoods for police scrutiny, select taxpayers for an IRS audit, and grant or deny immigration visas.

The accountability mechanisms and legal standards that govern such decision processes have not kept pace with technology. The tools currently available to policymakers, legislators, and courts were developed to oversee human decision-makers and often fail when applied to computers instead: for example, how do you judge the intent of a piece of software? Additional approaches are needed to make automated decision systems — with their potentially incorrect, unjustified or unfair results — accountable and governable. This Article reveals a new technological toolkit to verify that automated decisions comply with key standards of legal fairness.

We challenge the dominant position in the legal literature that transparency will solve these problems. Disclosure of source code is often neither necessary (because of alternative techniques from computer science) nor sufficient (because of the complexity of code) to demonstrate the fairness of a process. Furthermore, transparency may be undesirable, such as when it permits tax cheats or terrorists to game the systems determining audits or security screening.

The central issue is how to assure the interests of citizens, and society as a whole, in making these processes more accountable. This Article argues that technology is creating new opportunities — more subtle and flexible than total transparency — to design decision-making algorithms so that they better align with legal and policy objectives. Doing so will improve not only the current governance of algorithms, but also — in certain cases — the governance of decision-making in general. The implicit (or explicit) biases of human decision-makers can be difficult to find and root out, but we can peer into the “brain” of an algorithm: computational processes and purpose specifications can be declared prior to use and verified afterwards.

The technological tools introduced in this Article apply widely. They can be used in designing decision-making processes from both the private and public sectors, and they can be tailored to verify different characteristics as desired by decision-makers, regulators, or the public. By forcing a more careful consideration of the effects of decision rules, they also engender policy discussions and closer looks at legal standards. As such, these tools have far-reaching implications throughout law and society.

Part I of this Article provides an accessible and concise introduction to foundational computer science concepts that can be used to verify and demonstrate compliance with key standards of legal fairness for automated decisions without revealing key attributes of the decision or the process by which the decision was reached. Part II then describes how these techniques can assure that decisions are made with the key governance attribute of procedural regularity, meaning that decisions are made under an announced set of rules consistently applied in each case. We demonstrate how this approach could be used to redesign and resolve issues with the State Department’s diversity visa lottery. In Part III, we go further and explore how other computational techniques can assure that automated decisions preserve fidelity to substantive legal and policy choices. We show how these tools may be used to assure that certain kinds of unjust discrimination are avoided and that automated decision processes behave in ways that comport with the social or legal standards that govern the decision. We also show how algorithmic decision-making may even complicate existing doctrines of disparate treatment and disparate impact, and we discuss some recent computer science work on detecting and removing discrimination in algorithms, especially in the context of big data and machine learning. And lastly in Part IV, we propose an agenda to further synergistic collaboration between computer science, law and policy to advance the design of automated decision processes for accountability….(More)”
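
The idea that rules can be "declared prior to use and verified afterwards" can be illustrated with a hash-based commitment. The abstract does not name the specific techniques the paper uses, so this is an assumption chosen for illustration rather than the authors' method. The rule text and function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def commit(rules: bytes) -> tuple[str, bytes]:
    """Return (digest, nonce): publish the digest now, keep the nonce and rules private."""
    nonce = secrets.token_bytes(32)  # random salt so the rules cannot be guessed from the digest
    digest = hashlib.sha256(nonce + rules).hexdigest()
    return digest, nonce


def verify(digest: str, nonce: bytes, rules: bytes) -> bool:
    """Check that the later-revealed rules match the digest published earlier."""
    recomputed = hashlib.sha256(nonce + rules).hexdigest()
    return hmac.compare_digest(recomputed, digest)


# Hypothetical decision rule announced (as a digest only) before any decisions are made.
rules = b"select applicants uniformly at random from all eligible entries"
published_digest, nonce = commit(rules)

# Afterwards, an overseer is shown the nonce and rules and checks them against the digest.
print(verify(published_digest, nonce, rules))
print(verify(published_digest, nonce, b"select applicants by surname"))
```

The published digest reveals nothing useful about the rules in advance, yet any later substitution of a different rule fails the check. This shows only that the announced rules were fixed beforehand; demonstrating that they were actually applied in each case would need further machinery that this sketch does not attempt.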

A New Dark Age Looms


William B. Gail in the New York Times: “Imagine a future in which humanity’s accumulated wisdom about Earth — our vast experience with weather trends, fish spawning and migration patterns, plant pollination and much more — turns increasingly obsolete. As each decade passes, knowledge of Earth’s past becomes progressively less effective as a guide to the future. Civilization enters a dark age in its practical understanding of our planet.

To comprehend how this could occur, picture yourself in our grandchildren’s time, a century hence. Significant global warming has occurred, as scientists predicted. Nature’s longstanding, repeatable patterns — relied on for millenniums by humanity to plan everything from infrastructure to agriculture — are no longer so reliable. Cycles that have been largely unwavering during modern human history are disrupted by substantial changes in temperature and precipitation….

Our foundation of Earth knowledge, largely derived from historically observed patterns, has been central to society’s progress. Early cultures kept track of nature’s ebb and flow, passing improved knowledge about hunting and agriculture to each new generation. Science has accelerated this learning process through advanced observation methods and pattern discovery techniques. These allow us to anticipate the future with a consistency unimaginable to our ancestors.

But as Earth warms, our historical understanding will turn obsolete faster than we can replace it with new knowledge. Some patterns will change significantly; others will be largely unaffected, though it will be difficult to say what will change, by how much, and when.

The list of possible disruptions is long and alarming. We could see changes to the prevalence of crop and human pests, like locust plagues set off by drought conditions; forest fire frequency; the dynamics of the predator-prey food chain; the identification and productivity of reliably arable land, and the predictability of agriculture output.

Historians of the next century will grasp the importance of this decline in our ability to predict the future. They may mark the coming decades of this century as the period during which humanity, despite rapid technological and scientific advances, achieved “peak knowledge” about the planet it occupies. They will note that many decades may pass before society again attains the same level.

One exception to this pattern-based knowledge is the weather, whose underlying physics governs how the atmosphere moves and adjusts. Because we understand the physics, we can replicate the atmosphere with computer models. Monitoring by weather stations and satellites provides the starting point for the models, which compute a forecast for how the weather will evolve. Today, forecast accuracy based on such models is generally good out to a week, sometimes even two.

But farmers need to think a season or more ahead. So do infrastructure planners as they design new energy and water systems. It may be feasible to develop the science and make the observations necessary to forecast weather a month or even a season in advance. We are also coming to understand enough of the physics to make useful global and regional climate projections a decade or more ahead.

The intermediate time period is our big challenge. Without substantial scientific breakthroughs, we will remain reliant on pattern-based methods for time periods between a month and a decade. … Our best knowledge is built on what we have seen in the past, like how fish populations respond to El Niño’s cycle. Climate change will further undermine our already limited ability to make these predictions. Anticipating ocean resources from one year to the next will become harder.

Civilization’s understanding of Earth has expanded enormously in recent decades, making humanity safer and more prosperous. As the patterns that we have come to expect are disrupted by warming temperatures, we will face huge challenges feeding a growing population and prospering within our planet’s finite resources. New developments in science offer our best hope for keeping up, but this is by no means guaranteed….(More)”