I want your (anonymized) social media data


Anthony Sanford at The Conversation: “Social media sites’ responses to the Facebook-Cambridge Analytica scandal and new European privacy regulations have given users much more control over who can access their data, and for what purposes. To me, as a social media user, these are positive developments: It’s scary to think what these platforms could do with the troves of data available about me. But as a researcher, increased restrictions on data sharing worry me.

I am among the many scholars who depend on data from social media to gain insights into people’s actions. In a rush to protect individuals’ privacy, I worry that an unintended casualty could be knowledge about human nature. My most recent work, for example, analyzes feelings people express on Twitter to explain why the stock market fluctuates so much over the course of a single day. There are applications well beyond finance. Other scholars have studied mass transit rider satisfaction, emergency alert systems’ function during natural disasters and how online interactions influence people’s desire to lead healthy lifestyles.

This poses a dilemma – not just for me personally, but for society as a whole. Most people don’t want social media platforms to share or sell their personal information, unless specifically authorized by the individual user. But as members of a collective society, it’s useful to understand the social forces at work influencing everyday life and long-term trends. Before the recent crises, Facebook and other companies had already been making it hard for legitimate researchers to use their data, including by making it more difficult and more expensive to download and access data for analysis. The renewed public pressure for privacy means it’s likely to get even tougher….

It’s true – and concerning – that some presumably unethical people have tried to use social media data for their own benefit. But the data are not the actual problem, and cutting researchers’ access to data is not the solution. Doing so would also deprive society of the benefits of social media analysis.

Fortunately, there is a way to resolve this dilemma. Anonymization of data can keep people’s individual privacy intact, while giving researchers access to collective data that can yield important insights.

There’s even a strong model for how to strike that balance efficiently: the U.S. Census Bureau. For decades, that government agency has collected extremely personal data from households all across the country: ages, employment status, income levels, Social Security numbers and political affiliations. The results it publishes are very rich, but also not traceable to any individual.

It often is technically possible to reverse anonymity protections on data, using multiple pieces of anonymized information to identify the person they all relate to. The Census Bureau takes steps to prevent this.

For instance, when members of the public access census data, the Census Bureau restricts information that is likely to identify specific individuals, such as reporting there is just one person in a community with a particularly high- or low-income level.

For researchers the process is somewhat different, but provides significant protections both in law and in practice. Scholars have to pass the Census Bureau’s vetting process to make sure they are legitimate, and must undergo training about what they can and cannot do with the data. The penalties for violating the rules include not only being barred from using census data in the future, but also civil fines and even criminal prosecution.

Even then, what researchers get comes without a name or Social Security number. Instead, the Census Bureau uses what it calls “protected identification keys,” a random number that replaces data that would allow researchers to identify individuals.

Each person’s data is labeled with his or her own identification key, allowing researchers to link information of different types. For instance, a researcher wanting to track how long it takes people to complete a college degree could follow individuals’ education levels over time, thanks to the identification keys.

Social media platforms could implement a similar anonymization process instead of increasing hurdles – and cost – to access their data…(More)”.
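
The identification-key idea described in the excerpt above can be illustrated with a short sketch. This is not the Census Bureau's actual procedure: the function name `assign_keys`, the field names, and the sample records are all hypothetical. The sketch shows only the general technique of replacing a direct identifier with a random key that stays the same for each person, so that records can still be linked over time.

```python
import secrets

def assign_keys(records, id_field="ssn", drop_fields=("name",)):
    """Replace a direct identifier with a random per-person key (illustrative only)."""
    key_for = {}   # identifier -> random key; this lookup table stays with the data steward
    released = []
    for rec in records:
        ident = rec[id_field]
        if ident not in key_for:
            key_for[ident] = secrets.token_hex(8)  # random, not derived from the identifier
        # Copy every field except the identifier and other direct identifiers.
        clean = {k: v for k, v in rec.items() if k != id_field and k not in drop_fields}
        clean["key"] = key_for[ident]
        released.append(clean)
    return released

# Hypothetical records: the same person appears in two different years.
records = [
    {"name": "A. Example", "ssn": "000-00-0001", "year": 2010, "education": "high school"},
    {"name": "B. Example", "ssn": "000-00-0002", "year": 2010, "education": "bachelor's"},
    {"name": "A. Example", "ssn": "000-00-0001", "year": 2016, "education": "bachelor's"},
]
released = assign_keys(records)
```

In this sketch the first and third released records share a key, so a researcher could follow that person's education level over time without seeing a name or number. As the excerpt notes, the remaining attributes can sometimes still identify someone in combination, so a random key is only one layer of protection.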

Six or Seven Things Social Media Can Do For Democracy


Ethan Zuckerman: “I am concerned that we’ve not had a robust conversation about what we want social media to do for us.

We know what social media does for platform companies like Facebook and Twitter: it generates enormous masses of user-generated content that can be monetized with advertising, and reams of behavioral data that make that advertising more valuable. Perhaps we have a sense for what social media does for us as individuals, connecting us to distant friends, helping us maintain a lightweight awareness of each other’s lives even when we are not co-present. Or perhaps it’s a machine for disappointment and envy, a window into lives better lived than our own. It’s likely that what social media does for us personally is a deeply idiosyncratic question, dependent on our own lives, psyches and decisions, better discussed with our therapists than spoken about in generalities.

I’m interested in what social media should do for us as citizens in a democracy. We talk about social media as a digital public sphere, invoking Habermas and coffeehouses frequented by the bourgeoisie. Before we ask whether the internet succeeds as a public sphere, we ought to ask whether that’s actually what we want it to be.

I take my lead here from journalism scholar Michael Schudson, who took issue with a hyperbolic statement made by media critic James Carey: “journalism as a practice is unthinkable except in the context of democracy; in fact, journalism is usefully understood as another name for democracy.” For Schudson, this was a step too far. Journalism may be necessary for democracy to function well, but journalism by itself is not democracy and cannot produce democracy. Instead, we should work to understand the “Six or Seven Things News Can Do for Democracy”, the title of an incisive essay Schudson wrote to anchor his book, Why Democracies Need an Unloveable Press….

In this same spirit, I’d like to suggest six or seven things social media can do for democracy. I am neither as learned nor as wise as Schudson, so I fully expect readers to offer half a dozen functions that I’ve missed. In the spirit of Schudson’s public forum and Benkler’s digital public sphere, I offer these in the hopes of starting, not ending, a conversation.

Social media can inform us…

Social media can amplify important voices and issues…

Social media can be a tool for connection and solidarity…

Social media can be a space for mobilization…

Social media can be a space for deliberation and debate…

Social media can be a tool for showing us a diversity of views and perspectives…

Social media can be a model for democratically governed spaces…(More)”.

Technology and satellite companies open up a world of data


Gabriel Popkin at Nature: “In the past few years, technology and satellite companies’ offerings to scientists have increased dramatically. Thousands of researchers now use high-resolution data from commercial satellites for their work. Thousands more use cloud-computing resources provided by big Internet companies to crunch data sets that would overwhelm most university computing clusters. Researchers use the new capabilities to track and visualize forest and coral-reef loss; monitor farm crops to boost yields; and predict glacier melt and disease outbreaks. Often, they are analysing much larger areas than has ever been possible — sometimes even encompassing the entire globe. Such studies are landing in leading journals and grabbing media attention.

Commercial data and cloud computing are not panaceas for all research questions. NASA and the European Space Agency carefully calibrate the spectral quality of their imagers and test them with particular types of scientific analysis in mind, whereas the aim of many commercial satellites is to take good-quality, high-resolution pictures for governments and private customers. And no company can compete with Landsat’s free, publicly available, 46-year archive of images of Earth’s surface. For commercial data, scientists must often request images of specific regions taken at specific times, and agree not to publish raw data. Some companies reserve cloud-computing assets for researchers with aligned interests such as artificial intelligence or geospatial-data analysis. And although companies publicly make some funding and other resources available for scientists, getting access to commercial data and resources often requires personal connections. Still, by choosing the right data sources and partners, scientists can explore new approaches to research problems.

Mapping poverty

Joshua Blumenstock, an information scientist at the University of California, Berkeley (UCB), is always on the hunt for data he can use to map wealth and poverty, especially in countries that do not conduct regular censuses. “If you’re trying to design policy or do anything to improve living conditions, you generally need data to figure out where to go, to figure out who to help, even to figure out if the things you’re doing are making a difference.”

In a 2015 study, he used records from mobile-phone companies to map Rwanda’s wealth distribution (J. Blumenstock et al. Science 350, 1073–1076; 2015). But to track wealth distribution worldwide, patching together data-sharing agreements with hundreds of these companies would have been impractical. Another potential information source — high-resolution commercial satellite imagery — could have cost him upwards of US$10,000 for data from just one country….

Use of commercial images can also be restricted. Scientists are free to share or publish most government data or data they have collected themselves. But they are typically limited to publishing only the results of studies of commercial data, and at most a limited number of illustrative images.

Many researchers are moving towards a hybrid approach, combining public and commercial data, and running analyses locally or in the cloud, depending on need. Weiss still uses his tried-and-tested ArcGIS software from Esri for studies of small regions, and jumps to Earth Engine for global analyses.

The new offerings herald a shift from an era when scientists had to spend much of their time gathering and preparing data to one in which they’re thinking about how to use them. “Data isn’t an issue any more,” says Roy. “The next generation is going to be about what kinds of questions are we going to be able to ask?”…(More)”.

A Rule of Persons, Not Machines: The Limits of Legal Automation


Paper by Frank A. Pasquale: “For many legal futurists, attorneys’ work is a prime target for automation. They view the legal practice of most businesses as algorithmic: data (such as facts) are transformed into outputs (agreements or litigation stances) via application of set rules. These technophiles promote substituting computer code for contracts and descriptions of facts now written by humans. They point to early successes in legal automation as proof of concept. TurboTax has helped millions of Americans file taxes, and algorithms have taken over certain aspects of stock trading. Corporate efforts to “formalize legal code” may bring new efficiencies in areas of practice characterized by both legal and factual clarity.

However, legal automation can also elide or exclude important human values, necessary improvisations, and irreducibly deliberative governance. Due process, appeals, and narratively intelligible explanation from persons, for persons, depend on forms of communication that are not reducible to software. Language is constitutive of these aspects of law. To preserve accountability and a humane legal order, these reasons must be expressed in language by a responsible person. This basic requirement for legitimacy limits legal automation in several contexts, including corporate compliance, property recordation, and contracting. A robust and ethical legal profession respects the flexibility and subtlety of legal language as a prerequisite for a just and accountable social order. It ensures a rule of persons, not machines…(More)”

Algorithm Observatory: Where anyone can study any social computing algorithm.


About: “We know that social computing algorithms are used to categorize us, but the way they do so is not always transparent. To take just one example, ProPublica recently uncovered that Facebook allows housing advertisers to exclude users by race.

Even so, there are no simple and accessible resources for us, the public, to study algorithms empirically, and to engage critically with the technologies that are shaping our daily lives in such profound ways.

That is why we created Algorithm Observatory.

Part media literacy project and part citizen experiment, the goal of Algorithm Observatory is to provide a collaborative online lab for the study of social computing algorithms. The data collected through this site is analyzed to compare how a particular algorithm handles data differently depending on the characteristics of users.

Algorithm Observatory is a work in progress. This prototype only allows users to explore Facebook advertising algorithms, and the functionality is limited. We are currently looking for funding to realize the project’s full potential: to allow anyone to study any social computing algorithm….

Our future plans

This is a prototype, which only begins to showcase the things that Algorithm Observatory will be able to do in the future.

Eventually, the website will allow anyone to design an experiment involving a social computing algorithm. The platform will allow researchers to recruit volunteer participants, who will be able to contribute content to the site securely and anonymously. Researchers will then be able to conduct an analysis to compare how the algorithm handles users differently depending on individual characteristics. The results will be shared by publishing a report evaluating the social impact of the algorithm. All data and reports will become publicly available and open for comments and reviews. Researchers will be able to study any algorithm, because the site does not require direct access to the source code, but relies instead on empirical observation of the interaction between the algorithm and volunteer participants….(More)”.
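
The kind of comparison described in the excerpt above, checking whether an algorithm treats users differently depending on their characteristics, can be sketched as a simple rate comparison across groups. This is an assumption about what such an analysis might look like, not the site's own method. The function `exposure_rates`, the field names, and the observations are invented for illustration.

```python
from collections import defaultdict

def exposure_rates(observations, group_field="age_group", outcome_field="saw_ad"):
    """Share of volunteers in each group for whom the outcome was observed."""
    shown = defaultdict(int)
    total = defaultdict(int)
    for obs in observations:
        group = obs[group_field]
        total[group] += 1
        if obs[outcome_field]:
            shown[group] += 1
    return {group: shown[group] / total[group] for group in total}

# Invented observations from hypothetical volunteer participants.
observations = [
    {"age_group": "18-34", "saw_ad": True},
    {"age_group": "18-34", "saw_ad": True},
    {"age_group": "18-34", "saw_ad": False},
    {"age_group": "55+", "saw_ad": False},
    {"age_group": "55+", "saw_ad": True},
    {"age_group": "55+", "saw_ad": False},
    {"age_group": "55+", "saw_ad": False},
]
rates = exposure_rates(observations)
```

A gap between groups in a sample this small would not show anything on its own. A real study would need far more participants and a test of whether the difference could be due to chance.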

How Citizens Can Hack EU Democracy


Stephen Boucher at Carnegie Europe: “…To connect citizens with the EU’s decisionmaking center, European politicians will need to provide ways to effectively hack this complex system. These democratic hacks need to be visible and accessible, easily and immediately implementable, viable without requiring changes to existing European treaties, and capable of having a traceable impact on policy. Many such devices could be imagined around these principles. Here are three ideas to spur debate.

Hack 1: A Citizens’ Committee for the Future in the European Parliament

The European Parliament has proposed that twenty-seven of the seventy-three seats left vacant by Brexit should be redistributed among the remaining member states. According to one concept, the other forty-six unassigned seats could be used to recruit a contingent of ordinary citizens from around the EU to examine legislation from the long-term perspective of future generations. Such a “Committee for the Future” could be given the power to draft a response to a yearly report on the future produced by the president of the European Parliament, initiate debates on important political themes of their own choosing, make submissions on future-related issues to other committees, and be consulted by members of the European Parliament (MEPs) on longer-term matters.

MEPs could decide to use these forty-six vacant seats to invite this Committee for the Future to sit, at least on a trial basis, with yearly evaluations. This arrangement would have real benefits for EU politics, acting as an antidote to the union’s existential angst and helping the EU think systemically and for the longer term on matters such as artificial intelligence, biodiversity, climate concerns, demography, mobility, and energy.

Hack 2: An EU Participatory Budget

In 1989, the city of Porto Alegre, Brazil, decided to cede control of a share of its annual budget for citizens to decide upon. This practice, known as participatory budgets, has since spread globally. As of 2015, over 1,500 instances of participatory budgets have been implemented across five continents. These processes generally have had a positive impact, with people proving that they take public spending matters seriously.

To replicate these experiences at the European level, the complex realities of EU budgeting would require specific features. First, participative spending probably would need to be both local and related to wider EU priorities in order to ensure that citizens see its relevance and its wider European implications. Second, significant resources would need to be allocated to help citizens come up with and promote projects. For instance, the city of Paris has ensured that each suggested project that meets the eligibility requirements has a desk officer within its administration to liaise with the idea’s promoters. It dedicates significant resources to reach out to citizens, in particular in the poorer neighborhoods of Paris, both online and face-to-face. Similar efforts would need to be deployed across Europe. And third, in order to overcome institutional complexities, the European Parliament would need to work with citizens as part of its role in negotiating the budget with the European Council.

Hack 3: An EU Collective Intelligence Forum

Many ideas have been put forward to address popular dissatisfaction with representative democracy by developing new forums such as policy labs, consensus conferences, and stakeholder facilitation groups. Yet many citizens still feel disenchanted with representative democracy, including at the EU level, where they also strongly distrust lobby groups. They need to be involved more purposefully in policy discussions.

A yearly Deliberative Poll could be run on a matter of significance, ahead of key EU summits and possibly around the president of the commission’s State of the Union address. On the model of the first EU-wide Deliberative Poll, Tomorrow’s Europe, this event would bring together in Brussels a random sample of citizens from all twenty-seven EU member states, and enable them to discuss various social, economic, and foreign policy issues affecting the EU and its member states. This concept would have a number of advantages in terms of promoting democratic participation in EU affairs. By inviting a truly representative sample of citizens to deliberate on complex EU matters over a weekend, within the premises of the European Parliament, the European Parliament would be the focus of a high-profile event that would draw media attention. This would be especially beneficial if—unlike Tomorrow’s Europe—the poll was not held at arm’s length by EU policymakers, but with high-level national officials attending to witness good-quality deliberation remolding citizens’ views….(More)”.

Data Governance in the Digital Age


Centre for International Governance Innovation: “Data is being hailed as “the new oil.” The analogy seems appropriate given the growing amount of data being collected, and the advances made in its gathering, storage, manipulation and use for commercial, social and political purposes.

Big data and its application in artificial intelligence, for example, promises to transform the way we live and work — and will generate considerable wealth in the process. But data’s transformative nature also raises important questions around how the benefits are shared, privacy, public security, openness and democracy, and the institutions that will govern the data revolution.

The delicate interplay between these considerations means that they have to be treated jointly, and at every level of the governance process, from local communities to the international arena. This series of essays by leading scholars and practitioners, which is also published as a special report, will explore topics including the rationale for a data strategy, the role of a data strategy for Canadian industries, and policy considerations for domestic and international data governance…

RATIONALE OF A DATA STRATEGY

THE ROLE OF A DATA STRATEGY FOR CANADIAN INDUSTRIES

BALANCING PRIVACY AND COMMERCIAL VALUES

DOMESTIC POLICY FOR DATA GOVERNANCE

INTERNATIONAL POLICY CONSIDERATIONS

EPILOGUE

Ten Reasons Not to Measure Impact—and What to Do Instead


Essay by Mary Kay Gugerty & Dean Karlan in the Stanford Social Innovation Review: “Good impact evaluations—those that answer policy-relevant questions with rigor—have improved development knowledge, policy, and practice. For example, the NGO Living Goods conducted a rigorous evaluation to measure the impact of its community health model based on door-to-door sales and promotions. The evidence of impact was strong: Their model generated a 27-percent reduction in child mortality. This evidence subsequently persuaded policy makers, replication partners, and major funders to support the rapid expansion of Living Goods’ reach to five million people. Meanwhile, rigorous evidence continues to further validate the model and help to make it work even better.

Of course, not all rigorous research offers such quick and rosy results. Consider the many studies required to discover a successful drug and the lengthy process of seeking regulatory approval and adoption by the healthcare system. The same holds true for fighting poverty: Innovations for Poverty Action (IPA), a research and policy nonprofit that promotes impact evaluations for finding solutions to global poverty, has conducted more than 650 randomized controlled trials (RCTs) since its inception in 2002. These studies have sometimes provided evidence about how best to use scarce resources (e.g., give away bed nets for free to fight malaria), as well as how to avoid wasting them (e.g., don’t expand traditional microcredit). But the vast majority of studies did not paint a clear picture that led to immediate policy changes. Developing an evidence base is more like building a mosaic: Each individual piece does not make the picture, but bit by bit a picture becomes clearer and clearer.

How do these investments in evidence pay off? IPA estimated the benefits of its research by looking at its return on investment—the ratio of the benefit from the scale-up of the demonstrated large-scale successes divided by the total costs since IPA’s founding. The ratio was 74x—a huge result. But this is far from a precise measure of impact, since IPA cannot establish what would have happened had IPA never existed. (Yes, IPA recognizes the irony of advocating for RCTs while being unable to subject its own operations to that standard. Yet IPA’s approach is intellectually consistent: Many questions and circumstances do not call for RCTs.)

Even so, a simple thought exercise helps to demonstrate the potential payoff. IPA never works alone—all evaluations and policy engagements are conducted in partnership with academics and implementing organizations, and increasingly with governments. Moving from an idea to the research phase to policy takes multiple steps and actors, often over many years. But even if IPA deserves only 10 percent of the credit for the policy changes behind the benefits calculated above, the ratio of benefits to costs is still 7.4x. That is a solid return on investment.

Despite the demonstrated value of high-quality impact evaluations, a great deal of money and time has been wasted on poorly designed, poorly implemented, and poorly conceived impact evaluations. Perhaps some studies had too small of a sample or paid insufficient attention to establishing causality and quality data, and hence any results should be ignored; others perhaps failed to engage stakeholders appropriately, and as a consequence useful results were never put to use.

The push for more and more impact measurement can not only lead to poor studies and wasted money, but also distract and take resources from collecting data that can actually help improve the performance of an effort. To address these difficulties, we wrote a book, The Goldilocks Challenge, to help guide organizations in designing “right-fit” evidence strategies. The struggle to find the right fit in evidence resembles the predicament that Goldilocks faces in the classic children’s fable. Goldilocks, lost in the forest, finds an empty house with a large number of options: chairs, bowls of porridge, and beds of all sizes. She tries each but finds that most do not suit her: The porridge is too hot or too cold, the bed too hard or too soft—she struggles to find options that are “just right.” Like Goldilocks, the social sector has to navigate many choices and challenges to build monitoring and evaluation systems that fit their needs. Some will push for more and more data; others will not push for enough….(More)”.
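
The return-on-investment thought exercise in the excerpt above is simple arithmetic: the overall benefit-to-cost ratio is scaled by the share of credit attributed to IPA. The sketch below uses the 74x ratio and the 10 percent share from the excerpt. The function name `benefit_cost_ratio` and the intermediate 50 percent case are illustrative additions.

```python
def benefit_cost_ratio(total_ratio, credit_share):
    """Scale an overall benefit/cost ratio by the fraction of credit attributed."""
    return total_ratio * credit_share

# 74x is the ratio quoted in the excerpt; the 50% row is an added illustration.
for share in (1.0, 0.5, 0.1):
    print(f"credit share {share:.0%}: ratio {benefit_cost_ratio(74, share):.1f}x")
```

At a 10 percent credit share this gives the 7.4x figure quoted in the essay.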

City Data Exchange – Lessons Learned From A Public/Private Data Collaboration


Report by the Municipality of Copenhagen: “The City Data Exchange (CDE) is the product of a collaborative project between the Municipality of Copenhagen, the Capital Region of Denmark, and Hitachi. The purpose of the project is to examine the possibilities of creating a marketplace for the exchange of data between public and private organizations.

The CDE consists of three parts:

  • A collaboration between the different partners on supply and demand of specific data;
  • A platform for selling and purchasing data aimed at both public and private organizations;
  • An effort to establish further experience in the field of data exchange between public and private organizations.

In 2013, the City of Copenhagen and the Copenhagen Region decided to invest in the creation of a marketplace for the exchange of public and private sector data. The initial investment was meant as a seed towards a self-sustained marketplace. This was an innovative approach to test the readiness of the market to deliver new data-sharing solutions.

The CDE is the result of a tender by the Municipality of Copenhagen and the Capital Region of Denmark in 2015. Hitachi Consulting won the tender and has invested and worked with the Municipality of Copenhagen and the Capital Region of Denmark to establish an organization and a technical platform.

The City Data Exchange (CDE) has closed a gap in regional data infrastructure. Both public- and private-sector organizations have used the CDE to gain insights into data use cases, new external data sources, GDPR issues, and to explore the value of their data. Before the CDE was launched, there were only a few options available to purchase or sell data.

The City and the Region of Copenhagen are utilizing the insights from the CDE project to improve their internal activities and to shape new policies. The lessons from the CDE also provide insights into a wider national infrastructure for effective data sharing. Based on the insights from approximately 1000 people that the CDE has been in contact with, the recommendations are:

  • Start with the use case, as it is key to engage the data community that will use the data;
  • Create a data competence hub, where the data community can meet and get support;
  • Create simple standards and guidelines for data publishing.

The following paper presents some of the key findings from our work with the CDE. It has been compiled by Smart City Insights on behalf of the partners of the City Data Exchange project…(More)”.

Skills for a Lifetime


Nate Silver’s commencement address at Kenyon College: “….Power has shifted toward people and companies with a lot of proficiency in data science.

I obviously don’t think that’s entirely a bad thing. But it’s by no means entirely a good thing, either. You should still inherently harbor some suspicion of big, powerful institutions and their potentially self-serving and short-sighted motivations. Companies and governments that are capable of using data in powerful ways are also capable of abusing it.

What worries me the most, especially at companies like Facebook and at other Silicon Valley behemoths, is the idea that using data science allows one to remove human judgment from the equation. For instance, in announcing a recent change to Facebook’s News Feed algorithm, Mark Zuckerberg claimed that Facebook was not “comfortable” trying to come up with a way to determine which news organizations were most trustworthy; rather, the “most objective” solution was to have readers vote on trustworthiness instead. Maybe this is a good idea and maybe it isn’t — but what bothered me was the notion that Facebook could avoid responsibility for its algorithm by outsourcing the judgment to its readers.

I also worry about this attitude when I hear people use terms such as “artificial intelligence” and “machine learning” (instead of simpler terms like “computer program”). Phrases like “machine learning” appeal to people’s notion of a push-button solution — meaning, push a button, and the computer does all your thinking for you, no human judgment required.

But the reality is that working with data requires lots of judgment. First, it requires critical judgment — and experience — when drawing inferences from data. And second, it requires moral judgment in deciding what your goals are and in establishing boundaries for your work.

Let’s talk about that first type of judgment — critical judgment. The more experience you have in working with different data sets, the more you’ll realize that the correct interpretation of the data is rarely obvious, and that the obvious-seeming interpretation isn’t always correct. Sometimes changing a single assumption or a single line of code can radically change your conclusion. In the 2016 U.S. presidential election, for instance, there were a series of models that all used almost exactly the same inputs — but they ranged in giving Trump as high as roughly a one-in-three chance of winning the presidency (that was FiveThirtyEight’s model) to as low as one chance in 100, based on fairly subtle aspects of how each algorithm was designed….(More)”.