From Theory to Practice: Open Government Data, Accountability, and Service Delivery


Report by Michael Christopher Jelenic: “Open data and open government data have recently attracted much attention as a means to innovate, add value, and improve outcomes in a variety of sectors, public and private. Although some of the benefits of open data initiatives have been assessed in the past, particularly their economic and financial returns, it is often more difficult to evaluate their social and political impacts. In the public sector, a murky theory of change has emerged that links the use of open government data with greater government accountability as well as improved service delivery in key sectors, including health and education, among others. In the absence of cross-country empirical research on this topic, this paper asks the following: Based on the evidence available, to what extent and for what reasons is the use of open government data associated with higher levels of accountability and improved service delivery in developing countries?

To answer this question, the paper constructs a unique data set that operationalizes open government data, government accountability, service delivery, as well as other intervening and control variables. Relying on data from 25 countries in Sub-Saharan Africa, the paper finds a number of significant associations between open government data, accountability, and service delivery. However, the findings suggest differentiated effects of open government data across the health and education sectors, as well as with respect to service provision and service delivery outcomes. Although this early research has limitations and does not attempt to establish a purely causal relationship between the variables, it provides initial empirical support for claims about the efficacy of open government data for improving accountability and service delivery….(More)”

The Tricky Ethics of Using YouTube Videos for Academic Research


Jane C. Hu in P/S Magazine: “…But just because something is legal doesn’t mean it’s ethical. That doesn’t mean it’s necessarily unethical, either, but it’s worth asking questions about how and why researchers use social media posts, and whether those uses could be harmful. I was once a researcher who had to obtain human-subjects approval from a university institutional review board, and I know it can be a painstaking application process with long wait times. Collecting data from individuals takes a long time too. If you could just sub in YouTube videos in place of collecting your own data, that saves time, money, and effort. But that could be at the expense of the people whose data you’re scraping.

But, you might say, if people don’t want to be studied online, then they shouldn’t post anything. But most people don’t fully understand what “publicly available” really means or its ramifications. “You might know intellectually that technically anyone can see a tweet, but you still conceptualize your audience as being your 200 Twitter followers,” Fiesler says. In her research, she’s found that the majority of people she’s polled have no clue that researchers study public tweets.

Some may disagree that it’s researchers’ responsibility to work around social media users’ ignorance, but Fiesler and others are calling for their colleagues to be more mindful about any work that uses publicly available data. For instance, Ashley Patterson, an assistant professor of language and literacy at Penn State University, ultimately decided to use YouTube videos in her dissertation work on biracial individuals’ educational experiences. That’s a decision she arrived at after carefully considering her options each step of the way. “I had to set my own levels of ethical standards and hold myself to it, because I knew no one else would,” she says. One of Patterson’s first steps was to ask herself what YouTube videos would add to her work, and whether there were any other ways to collect her data. “It’s not a matter of whether it makes my life easier, or whether it’s ‘just data out there’ that would otherwise go to waste. The nature of my question and the response I was looking for made this an appropriate piece [of my work],” she says.

Researchers may also want to consider qualitative, hard-to-quantify contextual cues when weighing ethical decisions. What kind of data is being used? Fiesler points out that tweets about, say, a television show are way less personal than ones about a sensitive medical condition. Anonymized written materials, like Facebook posts, could be less invasive than using someone’s face and voice from a YouTube video. And the potential consequences of the research project are worth considering too. For instance, Fiesler and other critics have pointed out that researchers who used YouTube videos of people documenting their experience undergoing hormone replacement therapy to train an artificial intelligence to identify trans people could be putting their unwitting participants in danger. It’s not obvious how the results of Speech2Face will be used, and, when asked for comment, the paper’s researchers said they’d prefer to quote from their paper, which pointed to a helpful purpose: providing a “representative face” based on the speaker’s voice on a phone call. But one can also imagine dangerous applications, like doxing anonymous YouTubers.

One way to get ahead of this, perhaps, is to take steps to explicitly inform participants their data is being used. Fiesler says that, when her team asked people how they’d feel after learning their tweets had been used for research, “not everyone was necessarily super upset, but most people were surprised.” They also seemed curious; 85 percent of participants said that, if their tweet were included in research, they’d want to read the resulting paper. “In human-subjects research, the ethical standard is informed consent, but inform and consent can be pulled apart; you could potentially inform people without getting their consent,” Fiesler suggests….(More)”.

How Can We Overcome the Challenge of Biased and Incomplete Data?


Knowledge@Wharton: “Data analytics and artificial intelligence are transforming our lives. Be it in health care, in banking and financial services, or in times of humanitarian crises — data determine the way decisions are made. But often, the way data is collected and measured can result in biased and incomplete information, and this can significantly impact outcomes.  

In a conversation with Knowledge@Wharton at the SWIFT Institute Conference on the Impact of Artificial Intelligence and Machine Learning in the Financial Services Industry, Alexandra Olteanu, a post-doctoral researcher at Microsoft Research, U.S. and Canada, discussed the ethical and people considerations in data collection and artificial intelligence and how we can work towards removing the biases….

….Knowledge@Wharton: Bias is a big issue when you’re dealing with humanitarian crises, because it can influence who gets help and who doesn’t. When you translate that into the business world, especially in financial services, what implications do you see for algorithmic bias? What might be some of the consequences?

Olteanu: A good example is from a new law in New York state according to which insurance companies can now use social media to decide the level for your premiums. But, they could in fact end up using incomplete information. For instance, you might be buying your vegetables from the supermarket or a farmer’s market, but these retailers might not be tracking you on social media. So nobody knows that you are eating vegetables. On the other hand, a bakery that you visit might post something when you buy from there. Based on this, the insurance companies may conclude that you only eat cookies all the time. This shows how even incomplete data can affect you….(More)”.

The 100 Questions Initiative: Sourcing 100 questions on key societal challenges that can be answered by data insights


Press Release: “The Governance Lab at the NYU Tandon School of Engineering announced the launch of the 100 Questions Initiative — an effort to identify the most important societal questions whose answers can be found in data and data science if the power of data collaboratives is harnessed.

The initiative, launched with initial support from Schmidt Futures, seeks to address challenges on numerous topics, including migration, climate change, poverty, and the future of work.

For each of these areas and more, the initiative will seek to identify questions that could help unlock the potential of data and data science with the broader goal of fostering positive social, environmental, and economic transformation. These questions will be sourced by leveraging “bilinguals” — practitioners across disciplines from all over the world who possess both domain knowledge and data science expertise.

The 100 Questions Initiative starts by identifying 10 key questions related to migration. These include questions related to the geographies of migration, migrant well-being, enforcement and security, and the vulnerabilities of displaced people. This inaugural effort involves partnerships with the International Organization for Migration (IOM) and the European Commission, both of which will provide subject-matter expertise and facilitation support within the framework of the Big Data for Migration Alliance (BD4M).

“While there have been tremendous efforts to gather and analyze data relevant to many of the world’s most pressing challenges, as a society, we have not taken the time to ensure we’re asking the right questions to unlock the true potential of data to help address these challenges,” said Stefaan Verhulst, co-founder and chief research and development officer of The GovLab. “Unlike other efforts focused on data supply or data science expertise, this project seeks to radically improve the set of questions that, if answered, could transform the way we solve 21st century problems.”

In addition to identifying key questions, the 100 Questions Initiative will also focus on creating new data collaboratives. Data collaboratives are an emerging form of public-private partnership that help unlock the public interest value of previously siloed data. The GovLab has conducted significant research in the value of data collaboration, identifying that inter-sectoral collaboration can both increase access to information (e.g., the vast stores of data held by private companies) as well as unleash the potential of that information to serve the public good….(More)”.

MegaPixels


About: “…MegaPixels is an art and research project first launched in 2017 for an installation at Tactical Technology Collective’s GlassRoom about face recognition datasets. In 2018 MegaPixels was extended to cover pedestrian analysis datasets for a commission by Elevate Arts festival in Austria. Since then MegaPixels has evolved into a large-scale interrogation of hundreds of publicly-available face and person analysis datasets, the first of which launched on this site in April 2019.

MegaPixels aims to provide a critical perspective on machine learning image datasets, one that might otherwise escape academia and industry-funded artificial intelligence think tanks that are often supported by several of the same technology companies who have created datasets presented on this site.

MegaPixels is an independent project, designed as a public resource for educators, students, journalists, and researchers. Each dataset presented on this site undergoes a thorough review of its images, intent, and funding sources. Though the goals are similar to publishing an academic paper, MegaPixels is a website-first research project, with an academic publication to follow.

One of the main focuses of the dataset investigations presented on this site is to uncover where funding originated. Because of our emphasis on other researchers’ funding sources, it is important that we are transparent about our own….(More)”.

Virtual Briefing at the Supreme Court


Paper by Alli Orr Larsen and Jeffrey L. Fisher: “The open secret of Supreme Court advocacy in a digital era is that there is a new way to argue to the Justices. Today’s Supreme Court arguments are developed online: They are dissected and explored in blog posts, fleshed out in popular podcasts, and analyzed and re-analyzed by experts who do not represent parties or have even filed a brief in the case at all. This “virtual briefing” (as we call it) is intended to influence the Justices and their law clerks but exists completely outside of traditional briefing rules. This article describes virtual briefing and makes a case that the key players inside the Court are listening. In particular, we show that the Twitter patterns of law clerks indicate they are paying close attention to producers of virtual briefing, and threads of these arguments (proposed and developed online) are starting to appear in the Court’s decisions.

We argue that this “crowdsourcing” dynamic to Supreme Court decision-making is at least worth a serious pause. There is surely merit to enlarging the dialogue around the issues the Supreme Court decides – maybe the best ideas will come from new voices in the crowd. But the confines of the adversarial process have been around for centuries, and there are significant risks that come with operating outside of it particularly given the unique nature and speed of online discussions. We analyze those risks in this article and suggest it is time to think hard about embracing virtual briefing — truly assessing what can be gained and what will be lost along the way….(More)”.

Principles and Policies for “Data Free Flow With Trust”


Paper by Nigel Cory, Robert D. Atkinson, and Daniel Castro: “Just as there was a set of institutions, agreements, and principles that emerged out of Bretton Woods in the aftermath of World War II to manage global economic issues, the countries that value the role of an open, competitive, and rules-based global digital economy need to come together to enact new global rules and norms to manage a key driver of today’s global economy: data. Japanese Prime Minister Abe’s new initiative for “data free flow with trust,” combined with Japan’s hosting of the G20 and leading role in e-commerce negotiations at the World Trade Organization (WTO), provides a valuable opportunity for many of the world’s leading digital economies (Australia, the United States, and European Union, among others) to rectify the gradual drift toward a fragmented and less-productive global digital economy. Prime Minister Abe is right in proclaiming, “We have yet to catch up with the new reality, in which data drives everything, where the D.F.F.T., the Data Free Flow with Trust, should top the agenda in our new economy,” and right in his call “to rebuild trust toward the system for international trade. That should be a system that is fair, transparent, and effective in protecting IP and also in such areas as e-commerce.”

The central premise of this effort should be a recognition that data and data-driven innovation are a force for good. Across society, data innovation (the use of data to create value) is creating more productive and innovative economies, transparent and responsive governments, and better social outcomes (improved health care, safer and smarter cities, etc.). But to maximize the innovative and productivity benefits of data, countries that support an open, rules-based global trading system need to agree on core principles and enact common rules. The benefits of a rules-based and competitive global digital economy are at risk as a diverse range of countries in various stages of political and economic development have policy regimes that undermine core processes, especially the flow of data and its associated legal responsibilities; the use of encryption to protect data and digital activities and technologies; and the blocking of data constituting illegal, pirated content….(More)”.

Privacy and Identity in a Networked Society: Refining Privacy Impact Assessment


Book by Stefan Strauß: “This book offers an analysis of privacy impacts resulting from and reinforced by technology and discusses fundamental risks and challenges of protecting privacy in the digital age.

Privacy is among the most endangered “species” in our networked society: personal information is processed for various purposes beyond our control. Ultimately, this affects the natural interplay between privacy, personal identity and identification. This book investigates that interplay from a systemic, socio-technical perspective by combining research from the social and computer sciences. It sheds light on the basic functions of privacy, their relation to identity, and how they alter with digital identification practices. The analysis reveals a general privacy control dilemma of (digital) identification shaped by several interrelated socio-political, economic and technical factors. Uncontrolled increases in the identification modalities inherent to digital technology reinforce this dilemma and benefit surveillance practices, thereby complicating the detection of privacy risks and the creation of appropriate safeguards.

Easing this problem requires a novel approach to privacy impact assessment (PIA), and this book proposes an alternative PIA framework which, at its core, comprises a basic typology of (personally and technically) identifiable information. This approach contributes to the theoretical and practical understanding of privacy impacts and thus, to the development of more effective protection standards….(More)”.

Ethics of identity in the time of big data


Paper by James Brusseau in First Monday: “Compartmentalizing our distinct personal identities is increasingly difficult in big data reality. Pictures of the person we were on past vacations resurface in employers’ Google searches; LinkedIn, which exhibits our income level, is increasingly used as a dating web site. Whether on vacation, at work, or seeking romance, our digital selves stream together.

One result is that a perennial ethical question about personal identity has spilled out of philosophy departments and into the real world. Ought we possess one, unified identity that coherently integrates the various aspects of our lives, or, incarnate deeply distinct selves suited to different occasions and contexts? At bottom, are we one, or many?

The question is not only palpable today, but also urgent because if a decision is not made by us, the forces of big data and surveillance capitalism will make it for us by compelling unity. Speaking in favor of the big data tendency, Facebook’s Mark Zuckerberg promotes the ethics of an integrated identity, a single version of selfhood maintained across diverse contexts and human relationships.

This essay goes in the other direction by sketching two ethical frameworks arranged to defend our compartmentalized identities, which amounts to promoting the dis-integration of our selves. One framework connects with natural law, the other with language, and both aim to create a sense of selfhood that breaks away from its own past, and from the unifying powers of big data technology….(More)”.

How Technology Could Revolutionize Refugee Resettlement


Krishnadev Calamur in The Atlantic: “… For nearly 70 years, the process of interviewing, allocating, and accepting refugees has gone largely unchanged. In 1951, 145 countries came together in Geneva, Switzerland, to sign the Refugee Convention, the pact that defines who is a refugee, what refugees’ rights are, and what legal obligations states have to protect them.

This process was born of the idealism of the postwar years—an attempt to make certain that those fleeing war or persecution could find safety so that horrific moments in history, such as the Holocaust, didn’t recur. The pact may have been far from perfect, but in successive years, it was a lifeline to Afghans, Bosnians, Kurds, and others displaced by conflict.

The world is a much different place now, though. The rise of populism has brought with it a concomitant hostility toward immigrants in general and refugees in particular. Last October, a gunman who had previously posted anti-Semitic messages online against HIAS killed 11 worshippers in a Pittsburgh synagogue. Many of the policy arguments over resettlement have shifted focus from humanitarian relief to security threats and cost. The Trump administration has drastically cut the number of refugees the United States accepts, and large parts of Europe are following suit.

If it works, Annie could change that dynamic. Developed at Worcester Polytechnic Institute in Massachusetts, Lund University in Sweden, and the University of Oxford in Britain, the software uses what’s known as a matching algorithm to allocate refugees with no ties to the United States to their new homes. (Refugees with ties to the United States are resettled in places where they have family or community support; software isn’t involved in the process.)

Annie’s algorithm is based on a machine learning model in which a computer is fed huge piles of data from past placements, so that the program can refine its future recommendations. The system examines a series of variables—physical ailments, age, levels of education and languages spoken, for example—related to each refugee case. In other words, the software uses previous outcomes and current constraints to recommend where a refugee is most likely to succeed. Every city where HIAS has an office or an affiliate is given a score for each refugee. The higher the score, the better the match.

This is a drastic departure from how refugees are typically resettled. Each week, HIAS and the eight other agencies that allocate refugees in the United States make their decisions based largely on local capacity, with limited emphasis on individual characteristics or needs….(More)”.
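
The scoring idea in the excerpt above (every city gets a score for each refugee case, and higher scores mean better matches, subject to local capacity) can be illustrated with a minimal sketch. This is not Annie's actual algorithm, which the article does not specify beyond "matching algorithm"; it is a simple greedy stand-in. The function name `assign_cases`, the case labels, the city names, the scores, and the capacities are all hypothetical.

```python
from typing import Dict


def assign_cases(
    scores: Dict[str, Dict[str, float]],
    capacity: Dict[str, int],
) -> Dict[str, str]:
    """Greedily place each case in the highest-scoring city with room left.

    scores[case][city] is a hypothetical match score (higher is better).
    capacity[city] is the hypothetical number of cases a city can take.
    """
    remaining = dict(capacity)  # copy so the caller's dict is not mutated
    placement: Dict[str, str] = {}

    # Flatten to (score, case, city) and visit the best-scoring pairs first.
    # Ties are broken alphabetically so the result is deterministic.
    pairs = sorted(
        (
            (score, case, city)
            for case, by_city in scores.items()
            for city, score in by_city.items()
        ),
        key=lambda t: (-t[0], t[1], t[2]),
    )

    for _score, case, city in pairs:
        # Skip pairs whose case is already placed or whose city is full.
        if case in placement or remaining.get(city, 0) <= 0:
            continue
        placement[case] = city
        remaining[city] -= 1

    return placement


# Hypothetical example data: three cases, two cities.
scores = {
    "case_a": {"Pittsburgh": 0.9, "Seattle": 0.4},
    "case_b": {"Pittsburgh": 0.8, "Seattle": 0.7},
    "case_c": {"Pittsburgh": 0.3, "Seattle": 0.6},
}
capacity = {"Pittsburgh": 1, "Seattle": 2}

placement = assign_cases(scores, capacity)
print(placement)
```

A greedy pass like this is easy to follow but does not guarantee the best total score across all cases; a production system would more plausibly solve the assignment as an optimization problem over all cases at once.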