Data Activism


ˈdeɪtə ˈæktɪˌvɪzəm

New social practices enabled by data and technology which aim to create political change (Milan and Gutiérrez).

The large-scale generation of data over the past decade has given rise to data activism, which Stefania Milan and Miren Gutiérrez, scholars of technology and society at the University of Amsterdam and the University of Deusto, respectively, define as “new social practices rooted in technology and data.” They elaborate:

“Data activism indicates social practices that take a critical approach to big data. Examples include the collective mapping and geo-referencing of the messages of victims of natural disasters in order to facilitate disaster relief operations, or the elaboration of open government data for advocacy and campaigning. But data activism also embraces tactics of resistance to massive data collection by private companies and governments, such as the encryption of private communication, or obfuscation tactics that put sand into the data collection machine.”

Milan and Gutiérrez elaborate on these two forms of data activism in their paper “Technopolitics in the Age of Big Data,” where they argue that all data activism is either proactive or reactive:

“We identify two forms of data activism: proactive data activism, whereby citizens take advantage of the possibilities offered by big data infrastructure for advocacy and social change, and reactive data activism, namely grassroots efforts aimed at resisting massive data collection and protecting users from malicious snooping.”

An example of reactive data activism comes from the Media Action Grassroots Network, a U.S.-based network of social justice organizations that provides digital security training to grassroots activists working on racial justice issues.

An example of proactive data activism is discussed in “Data witnessing: attending to injustice with data in Amnesty International’s Decoders project,” in which critical data scholar Jonathan Gray examines “what digital data practices at Amnesty International’s Decoders initiative can add to the understanding of witnessing.” According to Gray, witnessing is a concept used in law, religion, and media, among other fields, to explore how evidence and experience are constructed. In the paper, Gray discusses four data witnessing projects:

“(i) witnessing historical abuses with structured data from digitised documents; (ii) witnessing the destruction of villages with satellite imagery and machine learning; (iii) witnessing environmental injustice with company reports and photographs; and (iv) witnessing online abuse through the classification of Twitter data. These projects illustrate the configuration of experimental apparatuses for witnessing injustices with data.”

More recently, proactive data activism has produced several notable examples. Civil rights activists in Zanesville, Ohio, used data to demonstrate inequitable access to clean water between predominantly white and predominantly Black communities. A collective of activists, organizers, and mathematicians formed Data 4 Black Lives to promote justice for Black communities through data and data science. Finally, to monitor government accountability in reporting COVID-19 case data, Indonesian activists created a platform where citizens can independently report COVID-19 cases.

Multisolving


ˌmʌltiˈsɑlvɪŋ

Pooling expertise, funding, and political will to solve multiple problems with a single investment of time and money (Sawin, 2018).

Elizabeth Sawin, co-director of Climate Interactive, a not-for-profit energy and environment think tank, wrote an article on multisolving in the Stanford Social Innovation Review (SSIR) after a year-long study of how the approach has been implemented for climate and health. Defined as a way of solving multiple problems with a single investment of time and money, multisolving brings together stakeholders from different sectors and disciplines to tackle public issues cost-effectively.

In the article, Sawin provides examples of multisolving that have been implemented in countries across the globe:

In Japan, manufacturing facilities use “green curtains”—living panels of climbing plants—to clean the air, provide vegetables for company cafeterias, and reduce energy use for cooling. A walk-to-school program in the United Kingdom fights a decline in childhood physical activity while reducing traffic congestion and greenhouse gas emissions from transportation. A food-gleaning program staffed by young volunteers and families facing food insecurity in Spain addresses food waste, hunger, and a desire for sustainability.

A Climate Interactive report provides three principles and three practices that can help stakeholders develop a multisolving strategy. In the SSIR article, Sawin summarizes the three principles. First, she argues that a solution must serve everyone in a system, without exception. Second, she suggests that multisolvers must recognize that problems are multifaceted and that multisolving provides solutions to multiple facets of a big issue. Third, she posits that experimentation and learning are key to measuring the success of multisolving.

In the article, Sawin also outlines three good multisolving practices. First, she identifies openness to collaboration with actors from different sectors or groups in society as a critical ingredient in developing a multisolving strategy. Second, she stresses the importance of learning, documenting, and improving to ensure that multisolving delivers the greatest benefit to the public. Finally, she argues that communicating the benefits of multisolving to various stakeholders can help generate buy-in for a multisolving project.

In concluding the article, Sawin wrote “[n]one of these multisolving principles or tools, on their own, are revolutionary. They need no new apps or state-of-the-art techniques to work. What makes multisolving unique is that it weaves together these principles and practices in a way that builds over time to create big results.”

Informational Autocrats


ˌɪnfərˈmeɪʃənəl ˈɔtəˌkræts

Rulers who control and manipulate information in order to maintain power. (Guriev and Treisman, 2019)

In their paper “Informational Autocrats,” Sergei Guriev (Professor of Economics, Sciences Po, Paris) and Daniel Treisman (Professor of Political Science, University of California, Los Angeles) introduce a term for a new, more surreptitious type of authoritarian leader. The authors write:

“In this article, we document the changing characteristics of authoritarian states worldwide. Using newly collected data, we show that recent autocrats employ violent repression and impose official ideologies far less often than their predecessors. They also appear more prone to conceal rather than to publicize cases of state brutality. Analyzing texts of leaders’ speeches, we show that ‘informational autocrats’ favor a rhetoric of economic performance and provision of public services that resembles that of democratic leaders far more than it does the discourse of threats and fear embraced by old-style dictators. Authoritarian leaders are increasingly mimicking democracy by holding elections and, where necessary, falsifying the results.”

Today, informational autocrats often employ “cyber troops” to spread disinformation, specifically targeting and exploiting the “uninformed masses” to advance their interests. Guriev and Treisman further argue:

“A key element in our theory of informational autocracy is the gap in political knowledge between the “informed elite” and the general public. While the elite accurately observes the limitations of an incompetent incumbent, the public is susceptible to the ruler’s propaganda. Using individual-level data from the Gallup World Poll, we show that such a gap does indeed exist in many authoritarian states today. Unlike in democracies, where the highly educated are more likely than others to approve of their government, in authoritarian states the highly educated tend to be more critical. The highly educated are also more aware of media censorship than their less-schooled compatriots.”

Writing separately in Foreign Affairs, Andrea Kendall-Taylor, Erica Frantz, and Joseph Wright echo this argument:

“Dictatorships can also use new technologies to shape public perception of the regime and its legitimacy. Automated accounts (or “bots”) on social media can amplify influence campaigns and produce a flurry of distracting or misleading posts that crowd out opponents’ messaging.”

Additionally:

“Digital tools might even help regimes make themselves appear less repressive and more responsive to their citizens. In some cases, authoritarian regimes have deployed new technologies to mimic components of democracy, such as participation and deliberation.”

The globalization of ideas and technological advances have together created a hostile environment for traditional, overt dictatorship. At the same time, informational autocrats have exploited this same combination to advance their own interests. Promoting accountability across all sectors, for example through open government data and algorithmic transparency, can help counter such efforts to control and manipulate information.

Kludge


ˈklʌdʒ

A clumsy but temporarily effective solution to a particular problem (Oxford English Dictionary).

The term kludge is often used in the world of computer programming to refer to an inelegant temporary patch intended to solve a problem.

In an article for the Washington Post, Mike Konczal, a fellow at the Roosevelt Institute, discusses how kludges also appear in policymaking. Konczal argues that, in a well-intentioned effort to make governing simpler, policymakers tend to adopt simple fixes instead of policies that would actually make the decision-making process simple.

Policies that make the decision-making process simple can involve “nudges,” a behavioral economics concept proposed by Richard Thaler and Cass Sunstein. In the article, Konczal writes:

“A simple policy is one that simply “nudges” people into one choice or another using a variety of default rules, disclosure requirements, and other market structures. Think, for instance, of rules that require fast-food restaurants to post calories on their menus, or a mortgage that has certain terms clearly marked in disclosures.

“These sorts of regulations are deemed “choice preserving.” Consumers are still allowed to buy unhealthy fast-food meals or sign up for mortgages they can’t reasonably afford. The regulations are just there to inform people about their choices. These rules are designed to keep the market “free,” where all possibilities are ultimately possible, although there are rules to encourage certain outcomes.”

On the other hand, there are policy “kludges,” which, according to Steve Teles, a professor of political science at Johns Hopkins University, characterize the current state of public policy in the United States, as reflected in the complexity of its healthcare, education, and environmental protection systems. Teles further argues that “America has chosen to govern itself through more indirect and incoherent policy mechanisms than can be found in any comparable country.”

According to Teles, these kludges can accumulate into costly, complex arrangements with no clear underlying principles. Continued iteration of policy kludges has increased the transaction costs individuals face in accessing services and the compliance costs borne by government and business, and it has created unequal opportunities for individuals and institutions to benefit from democracy. In Teles’ words, the costs of kludges are as follows:

“The most insidious feature of kludgeocracy is the hidden, indirect, and frequently corrupt distribution of its costs. Those costs can be put into three categories — costs borne by individual citizens, costs borne by the government that must implement the complex policies, and costs to the character of our democracy.”

Technochauvinism


ˈtɛknoʊˈʃoʊvəˌnɪzəm

The belief that technology is always the solution (Broussard, 2018).

Since its rise in the late 20th century, digital and computer technology has promised to improve many aspects of how society operates. Personal computers, mobile phones, and the internet are among the most ubiquitous examples of technologies that have demonstrably made life easier, at least to some extent.

However, recent years have seen a growing techlash, defined by the Oxford English Dictionary as “a strong and widespread negative reaction to the growing power and influence of large technology companies, particularly those based in Silicon Valley,” in response to the harms that technology has helped create. Misinformation, privacy violations, and algorithmic bias are now often found in the same sentence as one or more tech companies.

Computer scientist and data journalist Meredith Broussard, a professor at New York University, argues that these problems stem from technochauvinism: the belief that technology is always the solution. A summary of her book, Artificial Unintelligence, reads:

“… it’s just not true that social problems would inevitably retreat before a digitally enabled Utopia. To prove her point, she undertakes a series of adventures in computer programming. She goes for an alarming ride in a driverless car, concluding “the cyborg future is not coming any time soon”; uses artificial intelligence to investigate why students can’t pass standardized tests; deploys machine learning to predict which passengers survived the Titanic disaster; and attempts to repair the U.S. campaign finance system by building AI software. If we understand the limits of what we can do with technology, Broussard tells us, we can make better choices about what we should do with it to make the world better for everyone.”

The term technochauvinism is similar to technosolutionism, in that both describe the belief that most, if not all, complex issues can be solved with the right computation and engineering. However, the word “chauvinism” is deliberate: part of the critique concerns the rampant gender inequality in the tech industry, which manifests in many ways, including algorithmic sexism.

“In Artificial Unintelligence, Meredith Broussard argues that our collective enthusiasm for applying computer technology to every aspect of life has resulted in a tremendous amount of poorly designed systems. We are so eager to do everything digitally—hiring, driving, paying bills, even choosing romantic partners—that we have stopped demanding that our technology actually work. Broussard, a software developer and journalist, reminds us that there are fundamental limits to what we can (and should) do with technology. With this book, she offers a guide to understanding the inner workings and outer limits of technology—and issues a warning that we should never assume that computers always get things right.”

Nowcasting


naʊˈkæstɪŋ

A method of describing the present or the near future by analyzing datasets that are not traditionally included in the analysis (e.g., web searches, reviews, and social media data).

The term nowcasting originates in meteorology, where it refers to “the detailed description of the current weather along with forecasts obtained by extrapolation for a period of 0 to 6 hours ahead.” Today, nowcasting is also used in other fields, such as macroeconomics and health, to provide more up-to-date statistics.

Traditionally, macroeconomic statistics are collected on a quarterly basis and released with a substantial lag. For example, GDP data for the euro area “is only available at quarterly frequency and is released six weeks after the close of the quarter.” Further, economic datasets from government agencies such as the US Census Bureau “typically appear only after multi-year lags, and the public-facing versions are aggregated to the county or ZIP code level.”

The arrival of the big data era has shown some promise for improving nowcasting. A paper by Edward L. Glaeser, Hyunjin Kim, and Michael Luca presents “evidence that Yelp data can complement government surveys by measuring economic activity in close to real time, at a granular level, and at almost any geographic scale.” The authors conclude:

“Our analyses of one possible data source, Yelp, suggests that these new data sources can be a useful complement to official government data. Yelp can help predict contemporaneous changes in the local economy. It can also provide a snapshot of economic change at the local level. It is a useful addition to the data tools that local policy-makers can access.

“Yet our analysis also highlights the challenges with the idea of replacing the Census altogether at any point in the near future. Government statistical agencies invest heavily in developing relatively complete coverage, for a wide set of metrics. The variation in coverage inherent in data from online platforms make it difficult to replace the role of providing official statistics that government data sources play.

“Ultimately, data from platforms like Yelp –combined with official government statistics – can provide valuable complementary datasets that will ultimately allow for more timely and granular forecasts and policy analyses, with a wider set of variables and more complete view of the local economy.”
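As a rough illustration of how a platform signal can complement official statistics, the sketch below fits a straight line relating past yearly changes in the number of businesses listed on a platform to past changes in officially counted establishments, then uses this year's platform figure to nowcast the official number before it is released. The data are invented, the helper name `fit_line` is hypothetical, and a single-variable least-squares fit is a deliberate simplification; it is not the specification Glaeser, Kim, and Luca used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical past years for one area: yearly change in businesses listed on
# the platform (x) and in officially counted establishments (y).
platform_change = [1.0, 2.0, 3.0, 4.0, 5.0]
official_change = [2.9, 5.2, 6.8, 9.1, 11.0]

intercept, slope = fit_line(platform_change, official_change)

# This year the platform shows a change of 6.0, but official data are not out yet.
nowcast = intercept + slope * 6.0
print(round(nowcast, 2))  # 13.03
```

The fitted relationship is only as good as the platform's coverage of the area, which is exactly the limitation the authors raise about replacing official statistics.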

Another example comes from the United States Federal Reserve (the Fed), whose researchers used data from the payroll-processing company ADP to measure payroll employment. These data are traditionally provided by the Current Employment Statistics (CES) survey, which is “one of the most carefully conducted measures of labor market activity and uses an extremely large sample,” yet “it is still subject to significant sampling error and nonsampling errors.” The Fed researchers sought to improve the reliability of this measure by incorporating ADP data, and the study found that combining CES and ADP data “reduces the error inherent in both data sources.”
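Why can pooling two imperfect sources reduce error? A minimal sketch, assuming the two estimates are unbiased and have independent errors, is the textbook inverse-variance weighted average: each source is weighted by its precision, and the pooled variance is smaller than either input's. The Fed study used its own, more elaborate model; this is not that model, and the `Estimate` type, `combine` function, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    value: float     # e.g., monthly change in payroll employment (thousands of jobs)
    variance: float  # error variance of that estimate


def combine(a: Estimate, b: Estimate) -> Estimate:
    """Inverse-variance weighted average of two independent, unbiased estimates.

    Each source is weighted by its precision (1 / variance), so the noisier
    source counts for less. The pooled variance, 1 / (wa + wb), is never
    larger than the smaller of the two input variances.
    """
    wa = 1.0 / a.variance
    wb = 1.0 / b.variance
    value = (wa * a.value + wb * b.value) / (wa + wb)
    return Estimate(value=value, variance=1.0 / (wa + wb))


# Hypothetical numbers, not actual CES or ADP figures.
ces = Estimate(value=180.0, variance=64.0 ** 2)
adp = Estimate(value=150.0, variance=80.0 ** 2)
pooled = combine(ces, adp)
print(round(pooled.value, 1), round(pooled.variance ** 0.5, 1))  # 168.3 50.0
```

In this illustration the pooled standard error (about 50) is below both inputs (64 and 80). If the errors of the two sources were correlated or one were biased, the simple weights above would no longer be appropriate.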

However, nowcasting with big data has its limitations. A team of researchers evaluated the accuracy of Google Flu Trends (GFT) in the 2012-2013 and 2013-2014 seasons. GFT used flu-related Google searches to make its predictions, and the study found that it significantly overestimated flu activity compared with the flu estimates of the Centers for Disease Control and Prevention (CDC).

Writing in Nautilus, Jesse Dunietz describes how to address the limitations of big data and make nowcasting efforts more accurate:

“But when big data isn’t seen as a panacea, it can be transformative. Several groups, like Columbia University researcher Jeffrey Shaman’s, for example, have outperformed the flu predictions of both the CDC and GFT by using the former to compensate for the skew of the latter. “Shaman’s team tested their model against actual flu activity that had already occurred during the season,” according to the CDC. By taking the immediate past into consideration, Shaman and his team fine-tuned their mathematical model to better predict the future. All it takes is for teams to critically assess their assumptions about their data.”
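The idea of using recent official data to correct the skew of a search-based signal can be shown with a deliberately simple sketch: estimate how far the search-based figures have recently been off from the official ones, then rescale this week's search-based figure accordingly. This ratio calibration is an assumption chosen for illustration; Shaman's team used their own mathematical model of flu activity, not this method. The function names, the three-week window, and the numbers are hypothetical.

```python
def skew_factor(search_based, official, window):
    """Mean ratio of official to search-based values over the last `window` weeks."""
    pairs = list(zip(search_based, official))[-window:]
    return sum(o / s for s, o in pairs) / len(pairs)


def corrected_nowcast(search_based, official, current_search, window=3):
    """Rescale this week's search-based estimate by the recently observed skew.

    `official` covers only past weeks, because official figures arrive with a
    lag; `search_based` holds the search-based estimates for those same weeks.
    """
    return current_search * skew_factor(search_based, official, window)


# Hypothetical weekly values (% of outpatient visits for influenza-like illness).
past_search = [4.0, 5.0, 6.0, 8.0]
past_official = [2.0, 2.5, 3.0, 4.0]

print(corrected_nowcast(past_search, past_official, current_search=10.0))  # 5.0
```

Here the search-based series has been running at roughly double the official one, so the raw reading of 10.0 is scaled back to 5.0. A real system would also need to detect when that skew itself changes.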

Bilingual


baɪˈlɪŋgwəl

Practitioners across disciplines who possess both domain knowledge and data science expertise.

The Governance Lab (GovLab) at the NYU Tandon School of Engineering launched the 100 Questions Initiative, “an effort to identify the most important societal questions whose answers can be found in data and data science if the power of data collaboratives is harnessed.”

The initiative will seek to identify questions that could help unlock the potential of data and data science to address various global and domestic issues, including, but not limited to, climate change, economic inequality, and migration. These questions will be sourced from individuals with expertise in both a public issue and data science, whom The GovLab calls “bilinguals.”

Tom Kalil, the Chief Innovation Officer at Schmidt Futures, argues that the emergent use of data science and machine learning in the public sector will increase the demand for individuals “who speak data science and social sector.”

Similarly, in the business context, David Meer writes that “being bilingual isn’t just a matter of native English speakers learning how to conjugate verbs in French or Spanish. Rather, it’s important that businesses cultivate talent that can simultaneously speak the language of advanced data analysis and nuts-and-bolts business operations. As data analysis becomes a more prevalent and powerful lever for strategy and growth, organizations increasingly need bilinguals to form the bridge between the work of advanced data scientists and business decision makers.”

For more info, visit www.the100questions.org

Digital Serfdom


ˈdɪʤətəl ˈsɜrfdəm

A condition where consumers give up their personal and private information in order to be able to use a particular product or service.

Serfdom was a system of forced labor in feudal societies, and it was very common in Europe during the Middle Ages. Under this system, serfs, or peasants, performed a variety of labor for their lords in exchange for protection from bandits and a small plot of land they could cultivate for themselves. Serfs were also required to pay some form of tax, often in chickens or in crops grown on their plot.

Hassan Khan, writing in The Next Web, argues that the decline of property ownership indicates that we are living in a digital serfdom:

“The percentage of households without a car is increasing. Ride-hailing services have multiplied. Netflix boasts over 188 million subscribers. Spotify gains ten million paid members every five to six months.

“The model of “impermanence” has become the new normal. But there’s still one place where permanence finds its home, with over two billion active monthly users, Facebook has become a platform of record for the connected world. If it’s not on social media, it may as well have never happened.”

Joshua A. T. Fairfield elaborates on this phenomenon in his book “Owned: Property, Privacy, and the New Digital Serfdom.” Discussing the book in an article for The Conversation, Fairfield writes:

“The issue of who gets to control property has a long history. In the feudal system of medieval Europe, the king owned almost everything, and everyone else’s property rights depended on their relationship with the king. Peasants lived on land granted by the king to a local lord, and workers didn’t always even own the tools they used for farming or other trades like carpentry and blacksmithing.

[…]

“Yet the expansion of the internet of things seems to be bringing us back to something like that old feudal model, where people didn’t own the items they used every day. In this 21st-century version, companies are using intellectual property law – intended to protect ideas – to control physical objects consumers think they own.”

In other words, Fairfield suggests that the devices and services we use (iPhones, Fitbits, Roombas, digital door locks, Spotify, Uber, and many more) constantly capture data about our behavior. To access the full functionality of these devices or services, consumers have no choice but to trade away their personal data, which private corporations then use for targeted advertising, among other purposes. This system of digital serfdom binds consumers to the private corporations that dictate the terms of use for their products and services.

Janet Burns, writing about Alex Rosenblat’s “UBERLAND: How Algorithms Are Rewriting The Rules Of Work,” gives some examples of how algorithms use personal data to manipulate consumers’ behavior:

“For example, algorithms in control of assigning and pricing rides have often surprised drivers and riders, quietly taking into account other traffic in the area, regionally adjusted rates, and data on riders and drivers themselves.

“In recent years, we’ve seen similar adjustments happen behind the scenes in online shopping, as UBERLAND points out: major retailers have tweaked what price different customers see for the same item based on where they live, and how feasibly they could visit a brick-and-mortar store for it.”

To conclude, an excerpt from Fairfield’s book cautions: 

“In the coming decade, if we do not take back our ownership rights, the same will be said of our self-driving cars and software-enabled homes. We risk becoming digital peasants, owned by software and advertising companies, not to mention overreaching governments.”

Sources and Further Reading:

Fairfield, Joshua A. T. “Owned: Property, Privacy, and the New Digital Serfdom.” Cambridge University Press. https://www.cambridge.org/gb/academic/subjects/law/property-law/owned-property-privacy-and-new-digital-serfdom#JiVMgvsMOg6Zer5x.97 

Fairfield, Joshua A. T. “The ‘internet of things’ is sending us back to the Middle Ages.” The Conversation. https://theconversation.com/the-internet-of-things-is-sending-us-back-to-the-middle-ages-81435 

Burns, Janet. “Algorithms And ‘Uberland’ Are Driving Us Into Digital Serfdom.” Forbes. https://www.forbes.com/sites/janetwburns/2018/10/28/algorithms-and-uberland-are-driving-us-into-technocratic-serfdom/#7887dccc6705

Khan, Hassan. “We’re living in digital serfdom — trading privacy for convenience.” The Next Web. https://thenextweb.com/contributors/2018/11/10/were-living-in-a-digital-serfdom-trading-privacy-for-convenience/

Self-Sovereign Identity


sɛlf-ˈsɑvrən aɪˈdɛntəti

A decentralized identification mechanism that gives individuals control over what, when, and to whom their personal information is shared.

Identification documents (IDs) are a crucial part of every individual’s life: they are often a prerequisite for accessing a variety of services, ranging from opening a bank account and enrolling children in school to buying alcoholic beverages, signing up for an email account, and voting in an election, and they also serve as proof of simply being. This system poses fundamental problems, which a GovLab field report on blockchain and identity frames as follows:

“One of the central challenges of modern identity is its fragmentation and variation across platform and individuals. There are also issues related to interoperability between different forms of identity, and the fact that different identities confer very different privileges, rights, services or forms of access. The universe of identities is vast and manifold. Every identity in effect poses its own set of challenges and difficulties—and, of course, opportunities.”

A report published by New America echoes this point, arguing that:

“Societally, we lack a coherent approach to regulating the handling of personal data. Users share and generate far too much data—both personally identifiable information (PII) and metadata, or “data exhaust”—without a way to manage it. Private companies, by storing an increasing amount of PII, are taking on an increasing level of risk. Solution architects are recreating the wheel, instead of flying over the treacherous terrain we have just described.”

Self-sovereign identity (SSI) has been dubbed the solution to the identity problems described above. Identity Woman, a researcher and advocate for SSI, goes even further, arguing that generating “a digital identity that is not under the control of a corporation, an organization or a government” is essential “in pursuit of social justice, deep democracy, and the development of new economies that share wealth and protect the environment.”

To inform its analysis of blockchain-based Self-Sovereign Identity (SSI), the GovLab report argues that identity is “a process, not a thing” and breaks it into a five-stage lifecycle: provisioning, administration, authentication, authorization, and auditing/monitoring. At each stage, identification serves a distinct function and poses different challenges.

With SSI, individuals have full control over how their personal information is shared, who gets access to it, and when. The New America report summarizes the potential of SSI in the following paragraphs:

“We believe that the great potential of SSI is that it can make identity in the digital world function more like identity in the physical world, in which every person has a unique and persistent identity which is represented to others by means of both their physical attributes and a collection of credentials attested to by various external sources of authority.”

[…]

“SSI, in contrast, gives the user a portable, digital credential (like a driver’s license or some other document that proves your age), the authenticity of which can be securely validated via cryptography without the recipient having to check with the authority that issued it. This means that while the credential can be used to access many different sites and services, there is no third-party broker to track the services to which the user is authenticating. Furthermore, cryptographic techniques called “zero-knowledge proofs” (ZKPs) can be used to prove possession of a credential without revealing the credential itself. This makes it possible, for example, for users to prove that they are over the age of 21 without having to share their actual birth dates, which are both sensitive information and irrelevant to a binary, yes-or-no ID transaction.”
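The passage above says a credential's authenticity can be validated “without the recipient having to check with the authority that issued it.” A minimal sketch of that offline check follows, assuming a toy textbook-RSA signature built only from Python's standard library. The issuer signs a credential that states a yes/no claim (`age_over_21`) rather than a birth date, and anyone holding the issuer's public key can verify it without contacting the issuer. The key sizes, claim names, function names, and the `did:example:alice` identifier are illustrative assumptions. This is not a zero-knowledge proof; it shows signed, minimal disclosure only, and real SSI systems rely on standardized signature schemes through vetted cryptographic libraries rather than hand-rolled RSA like this.

```python
import hashlib
import json

# Toy textbook-RSA key: two known primes (2**31 - 1 and 2**61 - 1).
# Far too small and unpadded to be secure; for illustration only.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # issuer's private exponent (Python 3.8+)


def digest(claims: dict) -> int:
    """Hash a canonical JSON encoding of the claims and reduce it mod N."""
    encoded = json.dumps(claims, sort_keys=True).encode("utf-8")
    return int.from_bytes(hashlib.sha256(encoded).digest(), "big") % N


def issue(claims: dict) -> int:
    """Issuer signs the claims with its private key."""
    return pow(digest(claims), D, N)


def verify(claims: dict, signature: int, public_key=(N, E)) -> bool:
    """Verifier checks the signature using only the issuer's public key;
    no call back to the issuer is needed."""
    n, e = public_key
    return pow(signature, e, n) == digest(claims)


# Hypothetical credential: it states only the yes/no fact a venue needs,
# not the holder's birth date.
credential = {"subject": "did:example:alice", "age_over_21": True}
signature = issue(credential)

print(verify(credential, signature))                            # True
print(verify({**credential, "age_over_21": False}, signature))  # False (tampered)
```

Because the verifier never contacts the issuer, the issuer does not learn where the credential is used, which is the privacy property the New America report highlights. Proving "over 21" from a signed birth date without revealing it would require an actual zero-knowledge proof, which this sketch does not attempt.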

Case studies of real-world SSI applications presented on the GovLab Blockchange website include a government-issued self-sovereign ID built on blockchain technology in the city of Zug, Switzerland; a mobile voting platform piloted in West Virginia and secured via smart biometrics, real-time ID verification, and the blockchain for irrefutability; and a blockchain-based land and property transaction and registration system in Sweden.

Nevertheless, regarding the hype surrounding this new and emerging technology, the authors write:

“At their core, blockchain technologies offer new capacity for increasing the immutability, integrity, and resilience of information capture and disclosure mechanisms, fostering the potential to address some of the information asymmetries described above. By leveraging a shared and verified database of ledgers stored in a distributed manner, blockchain seeks to redesign information ecosystems in a more transparent, immutable, and trusted manner. Solving information asymmetries may turn out to be the real contribution of blockchain, and this—much more than the current enthusiasm over virtual currencies—is the real reason to assess its potential.

“It is important to emphasize, of course, that blockchain’s potential remains just that for the moment—only potential. Considerable hype surrounds the emerging technology, and much remains to be done and many obstacles to overcome if blockchain is to achieve the enthusiasts’ vision of ‘radical transparency.’”

Further Reading:

Allen, Christopher (2016). The Path to Self-Sovereign Identity. Coindesk. https://www.coindesk.com/path-self-sovereign-identity

Apostle, Julia (2018). Lessons from Cambridge Analytica: one way to protect your data. Financial Times. https://www.ft.com/content/43bc6d18-2b6f-11e8-97ec-4bd3494d5f14

Graglia, Michael, Christopher Mellon, and Tim Robustelli (2018). The Nail Finds a Hammer: Self-Sovereign Identity, Design Principles, and Property Rights in the Developing World. New America. https://www.newamerica.org/future-property-rights/reports/nail-finds-hammer/

Identity Woman, Kaliya (2017). Humanizing Technology. Open Democracy. https://www.opendemocracy.net/en/transformation/humanizing-technology/

Verhulst, Stefaan G. and Andrew Young (2018). On the Emergent Use of Distributed Ledger Technologies for Identity Management. The GovLab. https://blockchan.ge/fieldreport.html

Grey Data


greɪ ˈdeɪtə

Data that an institution accumulates for operational purposes and that does not fall under any traditional data protection policy.

Organizations in every sector accumulate massive amounts of data simply by virtue of operating, and universities are no exception. In a paper, Christine L. Borgman categorizes these as grey data and suggests that universities should take the lead in demonstrating stewardship of such data, which include student applications, faculty dossiers, registrar records, ID card data, security camera footage, and much more.

“Some of these data are collected for mandatory reporting obligations such as enrollments, diversity, budgets, grants, and library collections. Many types of data about individuals are collected for operational and design purposes, whether for instruction, libraries, travel, health, or student services.” (Borgman, p. 380)

Grey data typically does not fall under traditional data protection regimes such as the Health Insurance Portability and Accountability Act (HIPAA), the Family Educational Rights and Privacy Act (FERPA), or the oversight of Institutional Review Boards. Consequently, there is considerable debate over how these data should be used (or misused). Borgman points out that universities have been “exploiting these data for research, learning analytics, faculty evaluation, strategic decisions, and other sensitive matters.” On top of this, for-profit companies “are besieging universities with requests for access to data or for partnerships to mine them.”

Recognizing both the value of data and the risks arising from the accumulation of grey data, Borgman proposes a model of Data Stewardship that draws on the University of California’s data protection practices concerning information security, data governance, and cyber risk.

This model exemplifies the kind of Data Stewardship practice that the GovLab advocates amid the rise of public-private collaboration in leveraging data for the public good.

The GovLab’s Data Stewards website presents the need for such practice as follows:

“With these new practices of data collaborations come the need to reimagine roles and responsibilities to steer the process of using private data, and the insights it can generate, to address some of society’s biggest questions and challenges: Data Stewards.

“Today, establishing and sustaining these new collaborative and accountable approaches requires significant and time-consuming effort and investment of resources for both data holders on the supply side, and institutions that represent the demand. By establishing Data Stewardship as a function, recognized within the private sector as a valued responsibility, the practice of Data Collaboratives can become more predictable, scaleable, sustainable and de-risked.”

Resources:

Borgman, C. L. (2018). Open Data, Grey Data, and Stewardship: Universities at the Privacy Frontier. ArXiv. https://doi.org/10.15779/Z38B56D489

Young, A. (2018, November 26). About the Data Stewards Network. Retrieved March 6, 2019, from https://medium.com/data-stewards-network/about-the-data-stewards-network-1cb9db0c0792