Is Facebook’s advertising data accurate enough for use in social science research? Insights from a cross-national online survey

Paper by André Grow et al: “Social scientists increasingly use Facebook’s advertising platform for research, either in the form of conducting digital censuses of the general population, or for recruiting participants for survey research. Both approaches depend on the accuracy of the data that Facebook provides about its users, but little is known about how accurate these data are. We address this gap in a large-scale, cross-national online survey (N = 137,224), in which we compare self-reported and Facebook-classified demographic information (sex, age and region of residence). Our results suggest that Facebook’s advertising platform can be fruitfully used for conducting social science research if additional steps are taken to assess the accuracy of the characteristics under consideration…(More)”.
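The comparison described in the abstract can be illustrated with a minimal sketch that computes the share of respondents whose self-reported value matches the platform's classification. The function name `agreement_rate` and the four sample pairs are invented for illustration; they are not the paper's method or data.

```python
def agreement_rate(pairs):
    """Share of respondents whose self-report matches the platform label."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no observations")
    matches = sum(
        1 for self_reported, classified in pairs if self_reported == classified
    )
    return matches / len(pairs)


# Hypothetical (self-reported, platform-classified) pairs, not real survey data.
sample = [
    ("female", "female"),
    ("male", "male"),
    ("female", "male"),
    ("male", "male"),
]
rate = agreement_rate(sample)
print(f"Agreement: {rate:.0%}")
```

The same function could be applied separately to sex, age group, and region to see which characteristic is classified least reliably.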

The Data Liberation Project 

About: “The Data Liberation Project is a new initiative I’m launching today to identify, obtain, reformat, clean, document, publish, and disseminate government datasets of public interest. Vast troves of government data are inaccessible to the people and communities who need them most. The Process:

  • Identify: Through its own research, as well as through consultations with journalists, community groups, government-data experts, and others, the Data Liberation Project aims to identify a large number of datasets worth pursuing.
  • Obtain: The Data Liberation Project plans to use a wide range of methods to obtain the datasets, including via Freedom of Information Act requests, intervening in lawsuits, web-scraping, and advanced document parsing. To improve public knowledge about government data systems, the Data Liberation Project also files FOIA requests for essential metadata, such as database schemas, record layouts, data dictionaries, user guides, and glossaries.
  • Reformat: Many datasets are delivered to journalists and the public in difficult-to-use formats. Some may follow arcane conventions or require proprietary software to access, for instance. The Data Liberation Project will convert these datasets into open formats, and restructure them so that they can be more easily examined.
  • Clean: The Data Liberation Project will not alter the raw records it receives. But when the messiness of datasets inhibits their usefulness, the project will create secondary, “clean” versions of datasets that fix these problems.
  • Document: Datasets are meaningless without context, and practically useless without documentation. The Data Liberation Project will gather official documentation for each dataset into a central location. It will also fill observed gaps in the documentation through its own research, interviews, and analysis.
  • Disseminate: The Data Liberation Project will not expect reporters and other members of the public simply to stumble upon these datasets. Instead, it will reach out to the newsrooms and communities that stand to benefit most from the data. The project will host hands-on workshops, webinars, and other events to help others to understand and use the data…(More)”.
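The Reformat and Clean steps above can be illustrated with a minimal sketch. It assumes a hypothetical raw CSV export with inconsistent header spacing and stray whitespace in cells; the function name `make_clean_copy` and the sample columns are illustrative and not taken from the project. The raw text is left untouched, and a secondary cleaned version is produced alongside it.

```python
import csv
import io


def make_clean_copy(raw_text: str) -> str:
    """Return a cleaned CSV string; the raw input is never modified."""
    rows = list(csv.reader(io.StringIO(raw_text)))
    if not rows:
        return ""
    # Normalize headers: trim, lowercase, and replace spaces with underscores.
    header = [h.strip().lower().replace(" ", "_") for h in rows[0]]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in rows[1:]:
        # Trim stray whitespace around each cell value.
        writer.writerow([cell.strip() for cell in row])
    return out.getvalue()


# Hypothetical raw export (made-up columns and values).
raw = "Agency Name , Record ID\n  Dept. of Examples ,  42 \n"
clean = make_clean_copy(raw)
print(clean)
```

Keeping `raw` and `clean` as separate artifacts mirrors the stated policy of never altering the records as received.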

Five-year campaign breaks science’s citation paywall

Article by Dalmeet Singh Chawla: “The more than 60 million scientific-journal papers indexed by Crossref — the database that registers DOIs, or digital object identifiers, for many of the world’s academic publications — now contain reference lists that are free to access and reuse.

The milestone, announced on Twitter on 18 August, is the result of an effort by the Initiative for Open Citations (I4OC), launched in 2017. Open-science advocates have for years campaigned to make papers’ citation data accessible under liberal copyright licences so that they can be studied, and those analyses shared. Free access to citations enables researchers to identify research trends, lets them conduct studies on which areas of research need funding, and helps them to spot when scientists are manipulating citation counts….

The move means that bibliometricians, scientometricians and information scientists will be able to reuse citation data in any way they please under the most liberal copyright licence, called CC0. This, in turn, allows other researchers to build on their work.
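Reuse of open citation data might look like the sketch below. It assumes the JSON shape returned by Crossref's public REST API for a single work (`https://api.crossref.org/works/{DOI}`), in which deposited references appear under `message.reference` and a cited DOI, when present, under each entry's `DOI` key. The sample payload and its DOIs are placeholders, and no network call is made.

```python
from typing import Any


def cited_dois(work_response: dict[str, Any]) -> list[str]:
    """Extract the DOIs cited by a work from a Crossref-style response."""
    references = work_response.get("message", {}).get("reference", [])
    # Not every deposited reference carries a DOI; keep only those that do.
    return [ref["DOI"].lower() for ref in references if "DOI" in ref]


# Hypothetical payload in the assumed shape (DOIs are placeholders).
sample = {
    "message": {
        "DOI": "10.1234/example.citing",
        "reference": [
            {"key": "ref1", "DOI": "10.1234/Example.Cited.1"},
            {"key": "ref2", "unstructured": "A reference with no DOI."},
            {"key": "ref3", "DOI": "10.1234/example.cited.2"},
        ],
    }
}

print(cited_dois(sample))
```

DOIs are lowercased because they are case-insensitive, which makes it easier to count citations across works without double-counting.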

Before I4OC, researchers generally had to obtain permission to access data from major scholarly databases such as Web of Science and Scopus, and weren’t able to share the samples.

However, the opening up of Crossref articles’ citations doesn’t mean that all the world’s scholarly content now has open references. Although most major international academic publishers, including Elsevier, Springer Nature (which publishes Nature) and Taylor & Francis, index their papers on Crossref, some do not. These often include regional and non-English-language publications.

I4OC co-founder Dario Taraborelli, who is science programme officer at the Chan Zuckerberg Initiative and based in San Francisco, California, says that the next challenge will be to encourage publishers who don’t already deposit reference data in Crossref to do so….(More)”.

Unlocking the Potential of Open 990 Data

Article by Cinthia Schuman Ottinger & Jeff Williams: “As the movement to expand public use of nonprofit data collected by the Internal Revenue Service advances, it’s a good time to review how far the social sector has come and how much work remains to reach the full potential of this treasure trove…Organizations have employed open Form 990 data in numerous ways, including to:

  • Create new tools for donors. For instance, the Nonprofit Aid Visualizer, a partnership between Candid and Vanguard Charitable, uses open 990 data to find communities vulnerable to COVID-19, and help address both their immediate needs and long-term recovery. Another tool, COVID-19 Urgent Service Provider Support Tool, developed by the consulting firm BCT Partners, uses 990 data to direct donors to service providers that are close to communities most affected by COVID-19.
  • More efficiently prosecute charitable fraud. This includes a campaign by the New York Attorney General’s Office that recovered $1.7 million from sham charities and redirected funds to legitimate groups.
  • Generate groundbreaking findings on fundraising, volunteers, equity, and management. A researcher at Texas Tech University, for example, explored more than a million e-filed 990s to overturn long-held assumptions about the role of cash in fundraising. He found that when nonprofits encourage noncash gifts as opposed to only cash contributions, financial contributions to those organizations increase over time.
  • Shed light on harmful practices that hurt the poor. A large-scale investigative analysis of nonprofit hospitals’ tax forms revealed that 45 percent of them sent a total of $2.7 billion in medical bills to patients whose incomes were likely low enough to qualify for free or discounted care. When this practice was publicly exposed, some hospitals reevaluated their practices and erased unpaid bills for qualifying patients. The expense of mining data like this previously made such research next to impossible.
  • Help donors make more informed giving decisions. In hopes of maximizing contributions to Ukrainian relief efforts, a record number of donors are turning to resources like Charity Navigator, which can now use open Form 990 data to evaluate and rate a large number of charities based on finances, governance, and other factors. At the same time, donors informed by open 990 data can seek more accountability from the organizations they support. For example, anti-corruption researchers scouring open 990 data and other records uncovered donations by Russian oligarchs aligned with President Putin. This pressured US nonprofits that accepted money from the oligarchs to disavow this funding…(More)”.

The wealth of (Open Data) nations? Open government data, country-level institutions and entrepreneurial activity

Paper by Franz Huber, Alan Ponce, Francesco Rentocchini & Thomas Wainwright: “Lately, Open Data (OD) has been promoted by governments around the world as a resource to accelerate innovation within entrepreneurial ventures. However, it remains unclear to what extent OD drives innovative entrepreneurship. This paper sheds light on this open question by providing novel empirical evidence on the relationship between OD publishing and (digital) entrepreneurship at the country-level. We draw upon a longitudinal dataset comprising 90 countries observed over the period 2013–2016. We find a significant and positive association between OD publishing and entrepreneurship at the country level. The results also show that the association between OD publishing and entrepreneurship is strong in countries with high institutional quality. We argue that publishing OD is not sufficient to improve innovative entrepreneurship alone, so states need to move beyond a focus on OD initiatives and promotion, to focus on a broader set of policy initiatives that promote good governance…(More)”.

A journey toward an open data culture through transformation of shared data into a data resource

Paper by Scott D. Kahn and Anne Koralova: “The transition to open data practices is straightforward, albeit surprisingly challenging to implement, largely due to cultural and policy issues. A general data sharing framework is presented along with two case studies that highlight these challenges and offer practical solutions that can be adjusted depending on the type of data collected, the country in which the study is initiated, and the prevailing research culture. Embracing the constraints imposed by data privacy considerations, especially for biomedical data, must be emphasized for data outside of the United States until data privacy law(s) are established at the Federal and/or State level…(More).”

Without appropriate metadata, data-sharing mandates are pointless

Article by Mark A. Musen: “Last month, the US government announced that research articles and most underlying data generated with federal funds should be made publicly available without cost, a policy to be implemented by the end of 2025. That’s atop other important moves. The European Union’s programme for science funding, Horizon Europe, already mandates that almost all data be FAIR (that is, findable, accessible, interoperable and reusable). The motivation behind such data-sharing policies is to make data more accessible so others can use them to both verify results and conduct further analyses.

But just getting those data sets online will not bring anticipated benefits: few data sets will really be FAIR, because most will be unfindable. What’s needed are policies and infrastructure to organize metadata.

Imagine having to search for publications on some topic — say, methods for carbon reclamation — but you could use only the article titles (no keywords, abstracts or search terms). That’s essentially the situation for finding data sets. If I wanted to identify all the deposited data related to carbon reclamation, the task would be futile. Current metadata often contain only administrative and organizational information, such as the name of the investigator and the date when the data were acquired.

What’s more, for scientific data to be useful to other researchers, metadata must sensibly and consistently communicate essentials of the experiments — what was measured, and under what conditions. As an investigator who builds technology to assist with data annotation, it’s frustrating that, in the majority of fields, the metadata standards needed to make data FAIR don’t even exist.

Metadata about data sets typically lack experiment-specific descriptors. If present, they’re sparse and idiosyncratic. An investigator searching the Gene Expression Omnibus (GEO), for example, might seek genomic data sets containing information on how a disease or condition manifests itself in young animals or humans. Performing such a search requires knowledge of how the age of individuals is represented — which in the GEO repository, could be age, AGE, age (after birth), age (years), Age (yr-old) or dozens of other possibilities. (Often, such information is missing from data sets altogether.) Because the metadata are so ad hoc, automated searches fail, and investigators waste enormous amounts of time manually sifting through records to locate relevant data sets, with no guarantee that most (or any) can be found…(More)”.
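The age example above can be made concrete with a minimal sketch. It assumes hypothetical sample records keyed with the variants quoted in the excerpt; `canonical_key` and `find_age` are illustrative helpers built on a simple regular-expression heuristic, not part of any GEO API.

```python
import re
from typing import Optional


def canonical_key(raw_key: str) -> str:
    """Map ad hoc variants such as 'AGE' or 'age (years)' to 'age'."""
    # Drop any parenthetical qualifier, then trim and lowercase.
    return re.sub(r"\(.*?\)", "", raw_key).strip().lower()


def find_age(record: dict[str, str]) -> Optional[str]:
    """Return the age value from a record regardless of key spelling."""
    for key, value in record.items():
        if canonical_key(key) == "age":
            return value
    return None


# Key variants quoted in the excerpt, attached to made-up values.
variants = ["age", "AGE", "age (after birth)", "age (years)", "Age (yr-old)"]
records = [{v: "12"} for v in variants] + [{"tissue": "liver"}]
found = [find_age(r) for r in records]
print(found)
```

The heuristic eases lookup but discards the unit and reference point in the parentheses, which is exactly the information a shared metadata standard would record explicitly.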

A User’s Guide to the Periodic Table of Open Data

Guide by Stefaan Verhulst and Andrew Zahuranec: “Leveraging our research on the variables that determine Open Data’s Impact, The Open Data Policy Lab is pleased to announce the publication of a new report designed to assist organizations in implementing the elements of a successful data collaborative: A User’s Guide to The Periodic Table of Open Data.

The User’s Guide is a fillable document designed to empower data stewards and others seeking to improve data access. It can be used as a checklist and tool to weigh different elements based on their context and priorities. By completing the forms (offline/online), you will be able to take a more comprehensive and strategic view of what resources and interventions may be required.

Download and fill out the User’s Guide to operationalize the elements in your data initiative

In conjunction with the release of our User’s Guide, the Open Data Policy Lab is pleased to present a completely reworked version of our Periodic Table of Open Data Elements, first launched in 2016. We sought to categorize the elements that matter in open data initiatives into five categories: problem and demand definition, capacity and culture, governance and standards, personnel and partnerships, and risk mitigation. More information on each can be found in the attached report or in the interactive table below.

Read more about the Periodic Table of Open Data Elements and how you can use it to support your work…(More)”.

Closing the Data Divide for a More Equitable U.S. Digital Economy

Report by Gillian Diebold: “In the United States, access to many public and private services, including those in the financial, educational, and health-care sectors, is intricately linked to data. But adequate data is not collected equitably from all Americans, creating a new challenge: the data divide, in which not everyone has enough high-quality data collected about them or their communities and therefore cannot benefit from data-driven innovation. This report provides an overview of the data divide in the United States and offers recommendations for how policymakers can address these inequalities…(More)”.

Making Government Data Publicly Available: Guidance for Agencies on Releasing Data Responsibly

Report by Hugh Grant-Chapman and Hannah Quay-de la Vallee: “Government agencies rely on a wide range of data to effectively deliver services to the populations with which they engage. Civic-minded advocates frequently argue that the public benefits of this data can be better harnessed by making it available for public access. Recent years, however, have also seen growing recognition that the public release of government data can carry certain risks. Government agencies hoping to release data publicly should consider those potential risks in deciding which data to make publicly available and how to go about releasing it.

This guidance offers an introduction to making data publicly available while addressing privacy and ethical data use issues. It is intended for administrators at government agencies that deliver services to individuals — especially those at the state and local levels — who are interested in publicly releasing government data. This guidance focuses on challenges that may arise when releasing aggregated data derived from sensitive information, particularly individual-level data.

The report begins by highlighting key benefits and risks of making government data publicly available. Benefits include empowering members of the general public, supporting research on program efficacy, supporting the work of organizations providing adjacent services, reducing agencies’ administrative burden, and holding government agencies accountable. Potential risks include breaches of individual privacy; irresponsible uses of the data by third parties; and the possibility that the data is not used at all, resulting in wasted resources.

In light of these benefits and risks, the report presents four recommended actions for publishing government data responsibly:

  1. Establish data governance processes and roles;
  2. Engage external communities;
  3. Ensure responsible use and privacy protection; and
  4. Evaluate resource constraints.

These key considerations also take into account federal and state laws as well as emerging computational and analytical techniques for protecting privacy when releasing data, such as differential privacy techniques and synthetic data. Each of these techniques involves unique benefits and trade-offs to be considered in the context of the goals of a given data release…(More)”.
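As one illustration of the differential privacy techniques mentioned above, the sketch below applies the Laplace mechanism to a single count. The function name `dp_count`, the epsilon value, and the sample count are assumptions made for illustration and do not come from the report. A counting query changes by at most 1 when one person is added or removed, so the noise is drawn from a Laplace distribution with scale 1/epsilon.

```python
import random


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise calibrated to sensitivity 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Laplace scale is 1/epsilon, so each exponential draw uses rate epsilon.
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


rng = random.Random(0)  # Seeded only to make this demo repeatable.
releases = [dp_count(128, epsilon=0.5, rng=rng) for _ in range(10_000)]
mean_release = sum(releases) / len(releases)
print(f"One noisy release: {releases[0]:.1f}")
print(f"Mean over many releases: {mean_release:.2f}")
```

A smaller epsilon gives stronger privacy but noisier counts, which is the kind of trade-off the report asks agencies to weigh. A real release would use a cryptographically secure noise source and a vetted library instead of a seeded generator.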