What is a fair exchange for access to public data?


Blog and policy brief by Jeni Tennison: “The most obvious approach to get companies to share value back to the public sector in return for access to data is to charge them. However, there are a number of challenges with a “pay to access” approach: it’s hard to set the right price; it creates access barriers, particularly for cash-poor start-ups; and it creates a public perception that the government is willing to sell their data, and might be tempted to loosen privacy-protecting governance controls in exchange for cash.

Are there other options? The policy brief explores a range of other approaches and assesses these against five goals that a value-sharing framework should ideally meet, to:

  • Encourage use of public data, including by being easy for organisations to understand and administer.
  • Provide a return on investment for the public sector, offsetting at least some of the costs of supporting the NDL infrastructure and minimising administrative costs.
  • Promote equitable innovation and economic growth in the UK, which might mean particularly encouraging smaller, home-grown businesses.
  • Create social value, particularly towards this Government’s other missions, such as achieving Net Zero or unlocking opportunity for all.
  • Build public trust by being easily explainable, avoiding misaligned incentives that encourage the breaking of governance guardrails, and feeling like a fair exchange.

In brief, alternatives to a pay-to-access model that still provide direct financial returns include:

  • Discounts: the public sector could secure discounts on products and services created using public data. However, this could be difficult to administer and enforce.
  • Royalties: taking a percentage of charges for products and services created using public data might be similarly hard to administer and enforce, but applies to more companies.
  • Equity: taking equity in startups can provide long-term returns and align with public investment goals.
  • Levies: targeted taxes on businesses that use public data can provide predictable revenue and encourage data use.
  • General taxation: general taxation can fund data infrastructure, but it may lack the targeted approach and public visibility of other methods.

It’s also useful to consider non-financial conditions that could be put on organisations accessing public data..(More)”.

A crowd-sourced repository for valuable government data


About: “DataLumos is an ICPSR archive for valuable government data resources. ICPSR has a long commitment to safekeeping and disseminating US government and other social science data. DataLumos accepts deposits of public data resources from the community and recommendations of public data resources that ICPSR itself might add to DataLumos. Please consider making a monetary donation to sustain DataLumos…(More)”.

Elon Musk Also Has a Problem with Wikipedia


Article by Margaret Talbot: “If you have spent time on Wikipedia—and especially if you’ve delved at all into the online encyclopedia’s inner workings—you will know that it is, in almost every aspect, the inverse of Trumpism. That’s not a statement about its politics. The thousands of volunteer editors who write, edit, and fact-check the site manage to adhere remarkably well, over all, to one of its core values: the neutral point of view. Like many of Wikipedia’s s principles and procedures, the neutral point of view is the subject of a practical but sophisticated epistemological essay posted on Wikipedia. Among other things, the essay explains, N.P.O.V. means not stating opinions as facts, and also, just as important, not stating facts as opinions. (So, for example, the third sentence of the entry titled “Climate change” states, with no equivocation, that “the current rise in global temperatures is driven by human activities, especially fossil fuel burning since the Industrial Revolution.”)…So maybe it should come as no surprise that Elon Musk has lately taken time from his busy schedule of dismantling the federal government, along with many of its sources of reliable information, to attack Wikipedia. On January 21st, after the site updated its page on Musk to include a reference to the much-debated stiff-armed salute he made at a Trump inaugural event, he posted on X that “since legacy media propaganda is considered a ‘valid’ source by Wikipedia, it naturally simply becomes an extension of legacy media propaganda!” He urged people not to donate to the site: “Defund Wikipedia until balance is restored!” It’s worth taking a look at how the incident is described on Musk’s page, quite far down, and judging for yourself. What I see is a paragraph that first describes the physical gesture (“Musk thumped his right hand over his heart, fingers spread wide, and then extended his right arm out, emphatically, at an upward angle, palm down and fingers together”), goes on to say that “some” viewed it as a Nazi or a Roman salute, then quotes Musk disparaging those claims as “politicized,” while noting that he did not explicitly deny them. (There is also now a separate Wikipedia article, “Elon Musk salute controversy,” that goes into detail about the full range of reactions.)

This is not the first time Musk has gone after the site. In December, he posted on X, “Stop donating to Wokepedia.” And that wasn’t even his first bad Wikipedia pun. “I will give them a billion dollars if they change their name to Dickipedia,” he wrote, in an October, 2023, post. It seemed to be an ego thing at first. Musk objected to being described on his page as an “early investor” in Tesla, rather than as a founder, which is how he prefers to be identified, and seemed frustrated that he couldn’t just buy the site. But lately Musk’s beef has merged with a general conviction on the right that Wikipedia—which, like all encyclopedias, is a tertiary source that relies on original reporting and research done by other media and scholars—is biased against conservatives.

The Heritage Foundation, the think tank behind the Project 2025 policy blueprint, has plans to unmask Wikipedia editors who maintain their privacy using pseudonyms (these usernames are displayed in the article history but don’t necessarily make it easy to identify the people behind them) and whose contributions on Israel it deems antisemitic…(More)”.

To Stop Tariffs, Trump Demands Opioid Data That Doesn’t Yet Exist


Article by Josh Katz and Margot Sanger-Katz: “One month ago, President Trump agreed to delay tariffs on Canada and Mexico after the two countries agreed to help stem the flow of fentanyl into the United States. On Tuesday, the Trump administration imposed the tariffs anyway, saying that the countries had failed to do enough — and claiming that tariffs would be lifted only when drug deaths fall.

But the administration has seemingly established an impossible standard. Real-time, national data on fentanyl overdose deaths does not exist, so there is no way to know whether Canada and Mexico were able to “adequately address the situation” since February, as the White House demanded.

“We need to see material reduction in autopsied deaths from opioids,” said Howard Lutnick, the commerce secretary, in an interview on CNBC on Tuesday, indicating that such a decline would be a precondition to lowering tariffs. “But you’ve seen it — it has not been a statistically relevant reduction of deaths in America.”

In a way, Mr. Lutnick is correct that there is no evidence that overdose deaths have fallen in the last month — since there is no such national data yet. His stated goal to measure deaths again in early April will face similar challenges.

But data through September shows that fentanyl deaths had already been falling at a statistically significant rate for months, causing overall drug deaths to drop at a pace unlike any seen in more than 50 years of recorded drug overdose mortality data.

The declines can be seen in provisional data from the Centers for Disease Control and Prevention, which compiles death records from states, which in turn collect data from medical examiners and coroners in cities and towns. Final national data generally takes more than a year to produce. But, as the drug overdose crisis has become a major public health emergency in recent years, the C.D.C. has been publishing monthly data, with some holes, at around a four-month lag…(More)”.

Open Data Under Attack: How to Find Data and Why It Is More Important Than Ever


Article by Jessica Hilburn: “This land was made for you and me, and so was the data collected with our taxpayer dollars. Open data is data that is accessible, shareable, and able to be used by anyone. While any person, company, or organization can create and publish open data, the federal and state governments are by far the largest providers of open data.

President Barack Obama codified the importance of government-created open data in his May 9, 2013, executive order as a part of the Open Government Initiative. This initiative was meant to “ensure the public trust and establish a system of transparency, public participation, and collaboration” in furtherance of strengthening democracy and increasing efficiency. The initiative also launched Project Open Data (since replaced by the Resources.data.gov platform), which documented best practices and offered tools so government agencies in every sector could open their data and contribute to the collective public good. As has been made readily apparent, the era of public good through open data is now under attack.

Immediately after his inauguration, President Donald Trump signed a slew of executive orders, many of which targeted diversity, equity, and inclusion (DEI) for removal in federal government operations. Unsurprisingly, a large number of federal datasets include information dealing with diverse populations, equitable services, and inclusion of marginalized groups. Other datasets deal with information on topics targeted by those with nefarious agendas—vaccination rates, HIV/AIDS, and global warming, just to name a few. In the wake of these executive orders, datasets and website pages with blacklisted topics, tags, or keywords suddenly disappeared—more than 8,000 of them. In addition, President Trump fired the National Archivist, and top National Archives and Records Administration officials are being ousted, putting the future of our collective history at enormous risk.

While it is common practice to archive websites and information in the transition between administrations, it is unprecedented for the incoming administration to cull data altogether. In response, unaffiliated organizations are ramping up efforts to separately archive data and information for future preservation and access. Web scrapers are being used to grab as much data as possible, but since this method is automated, data requiring a login or bot challenger (like a captcha) is left behind. The future information gap that researchers will be left to grapple with could be catastrophic for progress in crucial areas, including weather, natural disasters, and public health. Though there are efforts to put out the fire, such as the federal order to restore certain resources, the people’s library is burning. The losses will be permanently felt…Data is a weapon, whether we like it or not. Free and open access to information—about democracy, history, our communities, and even ourselves—is the foundation of library service. It is time for anyone who continues to claim that libraries are not political to wake up before it is too late. Are libraries still not political when the Pentagon barred library access for tens of thousands of American children attending Pentagon schools on military bases while they examined and removed supposed “radical indoctrination” books? Are libraries still not political when more than 1,000 unique titles are being targeted for censorship annually, and soft censorship through preemptive restriction to avoid controversy is surely occurring and impossible to track? It is time for librarians and library workers to embrace being political.

In a country where the federal government now denies that certain people even exist, claims that children are being indoctrinated because they are being taught the good and bad of our nation’s history, and rescinds support for the arts, humanities, museums, and libraries, there is no such thing as neutrality. When compassion and inclusion are labeled the enemy and the diversity created by our great American experiment is lambasted as a social ill, claiming that libraries are neutral or apolitical is not only incorrect, it’s complicit. To update the quote, information is the weapon in the war of ideas. Librarians are the stewards of information. We don’t want to be the Americans who protested in 1933 at the first Nazi book burnings and then, despite seeing the early warning signs of catastrophe, retreated into the isolation of their own concerns. The people’s library is on fire. We must react before all that is left of our profession is ash…(More)”.

Data Sovereignty and Open Sharing: Reconceiving Benefit-Sharing and Governance of Digital Sequence Information


Paper by Masanori Arita: “There are ethical, legal, and governance challenges surrounding data, particularly in the context of digital sequence information (DSI) on genetic resources. I focus on the shift in the international framework, as exemplified by the CBD-COP15 decision on benefit-sharing from DSI and discuss the growing significance of data sovereignty in the age of AI and synthetic biology. Using the example of the COVID-19 pandemic, the tension between open science principles and data control rights is explained. This opinion also highlights the importance of inclusive and equitable data sharing frameworks that respect both privacy and sovereign data rights, stressing the need for international cooperation and equitable access to data to reduce global inequalities in scientific and technological advancement…(More)”.

AI crawler wars threaten to make the web more closed for everyone


Article by Shayne Longpre: “We often take the internet for granted. It’s an ocean of information at our fingertips—and it simply works. But this system relies on swarms of “crawlers”—bots that roam the web, visit millions of websites every day, and report what they see. This is how Google powers its search engines, how Amazon sets competitive prices, and how Kayak aggregates travel listings. Beyond the world of commerce, crawlers are essential for monitoring web security, enabling accessibility tools, and preserving historical archives. Academics, journalists, and civil societies also rely on them to conduct crucial investigative research.  

Crawlers are endemic. Now representing half of all internet traffic, they will soon outpace human traffic. This unseen subway of the web ferries information from site to site, day and night. And as of late, they serve one more purpose: Companies such as OpenAI use web-crawled data to train their artificial intelligence systems, like ChatGPT. 

Understandably, websites are now fighting back for fear that this invasive species—AI crawlers—will help displace them. But there’s a problem: This pushback is also threatening the transparency and open borders of the web, that allow non-AI applications to flourish. Unless we are thoughtful about how we fix this, the web will increasingly be fortified with logins, paywalls, and access tolls that inhibit not just AI but the biodiversity of real users and useful crawlers…(More)”.

Recommendations for Better Sharing of Climate Data


Creative Commons: “…the culmination of a nine-month research initiative from our Open Climate Data project. These guidelines are a result of collaboration between Creative Commons, government agencies and intergovernmental organizations. They mark a significant milestone in our ongoing effort to enhance the accessibility, sharing, and reuse of open climate data to address the climate crisis. Our goal is to share strategies that align with existing data sharing principles and pave the way for a more interconnected and accessible future for climate data.

Our recommendations offer practical steps and best practices, crafted in collaboration with key stakeholders and organizations dedicated to advancing open practices in climate data. We provide recommendations for 1) legal and licensing terms, 2) using metadata values for attribution and provenance, and 3) management and governance for better sharing.

Opening climate data requires an examination of the public’s legal rights to access and use the climate data, often dictated by copyright and licensing. This legal detail is sometimes missing from climate data sharing and legal interoperability conversations. Our recommendations suggest two options: Option A: CC0 + Attribution Request, in order to maximize reuse by dedicating climate data to the public domain, plus a request for attribution; and Option B: CC BY 4.0, for retaining data ownership and legal enforcement of attribution. We address how to navigate license stacking and attribution stacking for climate data hosts and for users working with multiple climate data sources.

We also propose standardized human- and machine-readable metadata values that enhance transparency, reduce guesswork, and ensure broader accessibility to climate data. We built upon existing model metadata schemas and standards, including those that address license and attribution information. These recommendations address a gap and provide metadata schema that standardize the inclusion of upfront, clear values related to attribution, licensing and provenance.

Lastly, we highlight four key aspects of effective climate data management: designating a dedicated technical managing steward, designating a legal and/or policy steward, encouraging collaborative data sharing, and regularly revisiting and updating data sharing policies in accordance with parallel open data policies and standards…(More)”.

Empowering open data sharing for social good: a privacy-aware approach


Paper by Tânia Carvalho et al: “The Covid-19 pandemic has affected the world at multiple levels. Data sharing was pivotal for advancing research to understand the underlying causes and implement effective containment strategies. In response, many countries have facilitated access to daily cases to support research initiatives, fostering collaboration between organisations and making such data available to the public through open data platforms. Despite the several advantages of data sharing, one of the major concerns before releasing health data is its impact on individuals’ privacy. Such a sharing process should adhere to state-of-the-art methods in Data Protection by Design and by Default. In this paper, we use a Covid-19 data set from Portugal’s second-largest hospital to show how it is feasible to ensure data privacy while improving the quality and maintaining the utility of the data. Our goal is to demonstrate how knowledge exchange in multidisciplinary teams of healthcare practitioners, data privacy, and data science experts is crucial to co-developing strategies that ensure high utility in de-identified data…(More).”

Why Digital Public Goods, including AI, Should Depend on Open Data


Article by Cable Green: “Acknowledging that some data should not be shared (for moral, ethical and/or privacy reasons) and some cannot be shared (for legal or other reasons), Creative Commons (CC) thinks there is value in incentivizing the creation, sharing, and use of open data to advance knowledge production. As open communities continue to imagine, design, and build digital public goods and public infrastructure services for education, science, and culture, these goods and services – whenever possible and appropriate – should produce, share, and/or build upon open data.

Open Data and Digital Public Goods (DPGs)

CC is a member of the Digital Public Goods Alliance (DPGA) and CC’s legal tools have been recognized as digital public goods (DPGs). DPGs are “open-source software, open standards, open data, open AI systems, and open content collections that adhere to privacy and other applicable best practices, do no harm, and are of high relevance for attainment of the United Nations 2030 Sustainable Development Goals (SDGs).” If we want to solve the world’s greatest challenges, governments and other funders will need to invest in, develop, openly license, share, and use DPGs.

Open data is important to DPGs because data is a key driver of economic vitality with demonstrated potential to serve the public good. In the public sector, data informs policy making and public services delivery by helping to channel scarce resources to those most in need; providing the means to hold governments accountable and foster social innovation. In short, data has the potential to improve people’s lives. When data is closed or otherwise unavailable, the public does not accrue these benefits.CC was recently part of a DPGA sub-committee working to preserve the integrity of open data as part of the DPG Standard. This important update to the DPG Standard was introduced to ensure only open datasets and content collections with open licenses are eligible for recognition as DPGs. This new requirement means open data sets and content collections must meet the following criteria to be recognised as a digital public good.

  1. Comprehensive Open Licensing:
    1. The entire data set/content collection must be under an acceptable open licence. Mixed-licensed collections will no longer be accepted.
  2. Accessible and Discoverable:
    1. All data sets and content collection DPGs must be openly licensed and easily accessible from a distinct, single location, such as a unique URL.
  3. Permitted Access Restrictions:
    1. Certain access restrictions – such as logins, registrations, API keys, and throttling – are permitted as long as they do not discriminate against users or restrict usage based on geography or any other factors…(More)”.