Git for Law Revisited


Ari Hershowitz at Linked Legislation: “Laws change. Each time a new U.S. law is enacted, it enters a backdrop of approximately 22 million words of existing law. The new law may strike some text, add some text, and make other adjustments that trickle through the legal corpus. Seeing these changes in context would help lawmakers and the public better understand their impact.

To software engineers, this problem sounds like a perfect application for automated change management. Input an amendment, output tracked changes (see sample below). In the ideal system such changes could be seen as soon as the law is enacted — or even while a bill is being debated. We are now much closer to this ideal.

Changes to 16 U.S.C. 3835 by law 113-79

On Quora, on this blog, and elsewhere, I’ve discussed some of the challenges of using git, an automated change management system, to track laws. The biggest technical challenge has been that most laws, and most amendments to those laws, have not been structured in a computer-friendly way. But that is changing.

The Law Revision Counsel (LRC) compiles the U.S. Code through careful analysis of new laws, identifying the parts of existing law that will be changed (in a process called Classification) and making those changes by hand. The drafting and revision process takes great skill and legal expertise.

So, for example, the LRC makes changes to the current U.S. Code, following the language of a law such as this one:

Sample provision, 113-79 section 2006(a)

LRC attorneys identify the affected provisions of the U.S. Code and then carry out each of these instructions (strike “The Secretary”, insert “During fiscal year”…). Since 2011, the LRC has been using and publishing the final result of this analysis in XML format. One consequence of this format change is that it becomes feasible to automatically match the “before” text to the “after” text and produce a redlined version, as seen above, showing the changes in context.

To produce this redlined version, I ran xml_diff, an open-source program written by Joshua Tauberer of govtrack.us, who also works with my company, Xcential, on modernization projects for the U.S. House. The results can be remarkably accurate. As a prerequisite, it is necessary to have a “before” and an “after” version in XML format and a small enough stretch of text to make the comparison manageable….(More)”
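
To make the idea concrete, below is a minimal sketch of the kind of “before”/“after” comparison described here. It is not xml_diff itself – the real tool works on the XML trees directly and re-inserts markup in place – but a simplified word-level redline built with Python’s standard library. The file names and the <del>/<ins> markup are assumptions for illustration.

```python
import difflib
from xml.etree import ElementTree


def extract_words(path):
    """Flatten an XML file into a list of words (element text only)."""
    root = ElementTree.parse(path).getroot()
    text = " ".join(t.strip() for t in root.itertext() if t.strip())
    return text.split()


def redline(before_path, after_path):
    """Return a word-level redline marking deletions and insertions."""
    before, after = extract_words(before_path), extract_words(after_path)
    pieces = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, before, after).get_opcodes():
        if op in ("delete", "replace"):
            pieces.append("<del>" + " ".join(before[i1:i2]) + "</del>")
        if op in ("insert", "replace"):
            pieces.append("<ins>" + " ".join(after[j1:j2]) + "</ins>")
        if op == "equal":
            pieces.append(" ".join(before[i1:i2]))
    return " ".join(pieces)


if __name__ == "__main__":
    # Placeholder file names for the "before" and "after" XML of a U.S. Code section.
    print(redline("usc_16_3835_before.xml", "usc_16_3835_after.xml"))
```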

Room for a View: Democracy as a Deliberative System


Involve: “Democratic reform comes in waves, propelled by technological, economic, political and social developments. There are periods of rapid change, followed by relative quiet.

We are currently in a period of significant political pressure for change to our institutions of democracy and government. With so many changes under discussion, it is critically important that those proposing and carrying out reforms understand the impact that different reforms might have.

Most discussions of democratic reform focus on electoral democracy. However, for all their importance in the democratic system, elections rarely reveal voters’ views clearly enough for elected representatives to act on them. Changing the electoral system alone will not significantly increase the level of democratic control held by citizens.

Room for a View, by Involve’s director Simon Burall, looks at democratic reform from a broader perspective than that of elections. Drawing on the work of democratic theorists, it uses a deliberative systems approach to examine the state of UK democracy. Rather than focusing exclusively on the extent to which individuals and communities are represented within institutions, it is equally concerned with the range of views present and how they interact.

Adapting the work of the democratic theorist John Dryzek, the report identifies seven components of the UK’s democratic system, describing and analysing the condition of each in turn. Assessing the UK’s democracy through this lens reveals it to be in fragile health. The representation of alternative views and narratives in all of the UK system’s seven components is poor, the components are weakly connected and, despite some positive signs, deliberative capacity is decreasing.

Room for a View suggests that a focus on the key institutions isn’t enough. If the health of UK democracy is to be improved, we need to move away from thinking about the representation of individual voters to thinking about the representation of views, perspectives and narratives. Doing this will fundamentally change the way we approach democratic reform.

Big data problems we face today can be traced to the social ordering practices of the 19th century.


Hamish Robertson and Joanne Travaglia in LSE’s The Impact Blog: “This is not the first ‘big data’ era but the second. The first was the explosion in data collection that occurred from the early 19th century – Hacking’s ‘avalanche of numbers’, precisely situated between 1820 and 1840. This was an analogue big data era, different to our current digital one but characterized by some very similar problems and concerns. Contemporary problems of data analysis and control are considered ‘big’ for a variety of accepted reasons, generally including size, complexity and technology issues. We also suggest that digitisation is a central process in this second big data era, one that seems obvious but which also appears to have reached a new threshold. Until a decade or so ago, ‘big data’ looked just like a digital version of conventional analogue records and systems – ones whose management had become normalised through statistical and mathematical analysis. Now, however, we see a level of concern and anxiety similar to the concerns that were faced in the first big data era.

This situation brings with it a socio-political dimension of interest to us, one in which our understanding of people and our actions on individuals, groups and populations are deeply implicated. The collection of social data had a purpose – understanding and controlling the population in a time of significant social change. To achieve this, new kinds of information and new methods for generating knowledge were required. Many ideas, concepts and categories developed during that first data revolution remain intact today, some uncritically accepted more now than when they were first developed. In this piece we draw out some connections between these two data ‘revolutions’ and the implications for the politics of information in contemporary society. It is clear that many of the problems in this first big data age and, more specifically, their solutions persist down to the present big data era….Our question then is how do we go about re-writing the ideological inheritance of that first data revolution? Can we or will we unpack the ideological sequelae of that past revolution during this present one? The initial indicators are not good in that there is a pervasive assumption in this broad interdisciplinary field that reductive categories are both necessary and natural. Our social ordering practices have influenced our social epistemology. We run the risk in the social sciences of perpetuating the ideological victories of the first data revolution as we progress through the second. The need for critical analysis grows apace not just with the production of each new technique or technology but with the uncritical acceptance of the concepts, categories and assumptions that emerged from that first data revolution. That first data revolution proved to be a successful anti-revolutionary response to the numerous threats to social order posed by the incredible changes of the nineteenth century, rather than the Enlightenment emancipation that was promised. (More)”

This is part of a wider series on the Politics of Data. For more on this topic, also see Mark Carrigan’s Philosophy of Data Science interview series and the Discover Society special issue on the Politics of Data (Science).

In post-earthquake Nepal, open data accountability


Deepa Rai at the World Bank blog: “….Following the earthquake, there was an overwhelming response from technocrats and data crunchers to use data visualizations for disaster risk assessment. The Government of Nepal made datasets available through its Disaster Data Portal, and many organizations and individuals also pitched in and produced visual data platforms.
However, the use of open data has not been limited to disaster response. It was, and still is, instrumental in tracking how much funding has been received and how it’s being allocated. Through the use of open data, people can make their own analysis based on the information provided online.

Direct Relief, a not-for-profit company, has collected such information and helped gather data from the Prime Minister’s relief fund, and then created infographics which have been useful for media and immediate distribution on social platforms. MapJournal’s visual maps became vital during the Post Disaster Needs Assessment (PDNA) to assess and map areas where relief and reconstruction efforts were urgently needed.

Direct Relief medical relief partner locations in context of population affected and injuries by district
Photo Credit: Direct Relief Services

Open data and accountability
However, the work of open data doesn’t end with relief distribution and disaster risk assessment. It is also hugely impactful in keeping track of how relief money is pledged, allocated, and spent. One such web application, openenet.net, is making this possible by aggregating post-disaster funding data from international and national sources into infographics. “The objective of the system,” reads the website, “is to ensure transparency and accountability of relief funds and resources to ensure that it reaches to targeted beneficiaries. We believe that transparency of funds in an open and accessible manner within a central platform is perhaps the first step to ensure effective mobilization of available resources.”
Four months after the earthquake, Nepali media have already started to report on aid spending — or the lack of it. This has been made possible by the use of open data from the Ministry of Home Affairs (MoHA) and illustrates how critical data is for the effective use of aid money.
Open data platforms emerging after the quakes have been crucial in questioning the accountability of aid provisions and ultimately resulting in more successful development outcomes….(More)”

How the USGS uses Twitter data to track earthquakes


Twitter Blog: “After the disastrous Sichuan earthquake in 2008, people turned to Twitter to share firsthand information about the earthquake. What amazed many was the impression that Twitter was faster at reporting the earthquake than the U.S. Geological Survey (USGS), the official government organization in charge of tracking such events.

This Twitter activity wasn’t a big surprise to the USGS. The USGS National Earthquake Information Center (NEIC) processes data from about 2,000 real-time earthquake sensors, with the majority based in the United States. That leaves a lot of empty space in the world with no sensors. On the other hand, there are hundreds of millions of people using Twitter who can report earthquakes. At first, the USGS staff was a bit skeptical that Twitter could be used as a detection system for earthquakes – but when they looked into it, they were surprised at the effectiveness of Twitter data for detection.

USGS staffers Paul Earle, a seismologist, and Michelle Guy, a software developer, teamed up to look at how Twitter data could be used for earthquake detection and verification. Using Twitter’s Public API, they applied the same time-series event-detection method they use when detecting earthquakes. This gave them a baseline for earthquake-related chatter, but they decided to dig in even further. They found that people Tweeting about actual earthquakes kept their Tweets really short, even just to ask, “earthquake?” Concluding that people who are experiencing earthquakes aren’t very chatty, they started filtering out Tweets with more than seven words. They also recognized that people sharing links or the size of the earthquake were significantly less likely to be offering firsthand reports, so they filtered out any Tweets sharing a link or a number. Ultimately, this filtered stream proved to be very effective at determining when earthquakes occurred globally.
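
As a rough illustration of these heuristics, here is a sketch that keeps only Tweets of seven words or fewer with no links and no digits, and raises an alert when enough filtered Tweets arrive within a short window. The structure, the 80-second window and the 14-Tweet trigger (echoing the Chile example below) are illustrative assumptions, not the USGS’s actual code or parameters.

```python
import re
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=80)   # assumed sliding detection window
TRIGGER_COUNT = 14               # assumed number of filtered Tweets needed to fire an alert

recent = deque()                 # timestamps of Tweets that passed the filter


def passes_filter(text):
    """Keep short, first-hand-looking Tweets: seven words or fewer, no links, no numbers."""
    if "http" in text.lower():
        return False
    if re.search(r"\d", text):
        return False
    return len(text.split()) <= 7


def on_tweet(text, timestamp):
    """Feed each incoming Tweet here; returns True when an alert should fire."""
    if not passes_filter(text):
        return False
    recent.append(timestamp)
    while recent and timestamp - recent[0] > WINDOW:   # drop Tweets outside the window
        recent.popleft()
    return len(recent) >= TRIGGER_COUNT


# A terse "earthquake?" Tweet passes the filter; a chatty one with a link and a
# magnitude does not.
now = datetime.utcnow()
print(on_tweet("earthquake?", now))                             # False: only one Tweet so far
print(passes_filter("Magnitude 6.1 quake http://example.com"))  # False: link and number
```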

USGS Modeling Twitter Data to Detect Earthquakes

While I was at the USGS office in Golden, Colo., interviewing Michelle and Paul, three earthquakes happened in a relatively short time. Using Twitter data, their system was able to pick up on an aftershock in Chile within one minute and 20 seconds – and it only took 14 Tweets from the filtered stream to trigger an email alert. The other two earthquakes, off Easter Island and Indonesia, weren’t picked up because they were not widely felt…

The USGS monitors for earthquakes in many languages, and the words used can be a clue as to the magnitude and location of the earthquake. Chile has two words for earthquakes: terremoto and temblor; terremoto is used to indicate a bigger quake. This one in Chile started with people asking if it was a terremoto, with others realizing that it was a temblor.

As the USGS team notes, Twitter data augments their own detection work on felt earthquakes. If they’re getting reports of an earthquake in a populated area but no Tweets from there, that’s a good indicator to them that it’s a false alarm. It’s also very cost effective for the USGS, because they use Twitter’s Public API and open-source software such as Kibana and ElasticSearch to help determine when earthquakes occur….(More)”
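
For a sense of how an Elasticsearch/Kibana setup might be used here, the sketch below runs a per-minute count of filtered Tweets over the last hour – the kind of time series that could then be charted in Kibana. The index name, field names and local endpoint are assumptions made for illustration, and the date_histogram syntax shown requires a reasonably recent Elasticsearch.

```python
import requests

# Count filtered Tweets per minute over the last hour (aggregation only, no hits returned).
query = {
    "size": 0,
    "query": {"range": {"created_at": {"gte": "now-1h"}}},
    "aggs": {
        "tweets_per_minute": {
            "date_histogram": {"field": "created_at", "fixed_interval": "1m"}
        }
    },
}

resp = requests.post("http://localhost:9200/quake-tweets/_search", json=query, timeout=10)
resp.raise_for_status()
for bucket in resp.json()["aggregations"]["tweets_per_minute"]["buckets"]:
    print(bucket["key_as_string"], bucket["doc_count"])
```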

Accelerating Citizen Science and Crowdsourcing to Address Societal and Scientific Challenges


Tom Kalil et al at the White House Blog: “Citizen science encourages members of the public to voluntarily participate in the scientific process. Whether by asking questions, making observations, conducting experiments, collecting data, or developing low-cost technologies and open-source code, members of the public can help advance scientific knowledge and benefit society.

Through crowdsourcing – an open call for voluntary assistance from a large group of individuals – Americans can study and tackle complex challenges by conducting research at large geographic scales and over long periods of time in ways that professional scientists working alone cannot easily duplicate. These challenges include understanding the structure of proteins related to viruses in order to support the development of new medications, or preparing for, responding to, and recovering from disasters.

…OSTP is today announcing two new actions that the Administration is taking to encourage and support the appropriate use of citizen science and crowdsourcing at Federal agencies:

  1. OSTP Director John Holdren is issuing a memorandum entitled Addressing Societal and Scientific Challenges through Citizen Science and Crowdsourcing. This memo articulates principles that Federal agencies should embrace to derive the greatest value and impact from citizen science and crowdsourcing projects. The memo also directs agencies to take specific actions to advance citizen science and crowdsourcing, including designating an agency-specific coordinator for citizen science and crowdsourcing projects, and cataloguing citizen science and crowdsourcing projects that are open for public participation on a new, centralized website to be created by the General Services Administration, making it easy for people to find out about and join in these projects.
  2. Fulfilling a commitment made in the 2013 Open Government National Action Plan, the U.S. government is releasing the first-ever Federal Crowdsourcing and Citizen Science Toolkit to help Federal agencies design, carry out, and manage citizen science and crowdsourcing projects. The toolkit, which was developed by OSTP in partnership with the Federal Community of Practice for Crowdsourcing and Citizen Science and GSA’s Open Opportunities Program, reflects the input of more than 125 Federal employees from over 25 agencies on ideas, case studies, best management practices, and other lessons to facilitate the successful use of citizen science and crowdsourcing in a Federal context….(More)”

 

Open Science Revolution – New Ways of Publishing Research in The Digital Age


Scicasts: “A massive increase in the power of digital technology over the past decade allows us today to publish any article, blog post or tweet in a matter of seconds.

Much of the information on the web is also free – newspapers are embracing open access to their articles, and many websites license their content under Creative Commons licenses, most of which allow the re-use and sharing of the original work at no cost.

As opposed to this openness, science publishing is still lagging behind. Most of the scientific knowledge generated in the past two centuries is hidden behind a paywall, requiring an average reader to pay tens to hundreds of euros to access an original study report written by scientists.

Can we not do things differently?

An answer to this question led to the creation of a number of new concepts that emerged over the past few years. A range of innovative open online science platforms are now trying “to do things differently”, offering researchers alternative ways of publishing their discoveries, making the publishing process faster and more transparent.

Here is a handful of examples, implemented by three companies – a recently launched open access journal Research Ideas and Outcomes (RIO), an open publishing platform F1000Research from The Faculty of 1000 and a research and publishing network ScienceOpen. Each has something different to offer, yet all of them seem to agree that science research should be open and accessible to everyone.

New concept – publish all research outputs

While the two-centuries-old tradition of science publishing lives and dies on exposing only the final outcomes of a research project, the RIO journal suggests a different approach. If we can follow news stories online step by step as they unfold (something that journalists have figured out and use in live reporting), they say, why not apply similar principles to research projects?

“RIO is the first journal that aims at publishing the whole research cycle and definitely the first one, to my knowledge, that tries to do that across all science branches – all of humanities, social sciences, engineering and so on,” says a co-founder of the RIO journal, Prof. Lyubomir Penev, in an interview with Scicasts.

From the original project outline, to datasets, software and methodology, each part of the project can be published separately. “The writing platform ARPHA, which underpins RIO, handles the whole workflow – from the stage when you write the first letter, to the end,” explains Prof. Penev.

At an early stage, the writing process is closed from public view and researchers may invite their collaborators and peers to view their project, add data and contribute to its development. Scientists can choose to publish any part of their project as it progresses – they can submit to the open platform their research idea, hypothesis or a newly developed experimental protocol, alongside future datasets and whole final manuscripts.

Some intermediate research stages and preliminary results can also be submitted to the platform F1000Research, which developed their own online authoring tool F1000Workspace, similar to ARPHA….(More)”

Ethical, Safe, and Effective Digital Data Use in Civil Society


Blog by Lucy Bernholz, Rob Reich, Emma Saunders-Hastings, and Emma Leeds Armstrong: “How do we use digital data ethically, safely, and effectively in civil society? We have developed three early principles for consideration:

  • Default to person-centered consent.
  • Prioritize privacy and minimum viable data collection.
  • Plan from the beginning to open (share) your work.

This post provides a synthesis from a one day workshop that informed these principles. It concludes with links to draft guidelines you can use to inform partnerships between data consultants/volunteers and nonprofit organizations….(More)

These three values — consent, minimum viable data collection, and open sharing — comprise a basic framework for ethical, safe, and effective use of digital data by civil society organizations. They should be integrated into partnerships with data intermediaries and, perhaps, into general data practices in civil society.

We developed two tools to guide conversations between data volunteers and/or consultants and nonprofits. These are downloadable below. Please use them, share them, improve them, and share them again….

  1. Checklist for NGOs and external data consultants
  2. Guidelines for NGOs and external data consultants (More)”

Public service coding: the BBC as an open software developer


Juan Mateos-Garcia at NESTA: “On Monday, the BBC published British, Bold, Creative, a paper where it put forward a vision for its future based on openness and collaboration with its audiences and the UK’s wider creative industries.

In this blog post, we focus on an area where the BBC is already using an open and collaborative model for innovation: software development.

The value of software

Although less visible to the public than its TV, radio and online content programming, the BBC’s software development activities may create value and drive innovation beyond the BBC, providing an example of how the corporation can put its “technology and digital capabilities at the service of the wider industry.”

Software is an important form of innovation investment that helps the BBC deliver new products and services, and become more efficient. One might expect that much of the software developed by the BBC would also be of value to other media and digital organisations. Such beneficial “spillovers” are encouraged by the BBC’s use of open source licensing, which enables other organisations to download its software for free, change it as they see fit, and share the results.

Current debates about the future of the BBC – including the questions about its role in influencing the future technology landscape in the Government’s Charter Review Consultation – need to be informed by robust evidence about how it develops software, and the impact that this has.

In this blog post, we use data from the world’s biggest collaborative software development platform, GitHub, to study the BBC as an open software developer.

GitHub gives organisations and individuals hosting space to store their projects (referred to as “repos”), and tools to coordinate development. This includes the option to “fork” (copy) other users’ software, change it and redistribute the improvements. Our key questions are:

  • How active is the BBC on GitHub?
  • How has its presence on GitHub changed over time?
  • What is the level of adoption (forking) of BBC projects on GitHub?
  • What types of open source projects is the BBC developing?
  • Where in the UK and in the rest of the world are the people interested in BBC projects based?

But before tackling these questions, it is important to address a question often raised in relation to open source software:

Why might an organisation like the BBC want to share its valuable code on a platform like GitHub?

There are several possible reasons:

  • Quality: Opening up a software project attracts help from other developers, making it better
  • Adoption: Releasing software openly can help turn it into a widely adopted standard
  • Signalling: It signals the organisation as an interesting place to work and partner with
  • Public value: Some organisations release their code openly with the explicit goal of creating public value

The webpage introducing TAL (Television Application Layer), a BBC project on GitHub, is a case in point: “Sharing TAL should make building applications on TV easier for others, helping to drive the uptake of this nascent technology. The BBC has a history of doing this and we are always looking at new ways to reach our audience.”…(More)
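
As a rough illustration of how the questions listed above might be explored, here is a sketch that pulls an organisation’s public repositories and their fork counts from GitHub’s REST API. The endpoints are the standard api.github.com ones; the organisation name “bbc”, the fields printed and the use of unauthenticated requests (which are rate-limited) are assumptions made for illustration.

```python
import requests


def org_repos(org, per_page=100):
    """Yield an organisation's public repositories ("repos") from the GitHub REST API."""
    page = 1
    while True:
        resp = requests.get(
            f"https://api.github.com/orgs/{org}/repos",
            params={"per_page": per_page, "page": page},
            headers={"Accept": "application/vnd.github+json"},
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            return
        yield from batch
        page += 1


if __name__ == "__main__":
    repos = list(org_repos("bbc"))
    print(f"{len(repos)} public repos")
    # Fork counts give a crude measure of adoption; creation dates show activity over time.
    for repo in sorted(repos, key=lambda r: r["forks_count"], reverse=True)[:10]:
        print(repo["name"], repo["forks_count"], repo["created_at"])
```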

US Citizenship and Immigration Services to host Twitter ‘office hours’


NextGov: “U.S. Citizenship and Immigration Services wants to get more social with prospective immigrants.

USCIS will host its first-ever Twitter office hours Tuesday from 3-4 p.m. using the #AskUSCIS hashtag. Agency officials hope to provide another avenue for customers to ask questions and receive real-time feedback, according to a USCIS blog post.

To participate, customers just have to follow @USCIS on Twitter, use the hashtag and ask away, although the blog post makes clear that staff won’t answer case-specific questions or provide case status updates.

The post also warns Twitter users not to post Social Security numbers, receipt numbers or any other personally identifiable information.

“With Twitter office hours, we want to help you – either as you’re preparing forms or after you’ve filed,” the blog post states.

USCIS will post a transcript of the questions and answers to its blog following the Twitter office hours, and if the concept is successful, the agency plans to host the sessions on a regular basis.

The agency’s social outreach plan is part of a broader effort among federal agencies to improve customer experience.

This particular variant of digital engagement mirrors an effort championed by the Office of Federal Student Aid. FAFSA answers questions on the last Wednesday of every month using the hashtag #AskFAFSA, an effort that’s helped build its digital engagement significantly in recent years….(More)”