New report by Laura Drees and Daniel Castro at the Center for Data Innovation: “This report provides a snapshot of states’ efforts to create open data policies and portals and ranks states on their progress. The six top-scoring states are Hawaii, Illinois, Maryland, New York, Oklahoma, and Utah. Each of these states has established an open data policy that requires basic government data, such as expenditure information, as well as other agency data, to be published on their open data portals in a machine-readable format. These portals contain extensive catalogs of open data, are relatively simple to navigate, and provide data in machine-readable formats as required. The next highest-ranked state, Connecticut, offers a similarly serviceable, machine-readable open data portal that provides wide varieties of information, but its policy does not require machine readability. Of the next three top-ranking states, Texas’s and Rhode Island’s policies require neither machine readability nor government data beyond expenditures; New Hampshire’s policy requires machine readability and many types of data, but its open data portal is not yet fully functional. States creating new open data policies or portals, or refreshing old ones, have many opportunities to learn from the experiences of early adopters in order to fully realize the benefits of data-driven innovation.”
Download printer-friendly PDF.
Opening Health Data: What Do Researchers Want? Early Experiences With New York's Open Health Data Platform.
Paper by Martin, Erika G. PhD, MPH; Helbig, Natalie PhD, MPA; and Birkhead, Guthrie S. MD, MPH in the Journal of Public Health Management & Practice: “Governments are rapidly developing open data platforms to improve transparency and make information more accessible. New York is a leader, with currently the only state platform devoted to health. Although these platforms could build public health departments’ capabilities to serve more researchers, agencies have little guidance on releasing meaningful and usable data.
Objective: Structured focus groups with researchers and practitioners collected stakeholder feedback on potential uses of open health data and New York’s open data strategy….
Results: There was low awareness of open data, with 67% of researchers reporting never using open data portals prior to the workshop. Participants were interested in data sets that were geocoded, longitudinal, or aggregated to small area granularity and capabilities to link multiple data sets. Multiple environmental conditions and barriers hinder their capacity to use health data for research. Although open data platforms cannot address all barriers, they provide multiple opportunities for public health research and practice, and participants were overall positive about the state’s efforts to release open data.
Conclusions: Open data are not ideal for some researchers because they do not contain individually identifiable data, indicating a need for tiered data release strategies. However, they do provide important new opportunities to facilitate research and foster collaborations among agencies, researchers, and practitioners.”
How you can help build a more agile government
Luke Fretwell at GovFresh: “Earlier this year, I began doing research work with CivicActions on agile development in government — who was doing it, how and what the needs were to successfully get it deployed.
After the Healthcare.gov launch mishaps, calls for agile practices as the panacea for all of government’s IT woes reached a high. While agile as the ultimate solution oversimplifies the issue, we’ve come to agree as a profession (both software development and public service) that moving toward an iterative approach to operations is the way of the future.
My own formal introduction to agile began with my work with CivicActions, so the research coincided with an introductory immersion into how government is using it. Having been involved with startups for the past 15 years, I consider iterative development the norm; even so, the added layer of project management processes has forced me to be a better professional overall.
What I’ve found through many discussions and interviews is that you can’t just snap your fingers and execute agile within the framework of government bureaucracy. There are a number of issues — from procurement to project management training to executive-level commitment to organizational-wide culture change — that hinder its adoption. For IT, launching a new website or app is the easy part. Changing IT operational processes and culture is often overlooked or avoided, especially for a short-term executive, because they reach into the granular organizational challenges most people don’t want to bother with.
After talking with a number of agile government and private sector practitioners, it was clear there was enthusiasm around how it could be applied to fundamentally change the way government works. Beyond just execution by professional project managers, everyone I spoke with talked about how deploying agile gives them a stronger sense of public service.
What came from these discussions is the desire to have a stronger community of practitioners and those interested in deploying it to better support one another.
To meet that need, a group of federal, state, local government and private sector professionals have formed Agile for Gov, a “community-powered network of agile government professionals.”…
Monitoring Arms Control Compliance With Web Intelligence
Chris Holden and Maynard Holliday at Commons Lab: “Traditional monitoring of arms control treaties, agreements, and commitments has required the use of National Technical Means (NTM)—large satellites, phased array radars, and other technological solutions. NTM was a good solution when the treaties focused on large items for observation, such as missile silos or nuclear test facilities. As the targets of interest have shrunk by orders of magnitude, the need for other, more ubiquitous, sensor capabilities has increased. The rise in web-based, or cloud-based, analytic capabilities will have a significant influence on the future of arms control monitoring and the role of citizen involvement.
Since 1999, the U.S. Department of State has had at its disposal the Key Verification Assets Fund (V Fund), which was established by Congress. The Fund helps preserve critical verification assets and promotes the development of new technologies that support the verification of and compliance with arms control, nonproliferation, and disarmament requirements.
Sponsored by the V Fund to advance web-based analytic capabilities, Sandia National Laboratories, in collaboration with Recorded Future (RF), synthesized open-source data streams from a wide variety of traditional and nontraditional web sources in multiple languages along with topical texts and articles on national security policy to determine the efficacy of monitoring chemical and biological arms control agreements and compliance. The team used novel technology involving linguistic algorithms to extract temporal signals from unstructured text and organize that unstructured text into a multidimensional structure for analysis. In doing so, the algorithm identifies the underlying associations between entities and events across documents and sources over time. Using this capability, the team analyzed several events that could serve as analogs to treaty noncompliance, technical breakout, or an intentional attack. These events included the H7N9 bird flu outbreak in China, the Shanghai pig die-off and the fungal meningitis outbreak in the United States last year.
For H7N9 we found that open source social media were the first to report the outbreak and give ongoing updates. The Sandia RF system was able to roughly estimate lethality based on temporal hospitalization and fatality reporting. For the Shanghai pig die-off the analysis tracked the rapid assessment by Chinese authorities that H7N9 was not the cause of the pig die-off as had been originally speculated. Open source reporting highlighted a reduced market for pork in China due to the very public dead pig display in Shanghai. Possible downstream health effects were predicted (e.g., contaminated water supply and other overall food ecosystem concerns). In addition, legitimate U.S. food security concerns were raised based on the Chinese purchase of the largest U.S. pork producer (Smithfield) because of a fear of potential import of tainted pork into the United States….
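The core technique described above — extracting temporal signals from unstructured text and organizing them into a structure that links entities and events over time — can be illustrated with a minimal sketch. This is not the Sandia/Recorded Future implementation; the date pattern, event vocabulary, and sample documents below are hypothetical stand-ins for a real linguistic model:

```python
import re
from collections import defaultdict

# Hypothetical event vocabulary standing in for a real linguistic model.
EVENT_TERMS = {"outbreak", "hospitalization", "fatality", "die-off"}
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")  # ISO dates only, for brevity

def extract_signals(doc_id, text):
    """Pull (date, event-term, source) triples from one unstructured document."""
    dates = DATE_RE.findall(text)
    terms = {w.strip(".,;").lower() for w in text.split()} & EVENT_TERMS
    return [("-".join(d), t, doc_id) for d in dates for t in terms]

def build_timeline(docs):
    """Organize signals from many documents into a date-keyed structure."""
    timeline = defaultdict(list)
    for doc_id, text in docs.items():
        for date, term, src in extract_signals(doc_id, text):
            timeline[date].append((term, src))
    return dict(sorted(timeline.items()))

# Invented sample documents, loosely modeled on the H7N9 reporting described above.
docs = {
    "weibo-01": "Reports of a bird flu outbreak near Shanghai on 2013-03-31.",
    "news-02": "On 2013-04-02 the first fatality was confirmed; hospitalization counts rose.",
}
timeline = build_timeline(docs)
```

Even this toy version shows why the approach supports rough lethality estimates: once hospitalization and fatality mentions are keyed by date, their counts can be compared over time across many sources.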
To read the full paper, please click here.”
The infrastructure Africa really needs is better data reporting
Quartz: “This week African leaders met with officials in Washington and agreed to billions of dollars of US investments and infrastructure deals. But the terrible state of statistical reporting in most of Africa means that it will be nearly impossible to gauge how effective these deals are at making Africans, or the American investors, better off.
Data reporting on the continent is sketchy. Just look at the recent GDP revisions of large countries. How is it that Nigeria’s April GDP recalculation catapulted it ahead of South Africa, making it the largest economy in Africa overnight? Or that Kenya’s economy is actually 20% larger (paywall) than previously thought?
Indeed, countries in Africa get noticeably bad scores on the World Bank’s Bulletin Board on Statistical Capacity, an index of data reporting integrity.
A recent working paper from the Center for Global Development (CGD) shows how politics influence the statistics released by many African countries…
But in the long run, dodgy statistics aren’t good for anyone. They “distort the way we understand the opportunities that are available,” says Amanda Glassman, one of the CGD report’s authors. US firms have pledged $14 billion in trade deals at the summit in Washington. No doubt they would like to know whether high school enrollment promises to create a more educated workforce in a given country, or whether its people have been immunized against viruses.
Overly optimistic indicators also distort how a government decides where to focus its efforts. If school enrollment appears to be high, why implement programs intended to increase it?
The CGD report suggests increased funding to national statistical agencies, and making sure that they are wholly independent from their governments. President Obama is pumping $7 billion into African agriculture. But unless cash and attention are given to improving statistical integrity, he may never know whether that investment has borne fruit.”
Using technology, data and crowdsourcing to hack infrastructure problems
Courtney M. Fowler at CAFWD.ORG: “Technology has become a way of life for most Americans, not just for communication but also for many daily activities. However, there’s more that can be done than just booking a trip or crushing candy. With a majority of Americans now owning smartphones, it’s only becoming more obvious that there’s room for governments to engage the public and provide more bang for their buck via technology.
CA Fwd has been putting on an “Open Data roadshow” around the state to highlight ways the marriage of tech and info can make government more efficient and transparent.
Jurisdictions have also been discovering that using technology and smartphone apps can be beneficial in the pursuit of improving infrastructure. Saving any amount of money on such projects is especially important for California, where it’s been estimated the state will only have half of the $765 billion needed for infrastructure investments over the next decade.
One of the best examples of applying technology to infrastructure problems comes from South Carolina, where an innovative bridge-monitoring system is producing real savings, despite being in use on only eight bridges.
Girder sensors are placed on each bridge so that they can measure its carrying capacity and can be monitored 24/7. Although the monitors don’t eliminate the need for inspections, the technology does make the need for them significantly less frequent. Data from the monitors also led the South Carolina Department of Transportation to correct one bridge’s problems with a $100,000 retrofit, rather than spending $800,000 to replace it…”
In total, having the monitors on just eight bridges, at a cost of about $50,000 per bridge, saved taxpayers $5 million.
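The savings claim above can be sanity-checked with simple arithmetic; the figures come from the article, while the return-on-investment comparison is my own illustration:

```python
# Figures cited in the article.
bridges = 8
cost_per_bridge = 50_000       # sensor installation per bridge
program_savings = 5_000_000    # total taxpayer savings attributed to monitoring

total_cost = bridges * cost_per_bridge     # 400,000
roi_multiple = program_savings / total_cost

# One concrete case from the article: a retrofit instead of a full replacement.
retrofit_saving = 800_000 - 100_000        # 700,000 saved on a single bridge
```

On these numbers the program returned roughly 12.5 times its cost, and the single retrofit alone recovered well over the cost of instrumenting all eight bridges.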
That kind of innovation and savings is exactly what California needs to ensure that infrastructure projects happen in a more timely and efficient fashion in the future. It’s also what is driving civic innovators to bring together technology and crowdsourcing and make sure infrastructure projects also are results oriented.
App enables citizens to report water waste in drought regions
Springwise: “Rallying citizens to take a part in looking after the community they live in has become easier thanks to smartphones. In the past, the Creek Watch app has enabled anyone to help monitor their local water quality by sending data back to the state water board. Now Everydrop LA wants to use similar techniques to avoid drought in California, encouraging residents to report incidents of water wastage.
According to the team behind the app — which also created the CitySourced platform for engaging users in civic issues — even the smallest amount of water wastage can lead to meaningful losses over time. A faucet that drips just once a minute will lose over 2000 gallons of drinkable water each year. Using the Everydrop LA, citizens can report the location of leaking faucets and fire hydrants as well as occurrences of blatant water wastage. They can also see how much water is being wasted in their local area and learn about what they can do to cut their own water usage. In times when drought is a risk, the app notifies users to conserve. Cities and counties can use the data in their reports and learn more about how water wastage is affecting their jurisdiction.”
Fifteen open data insights
Tim Davies from ODRN: “…below are the 15 points from the three-page briefing version, and you can find a full write-up of these points for download. You can also find reports from all the individual project partners, including a collection of quick-read research posters over on the Open Data Research Network website.
15 insights into open data supply, use and impacts
(1) There are many gaps to overcome before open data availability can lead to widespread effective use and impact. Open data can lead to change through a ‘domino effect’, or by creating ripples of change that gradually spread out. However, often many of the key ‘domino pieces’ are missing, and local political contexts limit the reach of ripples. Poor data quality, low connectivity, scarce technical skills, weak legal frameworks and political barriers may all prevent open data triggering sustainable change. Attentiveness to all the components of open data impact is needed when designing interventions.
(2) There is a frequent mismatch between open data supply and demand in developing countries. Counting datasets is a poor way of assessing the quality of an open data initiative. The datasets published on portals are often the datasets that are easiest to publish, not the datasets most in demand. Politically sensitive datasets are particularly unlikely to be published without civil society pressure. Sometimes the gap is on the demand side – as potential open data users often do not articulate demands for key datasets.
(3) Open data initiatives can create new spaces for civil society to pursue government accountability and effectiveness. The conversation around transparency and accountability that ideas of open data can support is as important as the datasets in some developing countries.
(4) Working on open data projects can change how government creates, prepares and uses its own data. The motivations behind an open data initiative shape how government uses the data itself. Civil society and entrepreneurs interacting with government through open data projects can help shape government data practices. This makes it important to consider which intermediaries gain insider roles shaping data supply.
(5) Intermediaries are vital to both the supply and the use of open data. Not all data needed for governance in developing countries comes from government. Intermediaries can create data, articulate demands for data, and help translate open data visions from political leaders into effective implementations. Traditional local intermediaries are an important source of information, in particular because they are trusted parties.
(6) Digital divides create data divides in both the supply and use of data. In some developing countries key data is not digitised, or a lack of technical staff has left data management patchy and inconsistent. Where Internet access is scarce, few citizens can have direct access to data or services built with it. Full access is needed for full empowerment, but offline intermediaries, including journalists and community radio stations, also play a vital role in bridging the gaps between data and citizens.
(7) Where information is already available and used, the shift to open data involves data evolution rather than data revolution. Many NGOs and intermediaries already access the information which is now becoming available as data. Capacity building should start from existing information and data practices in organisations, and should look for the step-by-step gains to be made from a data-driven approach.
(8) Officials’ fears about the integrity of data are a barrier to more machine-readable data being made available. The publication of data as PDF or in scanned copies is often down to a misunderstanding of how open data works. Only copies can be changed, and originals can be kept authoritative. Helping officials understand this may help increase the supply of data.
(9) Very few datasets are clearly openly licensed, and there is low understanding of what open licenses entail. There are mixed opinions on the importance of a focus on licensing in different contexts. Clear licenses are important to building a global commons of interoperable data, but may be less relevant to particular uses of data on the ground. In many countries wider conversations about licensing are yet to take place.
(10) Privacy issues are not on the radar of most developing country open data projects, although commercial confidentiality does arise as a reason preventing greater data transparency. Much state held data is collected either from citizens or from companies. Many countries in the ODDC study have weak or absent privacy laws and frameworks, yet participants in the studies raised few personal privacy considerations. By contrast, a lack of clarity, and officials’ concerns, about potential breaches of commercial confidentiality when sharing data gathered from firms was a barrier to opening data.
(11) There is more to open data than policies and portals. Whilst central open data portals act as a visible symbol of open data initiatives, a focus on portal building can distract attention from wider reforms. Open data elements can also be built on existing data sharing practices, and data made available through the locations where citizens, NGOs and businesses already go to access information.
(12) Open data advocacy should be aware of, and build upon, existing policy foundations in specific countries and sectors. Sectoral transparency policies for local government, budget and energy industry regulation, amongst others, could all have open data requirements and standards attached, drawing on existing mechanisms to secure sustainable supplies of relevant open data in developing countries. In addition, open data conversations could help make existing data collection and disclosure requirements fit better with the information and data demands of citizens.
(13) Open data is not just a central government issue: local government data, city data, and data from the judicial and legislative branches are all important. Many open data projects focus on the national level, and only on the executive branch. However, local government is closer to citizens, urban areas bring together many of the key ingredients for successful open data initiatives, and transparency in other branches of government is important to secure citizens’ democratic rights.
(14) Flexibility is needed in the application of definitions of open data to allow locally relevant and effective open data debates and advocacy to emerge. Open data is made up of various elements, including proactive publication, machine-readability and permissions to re-use. Countries at different stages of open data development may choose to focus on one or more of these, recognising that trying to adopt all elements at once could hinder progress. It is important to find ways to both define open data clearly, and to avoid a reductive debate that does not recognise progressive steps towards greater openness.
(15) There are many different models for an open data initiative: including top-down, bottom-up and sector-specific. Initiatives may also be state-led, civil society-led and entrepreneur-led in their goals and how they are implemented – with consequences for the resources and models required to make them sustainable. There is no one-size-fits-all approach to open data. More experimentation, evaluation and shared learning on the components, partners and processes for putting open data ideas into practice must be a priority for all who want to see a world where open-by-default data drives real social, political and economic change.
You can read more about each of these points in the full report.”
The Quiet Revolution: Open Data Is Transforming Citizen-Government Interaction
Maury Blackman at Wired: “The public’s trust in government is at an all-time low. This is not breaking news.
But what if I told you that just this past May, President Obama signed into law a bill that passed Congress with unanimous support. A bill that could fundamentally transform the way citizens interact with their government. This legislation could also create an entirely new, trillion-dollar industry right here in the U.S. It could even save lives.
On May 9th, the Digital Accountability and Transparency Act of 2014 (DATA Act) became law. There were very few headlines, no Rose Garden press conference.
I imagine most of you have never heard of the DATA Act. The bill with the nerdy name has the potential to revolutionize government. It requires federal agencies to make their spending data available in standardized, publicly accessible formats. Supporters of the legislation included Tea Partiers and the most liberal Democrats. But the bill only scratches the surface of what’s possible.
So What’s the Big Deal?
On his first day in office, President Obama signed a memorandum calling for a more open and transparent government. The President wrote, “Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” This was followed by the creation of Data.gov, a one-stop shop for all government data. The site does not just include financial data, but also a wealth of other information related to education, public safety, climate and much more—all available in open and machine-readable format. This has helped fuel an international movement.
Tech minded citizens are building civic apps to bring government into the digital age; reporters are now more able to connect the dots easier, not to mention the billions of taxpayer dollars saved. And last year the President took us a step further. He signed an Executive Order making open government data the default option.
Cities and states have followed Washington’s lead with similar open data efforts on the local level. In San Francisco, the city’s Human Services Agency has partnered with Promptly, a text message notification service that alerts food stamp recipients (CalFresh) when they are at risk of being disenrolled from the program. This service is incredibly beneficial, because most do not realize any change in status until they are in the grocery store checkout line, trying to buy food for their family.
Other products and services created using open data do more than just provide an added convenience—they actually have the potential to save lives. The PulsePoint mobile app sends text messages to citizens trained in CPR when someone in walking distance is experiencing a medical emergency that may require CPR. The app is currently available in almost 600 cities in 18 states, which is great. But shouldn’t a product this valuable be available to every city and state in the country?…”
Request for Proposals: Exploring the Implications of Government Release of Large Datasets
“The Berkeley Center for Law & Technology and Microsoft are issuing this request for proposals (RFP) to fund scholarly inquiry to examine the civil rights, human rights, security and privacy issues that arise from recent initiatives to release large datasets of government information to the public for analysis and reuse. This research may help ground public policy discussions and drive the development of a framework to avoid potential abuses of this data while encouraging greater engagement and innovation.
This RFP seeks to:
- Gain knowledge of the impact of the online release of large amounts of data generated by citizens’ interactions with government
- Imagine new possibilities for technical, legal, and regulatory interventions that avoid abuse
- Begin building a body of research that addresses these issues
– BACKGROUND –
Governments at all levels are releasing large datasets for analysis by anyone for any purpose—“Open Data.” Using Open Data, entrepreneurs may create new products and services, and citizens may use it to gain insight into the government. A plethora of time saving and other useful applications have emerged from Open Data feeds, including more accurate traffic information, real-time arrival of public transportation, and information about crimes in neighborhoods. Sometimes governments release large datasets in order to encourage the development of unimagined new applications. For instance, New York City has made over 1,100 databases available, some of which contain information that can be linked to individuals, such as a parking violation database containing license plate numbers and car descriptions.
Data held by the government is often implicitly or explicitly about individuals—acting in roles that have recognized constitutional protection, such as lobbyist, signatory to a petition, or donor to a political cause; in roles that require special protection, such as victim of, witness to, or suspect in a crime; in the role as businessperson submitting proprietary information to a regulator or obtaining a business license; and in the role of ordinary citizen. While open government is often presented as an unqualified good, sometimes Open Data can identify individuals or groups, leading to a more transparent citizenry. The citizen who foresees this growing transparency may be less willing to engage in government, as these transactions may be documented and released in a dataset to anyone to use for any imaginable purpose—including to deanonymize the database—forever. Moreover, some groups of citizens may have few options or no choice as to whether to engage in governmental activities. Hence, open data sets may have a disparate impact on certain groups. The potential impact of large-scale data and analysis on civil rights is an area of growing concern. A number of civil rights and media justice groups banded together in February 2014 to endorse the “Civil Rights Principles for the Era of Big Data” and the potential of new data systems to undermine longstanding civil rights protections was flagged as a “central finding” of a recent policy review by White House adviser John Podesta.
The Berkeley Center for Law & Technology (BCLT) and Microsoft are issuing this request for proposals in an effort to better understand the implications and potential impact of the release of data related to U.S. citizens’ interactions with their local, state and federal governments. BCLT and Microsoft will fund up to six grants, with a combined total of $300,000. Grantees will be required to participate in a workshop to present and discuss their research at the Berkeley Technology Law Journal (BTLJ) Spring Symposium. All grantees’ papers will be published in a dedicated monograph. Grantees’ papers that approach the issues from a legal perspective may also be published in the BTLJ. We may also hold a follow-up workshop in New York City or Washington, DC.
While we are primarily interested in funding proposals that address issues related to the policy impacts of Open Data, many of these issues are intertwined with general societal implications of “big data.” As a result, proposals that explore Open Data from a big data perspective are welcome; however, proposals solely focused on big data are not. We are open to proposals that address the following difficult questions. We are also open to all methods and disciplines, and are particularly interested in proposals from cross-disciplinary teams.
- To what extent does existing Open Data made available by city and state governments affect individual profiling? Do the effects change depending on the level of aggregation (neighborhood vs. cities)? What releases of information could foreseeably cause discrimination in the future? Will different groups in society be disproportionately impacted by Open Data?
- Should the use of Open Data be governed by a code of conduct or subject to a review process before being released? In order to enhance citizen privacy, should governments develop guidelines to release sampled or perturbed data, instead of entire datasets? When datasets contain potentially identifiable information, should there be a notice-and-comment proceeding that includes proposed technological solutions to anonymize, de-identify or otherwise perturb the data?
- Is there something fundamentally different about government services and the government’s collection of citizen’s data for basic needs in modern society such as power and water that requires governments to exercise greater due care than commercial entities?
- Companies have legal and practical mechanisms to shield data submitted to government from public release. What mechanisms do individuals have or should have to address misuse of Open Data? Could developments in the constitutional right to information policy as articulated in Whalen and Westinghouse Electric Co address Open Data privacy issues?
- Collecting data costs money, and its release could affect civil liberties. Yet it is being given away freely, sometimes to immensely profitable firms. Should governments license data for a fee and/or impose limits on its use, given its value?
- The privacy principle of “collection limitation” is under siege, with many arguing that use restrictions will be more efficacious for protecting privacy and more workable for big data analysis. Does the potential of Open Data justify eroding state and federal privacy act collection limitation principles? What are the ethical dimensions of a government system that deprives the data subject of the ability to obscure or prevent the collection of data about a sensitive issue? A move from collection restrictions to use regulation raises a number of related issues, detailed below.
- Are use restrictions efficacious in creating accountability? Consumer reporting agencies are regulated by use restrictions, yet they are not known for their accountability. How could use regulations be implemented in the context of Open Data efficaciously? Can a self-learning algorithm honor data use restrictions?
- If an Open Dataset were regulated by a use restriction, how could individuals police wrongful uses? How would plaintiffs overcome the likely defenses or proof of facts in a use regulation system, such as a burden to prove that data were analyzed and the product of that analysis was used in a certain way to harm the plaintiff? Will plaintiffs ever be able to beat first amendment defenses?
- The President’s Council of Advisors on Science and Technology big data report emphasizes that analysis is not a “use” of data. Such an interpretation suggests that NSA metadata analysis and large-scale scanning of communications do not raise privacy issues. What are the ethical and legal implications of the “analysis is not use” argument in the context of Open Data?
- Open Data celebrates the idea that information collected by the government can be used by another person for various kinds of analysis. When analysts are not involved in the collection of data, they are less likely to understand its context and limitations. How do we ensure that this knowledge is maintained in a use regulation system?
- Former President William Clinton was admitted under a pseudonym for a procedure at a New York Hospital in 2004. The hospital detected 1,500 attempts by its own employees to access the President’s records. With snooping such a tempting activity, how could incentives be crafted to cause self-policing of government data and the self-disclosure of inappropriate uses of Open Data?
- It is clear that data privacy regulation could hamper some big data efforts. However, many examples of big data successes hail from highly regulated environments, such as health care and financial services—areas with statutory, common law, and IRB protections. What are the contours of privacy law that are compatible with big data and Open Data success and which are inherently inimical to it?
- In recent years, the problem of “too much money in politics” has been addressed with increasing disclosure requirements. Yet, distrust in government remains high, and individuals identified in donor databases have been subjected to harassment. Is the answer to problems of distrust in government even more Open Data?
- What are the ethical and epistemological implications of encouraging government decision-making based upon correlation analysis, without a rigorous understanding of cause and effect? Are there decisions that should not be left to just correlational proof? While enthusiasm for data science has increased, scientific journals are elevating their standards, with special scrutiny focused on hypothesis-free, multiple comparison analysis. What could legal and policy experts learn from experts in statistics about the nature and limits of open data?…
To submit a proposal, visit the Conference Management Toolkit (CMT) here.
Once you have created a profile, the site will allow you to submit your proposal.
If you have questions, please contact Chris Hoofnagle, principal investigator on this project.”