Amid Open Data Push, Agencies Feel Urge for Analytics


Jack Moore at NextGov: “Federal agencies, thanks to their unique missions, have long been collectors of valuable, vital and, no doubt, arcane data. Under a nearly two-year-old executive order from President Barack Obama, agencies are releasing more of this data in machine-readable formats to the public and entrepreneurs than ever before.
But agencies still need help making sense of this data for their own purposes. They are turning to industry, academia and outside researchers for cutting-edge analytics tools to parse their data, derive insights and use those insights to drive decision-making.
Take the U.S. Agency for International Development, for example. The agency administers U.S. foreign aid programs aimed at ending extreme poverty and helping support democratic societies around the globe.
Under the agency’s own recent open data policy, it’s started collecting reams of data from its overseas missions. Starting Oct. 1, organizations doing development work on the ground – including through grants and contracts – have been directed to also collect data generated by their work and submit it back to agency headquarters. Teams go through the data, scrub it to remove sensitive material and then publish it.
The data runs the gamut from information on land ownership in South Sudan to livestock demographics in Senegal and HIV prevention activities in Zambia….The agency took the first step in solving that problem with a Jan. 20 request for information from outside groups for cutting-edge data analytics tools.
“Operating units within USAID are sometimes constrained by existing capacity to transform data into insights that could inform development programming,” the RFI stated.
The RFI queries industry on its capabilities in data mining, social media analytics, forecasting and systems modeling.
USAID is far from alone in its quest for data-driven decision-making.
A Jan. 26 RFI from the Transportation Department’s Federal Highway Administration also seeks innovative ideas from industry for “advanced analytical capabilities.”…(More)”

'From Atoms to Bits': A Visual History of American Ideas


in The Atlantic: “A new paper employs a simple technique—counting words in patent texts—to trace the history of American invention, from chemistry to computers….in a new paper, Mikko Packalen at the University of Waterloo and Jay Bhattacharya of Stanford University devised a brilliant way to address this question empirically. In short, they counted words in patent texts.

In a series of papers studying the history of American innovation, Packalen and Bhattacharya indexed every one-word, two-word, and three-word phrase that appeared in more than 4 million patent texts in the last 175 years. To focus their search on truly new concepts, they recorded the year those phrases first appeared in a patent. Finally, they ranked each concept’s popularity based on how many times it reappeared in later patents. Essentially, they trawled the billion-word literature of patents to document the birth-year and the lifespan of American concepts, from “plastic” to “world wide web” and “instant messaging.”
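A minimal sketch of that counting approach, assuming a corpus of (year, text) patent records; the function and variable names below are illustrative, not taken from Packalen and Bhattacharya's code:

```python
from collections import defaultdict

def ngrams(text, max_n=3):
    """Yield every one-, two- and three-word phrase in a text."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def index_concepts(patents):
    """patents: iterable of (year, text) pairs.
    Returns {phrase: (birth_year, reappearance_count)}."""
    first_seen = {}                   # phrase -> year it first appeared
    reappearances = defaultdict(int)  # phrase -> mentions in later patents
    for year, text in sorted(patents):        # chronological order
        for phrase in set(ngrams(text)):      # count once per patent
            if phrase not in first_seen:
                first_seen[phrase] = year
            else:
                reappearances[phrase] += 1
    return {p: (first_seen[p], reappearances[p]) for p in first_seen}

# Toy corpus: rank concepts by how often they reappear after their birth year
patents = [(1989, "a polymerase chain reaction assay"),
           (1991, "improved polymerase chain reaction method"),
           (1995, "world wide web browser")]
concepts = index_concepts(patents)
top = sorted(concepts.items(), key=lambda kv: -kv[1][1])
print(top[:5])
```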

Here are the 20 most popular sequences of words in each decade from the 1840s to the 2000s. You can see “polymerase chain reaction” in the middle of the 1980s stack. Since the timeline, as it appears in the paper, is too wide to be visible on this article page, I’ve chopped it up and inserted the color code both above and below the timeline….

Another theme of Packalen and Bhattacharya’s research is that innovation has become more collaborative. Indeed, computers have not only taken over the world of inventions, but they have also changed the geography of innovation, Bhattacharya said. Larger cities have historically held an innovative advantage, because (the theory goes) their density of smarties speeds up debate on the merits of new ideas, which are often born raw and poorly understood. But the researchers found that in the last few decades, larger cities are no more likely to produce new ideas in patents than smaller ones, whose inventors can just as easily connect with co-authors online. “Perhaps due to the Internet, the advantage of larger cities appears to be eroding,” Packalen wrote in an email….(More)”

Dataset Inventorying Tool


at US Open Data: “Today we’re releasing Let Me Get That Data For You (LMGTDFY), a free, open source tool that quickly and automatically creates a machine-readable inventory of all the data files found on a given website.
When government agencies create an open data repository, they need to start by inventorying the data they are already publishing on their websites. This is a laborious process. It means searching their own site with a query like this:

site:example.gov filetype:csv OR filetype:xls OR filetype:json

Then they have to read through all of the results, download all of the files, and create a spreadsheet that they can load into their repository. It’s a lot of work, and as a result it too often goes undone, resulting in a data repository that doesn’t actually contain all of that government’s data.
Realizing that this was a common problem, we hired Silicon Valley Software Group to create a tool to automate the inventorying process. We worked with Dan Schultz and Ted Han, who created a system built on Django and Celery, using Microsoft’s great Bing Search API as its data source. The result is a free, installable tool, which produces a CSV file that lists all CSV, XML, JSON, XLS, XLSX, and shapefiles found on a given domain name.
We use this tool to power our new Let Me Get That Data For You website. We’re trying to keep our site within Bing’s free usage tier, so we’re limiting results to 300 datasets per site….(More)”
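For readers curious about the mechanics, here is a rough sketch of the inventorying pattern: run one search query per file type and write the hits to a CSV. This is not LMGTDFY's actual code (which is built on Django and Celery); the endpoint and header shown are those of the present-day Bing Web Search API v7, whereas the original tool used an earlier version of the API.

```python
import csv
import requests  # pip install requests

# Assumed: Bing Web Search API v7 endpoint and subscription-key header.
ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"
API_KEY = "YOUR-KEY-HERE"  # placeholder credential
FILETYPES = ["csv", "xml", "json", "xls", "xlsx"]

def inventory(domain):
    """Return (url, filetype) pairs for data files found on a domain."""
    rows = []
    for ft in FILETYPES:
        resp = requests.get(
            ENDPOINT,
            headers={"Ocp-Apim-Subscription-Key": API_KEY},
            params={"q": f"site:{domain} filetype:{ft}", "count": 50},
        )
        resp.raise_for_status()
        for page in resp.json().get("webPages", {}).get("value", []):
            rows.append((page["url"], ft))
    return rows

# Write the machine-readable inventory the excerpt describes
with open("inventory.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["url", "filetype"])
    writer.writerows(inventory("example.gov"))
```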

The Tricky Task of Rating Neighborhoods on 'Livability'


Tanvi Misra at CityLab: “Jokubas Neciunas was looking to buy an apartment almost two years ago in Vilnius, Lithuania. He consulted real estate platforms and government data to help him find the best option. In the process, he realized that there was a lot of information out there, but no one was really using it very well.
Fast-forward two years, and Neciunas and his colleagues have created PlaceILive.com—a start-up trying to leverage open data from cities and information from social media to create a holistic, accessible tool that measures the “livability” of any apartment or house in a city.
“Smart cities are the ones that have smart citizens,” says PlaceILive co-founder Sarunas Legeckas.
The team recognizes that foraging for relevant information in the trenches of open data might not be for everyone. So they tried to “spice it up” by creating a visually appealing, user-friendly portal for people looking for a new home to buy or rent. The creators hope PlaceILive becomes a one-stop platform where people find ratings on every quality-of-life metric important to them before their housing hunt begins.
In its beta form, the site features five cities—New York, Chicago, San Francisco, London and Berlin. Once you click on the New York portal, for instance, you can search for the place you want to know about by borough, zip code, or address. I pulled up Brooklyn….The index is calculated using a variety of public information sources (from transit agencies, police departments, and the Census, for instance) as well as other available data (from the likes of Google, Socrata, and Foursquare)….(More)”
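PlaceILive has not published its formula, but a livability index of this kind is typically a weighted average of normalized metrics. A minimal sketch, with invented metric names, values and weights:

```python
def normalize(values):
    """Rescale raw metric values to 0-100 (higher = better)."""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1  # avoid divide-by-zero when all values are equal
    return {k: 100 * (v - lo) / span for k, v in values.items()}

# Invented example metrics per area (higher = better)
metrics = {
    "transit_access": {"Brooklyn": 8.1, "Queens": 6.4, "Bronx": 5.9},
    "safety":         {"Brooklyn": 6.2, "Queens": 7.8, "Bronx": 5.1},
}
weights = {"transit_access": 0.6, "safety": 0.4}  # user-chosen priorities

# Normalize each metric, then combine into one weighted 0-100 score
normalized = {m: normalize(vals) for m, vals in metrics.items()}
livability = {
    area: sum(weights[m] * normalized[m][area] for m in metrics)
    for area in metrics["safety"]
}
print(livability)
```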

Open data: how mobile phones saved bananas from bacterial wilt in Uganda


Anna Scott in The Guardian: “Bananas are a staple food in Uganda. Ugandans eat more of the fruit than anyone else in the world. Each person eats on average 700g (about seven small bananas) a day, according to the International Food Policy Research Institute, and they provide up to 27% of the population’s calorie intake.
But since 2002 a disease known as banana bacterial wilt (BBW) has wiped out crops across the country. When plants are infected, they cannot absorb water so their leaves start to shrivel and they eventually die….
To deal with the problem, the Ugandan government drew upon open data – data that is licensed and made available for anyone to access and share – about the disease, published by Unicef’s community polling project Ureport.
Ureport mobilises a network of nearly 300,000 volunteers across Uganda, who use their mobiles to report on issues that affect them, from polio immunisation and malaria treatment to child marriage and crop failure. It gathers data via SMS polls and publishes the results as open datasets.
The results are sent back to community members via SMS along with treatment options and advice on how best to protect their crops. Within five days of the first SMS being sent out, 190,000 Ugandans had learned about the disease and knew how to save bananas on their farms.
Via the Ureport platform, the datasets can also be accessed in real-time by community members, NGOs and the Ugandan government, allowing them to target treatments to where they are needed most. They are also broadcast on radio shows and analysed in articles produced by Ureport, informing wider audiences of the scope and nature of the disease and how best to avoid it….
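The reporting loop described above reduces to a simple tally-and-respond pattern. A toy sketch, with invented district names, reply formats and advice text:

```python
from collections import Counter

# Invented sample of SMS poll replies: (district, reply_text)
reports = [("Mbale", "yes wilting leaves"), ("Mbale", "no"),
           ("Gulu", "yes yellow sap"), ("Gulu", "yes")]

# Tally suspected banana bacterial wilt (BBW) cases per district
suspected = Counter(d for d, text in reports if text.startswith("yes"))

# Compose the advice SMS sent back to each affected district
for district, n in suspected.items():
    print(f"To {district}: {n} suspected BBW reports. "
          "Remove infected plants and disinfect farm tools.")
```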
A report published this week by the Open Data Institute (ODI) features stories from around the world which reflect how people are using open data in development. Examples range from accessing school results in Tanzania to building smart cities in Latin America….(More).”

Scenario Planning Case Studies Using Open Government Data


New Paper by Robert Power, Bella Robinson, Lachlan Rudd, and Andrew Reeson: “The opportunity for improved decision making has been enhanced in recent years through the public availability of a wide variety of information. In Australia, government data is routinely made available and maintained in the http://data.gov.au repository. This is a single point of reference for data that can be reused for purposes beyond that originally considered by the data custodians. Similarly a wealth of citizen information is available from the Australian Bureau of Statistics. Combining this data allows informed decisions to be made through planning scenarios.

We present two case studies that demonstrate the utility of data integration and web mapping. As a simple proof of concept the user can explore different scenarios in each case study by indicating the relative weightings to be used for the decision making process. Both case studies are demonstrated as a publicly available interactive map-based website….(More)”
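The paper itself is not quoted with code, but the data-integration step it describes can be sketched with pandas, assuming hypothetical extracts and column names for the data.gov.au and ABS sources:

```python
import pandas as pd

# Assumed extracts; real column names on data.gov.au and the ABS differ.
facilities = pd.read_csv("facilities.csv")  # region_code, n_clinics
census = pd.read_csv("census.csv")          # region_code, population

# Integrate the two sources on a shared region identifier
merged = facilities.merge(census, on="region_code")
merged["clinics_per_10k"] = 10_000 * merged["n_clinics"] / merged["population"]

# Scenario: user-supplied relative weightings over two decision criteria
weights = {"clinics_per_10k": 0.7, "population": 0.3}
ranked = merged.assign(
    score=sum(w * merged[c].rank(pct=True) for c, w in weights.items())
).sort_values("score", ascending=False)
print(ranked.head())
```

Changing the entries in `weights` reruns the same integration under a different scenario, which is essentially what the paper's interactive map-based websites let users do.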

U.S. to release indexes of federal data


The Sunlight Foundation: “For the first time, the United States government has agreed to release what we believe to be the largest index of government data in the world.
On Friday, the Sunlight Foundation received a letter from the Office of Management and Budget (OMB) outlining how they plan to comply with our FOIA request from December 2013 for agency Enterprise Data Inventories. EDIs are comprehensive lists of a federal agency’s information holdings, providing an unprecedented view into data held internally across the government. Our FOIA request was submitted 14 months ago.
Until now, however, these lists of the government’s data were not public. More than a year after Sunlight’s FOIA request, and with a lawsuit by Sunlight about to be filed, we’re finally going to see what data the government holds.
Sunlight’s FOIA request built on President Obama’s Open Data Executive Order, which first required agency-wide data indexes to be built and maintained. According to implementation guidance prepared in response to the executive order, Enterprise Data Inventories are intended to help agencies “develop a clear and comprehensive understanding of what data assets they possess” by accounting “for all data assets created or collected by the agency.”
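Under OMB's implementation guidance, agencies maintain these inventories in the Project Open Data metadata schema (the data.json format). A minimal illustrative entry, with invented values and some required fields (such as contactPoint) omitted for brevity, sketched in Python:

```python
import json

# Illustrative EDI entry loosely following the Project Open Data schema
# (https://project-open-data.cio.gov/); all values here are invented.
entry = {
    "title": "Quarterly Grant Awards",
    "description": "All grants awarded by the agency, by quarter.",
    "keyword": ["grants", "spending"],
    "modified": "2015-01-15",
    "publisher": {"name": "Example Agency"},
    "identifier": "example-agency-grants-2015",
    "accessLevel": "public",  # or "restricted public" / "non-public"
    "distribution": [{
        "downloadURL": "https://example.gov/data/grants.csv",
        "mediaType": "text/csv",
    }],
}
print(json.dumps({"dataset": [entry]}, indent=2))
```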
At the time, we argued that “without seeing the entire EDIs, it is impossible for the public to know what data is being collected and stored by the government and to debate whether or not that data should be made public.”
When OMB initially responded to our request, it didn’t cite an exemption to FOIA. Instead, OMB directed us to approach each agency individually for its EDIs. This, despite the fact that the agencies are required to submit their updated EDIs to OMB on a quarterly basis.
With that in mind, and with the help of some very talented lawyers from the firm of Garvey Schubert Barer, we filed an administrative appeal with OMB and prepared for court. We were ready to fight for the idea that government data cannot be leveraged to its fullest if the public only knows about a fraction of it.
We hoped that OMB would recognize that open data is worth the work it takes to disclose the indexes. We’re pleased to say that our hope looks like it is becoming reality.
Since 2013, federal agencies have been required to construct a list of all of their major data sets, subject only to a few exceptions detailed in President Obama’s executive order as well as some information exempted from disclosure under the FOIA.
Having access to a detailed index of agencies’ data is a key step in aiding the use and utility of government data. By publicly describing almost all data the government has in an index, the Enterprise Data Inventories should empower IT management, FOIA requestors and oversight — by government officials and citizens alike….(More)”.

More Power to the People: How Cities Are Letting Data Flow


Stephen Taylor at People4SmarterCities: “Smart cities understand that engaging the public in decision-making is vital to enhancing services and ensuring accountability. Here are three ideas that show how cities are embracing new technologies and opening up data to spur civic participation and improve citizens’ lives.

City Texts Help Keep Food on the Table
In San Francisco, about a third of the 52,000 people who receive food stamps are disenrolled from the program because they miss certain deadlines, such as filing quarterly reports with the city’s Human Services Agency. To help keep recipients up to date on their status, the nonprofit organization Code for America worked with the city agency to create Promptly, an open-source software platform that sends alerts by text message when citizens need to take action to keep their benefits. Not only does it help ensure that low-income residents keep food on the table, it also helps the department run more efficiently as less staff time is spent on re-enrollments.
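Without quoting Promptly's actual code, the core pattern is a scheduled check against each recipient's deadline followed by an SMS gateway call. A minimal sketch using the Twilio Python client as a stand-in gateway (an assumption; Promptly's real stack may differ):

```python
from datetime import date, timedelta
from twilio.rest import Client  # pip install twilio; stand-in SMS gateway

client = Client("ACCOUNT_SID", "AUTH_TOKEN")  # placeholder credentials

# Invented example records: (phone, quarterly report due date)
recipients = [("+14155550100", date(2015, 3, 31))]

for phone, due in recipients:
    if date.today() >= due - timedelta(days=7):  # one week's warning
        client.messages.create(
            to=phone,
            from_="+14155550199",  # placeholder sender number
            body=f"Reminder: your quarterly report is due {due:%b %d}. "
                 "File it to keep your benefits.",
        )
```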
Fired Up in Los Angeles Over Open Data
For the Los Angeles Fire Department, its work is all about responding to citizens. Not only does it handle fire and medical calls, it’s also the first fire agency in the U.S. to gather and post data on its emergency-response times on the Internet through a program called FireStat. The data gives citizens the opportunity to review metrics such as the amount of time it takes for stations to process emergency calls, the time for firefighters to leave the station and the travel time to the incident for each of its 102 firehouses throughout the city. The goal of FireStat is to see where and how response times can be improved, while increasing management accountability….(More)”
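The three FireStat-style metrics fall out of simple timestamp arithmetic per incident. A sketch with assumed column names, since the article does not give the real field names:

```python
import pandas as pd

# Assumed incident log columns; FireStat's actual fields may differ.
df = pd.read_csv("incidents.csv", parse_dates=[
    "call_received", "units_dispatched", "left_station", "arrived_on_scene"])

# The three intervals the excerpt describes, in seconds
df["call_processing"] = (df["units_dispatched"] - df["call_received"]).dt.total_seconds()
df["turnout"] = (df["left_station"] - df["units_dispatched"]).dt.total_seconds()
df["travel"] = (df["arrived_on_scene"] - df["left_station"]).dt.total_seconds()

# Median interval for each firehouse, slowest travel times first
medians = (df.groupby("station")[["call_processing", "turnout", "travel"]]
             .median()
             .sort_values("travel", ascending=False))
print(medians)
```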

The Future of Open and How To Stop It


Blogpost by Steve Song: “In 2008, Jonathan Zittrain wrote a book called The Future of the Internet and How To Stop It. In it he argued that the runaway success of the Internet is also the cause of it being undermined, that vested interests were in the process of locking down the potential for innovation by creating walled gardens. He wrote that book because he loved the Internet and the potential it represents and was concerned about it going down a path that would diminish its potential. It is in that spirit that I borrow his title to talk about the open movement. By the term open movement, I am referring broadly to the initiatives inspired by the success of Open Source software, from the Creative Commons to Open Data, Open Science, Open Access, Open Corporates and Open Government; the list goes on. I write this because I love open initiatives but I fear that openness is in danger of becoming its own enemy as it becomes an orthodoxy difficult to question.
In June of last year, I wrote an article called The Morality of Openness which attempted to unpack my complicated feelings about openness. Towards the end of the essay, I wondered whether trust might not be a more important word than open for our current world. I am now convinced of this. Which is not to say that I have stopped believing in openness; rather, I believe openness is a means to an end, not the endgame. Trust is the endgame. Higher trust environments, whether in families or corporations or economies, tend to be both more effective and happier. There is no similar body of evidence for open, and yet open practices can be a critical element on the road to trust. Equally, when mis-applied, openness can achieve the opposite….
Openness can be a means of building trust.  Ironically though, if openness as behaviour is mandated, it stops building trust.  Listen to Nobel Laureate Vernon Smith talk about why that happens.  What Smith argues (building on the work of an earlier Smith, Adam Smith’s Theory of Moral Sentiments) is that intent matters.  That as human beings, we signal our intentions to each other with our behaviour and that influences how others behave.  When intention is removed by regulating or enforcing good behaviour, that signal is lost as well.
I watched this happen nearly ten years ago in South Africa when the government decided to embrace the success of Open Source software and make it mandatory for government departments to use Open Source software. No one did. It is choosing to share that makes open initiatives work. When you remove choice, you don’t inspire others to share and you don’t build trust. Looking at the problem from the perspective of trust rather than from the perspective of open makes this problem much easier to see.
Lateral thinker Jerry Michalski gave a great talk last year entitled What If We Trusted You? in which he talked about how the architecture of systems either builds or destroys trust. He gives a great example of Wikipedia as an open, trust-enabling architecture. We don’t often think about what a giant leap of trust Wikipedia makes in allowing anyone to edit it and what an enormous achievement it became…(More).”

The story of the sixth myth of open data and open government


Paper by Ann-Sofie Hellberg and Karin Hedström: “The aim of this paper is to describe a local government effort to realise an open government agenda. This is done using a storytelling approach….The empirical data is based on a case study. We participated in, as well as followed, the process of realising an open government agenda on a local level, where citizens were invited to use open public data as the basis for developing apps and external web solutions. Based on an interpretative tradition, we chose storytelling as a way to scrutinize the competition process. In this paper, we present a story about the competition process using the story elements put forward by Kendall and Kendall (2012).

….Our research builds on existing research by proposing a sixth myth: that the “public” wants to make use of open data. We provide empirical insights into the challenge of gaining benefits from open public data. In particular, we illustrate the difficulties in getting citizens interested in using open public data. Our case shows that people seem to like the idea of open public data, but do not necessarily participate actively in the data re-use process…..This study illustrates the difficulties of promoting the re-use of open public data. Public organisations that want to pursue an open government agenda can use our findings as empirical insights… (More)”