Benchmarking open government: An open data perspective


Paper by N Veljković, S Bogdanović-Dinić, and L Stoimenov in Government Information Quarterly: “This paper presents a benchmark proposal for the Open Government and its application from the open data perspective using data available on the U.S. government’s open data portal (data.gov). The benchmark is developed over the adopted Open Government conceptual model, which describes Open Government through data openness, transparency, participation and collaboration. Resulting in two measures, that is, one known as the e-government openness index (eGovOI) and the other Maturity, the benchmark indicates the progress of government over time, the efficiency of recognizing and implementing new concepts and the willingness of the government to recognize and embrace innovative ideas.”

How government can promote open data


Michael Chui, Diana Farrell, and Kate Jackson from McKinsey: “Institutions and companies across the public and private sectors have begun to release and share vast amounts of information in recent years, and the trend is only accelerating. Yet while some information is easily accessible, some is still trapped in paper records. Data may be free or come at a cost. And there are tremendous differences in reuse and redistribution rights. In short, there are degrees when it comes to just how “open” data is and, as a result, how much value it can create.
While businesses and other private organizations can make more information public, we believe that government has a critical role in unleashing the economic potential of open data. A recent McKinsey report, Open data: Unlocking innovation and performance with liquid information, identified more than $3 trillion in economic value globally that could be generated each year in seven domains through increasingly “liquid” information that is machine readable, accessible to a broad audience at little or no cost, and capable of being shared and distributed. These sources of value include new or increased revenue, savings, and economic surplus that flow from the insights provided by data as diverse as census demographics, crop reports, and information on product recalls.
Sitting at the nexus of key stakeholders—citizens, businesses, and nongovernmental organizations (NGOs)—government is ideally positioned to extract value from open data and to help others do the same. We believe government can spur value creation at all levels of society by concurrently fulfilling four important open-data roles (exhibit):

Government can serve as an open-data provider, catalyst, user, and policy maker to create value and mitigate risks.

Let’s get geeks into government


Gillian Tett in the Financial Times: “Fifteen years ago, Brett Goldstein seemed to be just another tech entrepreneur. He was working as IT director of OpenTable, then a start-up website for restaurant bookings. The company was thriving – and subsequently did a very successful initial public offering. Life looked very sweet for Goldstein. But when the World Trade Center was attacked in 2001, Goldstein had a moment of epiphany. “I spent seven years working in a startup but, directly after 9/11, I knew I didn’t want my whole story to be about how I helped people make restaurant reservations. I wanted to work in public service, to give something back,” he recalls – not just by throwing cash into a charity tin, but by doing public service. So he swerved: in 2006, he attended the Chicago police academy and then worked for a year as a cop in one of the city’s toughest neighbourhoods. Later he pulled the disparate parts of his life together and used his number-crunching skills to build the first predictive data system for the Chicago police (and one of the first in any western police force), to indicate where crime was likely to break out.

This was such a success that Goldstein was asked by Rahm Emanuel, the city’s mayor, to create predictive data systems for the wider Chicago government. The fruits of this effort – which include a website known as “WindyGrid” – went live a couple of years ago, to considerable acclaim inside the techie scene.

This tale might seem unremarkable. We are all used to hearing politicians, business leaders and management consultants declare that the computing revolution is transforming our lives. And as my colleague Tim Harford pointed out in these pages last week, the idea of using big data is now wildly fashionable in the business and academic worlds….

In America when top bankers become rich, they often want to “give back” by having a second career in public service: just think of all those Wall Street financiers who have popped up at the US Treasury in recent years. But hoodie-wearing geeks do not usually do the same. Sure, there are some former techie business leaders who are indirectly helping government. Steve Case, a co-founder of AOL, has supported White House projects to boost entrepreneurship and combat joblessness. Tech entrepreneurs also make huge donations to philanthropy. Facebook’s Mark Zuckerberg, for example, has given funds to Newark education. And the whizz-kids have also occasionally been summoned by the White House in times of crisis. When there was a disastrous launch of the government’s healthcare website late last year, the Obama administration enlisted the help of some of the techies who had been involved with the president’s election campaign.

But what you do not see is many tech entrepreneurs doing what Goldstein did: deciding to spend a few years in public service, as a government employee. There aren’t many Zuckerberg types striding along the corridors of federal or local government.
. . .
It is not difficult to work out why. To most young entrepreneurs, the idea of working in a state bureaucracy sounds like utter hell. But if there was ever a time when it might make sense for more techies to give back by doing stints of public service, that moment is now. The civilian public sector badly needs savvier tech skills (just look at the disaster of that healthcare website for evidence of this). And as the sector’s founders become wealthier and more powerful, they need to show that they remain connected to society as a whole. It would be smart political sense.
So I applaud what Goldstein has done. I also welcome that he is now trying to persuade his peers to do the same, and that places such as the University of Chicago (where he teaches) and New York University are trying to get more young techies to think about working for government in between doing those dazzling IPOs. “It is important to see more tech entrepreneurs in public service. I am always encouraging people I know to do a ‘stint in government’. I tell them that giving back cannot just be about giving money; we need people from the tech world to actually work in government,” Goldstein says.

But what is really needed is for more technology CEOs and leaders to get involved by actively talking about the value of public service – or even encouraging their employees to interrupt their private-sector careers with the occasional spell as a government employee (even if it is not in a sector quite as challenging as the police). Who knows? Maybe it could be Sheryl Sandberg’s next big campaigning mission. After all, if she does ever jump back to Washington, that could have a powerful demonstration effect for techie women and men. And shake DC a little too.”

Politics and the Internet


Edited book by William H. Dutton (Routledge, 2014, 1,888 pages): “It is commonplace to observe that the Internet—and the dizzying technologies and applications which it continues to spawn—has revolutionized human communications. But, while the medium’s impact has apparently been immense, the nature of its political implications remains highly contested. To give but a few examples, the impact of networked individuals and institutions has prompted serious scholarly debates in political science and related disciplines on: the evolution of ‘e-government’ and ‘e-politics’ (especially after recent US presidential campaigns); electronic voting and other citizen participation; activism; privacy and surveillance; and the regulation and governance of cyberspace.
As research in and around politics and the Internet flourishes as never before, this new four-volume collection from Routledge’s acclaimed Critical Concepts in Political Science series meets the need for an authoritative reference work to make sense of a rapidly growing—and ever more complex—corpus of literature. Edited by William H. Dutton, Director of the Oxford Internet Institute (OII), the collection gathers foundational and canonical work, together with innovative and cutting-edge applications and interventions.
With a full index and comprehensive bibliographies, together with a new introduction by the editor, which places the collected material in its historical and intellectual context, Politics and the Internet is an essential work of reference. The collection will be particularly useful as a database allowing scattered and often fugitive material to be easily located. It will also be welcomed as a crucial tool permitting rapid access to less familiar—and sometimes overlooked—texts. For researchers, students, practitioners, and policy-makers, it is a vital one-stop research and pedagogic resource.”

Eight (No, Nine!) Problems With Big Data


Gary Marcus and Ernest Davis in the New York Times: “BIG data is suddenly everywhere. Everyone seems to be collecting it, analyzing it, making money from it and celebrating (or fearing) its powers. Whether we’re talking about analyzing zillions of Google search queries to predict flu outbreaks, or zillions of phone records to detect signs of terrorist activity, or zillions of airline stats to find the best time to buy plane tickets, big data is on the case. By combining the power of modern computing with the plentiful data of the digital era, it promises to solve virtually any problem — crime, public health, the evolution of grammar, the perils of dating — just by crunching the numbers.

Or so its champions allege. “In the next two decades,” the journalist Patrick Tucker writes in the latest big data manifesto, “The Naked Future,” “we will be able to predict huge areas of the future with far greater accuracy than ever before in human history, including events long thought to be beyond the realm of human inference.” Statistical correlations have never sounded so good.

Is big data really all it’s cracked up to be? There is no doubt that big data is a valuable tool that has already had a critical impact in certain areas. For instance, almost every successful artificial intelligence computer program in the last 20 years, from Google’s search engine to the I.B.M. “Jeopardy!” champion Watson, has involved the substantial crunching of large bodies of data. But precisely because of its newfound popularity and growing use, we need to be levelheaded about what big data can — and can’t — do.

The first thing to note is that although big data is very good at detecting correlations, especially subtle correlations that an analysis of smaller data sets might miss, it never tells us which correlations are meaningful. A big data analysis might reveal, for instance, that from 2006 to 2011 the United States murder rate was well correlated with the market share of Internet Explorer: Both went down sharply. But it’s hard to imagine there is any causal relationship between the two. Likewise, from 1998 to 2007 the number of new cases of autism diagnosed was extremely well correlated with sales of organic food (both went up sharply), but identifying the correlation won’t by itself tell us whether diet has anything to do with autism.

Second, big data can work well as an adjunct to scientific inquiry but rarely succeeds as a wholesale replacement. Molecular biologists, for example, would very much like to be able to infer the three-dimensional structure of proteins from their underlying DNA sequence, and scientists working on the problem use big data as one tool among many. But no scientist thinks you can solve this problem by crunching data alone, no matter how powerful the statistical analysis; you will always need to start with an analysis that relies on an understanding of physics and biochemistry.

Third, many tools that are based on big data can be easily gamed. For example, big data programs for grading student essays often rely on measures like sentence length and word sophistication, which are found to correlate well with the scores given by human graders. But once students figure out how such a program works, they start writing long sentences and using obscure words, rather than learning how to actually formulate and write clear, coherent text. Even Google’s celebrated search engine, rightly seen as a big data success story, is not immune to “Google bombing” and “spamdexing,” wily techniques for artificially elevating website search placement.
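The gaming risk the article describes is easy to see in a toy version of such a grader. The sketch below is illustrative, not any real product’s scoring model: the “score” rewards average sentence length and the share of long words, so padding both inflates the score without improving the writing.

```python
def essay_score(text):
    """Toy grader built on the kinds of surface features the article
    describes: average sentence length and share of long words."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    words = text.split()
    avg_sentence_len = len(words) / max(len(sentences), 1)
    long_word_share = sum(len(w) >= 8 for w in words) / max(len(words), 1)
    return avg_sentence_len + 10 * long_word_share

clear = "The data show a trend. It is weak."
gamed = ("Notwithstanding multitudinous considerations heretofore enumerated "
         "concerning aforementioned phenomena the perspicacious investigator "
         "nevertheless perseveres indefatigably")

# The padded, less readable essay outscores the clear one.
print(essay_score(clear) < essay_score(gamed))  # → True
```

A student who discovers these two features can raise the score mechanically, which is exactly the failure mode the article warns about.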

Fourth, even when the results of a big data analysis aren’t intentionally gamed, they often turn out to be less robust than they initially seem. Consider Google Flu Trends, once the poster child for big data. In 2009, Google reported — to considerable fanfare — that by analyzing flu-related search queries, it had been able to detect the spread of the flu as accurately as, and more quickly than, the Centers for Disease Control and Prevention. A few years later, though, Google Flu Trends began to falter; for the last two years it has made more bad predictions than good ones.

As a recent article in the journal Science explained, one major contributing cause of the failures of Google Flu Trends may have been that the Google search engine itself constantly changes, such that patterns in data collected at one time do not necessarily apply to data collected at another time. As the statistician Kaiser Fung has noted, collections of big data that rely on web hits often merge data that was collected in different ways and with different purposes — sometimes to ill effect. It can be risky to draw conclusions from data sets of this kind.

A fifth concern might be called the echo-chamber effect, which also stems from the fact that much of big data comes from the web. Whenever the source of information for a big data analysis is itself a product of big data, opportunities for vicious cycles abound. Consider translation programs like Google Translate, which draw on many pairs of parallel texts from different languages — for example, the same Wikipedia entry in two different languages — to discern the patterns of translation between those languages. This is a perfectly reasonable strategy, except for the fact that with some of the less common languages, many of the Wikipedia articles themselves may have been written using Google Translate. In those cases, any initial errors in Google Translate infect Wikipedia, which is fed back into Google Translate, reinforcing the error.

A sixth worry is the risk of too many correlations. If you look 100 times for correlations between two variables, you risk finding, purely by chance, about five bogus correlations that appear statistically significant — even though there is no actual meaningful connection between the variables. Absent careful supervision, the magnitudes of big data can greatly amplify such errors.
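The arithmetic behind this worry can be checked with a quick simulation (a sketch, not from the article): correlate 100 pairs of purely random variables at the 5% significance level and count how many clear the bar by chance alone.

```python
import math
import random

random.seed(0)

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

n = 30          # observations per variable
trials = 100    # number of independent variable pairs tested
r_crit = 0.361  # two-sided 5% critical value of r for n = 30

# Every pair is pure noise, yet some correlations look "significant".
false_positives = sum(
    1 for _ in range(trials)
    if abs(pearson_r([random.gauss(0, 1) for _ in range(n)],
                     [random.gauss(0, 1) for _ in range(n)])) > r_crit
)
print(false_positives)  # roughly 5 of 100, as the article's arithmetic predicts
```

None of these variables is related to any other; the “discoveries” are entirely artifacts of testing many hypotheses at once.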

Seventh, big data is prone to giving scientific-sounding solutions to hopelessly imprecise questions. In the past few months, for instance, there have been two separate attempts to rank people in terms of their “historical importance” or “cultural contributions,” based on data drawn from Wikipedia. One is the book “Who’s Bigger? Where Historical Figures Really Rank,” by the computer scientist Steven Skiena and the engineer Charles Ward. The other is an M.I.T. Media Lab project called Pantheon.

Both efforts get many things right — Jesus, Lincoln and Shakespeare were surely important people — but both also make some egregious errors. “Who’s Bigger?” claims that Francis Scott Key was the 19th most important poet in history; Pantheon has claimed that Nostradamus was the 20th most important writer in history, well ahead of Jane Austen (78th) and George Eliot (380th). Worse, both projects suggest a misleading degree of scientific precision with evaluations that are inherently vague, or even meaningless. Big data can reduce anything to a single number, but you shouldn’t be fooled by the appearance of exactitude.

FINALLY, big data is at its best when analyzing things that are extremely common, but often falls short when analyzing things that are less common. For instance, programs that use big data to deal with text, such as search engines and translation programs, often rely heavily on something called trigrams: sequences of three words in a row (like “in a row”). Reliable statistical information can be compiled about common trigrams, precisely because they appear frequently. But no existing body of data will ever be large enough to include all the trigrams that people might use, because of the continuing inventiveness of language.

To select an example more or less at random, a book review that the actor Rob Lowe recently wrote for this newspaper contained nine trigrams such as “dumbed-down escapist fare” that had never before appeared anywhere in all the petabytes of text indexed by Google. To witness the limitations that big data can have with novelty, Google-translate “dumbed-down escapist fare” into German and then back into English: out comes the incoherent “scaled-flight fare.” That is a long way from what Mr. Lowe intended — and from big data’s aspirations for translation.
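The trigram idea is simple to illustrate. In this toy sketch (the corpus is invented), a common trigram accumulates reliable counts while a novel phrase like Mr. Lowe’s has none at all, which is precisely why such systems stumble on linguistic novelty.

```python
from collections import Counter

def trigrams(text):
    """Return all three-word windows in a text, in order."""
    words = text.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

corpus = "the cat sat on the mat and the cat sat on the chair"
counts = Counter(trigrams(corpus))

print(counts[("the", "cat", "sat")])                 # seen twice in this corpus
print(counts[("dumbed-down", "escapist", "fare")])   # never seen: count is 0
```

A real system draws these counts from petabytes of text rather than one sentence, but the limitation is the same: a trigram that has never occurred contributes no statistical information, no matter how large the corpus.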

Wait, we almost forgot one last problem: the hype….

Effective metrics for measurement and target setting in online citizen engagement


Mathew Crozier at Bang the Table: “Target setting and measurement are arguably the most important aspects of any engagement process. If we are unable to properly understand the results, then have we really respected the community’s time and effort contributing to our project?
In building the latest version of the EngagementHQ software we not only thought about new tools and ways to engage the community, we also watched the ways our clients had been using the reports and set ourselves to thinking about how we could build a set of metrics for target setting and the measurement of results that will remain relevant as we add more and more functionality to EngagementHQ.
Things have changed a lot since we designed our old reports. You can now get information from your community using forums, guestbooks, a story tool, interactive mapping, surveys, quick polls, submission forms, a news feed with discussions or the QandA tool. You can provide information to the community not just through the library, dates, photos and FAQs but also using videos, link boxes and embedded content from all over the web.
Our old reports could tell you that 600 people had viewed the documents and it could tell you that 70 people had read the FAQs but you could not tell if they were the same people so you didn’t really know how many people had accessed information through your site. Generally we used those who had viewed documents in the library as a proxy but as time goes on our more engaging clients are communicating less and less through documents and more through other channels.
Similarly, whilst registrations were a good proxy for engagement (why else would you sign up?), it was failing to keep pace with the technology. You can now configure any of our tools to require sign-up or to be exempt from it, so the proxy no longer holds. Moreover, many of our clients bulk load groups into the database and therefore inflate the registrations number.
What we came up with was a simple solution. We would calculate Aware, Informed and Engaged cohorts in the reports.
Aware – a measure of the number of people who have visited your project;
Informed – a measure of the visitors who have clicked to access further information resources, to learn more;
Engaged – a measure of the number of people who have given you feedback using any of the means available on the site.”
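The three cohorts above can be sketched as distinct-visitor sets over an event log. This is an illustrative sketch only — the event names and schema below are invented, not EngagementHQ’s actual data model:

```python
# Hypothetical event log: (visitor_id, action) pairs.
events = [
    ("v1", "visit"), ("v1", "view_document"), ("v1", "survey_response"),
    ("v2", "visit"), ("v2", "view_faq"),
    ("v3", "visit"),
]

# Which actions count as "learning more" vs. "giving feedback" (assumed).
INFORM_ACTIONS = {"view_document", "view_faq", "view_video"}
ENGAGE_ACTIONS = {"survey_response", "forum_post", "quick_poll"}

# Sets of distinct visitors, so the same person is never double-counted --
# the flaw in the old document-views proxy described above.
aware    = {v for v, a in events if a == "visit"}
informed = {v for v, a in events if a in INFORM_ACTIONS}
engaged  = {v for v, a in events if a in ENGAGE_ACTIONS}

print(len(aware), len(informed), len(engaged))  # 3 2 1
```

Computing cohorts as sets of visitor IDs rather than raw hit counts is what answers the post’s original complaint: 600 document views and 70 FAQ reads could be any number of distinct people.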

Using Social Media to Measure Labor Market Flows


Paper by Dolan Antenucci, Michael Cafarella, Margaret C. Levenstein, Christopher Ré, and Matthew D. Shapiro: “Social media enable promising new approaches to measuring economic activity and analyzing economic behavior at high frequency and in real time using information independent from standard survey and administrative sources. This paper uses data from Twitter to create indexes of job loss, job search, and job posting. Signals are derived by counting job-related phrases in Tweets such as “lost my job.” The social media indexes are constructed from the principal components of these signals. The University of Michigan Social Media Job Loss Index tracks initial claims for unemployment insurance at medium and high frequencies and predicts 15 to 20 percent of the variance of the prediction error of the consensus forecast for initial claims. The social media indexes provide real-time indicators of events such as Hurricane Sandy and the 2013 government shutdown. Comparing the job loss index with the search and posting indexes indicates that the Beveridge Curve has been shifting inward since 2011.
The University of Michigan Social Media Job Loss Index is updated weekly and is available at http://econprediction.eecs.umich.edu/.”
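The signal-construction step the paper describes — counting job-related phrases per period, then combining the signals — can be sketched as follows. The phrase list and tweets are invented, and the paper’s principal-components step is replaced here by a simple average of z-scored signals for brevity:

```python
from statistics import mean, pstdev

JOB_LOSS_PHRASES = ["lost my job", "got fired", "laid off"]  # illustrative list

def weekly_signals(tweets_by_week):
    """One count series per phrase: occurrences of the phrase each week."""
    return [
        [sum(phrase in t.lower() for t in tweets) for tweets in tweets_by_week]
        for phrase in JOB_LOSS_PHRASES
    ]

def index(signals):
    """Stand-in for the paper's principal component: mean of z-scored signals."""
    zscored = []
    for s in signals:
        m, sd = mean(s), pstdev(s)
        zscored.append([(x - m) / sd if sd else 0.0 for x in s])
    return [mean(col) for col in zip(*zscored)]

weeks = [
    ["just lost my job today", "great weather"],
    ["nothing new"],
    ["got fired this morning", "she was laid off", "lost my job again"],
]
print(index(weekly_signals(weeks)))  # the index spikes in the third week
```

The real index is built from principal components of many such phrase-count signals over millions of tweets; the sketch only shows the shape of the pipeline, from raw text to a weekly composite.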

Smart cities are here today — and getting smarter


Computer World: “Smart cities aren’t a science fiction, far-off-in-the-future concept. They’re here today, with municipal governments already using technologies that include wireless networks, big data/analytics, mobile applications, Web portals, social media, sensors/tracking products and other tools.
These smart city efforts have lofty goals: Enhancing the quality of life for citizens, improving government processes and reducing energy consumption, among others. Indeed, cities are already seeing some tangible benefits.
But creating a smart city comes with daunting challenges, including the need to provide effective data security and privacy, and to ensure that myriad departments work in harmony.

The global urban population is expected to grow approximately 1.5% per year between 2025 and 2030, mostly in developing countries, according to the World Health Organization.

What makes a city smart? As with any buzz term, the definition varies. But in general, it refers to using information and communications technologies to deliver sustainable economic development and a higher quality of life, while engaging citizens and effectively managing natural resources.
Making cities smarter will become increasingly important. For the first time ever, the majority of the world’s population resides in a city, and this proportion continues to grow, according to the World Health Organization, the coordinating authority for health within the United Nations.
A hundred years ago, two out of every 10 people lived in an urban area, the organization says. As recently as 1990, less than 40% of the global population lived in a city — but by 2010 more than half of all people lived in an urban area. By 2050, the proportion of city dwellers is expected to rise to 70%.
As many city populations continue to grow, here’s what five U.S. cities are doing to help manage it all:

Scottsdale, Ariz.

The city of Scottsdale, Ariz., has several initiatives underway.
One is MyScottsdale, a mobile application the city deployed in the summer of 2013 that allows citizens to report cracked sidewalks, broken street lights and traffic lights, road and sewer issues, graffiti and other problems in the community….”

Crowdsourcing “Monopoly”


The Economist: “In 1904 a young American named Elizabeth Magie received a patent for a board game in which players used tokens to move around a four-sided board buying properties, avoiding taxes and jail, and collecting $100 every time they passed the board’s starting-point. Three decades later Charles Darrow, a struggling salesman in Pennsylvania, patented a tweaked version of the game as “Monopoly”. Now owned by Hasbro, a big toymaker, it has become one of the world’s most popular board games, available in dozens of languages and innumerable variations.
Magie was a devotee of Henry George, an economist who believed in common ownership of land; her game was designed to be a “practical demonstration of the present system of land-grabbing with all its usual outcomes and consequences.” And so it has become, though players snatch properties more in zeal than sadness. In “Monopoly” as in life, it is better to be rich than poor, children gleefully bankrupt their parents and nobody uses a flat iron any more.
Board-game makers have had to find their footing in a digital age. Hasbro’s game-and-puzzle sales fell by 4% in 2010—the year the iPad came to market—and 10% in 2011. Since then, however, its game-and-puzzle sales have rebounded, rising by 2% in 2012 and 10% in 2013. Stephanie Wissink, a youth-market analyst with Piper Jaffray, an investment bank, says that Hasbro has learned to become “co-creative…They’re infusing more social-generated content into their marketing and product development.”
Some of that content comes from Facebook. Last year, “Monopoly” fans voted on Hasbro’s Facebook page to jettison the poor old flat iron in favour of a new cat token. “Scrabble” players are voting on which word to add to the new dictionary (at press time, 16 remain, including “booyah”, “adorbs” and “cosplay”). “Monopoly” fans, meanwhile, are voting on which of ten house rules—among them collecting $400 rather than $200 for landing on “Go”, requiring players to make a full circuit of the board before buying property and “Mom always gets out of jail free. Always. No questions asked”—to make official…”

Facebook’s Connectivity Lab will develop advanced technology to provide internet across the world


At GigaOm: “The Internet.org initiative will rely on a new team at Facebook called the Connectivity Lab, based at the company’s Menlo Park campus, to develop technology on the ground, in the air and in space, CEO Mark Zuckerberg announced Thursday. The team will develop technology like drones and satellites to expand access to the internet across the world.
“The team’s approach is based on the principle that different sized communities need different solutions and they are already working on new delivery platforms—including planes and satellites—to provide connectivity for communities with different population densities,” a post on Internet.org says.
Internet.org, which is backed by companies like Facebook, Samsung and Qualcomm, wants to provide internet to the two thirds of the world that remains disconnected due to cost, lack of infrastructure or remoteness. While many companies are  developing business models and partnerships in areas that lack internet, the Connectivity Lab will focus on sustainable technology that will transmit the signals. Facebook envisions using drones that could fly for months to connect suburban areas, while more rural areas would rely on satellites. Both would use infrared lasers to blanket whole areas with connectivity.
Members of the Connectivity Lab have backgrounds at NASA’s Jet Propulsion Laboratory, NASA’s Ames Research Center and the National Optical Astronomy Observatory. Facebook also confirmed today that it acquired five employees from Ascenta, a U.K.-based company that worked on the Zephyr, a solar-powered drone capable of flying for two weeks straight.
The lab’s work will build on work the company has already done in the Philippines and Paraguay, Zuckerberg said in a Facebook post. And, like the company’s Open Compute project, there is a possibility that the lab will seek partnerships with outside countries once the bulk of the technology has been developed.”