Can social media make every civil servant an innovator?


Steve Kelman at FCW: “Innovation, particularly in government, can be very hard. Lots of signoffs, lots of naysayers. For many, it’s probably not worth the hassle.
Yet all sorts of examples are surfacing about ways civil servants, non-profits, startups and researchers have thought to use social media — or data mining of government information — to get information that can either help citizens directly or help agencies serve citizens. I want to call attention to examples that I’ve seen just in the past few weeks — partly to recognize the creative people who have come up with these ideas, but partly to make a point about the relationship between these ideas and the general issue of innovation in government. I think that these social media and data-driven experiments are often a much simpler way for civil servants to innovate than many of the changes we typically think of under the heading “innovation in government.” They open the possibility to make innovation in government an activity for the civil service masses.
One example that was reported in The New York Times was about a pilot project at the New York City Department of Health and Mental Hygiene to run rapid keyword searches for phrases such as “vomit” and “diarrhea” across 294,000 Yelp restaurant reviews in New York City. The city is using a software program developed at Columbia University. They have now expanded the monitoring to run daily, to get quick information on possible problems at specific restaurants or with specific kinds of food.
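To make the mechanics concrete, here is a minimal sketch of that kind of keyword screen, assuming invented field names, terms, and sample reviews; the actual Columbia/DOHMH software is considerably more sophisticated.

```python
# Hypothetical sketch: flag restaurant reviews that mention illness-related keywords.
# The field names and sample reviews are invented; the real system also handles
# classification, deduplication, and follow-up by epidemiologists.
ILLNESS_TERMS = {"vomit", "vomiting", "diarrhea", "food poisoning"}

reviews = [
    {"restaurant": "Example Bistro", "text": "Great pasta, friendly staff."},
    {"restaurant": "Example Diner", "text": "I was vomiting all night after the clams."},
]

def flag_reviews(reviews, terms=ILLNESS_TERMS):
    """Return the reviews whose text contains any illness-related term."""
    flagged = []
    for review in reviews:
        text = review["text"].lower()
        if any(term in text for term in terms):
            flagged.append(review)
    return flagged

for hit in flag_reviews(reviews):
    print(hit["restaurant"], "->", hit["text"])
```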
A second example, reported in BloombergBusinessWeek, involved — perhaps not surprisingly, given the publication — an Israeli startup called Treato that is applying a similar idea to ferreting out adverse drug reactions before they come in through FDA studies and other systems. The founders are cooperating with researchers at Harvard Medical School and FDA officials, among others. Their software looks through Twitter and Facebook, along with a large number of patient forum sites, to cull from the mass of illness reports those incidents that may well reflect adverse drug reactions.
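A similar toy sketch, assuming invented drug names, symptoms, and forum posts (Treato's real pipeline uses far richer natural-language processing), shows the basic idea of counting drug and symptom co-mentions:

```python
# Toy sketch: count drug/symptom co-mentions across patient-forum posts.
# Drug names, symptoms, and posts are all invented for illustration.
from collections import Counter
from itertools import product

DRUGS = {"drugalin", "examplex"}            # hypothetical drug names
SYMPTOMS = {"rash", "dizziness", "nausea"}  # symptoms of interest

posts = [
    "Started drugalin last week and now I have a rash all over.",
    "Examplex works fine for me, no side effects at all.",
    "Anyone else get dizziness on drugalin?",
]

co_mentions = Counter()
for post in posts:
    text = post.lower()
    for drug, symptom in product(DRUGS, SYMPTOMS):
        if drug in text and symptom in text:
            co_mentions[(drug, symptom)] += 1

# Pairs that co-occur unusually often are candidates for closer review.
print(co_mentions.most_common())
```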
These examples are fascinating in themselves. But one thing that caught my eye about both is that each seems high on the creativity dimension and low on the need-to-overcome-bureaucracy dimension. Both ideas reflect new and improved ways to do what these organizations do anyway, which is gather information to help inform regulatory and health decisions by government. Neither requires any upheaval in an agency’s existing culture, or steps on somebody’s turf in any serious way. Introducing the changes doesn’t require major changes in an agency’s internal procedures. Compared to many innovations in government, these are easy ones to make happen. (They do all need some funds, however.)
What I hope is that the information woven into social media will unlock a new era of innovation inside government. The limits of innovation are much less determined by difficult-to-change bureaucratic processes and can be much more responsive to an individual civil servant’s creativity…”

How NYC Open Data and Reddit Saved New Yorkers Over $55,000 a Year


IQuantNY: “NYC generates an enormous amount of data each year, and for the most part, it stays behind closed doors.  But thanks to the Open Data movement, signed into law by Bloomberg in 2012 and championed over the last several years by Borough President Gale Brewer, along with other council members, we now get to see a small slice of what the city knows. And that slice is growing.
There have been some detractors along the way; a senior attorney for the NYPD said in 2012 during a council hearing that releasing NYPD data in csv format was a problem because they were “concerned with the integrity of the data itself” and because “data could be manipulated by people who want ‘to make a point’ of some sort”.  But our democracy is built on the idea of free speech; we let all the information out and then let reason lead the way.
In some ways, Open Data adds another check and balance into government: its citizens.  I’ve watched the perfect example of this check work itself out over the past month.  You may have caught my post that used parking ticket data to identify the fire hydrant in New York City that was generating the most income for the city in the form of fines: $33,000 a year.  And on the next block, the second most profitable hydrant was generating $24,000 a year.  That’s two consecutive blocks with hydrants generating over $55,000 a year. But there was a problem.  In my post, I laid out why these two parking spots were extremely confusing and basically seemed like a trap; there was a wide “curb extension” between the street and the hydrant, making it appear like the hydrant was not by the street.  Additionally, the DOT had painted parking spots right where you would be fined if you parked.
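The underlying analysis is essentially an aggregation over the open parking-violation data. A rough sketch, with a file name and column names that are assumptions rather than the dataset's real schema, might look like this:

```python
# Rough sketch: rank hydrant-violation locations by total fine revenue.
# The file name and column names ("violation_code", "street_name",
# "house_number", "fine_amount") are assumptions for illustration only;
# the real NYC open dataset uses its own schema.
import pandas as pd

tickets = pd.read_csv("parking_violations.csv")

hydrant_tickets = tickets[tickets["violation_code"] == "HYDRANT"]
revenue_by_spot = (
    hydrant_tickets
    .groupby(["street_name", "house_number"])["fine_amount"]
    .sum()
    .sort_values(ascending=False)
)
print(revenue_by_spot.head(10))  # the top rows point at the most ticketed hydrants
```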
Once the data was out there, the hydrant took on a life of its own.  First, it rose to the top of the nyc subreddit.  That is basically one way the internet voted that this is, in fact, “interesting”.  And that is how things go from small to big. From there, it traveled to the New York Observer, which was able to get a comment from the DOT. After that, it appeared in the New York Post, was republished by Gothamist, and finally went global in the Daily Mail.
I guess the pressure was on the DOT at this point, as each media source reached out for comment, but what struck me was their response to the Observer:

“While DOT has not received any complaints about this location, we will review the roadway markings and make any appropriate alterations”

Why does someone have to complain in order for the DOT to see problems like this?  In fact, the DOT just redesigned every parking sign in New York because some of the old ones were considered confusing.  But if this hydrant was news to them, it implies that they did not utilize the very strongest source of measuring confusion on our streets: NYC parking tickets….”

How to Make Government Data Sites Better


Flowing Data: “Accessing government data from the source is frustrating. If you’ve done it, or at least tried to, you know the pain that is oddly formatted files, search that doesn’t work, and annotation that tells you nothing about the data in front of you.
The most frustrating part of the process is knowing how useful the data could be if only it were shared more simply. Unfortunately, ease-of-use is rarely the case, and we spend more time formatting and inspecting the data than we do actually putting it to use. Shouldn’t it be the other way around?
It’s this painstaking process that draws so much ire. It’s hard not to complain.
Maybe the people in charge of these sites just don’t know what’s going on. Or maybe they’re so overwhelmed by suck that they don’t know where to start. Or they’re unknowingly infected by the that-is-how-we’ve-always-done-it bug.
Whatever it may be, I need to think out loud about how to improve these sites. Empty complaints don’t help.
I use the Centers for Disease Control and Prevention as the test subject, but most of the things covered should easily generalize to other government sites (and non-government ones too). And I choose CDC not because they’re the worst but because they publish a lot of data that is of immediate and direct use to the general public.
I approach this from the point of view of someone who uses government data, beyond pulling a single data point from a spreadsheet. I’m also going to put on my Captain Obvious hat, because what seems obvious to some is apparently a black box to others.
Provide a useable data format
Sometimes it feels like government data is available in every format except the one that data users want. The worst was when I downloaded a 2GB file and, upon unzipping it, discovered it was an EXE file.
Data in PDF format is a kick in the face for people looking for CSV files. There might be ways to get the data out from PDFs, but it’s still a pain when you have more than a handful of files….
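To make the contrast concrete, here is a hedged sketch of the two paths. The file names are invented, and the PDF route assumes the third-party pdfplumber library and a cleanly laid-out table, which is rarely what you get:

```python
# Sketch only: the CSV path is one call; the PDF path needs a third-party
# extraction library and still assumes the tables are laid out cleanly.
import pandas as pd
import pdfplumber  # third-party library, assumed installed

# Usable format: one call and the data is ready to work with.
df = pd.read_csv("mortality_by_state.csv")  # hypothetical file name
print(df.head())

# PDF "format": extract page by page and hope the table structure survives.
rows = []
with pdfplumber.open("mortality_by_state.pdf") as pdf:  # hypothetical file name
    for page in pdf.pages:
        table = page.extract_table()
        if table:
            rows.extend(table)
```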
Useable data format is the most important, and if there’s just one thing you change, make it this.
(Raw data is fine too)
It’s rare to find raw government data, so it’s like striking gold when it actually happens. I realize you run into issues with data privacy, quality, missing data, etc. For these data sources, I appreciate the estimates with standard errors. However, the less aggregated (the more raw) you can provide, the better.
CSV for that too, please.
Never mind the fancy sharing tools
Not all government data is wedged into PDF files, and some of it is accessible via export tools that let you subset and layout your data exactly how you want it. The problem is that in an effort to please everyone, you end up with a tool shown on the left….
Tell people where to get the data
Get the things above done, and your government data site is exponentially better than it was before, but let’s keep going.
The navigation process to get to a dataset is incredibly convoluted, which makes it hard to find data and difficult to return to it….
Show visual previews
I’m all for visualization integrated with the data search tools. It always sucks when I spend time formatting data only to find that it wasn’t worth my time. Census Reporter is a fine example of how this might work.
That said, visual tools plus an upgrade to the previously mentioned things is a big undertaking, especially if you’re going to do it right. So I’m perfectly fine if you skip this step to focus your resources on data that’s easier to use and download. Leave the visualizing and analysis to us.
Decide what’s important, archive the rest
So much cruft. So many old documents. Broken links. Create an archive and highlight what people come to your site for.
Wrapping up
There’s plenty more stuff to update, especially once you start to work with the details, but this should be a good place to start. It’s a lot easier to point out what you can do to improve government data sharing than it is to actually do it of course. There are so many people, policies, and oh yes, politics, that it can be hard to change.”

Heteromation and its (dis)contents: The invisible division of labor between humans and machines


Paper by Hamid Ekbia and Bonnie Nardi in First Monday: “The division of labor between humans and computer systems has changed along both technical and human dimensions. Technically, there has been a shift from technologies of automation, the aim of which was to disallow human intervention at nearly all points in the system, to technologies of “heteromation” that push critical tasks to end users as indispensable mediators. As this has happened, the large population of human beings who have been driven out by the first type of technology are drawn back into the computational fold by the second type. Turning artificial intelligence on its head, one technology fills the gap created by the other, but with a vengeance that unsettles established mechanisms of reward, fulfillment, and compensation. In this fashion, replacement of human beings and their irrelevance to technological systems has given way to new “modes of engagement” with remarkable social, economic, and ethical implications. In this paper we provide a historical backdrop for heteromation and explore and explicate some of these displacements through analysis of a number of cases, including Mechanical Turk, the video games FoldIt and League of Legends, and social media.


Why Governments Should Adopt a Digital Engagement Strategy


Lindsay Crudele at StateTech: “Government agencies increasingly value digital engagement as a way to transform a complaint-based relationship into one of positive, proactive constituent empowerment. An engaged community is a stronger one.
Creating a culture of participatory government, as we strive to do in Boston, requires a data-driven infrastructure supported by IT solutions. Data management and analytics solutions translate a huge stream of social media data, drive conversations and creative crowdsourcing, and support transparency.
More than 50 departments across Boston host public conversations using a multichannel, multidisciplinary portfolio of accounts. We integrate these using an enterprise digital engagement management tool that connects and organizes them to break down silos and boost collaboration. Moreover, the technology provides a lens into ways to expedite workflow and improve service delivery.

A Vital Link in Times of Need

Committed and creative daily engagement builds trusting collaboration that, in turn, is vital in an inevitable crisis. As we saw during the tragic events of the 2013 Boston Marathon bombings and recent major weather events, rapid response through digital media clarifies the situation, provides information about safety and manages constituent expectations.
Boston’s enterprise model supports coordinated external communication and organized monitoring, intake and response. This provides a superadmin with access to all accounts for governance and the ability to easily amplify central messaging across a range of cultivated communities. These communities will later serve in recovery efforts.
The conversations must be seeded by a keen, creative and data-driven content strategy. For an agency to determine the correct strategy for the organization and the community it serves, a growing crop of social analytics tools can provide efficient insight into performance factors: type of content, deployment schedule, sentiment, service-based response time and team performance, to name a few. For example, in February, the city of Boston learned that tweets from our mayor with video saw 300 percent higher engagement than those without.
These insights can inform resource deployment, eliminating guesswork to more directly reach constituents by their preferred methods. Being truly present in a conversation demonstrates care and awareness and builds trust. This increased positivity can be measured through sentiment analysis, including change over time, and should be monitored for fluctuation.
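A minimal sketch of the sort of comparison behind that 300 percent figure, using an invented tweet export with assumed columns rather than Boston's actual analytics tooling, could look like this:

```python
# Hypothetical sketch: compare average engagement for tweets with and without video.
# The file name and columns ("has_video" as 0/1, "likes", "retweets", "replies")
# are assumptions; a real analytics export will differ.
import pandas as pd

tweets = pd.read_csv("mayor_tweets.csv")
tweets["engagement"] = tweets[["likes", "retweets", "replies"]].sum(axis=1)

by_type = tweets.groupby("has_video")["engagement"].mean()
lift = (by_type[1] / by_type[0] - 1) * 100
print(f"Tweets with video average {lift:.0f}% higher engagement")
```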
During a major event, engagement managers may see activity reach new peaks in volume. IT solutions can interpret Big Data and bring a large-scale digital conversation back into perspective, identifying public safety alerts and emerging trends, needs and community influencers who can be engaged as amplifying partners.

Running Strong One Year Later

Throughout the 2014 Boston Marathon, we used three monitoring tools to deliver smart alerts to key partners across the organization:
• An engagement management tool organized conversations for account performance and monitoring.
• A brand listening tool scanned for emerging trends across the city and uncovered related conversations.
• A location-based predictive tool identified early alerts to discover potential problems along the marathon route.
With the team and tools in place, policy-based training supports the sustained growth and operation of these conversation channels. A data-driven engagement strategy unearths all of our stories, where we, as public servants and neighbors, build better communities together….”

New Book on 25 Years of Participatory Budgeting


Tiago Peixoto at Democracy Spot: “A little while ago I mentioned the launch of the Portuguese version of the book organized by Nelson Dias, “Hope for Democracy: 25 Years of Participatory Budgeting Worldwide”.

The good news is that the English version is finally out. Here’s an excerpt from the introduction:

This book represents the effort of more than forty authors, and many other direct and indirect contributors spread across different continents, who seek to provide an overview of Participatory Budgeting (PB) in the world. They do so from different backgrounds. Some are researchers, others are consultants, and others are activists connected to several groups and social movements. The texts reflect this diversity of approaches and perspectives well, and we do not try to influence that.
(….)
The pages that follow are an invitation to a fascinating journey on the path of democratic innovation in very diverse cultural, political, social and administrative settings. From North America to Asia, Oceania to Europe, from Latin America to Africa, the reader will find many reasons to closely follow the proposals of the different authors.

The book can be downloaded here [PDF]. I had the pleasure of being one of the book’s contributors, co-authoring an article with Rafael Sampaio on the use of ICT in PB processes: “Electronic Participatory Budgeting: False Dilemmas and True Complexities” [PDF]...”

The Emerging Science of Computational Anthropology


Emerging Technology From the arXiv: The increasing availability of big data from mobile phones and location-based apps has triggered a revolution in the understanding of human mobility patterns. This data shows the ebb and flow of the daily commute in and out of cities, the pattern of travel around the world and even how disease can spread through cities via their transport systems.
So there is considerable interest in looking more closely at human mobility patterns to see just how well it can be predicted and how these predictions might be used in everything from disease control and city planning to traffic forecasting and location-based advertising.
Today we get an insight into the kind of detail that is possible thanks to the work of Zimo Yang at Microsoft Research in Beijing and a few pals. These guys start with the hypothesis that people who live in a city have a pattern of mobility that is significantly different from those who are merely visiting. By dividing travelers into locals and non-locals, their ability to predict where people are likely to visit dramatically improves.
Zimo and co begin with data from a Chinese location-based social network called Jiepang.com. This is similar to Foursquare in the US. It allows users to record the places they visit and to connect with friends at these locations and to find others with similar interests.
The data points are known as check-ins and the team downloaded more than 1.3 million of them from five big cities in China: Beijing, Shanghai, Nanjing, Chengdu and Hong Kong. They then used 90 per cent of the data to train their algorithms and the remaining 10 per cent to test them. The Jiepang data includes the users’ hometowns, so it’s easy to see whether an individual is checking in in their own city or somewhere else.
The question that Zimo and co want to answer is the following: given a particular user and their current location, where are they most likely to visit in the near future? In practice, that means analysing the user’s data, such as their hometown and the locations recently visited, and coming up with a list of other locations that they are likely to visit based on the type of people who visited these locations in the past.
Zimo and co used their training dataset to learn the mobility pattern of locals and non-locals and the popularity of the locations they visited. The team then applied this to the test dataset to see whether their algorithm was able to predict where locals and non-locals were likely to visit.
They found that their best results came from analysing the pattern of behaviour of a particular individual and estimating the extent to which this person behaves like a local. That produced a weighting called the indigenization coefficient that the researchers could then use to determine the mobility patterns this person was likely to follow in future.
In fact, Zimo and co say they can spot non-locals in this way without even knowing their home location. “Because non-natives tend to visit popular locations, like the Imperial Palace in Beijing and the Bund in Shanghai, while natives usually check in around their homes and workplaces,” they add.
The team say this approach considerably outperforms the mixed algorithms that use only individual visiting history and location popularity. “To our surprise, a hybrid algorithm weighted by the indigenization coefficients outperforms the mixed algorithm accounting for additional demographical information.”
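The paper's exact model isn't reproduced here, but a toy sketch of the weighting idea (blend a location's popularity among locals and among non-locals according to how native-like a user appears) might read as follows, with all names and numbers invented:

```python
# Toy sketch of the weighting idea described above, not the paper's actual model.
# popularity_local / popularity_nonlocal map a location to its check-in share
# among locals and non-locals; theta is the user's estimated "indigenization
# coefficient" (1.0 means the user behaves entirely like a local).

def score_candidates(candidates, popularity_local, popularity_nonlocal, theta):
    """Rank candidate locations for one user by blended popularity."""
    scores = {}
    for loc in candidates:
        local_score = popularity_local.get(loc, 0.0)
        nonlocal_score = popularity_nonlocal.get(loc, 0.0)
        scores[loc] = theta * local_score + (1 - theta) * nonlocal_score
    return sorted(scores, key=scores.get, reverse=True)

# Invented example: a visitor-like user (theta = 0.2) gets steered toward landmarks.
popularity_local = {"neighborhood_cafe": 0.30, "imperial_palace": 0.05}
popularity_nonlocal = {"neighborhood_cafe": 0.02, "imperial_palace": 0.40}
print(score_candidates(["neighborhood_cafe", "imperial_palace"],
                       popularity_local, popularity_nonlocal, theta=0.2))
```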
It’s easy to imagine how such an algorithm might be useful for businesses who want to target certain types of travelers or local people. But there is a more interesting application too.
Zimo and co say that it is possible to monitor the way an individual’s mobility patterns change over time. So if a person moves to a new city, it should be possible to see how long it takes them to settle in.
One way of measuring this is in their mobility patterns: whether they are more like those of a local or a non-local. “We may be able to estimate whether a non-native person will behave like a native person after a time period and if so, how long in average a person takes to become a native-like one,” say Zimo and co.
That could have a fascinating impact on the way anthropologists study migration and the way immigrants become part of a local community. This is computational anthropology, a science that is clearly in its early stages but one that has huge potential for the future.”
Ref: arxiv.org/abs/1405.7769 : Indigenization of Urban Mobility

A brief history of open data


Article by Luke Fretwell in FCW: “In December 2007, 30 open-data pioneers gathered in Sebastopol, Calif., and penned a set of eight open-government data principles that inaugurated a new era of democratic innovation and economic opportunity.
“The objective…was to find a simple way to express values that a bunch of us think are pretty common, and these are values about how the government could make its data available in a way that enables a wider range of people to help make the government function better,” Harvard Law School Professor Larry Lessig said. “That means more transparency in what the government is doing and more opportunity for people to leverage government data to produce insights or other great business models.”
The eight simple principles — that data should be complete, primary, timely, accessible, machine-processable, nondiscriminatory, nonproprietary and license-free — still serve as the foundation for what has become a burgeoning open-data movement.

The benefits of open data for agencies

  • Save time and money when responding to Freedom of Information Act requests.
  • Avoid duplicative internal research.
  • Use complementary datasets held by other agencies.
  • Empower employees to make better-informed, data-driven decisions.
  • Attract positive attention from the public, media and other agencies.
  • Generate revenue and create new jobs in the private sector.

Source: Project Open Data

In the seven years since those principles were released, governments around the world have adopted open-data initiatives and launched platforms that empower researchers, journalists and entrepreneurs to mine this new raw material and its potential to uncover new discoveries and opportunities. Open data has drawn civic hacker enthusiasts around the world, fueling hackathons, challenges, apps contests, barcamps and “datapaloozas” focused on issues as varied as health, energy, finance, transportation and municipal innovation.
In the United States, the federal government initiated the beginnings of a wide-scale open-data agenda on President Barack Obama’s first day in office in January 2009, when he issued his memorandum on transparency and open government, which declared that “openness will strengthen our democracy and promote efficiency and effectiveness in government.” The president gave federal agencies three months to provide input into an open-government directive that would eventually outline what each agency planned to do with respect to civic transparency, collaboration and participation, including specific objectives related to releasing data to the public.
In May of that year, Data.gov launched with just 47 datasets and a vision to “increase public access to high-value, machine-readable datasets generated by the executive branch of the federal government.”
When the White House issued the final draft of its federal Open Government Directive later that year, the U.S. open-government data movement got its first tangible marching orders, including a 45-day deadline to open previously unreleased data to the public.
Now five years after its launch, Data.gov boasts more than 100,000 datasets from 227 local, state and federal agencies and organizations….”

Big Data, new epistemologies and paradigm shifts


Paper by Rob Kitchin in the Journal “Big Data and Society”: This article examines how the availability of Big Data, coupled with new data analytics, challenges established epistemologies across the sciences, social sciences and humanities, and assesses the extent to which they are engendering paradigm shifts across multiple disciplines. In particular, it critically explores new forms of empiricism that declare ‘the end of theory’, the creation of data-driven rather than knowledge-driven science, and the development of digital humanities and computational social sciences that propose radically different ways to make sense of culture, history, economy and society. It is argued that: (1) Big Data and new data analytics are disruptive innovations which are reconfiguring in many instances how research is conducted; and (2) there is an urgent need for wider critical reflection within the academy on the epistemological implications of the unfolding data revolution, a task that has barely begun to be tackled despite the rapid changes in research practices presently taking place. After critically reviewing emerging epistemological positions, it is contended that a potentially fruitful approach would be the development of a situated, reflexive and contextually nuanced epistemology”

How Long Is Too Long? The 4th Amendment and the Mosaic Theory


Law and Liberty Blog: “Volume 8.2 of the NYU Journal of Law and Liberty has been sent to the printer and physical copies will be available soon, but the articles in the issue are already available online here. One article that has gotten a lot of attention so far is by Steven Bellovin, Renee Hutchins, Tony Jebara, and Sebastian Zimmeck titled “When Enough is Enough: Location Tracking, Mosaic Theory, and Machine Learning.” A direct link to the article is here.
The mosaic theory is a modern corollary accepted by some academics – and the D.C. Circuit Court of Appeals in Maynard v. U.S. – as a twenty-first century extension of the Fourth Amendment’s prohibition on unreasonable searches and seizures. Proponents of the mosaic theory argue that at some point enough individual data collections, compiled and analyzed together, become a Fourth Amendment search. Thirty years ago the Supreme Court upheld the use of a tracking device for three days without a warrant; however, the proliferation of GPS tracking in cars and smartphones has made it significantly easier for the police to access a treasure trove of information about our location at any given time.
It is easy to see why this theory has attracted some support. Humans are creatures of habit – if our public locations are tracked for a few days, weeks, or a month, it is pretty easy for machines to learn our ways and assemble a fairly detailed report for the government about our lives. Machines could basically predict when you will leave your house for work, what route you will take, and when and where you go grocery shopping, all before you even do it, once they know your habits. A policeman could observe you moving about in public without a warrant, of course, but limited manpower will always reduce the probability of continuous mass surveillance. With current technology, a handful of trained experts could easily monitor hundreds of people at a time from behind a computer screen, and gather even more information than most searches requiring a warrant. The Supreme Court indicated a willingness to consider the mosaic theory in U.S. v. Jones, but has yet to embrace it…”
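To see how little machinery such habit profiling requires, consider a toy sketch, with invented data rather than any agency's system, that estimates a typical weekday departure time from a single week of timestamped observations:

```python
# Toy sketch: infer a habitual weekday departure time from a handful of
# timestamped "left home" observations. All data is invented for illustration.
from datetime import datetime
from statistics import mean

observed_departures = [  # one week of observations is already enough
    "2014-06-02 08:03", "2014-06-03 07:58", "2014-06-04 08:11",
    "2014-06-05 08:01", "2014-06-06 08:07",
]

minutes = [
    dt.hour * 60 + dt.minute
    for dt in (datetime.strptime(s, "%Y-%m-%d %H:%M") for s in observed_departures)
]
avg = mean(minutes)
print(f"Predicted weekday departure: {int(avg // 60):02d}:{int(avg % 60):02d}")
```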

The article in Law & Liberty details the need to determine at which point machine learning creates an intrusion into our reasonable expectations of privacy, and even discusses an experiment that could be run to determine how long data collection can proceed before it becomes an intrusion. If there is a line at which individual data collection becomes a search, we need to discover where that line is. One of the article’s authors, Steven Bellovin, has argued that the line is probably at one week – at that point your weekday and weekend habits would be known. The nation’s leading legal expert on criminal law, Professor Orin Kerr, fired back on the Volokh Conspiracy that Bellovin’s one-week argument is not in line with previous iterations of the mosaic theory.