Twitter Datastream Used to Predict Flu Outbreaks


arXivBlog: “The rate at which people post flu-related tweets could become a powerful tool in the battle to spot epidemics earlier, say computer scientists.

Back in 2008, Google launched its now-famous Flu Trends website. It works on the hypothesis that people make more flu-related search queries when they are suffering from the illness than when they are healthy. So counting the number of flu-related search queries in a given country gives a good indication of how the virus is spreading.
The predictions are pretty good. The data generally closely matches that produced by government organisations such as the Centers for Disease Control and Prevention (CDC) in the US. Indeed, in some cases, it has been able to spot an incipient epidemic more than a week before the CDC.
That’s been hugely important. An early indication that the disease is spreading in a population gives governments a welcome headstart in planning its response.
So an interesting question is whether other online services, in particular social media, can make similar or even better predictions. Today, we have an answer thanks to the work of Jiwei Li at Carnegie Mellon University in Pittsburgh, and Claire Cardie at Cornell University in New York State, who have been able to detect the early stages of an influenza outbreak using Twitter.
Their approach is in many ways similar to Google’s. They simply filter the Twitter datastream for flu-related tweets that are also geotagged. That allows them to create a map showing the distribution of these tweets and how it varies over time.
They also model the dynamics of the disease with some interesting subtleties. In the new model, a flu epidemic can be in one of four phases: a non-epidemic phase, a rising phase in which numbers are increasing, a stationary phase, and a declining phase in which numbers are falling.
The new approach uses an algorithm that attempts to spot the switch from one phase to another as early as possible. Indeed, Li and Cardie test the effectiveness of their approach using a Twitter dataset of 3.6 million flu-related tweets from about 1 million people in the US between June 2008 and June 2010…
Ref: arxiv.org/abs/1309.7340: Early Stage Influenza Detection from Twitter”
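
The excerpt doesn’t describe the detection algorithm itself, so the following Python sketch is a toy illustration of the four-phase idea only, not Li and Cardie’s method: it smooths a hypothetical series of daily geotagged flu-tweet counts and labels each day from the level and slope of the smoothed curve. Every name, threshold and number in it is an invented assumption.

    import numpy as np

    def label_phases(counts, window=7, epidemic_level=100.0, slope_tol=2.0):
        # Smooth the daily counts with a moving average to suppress noise,
        # then classify each day by the level and slope of the smoothed curve.
        counts = np.asarray(counts, dtype=float)
        smooth = np.convolve(counts, np.ones(window) / window, mode="same")
        slope = np.gradient(smooth)
        phases = []
        for level, s in zip(smooth, slope):
            if level < epidemic_level:
                phases.append("non-epidemic")
            elif s > slope_tol:
                phases.append("rising")
            elif s < -slope_tol:
                phases.append("declining")
            else:
                phases.append("stationary")
        return phases

    # Hypothetical daily counts of geotagged flu-related tweets in one region.
    counts = [20, 25, 30, 80, 150, 240, 310, 330, 335, 300, 220, 140, 60]
    phases = label_phases(counts, window=3)
    # A "switch" is any day whose label differs from the previous day's.
    switches = [i for i in range(1, len(phases)) if phases[i] != phases[i - 1]]
    print(list(zip(counts, phases)))
    print("phase switches on days:", switches)

The hard research problem, of course, is flagging such switches early and reliably from noisy real-world data rather than from a clean toy series.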

More Top-Down Participation, Please! Institutionalized empowerment through open participation


Michelle Ruesch and Oliver Märker in DDD: “…this is not another article on the empowering potential of bottom-up digital political participation. Quite the contrary: It instead seeks to stress the empowering potential of top-down digital political participation. Strikingly, the democratic institutionalization of (digital) political participation is rarely considered when we speak about power in the context of political participation. Wouldn’t it be true empowerment though if the right of citizens to speak their minds were directly integrated into political and administrative decision-making processes?

Institutionalized political participation

Political participation, defined as any act that aims to influence politics in some way, can be initiated either by citizens, referred to as “bottom-up” participation, or by government, often referred to as “top-down” participation.  For many, the word “top-down” instantly evokes negative connotations, even though top-down participatory spaces are actually the foundation of democracy. These are the spaces of participation offered by the state and guaranteed by democratic constitutions. For a long time, top-down participation could be equated with formal democratic participation such as elections, referenda or party politics. Today, however, in states like Germany we can observe a new form of top-down political participation, namely government-initiated participation that goes beyond what is legally required and usually makes extensive use of digital media.
Like many other Western states, Germany has to cope with decreasing voter turnout and a lack of trust in political parties. At the same time, according to a recent study from 2012, two-thirds of eligible voters would like to be more involved in political decisions. The case of “Stuttgart 21” served as a late wake-up call for many German municipalities. Plans to construct a new train station in the center of the city of Stuttgart resulted in a petition for a local referendum, which was rejected. Protests against the train station culminated in widespread demonstrations in 2010, forcing construction to be halted. Even though a referendum was finally held in 2011 and a slight majority voted in favor of the train station, the Stuttgart 21 case has since been cited by Chancellor Angela Merkel and others as an example of the negative consequences of taking decisions without consulting with citizens early on. More and more municipalities and federal ministries in Germany have therefore started acknowledging that the conventional democratic model of participation in elections every few years is no longer sufficient. The Federal Ministry of Transport, Building and Urban Development, for example, published a manual for “good participation” in urban development projects….

What’s so great about top-down participation?

Semi-formal top-down participation processes have one major thing in common, regardless of the topic they address: Governmental institutions voluntarily open up a space for dialogue and thereby obligate themselves to take citizens’ concerns and ideas into account.
As a consequence, government-initiated participation offers the potential for institutionalized empowerment beyond elections. It grants the possibility of integrating participation into political and administrative decision-making processes….
Bottom-up participation will surely always be an important mobilizer of democratic change. Nevertheless, the provision of spaces of open participation by governments can aid in the institutionalization of citizens’ involvement in political decision-making. Had Stuttgart offered an open space of participation early in the train station construction process, maybe protests would never have escalated the way they did.
So is top-down participation the next step in the process of democratization? It could be, but only under certain conditions. Most importantly, top-down open participation requires a genuine willingness to abandon the old principle of doing business behind closed doors. This is not an easy undertaking; it requires time and endurance. Serious open participation also requires creating state institutions that ensure the relevance of the results by evaluating them and considering them in political decisions. We have formulated ten conditions that we consider necessary for the genuine institutionalization of open political participation [14]:

  • There needs to be some scope for decision-making. Top-down participation only makes sense when the results of the participation can influence decisions.
  • The government must genuinely aim to integrate the results into decision-making processes.
  • The limits of participation must be communicated clearly. Citizens must be informed if final decision-making power rests with a political body, for example.
  • The subject matter, rules and procedures need to be transparent.
  • Citizens need to be aware that they have the opportunity to participate.
  • Access to participation must be easy, the channels of participation chosen according to the citizens’ media habits. Using the Internet should not be a goal in itself.
  • The participatory space should be “neutral ground”. A moderator can help ensure this.
  • The set-up must be interactive. Providing information is only a prerequisite for participation.
  • Participation must be possible without providing real names or personal data.
  • Citizens must receive continuous feedback regarding how results are handled and the implementation process.”

The Art of Making City Code Beautiful


Nancy Scola in Next City: “Some rather pretty legal websites have popped up lately: PhillyCode.org, ChicagoCode.org and, as of last Thursday, SanFranciscoCode.org. This is how municipal code would design itself if it actually wanted to be read.
The network of [city]Code.org sites is the output of The State Decoded, a project of the OpenGov Foundation, which has its own fascinating provenance. That D.C.-based non-profit grew out of the fight in Congress over the SOPA and PIPA digital copyright bills a few winters ago. At the time, the office of Rep. Darrell Issa, more recently of Benghazi fame, built a platform called Madison that invited the public to help edit an alternative bill. Madison outlived the SOPA debate, and was spun out last summer as the flagship project of the OpenGov Foundation, helmed by former Issa staffer Seamus Kraft.
“What we discovered,” Kraft says, “is that co-authoring legislation is high up there on what [the public wants to] do with government information, but it’s not at the top.” What heads the list, he says, is simply knowing “what are the laws?” Pre-SanFranciscoCode, the city’s laws on everything from elections to electrical installations to transportation were trapped in an interface, run by publisher American Legal, that would not have looked out of place in “WarGames.” (Here’s the comparable “old” site for Chicago. It’s probably enough to say that Philadelphia’s comes with a “Frames/No Frames” option.) Madison needed a base of clean, structured municipal code upon which to function, and Kraft and company were finding that in cities across the country, that just didn’t exist.
Fixing the code, Kraft says, starts with him “unlawyering the text” that is either supplied to them by the city or scraped from online. This involves reading through the city code and looking for signposts that indicate when sections start, how provisions nest within them, and other structural cues that establish a pattern. That breakdown gets passed to the organization’s developers, who use it to automatically parse the full corpus. The process is time consuming. In San Francisco, 16 different patterns were required to capture each of the code’s sections. Often, the parser needs to be tweaked. “Sometimes it happens in a few minutes or a few hours,” Kraft says, “and sometimes it takes a few days.”
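
By way of illustration only, one such “pattern” might look like the rule below for spotting where sections begin. The regular expression, the “SEC.” numbering format, and the sample text are all invented for this sketch; The State Decoded’s actual parsers are more involved.

    import re

    # One invented "pattern": sections introduced as "SEC. <number>. <TITLE>".
    SECTION_RE = re.compile(
        r"^\s*SEC\.\s+(?P<number>[\d.]+)\.\s+(?P<title>.+)$", re.MULTILINE)

    raw_code = """
    SEC. 101. DEFINITIONS.
    As used in this Article, the following terms shall mean...
    SEC. 102. PERMITS REQUIRED.
    No person shall operate without a permit issued under this Article...
    """

    # Cut the flat text into structured (number, title, body) records.
    matches = list(SECTION_RE.finditer(raw_code))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw_code)
        sections.append({"number": m.group("number"),
                         "title": m.group("title"),
                         "body": raw_code[m.end():end].strip()})
    print(sections)

A code that sticks to one convention needs only a rule or two like this; a code that mixes conventions needs one rule per variant, which is how San Francisco ended up requiring 16.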

Over the long haul, Kraft has in mind adopting the customizability of YouVersion, the online digital Bible that allows users to choose fonts, colors and more. Kraft, a 2007 graduate of Georgetown who will cite the Catholic Church’s distributed structure as a model for networked government, proclaims YouVersion “the most kick-ass Bible you’ve ever seen. It’s stunning.” He’d like to do the same with municipal code, for the benefit of both the average American and those who have more regular engagement with local legal texts. “If you’re spending all day reading law,” he says, “you should at the very least have the most comfortable view possible.”

AskThem


AskThem is a project of the Participatory Politics Foundation, a 501(c)(3) non-profit organization with a mission to increase civic engagement. AskThem is supported by a charitable grant from the Knight Foundation’s Tech For Engagement initiative.
AskThem is a free & open-source website for questions-and-answers with public figures. It’s a not-for-profit tool for a stronger democracy, with open data for informed and engaged communities.
AskThem allows you to:

  • Find and ask questions to over 142,000 elected officials nationwide: federal, state and city levels of government.
  • Get signatures for your question or petition, have it delivered over email or Twitter, and push for a public response.
  • See questions from people near you, sign-on to questions you care about, and review answers from public figures.

It’s like a version of “We The People” for every elected official, from local city council members all the way up to U.S. senators. Enter your email above to be the first to ask a question when we launch and see previews of the site this Fall.
Elected officials: enter your email above and we’ll send you more information about signing up to answer questions on AskThem. It’s a free and non-partisan service to respond to your constituents in an open public forum and update them over email about your work. Or, be a leader in open-government and sign up now.
Issue-based organizations and media: sign up to help promote questions to government from people in your area. We’re working to launch with partnerships that build greater public accountability.
Previously known as the OpenGovernment.org project, AskThem is open-source and uses open government data – our code is available on GitHub – contributions welcome. For more development updates & discussion, join our low-traffic Google Group.
We’re a small non-profit organization actively seeking charitable funding support – help us launch this powerful new tool for public dialogue! Email us for a copy of our non-profit funding prospectus. If you can make a tax-exempt gift to support our work, please donate to PPF via OpenCongress. More background on the project is available on our Knight NewsChallenge proposal from March 2013.
Questions, feedback, ideas? Email David Moore, Executive Director of PPF – david at ppolitics.org, Twitter: @ppolitics; like our page on Facebook & follow @AskThemPPF on Twitter. Stay tuned!”

Using Big Data to Ask Big Questions


Chase Davis in Source: “First, let’s dispense with the buzzwords. Big Data isn’t what you think it is: Every federal campaign contribution over the last 30-plus years amounts to several tens of millions of records. That’s not Big. Neither is a dataset of 50 million Medicare records. Or even 260 gigabytes of files related to offshore tax havens—at least not when Google counts its data in exabytes. No, the stuff we analyze in pursuit of journalism and app-building is downright tiny by comparison.
But you know what? That’s ok. Because while super-smart Silicon Valley PhDs are busy helping Facebook crunch through petabytes of user data, they’re also throwing off intellectual exhaust that we can benefit from in the journalism and civic data communities. Most notably: the ability to ask Big Questions.
Most of us who analyze public data for fun and profit are familiar with small questions. They’re focused, incisive, and often have the kind of black-and-white, definitive answers that end up in news stories: How much money did Barack Obama raise in 2012? Is the murder rate in my town going up or down?
Big Questions, on the other hand, are speculative, exploratory, and systemic. As the name implies, they are also answered at scale: Rather than distilling a small slice of a dataset into a concrete answer, Big Questions look at entire datasets and reveal small questions you wouldn’t have thought to ask.
Can we track individual campaign donor behavior over decades, and what does that tell us about their influence in politics? Which neighborhoods in my city are experiencing spikes in crime this week, and are police changing patrols accordingly?
Or, by way of example, how often do interest groups propose cookie-cutter bills in state legislatures?

Looking at Legislation

Even if you don’t follow politics, you probably won’t be shocked to learn that lawmakers don’t always write their own bills. In fact, interest groups sometimes write them word-for-word.
Sometimes those groups even try to push their bills in multiple states. The conservative American Legislative Exchange Council has gotten some press, but liberal groups, social and business interests, and even sororities and fraternities have done it too.
On its face, something about elected officials signing their names to cookie-cutter bills runs head-first against people’s ideal of deliberative democracy—hence, it tends to make news. Those can be great stories, but they’re often limited in scope to a particular bill, politician, or interest group. They’re based on small questions.
Data science lets us expand our scope. Rather than focusing on one bill, or one interest group, or one state, why not ask: How many model bills were introduced in all 50 states, period, by anyone, during the last legislative session? No matter what they’re about. No matter who introduced them. No matter where they were introduced.
Now that’s a Big Question. And with some basic data science, it’s not particularly hard to answer—at least at a superficial level.

Analyze All the Things!

Just for kicks, I tried building a system to answer this question earlier this year. It was intended as an example, so I tried to choose methods that would make intuitive sense. But it also makes liberal use of techniques applied often to Big Data analysis: k-means clustering, matrices, graphs, and the like.
If you want to follow along, the code is here….
To make exploration a little easier, my code represents similar bills in graph space, shown at the top of this article. Each dot (known as a node) represents a bill. And a line connecting two bills (known as an edge) means they were sufficiently similar, according to my criteria (a cosine similarity of 0.75 or above). Thrown into visualization software like Gephi, it’s easy to click around the clusters and see what pops out. So what do we find?
There are 375 clusters in total. Because of the limitations of our data, many of them represent vague, subject-specific bills that just happen to have similar titles even though the legislation itself is probably very different (think things like “Budget Bill” and “Campaign Finance Reform”). This is where having full bill text would come in handy.
But mixed in with those bills are a handful of interesting nuggets. Several bills that appear to be modeled after legislation by the National Conference of Insurance Legislators appear in multiple states, among them: a bill related to limited lines travel insurance; another related to unclaimed insurance benefits; and one related to certificates of insurance.”
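
The pipeline Davis describes (vectorize each bill, connect any pair whose cosine similarity is 0.75 or above, then explore the resulting clusters) can be sketched in a few lines of Python. The sample bills and the use of TF-IDF vectors over titles are illustrative assumptions; his actual code, linked from the original piece, differs in its details.

    import networkx as nx
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical sample input: (bill_id, bill_title) pairs.
    bills = [
        ("CA-AB1",  "Limited lines travel insurance act"),
        ("TX-HB12", "An act relating to limited lines travel insurance"),
        ("NY-S300", "Campaign finance reform"),
    ]

    ids, titles = zip(*bills)
    vectors = TfidfVectorizer(stop_words="english").fit_transform(titles)
    sims = cosine_similarity(vectors)

    # Each node is a bill; an edge means the pair cleared the 0.75 cutoff.
    G = nx.Graph()
    G.add_nodes_from(ids)
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if sims[i, j] >= 0.75:
                G.add_edge(ids[i], ids[j])

    # Connected components play the role of the clusters explored in Gephi.
    print(list(nx.connected_components(G)))

With full bill text in place of titles, as the excerpt notes, the same construction would separate bills that merely share a name from bills that actually share language.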

The Shutdown’s Data Blackout


Opinion piece by Katherine G. Abraham and John Haltiwanger in The New York Times: “Today, for the first time since 1996 and only the second time in modern memory, the Bureau of Labor Statistics will not issue its monthly jobs report, as a result of the shutdown of nonessential government services. This raises an important question: Are the B.L.S. report and other economic data that the government provides “nonessential”?

If we’re trying to understand how much damage the shutdown or sequestration cuts are doing to jobs or the fragile economic recovery, they are definitely essential. Without robust economic data from the federal government, we can speculate, but we won’t really know.

In the last two shutdowns, in 1995 and 1996, the Congressional Budget Office estimated the economic damage at around 0.5 percent of the gross domestic product. This time, Moody’s estimates that a three-to-four-week shutdown might subtract 1.4 percent (annualized) from gross domestic product growth this quarter and take $55 billion out of the economy. Democrats tend to play up such projections; Republicans tend to play them down. If the shutdown continues, though, we’ll all be less able to tell what impact it is having, because more reports like the B.L.S. jobs report will be delayed, while others may never be issued.

In fact, sequestration cuts that affected 2013 budgets are already leading federal statistics agencies to defer or discontinue dozens of reports on everything from income to overseas labor costs. The economic data these agencies produce are key to tracking G.D.P., earnings and jobs, and to informing the Federal Reserve, the executive branch and Congress on the state of the economy and the impact of economic policies. The data are also critical for decisions made by state and local policy makers, businesses and households.

The combined budget for all the federal statistics agencies totals less than 0.1 percent of the federal budget. Yet the same across-the-board-cut mentality that led to sequester and shutdown has shortsightedly cut statistics agencies, too, as if there were something “nonessential” about spending money on accurately assessing the economic effects of government actions and inactions. As a result, as we move through the shutdown, the debt-ceiling fight and beyond, reliable, essential data on the impact of policy decisions will be harder to come by.

Unless the sequester cuts are reversed, funding for economic data will shrink further in 2014, on top of a string of lean budget years. More data reports will be eliminated at the B.L.S., the Census Bureau, the Bureau of Economic Analysis and other agencies. Even more insidious damage will come from compromising the methods for producing the reports that still are paid for and from failing to prepare for the future.

To save money, survey sample sizes will be cut, reducing the reliability of national data and undermining local statistics. Fewer resources will be devoted to maintaining the listings used to draw business survey samples, running the risk that surveys based on those listings won’t do as good a job of capturing actual economic conditions. Hiring and training will be curtailed. Over time, the availability and quality of economic indicators will diminish.

That would be especially paradoxical and backward at a time when economic statistics can and should be advancing through technological innovation instead of being marched backward by politics. Integrating survey data, administrative data and commercial data collected with scanners and other digital technologies could produce richer, more useful information with less of a burden on businesses and households.

Now more than ever, framing sound economic policy depends on timely and accurate information about the economy. Bad or ill-targeted data can lead to bad or ill-targeted decisions about taxes and spending. The tighter the budget and the more contentious the political debate around it, the more compelling the argument for investing in federal data that accurately show how government policies are affecting the economy, so we can target the most effective cuts or spending or other policies, and make ourselves accountable for their results. That’s why Congress should restore funding to the federal statistical agencies at a level that allows them to carry out their critical work.”

Collaboration Between Government and Outreach Organizations: A Case Study of the Department of Veterans Affairs


“In this report, Drs. Lael Keiser and Susan Miller examine the critical role of non-governmental outreach organizations in assisting government agencies to determine benefit eligibility of citizens applying for services.  Many non-profits and other organizations help low-income applicants apply for Social Security, Medicaid, and the Supplemental Nutritional Assistance Program (SNAP, or food stamps).
Some outreach organizations help veterans navigate the complexity of the veterans disability benefits program.  These organizations include the American Legion, the Disabled American Veterans, and the Veterans of Foreign Wars, as well as state government-run veterans agencies.  Drs. Keiser and Miller interviewed dozens of managers from the Department of Veterans Affairs (VA) and outreach organizations about their interactions in helping veterans.  They found “there is indeed effective collaboration” and that these organizations serve a key role for veterans in processing their claims.  These organizations also help lighten the workload of VA benefit examiners by ensuring the paperwork is in order in advance, as well as serving as a communications conduit.
Drs. Keiser and Miller found variations in the effectiveness of the relationships between VA and outreach organization staffs and identified best practices for increasing effectiveness.  These lessons can be applied to other agencies that interact frequently with outreach organizations that assist citizens in navigating the complexity of applying for various government benefit programs.
Listen to the interview on Federal News Radio.”

The Contours of Crowd Capability


New paper by Prashant Shukla and John Prpić: “The existence of dispersed knowledge has been a subject of inquiry for more than six decades. Despite the longevity of this rich research tradition, the “knowledge problem” has remained largely unresolved both in research and practice, and remains “the central theoretical problem of all social science”. However, in the 21st century, organizations are presented with opportunities through technology to benefit from dispersed knowledge to some extent. One such opportunity is represented by the recent emergence of a variety of crowd-engaging information systems (IS).
In this vein, Crowdsourcing is being widely studied in numerous contexts, and the knowledge generated from these IS phenomena is well-documented. At the same time, other organizations are leveraging dispersed knowledge by putting in place IS-applications such as Prediction Markets to gather large sample-size forecasts from within and without the organization. Similarly, we are also observing many organizations using IS-tools such as “Wikis” to access the knowledge of dispersed populations within the boundaries of the organization. Further still, other organizations are applying gamification techniques to accumulate Citizen Science knowledge from the public at large through IS.
Among these seemingly disparate phenomena, a complex ecology of crowd-engaging IS has emerged, involving millions of people all around the world generating knowledge for organizations through IS. However, despite the obvious scale and reach of this emerging crowd-engagement paradigm, there are no examples of research (as far as we know) that systematically compares and contrasts a large variety of these existing crowd-engaging IS-tools in one work. Understanding this current state of affairs, we seek to address this significant research void by comparing and contrasting a number of the crowd-engaging forms of IS currently available for organizational use.

To achieve this goal, we employ the Theory of Crowd Capital as a lens to systematically structure our investigation of crowd-engaging IS. Employing this parsimonious lens, we first explain how Crowd Capital is generated through Crowd Capability in organizations. Taking this conceptual platform as a point of departure, in Section 3, we offer an array of examples of IS currently in use in modern practice to generate Crowd Capital. We compare and contrast these emerging IS techniques using the Crowd Capability construct, therein highlighting some important choices that organizations face when entering the crowd-engagement fray. This comparison, which we term “The Contours of Crowd Capability”, can be used by decision-makers and researchers alike to differentiate among the many extant methods of Crowd Capital generation. At the same time, our comparison also illustrates some important differences to be found in the internal organizational processes that accompany each form of crowd-engaging IS. In Section 4, we conclude with a discussion of the limitations of our work.”

From Crowd-Sourcing Potholes to Community Policing


New paper by Manik Suri (GovLab): “The tragic Boston Marathon bombing and hair-raising manhunt that ensued was a sobering event. It also served as a reminder that emerging “civic technologies” – platforms and applications that enable citizens to connect and collaborate with each other and with government – are more important today than ever before. As commentators have noted, local police and federal agents utilized a range of technological platforms to tap the “wisdom of the crowd,” relying on thousands of private citizens to develop a “hive mind” that identified two suspects within a record period of time.
In the immediate wake of the devastating attack on April 15th, investigators had few leads. But within twenty-four hours, senior FBI officials, determined to seek “assistance from the public,” called on everyone with information to submit all media, tips, and leads related to the Boston Marathon attack. This unusual request for help yielded thousands of images and videos from local Bostonians, tourists, and private companies through technological channels ranging from telephone calls and emails to Flickr posts and Twitter messages. In mere hours, investigators were able to “crowd-source” a tremendous amount of data – including thousands of images from personal cameras, amateur videos from smart phones, and cell-tower information from private carriers. Combing through data from this massive network of “eyes and ears,” law enforcement officials were quickly able to generate images of two lead suspects – enabling a “modern manhunt” to commence immediately.
Technological innovations have transformed our commercial, political, and social realities. These advances include new approaches to how we generate knowledge, access information, and interact with one another, as well as new pathways for building social movements and catalyzing political change. While a significant body of academic research has focused on the role of technology in transforming electoral politics and social movements, less attention has been paid to how technological innovation can improve the process of governance itself.
A growing number of platforms and applications lie at this intersection of technology and governance, in what might be termed the “civic technology” sector. Broadly speaking, this sector involves the application of new information and communication technologies – ranging from robust social media platforms to state-of-the-art big data analysis systems – to address public policy problems. Civic technologies encompass enterprises that “bring web technologies directly to government, build services on top of government data for citizens, and change the way citizens ask, get, or need services from government.” These technologies have the potential to transform governance by promoting greater transparency in policy-making, increasing government efficiency, and enhancing citizens’ participation in public sector decision-making.”

GovLab Seeks Open Data Success Stories


Wyatt Kash in InformationWeek: “A team of open government advocates, led by former White House aide Beth Noveck, has launched a campaign to identify 500 examples of how freely available government data is being put to profitable use in the private sector. Open Data 500 is part of a broader effort by New York University’s Governance Lab (GovLab) to conduct the “first real, comprehensive study of the use of open government data in the private sector,” said Joel Gurin, founder of OpenDataNow.com and senior adviser at GovLab.
Noveck, who served in the White House as the first U.S. deputy CTO and led the White House Open Government Initiative from 2009-2011, founded GovLab while also teaching at the MIT Media Lab and NYU’s Robert F. Wagner Graduate School of Public Service.
In an interview with InformationWeek Government, Gurin explained that the goal of GovLab, and the Open Data 500 project, is to show how technology and new uses of data can make government more effective, and create more of a partnership between government and the public. “We’re also trying to draw on more public expertise to solve government problems,” he said….
Gurin said Open Data 500 will primarily look at U.S.-based, revenue-producing companies or organizations where government data is a key resource for their business. While the GovLab will focus initially on the use of federal data, it will also look at cases where entrepreneurs are making use of state or local data, but in scalable fashion.
“This goes one step further than the datapaloozas” championed by U.S. CTO Todd Park to showcase tools developed by the private sector using government data. “We’re trying to show how we can make data sets even more impactful and useful.”
Gurin said the GovLab team hopes to complete the study by the end of this year. The team has already identified 150 companies as candidates. To submit your company for consideration, visit thegovlab.org/submit-your-company; to submit another company, visit thegovlab.org/open500.”