Using Big Data to Ask Big Questions


Chase Davis in the SOURCE: “First, let’s dispense with the buzzwords. Big Data isn’t what you think it is: Every federal campaign contribution over the last 30-plus years amounts to several tens of millions of records. That’s not Big. Neither is a dataset of 50 million Medicare records. Or even 260 gigabytes of files related to offshore tax havens—at least not when Google counts its data in exabytes. No, the stuff we analyze in pursuit of journalism and app-building is downright tiny by comparison.
But you know what? That’s ok. Because while super-smart Silicon Valley PhDs are busy helping Facebook crunch through petabytes of user data, they’re also throwing off intellectual exhaust that we can benefit from in the journalism and civic data communities. Most notably: the ability to ask Big Questions.
Most of us who analyze public data for fun and profit are familiar with small questions. They’re focused, incisive, and often have the kind of black-and-white, definitive answers that end up in news stories: How much money did Barack Obama raise in 2012? Is the murder rate in my town going up or down?
Big Questions, on the other hand, are speculative, exploratory, and systemic. As the name implies, they are also answered at scale: Rather than distilling a small slice of a dataset into a concrete answer, Big Questions look at entire datasets and reveal small questions you wouldn’t have thought to ask.
Can we track individual campaign donor behavior over decades, and what does that tell us about their influence in politics? Which neighborhoods in my city are experiencing spikes in crime this week, and are police changing patrols accordingly?
Or, by way of example, how often do interest groups propose cookie-cutter bills in state legislatures?

Looking at Legislation

Even if you don’t follow politics, you probably won’t be shocked to learn that lawmakers don’t always write their own bills. In fact, interest groups sometimes write them word-for-word.
Sometimes those groups even try to push their bills in multiple states. The conservative American Legislative Exchange Council has gotten some press, but liberal groups, social and business interests, and even sororities and fraternities have done it too.
On its face, something about elected officials signing their names to cookie-cutter bills runs head-first against people’s ideal of deliberative Democracy—hence, it tends to make news. Those can be great stories, but they’re often limited in scope to a particular bill, politician, or interest group. They’re based on small questions.
Data science lets us expand our scope. Rather than focusing on one bill, or one interest group, or one state, why not ask: How many model bills were introduced in all 50 states, period, by anyone, during the last legislative session? No matter what they’re about. No matter who introduced them. No matter where they were introduced.
Now that’s a Big Question. And with some basic data science, it’s not particularly hard to answer—at least at a superficial level.

Analyze All the Things!

Just for kicks, I tried building a system to answer this question earlier this year. It was intended as an example, so I tried to choose methods that would make intuitive sense. But it also makes liberal use of techniques applied often to Big Data analysis: k-means clustering, matrices, graphs, and the like.
If you want to follow along, the code is here….
To make exploration a little easier, my code represents similar bills in graph space, shown at the top of this article. Each dot (known as a node) represents a bill. And a line connecting two bills (known as an edge) means they were sufficiently similar, according to my criteria (a cosine similarity of 0.75 or above). Loaded into visualization software like Gephi, the graph makes it easy to click around the clusters and see what pops out. So what do we find?
There are 375 clusters in total. Because of the limitations of our data, many of them represent vague, subject-specific bills that just happen to have similar titles even though the legislation itself is probably very different (think things like “Budget Bill” and “Campaign Finance Reform”). This is where having full bill text would come in handy.
But mixed in with those bills are a handful of interesting nuggets. Several bills that appear to be modeled after legislation by the National Conference of Insurance Legislators appear in multiple states, among them: a bill related to limited lines travel insurance; another related to unclaimed insurance benefits; and one related to certificates of insurance.”
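The graph construction described in the excerpt (bills as nodes, an edge wherever cosine similarity is 0.75 or above) can be sketched in a few lines of Python. This is a minimal illustration, not Davis's actual code: the bill IDs and titles are invented, the vectors are plain word counts rather than whatever weighting the original used, and clusters are taken to be connected components of the resulting graph.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical bill titles, invented purely for illustration.
BILLS = {
    "AK-1": "an act relating to limited lines travel insurance",
    "TX-7": "an act relating to limited lines travel insurance producers",
    "OH-3": "unclaimed life insurance benefits act",
    "NY-9": "unclaimed life insurance benefits",
    "CA-2": "budget bill",
}

THRESHOLD = 0.75  # similarity cutoff quoted in the excerpt


def vectorize(text):
    """Turn a title into a bag-of-words count vector (an assumed, simple weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_edges(bills, threshold=THRESHOLD):
    """Return one edge per pair of bills whose similarity meets the threshold."""
    vectors = {bill_id: vectorize(title) for bill_id, title in bills.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(vectors), 2)
        if cosine(vectors[a], vectors[b]) >= threshold
    ]


def find_clusters(bills, edges):
    """Group bills into connected components, skipping bills with no edges."""
    neighbors = {bill_id: set() for bill_id in bills}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, clusters = set(), []
    for start in sorted(bills):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        if len(component) > 1:
            clusters.append(sorted(component))
    return clusters


if __name__ == "__main__":
    edges = build_edges(BILLS)
    print(edges)
    print(find_clusters(BILLS, edges))
```

With these made-up titles, the two travel insurance bills and the two unclaimed benefits bills each form a cluster, while "budget bill" stays isolated. As the excerpt warns, title-only matching will also group bills that merely share generic wording, which is why full bill text would give more trustworthy edges.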

The Shutdown’s Data Blackout


Opinion piece by Katherine G. Abraham and John Haltiwanger in The New York Times: “Today, for the first time since 1996 and only the second time in modern memory, the Bureau of Labor Statistics will not issue its monthly jobs report, as a result of the shutdown of nonessential government services. This raises an important question: Are the B.L.S. report and other economic data that the government provides “nonessential”?

If we’re trying to understand how much damage the shutdown or sequestration cuts are doing to jobs or the fragile economic recovery, they are definitely essential. Without robust economic data from the federal government, we can speculate, but we won’t really know.

In the last two shutdowns, in 1995 and 1996, the Congressional Budget Office estimated the economic damage at around 0.5 percent of the gross domestic product. This time, Moody’s estimates that a three-to-four-week shutdown might subtract 1.4 percent (annualized) from gross domestic product growth this quarter and take $55 billion out of the economy. Democrats tend to play up such projections; Republicans tend to play them down. If the shutdown continues, though, we’ll all be less able to tell what impact it is having, because more reports like the B.L.S. jobs report will be delayed, while others may never be issued.

In fact, sequestration cuts that affected 2013 budgets are already leading federal statistics agencies to defer or discontinue dozens of reports on everything from income to overseas labor costs. The economic data these agencies produce are key to tracking G.D.P., earnings and jobs, and to informing the Federal Reserve, the executive branch and Congress on the state of the economy and the impact of economic policies. The data are also critical for decisions made by state and local policy makers, businesses and households.

The combined budget for all the federal statistics agencies totals less than 0.1 percent of the federal budget. Yet the same across-the-board-cut mentality that led to sequester and shutdown has shortsightedly cut statistics agencies, too, as if there were something “nonessential” about spending money on accurately assessing the economic effects of government actions and inactions. As a result, as we move through the shutdown, the debt-ceiling fight and beyond, reliable, essential data on the impact of policy decisions will be harder to come by.

Unless the sequester cuts are reversed, funding for economic data will shrink further in 2014, on top of a string of lean budget years. More data reports will be eliminated at the B.L.S., the Census Bureau, the Bureau of Economic Analysis and other agencies. Even more insidious damage will come from compromising the methods for producing the reports that still are paid for and from failing to prepare for the future.

To save money, survey sample sizes will be cut, reducing the reliability of national data and undermining local statistics. Fewer resources will be devoted to maintaining the listings used to draw business survey samples, running the risk that surveys based on those listings won’t do as good a job of capturing actual economic conditions. Hiring and training will be curtailed. Over time, the availability and quality of economic indicators will diminish.

That would be especially paradoxical and backward at a time when economic statistics can and should be advancing through technological innovation instead of marched backward by politics. Integrating survey data, administrative data and commercial data collected with scanners and other digital technologies could produce richer, more useful information with less of a burden on businesses and households.

Now more than ever, framing sound economic policy depends on timely and accurate information about the economy. Bad or ill-targeted data can lead to bad or ill-targeted decisions about taxes and spending. The tighter the budget and the more contentious the political debate around it, the more compelling the argument for investing in federal data that accurately show how government policies are affecting the economy, so we can target the most effective cuts or spending or other policies, and make ourselves accountable for their results. That’s why Congress should restore funding to the federal statistical agencies at a level that allows them to carry out their critical work.”

Technology Can Expose Government Sins, But You Need Humans to Fix Them


Lorelei Kelly: “We can’t bring accountability to the NSA unless we figure out how to give the whole legislative branch modern methods for policy oversight. Those modern methods can include technology, but the primary requirement is figuring out how to supply Congress with unbiased subject matter experts—not just industry lobbyists or partisan think tank analysts. Why? Because trusted and available expertise inside the process of policymaking is what is missing today.
According to calculations by the Sunlight Foundation, today’s Congress is operating with about 40 percent less staff than in 1979. According to the Congressional Management Foundation, it’s also contending with at least 800 percent more incoming communications. Yet, instead of helping Congress gain insight in new ways, instead of helping it sort and filter, curate and authenticate, technology has mostly created disorganized information overload. And the information Congress receives is often sentiment, not substance. Elected leaders should pay attention to both, but need the latter for policymaking.
The result? Congress defaults to what it knows. And that means slapping a “national security” label on policy questions that instead deserve to be treated as broad public conversations about the evolution of American democracy. This is a Congress that categorizes questions about our freedoms on the Internet as “cyber security.”
What can we do? First, recognize that Congress is an obsolete and incapacitated system, and treat it as such. Technology and transparency can help modernize our legislature, but they can’t fix the system of governance.
Activists, even tech-savvy ones, need to talk directly with Congressional members and staff at home. Hackers, you should invite your representatives to wherever you do your hacking. And then offer your skills to help them in any way possible. You may create some great data maps and visualization tools, but the real point is to make friends in Congress. There’s no substitute for repeated conversations, and long-haul engagement. In politics, relationships will leverage the technology. All technology can do is help you find one another.
Without our help and our knowledge, our elected leaders and governing institutions won’t have the bandwidth to cope with our complex world. This will be a steep climb. But, like nearly every good outcome in politics, the climb starts with an outstretched hand, not one that’s poised at a keyboard, ready to tweet.”

How to Change the World by Building a Swarm


Nina Misuraca Ignaczak at Shareable: “In 2005, Rick Falkvinge of Sweden launched a new political party, the Swedish Pirate Party, on a platform to reform copyright and patent laws. It’s now the third largest party in Sweden, it won two European Parliament seats in 2009, and it inspired the International Pirate Party movement with representation in over 60 countries. The rise of the party has been remarkably fast. In Swarmwise: The Tactical Manual to Changing the World, Falkvinge describes how he did it with a unique, decentralized organizing architecture that leverages the power of technology and the crowd to spread ideas and work across diverse groups of people.
Falkvinge defines a swarm as: “a decentralized, collaborative effort of volunteers that looks like a hierarchical, traditional organization from the outside. It is built by a small core of people that construct a scaffolding of go-to people, enabling a large number of volunteers to cooperate on a common goal in quantities of people not possible before the net was available.”
The key is decentralization. The founder must set the vision and goal and then release control of messaging and branding, delegate as much authority as possible, and embrace the fact that the only way to lead is to inspire.
A swarm has a shared direction, values and method. Informal leadership is strong, and focuses on everyone’s contributions. The main benefits to swarm organization are:

  • Speed of operation
  • Next-to-nothing operating cost
  • Large number of devoted volunteers
  • Open and inviting to anyone
  • No recruitment process
  • Multiple solutions tried in parallel
  • Transparent by default

Step One: Find an idea to change the world that people can get excited about.
This is critical. The idea must be a game-changer: so exciting, revolutionary and provocative that it will sell itself. Your idea must have four key attributes to be worthy:

  • Tangible: You must have concrete goals with specifics on when this goal should happen, where it will happen, and how it will happen. In the case of the Swedish Pirate Party, the goal was to elect an open-information platform candidate to the European Parliament in the next election. Period.
  • Credible: You must present the goals as realistic and doable. The key is to strike a balance between a change-the-world idea and pure fantasy.
  • Inclusive: There must be a role and room for participation for everyone, and everyone must see not only how they will personally benefit from the idea but also how they can be a part of making it happen.
  • Epic: The idea must be a big one, capable of changing how things are done on a broad scale, and people must see the scope of the idea’s impact when it is presented.

Step Two: Do the Math

All versions of the book (including free ones, of course) are available at the bottom of this page.”

Making All Voices Count


Launch of Making All Voices Count: “Making All Voices Count is a global initiative that supports innovation, scaling-up, and research to deepen existing innovations and help harness new technologies to enable citizen engagement and government responsiveness….Solvable problems need not remain unsolved. Democratic systems in the 21st century continue to be inhibited by 19th century timescales, with only occasional opportunities for citizens to express their views formally, such as during elections. In this century, many citizens have access to numerous tools that enable them to express their views – and measure government performance – in real time.
For example, online reporting platforms enable citizens to monitor the election process by reporting intimidation, vote buying, bias and misinformation; access to mobile technology allows citizens to update water suppliers on gaps in service delivery; crisis information can be crowdsourced via eyewitness reports of violence, as reported by email and sms.
The rise of mobile communication, the installation of broadband and the fast-growing availability of open data, offer tremendous opportunities for data journalism and new media channels. They can inspire governments to develop new ways to fight corruption and respond to citizens efficiently, effectively and fairly. In short, developments in technology and innovation mean that government and citizens can interact like never before.
Making All Voices Count is about seizing this moment to strengthen our commitments to promote transparency, fight corruption, empower citizens, and harness the power of new technologies to make government more effective and accountable.
The programme specifically aims to address the following barriers that weaken the link between governments and citizens:

  • Citizens lack incentives: Citizens may not have the necessary incentives to express their feedback on government performance – due to a sense of powerlessness, distrust in the government, fear of retribution, or lack of reliable information
  • Governments lack incentives: At the same time, governments need incentives to respond to citizen input whenever possible and to leverage citizen participation. The government’s response to citizens should be reinforced by proactive, public communication. This initiative will help create incentives for government to respond. Where government responds effectively, citizens’ confidence in government performance and approval ratings are likely to increase
  • Governments lack the ability to translate citizen feedback into action: This could be due to anything from political constraints to a lack of skills and systems. Governments need better tools to effectively analyze and translate citizen input into information that will lead to solutions and shape resource allocation. Once captured, citizens’ feedback (on their experiences with government performance) must be communicated so as to engage both the government and the broader public in finding a solution.
  • Citizens lack meaningful opportunities: Citizens need greater access to better tools and know-how to easily engage with government in a way that results in government action and citizen empowerment”

Embracing Expertise


Biella Coleman in Concurring Opinions: “I often describe hacker politics as Weapons of the Geek, in contrast to Weapons of the Weak—the term anthropologist James Scott uses to capture the unique, clandestine nature of peasant politics. While Weapons of the Weak is a modality of politics among disenfranchised, economically marginalized populations who engage in small-scale illicit acts—such as foot dragging and minor acts of sabotage—that don’t appear on their surface to be political, Weapons of the Geek is a modality of politics exercised by a class of privileged actors who often lie at the center of economic life. Among geeks and hackers, political activities are rooted in concrete experiences of their craft—administering a server or editing videos—and a portion of these hackers channel these skills toward political life. To put it another way, hackers don’t necessarily have class-consciousness, though some certainly do, but they all tend to have craft consciousness. But they have already shown they are willing to engage in prolific and distinct types of political acts, from policy making to party politics, from writing free software to engaging in some of the most pronounced and personally risky acts of civil disobedience of the last decade, as we saw with Snowden. Just because they are hackers does not mean they are only acting out their politics through technology, even if their technological experiences usually inform their politics.
It concerns and bothers me that most technologists are male and white, but I am not concerned, in fact I am quite thrilled, that these experts are taking political charge. I tend to agree with Michael Schudson’s reading of Walter Lippmann that when it comes to democracy we need more experts, not fewer: “The intellectual challenge is not to invent democracy without experts, but to seek a way to harness experts to a legitimately democratic function.”
Imagine if as many doctors and professors mobilized their moral authority and expertise as hackers have done, to rise up and intervene in the problems plaguing their vocational domains. Professors would be visibly denouncing the dismal and outrageous labor conditions of adjuncts whose pay is a pittance. Doctors would be involved in the fight for more affordable health care in the United States. Mobilizing expertise does not mean other stakeholders can’t and should not have a voice, but there are many practical and moral reasons why we should embrace a politics of expertise, especially if configured to allow contributions more generally.
More than any other group of experts, hackers have shown how productive an expert based politics can be. And many domains of hacker and geek politics such as the Pirate Parties and Anonymous are interesting precisely for how they marry an open participatory element along with a more technical, expert-based one. Expertise can co-exist with participation if configured as such.
My sense is that hacker (re: technically informed) based politics will grow more important in years to come. Just last week I went to visit one hacker-activist, Jeremy Hammond, who is in jail for his politically motivated acts of direct action. I asked him what he thought of Edward Snowden’s revelations about the NSA’s blanket surveillance of American citizens. Along with saying he was encouraged that someone dared to expose this wrongdoing (as many of us are), he captured the enormous power held by hackers and technologists when he followed with this statement: “there are all these nerds who don’t agree with what is politically happening and they have power.”
Hammond and others are exercising their technical power and I generally think this is a net gain for democracy. But it is why we must diligently work toward establishing more widespread digital and technical literacy. The low numbers of female technologists and other minorities in and out of hacker-dom are appalling and disturbing (and why I am involved with initiatives like those of NCWIT to rectify this problem). There are certainly barriers internal to the hacker world, but the problems are so entrenched and so systemic that unless those are solved, the numbers of women in voluntary and political domains will continue to be low.
So it is not that expertise is the problem. It is the barriers that prevent a large class of individuals from ever becoming experts that concern me the most.”

Explore the world’s constitutions with a new online tool


Official Google Blog: “Constitutions are as unique as the people they govern, and have been around in one form or another for millennia. But did you know that every year approximately five new constitutions are written, and 20-30 are amended or revised? Or that Africa has the youngest set of constitutions, with 19 out of the 39 constitutions written globally since 2000 from the region?
The process of redesigning and drafting a new constitution can play a critical role in uniting a country, especially following periods of conflict and instability. In the past, it’s been difficult to access and compare existing constitutional documents and language—which is critical to drafters—because the texts are locked up in libraries or on the hard drives of constitutional experts. Although the process of drafting constitutions has evolved from chisels and stone tablets to pens and modern computers, there has been little innovation in how their content is sourced and referenced.
With this in mind, Google Ideas supported the Comparative Constitutions Project to build Constitute, a new site that digitizes and makes searchable the world’s constitutions. Constitute enables people to browse and search constitutions via curated and tagged topics, as well as by country and year. The Comparative Constitutions Project cataloged and tagged nearly 350 themes, so people can easily find and compare specific constitutional material. This ranges from the fairly general, such as “Citizenship” and “Foreign Policy,” to the very specific, such as “Suffrage and turnouts” and “Judicial Autonomy and Power.”
Our aim is to arm drafters with a better tool for constitution design and writing. We also hope citizens will use Constitute to learn more about their own constitutions, and those of countries around the world.”

Swarm-Based Medicine


Paul Martin Putora and Jan Oldenburg in the Journal of Medical Internet Research: “Humans, armed with Internet technology, exercise crowd intelligence in various spheres of social interaction ranging from predicting elections to company management. Internet-based interaction may result in different outcomes, such as improved response capability and decision-making quality.
The direct comparison of swarm-based medicine with evidence- or eminence-based is interesting, but these concepts should be perceived as complementing each other and working independently of each other. Optimal decision making depends on a balance of personal knowledge and swarm intelligence, taking into account the quality of each, with their weight in decisions being adapted accordingly. The possibility of balancing controversial standpoints and achieving acceptable conclusions for the majority of participants has been an important task of scientific and medical conferences since the Age of Enlightenment in the 17th and 18th centuries. Our swarm continues with this interconnecting synchronization at an unprecedented speed and is, thanks to eVotes, Internet forums, and the like, more reactive than ever. Faster changes in our direction of movement, like a school of fish, are becoming possible. Information spreads from one individual to another. It is unconscious, but with our own dance we influence the rest of the beehive.
Within an environment, individual behavior determines the behavior of the collective and vice versa. Internet technology has dramatically changed the environment we behave in. Traditionally, medical information was provided to patients as well as to physicians by experts. This intermediation was characterized by an expert standing between sources of information and the user. Currently, and probably even more so in the future, Web 2.0 and appropriate algorithms enable users to rely on the guidance or behavior of their peers in selecting and consuming information. This is one of many processes facilitated by medicine 2.0 and is described as “apomediation”. Apomediation, whether implicit or explicit, increases the influence of individuals on others. For an individual to adapt its behavior within a swarm, other individuals need to be perceived and their actions reacted upon. Through apomediation, more individuals take part in the swarm.
Our patients are better informed; second opinions can be sought via the Internet within hours. Our individual behavior is influenced by online resources as well as digital communication with our colleagues. This change in individual behavior influences the way we find, understand, and adopt guidelines. Societies representing larger groups within the swarms use this technology to create recommendations. This process is influenced by individuals and previous actions of the community; these then in return influence individual behavior. Information technology has a major impact on the lifecycle of guidelines and recommendations. There is no entry and exit point for IT in this regard. With increasing influence on individual behavior, its influence on collective behavior increases, influencing the other direction to the same extent.
Dynamic changes in movement of the swarm and within the swarm may lead to individuals leaving the herd. These may influence the herd to move in the direction of the outliers. At the same time, an individual leaving a flock or swarm is exposed. Physicians as well as clinical centers expose themselves when they leave the group for the sake of innovation. Negative results and failure might lead to legal exposure should treatments fail.
The perception of swarm behavior itself changes the way we approach guidelines. When several guidelines are published, being aware of them as a result of interaction increases our awareness for bias. Major deviations from other recommendations warrant scrutiny. The perception of swarm behavior and embracing the knowledge of the swarm may lead to an optimized use of resources. Information that has already been obtained may be incorporated directly by agents, enabling them to build on this and establish new knowledge—as social learning agents”

Civics for a Digital Age


Jathan Sadowski in the Atlantic on “Eleven principles for relating to cities that are automated and smart”: “Over half of the world’s population lives in urban environments, and that number is rapidly growing according to the World Health Organization. Many of us interact with the physical environments of cities on a daily basis: the arteries that move traffic, the grids that energize our lives, the buildings that prevent and direct actions. For many tech companies, though, much of this urban infrastructure is ripe for a digital injection. Cities have been “dumb” for millennia. It’s about time they get “smart” — or so the story goes….
Before accepting the techno-hype as a fait accompli, we should consider the implications such widespread technological changes might have on society, politics, and life in general. Urban scholar and historian Lewis Mumford warned of “megamachines” where people become mere components — like gears and transistors — in a hierarchical, human machine. The proliferation of smart projects requires an updated way of thinking about their possibilities, complications, and effects.
A new book, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, by Anthony Townsend, a research director at the Institute for the Future, provides some groundwork for understanding how these urban projects are occurring and what guiding principles we might use in directing their development. Townsend sets out to sketch a new understanding of “civics,” one that will account for new technologies.
The foundation for his theory speaks to common, worthwhile concerns: “Until now, smart-city visions have been controlling us. What we need is a new social code to bring meaning and to exert control over the technological code of urban operating systems.” It’s easy to feel like technologies — especially urban ones that are, at once, ubiquitous and often unseen to city-dwellers — have undue influence over our lives. Townsend’s civics, which is based on eleven principles, looks to address, prevent, and reverse that techno-power.”

From Crowd-Sourcing Potholes to Community Policing


New paper by Manik Suri (GovLab): “The tragic Boston Marathon bombing and hair-raising manhunt that ensued was a sobering event. It also served as a reminder that emerging “civic technologies” – platforms and applications that enable citizens to connect and collaborate with each other and with government – are more important today than ever before. As commentators have noted, local police and federal agents utilized a range of technological platforms to tap the “wisdom of the crowd,” relying on thousands of private citizens to develop a “hive mind” that identified two suspects within a record period of time.
In the immediate wake of the devastating attack on April 15th, investigators had few leads. But within twenty-four hours, senior FBI officials, determined to seek “assistance from the public,” called on everyone with information to submit all media, tips, and leads related to the Boston Marathon attack. This unusual request for help yielded thousands of images and videos from local Bostonians, tourists, and private companies through technological channels ranging from telephone calls and emails to Flickr posts and Twitter messages. In mere hours, investigators were able to “crowd-source” a tremendous amount of data – including thousands of images from personal cameras, amateur videos from smart phones, and cell-tower information from private carriers. Combing through data from this massive network of “eyes and ears,” law enforcement officials were quickly able to generate images of two lead suspects – enabling a “modern manhunt” to commence immediately.
Technological innovations have transformed our commercial, political, and social realities. These advances include new approaches to how we generate knowledge, access information, and interact with one another, as well as new pathways for building social movements and catalyzing political change. While a significant body of academic research has focused on the role of technology in transforming electoral politics and social movements, less attention has been paid to how technological innovation can improve the process of governance itself.
A growing number of platforms and applications lie at this intersection of technology and governance, in what might be termed the “civic technology” sector. Broadly speaking, this sector involves the application of new information and communication technologies – ranging from robust social media platforms to state-of-the-art big data analysis systems – to address public policy problems. Civic technologies encompass enterprises that “bring web technologies directly to government, build services on top of government data for citizens, and change the way citizens ask, get, or need services from government.” These technologies have the potential to transform governance by promoting greater transparency in policy-making, increasing government efficiency, and enhancing citizens’ participation in public sector decision-making.