Using Big Data to Ask Big Questions


Chase Davis in the SOURCE: “First, let’s dispense with the buzzwords. Big Data isn’t what you think it is: Every federal campaign contribution over the last 30-plus years amounts to several tens of millions of records. That’s not Big. Neither is a dataset of 50 million Medicare records. Or even 260 gigabytes of files related to offshore tax havens—at least not when Google counts its data in exabytes. No, the stuff we analyze in pursuit of journalism and app-building is downright tiny by comparison.
But you know what? That’s ok. Because while super-smart Silicon Valley PhDs are busy helping Facebook crunch through petabytes of user data, they’re also throwing off intellectual exhaust that we can benefit from in the journalism and civic data communities. Most notably: the ability to ask Big Questions.
Most of us who analyze public data for fun and profit are familiar with small questions. They’re focused, incisive, and often have the kind of black-and-white, definitive answers that end up in news stories: How much money did Barack Obama raise in 2012? Is the murder rate in my town going up or down?
Big Questions, on the other hand, are speculative, exploratory, and systemic. As the name implies, they are also answered at scale: Rather than distilling a small slice of a dataset into a concrete answer, Big Questions look at entire datasets and reveal small questions you wouldn’t have thought to ask.
Can we track individual campaign donor behavior over decades, and what does that tell us about their influence in politics? Which neighborhoods in my city are experiencing spikes in crime this week, and are police changing patrols accordingly?
Or, by way of example, how often do interest groups propose cookie-cutter bills in state legislatures?

Looking at Legislation

Even if you don’t follow politics, you probably won’t be shocked to learn that lawmakers don’t always write their own bills. In fact, interest groups sometimes write them word-for-word.
Sometimes those groups even try to push their bills in multiple states. The conservative American Legislative Exchange Council has gotten some press, but liberal groups, social and business interests, and even sororities and fraternities have done it too.
On its face, something about elected officials signing their names to cookie-cutter bills runs head-first against people’s ideal of deliberative democracy—hence, it tends to make news. Those can be great stories, but they’re often limited in scope to a particular bill, politician, or interest group. They’re based on small questions.
Data science lets us expand our scope. Rather than focusing on one bill, or one interest group, or one state, why not ask: How many model bills were introduced in all 50 states, period, by anyone, during the last legislative session? No matter what they’re about. No matter who introduced them. No matter where they were introduced.
Now that’s a Big Question. And with some basic data science, it’s not particularly hard to answer—at least at a superficial level.

Analyze All the Things!

Just for kicks, I tried building a system to answer this question earlier this year. It was intended as an example, so I tried to choose methods that would make intuitive sense. But it also makes liberal use of techniques applied often to Big Data analysis: k-means clustering, matrices, graphs, and the like.
If you want to follow along, the code is here….
To make exploration a little easier, my code represents similar bills in graph space, shown at the top of this article. Each dot (known as a node) represents a bill. And a line connecting two bills (known as an edge) means they were sufficiently similar, according to my criteria (a cosine similarity of 0.75 or above). Thrown into a visualization software like Gephi, it’s easy to click around the clusters and see what pops out. So what do we find?
There are 375 clusters in total. Because of the limitations of our data, many of them represent vague, subject-specific bills that just happen to have similar titles even though the legislation itself is probably very different (think things like “Budget Bill” and “Campaign Finance Reform”). This is where having full bill text would come in handy.
But mixed in with those bills are a handful of interesting nuggets. Several bills that appear to be modeled after legislation by the National Conference of Insurance Legislators appear in multiple states, among them: a bill related to limited lines travel insurance; another related to unclaimed insurance benefits; and one related to certificates of insurance.”
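The graph construction Davis describes (each bill is a node, and an edge joins two bills whose cosine similarity is 0.75 or above) can be sketched in a few lines. This is a minimal illustration, not his code: the bill titles are made up, the vectors are raw word counts over titles (the excerpt does not say how the text was weighted), and the function names `cosine_similarity`, `build_edges`, and `find_clusters` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_edges(titles, threshold=0.75):
    """Return index pairs (i, j) of titles at or above the threshold."""
    # Assumption: plain lowercase word counts; a real pipeline might
    # use TF-IDF weights or full bill text instead.
    vectors = [Counter(title.lower().split()) for title in titles]
    return [
        (i, j)
        for i, j in combinations(range(len(titles)), 2)
        if cosine_similarity(vectors[i], vectors[j]) >= threshold
    ]


def find_clusters(n, edges):
    """Connected components of the graph that contain at least two bills."""
    adjacency = {i: set() for i in range(n)}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen or not adjacency[start]:
            continue  # already visited, or an isolated bill
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Hypothetical titles, for illustration only.
titles = [
    "An act relating to limited lines travel insurance",
    "An act concerning limited lines travel insurance",
    "Unclaimed life insurance benefits act",
    "Budget bill",
]
edges = build_edges(titles)
clusters = find_clusters(len(titles), edges)
print(edges, clusters)
```

Here only the first two titles clear the 0.75 threshold, so they form the single cluster. Comparing every pair is quadratic in the number of bills, which is workable at the "downright tiny" scale Davis describes but would need indexing or blocking on a much larger corpus.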

Are Some Tweets More Interesting Than Others? #HardQuestion


New paper by Microsoft Research (Omar Alonso, Catherine C. Marshall, and Marc Najork): “Twitter has evolved into a significant communication nexus, coupling personal and highly contextual utterances with local news, memes, celebrity gossip, headlines, and other microblogging subgenres. If we take Twitter as a large and varied dynamic collection, how can we predict which tweets will be interesting to a broad audience in advance of lagging social indicators of interest such as retweets? The telegraphic form of tweets, coupled with the subjective notion of interestingness, makes it difficult for human judges to agree on which tweets are indeed interesting.
In this paper, we address two questions: Can we develop a reliable strategy that results in high-quality labels for a collection of tweets, and can we use this labeled collection to predict a tweet’s interestingness?
To answer the first question, we performed a series of studies using crowdsourcing to reach a diverse set of workers who served as a proxy for an audience with variable interests and perspectives. This method allowed us to explore different labeling strategies, including varying the judges, the labels they applied, the datasets, and other aspects of the task.
To address the second question, we used crowdsourcing to assemble a set of tweets rated as interesting or not; we scored these tweets using textual and contextual features; and we used these scores as inputs to a binary classifier. We were able to achieve moderate agreement (kappa = 0.52) between the best classifier and the human assessments, a figure which reflects the challenges of the judgment task.”
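For readers unfamiliar with the kappa statistic quoted above, here is a minimal sketch of how agreement between a classifier and human labels can be scored. It assumes Cohen's kappa for two raters (the excerpt does not say which variant the authors used), and the labels below are invented for illustration, not taken from the paper.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up labels: 1 = interesting, 0 = not interesting.
human = [1, 1, 1, 0, 0, 0, 1, 0]
classifier = [1, 1, 0, 0, 0, 1, 1, 0]
kappa = cohens_kappa(human, classifier)
print(round(kappa, 2))
```

In this toy example the raters agree on 6 of 8 tweets (0.75) while chance alone would give 0.5, so kappa is 0.5. On the commonly cited Landis and Koch scale, values from 0.41 to 0.60 are read as "moderate" agreement, which matches how the authors describe their 0.52.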

How to Make All Apps More Civic


Nick Grossman in Idea Lab: “The big idea in all of this is that through open data and standards and API-based interoperability, it’s possible not just to build more “civic apps,” but to make all apps more civic:
So in a perfect world, I’d not only be able to get my transit information from anywhere (say, Citymapper), I’d be able to read restaurant inspection data from anywhere (say, Foursquare), be able to submit a 311 request from anywhere (say, Twitter), etc.
These examples only scratch the surface of how apps can “become more civic” (i.e., integrate with government/civic information and services). And that’s only really describing one direction: apps tapping into government information and services.
Another, even more powerful direction is the reverse: helping governments tap into the people-power in web networks. In fact, I heard an amazing stat earlier this year:
It’s incredible to think about how web-enabled networks can extend the reach and increase the leverage of public-interest programs and government services, even when (perhaps especially when) that is not their primary function — i.e., Waze is a traffic avoidance app, not a “civic” app. Other examples include the Airbnb community coming together to provide emergency housing after Sandy, and the Etsy community helping to “craft a comeback” in Rockford, Ill.
In other words, helping all apps “be more civic,” rather than just building more civic apps. I think there is a ton of leverage there, and it’s a direction that has just barely begun to be explored.”

The Science Behind Using Online Communities To Change Behavior


Sean D. Young in TechCrunch: “Although social media and online communities might have been developed for people to connect and share information, recent research shows that these technologies are really helpful in changing behaviors. My colleagues and I in the medical school, for instance, created online communities designed to improve health by getting people to do things, such as test for HIV, stop using methamphetamines, and just de-stress and relax. We don’t handpick people to join because we think they’ll love the technology; that’s not how science works. We invite them because the technology is relevant to them — they’re engaging in drugs, sex and other behaviors that might put themselves and others at risk. It’s our job to create the communities in a way that engages them enough to want to stay and participate. Yes, we do offer to pay them $30 to complete an hour-long survey, but then they are free to collect their money and never talk to us again. But for some reason, they stay in the group and decide to be actively engaged with strangers.
So how do we create online communities that keep people engaged and change their behaviors? Our starting point is to understand and address their psychological needs….
Throughout our research, we find that newly created online communities can change people’s behaviors by addressing the following psychological needs:
The Need to Trust. Sharing our thoughts, experiences, and difficulties with others makes us feel closer to others and increases our trust. When we trust people, we’re more open-minded, more willing to learn, and more willing to change our behavior. In our studies, we found that sharing personal information (even something as small as describing what you did today) can help increase trust and change behavior.
The Need to Fit In. Most of us inherently strive to fit in. Social norms, or other people’s attitudes and behaviors, heavily influence our own attitudes and behaviors. Each time a new online community or group forms, it creates its own set of social norms and expectations for how people should behave. Most people are willing to change their attitudes and/or behavior to fit these group norms and fit in with the community.
The Need for Self-Worth. When people feel good about themselves, they are more open to change and feel empowered to be able to change their behavior. When an online community is designed to have people support and care for each other, they can help to increase self-esteem.
The Need to Be Rewarded for Good Behavior. Anyone who has trained a puppy knows that you can get him to keep sitting as long as you keep the treats flowing to reward him, but if you want to wean him off the treats and really train him then you’ll need to begin spacing out the treats to make them less predictable. Well, people aren’t that different from animals in that way and can be trained with reinforcements too. For example, “liking” people’s communications as soon as they join a network, and then progressively spacing out the time between likes on their posts (psychologists call this variable reinforcement), can be incorporated into social network platforms to encourage them to keep posting content. Eventually, these behaviors become habits.
The Need to Feel Empowered. While increasing self-esteem makes people feel good about themselves, increasing empowerment helps them know they have the ability to change. Creating a sense of empowerment is one of the most powerful predictors of whether people will change their behavior. Belonging to a network of people who are changing their own behaviors, support our needs, and are confident in our changing our behavior empowers us and gives us the ability to change our behavior.”

User-Generated Content Is Here to Stay


In the Huffington Post: “The way media are transmitted has changed dramatically over the last 10 years. User-generated content (UGC) has completely changed the landscape of social interaction, media outreach, consumer understanding, and everything in between. Today, UGC is media generated by the consumer instead of the traditional journalists and reporters. This is a movement defying and redefining traditional norms at the same time. Current events are largely publicized on Twitter and Facebook by the average person, and not by a photojournalist hired by a news organization. In the past, these large news corporations dominated the headlines — literally — and owned the monopoly on public media. Yet with the advent of smartphones and spread of social media, everything has changed. The entire industry has been replaced; smartphones have supplanted how information is collected, packaged, edited, and conveyed for mass distribution. UGC allows for raw and unfiltered movement of content at lightning speed. With the way that the world works today, it is the most reliable way to get information out. One thing that is for certain is that UGC is here to stay whether we like it or not, and it is driving much more of modern journalistic content than the average person realizes.
Think about recent natural disasters where images are captured by citizen journalists using their iPhones. During Hurricane Sandy, 800,000 photos were uploaded to Instagram with “#Sandy.” Time magazine even hired five iPhoneographers to photograph the wreckage for its Instagram page. During the May 2013 Oklahoma City tornadoes, the first photo released was actually captured by a smartphone. This real-time footage brings environmental chaos to your doorstep in a chillingly personal way, especially considering the photographer of the first tornado photos ultimately died because of the tornado. UGC has been monumental for criminal investigations and man-made catastrophes. Most notably, the Boston Marathon bombing was covered by UGC in the most unforgettable way. Dozens of images poured in identifying possible Boston bombers, to both the detriment and benefit of public officials and investigators. Though these images inflicted considerable damage on innocent bystanders sporting suspicious backpacks, ultimately it was also smartphone images that highlighted the presence of the Tsarnaev brothers. This phenomenon isn’t limited to America. Would the so-called Arab Spring have happened without social media and UGC? Syrians, Egyptians, and citizens from numerous nations facing protests can easily publicize controversial images and statements to be shared worldwide….
This trend is not temporary but will only expand. The first iPhone launched in 2007, and the world has never been the same. New smartphones are released each month with better cameras and faster processors than computers had even just a few years ago….”

A New Kind of Economy is Born – Social Decision-Makers Beat the "Homo Economicus"


A new paper by Dirk Helbing: “The Internet and Social Media change our way of decision-making. We are no longer the independent decision makers we used to be. Instead, we have become networked minds, social decision-makers, more than ever before. This has several fundamental implications. First of all, our economic theories must change, and second, our economic institutions must be adapted to support the social decision-maker, the “homo socialis”, rather than tailored to the perfect egoist, known as “homo economicus”….
Such developments will eventually create a participatory market society. “Prosumers”, i.e. co-producing consumers, the new “makers” movement, and the sharing economy are some examples illustrating this. Just think of the success of Wikipedia, OpenStreetMap or GitHub. OpenStreetMap now provides the most up-to-date maps of the world, thanks to more than 1 million volunteers.
This is just the beginning of a new era, where production and public engagement will more and more happen in a bottom-up way through fluid “projects”, where people can contribute as leaders (“entrepreneurs”) or participants. A new intellectual framework is emerging, and a creative and participatory era is ahead.
The paradigm shift towards participatory bottom-up self-regulation may be bigger than the paradigm shift from a geocentric to a heliocentric worldview. If we build the right institutions for the information society of the 21st century, we will finally be able to mitigate some very old problems of humanity. “Tragedies of the commons” are just one of them. After so many centuries, they are still plaguing us, but this needn’t be.”

Social media analytics for future oriented policy making


New paper by Verena Grubmüller, Katharina Götsch, and Bernhard Krieger: “Research indicates that evidence-based policy making is most successful when public administrators refer to diversified information portfolios. With the rising prominence of social media in the last decade, this paper argues that governments can benefit from integrating this publicly available, user-generated data through the technique of social media analytics (SMA). There are already several initiatives set up to predict future policy issues, e.g. for the policy fields of crisis mitigation or migrant integration insights. The authors analyse these endeavours and their potential for providing more efficient and effective public policies. Furthermore, they scrutinise the challenges to governmental SMA usage in particular with regard to legal and ethical aspects. Reflecting the latter, this paper provides forward-looking recommendations on how these technologies can best be used for future policy making in a legally and ethically sound manner.”

MicroMappers: Microtasking for Disaster Response


Patrick Meier: “My team and I at QCRI are about to launch MicroMappers: the first ever set of microtasking apps specifically customized for digital humanitarian response. If you’re new to microtasking in the context of disaster response, then I recommend reading this, this and this. The purpose of our web-based microtasking apps (we call them Clickers) is to quickly make sense of all the user-generated, multi-media content posted on social media during disasters. How? By using microtasking and making it as easy as a single click of the mouse to become a digital humanitarian volunteer. This is how volunteers with Zooniverse were able to click-and-thus-tag well over 2,000,000 images in under 48 hours.
We have already developed and customized four Clickers using the free and open source microtasking platform CrowdCrafting: TweetClicker, TweetGeoClicker, ImageClicker and ImageGeoClicker. Each Clicker includes a mini-tutorial to guide volunteers.”

Social media: its emerging importance and impact on citizen engagement


New article by Victoria Burton in International Affairs Forum that “examines the impact of social media which not only provides citizens alternative avenues to express themselves about government policies but presents new challenges and means for government to provide services to the public. An example is the CovJam online venture presented by Coventry City and IBM that used social media as part of a three-day brainstorming event about the city. Social media have facilitated government programs to carry out surveys and fine-tune services but perhaps the greatest aspect is that of greater public participation. Moving forward, it will be important to address social media across public sectors and establish strategies to leverage its advantages and benefits.”

From Crowd-Sourcing Potholes to Community Policing


New paper by Manik Suri (GovLab): “The tragic Boston Marathon bombing and hair-raising manhunt that ensued was a sobering event. It also served as a reminder that emerging “civic technologies” – platforms and applications that enable citizens to connect and collaborate with each other and with government – are more important today than ever before. As commentators have noted, local police and federal agents utilized a range of technological platforms to tap the “wisdom of the crowd,” relying on thousands of private citizens to develop a “hive mind” that identified two suspects within a record period of time.
In the immediate wake of the devastating attack on April 15th, investigators had few leads. But within twenty-four hours, senior FBI officials, determined to seek “assistance from the public,” called on everyone with information to submit all media, tips, and leads related to the Boston Marathon attack. This unusual request for help yielded thousands of images and videos from local Bostonians, tourists, and private companies through technological channels ranging from telephone calls and emails to Flickr posts and Twitter messages. In mere hours, investigators were able to “crowd-source” a tremendous amount of data – including thousands of images from personal cameras, amateur videos from smart phones, and cell-tower information from private carriers. Combing through data from this massive network of “eyes and ears,” law enforcement officials were quickly able to generate images of two lead suspects – enabling a “modern manhunt” to commence immediately.
Technological innovations have transformed our commercial, political, and social realities. These advances include new approaches to how we generate knowledge, access information, and interact with one another, as well as new pathways for building social movements and catalyzing political change. While a significant body of academic research has focused on the role of technology in transforming electoral politics and social movements, less attention has been paid to how technological innovation can improve the process of governance itself.
A growing number of platforms and applications lie at this intersection of technology and governance, in what might be termed the “civic technology” sector. Broadly speaking, this sector involves the application of new information and communication technologies – ranging from robust social media platforms to state-of-the-art big data analysis systems – to address public policy problems. Civic technologies encompass enterprises that “bring web technologies directly to government, build services on top of government data for citizens, and change the way citizens ask, get, or need services from government.” These technologies have the potential to transform governance by promoting greater transparency in policy-making, increasing government efficiency, and enhancing citizens’ participation in public sector decision-making.