Big Data needs Big Theory


Geoffrey West, former President of the Santa Fe Institute: “As the world becomes increasingly complex and interconnected, some of our biggest challenges have begun to seem intractable. What should we do about uncertainty in the financial markets? How can we predict energy supply and demand? How will climate change play out? How do we cope with rapid urbanization? Our traditional approaches to these problems are often qualitative and disjointed and lead to unintended consequences. To bring scientific rigor to the challenges of our time, we need to develop a deeper understanding of complexity itself….
The digital revolution is driving much of the increasing complexity and pace of life we are now seeing, but this technology also presents an opportunity. The ubiquity of cell phones and electronic transactions, the increasing use of personal medical probes, and the concept of the electronically wired “smart city” are already providing us with enormous amounts of data. With new computational tools and techniques to digest vast, interrelated databases, researchers and practitioners in science, technology, business and government have begun to bring large-scale simulations and models to bear on questions formerly out of reach of quantitative analysis, such as how cooperation emerges in society, what conditions promote innovation, and how conflicts spread and grow.
The trouble is, we don’t have a unified, conceptual framework for addressing questions of complexity. We don’t know what kind of data we need, nor how much, or what critical questions we should be asking. “Big data” without a “big theory” to go with it loses much of its potency and usefulness, potentially generating new unintended consequences.
When the industrial age focused society’s attention on energy in its many manifestations—steam, chemical, mechanical, and so on—the universal laws of thermodynamics came as a response. We now need to ask if our age can produce universal laws of complexity that integrate energy with information. What are the underlying principles that transcend the extraordinary diversity and historical contingency and interconnectivity of financial markets, populations, ecosystems, war and conflict, pandemics and cancer? An overarching predictive, mathematical framework for complex systems would, in principle, incorporate the dynamics and organization of any complex system in a quantitative, computable framework.
We will probably never make detailed predictions of complex systems, but coarse-grained descriptions that lead to quantitative predictions for essential features are within our grasp. We won’t predict when the next financial crash will occur, but we ought to be able to assign a probability of one occurring in the next few years. The field is in the midst of a broad synthesis of scientific disciplines, helping reverse the trend toward fragmentation and specialization, and is groping toward a more unified, holistic framework for tackling society’s big questions. The future of the human enterprise may well depend on it.”

Open Data: From ‘Platform’ to ‘Program’


Engaging Cities: “A few months ago, Dutch designer Mark van der Net launched OSCity.nl, a highly interesting example of what can be done with open data. At first, it looks like a mapping tool. The interface shows a – beautifully designed – map of The Netherlands, color coded according to whatever open data set the user selects, varying from geographical height to the location of empty office buildings. As such it is an example of a broader current in which artists, citizens, NGOs and business actors have built online tools to visualize all kinds of data, varying from open government data to collaboratively produced data sets focused on issues like environmental pollution.
What makes OSCity interesting is that it allows users to intuitively map various datasets in combination with each other in so-called ‘map stories’. For instance, a map of empty office space can be combined with maps of urban growth and decline, the average renting price per square meter of office space, as well as a map that displays the prices of houses for sale. The intersection of those maps shows you where empty office spaces are offered at or below half the price of regular houses and apartments. The result is thus not just an aesthetically pleasing state of affairs, but an action map. Policy makers, developers and citizens can use the insights produced by the map to find empty offices that are worthwhile to turn into houses.
There are two important lessons we can learn from this project. First, it shows the importance of programs like OSCity to make open data platforms operationable for various actors. Over the last few years governments and other organizations have started to open up their datasets, often accompanied by high expectations of citizen empowerment and greater transparency of governments. However, case studies have shown that opening up data and building an open platform is only a first step. Dawes and Helbig have shown that various stakeholders have various needs in terms of standards and protocols, whereas both citizens and government officials need the relevant skills to be able to understand and operate upon the data. ‘Vast amounts of useful information are contained in government data systems’, they write, ‘but the systems themselves are seldom designed for use beyond the collecting agency’s own needs.’ In other words: what is needed to deliver on the expectations of open data, is not only a platform – a publicly available database – but also what I have called ‘programs’ – online tools with intuitive interfaces that make this data intelligible and actionable in concert with the needs of the public.
There is a second issue that OSCity raises. As Jo Bates has pointed out, the main question is: who exactly is empowered through programs like this? Will ‘programs’ that make data operationable work for citizens? Or will their procedures, standards and access be organized to benefit corporate interests? These do not have to be necessarily contradicting, but if the goal is to empower citizens, it is important to engage them as stakeholders in the design of these programs.”
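
The ‘map story’ described in the excerpt above, which intersects empty office space with office and house prices, comes down to a filter over per-area records. The following is a minimal sketch of that idea, not OSCity's actual implementation: the `Area` record, its field names, the sample figures and the `conversion_candidates` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """Hypothetical per-area record; field names are illustrative only."""
    name: str
    has_empty_offices: bool
    office_price_per_m2: float  # asking price for office space
    house_price_per_m2: float   # asking price for housing


def conversion_candidates(areas, ratio=0.5):
    """Return areas with empty offices priced at or below `ratio` times local housing."""
    return [
        a for a in areas
        if a.has_empty_offices
        and a.office_price_per_m2 <= ratio * a.house_price_per_m2
    ]


# Made-up sample data, not taken from any real dataset.
areas = [
    Area("A", True, 900.0, 2400.0),
    Area("B", True, 1500.0, 2600.0),
    Area("C", False, 800.0, 2500.0),
    Area("D", True, 1250.0, 2500.0),
]

candidates = [a.name for a in conversion_candidates(areas)]
print(candidates)  # ['A', 'D']
```

The `ratio` parameter mirrors the "at or below half the price" threshold in the text. A real tool would compute the same condition per map cell rather than per named area.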

Neighborhood Buzz: What people are tweeting in your city.


“Neighborhood Buzz is an experimental system that lets you find out what people in your neighborhood, and neighborhoods in cities around the country, are talking about on Twitter. When you select a neighborhood from a city map, Neighborhood Buzz displays the main topics that people in that neighborhood are discussing — politics, sports, food, etc. — and then lets you drill down to look at the individual tweets in those categories.
The system also lets you see, at a glance, how much people in different neighborhoods in a city are talking about a given topic through a “heat map” overlay on the city’s geographical map.
Neighborhood Buzz uses geo-located tweets as input. Only a small fraction of tweets currently have location tags, but the number is sufficient to provide tens or hundreds of tweets per neighborhood per day.
The topical categorizer that the system uses is statistical — which means that even though we show only the tweets we are most confident the system is categorizing correctly, it still sometimes makes mistakes. You can let us know when the system has incorrectly categorized a tweet, and eventually that will help us to improve the system.
Neighborhood Buzz was originally developed at Northwestern University Knight Lab in our joint projects class in technology and journalism, involving students and faculty from the Medill School of Journalism and the McCormick School of Engineering, Dept. of Electrical Engineering and Computer Science, at Northwestern. It was then re-architected and further developed at the Knight Lab.”
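
The description above mentions two steps that are easy to show in code: keeping only the tweets the categorizer is most confident about, and counting topics per neighborhood. The sketch below is an illustration under stated assumptions and not the Knight Lab's code: the tweet dictionaries, the `confidence` field, the 0.8 threshold and the `topics_by_neighborhood` helper are all hypothetical.

```python
from collections import Counter, defaultdict


def topics_by_neighborhood(tweets, min_confidence=0.8):
    """Count topics per neighborhood, skipping low-confidence categorizations.

    Each tweet is assumed to be a dict with 'neighborhood', 'topic' and
    'confidence' keys (a hypothetical schema).
    """
    counts = defaultdict(Counter)
    for tweet in tweets:
        if tweet["confidence"] >= min_confidence:
            counts[tweet["neighborhood"]][tweet["topic"]] += 1
    return counts


# Made-up sample data.
tweets = [
    {"neighborhood": "Loop", "topic": "sports", "confidence": 0.95},
    {"neighborhood": "Loop", "topic": "sports", "confidence": 0.85},
    {"neighborhood": "Loop", "topic": "food", "confidence": 0.40},
    {"neighborhood": "Uptown", "topic": "politics", "confidence": 0.90},
]

buzz = topics_by_neighborhood(tweets)
top_topic_loop = buzz["Loop"].most_common(1)[0][0]
print(top_topic_loop)  # sports
```

Raising `min_confidence` shows fewer tweets but trades coverage for fewer miscategorizations, which is the trade-off the excerpt describes.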

BBC throws weight behind open data movement


The Telegraph: “The BBC has signed Memoranda of Understanding (MoUs) with the Europeana Foundation, the Open Data Institute, the Open Knowledge Foundation and the Mozilla Foundation, supporting free and open internet technologies…

The agreements will enable closer collaboration between the BBC and each of the four organisations on a range of mutual interests, including the release of structured open data and the use of open standards in web development, according to the BBC.
One aim of the agreement is to give clear technical standards and models to organisations who want to work with the BBC, and give those using the internet a deeper understanding of the technologies involved.
The MoUs also bring together several existing areas of research and provide a framework to explore future opportunities. Through this and other initiatives, the BBC hopes to become a catalyst for open innovation by publishing clear technical standards, models, expertise and – where feasible – data.
The BBC has been publishing linked open data for some time, most notably as part of the /programmes service, where machine-readable information about the programme schedule is made available online.
It also helped to deliver the Olympics Data Service, which underpinned 10,490 athlete pages on the BBC sport website during the 2012 Olympics….

“The BBC has been at the forefront of technological innovation around broadcasting and online for many years delivering the benefits of new technologies to licence fee payers, offering new services and products to audiences around the world, and creating public value in the digital economy,” said James Purnell, BBC Director of Strategy and Digital.”

Can These Entrepreneurs Solve The Intractable Problems Of City Government?


FastCoExist: “In case the recent Obamacare debacle didn’t make it clear enough, the government has some serious problems getting technology to work correctly. This is something that President Obama has recognized in the past. In July, he made this statement: “I’m going to be asking more people around the country–more inventors and entrepreneurs and visionaries–to sign up to serve. We’ve got to have the brightest minds to help solve our biggest challenges.”
In San Francisco, that request has been taken on by the newly minted Entrepreneur-in-Residence (EIR) program–the first ever government-run program that helps startups to develop technologies that can be used to deal with pressing government issues. It’s kind of like a government startup incubator. This week, the EIR program announced 11 finalists for the program, which received 200 applications from startups across the world. Three to five startups will ultimately be chosen for the opportunity….
The 11 finalists range from small startups with just a handful of people doing cutting-edge work to companies valued at over $1 billion. Some of the highlights:

  • Arrive Labs, a company that crowdsources public transit data and combines it with algorithms and external conditions (like the weather) to predict congestion, and to offer riders faster alternatives.
  • A startup called Regroup that offers group messaging through a number of channels, including email, text, Facebook, Twitter, and digital signs.
  • Smart waste management company Compology, which is working on a wireless waste monitoring system to tell officials what’s inside city dumpsters and when they are full.
  • Birdi, a startup developing smart air quality, carbon monoxide, and smoke detectors that send alerts to your smartphone. The company also has an open API so that developers can pull in public outdoor air quality data.
  • Synthicity’s 3-D digital city simulation (think “real-life Simcity”), which is based on urban datasets. The simulation is geared towards transportation planners, urban designers, and others who rely on city data to make decisions…”

An Experiment in Hiring Discrimination Via Online Social Networks


Paper by Acquisti, Alessandro and Fong, Christina: “Surveys of U.S. employers suggest that numerous firms seek information about job applicants online. However, little is known about how this information gathering influences employers’ hiring behavior. We present results from two complementary randomized experiments (a field experiment and an online experiment) on the impact of online information on U.S. firms’ hiring behavior. We manipulate candidates’ personal information that is protected under either federal laws or some state laws, and may be risky for employers to enquire about during interviews, but which may be inferred from applicants’ online social media profiles. In the field experiment, we test responses of over 4,000 U.S. employers to a Muslim candidate relative to a Christian candidate, and to a gay candidate relative to a straight candidate. We supplement the field experiment with a randomized, survey-based online experiment with over 1,000 subjects (including subjects with previous human resources experience) testing the effects of the manipulated online information on hypothetical hiring decisions and perceptions of employability. The results of the field experiment suggest that a minority of U.S. firms likely searched online for the candidates’ information. Hence, the overall effect of the experimental manipulations on interview invitations is small and not statistically significant. However, in the field experiment, we find evidence of discrimination linked to political party affiliation. Specifically, following the Gallup Organization’s segmentation of U.S. states by political ideology, we use results from the 2012 presidential election and find evidence of discrimination against the Muslim candidate compared to the Christian candidate among employers in more Romney-leaning states and counties. These results are robust to controlling for firm characteristics, state fixed effects, and a host of county-level variables. 
We find no evidence of discrimination against the gay candidate relative to the straight candidate. Results from the online experiment are consistent with those from the field experiment: we find more evidence of bias among subjects more likely to self-report more politically conservative party affiliation. The online experiment’s results are also robust to controlling for demographic variables. Results from both experiments should be interpreted carefully. Because politically conservative states and counties in our field experiment, and more conservative party affiliation in our online experiment, are not randomly assigned, the result that discrimination is greater in more politically conservative areas and among more politically conservative online subjects should be interpreted as correlational, not causal.”

Open Data and Citizen Engagement – Disentangling the Relationship


Tiago Peixoto: “…Within an ecosystem that combines transparency and participation, examining the relationship between the two becomes essential. More specifically, a clearer understanding of the interaction between open data and participatory institutions remains a frontier to be explored….

R&D for Data-Driven Participation

Coming up with clear hypotheses and testing them is essential if we are to move forward with the ecosystem that brings together open data, participation and accountability. Surely, many organizations working in the open government space are operating with limited resources, squeezing their budgets to keep their operational work going. In this sense, conducting experiments to test hypotheses may appear as a luxury that very few can afford.
Nevertheless, one of the opportunities provided by the use of technologies for civic behavior is that of potentially driving down the costs for experimentation. For instance, online and mobile experiments could play the role of tech-enabled (and affordable) randomized controlled trials, improving our understanding of how open data can be best used to spur collective action. Thinking of the ways in which technology can be used to conduct lower-cost experiments to shed light on behavioral and causal chains is still limited to a small number of people and organizations, and much work is needed on that front.
Yet, it is also important to acknowledge that experiments are not the only source of relevant knowledge. To stick with a simple example, in some cases even an online survey trying to figure out who is accessing data, what data they use, and how they use it may provide us with valuable knowledge about the interaction between open data and citizen action. In any case, however, it may be important that the actors working in that space agree upon a minimal framework that facilitates comparison and incremental learning: the field of technology for accountability desperately needs a more coordinated research agenda.

Citizen Data Platforms?

As more and more players engage in participatory initiatives, there is a significant amount of citizen-generated data being collected, which is important on its own. However, in a similar vein to government data, the potential of citizen data may be further unlocked if openly available to third parties who can learn from it and build upon it. In this respect, it might not be long before we realize the need to have adequate structures and platforms to host this wealth of data that – hopefully – will be increasingly generated around the world. This would entail that not only governments open up their data related to citizen engagement initiatives, but also that other actors working in that field – such as donors and NGOs – do the same. Such structures would also be the means by which lessons generated by experiments and other approaches are widely shared, bringing cumulative knowledge to the field.
However, as we think of future scenarios, we should not lose sight of current challenges and knowledge gaps when it comes to the relationship between citizen engagement and open data. Better disentangling the relationship between the two is the most immediate priority, and a long overdue topic in the open government conversation.”

Kick the data secrecy habit and everyone wins


New Scientist: “Freely available information has the power to make and save money and enhance our daily life, says Nigel Shadbolt of the Open Data Institute…
What kind of things do these start-ups do?
Our first success was with data analytics company Mastodon C, which used public information to look at doctors’ prescribing habits for cholesterol-lowering drugs. They found that by switching from brand names to generic drugs, doctors could save the NHS more than £200 million a year.
Have you looked at other public resources?
Another start-up, Placr, is unifying timetables and live departure and disruption information for UK bus, rail, underground, ferry and tram services. It uses feeds from many organisations to provide an app for travellers and services for local authorities. A recent review in London – where Transport for London has made lots of its data open – showed that millions of journeys are being altered to avoid disruptions on the basis of this information. Time savings alone add up to £58 million a year.
Is there a danger of creating more big companies that will turn into monopolies?
We want companies that use open data to make money, and they will try to defend their patches. But if we leave the data open, others can exploit it too. Nobody can own or monopolise the data. I think we can make more money and create more benefit by making data open, and I’m sure we will even dislodge a few monopolies along the way.”

Social Media Can Boost Disease Outbreak Monitoring, Study Finds


iHealthBeat: “Monitoring social media websites like Twitter could help health officials and providers identify in real time severe medical outbreaks, allowing them to more efficiently direct resources and curb the spread of disease, according to a San Diego State University study published last month in the Journal of Medical Internet Research, Medical News Today reports…
For the study, lead researcher and San Diego State University geography professor Ming-Hsiang Tsou and his team used a program to monitor tweets that originated within a 17-mile radius of 11 cities. The program recorded details of tweets containing the words “flu” or “influenza,” including:

  • Origin;
  • Username;
  • Whether the tweet was an original or a retweet; and
  • Any links to websites in the tweet.

Researchers then compared their findings with regional data based on CDC’s definition of influenza-like illness….
The program recorded data on 161,821 tweets that included the word “flu” and 6,174 tweets that included the word “influenza” between June 2012 and the beginning of December 2012.
According to the study, nine of the 11 cities exhibited a statistically significant correlation between an uptick in the number of tweets mentioning the keywords and regional outbreak reports. In five of the cities — Denver, Fort Worth, Jacksonville, San Diego and Seattle — the algorithm noted the outbreaks sooner than regional reports.
Tsou in a release said that the social media monitoring program detected outbreaks daily, while “[t]raditional procedures” typically “take at least two weeks.””
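
The comparison reported above, between keyword tweet volume and regional outbreak reports, is at its core a correlation between two time series. The sketch below computes a Pearson correlation coefficient for a single city. It is an illustration only, not the study's method or code: the weekly figures are invented, and the `pearson` helper and variable names are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Invented weekly figures for one hypothetical city.
weekly_flu_tweets = [120, 150, 210, 340, 500]
weekly_ili_percent = [1.1, 1.3, 1.8, 2.9, 4.2]  # influenza-like illness rate

r = pearson(weekly_flu_tweets, weekly_ili_percent)
print(f"r = {r:.3f}")
```

A coefficient near 1 means tweet volume and reported illness rise and fall together. Testing for statistical significance, as the study did, would be a further step on top of this.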

Selected Readings on Crowdsourcing Funds


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of crowdsourcing was originally published in 2013.

Crowdsourcing funds, or crowdfunding, is an emerging method for raising money that allows a wide pool of people to make small investments, gain access to ideas and projects they feel personally connected to, and spur growth in small businesses and social ventures. Popular crowdfunding platforms like Kickstarter and Indiegogo helped bring the practice into the public consciousness. Now, civic crowdfunding platforms like Citizinvestor and Spacehive are applying this funding model, already used to fund artists, charities and inventors, to public concerns traditionally considered under government’s purview.

Crowdfunding has also received recent attention from policymakers in the US through the JOBS Act, which provides an exemption from the registration requirements for offerings of securities by a company made through an SEC-registered crowdfunding platform.

Annotated Selected Reading List (in alphabetical order)

Aitamurto, Tanja. “The Impact of Crowdfunding on Journalism.” Journalism Practice 5, no. 4 (2011): 429–445. http://bit.ly/1bk4wNI.

  • This article analyzes the impact of crowdfunding on journalism, where, “readers’ donations accumulate into judgments about the issues that need to be covered.”
  • Aitamurto’s central findings inspire optimism regarding the potential of crowdfunding for the public good. She finds that, “From the donor’s perspective, donating does not create a strong relationship from donor to journalist or to the story to which they contributed;” rather, “[t]he primary motivation for donating is to contribute to the common good and social change.”

Baeck, Peter and Liam Collins. Working the Crowd: A Short Guide to Crowdfunding and How It Can Work for You. Nesta, May 2013. http://bit.ly/Hkl3rx.

  • This report “aims to give a quick overview of crowdfunding, the different versions of the model and how they work.”
  • The authors list four technological innovations that have contributed to the growth of modern crowdfunding:
    • An online place for pitches
    • Moving your money with a click
    • The social engine
    • Fueling campaigns with algorithms
  • Baeck and Collins consider public and social projects to be one of the areas where crowdfunding can have a significant impact. They argue that civic crowdfunding “has the potential to disrupt how money for charitable causes is sourced and how public services and spaces are used and paid for.”

Best, Jason, Sherwood Neiss and Davis Jones. “How Crowdfund Investing Helps Solve Three Pressing Socioeconomic Challenges.” Crowdfunding PR, Social Media & Marketing Campaigns. http://bit.ly/1aaTGwQ.

  • This paper outlines the forces driving the widespread use of crowdfund investing, namely social media, the existence of funding systems that marginalize people outside of major urban centers and the ability of people to function remotely from their work spaces.
  • The authors also discuss a number of public-facing benefits of crowdfund investing:
    • Crowdfund Investing Creates Jobs
    • Bringing capital in off the sidelines for use by small businesses
    • Funding entrepreneurs everywhere
    • Capital no longer for the chosen few
    • Crowdfund Investing Grows GDP
    • Reduction in the failure rate of small businesses
    • Crowd monitoring reduces agency costs

De Buysere, Kristof, Oliver Gajda, Ronald Kleverlaan, Dan Marom, and Matthias Klaes. A Framework for European Crowdfunding, 2012. http://bit.ly/1aaTFsE.

  • This paper seeks to provide a “concise overview of the state of crowdfunding in Europe, with the aim of establishing policy and a distinct framework for the European crowdfunding industry,” which the authors believe, “will aid in the economic recovery of Europe.”
  • The authors, in their advocacy for greater crowdfunding opportunities for businesses in Europe, provide a rationale for the practice that also helps demonstrate the potential benefits of greater crowdfunding opportunities within government. They argue that, “Crowdfunding can offer unique support for budding and existing entrepreneurs on multiple levels. No other investment form, be it debt or equity, can provide the benefits of pre-sales, market research, word-of-mouth promotion, and crowd wisdom without additional cost.”

Hollow, Matthew. “Crowdfunding and Civic Society in Europe: A Profitable Partnership?” Open Citizenship 4, no. 1 (May 20, 2013). http://bit.ly/1cgzefL.

  • In this paper, Hollow explores the rise of crowdfunding platforms (CFPs), particularly related to civil society. He notes that, “[f]or civil society activists and others concerned with local welfare issues, the emergence of these new CFPs has been hugely significant: It has opened up a new source of funding when governments and businesses around the world are cutting back on their spending.”
  • Hollow argues that, “aside from their evident financial and economic benefits, CFPs also have the capacity to help foster and strengthen non-parliamentary democratic structures and practices. As such, they should be supported and encouraged as part of a framework of further European democratization and civic integration.”

Mollick, Ethan R. “The Dynamics of Crowdfunding: An Exploratory Study.” Journal of Business Venturing (June 26, 2013). http://bit.ly/1aaTJIV.

  • This paper “offers a description of the underlying dynamics of success and failure among crowdfunded ventures,” focusing on how personal networks and the project quality and viability have an impact on the success of crowdfunding efforts.
  • Mollick also highlights how other factors, like the geography of the project, design choices made by crowdfunding sites and developments in technology in this space all have an influence on the relationship between backers and project founders.
  • Finally, the paper demonstrates that projects that succeed do so by a small margin, while those that fail do so by a seemingly large margin, suggesting the influence of social bias and crowd influence.

Stemler, Abbey R. “The JOBS Act and Crowdfunding: Harnessing the Power—and Money—of the Masses.” Business Horizons 56, no. 3 (May 2013): 271–275. http://bit.ly/1ih9lts.

  • This paper discusses the Jumpstart Our Business Startups (JOBS) Act signed into law by President Obama in 2012, with a specific focus on the CROWDFUND Act, which enables entrepreneurs and small business owners to sell limited equity in their companies to a “crowd” of investors.
  • The objective of the Act is to exempt crowdfunding from registration requirement costs, allowing the potential of equity-based funding to be realized, by creating a pathway for underfunded entrepreneurs to access otherwise inaccessible streams of funding.
  • Stemler argues that the Act helps to legitimize crowdfunding as a community-building and fundraising tool for the business community, and also helps build better relationships between small business owners and government.