Smart Cities Turn Big Data Into Insight [Infographic]


Mark van Rijmenam in SmartDataCollective: “Cities around the globe are confronted with growing populations, aging infrastructure, reduced budgets, and the challenge of doing more with less. Applying big data technologies within cities can provide valuable insights that can keep a city habitable. The City of Songdo is a great example of a connected city, where all connected devices create a smart city that is optimized for the every-changing conditions in that same city. IBM recently released an infographic showing the vast opportunities of smart cities and the possible effects on the economy.”
Infographic Smarter Cities. Turning Big Data into Insight

Using Big Data to Ask Big Questions


Chase Davis in the SOURCE: “First, let’s dispense with the buzzwords. Big Data isn’t what you think it is: Every federal campaign contribution over the last 30-plus years amounts to several tens of millions of records. That’s not Big. Neither is a dataset of 50 million Medicare records. Or even 260 gigabytes of files related to offshore tax havens—at least not when Google counts its data in exabytes. No, the stuff we analyze in pursuit of journalism and app-building is downright tiny by comparison.
But you know what? That’s ok. Because while super-smart Silicon Valley PhDs are busy helping Facebook crunch through petabytes of user data, they’re also throwing off intellectual exhaust that we can benefit from in the journalism and civic data communities. Most notably: the ability to ask Big Questions.
Most of us who analyze public data for fun and profit are familiar with small questions. They’re focused, incisive, and often have the kind of black-and-white, definitive answers that end up in news stories: How much money did Barack Obama raise in 2012? Is the murder rate in my town going up or down?
Big Questions, on the other hand, are speculative, exploratory, and systemic. As the name implies, they are also answered at scale: Rather than distilling a small slice of a dataset into a concrete answer, Big Questions look at entire datasets and reveal small questions you wouldn’t have thought to ask.
Can we track individual campaign donor behavior over decades, and what does that tell us about their influence in politics? Which neighborhoods in my city are experiencing spikes in crime this week, and are police changing patrols accordingly?
Or, by way of example, how often do interest groups propose cookie-cutter bills in state legislatures?

Looking at Legislation

Even if you don’t follow politics, you probably won’t be shocked to learn that lawmakers don’t always write their own bills. In fact, interest groups sometimes write them word-for-word.
Sometimes those groups even try to push their bills in multiple states. The conservative American Legislative Exchange Council has gotten some press, but liberal groups, social and business interests, and even sororities and fraternities have done it too.
On its face, something about elected officials signing their names to cookie-cutter bills runs head-first against people’s ideal of deliberative Democracy—hence, it tends to make news. Those can be great stories, but they’re often limited in scope to a particular bill, politician, or interest group. They’re based on small questions.
Data science lets us expand our scope. Rather than focusing on one bill, or one interest group, or one state, why not ask: How many model bills were introduced in all 50 states, period, by anyone, during the last legislative session? No matter what they’re about. No matter who introduced them. No matter where they were introduced.
Now that’s a Big Question. And with some basic data science, it’s not particularly hard to answer—at least at a superficial level.

Analyze All the Things!

Just for kicks, I tried building a system to answer this question earlier this year. It was intended as an example, so I tried to choose methods that would make intuitive sense. But it also makes liberal use of techniques applied often to Big Data analysis: k-means clustering, matrices, graphs, and the like.
If you want to follow along, the code is here….
To make exploration a little easier, my code represents similar bills in graph space, shown at the top of this article. Each dot (known as a node) represents a bill. And a line connecting two bills (known as an edge) means they were sufficiently similar, according to my criteria (a cosine similarity of 0.75 or above). Thrown into a visualization software like Gephi, it’s easy to click around the clusters and see what pops out. So what do we find?
There are 375 clusters in total. Because of the limitations of our data, many of them represent vague, subject-specific bills that just happen to have similar titles even though the legislation itself is probably very different (think things like “Budget Bill” and “Campaign Finance Reform”). This is where having full bill text would come handy.
But mixed in with those bills are a handful of interesting nuggets. Several bills that appear to be modeled after legislation by the National Conference of Insurance Legislators appear in multiple states, among them: a bill related to limited lines travel insurance; another related to unclaimed insurance benefits; and one related to certificates of insurance.”

Commons at the Intersection of Peer Production, Citizen Science, and Big Data: Galaxy Zoo


New paper by Michael J. Madison: “The knowledge commons research framework is applied to a case of commons governance grounded in research in modern astronomy. The case, Galaxy Zoo, is a leading example of at least three different contemporary phenomena. In the first place Galaxy Zoo is a global citizen science project, in which volunteer non-scientists have been recruited to participate in large-scale data analysis via the Internet. In the second place Galaxy Zoo is a highly successful example of peer production, some times known colloquially as crowdsourcing, by which data are gathered, supplied, and/or analyzed by very large numbers of anonymous and pseudonymous contributors to an enterprise that is centrally coordinated or managed. In the third place Galaxy Zoo is a highly visible example of data-intensive science, sometimes referred to as e-science or Big Data science, by which scientific researchers develop methods to grapple with the massive volumes of digital data now available to them via modern sensing and imaging technologies. This chapter synthesizes these three perspectives on Galaxy Zoo via the knowledge commons framework.”

Defining Open Data


Open Knowledge Foundation Blog: “Open data is data that can be freely used, shared and built-on by anyone, anywhere, for any purpose. This is the summary of the full Open Definition which the Open Knowledge Foundation created in 2005 to provide both a succinct explanation and a detailed definition of open data.
As the open data movement grows, and even more governments and organisations sign up to open data, it becomes ever more important that there is a clear and agreed definition for what “open data” means if we are to realise the full benefits of openness, and avoid the risks of creating incompatibility between projects and splintering the community.

Open can apply to information from any source and about any topic. Anyone can release their data under an open licence for free use by and benefit to the public. Although we may think mostly about government and public sector bodies releasing public information such as budgets or maps, or researchers sharing their results data and publications, any organisation can open information (corporations, universities, NGOs, startups, charities, community groups and individuals).

Read more about different kinds of data in our one page introduction to open data
There is open information in transport, science, products, education, sustainability, maps, legislation, libraries, economics, culture, development, business, design, finance …. So the explanation of what open means applies to all of these information sources and types. Open may also apply both to data – big data and small data – or to content, like images, text and music!
So here we set out clearly what open means, and why this agreed definition is vital for us to collaborate, share and scale as open data and open content grow and reach new communities.

What is Open?

The full Open Definition provides a precise definition of what open data is. There are 2 important elements to openness:

  • Legal openness: you must be allowed to get the data legally, to build on it, and to share it. Legal openness is usually provided by applying an appropriate (open) license which allows for free access to and reuse of the data, or by placing data into the public domain.
  • Technical openness: there should be no technical barriers to using that data. For example, providing data as printouts on paper (or as tables in PDF documents) makes the information extremely difficult to work with. So the Open Definition has various requirements for “technical openness,” such as requiring that data be machine readable and available in bulk.”…

Imagining Data Without Division


Thomas Lin in Quanta Magazine: “As science dives into an ocean of data, the demands of large-scale interdisciplinary collaborations are growing increasingly acute…Seven years ago, when David Schimel was asked to design an ambitious data project called the National Ecological Observatory Network, it was little more than a National Science Foundation grant. There was no formal organization, no employees, no detailed science plan. Emboldened by advances in remote sensing, data storage and computing power, NEON sought answers to the biggest question in ecology: How do global climate change, land use and biodiversity influence natural and managed ecosystems and the biosphere as a whole?…
For projects like NEON, interpreting the data is a complicated business. Early on, the team realized that its data, while mid-size compared with the largest physics and biology projects, would be big in complexity. “NEON’s contribution to big data is not in its volume,” said Steve Berukoff, the project’s assistant director for data products. “It’s in the heterogeneity and spatial and temporal distribution of data.”
Unlike the roughly 20 critical measurements in climate science or the vast but relatively structured data in particle physics, NEON will have more than 500 quantities to keep track of, from temperature, soil and water measurements to insect, bird, mammal and microbial samples to remote sensing and aerial imaging. Much of the data is highly unstructured and difficult to parse — for example, taxonomic names and behavioral observations, which are sometimes subject to debate and revision.
And, as daunting as the looming data crush appears from a technical perspective, some of the greatest challenges are wholly nontechnical. Many researchers say the big science projects and analytical tools of the future can succeed only with the right mix of science, statistics, computer science, pure mathematics and deft leadership. In the big data age of distributed computing — in which enormously complex tasks are divided across a network of computers — the question remains: How should distributed science be conducted across a network of researchers?
Part of the adjustment involves embracing “open science” practices, including open-source platforms and data analysis tools, data sharing and open access to scientific publications, said Chris Mattmann, 32, who helped develop a precursor to Hadoop, a popular open-source data analysis framework that is used by tech giants like Yahoo, Amazon and Apple and that NEON is exploring. Without developing shared tools to analyze big, messy data sets, Mattmann said, each new project or lab will squander precious time and resources reinventing the same tools. Likewise, sharing data and published results will obviate redundant research.
To this end, international representatives from the newly formed Research Data Alliance met this month in Washington to map out their plans for a global open data infrastructure.”

Undefined By Data: A Survey of Big Data Definitions


Paper by Jonathan Stuart Ward and Adam Barker: “The term big data has become ubiquitous. Owing to shared origin between academia, industry and the media there is no single unified definition, and various stakeholders provide diverse and often contradictory definitions. The lack of a consistent definition introduces ambiguity and hampers discourse relating to big data. This short paper attempts to collate the various definitions which have gained some degree of traction and to furnish a clear and concise definition of an otherwise ambiguous term…
Despite the range and differences existing within each of the aforementioned definitions there are some points of similarity. Notably all definitions make at least one of the following assertions:
Size: the volume of the datasets is a critical factor.
Complexity: the structure, behaviour and permutations of the datasets is a critical factor.
Technologies: the tools and techniques which are used to process a sizable or complex dataset is a critical factor.
The definitions surveyed here all encompass at least one of these factors, most encompass two. An extrapolation of these factors would therefore postulate the following: Big data is a term describing the storage and analysis of large and or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.”

5 Ways Cities Are Using Big Data


Eric Larson in Mashable: “New York City released more than 200 high-value data sets to the public on Monday — a way, in part, to provide more content for open-sourced mapping projects like OpenStreetMap.
It’s one of the many releases since the Local Law 11 of 2012 passed in February, which calls for more transparency of the city government’s collected data.
But it’s not just New York: Cities across the world, large and small, are utilizing big data sets — like traffic statistics, energy consumption rates and GPS mapping — to launch projects to help their respective communities.
We rounded up a few of our favorites below….

1. Seattle’s Power Consumption

The city of Seattle recently partnered with Microsoft and Accenture on a pilot project to reduce the area’s energy usage. Using Microsoft’s Azure cloud, the project will collect and analyze hundreds of data sets collected from four downtown buildings’ management systems.
With predictive analytics, then, the system will work to find out what’s working and what’s not — i.e. where energy can be used less, or not at all. The goal is to reduce power usage by 25%.

2. SpotHero

Finding parking spots — especially in big cities — is undoubtably a headache.

SpotHero is an app, for both iOS and Android devices, that tracks down parking spots in a select number of cities. How it works: Users type in an address or neighborhood (say, Adams Morgan in Washington, D.C.) and are taken to a listing of available garages and lots nearby — complete with prices and time durations.
The app tracks availability in real-time, too, so a spot is updated in the system as soon as it’s snagged.
Seven cities are currently synced with the app: Washington, D.C., New York, Chicago, Baltimore, Boston, Milwaukee and Newark, N.J.

3. Adopt-a-Hydrant

Anyone who’s spent a winter in Boston will agree: it snows.

In January, the city’s Office of New Urban Mechanics released an app called Adopt-a-Hydrant. The program is mapped with every fire hydrant in the city proper — more than 13,000, according to a Harvard blog post — and lets residents pledge to shovel out one, or as many as they choose, in the almost inevitable event of a blizzard.
Once a pledge is made, volunteers receive a notification if their hydrant — or hydrants — become buried in snow.

4. Adopt-a-Sidewalk

Similar to Adopt-a-Hydrant, Chicago’s Adopt-a-Sidewalk app lets residents of the Windy City pledge to shovel sidewalks after snowfall. In a city just as notorious for snowstorms as Boston, it’s an effective way to ensure public spaces remain free of snow and ice — especially spaces belonging to the elderly or disabled.

If you’re unsure which part of town you’d like to “adopt,” just register on the website and browse the map — you’ll receive a pop-up notification for each street you swipe that’s still available.

5. Less Congestion for Lyon

Last year, researchers at IBM teamed up with the city of Lyon, France (about four hours south of Paris), to build a system that helps traffic operators reduce congestion on the road.

The system, called the “Decision Support System Optimizer (DSSO),” uses real-time traffic reports to detect and predict congestions. If an operator sees that a traffic jam is likely to occur, then, she/he can adjust traffic signals accordingly to keep the flow of cars moving smoothly.
It’s an especially helpful tool for emergencies — say, when an ambulance is en route to the hospital. Over time, the algorithms in the system will “learn” from its most successful recommendations, then apply that knowledge when making future predictions.”

Civics for a Digital Age


Jathan Sadowski  in the Atlantic on “Eleven principles for relating to cities that are automated and smart: Over half of the world’s population lives in urban environments, and that number is rapidly growing according to the World Health Organization. Many of us interact with the physical environments of cities on a daily basis: the arteries that move traffic, the grids that energize our lives, the buildings that prevent and direct actions. For many tech companies, though, much of this urban infrastructure is ripe for a digital injection. Cities have been “dumb” for millennia. It’s about time they get “smart” — or so the story goes….
Before accepting the techno-hype as a fait accompli, we should consider the implications such widespread technological changes might have on society, politics, and life in general. Urban scholar and historian Lewis Mumford warned of “megamachines” where people become mere components — like gears and transistors — in a hierarchical, human machine. The proliferation of smart projects requires an updated way of thinking about their possibilities, complications, and effects.
A new book, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, by Anthony Townsend, a research director at the Institute for the Future, provides some groundwork for understanding how these urban projects are occurring and what guiding principles we might use in directing their development. Townsend sets out to sketch a new understanding of “civics,” one that will account for new technologies.
The foundation for his theory speaks to common, worthwhile concerns: “Until now, smart-city visions have been controlling us. What we need is a new social code to bring meaning and to exert control over the technological code of urban operating systems.” It’s easy to feel like technologies — especially urban ones that are, at once, ubiquitous and often unseen to city-dwellers — have undue influence over our lives. Townsend’s civics, which is based on eleven principles, looks to address, prevent, and reverse that techno-power.”

From Crowd-Sourcing Potholes to Community Policing


New paper by Manik Suri (GovLab): “The tragic Boston Marathon bombing and hair-raising manhunt that ensued was a sobering event. It also served as a reminder that emerging “civic technologies” – platforms and applications that enable citizens to connect and collaborate with each other and with government – are more important today than ever before. As commentators have noted, local police and federal agents utilized a range of technological platforms to tap the “wisdom of the crowd,” relying on thousands of private citizens to develop a “hive mind” that identified two suspects within a record period of time.
In the immediate wake of the devastating attack on April 15th, investigators had few leads. But within twenty-four hours, senior FBI officials, determined to seek “assistance from the public,” called on everyone with information to submit all media, tips, and leads related to the Boston Marathon attack. This unusual request for help yielded thousands of images and videos from local Bostonians, tourists, and private companies through technological channels ranging from telephone calls and emails to Flickr posts and Twitter messages. In mere hours, investigators were able to “crowd-source” a tremendous amount of data – including thousands of images from personal cameras, amateur videos from smart phones, and cell-tower information from private carriers. Combing through data from this massive network of “eyes and ears,” law enforcement officials were quickly able to generate images of two lead suspects – enabling a “modern manhunt” to commence immediately.
Technological innovations have transformed our commercial, political, and social realities. These advances include new approaches to how we generate knowledge, access information, and interact with one another, as well as new pathways for building social movements and catalyzing political change. While a significant body of academic research has focused on the role of technology in transforming electoral politics and social movements, less attention has been paid to how technological innovation can improve the process of governance itself.
A growing number of platforms and applications lie at this intersection of technology and governance, in what might be termed the “civic technology” sector. Broadly speaking, this sector involves the application of new information and communication technologies – ranging from robust social media platforms to state-of-the-art big data analysis systems – to address public policy problems. Civic technologies encompass enterprises that “bring web technologies directly to government, build services on top of government data for citizens, and change the way citizens ask, get, or need services from government.” These technologies have the potential to transform governance by promoting greater transparency in policy-making, increasing government efficiency, and enhancing citizens’ participation in public sector decision-making.

Three Paradoxes of Big Data


New Paper by Neil M. Richards and Jonathan H. King in the Stanford Law Review Online:Big data is all the rage. Its proponents tout the use of sophisticated analytics to mine large data sets for insight as the solution to many of our society’s problems. These big data evangelists insist that data-driven decisionmaking can now give us better predictions in areas ranging from college admissions to dating to hiring to medicine to national security and crime prevention. But much of the rhetoric of big data contains no meaningful analysis of its potential perils, only the promise. We don’t deny that big data holds substantial potential for the future, and that large dataset analysis has important uses today. But we would like to sound a cautionary note and pause to consider big data’s potential more critically. In particular, we want to highlight three paradoxes in the current rhetoric about big data to help move us toward a more complete understanding of the big data picture. First, while big data pervasively collects all manner of private information, the operations of big data itself are almost entirely shrouded in legal and commercial secrecy. We call this the Transparency Paradox. Second, though big data evangelists talk in terms of miraculous outcomes, this rhetoric ignores the fact that big data seeks to identify at the expense of individual and collective identity. We call this the Identity Paradox. And third, the rhetoric of big data is characterized by its power to transform society, but big data has power effects of its own, which privilege large government and corporate entities at the expense of ordinary individuals. We call this the Power Paradox. Recognizing the paradoxes of big data, which show its perils alongside its potential, will help us to better understand this revolution. It may also allow us to craft solutions to produce a revolution that will be as good as its evangelists predict.”