In Defense of Transit Apps


Mark Headd at Civic Innovations: “The civic technology community has a love-hate relationship with transit apps.
We love to, and often do, use the example of open transit data and the cottage industry of civic app development it has helped spawn as justification for governments releasing open data. Some of the earliest, most enduring and most successful civic applications have been built on transit data and there are literally hundreds of different apps available.
The General Transit Feed Specification (GTFS), which has helped to encourage the release of transit data from dozens and dozens of transportation authorities across the country, is used as the model for the development of other open data standards. I once described work being done to develop a data standard for locations dispensing vaccinations as “GTFS for flu shots.”
But some in the civic technology community chafe at the overuse of transit apps as the example cited for the release of open data and engagement with outside civic hackers. Surely there are other examples we can point to that get at deeper, more fundamental problems with civic engagement and the operation of government. Is the best articulation of the benefits of open data and civic hacking a simple bus stop application?
Last week at Transparency Camp in DC, during a session I ran on open data, I was asked what data governments should focus on releasing as open data. I stated my belief that – at a minimum – governments should concentrate on The 3 B’s: Buses (transit data), Bullets (crime data) and Bucks (budget & expenditure data).
To be clear – transit data and the apps it helps generate are critical to the open data and civic technology movements. I think it is vital to explore the role that transit apps have played in the development of the civic technology ecosystem and their impact on open data.

Storytelling with transit data

Transit data supports more than just “next bus” apps. In fact, characterizing all transit apps this way does a disservice to the talented and creative people working to build things with transit data. Transit data supports a wide range of different visualizations that can tell an intimate, granular story about how a transit system works and how its operation impacts a city.
One inspiring example of this kind of app was developed recently by Mike Barry and Brian Card, and looked at the operation of MBTA in Boston. Their motive was simple:

We attempt to present this information to help people in Boston better understand the trains, how people use the trains, and how the people and trains interact with each other.

We’re able to tell nuanced stories about transit systems because the data being released continues to expand and improve in quality. This happens because developers building apps in cities across the country have provided feedback to transit officials on what they want to see and the quality of what is provided.
Developers building the powerful visualizations we see today are standing on the shoulders of the people that built the “next bus” apps a few years ago. Without these humble apps, we don’t get to tell these powerful stories today.

Holding government accountable

Transit apps are about more than just getting to the train on time.
Support for transit system operations can run into the billions of dollars and affect the lives of millions of people in an urban area. With this much investment, it’s important that transit riders and taxpayers are able to hold officials accountable for the efficient operation of transit systems. To help us do this, we now have a new generation of transit apps that can compare things like the scheduled arrival and departure times of trains with their actual arrival and departure times.
Not only does this give citizens transparency into how well their transit system is being run, it offers a pathway for engagement – by knowing which routes are not performing close to scheduled times, transit riders and others can offer suggestions for changes and improvements.
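
As a rough illustration of the schedule-versus-actual comparison described above, the sketch below computes the share of on-time arrivals per route. It is a minimal example under stated assumptions: the route names, the sample times, the five-minute tolerance, and the function names are all hypothetical and do not come from any real transit feed or from the apps the author mentions.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records: (route, scheduled arrival, actual arrival).
# Real feeds would supply these; the values here are made up for illustration.
OBSERVATIONS = [
    ("Route A", "08:00", "08:02"),
    ("Route A", "08:15", "08:24"),
    ("Route A", "08:30", "08:31"),
    ("Route B", "08:05", "08:05"),
    ("Route B", "08:20", "08:21"),
]

ON_TIME_MINUTES = 5  # assumed tolerance, not an official standard


def delay_minutes(scheduled: str, actual: str) -> float:
    """Return how many minutes late (positive) or early (negative) a trip was."""
    fmt = "%H:%M"
    delta = datetime.strptime(actual, fmt) - datetime.strptime(scheduled, fmt)
    return delta.total_seconds() / 60


def on_time_share(observations, tolerance=ON_TIME_MINUTES):
    """Map each route to the fraction of arrivals within the tolerance."""
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for route, scheduled, actual in observations:
        totals[route] += 1
        if abs(delay_minutes(scheduled, actual)) <= tolerance:
            on_time[route] += 1
    return {route: on_time[route] / totals[route] for route in totals}


if __name__ == "__main__":
    for route, share in sorted(on_time_share(OBSERVATIONS).items()):
        print(f"{route}: {share:.0%} on time")
```

A per-route share like this is one simple way to surface the routes "not performing close to scheduled times"; a real analysis would also need to handle trips that cross midnight and missing observations.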

A gateway to more open data

One of the most important things that transit apps can do is provide a pathway for more open data.
In Philadelphia, the city’s formal open data policy and the creation of an open data portal all followed after the efforts of a small group of developers working to obtain transit schedule data from the Southeastern Pennsylvania Transportation Authority (SEPTA). This group eventually built the region’s first transit app.
This small group pushed SEPTA to make their data open, and the Authority eventually embraced open data. This, in turn, raised the profile of open data with other city leaders and directly contributed to the adoption of an open data policy by the City of Philadelphia several years later. Without this simple transit app and the push for more open transit data, I don’t think this would have happened. Certainly not as soon as it did.
And it isn’t just big cities like Philadelphia. In Syracuse, NY – a small city with no tradition of civic hacking and no formal open data program – a group at a local hackathon decided that they wanted to build a platform for government open data.
The first data source they selected to focus on? Transit data. The first app they built? A transit app…”

The Art and Science of Data-driven Journalism


Alex Howard for the Tow Center for Digital Journalism: “Journalists have been using data in their stories for as long as the profession has existed. A revolution in computing in the 20th century created opportunities for data integration into investigations, as journalists began to bring technology into their work. In the 21st century, a revolution in connectivity is leading the media toward new horizons. The Internet, cloud computing, agile development, mobile devices, and open source software have transformed the practice of journalism, leading to the emergence of a new term: data journalism. Although journalists have been using data in their stories for as long as they have been engaged in reporting, data journalism is more than traditional journalism with more data. Decades after early pioneers successfully applied computer-assisted reporting and social science to investigative journalism, journalists are creating news apps and interactive features that help people understand data, explore it, and act upon the insights derived from it. New business models are emerging in which data is a raw material for profit, impact, and insight, co-created with an audience that was formerly reduced to passive consumption. Journalists around the world are grappling with the excitement and the challenge of telling compelling stories by harnessing the vast quantity of data that our increasingly networked lives, devices, businesses, and governments produce every day. While the potential of data journalism is immense, the pitfalls and challenges to its adoption throughout the media are similarly significant, from digital literacy to competition for scarce resources in newsrooms. Global threats to press freedom, digital security, and limited access to data create difficult working conditions for journalists in many countries.
A combination of peer-to-peer learning, mentorship, online training, open data initiatives, and new programs at journalism schools rising to the challenge, however, offer reasons to be optimistic about more journalists learning to treat data as a source. (Download the report)”

Reflections on How Designers Design With Data


Alex Bigelow, Steven Drucker, Danyel Fisher, and Miriah Meyer at Microsoft Research: “In recent years many popular data visualizations have emerged that are created largely by designers whose main area of expertise is not computer science. Designers generate these visualizations using a handful of design tools and environments. To better inform the development of tools intended for designers working with data, we set out to understand designers’ challenges and perspectives. We interviewed professional designers, conducted observations of designers working with data in the lab, and observed designers working with data in team settings in the wild. A set of patterns emerged from these observations from which we extract a number of themes that provide a new perspective on design considerations for visualization tool creators, as well as on known engineering problems.”

Let's amplify California's collective intelligence


Gavin Newsom and Ken Goldberg at the SFGate: “Although the results of last week’s primary election are still being certified, we already know that voter turnout was among the lowest in California’s history. Pundits will rant about the “cynical electorate” and wag a finger at disengaged voters shirking their democratic duties, but we see the low turnout as a symptom of broader forces that affect how people and government interact.
The methods used to find out what citizens think and believe are limited to elections, opinion polls, surveys and focus groups. These methods may produce valuable information, but they are costly, infrequent and often conducted at the convenience of government or special interests.
We believe that new technology has the potential to increase public engagement by tapping the collective intelligence of Californians every day, not just on election day.
While most politicians already use e-mail and social media, these channels are easily dominated by extreme views and tend to regurgitate material from mass media outlets.
We’re exploring an alternative.
The California Report Card is a mobile-friendly web-based platform that streamlines and organizes public input for the benefit of policymakers and elected officials. The report card allows participants to assign letter grades to key issues and to suggest new ideas for consideration; public officials then can use that information to inform their decisions.
In an experimental version of the report card released earlier this year, residents from all 58 counties assigned more than 20,000 grades to the state of California and also suggested issues they feel deserve priority at the state level. As one participant noted: “This platform allows us to have our voices heard. The ability to review and grade what others suggest is important. It enables elected officials to hear directly how Californians feel.”
Initial data confirm that Californians approve of our state’s rollout of Obamacare, but are very concerned about the future of our schools and universities.
There was also a surprise. California Report Card suggestions for top state priorities revealed consistently strong interest and support for more attention to disaster preparedness. Issues related to this topic were graded as highly important by a broad cross section of participants across the state. In response, we’re testing new versions of the report card that can focus on topics related to wildfires and earthquakes.
The report card is part of an ongoing collaboration between the CITRIS Data and Democracy Initiative at UC Berkeley and the Office of the Lieutenant Governor to explore how technology can improve public communication and bring the government closer to the people. Our hunch is that engineering concepts can be adapted for public policy to rapidly identify real insights from constituents and resist gaming by special interests.
You don’t have to wait for the next election to have your voice heard by officials in Sacramento. The California Report Card is now accessible from cell phones, desktop and tablet computers. We encourage you to contribute your own ideas to amplify California’s collective intelligence. It’s easy, just click “participate” on this website: CaliforniaReportCard.org”

Crowdsourcing and social search


At TechCrunch: “When we think of the sharing economy, what often comes to mind are sites like Airbnb, Lyft, or Feastly — the platforms that allow us to meet people for a specific reason, whether that’s a place to stay, a ride, or a meal.
But what about sharing something much simpler than that, like answers to our questions about the world around us? Sharing knowledge with strangers can offer us insight into a place we are curious about or trying to navigate, and in a more personal, efficient way than using traditional web searches.
“Sharing an answer or response to question, that is true sharing. There’s no financial or monetary exchange based on that. It’s the true meaning of [the word],” said Maxime Leroy, co-founder and CEO of a new app called Enquire.
Enquire is a new question-and-answer app, but it is unlike others in the space. You don’t have to log in via Facebook or Twitter, use SMS messaging like on Quest, or upload an image like you do on Jelly. None of these apps have taken off yet, which could be good or bad for Enquire just entering the space.
With Enquire, simply log in with a username and password and it will unlock the neighborhood you are in (the app only works in San Francisco, New York, and Paris right now). There are lists of answers to other questions, or you can post your own. If 200 people in a city sign up, the app will become available to them, which is an effort to make sure there is a strong community to gather answers from.
Leroy, who recently made a documentary about the sharing economy, realized there was “one tool missing for local communities” in the space, and decided to create this app.
“We want to build a more local-based network, and empower and increase trust without having people share all their identity,” he said.
Different social channels look at search in different ways, but the trend is definitely moving to more social searching or location-based searching, according to Altimeter social media analyst Rebecca Lieb. Arguably, she said, Yelp, Groupon, and even Google Maps are vertical search engines. If you want to find a nearby restaurant, pharmacy, or deal, you look to these platforms.
However, she credits Aardvark as one of the first in the space, which was a social search engine founded in 2007 that used instant messaging and email to get answers from your existing contacts. Google bought the company in 2010. It shows the idea of crowdsourcing answers isn’t new, but the engines have become “appified,” she said.
“Now it’s geo-local specific,” she said. “We’re asking a lot more of those geo-local questions because of location-based immediacy [that we want].”
Think Seamless, with which you find the food nearby that most satisfies your appetite. Even Tinder and Grindr are social search engines, Lieb said. You want to meet up with the people that are closest to you, geographically….
His challenge is to offer rewards to incite people to sign up for the app. Eventually, Leroy would like to strengthen the networks and scale Enquire to cities and neighborhoods all over the world. Once that’s in place, people can start creating their own neighborhoods — around a school or workplace, where they hang out regularly — instead of using the existing constraints.
“I may be an expert in one area, and a newbie in another. I want to emphasize the activity and content from users to give them credit to other users and build that trust,” he said.
Usually, our first instinct is to open Yelp to find the best sushi restaurant or Google to search the closest concert venue, and it will probably stay that way for some time. But the idea that the opinions and insights of other human beings, even of strangers, are becoming much more valuable because of the internet is not far-fetched.
Admit it: haven’t you had a fleeting thought of starting a Kickstarter campaign for an idea? Looked for a cheaper place to stay on Airbnb than that hotel you normally book in New York? Or considered financing someone’s business idea across the world using Kiva? If so, then you’ve engaged in social search.
Suddenly, crowdsourcing answers for the things that pique your interest on your morning walk may not seem so strange after all.”

Why Statistically Significant Studies Aren’t Necessarily Significant


Michael White in PSMagazine on how modern statistics have made it easier than ever for us to fool ourselves: “Scientific results often defy common sense. Sometimes this is because science deals with phenomena that occur on scales we don’t experience directly, like evolution over billions of years or molecules that span billionths of meters. Even when it comes to things that happen on scales we’re familiar with, scientists often draw counter-intuitive conclusions from subtle patterns in the data. Because these patterns are not obvious, researchers rely on statistics to distinguish the signal from the noise. Without the aid of statistics, it would be difficult to convincingly show that smoking causes cancer, that drugged bees can still find their way home, that hurricanes with female names are deadlier than ones with male names, or that some people have a precognitive sense for porn.
OK, very few scientists accept the existence of precognition. But Cornell psychologist Daryl Bem’s widely reported porn precognition study illustrates the thorny relationship between science, statistics, and common sense. While many criticisms were leveled against Bem’s study, in the end it became clear that the study did not suffer from an obvious killer flaw. If it hadn’t dealt with the paranormal, it’s unlikely that Bem’s work would have drawn much criticism. As one psychologist put it after explaining how the study went wrong, “I think Bem’s actually been relatively careful. The thing to remember is that this type of fudging isn’t unusual; to the contrary, it’s rampant–everyone does it. And that’s because it’s very difficult, and often outright impossible, to avoid.”…
That you can lie with statistics is well known; what is less commonly noted is how much scientists still struggle to define proper statistical procedures for handling the noisy data we collect in the real world. In an exchange published last month in the Proceedings of the National Academy of Sciences, statisticians argued over how to address the problem of false positive results, statistically significant findings that on further investigation don’t hold up. Non-reproducible results in science are a growing concern; so do researchers need to change their approach to statistics?
Valen Johnson, at Texas A&M University, argued that the commonly used threshold for statistical significance isn’t as stringent as scientists think it is, and therefore researchers should adopt a tighter threshold to better filter out spurious results. In reply, statisticians Andrew Gelman and Christian Robert argued that tighter thresholds won’t solve the problem; they simply “dodge the essential nature of any such rule, which is that it expresses a tradeoff between the risks of publishing misleading results and of important results being left unpublished.” The acceptable level of statistical significance should vary with the nature of the study. Another team of statisticians raised a similar point, arguing that a more stringent significance threshold would exacerbate the worrying publishing bias against negative results. Ultimately, good statistical decision making “depends on the magnitude of effects, the plausibility of scientific explanations of the mechanism, and the reproducibility of the findings by others.”
However, arguments over statistics usually occur because it is not always obvious how to make good statistical decisions. Some bad decisions are clear. As xkcd’s Randall Munroe illustrated in his comic on the spurious link between green jelly beans and acne, most people understand that if you keep testing slightly different versions of a hypothesis on the same set of data, sooner or later you’re likely to get a statistically significant result just by chance. This kind of statistical malpractice is called fishing or p-hacking, and most scientists know how to avoid it.
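
The arithmetic behind this kind of fishing can be sketched in a few lines. The example below is an illustration only, under stated assumptions: the tests are treated as independent, the null hypothesis is true in every one of them, the threshold is the conventional 0.05, and the count of twenty tests is an illustrative choice. The function names are hypothetical.

```python
import random

ALPHA = 0.05      # conventional significance threshold (assumed here)
NUM_TESTS = 20    # illustrative number of slightly different hypotheses


def chance_of_false_positive(num_tests: int, alpha: float = ALPHA) -> float:
    """Probability that at least one of num_tests independent tests of a
    true null hypothesis comes out 'significant' purely by chance."""
    return 1 - (1 - alpha) ** num_tests


def simulate(num_tests: int, trials: int = 20_000,
             alpha: float = ALPHA, seed: int = 0) -> float:
    """Estimate the same probability by simulation. Under a true null
    hypothesis a p-value is uniformly distributed on [0, 1], so each test
    is modelled as a single uniform random draw."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"Exact:     {chance_of_false_positive(NUM_TESTS):.3f}")
    print(f"Simulated: {simulate(NUM_TESTS):.3f}")
```

Under these assumptions, twenty tests give roughly a 64 percent chance of at least one spurious "significant" result, even though each individual test has only a 5 percent false positive rate.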
But there are more subtle forms of the problem that pervade the scientific literature. In an unpublished paper (PDF), statisticians Andrew Gelman, at Columbia University, and Eric Loken, at Penn State, argue that researchers who deliberately avoid p-hacking still unknowingly engage in a similar practice. The problem is that one scientific hypothesis can be translated into many different statistical hypotheses, with many chances for a spuriously significant result. After looking at their data, researchers decide which statistical hypothesis to test, but that decision is skewed by the data itself.
To see how this might happen, imagine a study designed to test the idea that green jellybeans cause acne. There are many ways the results could come out statistically significant in favor of the researchers’ hypothesis. Green jellybeans could cause acne in men, but not in women, or in women but not men. The results may be statistically significant if the jellybeans you call “green” include Lemon Lime, Kiwi, and Margarita but not Sour Apple. Gelman and Loken write that “researchers can perform a reasonable analysis given their assumptions and their data, but had the data turned out differently, they could have done other analyses that were just as reasonable in those circumstances.” In the end, the researchers may explicitly test only one or a few statistical hypotheses, but their decision-making process has already biased them toward the hypotheses most likely to be supported by their data. The result is “a sort of machine for producing and publicizing random patterns.”
Gelman and Loken are not alone in their concern. Last year Daniele Fanelli, at the University of Edinburgh, and John Ioannidis, at Stanford University, reported that many U.S. studies, particularly in the social sciences, may overestimate the effect sizes of their results. “All scientists have to make choices throughout a research project, from formulating the question to submitting results for publication.” These choices can be swayed “consciously or unconsciously, by scientists’ own beliefs, expectations, and wishes, and the most basic scientific desire is that of producing an important research finding.”
What is the solution? Part of the answer is to not let measures of statistical significance override our common sense—not our naïve common sense, but our scientifically-informed common sense…”

Behavioural Sciences in Practice: Lessons for EU Policymakers


Chapter by Di Porto, Fabiana and Rangone, Nicoletta, for the book by Anne-Lise Sibony and Alberto Alemanno (eds), on Nudging and the Law. What can EU Learn from Behavioural Sciences? (Forthcoming): “This chapter establishes how the regulatory process should change in order to bring out and use evidence from cognitive sciences. It further discusses the impact of cognitive sciences on the regulatory toolkit, positing that, on the one hand, traditional tools should be rethought; and, on the other, that the regulatory toolkit should be enriched by two more strategies: empowerment and nudging (where the first eases the overcoming of cognitive and behavioural limitations, while the second exploits them).”

The Field Guide to Data Science


Booz Allen Hamilton: “Data Science is the competitive advantage of the future for organizations interested in turning their data into a product through analytics. Industries from health, to national security, to finance, to energy can be improved by creating better data analytics through Data Science. The winners and the losers in the emerging data economy are going to be determined by their Data Science teams.
Booz Allen Hamilton created The Field Guide to Data Science to help organizations of all types and missions understand how to make use of data as a resource. The text spells out what Data Science is and why it matters to organizations as well as how to create Data Science teams. Along the way, our team of experts provides field-tested approaches, personal tips and tricks, and real-life case studies. Senior leaders will walk away with a deeper understanding of the concepts at the heart of Data Science. Practitioners will add to their toolboxes.
In The Field Guide to Data Science, our Booz Allen experts provide their insights in the following areas:

  • Start Here for the Basics provides an introduction to Data Science, including what makes Data Science unique from other analysis approaches. We will help you understand Data Science maturity within an organization and how to create a robust Data Science capability.
  • Take Off the Training Wheels is the practitioner’s guide to Data Science. We share our established processes, including our approach to decomposing complex Data Science problems, the Fractal Analytic Model. We conclude with the Guide to Analytic Selection to help you select the right analytic techniques to conquer your toughest challenges.
  • Life in the Trenches gives a first-hand account of life as a Data Scientist. We share insights on a variety of Data Science topics through illustrative case studies. We provide tips and tricks from our own experiences on these real-life analytic challenges.
  • Putting it All Together highlights our successes creating Data Science solutions for our clients. It follows several projects from data to insights and shows the impact Data Science can have on your organization…”

The Promise of a New Internet


Adrienne Lafrance in the Atlantic: “People tend to talk about the Internet the way they talk about democracy—optimistically, and in terms that describe how it ought to be rather than how it actually is.

This idealism is what buoys much of the network neutrality debate, and yet many of what are considered to be the core issues at stake—like payment for tiered access, for instance—have already been decided. For years, Internet advocates have been asking what regulatory measures might help save the open, innovation-friendly Internet.
But increasingly, another question comes up: What if there were a technical solution instead of a regulatory one? What if the core architecture of how people connect could make an end run on the centralization of services that has come to define the modern net?
It’s a question that reflects some of the Internet’s deepest cultural values, and the idea that this network—this place where you are right now—should distribute power to people. In the post-NSA, post-Internet-access-oligopoly world, more and more people are thinking this way, and many of them are actually doing something about it.
Among them, there is a technology that’s become a kind of shorthand code for a whole set of beliefs about the future of the Internet: “mesh networking.” These words have become a way to say that you believe in a different, freer Internet.
*  *  *
Mesh networks promise the things we already expect but don’t always get from the Internet: they’re fast, reliable, and relatively inexpensive. But before we get into the particulars of what this alternate Internet might look like, a quick refresher on how the one we have works:
Your computer is connected to an Internet service provider like Comcast, which sends packets of your data (the binary stuff of emails, tweets, Facebook status updates, web addresses, etc.) back and forth across the network. The packets that move across the Internet encounter a series of checkpoints including routers and servers along the paths your data travels. You can’t control these paths or these checkpoints, so your data is subject to all kinds of security threats like hackers and snooping NSA agents.
So the idea behind mesh networking is to skip those checkpoints and cut out the middleman service provider whenever possible. This can work when each device in a network connects to the other devices, rather than each device connecting to the ISP.
It helps to visualize it. The image on the left shows a network built around a centralized hub, like the Internet as we know it. The image on the right is what a mesh network looks like:

Think of it this way: With a mesh network, each device is like a mini cell phone tower. So instead of having multiple devices rely on a single, centralized hub; multiple devices rely on one another. And with information ricocheting across the network more unpredictably between those devices, the network as a whole is harder to take out.
“You end up with a network that is much harder to disrupt,” said Stanislav Shalunov, co-founder of Open Garden, a startup that develops peer-to-peer and mesh networking apps. “There is no single point where you can unplug and expect that there will be a large impact.”
Plus, a mesh network forms itself based on an algorithm—which again reduces opportunities for disruption. “There is no human intervention involved, even from the users of the devices and certainly not from any administrative entity that needs to arrange the topology of this network or how people are connected or how the network is used,” Shalunov told me. “It is entirely up to the people participating and the software that runs this network to make everything work.”
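
The resilience claim above can be illustrated with a small connectivity check. This is a minimal sketch under stated assumptions, not how any real mesh software works: the two toy topologies, the device names, and the function names are hypothetical, and the only question asked is whether the remaining devices can still reach one another after one device is removed.

```python
from collections import deque


def reachable(links, start, removed):
    """Return the set of devices reachable from start via breadth-first
    search, ignoring the removed device."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def survives_removal(links, removed):
    """True if all remaining devices can still reach one another after
    one device is taken out."""
    remaining = [n for n in links if n != removed]
    if not remaining:
        return True
    return reachable(links, remaining[0], removed) == set(remaining)


# Hub-and-spoke: every device talks only to the central hub.
HUB = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"],
}

# Mesh: each device links to its neighbors in a ring, plus one cross link.
MESH = {
    "a": ["b", "d", "c"],
    "b": ["a", "c"],
    "c": ["b", "d", "a"],
    "d": ["c", "a"],
}

if __name__ == "__main__":
    print("Hub network survives losing the hub:", survives_removal(HUB, "hub"))
    print("Mesh survives losing any one device:",
          all(survives_removal(MESH, node) for node in MESH))
```

In the hub layout, removing the hub cuts every device off from the others, while in this toy mesh no single removal disconnects the rest, which is the "no single point where you can unplug" property Shalunov describes.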

Your regular old smartphone already has the power to connect to other smartphones without being hooked up to the Internet through a traditional carrier. All you need is the radio frequency of your phone’s Bluetooth connection, and you can send and receive data over a mesh network from anyone in relatively close proximity—say, a person in the same neighborhood or office building. (Mesh networks can also be built around cheap wireless routers or roof antennae.)…
For now, there’s no nationwide device-to-device mesh network. So if you want to communicate with someone across the country, someone—but not everyone—in the mesh network will need to be connected to the Internet through a traditional provider. That’s true locally, too, if you want the mesh network hooked up to the rest of the Internet. Mesh networks are more reliable in a crowd because devices can rely on one another—rather than each device trying to ping the same overburdened cell phone tower. “The important thing is we can use any of the Internet connections that anybody in that mesh network is connected to,” Shalunov said. “So maybe you are connected to AT&T and I am connected to Comcast and my phone is on Verizon and there is a Sprint subscriber nearby. If any of these will let the traffic through, all of it will get through.”
* * *
Mesh networks have been around, at least theoretically, for at least as long as the Internet has existed…”

Cluster mapping


“The U.S. Cluster Mapping Project is a national economic initiative that provides open, interactive data to understand regional clusters and support business, innovation and policy in the United States. It is based at the Institute for Strategy and Competitiveness at Harvard Business School, with support from a number of partners and a federal grant from the U.S. Department of Commerce’s Economic Development Administration.
Research
The project provides a robust cluster mapping database grounded in the leading academic research. Professor Michael Porter pioneered the comprehensive mapping of clusters in the U.S. economy in the early 2000s. The research team from Harvard, MIT, and Temple used the latest Census and industry data to develop a new algorithm to define cluster categories that cover the entire U.S. economy. These categories enable comparative analyses of clusters across any region in the United States….
Impact
Research on the presence of regional clusters has recently oriented economic policy toward addressing the needs of clusters and mobilizing their potential. Four regional partners in Massachusetts, Minnesota, Oregon, and South Carolina produced a set of case studies that discuss how regions have organized economic policy around clusters. These cases form the core of a resource library that aims to disseminate insights and strengthen the community of practice in cluster-based economic development. The project will also take an international scope to benefit cross-border industries in North America and inform collective global dialogue around cluster-based economic development.”