Let's amplify California's collective intelligence


Gavin Newsom and Ken Goldberg at the SFGate: “Although the results of last week’s primary election are still being certified, we already know that voter turnout was among the lowest in California’s history. Pundits will rant about the “cynical electorate” and wag a finger at disengaged voters shirking their democratic duties, but we see the low turnout as a symptom of broader forces that affect how people and government interact.
The methods used to find out what citizens think and believe are limited to elections, opinion polls, surveys and focus groups. These methods may produce valuable information, but they are costly, infrequent and often conducted at the convenience of government or special interests.
We believe that new technology has the potential to increase public engagement by tapping the collective intelligence of Californians every day, not just on election day.
While most politicians already use e-mail and social media, these channels are easily dominated by extreme views and tend to regurgitate material from mass media outlets.
We’re exploring an alternative.
The California Report Card is a mobile-friendly web-based platform that streamlines and organizes public input for the benefit of policymakers and elected officials. The report card allows participants to assign letter grades to key issues and to suggest new ideas for consideration; public officials then can use that information to inform their decisions.
In an experimental version of the report card released earlier this year, residents from all 58 counties assigned more than 20,000 grades to the state of California and also suggested issues they feel deserve priority at the state level. As one participant noted: “This platform allows us to have our voices heard. The ability to review and grade what others suggest is important. It enables elected officials to hear directly how Californians feel.”
Initial data confirm that Californians approve of our state’s rollout of Obamacare, but are very concerned about the future of our schools and universities.
There was also a surprise. California Report Card suggestions for top state priorities revealed consistently strong interest and support for more attention to disaster preparedness. Issues related to this topic were graded as highly important by a broad cross section of participants across the state. In response, we’re testing new versions of the report card that can focus on topics related to wildfires and earthquakes.
The report card is part of an ongoing collaboration between the CITRIS Data and Democracy Initiative at UC Berkeley and the Office of the Lieutenant Governor to explore how technology can improve public communication and bring the government closer to the people. Our hunch is that engineering concepts can be adapted for public policy to rapidly identify real insights from constituents and resist gaming by special interests.
You don’t have to wait for the next election to have your voice heard by officials in Sacramento. The California Report Card is now accessible from cell phones, desktop and tablet computers. We encourage you to contribute your own ideas to amplify California’s collective intelligence. It’s easy, just click “participate” on this website: CaliforniaReportCard.org”

Crowdsourcing and social search


At TechCrunch: “When we think of the sharing economy, what often comes to mind are sites like Airbnb, Lyft, or Feastly — the platforms that allow us to meet people for a specific reason, whether that’s a place to stay, a ride, or a meal.
But what about sharing something much simpler than that, like answers to our questions about the world around us? Sharing knowledge with strangers can offer us insight into a place we are curious about or trying to navigate, and in a more personal, efficient way than using traditional web searches.
“Sharing an answer or response to a question, that is true sharing. There’s no financial or monetary exchange based on that. It’s the true meaning of [the word],” said Maxime Leroy, co-founder and CEO of a new app called Enquire.
Enquire is a new question-and-answer app, but it is unlike others in the space. You don’t have to log in via Facebook or Twitter, use SMS messaging like on Quest, or upload an image like you do on Jelly. None of these apps have taken off yet, which could be good or bad for Enquire just entering the space.
With Enquire, simply log in with a username and password and it will unlock the neighborhood you are in (the app only works in San Francisco, New York, and Paris right now). There are lists of answers to other questions, or you can post your own. If 200 people in a city sign up, the app will become available to them, which is an effort to make sure there is a strong community to gather answers from.
Leroy, who recently made a documentary about the sharing economy, realized there was “one tool missing for local communities” in the space, and decided to create this app.
“We want to build a more local-based network, and empower and increase trust without having people share all their identity,” he said.
Different social channels look at search in different ways, but the trend is definitely moving to more social searching or location-based searching, according to Altimeter social media analyst Rebecca Lieb. Arguably, she said, Yelp, Groupon, and even Google Maps are vertical search engines. If you want to find a nearby restaurant, pharmacy, or deal, you look to these platforms.
However, she credits Aardvark as one of the first in the space, which was a social search engine founded in 2007 that used instant messaging and email to get answers from your existing contacts. Google bought the company in 2010. It shows the idea of crowdsourcing answers isn’t new, but the engines have become “appified,” she said.
“Now it’s geo-local specific,” she said. “We’re asking a lot more of those geo-local questions because of location-based immediacy [that we want].”
Think Seamless, with which you find the food nearby that most satisfies your appetite. Even Tinder and Grindr are social search engines, Lieb said. You want to meet up with the people that are closest to you, geographically….
Leroy’s challenge is to offer rewards that entice people to sign up for the app. Eventually, he would like to strengthen the networks and scale Enquire to cities and neighborhoods all over the world. Once that’s in place, people can start creating their own neighborhoods — around a school or workplace, where they hang out regularly — instead of using the existing constraints.
“I may be an expert in one area, and a newbie in another. I want to emphasize the activity and content from users to give them credit to other users and build that trust,” he said.
Usually, our first instinct is to open Yelp to find the best sushi restaurant or Google to search the closest concert venue, and it will probably stay that way for some time. But the idea that the opinions and insights of other human beings, even of strangers, are becoming much more valuable because of the internet is not far-fetched.
Admit it: haven’t you had a fleeting thought of starting a Kickstarter campaign for an idea? Looked for a cheaper place to stay on Airbnb than that hotel you normally book in New York? Or considered financing someone’s business idea across the world using Kiva? If so, then you’ve engaged in social search.
Suddenly, crowdsourcing answers for the things that pique your interest on your morning walk may not seem so strange after all.”

Why Statistically Significant Studies Aren’t Necessarily Significant


Michael White in PSMagazine on how modern statistics have made it easier than ever for us to fool ourselves: “Scientific results often defy common sense. Sometimes this is because science deals with phenomena that occur on scales we don’t experience directly, like evolution over billions of years or molecules that span billionths of meters. Even when it comes to things that happen on scales we’re familiar with, scientists often draw counter-intuitive conclusions from subtle patterns in the data. Because these patterns are not obvious, researchers rely on statistics to distinguish the signal from the noise. Without the aid of statistics, it would be difficult to convincingly show that smoking causes cancer, that drugged bees can still find their way home, that hurricanes with female names are deadlier than ones with male names, or that some people have a precognitive sense for porn.
OK, very few scientists accept the existence of precognition. But Cornell psychologist Daryl Bem’s widely reported porn precognition study illustrates the thorny relationship between science, statistics, and common sense. While many criticisms were leveled against Bem’s study, in the end it became clear that the study did not suffer from an obvious killer flaw. If it hadn’t dealt with the paranormal, it’s unlikely that Bem’s work would have drawn much criticism. As one psychologist put it after explaining how the study went wrong, “I think Bem’s actually been relatively careful. The thing to remember is that this type of fudging isn’t unusual; to the contrary, it’s rampant–everyone does it. And that’s because it’s very difficult, and often outright impossible, to avoid.”…
That you can lie with statistics is well known; what is less commonly noted is how much scientists still struggle to define proper statistical procedures for handling the noisy data we collect in the real world. In an exchange published last month in the Proceedings of the National Academy of Sciences, statisticians argued over how to address the problem of false positive results, statistically significant findings that on further investigation don’t hold up. Non-reproducible results in science are a growing concern; so do researchers need to change their approach to statistics?
Valen Johnson, at Texas A&M University, argued that the commonly used threshold for statistical significance isn’t as stringent as scientists think it is, and therefore researchers should adopt a tighter threshold to better filter out spurious results. In reply, statisticians Andrew Gelman and Christian Robert argued that tighter thresholds won’t solve the problem; they simply “dodge the essential nature of any such rule, which is that it expresses a tradeoff between the risks of publishing misleading results and of important results being left unpublished.” The acceptable level of statistical significance should vary with the nature of the study. Another team of statisticians raised a similar point, arguing that a more stringent significance threshold would exacerbate the worrying publishing bias against negative results. Ultimately, good statistical decision making “depends on the magnitude of effects, the plausibility of scientific explanations of the mechanism, and the reproducibility of the findings by others.”
However, arguments over statistics usually occur because it is not always obvious how to make good statistical decisions. Some bad decisions are clear. As xkcd’s Randall Munroe illustrated in his comic on the spurious link between green jelly beans and acne, most people understand that if you keep testing slightly different versions of a hypothesis on the same set of data, sooner or later you’re likely to get a statistically significant result just by chance. This kind of statistical malpractice is called fishing or p-hacking, and most scientists know how to avoid it.
But there are more subtle forms of the problem that pervade the scientific literature. In an unpublished paper (PDF), statisticians Andrew Gelman, at Columbia University, and Eric Loken, at Penn State, argue that researchers who deliberately avoid p-hacking still unknowingly engage in a similar practice. The problem is that one scientific hypothesis can be translated into many different statistical hypotheses, with many chances for a spuriously significant result. After looking at their data, researchers decide which statistical hypothesis to test, but that decision is skewed by the data itself.
To see how this might happen, imagine a study designed to test the idea that green jellybeans cause acne. There are many ways the results could come out statistically significant in favor of the researchers’ hypothesis. Green jellybeans could cause acne in men, but not in women, or in women but not men. The results may be statistically significant if the jellybeans you call “green” include Lemon Lime, Kiwi, and Margarita but not Sour Apple. Gelman and Loken write that “researchers can perform a reasonable analysis given their assumptions and their data, but had the data turned out differently, they could have done other analyses that were just as reasonable in those circumstances.” In the end, the researchers may explicitly test only one or a few statistical hypotheses, but their decision-making process has already biased them toward the hypotheses most likely to be supported by their data. The result is “a sort of machine for producing and publicizing random patterns.”
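To make the jelly-bean example concrete, here is a minimal simulation (mine, not from Gelman and Loken) of what happens when pure noise is sliced into enough subgroups: with ten looks you already have roughly a 40 percent chance of at least one spuriously “significant” result at p < 0.05.
```python
# Minimal sketch: test a made-up "green jelly beans cause acne" effect on
# pure noise, once per subgroup, and count the spurious p < 0.05 results.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
subgroups = ["men", "women", "lemon-lime", "kiwi", "margarita",
             "sour-apple", "teens", "adults", "urban", "rural"]

spurious = []
for name in subgroups:
    eaters = rng.normal(0, 1, 100)     # acne scores, jelly-bean eaters (noise)
    control = rng.normal(0, 1, 100)    # acne scores, controls (same noise)
    _, p = stats.ttest_ind(eaters, control)
    if p < 0.05:
        spurious.append((name, round(p, 3)))

# With 10 subgroup tests the chance of at least one false "hit" is
# 1 - 0.95**10, about 40%; with 20 tests it is about 64%.
print(spurious)
```
Gelman and Loken’s point is that even a researcher who runs only one of these tests can end up in the same place, because the choice of which test to run was itself guided by the data.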
Gelman and Loken are not alone in their concern. Last year Daniele Fanelli, at the University of Edinburgh, and John Ioannidis, at Stanford University, reported that many U.S. studies, particularly in the social sciences, may overestimate the effect sizes of their results. “All scientists have to make choices throughout a research project, from formulating the question to submitting results for publication.” These choices can be swayed “consciously or unconsciously, by scientists’ own beliefs, expectations, and wishes, and the most basic scientific desire is that of producing an important research finding.”
What is the solution? Part of the answer is to not let measures of statistical significance override our common sense—not our naïve common sense, but our scientifically-informed common sense…”

Behavioural Sciences in Practice: Lessons for EU Policymakers


Chapter by Di Porto, Fabiana and Rangone, Nicoletta, for the book by Anne-Lise Sibony and Alberto Alemanno (eds), on Nudging and the Law. What can EU Learn from Behavioural Sciences? (Forthcoming): “This chapter establishes how the regulatory process should change in order to bring out and use evidence from cognitive sciences. It further discusses the impact of cognitive sciences on the regulatory toolkit, positing that, on the one hand, traditional tools should be rethought; and, on the other, that the regulatory toolkit should be enriched by two more strategies: empowerment and nudging (where the first eases the overcoming of cognitive and behavioural limitations, while the second exploits them).”

The Field Guide to Data Science


Booz Allen Hamilton: “Data Science is the competitive advantage of the future for organizations interested in turning their data into a product through analytics. Industries from health, to national security, to finance, to energy can be improved by creating better data analytics through Data Science. The winners and the losers in the emerging data economy are going to be determined by their Data Science teams.
Booz Allen Hamilton created The Field Guide to Data Science to help organizations of all types and missions understand how to make use of data as a resource. The text spells out what Data Science is and why it matters to organizations as well as how to create Data Science teams. Along the way, our team of experts provides field-tested approaches, personal tips and tricks, and real-life case studies. Senior leaders will walk away with a deeper understanding of the concepts at the heart of Data Science. Practitioners will add to their toolboxes.
In The Field Guide to Data Science, our Booz Allen experts provide their insights in the following areas:

  • Start Here for the Basics provides an introduction to Data Science, including what makes Data Science unique from other analysis approaches. We will help you understand Data Science maturity within an organization and how to create a robust Data Science capability.
  • Take Off the Training Wheels is the practitioner’s guide to Data Science. We share our established processes, including our approach to decomposing complex Data Science problems, the Fractal Analytic Model. We conclude with the Guide to Analytic Selection to help you select the right analytic techniques to conquer your toughest challenges.
  • Life in the Trenches gives a first hand account of life as a Data Scientist. We share insights on a variety of Data Science topics through illustrative case studies. We provide tips and tricks from our own experiences on these real-life analytic challenges.
  • Putting it All Together highlights our successes creating Data Science solutions for our clients. It follows several projects from data to insights so you can see the impact Data Science can have on your organization…”

The Promise of a New Internet


Adrienne LaFrance in The Atlantic: “People tend to talk about the Internet the way they talk about democracy—optimistically, and in terms that describe how it ought to be rather than how it actually is.

This idealism is what buoys much of the network neutrality debate, and yet many of what are considered to be the core issues at stake—like payment for tiered access, for instance—have already been decided. For years, Internet advocates have been asking what regulatory measures might help save the open, innovation-friendly Internet.
But increasingly, another question comes up: What if there were a technical solution instead of a regulatory one? What if the core architecture of how people connect could make an end run on the centralization of services that has come to define the modern net?
It’s a question that reflects some of the Internet’s deepest cultural values, and the idea that this network—this place where you are right now—should distribute power to people. In the post-NSA, post-Internet-access-oligopoly world, more and more people are thinking this way, and many of them are actually doing something about it.
Among them, there is a technology that’s become a kind of shorthand code for a whole set of beliefs about the future of the Internet: “mesh networking.” These words have become a way to say that you believe in a different, freer Internet.
*  *  *
Mesh networks promise the things we already expect but don’t always get from the Internet: they’re fast, reliable, and relatively inexpensive. But before we get into the particulars of what this alternate Internet might look like, a quick refresher on how the one we have works:
Your computer is connected to an Internet service provider like Comcast, which sends packets of your data (the binary stuff of emails, tweets, Facebook status updates, web addresses, etc.) back and forth across the network. The packets that move across the Internet encounter a series of checkpoints including routers and servers along the paths your data travels. You can’t control these paths or these checkpoints, so your data is subject to all kinds of security threats like hackers and snooping NSA agents.
So the idea behind mesh networking is to skip those checkpoints and cut out the middleman service provider whenever possible. This can work when each device in a network connects to the other devices, rather than each device connecting to the ISP.
It helps to visualize it. The image on the left shows a network built around a centralized hub, like the Internet as we know it. The image on the right is what a mesh network looks like:

Think of it this way: With a mesh network, each device is like a mini cell phone tower. So instead of multiple devices relying on a single, centralized hub, they rely on one another. And with information ricocheting across the network more unpredictably between those devices, the network as a whole is harder to take out.
“You end up with a network that is much harder to disrupt,” said Stanislav Shalunov, co-founder of Open Garden, a startup that develops peer-to-peer and mesh networking apps. “There is no single point where you can unplug and expect that there will be a large impact.”
Plus, a mesh network forms itself based on an algorithm—which again reduces opportunities for disruption. “There is no human intervention involved, even from the users of the devices and certainly not from any administrative entity that needs to arrange the topology of this network or how people are connected or how the network is used,” Shalunov told me. “It is entirely up to the people participating and the software that runs this network to make everything work.”
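To make the self-forming, multi-hop idea concrete, here is a toy model in Python. It illustrates the general concept, not Open Garden’s actual protocol: each phone records which peers it can hear, and a message is relayed hop by hop until it reaches any node that happens to have its own Internet uplink (the names and topology below are made up).
```python
# Toy mesh model (illustrative only): nodes link to whatever peers are in
# radio range, and a message is relayed hop by hop until it reaches any
# node that has its own Internet uplink.
from collections import deque

links = {                      # who can hear whom over Bluetooth/Wi-Fi
    "alice": ["bob"],
    "bob":   ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave":  ["carol"],
}
has_uplink = {"alice": False, "bob": False, "carol": False, "dave": True}

def route_to_internet(source):
    """Breadth-first search from `source`; return the first path that
    reaches a node with an uplink, or None if the mesh is isolated."""
    queue, seen = deque([[source]]), {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if has_uplink[node]:
            return path        # e.g. ['alice', 'bob', 'carol', 'dave']
        for peer in links[node]:
            if peer not in seen:
                seen.add(peer)
                queue.append(path + [peer])
    return None

print(route_to_internet("alice"))
```
Because any node with an uplink can serve as the exit point, unplugging a single device rarely cuts the rest of the mesh off, which is the resilience Shalunov describes.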

Your regular old smartphone already has the power to connect to other smartphones without being hooked up to the Internet through a traditional carrier. All you need is the radio frequency of your phone’s bluetooth connection, and you can send and receive data over a mesh network from anyone in relatively close proximity—say, a person in the same neighborhood or office building. (Mesh networks can also be built around cheap wireless routers or roof antennae.)…
For now, there’s no nationwide device-to-device mesh network. So if you want to communicate with someone across the country,  someone—but not everyone—in the mesh network will need to be connected to the Internet through a traditional provider. That’s true locally, too, if you want the mesh network hooked up to the rest of the Internet. Mesh networks are more reliable in a crowd because devices can rely on one another—rather than each device trying to ping the same overburdened cell phone tower. “The important thing is we can use any of the Internet connections that anybody in that mesh network is connected to,” Shalunov said. “So maybe you are connected to AT&T and I am connected to Comcast and my phone is on Verizon and there is a Sprint subscriber nearby. If any of these will let the traffic through, all of it will get through.”
* * *
Mesh networks have been around, at least theoretically, for at least as long as the Internet has existed…”

Cluster mapping


“The U.S. Cluster Mapping Project is a national economic initiative that provides open, interactive data to understand regional clusters and support business, innovation and policy in the United States. It is based at the Institute for Strategy and Competitiveness at Harvard Business School, with support from a number of partners and a federal grant from the U.S. Department of Commerce’s Economic Development Administration.
Research
The project provides a robust cluster mapping database grounded in the leading academic research. Professor Michael Porter pioneered the comprehensive mapping of clusters in the U.S. economy in the early 2000s. The research team from Harvard, MIT, and Temple used the latest Census and industry data to develop a new algorithm to define cluster categories that cover the entire U.S. economy. These categories enable comparative analyses of clusters across any region in the United States….
Impact
Research on the presence of regional clusters has recently oriented economic policy toward addressing the needs of clusters and mobilizing their potential. Four regional partners in Massachusetts, Minnesota, Oregon, and South Carolina produced a set of case studies that discuss how regions have organized economic policy around clusters. These cases form the core of a resource library that aims to disseminate insights and strengthen the community of practice in cluster-based economic development. The project will also take an international scope to benefit cross-border industries in North America and inform collective global dialogue around cluster-based economic development.”

Can social media make every civil servant an innovator?


Steve Kelman at FCW: “Innovation, particularly in government, can be very hard. Lots of signoffs, lots of naysayers. For many, it’s probably not worth the hassle.
Yet all sorts of examples are surfacing about ways civil servants, non-profits, startups and researchers have thought to use social media — or data mining of government information — to get information that can either help citizens directly or help agencies serve citizens. I want to call attention to examples that I’ve seen just in the past few weeks — partly to recognize the creative people who have come up with these ideas, but partly to make a point about the relationship between these ideas and the general issue of innovation in government. I think that these social media and data-driven experiments are often a much simpler way for civil servants to innovate than many of the changes we typically think of under the heading “innovation in government.” They open the possibility to make innovation in government an activity for the civil service masses.
One example that was reported in The New York Times was about a pilot project at the New York City Department of Health and Mental Hygiene to do rapid keyword searches with phrases such as “vomit” and “diarrhea” across 294,000 Yelp restaurant reviews in New York City. The city is using a software program developed at Columbia University. They have now expanded the monitoring to occur daily, to get quick information on possible problems at specific restaurants or with specific kinds of food.
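The Times piece doesn’t detail Columbia’s software, but the basic mechanism, flagging reviews that contain illness-related keywords so inspectors can follow up, can be sketched in a few lines of Python (the review text below is invented, not real Yelp data):
```python
# Hedged sketch of keyword-based triage, not the Columbia system itself.
import re

ILLNESS_TERMS = {"vomit", "vomiting", "diarrhea", "food poisoning", "sick"}

def flag_review(review_text):
    """Return the illness-related terms found in one review, if any."""
    text = review_text.lower()
    return {t for t in ILLNESS_TERMS if re.search(r"\b" + re.escape(t) + r"\b", text)}

reviews = [
    ("Luigi's", "Great pasta, friendly staff."),
    ("Clam Shack", "My whole table got food poisoning and was vomiting all night."),
]
for restaurant, text in reviews:
    hits = flag_review(text)
    if hits:
        print(restaurant, "->", sorted(hits))   # candidates for inspection follow-up
```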
A second example, reported in Bloomberg Businessweek, involved — perhaps not surprisingly, given the publication — an Israeli startup called Treato that is applying a similar idea to ferreting out adverse drug reactions before they come in through FDA studies and other systems. The founders are cooperating with researchers at Harvard Medical School and FDA officials, among others. Their software looks through Twitter and Facebook, along with a large number of patient forum sites, to cull from all the reports of illness the incidents that may well reflect an unusual presence of adverse drug reactions.
These examples are fascinating in themselves. But one thing that caught my eye about both is that each seems high on the creativity dimension and low on the need-to-overcome-bureaucracy dimension. Both ideas reflect new and improved ways to do what these organizations do anyway, which is gather information to help inform regulatory and health decisions by government. Neither requires any upheaval in an agency’s existing culture, or steps on somebody’s turf in any serious way. Introducing the changes doesn’t require major changes in an agency’s internal procedures. Compared to many innovations in government, these are easy ones to make happen. (They do all need some funds, however.)
What I hope is that the information woven into social media will unlock a new era of innovation inside government. The limits of innovation are much less determined by difficult-to-change bureaucratic processes and can be much more responsive to an individual civil servant’s creativity…”

How NYC Open Data and Reddit Saved New Yorkers Over $55,000 a Year


IQuantNY: “NYC generates an enormous amount of data each year, and for the most part, it stays behind closed doors.  But thanks to the Open Data movement, signed into law by Bloomberg in 2012 and championed over the last several years by Borough President Gale Brewer, along with other council members, we now get to see a small slice of what the city knows. And that slice is growing.
There have been some detractors along the way; a senior attorney for the NYPD said in 2012 during a council hearing that releasing NYPD data in csv format was a problem because they were “concerned with the integrity of the data itself” and because “data could be manipulated by people who want ‘to make a point’ of some sort”.  But our democracy is built on the idea of free speech; we let all the information out and then let reason lead the way.
In some ways, Open Data adds another check and balance into government: its citizens.  I’ve watched the perfect example of this check work itself out over the past month.  You may have caught my post that used parking ticket data to identify the fire hydrant in New York City that was generating the most income for the city in the form of fines: $33,000 a year.  And on the next block, the second most profitable hydrant was generating $24,000 a year.  That’s two consecutive blocks with hydrants generating over $55,000 a year. But there was a problem.  In my post, I laid out why these two parking spots were extremely confusing and basically seemed like a trap; there was a wide “curb extension” between the street and the hydrant, making it appear like the hydrant was not by the street.  Additionally, the DOT had painted parking spots right where you would be fined if you parked.
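The analysis behind that finding is simple once the ticket data is open: filter to hydrant violations, group by location, and sum the fines. A hedged pandas sketch (the file and column names are assumptions, not the actual NYC Open Data schema):
```python
# Sketch only: file and column names are assumed, not the real schema.
import pandas as pd

tickets = pd.read_csv("parking_violations.csv")   # hypothetical NYC Open Data export

# Keep hydrant violations and rank locations by total fines.
hydrants = tickets[tickets["violation_description"] == "FIRE HYDRANT"]
ranking = (
    hydrants.groupby(["street_name", "house_number"])["fine_amount"]
            .agg(["sum", "count"])
            .sort_values("sum", ascending=False)
)
print(ranking.head(10))   # the top rows are the city's most "profitable" hydrants
```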
Once the data was out there, the hydrant took on a life of its own. First, it rose to the top of the nyc subreddit. That is basically one way that the internet voted that this is in fact “interesting”. And that is how things go from small to big. From there, it travelled to the New York Observer, which was able to get a comment from the DOT. After that, it appeared in the New York Post, the post was republished in Gothamist, and finally it even went global in the Daily Mail.
I guess the pressure was on the DOT at this point, as each media source reached out for comment, but what struck me was their response to the Observer:

“While DOT has not received any complaints about this location, we will review the roadway markings and make any appropriate alterations”

Why does someone have to complain in order for the DOT to see problems like this?  In fact, the DOT just redesigned every parking sign in New York because some of the old ones were considered confusing.  But if this hydrant was news to them, it implies that they did not utilize the very strongest source of measuring confusion on our streets: NYC parking tickets….”

How to Make Government Data Sites Better


Flowing Data: “Accessing government data from the source is frustrating. If you’ve done it, or at least tried to, you know the pain that is oddly formatted files, search that doesn’t work, and annotation that tells you nothing about the data in front of you.
The most frustrating part of the process is knowing how useful the data could be if only it were shared more simply. Unfortunately, ease-of-use is rarely the case, and we spend more time formatting and inspecting the data than we do actually putting it to use. Shouldn’t it be the other way around?
It’s this painstaking process that draws so much ire. It’s hard not to complain.
Maybe the people in charge of these sites just don’t know what’s going on. Or maybe they’re so overwhelmed by suck that they don’t know where to start. Or they’re unknowingly infected by the that-is-how-we’ve-always-done-it bug.
Whatever it may be, I need to think out loud about how to improve these sites. Empty complaints don’t help.
I use the Centers for Disease Control and Prevention as the test subject, but most of the things covered should easily generalize to other government sites (and non-government ones too). And I choose CDC not because they’re the worst but because they publish a lot of data that is of immediate and direct use to the general public.
I approach this from the point of view of someone who uses government data, beyond pulling a single data point from a spreadsheet. I’m also going to put on my Captain Obvious hat, because what seems obvious to some is apparently a black box to others.
Provide a useable data format
Sometimes it feels like government data is available in every format except the one that data users want. The worst one was when I downloaded a 2GB file, and upon unzipping it, I discovered it was an EXE file.
Data in PDF format is a kick in the face for people looking for CSV files. There might be ways to get the data out from PDFs, but it’s still a pain when you have more than a handful of files….
Useable data format is the most important, and if there’s just one thing you change, make it this.
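To see the difference in practice: a CSV is one line of work, while a PDF forces an extraction step that may or may not preserve the table. A hedged sketch using the third-party pdfplumber library, one of several options; the file names are placeholders:
```python
# CSV: one line and you have a usable table.
import pandas as pd
df = pd.read_csv("mortality_2013.csv")             # placeholder file name

# PDF: extract tables page by page and hope the rows and headers survive.
import pdfplumber
with pdfplumber.open("mortality_2013.pdf") as pdf:  # placeholder file name
    rows = []
    for page in pdf.pages:
        table = page.extract_table()                # may return None or misaligned cells
        if table:
            rows.extend(table)
df_pdf = pd.DataFrame(rows[1:], columns=rows[0]) if rows else pd.DataFrame()
```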
(Raw data is fine too)
It’s rare to find raw government data, so it’s like striking gold when it actually happens. I realize you run into issues with data privacy, quality, missing data, etc. For these data sources, I appreciate the estimates with standard errors. However, the less aggregated (the more raw) you can provide, the better.
CSV for that too, please.
Never mind the fancy sharing tools
Not all government data is wedged into PDF files, and some of it is accessible via export tools that let you subset and layout your data exactly how you want it. The problem is that in an effort to please everyone, you end up with a tool shown on the left….
Tell people where to get the data
Get the things above done, and your government data site is exponentially better than it was before, but let’s keep going.
The navigation process to get to a dataset is incredibly convoluted, which makes it hard to find data and difficult to return to it….
Show visual previews
I’m all for visualization integrated with the data search tools. It always sucks when I spend time formatting data only to find that it wasn’t worth my time. Census Reporter is a fine example of how this might work.
That said, visual tools plus an upgrade to the previously mentioned things is a big undertaking, especially if you’re going to do it right. So I’m perfectly fine if you skip this step to focus your resources on data that’s easier to use and download. Leave the visualizing and analysis to us.
Decide what’s important, archive the rest
So much cruft. So many old documents. Broken links. Create an archive and highlight what people come to your site for.
Wrapping up
There’s plenty more stuff to update, especially once you start to work with the details, but this should be a good place to start. It’s a lot easier to point out what you can do to improve government data sharing than it is to actually do it of course. There are so many people, policies, and oh yes, politics, that it can be hard to change.”