Michael White in Pacific Standard magazine on how modern statistics have made it easier than ever for us to fool ourselves: “Scientific results often defy common sense. Sometimes this is because science deals with phenomena that occur on scales we don’t experience directly, like evolution over billions of years or molecules that span billionths of meters. Even when it comes to things that happen on scales we’re familiar with, scientists often draw counter-intuitive conclusions from subtle patterns in the data. Because these patterns are not obvious, researchers rely on statistics to distinguish the signal from the noise. Without the aid of statistics, it would be difficult to convincingly show that smoking causes cancer, that drugged bees can still find their way home, that hurricanes with female names are deadlier than ones with male names, or that some people have a precognitive sense for porn.
OK, very few scientists accept the existence of precognition. But Cornell psychologist Daryl Bem’s widely reported porn precognition study illustrates the thorny relationship between science, statistics, and common sense. While many criticisms were leveled against Bem’s study, in the end it became clear that the study did not suffer from an obvious killer flaw. If it hadn’t dealt with the paranormal, it’s unlikely that Bem’s work would have drawn much criticism. As one psychologist put it after explaining how the study went wrong, “I think Bem’s actually been relatively careful. The thing to remember is that this type of fudging isn’t unusual; to the contrary, it’s rampant–everyone does it. And that’s because it’s very difficult, and often outright impossible, to avoid.”…
That you can lie with statistics is well known; what is less commonly noted is how much scientists still struggle to define proper statistical procedures for handling the noisy data we collect in the real world. In an exchange published last month in the Proceedings of the National Academy of Sciences, statisticians argued over how to address the problem of false positive results, statistically significant findings that on further investigation don’t hold up. Non-reproducible results in science are a growing concern; so do researchers need to change their approach to statistics?
Valen Johnson, at Texas A&M University, argued that the commonly used threshold for statistical significance isn’t as stringent as scientists think it is, and therefore researchers should adopt a tighter threshold to better filter out spurious results. In reply, statisticians Andrew Gelman and Christian Robert argued that tighter thresholds won’t solve the problem; they simply “dodge the essential nature of any such rule, which is that it expresses a tradeoff between the risks of publishing misleading results and of important results being left unpublished.” The acceptable level of statistical significance should vary with the nature of the study. Another team of statisticians raised a similar point, arguing that a more stringent significance threshold would exacerbate the worrying publishing bias against negative results. Ultimately, good statistical decision making “depends on the magnitude of effects, the plausibility of scientific explanations of the mechanism, and the reproducibility of the findings by others.”
However, arguments over statistics usually occur because it is not always obvious how to make good statistical decisions. Some bad decisions are clear. As xkcd’s Randall Munroe illustrated in his comic on the spurious link between green jelly beans and acne, most people understand that if you keep testing slightly different versions of a hypothesis on the same set of data, sooner or later you’re likely to get a statistically significant result just by chance. This kind of statistical malpractice is called fishing or p-hacking, and most scientists know how to avoid it.
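The comic’s arithmetic is worth spelling out: run twenty independent tests on pure noise at the p < 0.05 threshold and the chance of at least one “significant” hit is 1 - 0.95^20, or about 64%. A minimal simulation of the jelly-bean scenario (a hypothetical sketch, assuming NumPy and SciPy are installed):

```python
# Simulate the xkcd jelly-bean scenario: 20 independent tests on pure noise.
# Hypothetical illustration; assumes numpy and scipy are installed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 10_000   # repeated "studies"
n_colors = 20            # jelly bean colors tested per study
n_subjects = 50          # subjects per group

false_positive_studies = 0
for _ in range(n_experiments):
    p_values = []
    for _ in range(n_colors):
        treated = rng.normal(0, 1, n_subjects)   # acne score, jelly-bean group
        control = rng.normal(0, 1, n_subjects)   # acne score, control group
        _, p = stats.ttest_ind(treated, control)
        p_values.append(p)
    if min(p_values) < 0.05:
        false_positive_studies += 1

print(false_positive_studies / n_experiments)  # ~0.64, i.e. 1 - 0.95**20
```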
But there are more subtle forms of the problem that pervade the scientific literature. In an unpublished paper (PDF), statisticians Andrew Gelman, at Columbia University, and Eric Loken, at Penn State, argue that researchers who deliberately avoid p-hacking still unknowingly engage in a similar practice. The problem is that one scientific hypothesis can be translated into many different statistical hypotheses, with many chances for a spuriously significant result. After looking at their data, researchers decide which statistical hypothesis to test, but that decision is skewed by the data itself.
To see how this might happen, imagine a study designed to test the idea that green jelly beans cause acne. There are many ways the results could come out statistically significant in favor of the researchers’ hypothesis. Green jelly beans could cause acne in men, but not in women, or in women but not men. The results may be statistically significant if the jelly beans you call “green” include Lemon Lime, Kiwi, and Margarita but not Sour Apple. Gelman and Loken write that “researchers can perform a reasonable analysis given their assumptions and their data, but had the data turned out differently, they could have done other analyses that were just as reasonable in those circumstances.” In the end, the researchers may explicitly test only one or a few statistical hypotheses, but their decision-making process has already biased them toward the hypotheses most likely to be supported by their data. The result is “a sort of machine for producing and publicizing random patterns.”
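Gelman and Loken’s point is subtler than the comic: even a researcher who formally runs just one test has already “forked” if the choice of test was guided by the data. A hypothetical sketch of the men-versus-women fork described above (again assuming NumPy and SciPy), in which only the stronger-looking subgroup is ever tested:

```python
# Garden of forking paths: formally test ONE hypothesis, but pick which
# one after peeking at the data. Hypothetical sketch using numpy/scipy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments, n_per_group = 10_000, 50
significant = 0

for _ in range(n_experiments):
    # Pure noise: jelly beans have no effect on acne in either sex.
    men_treated, men_control = rng.normal(0, 1, (2, n_per_group))
    women_treated, women_control = rng.normal(0, 1, (2, n_per_group))

    # Peek at the data, then test only the subgroup that "looks" stronger.
    gap_men = abs(men_treated.mean() - men_control.mean())
    gap_women = abs(women_treated.mean() - women_control.mean())
    if gap_men > gap_women:
        _, p = stats.ttest_ind(men_treated, men_control)
    else:
        _, p = stats.ttest_ind(women_treated, women_control)
    if p < 0.05:
        significant += 1

print(significant / n_experiments)  # roughly double the nominal 0.05
```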
Gelman and Loken are not alone in their concern. Last year Daniele Fanelli, at the University of Edinburgh, and John Ioannidis, at Stanford University, reported that many U.S. studies, particularly in the social sciences, may overestimate the effect sizes of their results. “All scientists have to make choices throughout a research project, from formulating the question to submitting results for publication.” These choices can be swayed “consciously or unconsciously, by scientists’ own beliefs, expectations, and wishes, and the most basic scientific desire is that of producing an important research finding.”
What is the solution? Part of the answer is to not let measures of statistical significance override our common sense—not our naïve common sense, but our scientifically-informed common sense…”
Selected Readings on Crowdsourcing Tasks and Peer Production
The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of crowdsourcing was originally published in 2014.
Technological advances are creating a new paradigm by which institutions and organizations are increasingly outsourcing tasks to an open community, allocating specific needs to a flexible, willing and dispersed workforce. “Microtasking” platforms like Amazon’s Mechanical Turk are a burgeoning source of income for individuals who contribute their time, skills and knowledge on a per-task basis. In parallel, citizen science projects – task-based initiatives in which citizens of any background can help contribute to scientific research – like Galaxy Zoo are demonstrating the ability of lay and expert citizens alike to make small, useful contributions to aid large, complex undertakings. As governing institutions seek to do more with less, looking to the success of citizen science and microtasking initiatives could provide a blueprint for engaging citizens to help accomplish difficult, time-consuming objectives at little cost. Moreover, the incredible success of peer-production projects – best exemplified by Wikipedia – instills optimism regarding the public’s willingness and ability to complete relatively small tasks that feed into a greater whole and benefit the public good. You can learn more about this new wave of “collective intelligence” by following the MIT Center for Collective Intelligence and their annual Collective Intelligence Conference.
Selected Reading List (in alphabetical order)
- Yochai Benkler — The Wealth of Networks: How Social Production Transforms Markets and Freedom — a book on the ways commons-based peer-production is transforming modern society.
- Daren C. Brabham — Using Crowdsourcing in Government — a report describing the diverse ways crowdsourcing could be better utilized by governments, including by leveraging microtasking platforms.
- Kevin J. Boudreau, Patrick Gaule, Karim Lakhani, Christoph Riedl, Anita Williams Woolley – From Crowds to Collaborators: Initiating Effort & Catalyzing Interactions Among Online Creative Workers – a working paper exploring the conditions, including incentives, that affect online collaboration.
- Chiara Franzoni and Henry Sauermann — Crowd Science: The Organization of Scientific Research in Open Collaborative Projects — a paper describing the potential advantages of deploying crowd science in a variety of contexts.
- Aniket Kittur, Ed H. Chi and Bongwon Suh — Crowdsourcing User Studies with Mechanical Turk — a paper proposing potential benefits beyond simple task completion for microtasking platforms like Mechanical Turk.
- Aniket Kittur, Jeffrey V. Nickerson, Michael S. Bernstein, Elizabeth M. Gerber, Aaron Shaw, John Zimmerman, Matthew Lease, and John J. Horton — The Future of Crowd Work — a paper describing the promise of crowd work and the challenges it must overcome to positively shape the global economy.
- Michael J. Madison — Commons at the Intersection of Peer Production, Citizen Science, and Big Data: Galaxy Zoo — an in-depth case study of the Galaxy Zoo containing insights regarding the importance of clear objectives and institutional and/or professional collaboration in citizen science initiatives.
- Thomas W. Malone, Robert Laubacher and Chrysanthos Dellarocas – Harnessing Crowds: Mapping the Genome of Collective Intelligence – an article proposing a framework for understanding collective intelligence efforts.
- Geoff Mulgan – True Collective Intelligence? A Sketch of a Possible New Field – a paper proposing theoretical building blocks and an experimental and research agenda around the field of collective intelligence.
- Henry Sauermann and Chiara Franzoni – Participation Dynamics in Crowd-Based Knowledge Production: The Scope and Sustainability of Interest-Based Motivation – a paper exploring the role of interest-based motivation in collaborative knowledge production.
- Catherine E. Schmitt-Sands and Richard J. Smith – Prospects for Online Crowdsourcing of Social Science Research Tasks: A Case Study Using Amazon Mechanical Turk – an article describing an experiment using Mechanical Turk to crowdsource public policy research microtasks.
- Clay Shirky — Here Comes Everybody: The Power of Organizing Without Organizations — a book exploring the ways largely unstructured collaboration is remaking practically all sectors of modern life.
- Jonathan Silvertown — A New Dawn for Citizen Science — a paper examining the diverse factors influencing the emerging paradigm of “science by the people.”
- Katarzyna Szkuta, Roberto Pizzicannella, David Osimo – Collaborative approaches to public sector innovation: A scoping study – an article studying success factors and incentives around the collaborative delivery of online public services.
Annotated Selected Reading List (in alphabetical order)
Benkler, Yochai. The Wealth of Networks: How Social Production Transforms Markets and Freedom. Yale University Press, 2006. http://bit.ly/1aaU7Yb.
- In this book, Benkler “describes how patterns of information, knowledge, and cultural production are changing – and shows that the way information and knowledge are made available can either limit or enlarge the ways people can create and express themselves.”
- In his discussion on Wikipedia – one of many paradigmatic examples of people collaborating without financial reward – he calls attention to the notable ongoing cooperation taking place among a diversity of individuals. He argues that, “The important point is that Wikipedia requires not only mechanical cooperation among people, but a commitment to a particular style of writing and describing concepts that is far from intuitive or natural to people. It requires self-discipline. It enforces the behavior it requires primarily through appeal to the common enterprise that the participants are engaged in…”
Brabham, Daren C. Using Crowdsourcing in Government. Collaborating Across Boundaries Series. IBM Center for The Business of Government, 2013. http://bit.ly/17gzBTA.
- In this report, Brabham categorizes government crowdsourcing cases into a “four-part, problem-based typology, encouraging government leaders and public administrators to consider these open problem-solving techniques as a way to engage the public and tackle difficult policy and administrative tasks more effectively and efficiently using online communities.”
- The proposed four-part typology describes the following types of crowdsourcing in government:
- Knowledge Discovery and Management
- Distributed Human Intelligence Tasking
- Broadcast Search
- Peer-Vetted Creative Production
- In his discussion on Distributed Human Intelligence Tasking, Brabham argues that Amazon’s Mechanical Turk and other microtasking platforms could be useful in a number of governance scenarios (a minimal API sketch follows this list), including:
- Governments and scholars transcribing historical document scans
- Public health departments translating health campaign materials into foreign languages to benefit constituents who do not speak the native language
- Governments translating tax documents, school enrollment and immunization brochures, and other important materials into minority languages
- Helping governments predict citizens’ behavior, “such as for predicting their use of public transit or other services or for predicting behaviors that could inform public health practitioners and environmental policy makers”
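To make the microtasking scenarios above concrete, here is a sketch of what posting one such task (say, transcribing a scanned historical document) looks like through Mechanical Turk’s API. It assumes the boto3 library and AWS credentials; the endpoint shown is MTurk’s public sandbox, and the reward, image URL, and question form are hypothetical placeholders trimmed to a skeleton.

```python
# Sketch: posting a single transcription HIT to Mechanical Turk's sandbox.
# Hypothetical parameters throughout; assumes boto3 and AWS credentials.
import boto3

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# Minimal HTMLQuestion: show a scanned page, ask for a transcription.
# (A real HIT also needs a form that posts to MTurk's externalSubmit URL.)
question_xml = """
<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
    <html><body>
      <img src="https://example.org/scans/page-001.jpg" />
      <textarea name="transcription" rows="10" cols="80"></textarea>
    </body></html>
  ]]></HTMLContent>
  <FrameHeight>600</FrameHeight>
</HTMLQuestion>
"""

hit = mturk.create_hit(
    Title="Transcribe one page of a historical document",
    Description="Type the text you see in the scanned image.",
    Reward="0.10",                    # dollars per assignment
    MaxAssignments=3,                 # redundancy enables quality checks
    LifetimeInSeconds=7 * 24 * 3600,  # visible to workers for one week
    AssignmentDurationInSeconds=1800, # 30 minutes to finish once accepted
    Question=question_xml,
)
print(hit["HIT"]["HITId"])
```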
Boudreau, Kevin J., Patrick Gaule, Karim Lakhani, Christoph Riedl, and Anita Williams Woolley. “From Crowds to Collaborators: Initiating Effort & Catalyzing Interactions Among Online Creative Workers.” Harvard Business School Technology & Operations Mgt. Unit Working Paper No. 14-060. January 23, 2014. https://bit.ly/2QVmGUu.
- In this working paper, the authors explore the “conditions necessary for eliciting effort from those affecting the quality of interdependent teamwork” and “consider the role of incentives versus social processes in catalyzing collaboration.”
- The paper’s findings are based on an experiment involving 260 individuals randomly assigned to 52 teams working toward solutions to a complex problem.
- The authors determined that the level of effort in such collaborative undertakings is sensitive to cash incentives. However, collaboration among teams was driven more by the active participation of teammates than by any monetary reward.
Franzoni, Chiara, and Henry Sauermann. “Crowd Science: The Organization of Scientific Research in Open Collaborative Projects.” Research Policy (August 14, 2013). http://bit.ly/HihFyj.
- In this paper, the authors explore the concept of crowd science, which they define based on two important features: “participation in a project is open to a wide base of potential contributors, and intermediate inputs such as data or problem solving algorithms are made openly available.” The rationale for their study and conceptual framework is the “growing attention from the scientific community, but also policy makers, funding agencies and managers who seek to evaluate its potential benefits and challenges. Based on the experiences of early crowd science projects, the opportunities are considerable.”
- Based on the study of a number of crowd science projects – including governance-related initiatives like Patients Like Me – the authors identify a number of potential benefits in the following categories:
- Knowledge-related benefits
- Benefits from open participation
- Benefits from the open disclosure of intermediate inputs
- Motivational benefits
- The authors also identify a number of challenges:
- Organizational challenges
- Matching projects and people
- Division of labor and integration of contributions
- Project leadership
- Motivational challenges
- Sustaining contributor involvement
- Supporting a broader set of motivations
- Reconciling conflicting motivations
Kittur, Aniket, Ed H. Chi, and Bongwon Suh. “Crowdsourcing User Studies with Mechanical Turk.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 453–456. CHI ’08. New York, NY, USA: ACM, 2008. http://bit.ly/1a3Op48.
- In this paper, the authors examine “[m]icro-task markets, such as Amazon’s Mechanical Turk, [which] offer a potential paradigm for engaging a large number of users for low time and monetary costs. [They] investigate the utility of a micro-task market for collecting user measurements, and discuss design considerations for developing remote micro user evaluation tasks.”
- The authors conclude that in addition to providing a means for crowdsourcing small, clearly defined, often non-skill-intensive tasks, “Micro-task markets such as Amazon’s Mechanical Turk are promising platforms for conducting a variety of user study tasks, ranging from surveys to rapid prototyping to quantitative measures. Hundreds of users can be recruited for highly interactive tasks for marginal costs within a timeframe of days or even minutes. However, special care must be taken in the design of the task, especially for user measurements that are subjective or qualitative.”
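One standard way to exercise that “special care” is redundancy plus agreement checks: assign each item to several workers and accept an answer only when a clear majority agrees, flagging the rest for review. A minimal, hypothetical sketch in plain Python:

```python
# Majority-vote aggregation over redundant crowd answers (toy sketch,
# hypothetical data).
from collections import Counter

# answers[item_id] -> list of worker responses for that item.
answers = {
    "q1": ["yes", "yes", "no"],
    "q2": ["no", "no", "no"],
    "q3": ["yes", "no", "maybe"],
}

def aggregate(responses, min_agreement=2 / 3):
    """Return the consensus label, or None if agreement is too low."""
    label, count = Counter(responses).most_common(1)[0]
    return label if count / len(responses) >= min_agreement else None

for item, responses in answers.items():
    print(item, aggregate(responses))  # q3 -> None: no consensus, review it
```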
Kittur, Aniket, Jeffrey V. Nickerson, Michael S. Bernstein, Elizabeth M. Gerber, Aaron Shaw, John Zimmerman, Matthew Lease, and John J. Horton. “The Future of Crowd Work.” In 16th ACM Conference on Computer Supported Cooperative Work (CSCW 2013), 2012. http://bit.ly/1c1GJD3.
- In this paper, the authors discuss paid crowd work, which “offers remarkable opportunities for improving productivity, social mobility, and the global economy by engaging a geographically distributed workforce to complete complex tasks on demand and at scale.” However, they caution that, “it is also possible that crowd work will fail to achieve its potential, focusing on assembly-line piecework.”
- The authors argue that several key challenges must be met to ensure that crowd work processes evolve and reach their full potential:
- Designing workflows
- Assigning tasks
- Supporting hierarchical structure
- Enabling real-time crowd work
- Supporting synchronous collaboration
- Controlling quality
Madison, Michael J. “Commons at the Intersection of Peer Production, Citizen Science, and Big Data: Galaxy Zoo.” In Convening Cultural Commons, 2013. http://bit.ly/1ih9Xzm.
- This paper explores a “case of commons governance grounded in research in modern astronomy. The case, Galaxy Zoo, is a leading example of at least three different contemporary phenomena. In the first place, Galaxy Zoo is a global citizen science project, in which volunteer non-scientists have been recruited to participate in large-scale data analysis on the Internet. In the second place, Galaxy Zoo is a highly successful example of peer production, sometimes known as crowdsourcing…In the third place, Galaxy Zoo is a highly visible example of data-intensive science, sometimes referred to as e-science or Big Data science, by which scientific researchers develop methods to grapple with the massive volumes of digital data now available to them via modern sensing and imaging technologies.”
- Madison concludes that the success of Galaxy Zoo has not been the result of the “character of its information resources (scientific data) and rules regarding their usage,” but rather, the fact that the “community was guided from the outset by a vision of a specific organizational solution to a specific research problem in astronomy, initiated and governed, over time, by professional astronomers in collaboration with their expanding universe of volunteers.”
Malone, Thomas W., Robert Laubacher and Chrysanthos Dellarocas. “Harnessing Crowds: Mapping the Genome of Collective Intelligence.” MIT Sloan Research Paper. February 3, 2009. https://bit.ly/2SPjxTP.
- In this article, the authors describe and map the phenomenon of collective intelligence – also referred to as “radical decentralization, crowd-sourcing, wisdom of crowds, peer production, and wikinomics” – which they broadly define as “groups of individuals doing things collectively that seem intelligent.”
- The article is derived from the authors’ work at MIT’s Center for Collective Intelligence, where they gathered nearly 250 examples of Web-enabled collective intelligence. To map the building blocks or “genes” of collective intelligence, the authors used two pairs of related questions:
- Who is performing the task? Why are they doing it?
- What is being accomplished? How is it being done?
- The authors concede that much work remains to be done “to identify all the different genes for collective intelligence, the conditions under which these genes are useful, and the constraints governing how they can be combined,” but they believe that their framework provides a useful start and gives managers and other institutional decisionmakers looking to take advantage of collective intelligence activities the ability to “systematically consider many possible combinations of answers to questions about Who, Why, What, and How.”
Mulgan, Geoff. “True Collective Intelligence? A Sketch of a Possible New Field.” Philosophy & Technology 27, no. 1. March 2014. http://bit.ly/1p3YSdd.
- In this paper, Mulgan explores the concept of collective intelligence, a “much talked about but…very underdeveloped” field.
- With a particular focus on health knowledge, Mulgan “sets out some of the potential theoretical building blocks, suggests an experimental and research agenda, shows how it could be analysed within an organisation or business sector and points to possible intellectual barriers to progress.”
- He concludes that the “central message that comes from observing real intelligence is that intelligence has to be for something,” and that “turning this simple insight – the stuff of so many science fiction stories – into new theories, new technologies and new applications looks set to be one of the most exciting prospects of the next few years and may help give shape to a new discipline that helps us to be collectively intelligent about our own collective intelligence.”
Sauermann, Henry and Chiara Franzoni. “Participation Dynamics in Crowd-Based Knowledge Production: The Scope and Sustainability of Interest-Based Motivation.” SSRN Working Papers Series. November 28, 2013. http://bit.ly/1o6YB7f.
- In this paper, Sauermann and Franzoni explore the issue of interest-based motivation in crowd-based knowledge production – in particular the use of the crowd science platform Zooniverse – by drawing on “research in psychology to discuss important static and dynamic features of interest and deriv[ing] a number of research questions.”
- The authors find that interest-based motivation is often tied to a “particular object (e.g., task, project, topic)” rather than being a “general trait of the person or a general characteristic of the object.” As such, they find that “most members of the installed base of users on the platform do not sign up for multiple projects, and most of those who try out a project do not return.”
- They conclude that “interest can be a powerful motivator of individuals’ contributions to crowd-based knowledge production…However, both the scope and sustainability of this interest appear to be rather limited for the large majority of contributors…At the same time, some individuals show a strong and more enduring interest to participate both within and across projects, and these contributors are ultimately responsible for much of what crowd science projects are able to accomplish.”
Schmitt-Sands, Catherine E. and Richard J. Smith. “Prospects for Online Crowdsourcing of Social Science Research Tasks: A Case Study Using Amazon Mechanical Turk.” SSRN Working Papers Series. January 9, 2014. http://bit.ly/1ugaYja.
- In this paper, the authors describe an experiment involving the nascent use of Amazon’s Mechanical Turk as a social science research tool. “While researchers have used crowdsourcing to find research subjects or classify texts, [they] used Mechanical Turk to conduct a policy scan of local government websites.”
- Schmitt-Sands and Smith found that “crowdsourcing worked well for conducting an online policy program and scan.” The microtasked workers were helpful in screening out local governments that either did not have websites or did not have the types of policies and services for which the researchers were looking. However, “if the task is complicated such that it requires ongoing supervision, then crowdsourcing is not the best solution.”
Shirky, Clay. Here Comes Everybody: The Power of Organizing Without Organizations. New York: Penguin Press, 2008. https://bit.ly/2QysNif.
- In this book, Shirky explores our current era in which, “For the first time in history, the tools for cooperating on a global scale are not solely in the hands of governments or institutions. The spread of the Internet and mobile phones are changing how people come together and get things done.”
- Discussing Wikipedia’s “spontaneous division of labor,” Shirky argues that “the process is more like creating a coral reef, the sum of millions of individual actions, than creating a car. And the key to creating those individual actions is to hand as much freedom as possible to the average user.”
Silvertown, Jonathan. “A New Dawn for Citizen Science.” Trends in Ecology & Evolution 24, no. 9 (September 2009): 467–471. http://bit.ly/1iha6CR.
- This article discusses the move from “Science for the people,” a slogan adopted by activists in the 1970s, to “Science by the people,” which is “a more inclusive aim, and is becoming a distinctly 21st century phenomenon.”
- Silvertown identifies three factors that are responsible for the explosion of activity in citizen science, each of which could be similarly related to the crowdsourcing of skills by governing institutions:
- “First is the existence of easily available technical tools for disseminating information about products and gathering data from the public.
- A second factor driving the growth of citizen science is the increasing realisation among professional scientists that the public represent a free source of labour, skills, computational power and even finance.
- Third, citizen science is likely to benefit from the condition that research funders such as the National Science Foundation in the USA and the Natural Environment Research Council in the UK now impose upon every grantholder to undertake project-related science outreach. This is outreach as a form of public accountability.”
Szkuta, Katarzyna, Roberto Pizzicannella, David Osimo. “Collaborative approaches to public sector innovation: A scoping study.” Telecommunications Policy. 2014. http://bit.ly/1oBg9GY.
- In this article, the authors explore cases where government collaboratively delivers online public services, with a focus on success factors and “incentives for services providers, citizens as users and public administration.”
- The authors focus on five types of collaborative governance projects:
- Services initiated by government built on government data;
- Services initiated by government and making use of citizens’ data;
- Services initiated by civil society built on open government data;
- Collaborative e-government services; and
- Services run by civil society and based on citizen data.
- The cases explored “are all designed in the way that effectively harnesses the citizens’ potential. Services susceptible to collaboration are those that require computing efforts, i.e. many non-complicated tasks (e.g. citizen science projects – Zooniverse) or citizens’ free time in general (e.g. time banks). Those services also profit from unique citizens’ skills and their propensity to share their competencies.”
The Promise of a New Internet
Adrienne LaFrance in The Atlantic: “People tend to talk about the Internet the way they talk about democracy—optimistically, and in terms that describe how it ought to be rather than how it actually is.
But increasingly, another question comes up: What if there were a technical solution instead of a regulatory one? What if the core architecture of how people connect could make an end run on the centralization of services that has come to define the modern net?
It’s a question that reflects some of the Internet’s deepest cultural values, and the idea that this network—this place where you are right now—should distribute power to people. In the post-NSA, post-Internet-access-oligopoly world, more and more people are thinking this way, and many of them are actually doing something about it.
Among them, there is a technology that’s become a kind of shorthand code for a whole set of beliefs about the future of the Internet: “mesh networking.” These words have become a way to say that you believe in a different, freer Internet.
* * *
Mesh networks promise the things we already expect but don’t always get from the Internet: they’re fast, reliable, and relatively inexpensive. But before we get into the particulars of what this alternate Internet might look like, a quick refresher on how the one we have works:
Your computer is connected to an Internet service provider like Comcast, which sends packets of your data (the binary stuff of emails, tweets, Facebook status updates, web addresses, etc.) back and forth across the network. The packets that move across the Internet encounter a series of checkpoints including routers and servers along the paths your data travels. You can’t control these paths or these checkpoints, so your data is subject to all kinds of security threats like hackers and snooping NSA agents.
So the idea behind mesh networking is to skip those checkpoints and cut out the middleman service provider whenever possible. This can work when each device in a network connects to the other devices, rather than each device connecting to the ISP.
It helps to visualize it. (The original article shows two diagrams side by side: on the left, a network built around a centralized hub, like the Internet as we know it; on the right, a mesh network.)
Think of it this way: With a mesh network, each device is like a mini cell phone tower. So instead of having multiple devices rely on a single, centralized hub, multiple devices rely on one another. And with information ricocheting across the network more unpredictably between those devices, the network as a whole is harder to take out.
“You end up with a network that is much harder to disrupt,” said Stanislav Shalunov, co-founder of Open Garden, a startup that develops peer-to-peer and mesh networking apps. “There is no single point where you can unplug and expect that there will be a large impact.”
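The “no single point of failure” claim is easy to check on toy topologies. A hypothetical sketch using the networkx graph library: remove the busiest node from a hub-and-spoke network and from a mesh, then see which network stays connected.

```python
# Toy comparison of hub-and-spoke vs. mesh resilience. Assumes networkx;
# both topologies are synthetic, not real deployments.
import networkx as nx

# Hub-and-spoke: 20 devices all connected through node 0 (the "ISP").
star = nx.star_graph(20)

# Mesh: the same number of devices, each linked to a few nearby peers.
mesh = nx.connected_watts_strogatz_graph(n=21, k=4, p=0.3, seed=42)

for name, g in [("star", star), ("mesh", mesh)]:
    g = g.copy()
    hub = max(g.degree, key=lambda kv: kv[1])[0]  # busiest node
    g.remove_node(hub)
    print(name, "connected after losing its busiest node?", nx.is_connected(g))
# star -> False: everything hung off the hub. mesh -> almost always True.
```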
Plus, a mesh network forms itself based on an algorithm—which again reduces opportunities for disruption. “There is no human intervention involved, even from the users of the devices and certainly not from any administrative entity that needs to arrange the topology of this network or how people are connected or how the network is used,” Shalunov told me. “It is entirely up to the people participating and the software that runs this network to make everything work.”
Your regular old smartphone already has the power to connect to other smartphones without being hooked up to the Internet through a traditional carrier. All you need is the radio frequency of your phone’s Bluetooth connection, and you can send and receive data over a mesh network from anyone in relatively close proximity—say, a person in the same neighborhood or office building. (Mesh networks can also be built around cheap wireless routers or roof antennae.)…
For now, there’s no nationwide device-to-device mesh network. So if you want to communicate with someone across the country, someone—but not everyone—in the mesh network will need to be connected to the Internet through a traditional provider. That’s true locally, too, if you want the mesh network hooked up to the rest of the Internet. Mesh networks are more reliable in a crowd because devices can rely on one another—rather than each device trying to ping the same overburdened cell phone tower. “The important thing is we can use any of the Internet connections that anybody in that mesh network is connected to,” Shalunov said. “So maybe you are connected to AT&T and I am connected to Comcast and my phone is on Verizon and there is a Sprint subscriber nearby. If any of these will let the traffic through, all of it will get through.”
* * *
Mesh networks have been around, at least theoretically, for at least as long as the Internet has existed…”
How NYC Open Data and Reddit Saved New Yorkers Over $55,000 a Year
IQuantNY: “NYC generates an enormous amount of data each year, and for the most part, it stays behind closed doors. But thanks to the Open Data movement, signed into law by Bloomberg in 2012 and championed over the last several years by Borough President Gale Brewer, along with other council members, we now get to see a small slice of what the city knows. And that slice is growing.
There have been some detractors along the way; a senior attorney for the NYPD said in 2012 during a council hearing that releasing NYPD data in csv format was a problem because they were “concerned with the integrity of the data itself” and because “data could be manipulated by people who want ‘to make a point’ of some sort”. But our democracy is built on the idea of free speech; we let all the information out and then let reason lead the way.
In some ways, Open Data adds another check and balance into government: its citizens. I’ve watched the perfect example of this check work itself out over the past month. You may have caught my post that used parking ticket data to identify the fire hydrant in New York City that was generating the most income for the city in the form of fines: $33,000 a year. And on the next block, the second most profitable hydrant was generating $24,000 a year. That’s two consecutive blocks with hydrants generating over $55,000 a year. But there was a problem. In my post, I laid out why these two parking spots were extremely confusing and basically seemed like a trap; there was a wide “curb extension” between the street and the hydrant, making it appear like the hydrant was not by the street. Additionally, the DOT had painted parking spots right where you would be fined if you parked.
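For readers who want to reproduce this kind of analysis, its core is a simple aggregation over the city’s open parking-violations data. A hedged sketch assuming pandas; the file name and column names are placeholders for the actual dataset’s fields:

```python
# Sketch: rank hydrant-violation locations by fine revenue. Assumes pandas;
# the file name and column names are placeholders for the real dataset.
import pandas as pd

tickets = pd.read_csv("parking_violations.csv")

# NYC violation code 40 covers parking within 15 feet of a fire hydrant.
hydrant = tickets[tickets["violation_code"] == 40]

revenue = (
    hydrant
    .groupby(["street_name", "house_number"])  # one group per location
    .agg(tickets=("violation_code", "size"),
         revenue=("fine_amount", "sum"))
    .sort_values("revenue", ascending=False)
)
print(revenue.head(10))  # the top rows are the most lucrative hydrants
```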
Once the data was out there, the hydrant took on a life of its own. First, it rose to the top of the nyc subreddit. That is basically one way the internet voted that this is in fact “interesting”. And that is how things go from small to big. From there, it travelled to the New York Observer, which was able to get a comment from the DOT. After that, it appeared in the New York Post, was republished in Gothamist, and finally went global in the Daily Mail.
I guess the pressure was on the DOT at this point, as each media source reached out for comment, but what struck me was their response to the Observer:
“While DOT has not received any complaints about this location, we will review the roadway markings and make any appropriate alterations”
Why does someone have to complain in order for the DOT to see problems like this? In fact, the DOT just redesigned every parking sign in New York because some of the old ones were considered confusing. But if this hydrant was news to them, it implies that they did not utilize the very strongest source of measuring confusion on our streets: NYC parking tickets….”
How to Make Government Data Sites Better
Flowing Data: “Accessing government data from the source is frustrating. If you’ve done it, or at least tried to, you know the pain that is oddly formatted files, search that doesn’t work, and annotation that tells you nothing about the data in front of you.
The most frustrating part of the process is knowing how useful the data could be if only it were shared more simply. Unfortunately, ease-of-use is rarely the case, and we spend more time formatting and inspecting the data than we do actually putting it to use. Shouldn’t it be the other way around?
It’s this painstaking process that draws so much ire. It’s hard not to complain.
Maybe the people in charge of these sites just don’t know what’s going on. Or maybe they’re so overwhelmed by suck that they don’t know where to start. Or they’re unknowingly infected by the that-is-how-we’ve-always-done-it bug.
Whatever it may be, I need to think out loud about how to improve these sites. Empty complaints don’t help.
I use the Centers for Disease Control and Prevention as the test subject, but most of the things covered should easily generalize to other government sites (and non-government ones too). And I choose CDC not because they’re the worst but because they publish a lot of data that is of immediate and direct use to the general public.
I approach this from the point of view of someone who uses government data, beyond pulling a single data point from a spreadsheet. I’m also going to put on my Captain Obvious hat, because what seems obvious to some is apparently a black box to others.
Provide a usable data format
Sometimes it feels like government data is available in every format except the one that data users want. The worst one was when I downloaded a 2GB file, and upon unzipping it, I discovered it was an EXE file.
Data in PDF format is a kick in the face for people looking for CSV files. There might be ways to get the data out from PDFs, but it’s still a pain when you have more than a handful of files….
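The gap in effort between formats is easy to demonstrate. In the sketch below (file names hypothetical; the PDF path assumes the third-party pdfplumber library), the CSV case is one line while the PDF case is a page-by-page salvage operation:

```python
# The same table in two formats. File names are hypothetical placeholders.
import pandas as pd
import pdfplumber  # third-party library for PDF text/table extraction

# CSV: one line, done.
df_csv = pd.read_csv("mortality_rates.csv")

# PDF: walk every page, pull whatever table fragments survive the layout,
# then reassemble headers and column types by hand.
rows = []
with pdfplumber.open("mortality_rates.pdf") as pdf:
    for page in pdf.pages:
        table = page.extract_table()
        if table:
            rows.extend(table)

df_pdf = pd.DataFrame(rows[1:], columns=rows[0])  # still needs type cleanup
```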
A usable data format is the most important thing, and if there’s just one thing you change, make it this.
(Raw data is fine too)
It’s rare to find raw government data, so it’s like striking gold when it actually happens. I realize you run into issues with data privacy, quality, missing data, etc. For these data sources, I appreciate the estimates with standard errors. However, the less aggregated (the more raw) the data you can provide, the better.
CSV for that too, please.
Never mind the fancy sharing tools
Not all government data is wedged into PDF files, and some of it is accessible via export tools that let you subset and lay out your data exactly how you want it. The problem is that in an effort to please everyone, you end up with a bewildering wall of options (the original post shows a screenshot of one such export tool)….
Tell people where to get the data
Get the things above done, and your government data site is exponentially better than it was before, but let’s keep going.
The navigation process to get to a dataset is incredibly convoluted, which makes it hard to find data and difficult to return to it….
Show visual previews
I’m all for visualization integrated with the data search tools. It always sucks when I spend time formatting data only to find that it wasn’t worth my time. Census Reporter is a fine example of how this might work.
That said, visual tools plus an upgrade to the previously mentioned things is a big undertaking, especially if you’re going to do it right. So I’m perfectly fine if you skip this step to focus your resources on data that’s easier to use and download. Leave the visualizing and analysis to us.
Decide what’s important, archive the rest
So much cruft. So many old documents. Broken links. Create an archive and highlight what people come to your site for.
Wrapping up
There’s plenty more stuff to update, especially once you start to work with the details, but this should be a good place to start. It’s a lot easier to point out what you can do to improve government data sharing than it is to actually do it of course. There are so many people, policies, and oh yes, politics, that it can be hard to change.”
The Emerging Science of Computational Anthropology
Emerging Technology From the arXiv: “The increasing availability of big data from mobile phones and location-based apps has triggered a revolution in the understanding of human mobility patterns. This data shows the ebb and flow of the daily commute in and out of cities, the pattern of travel around the world and even how disease can spread through cities via their transport systems.
So there is considerable interest in looking more closely at human mobility patterns to see just how well it can be predicted and how these predictions might be used in everything from disease control and city planning to traffic forecasting and location-based advertising.
Today we get an insight into the kind of detail that is possible thanks to the work of Zimo Yang at Microsoft Research in Beijing and a few pals. These guys start with the hypothesis that people who live in a city have a pattern of mobility that is significantly different from that of those who are merely visiting. By dividing travelers into locals and non-locals, their ability to predict where people are likely to visit dramatically improves.
Zimo and co begin with data from a Chinese location-based social network called Jiepang.com. This is similar to Foursquare in the US. It allows users to record the places they visit and to connect with friends at these locations and to find others with similar interests.
The data points are known as check-ins and the team downloaded more than 1.3 million of them from five big cities in China: Beijing, Shanghai, Nanjing, Chengdu and Hong Kong. They then used 90 per cent of the data to train their algorithms and the remaining 10 per cent to test them. The Jiepang data includes the users’ hometowns so it’s easy to see whether an individual is checking in in their own city or somewhere else.
The question that Zimo and co want to answer is the following: given a particular user and their current location, where are they most likely to visit in the near future? In practice, that means analysing the user’s data, such as their hometown and the locations recently visited, and coming up with a list of other locations that they are likely to visit based on the type of people who visited these locations in the past.
Zimo and co used their training dataset to learn the mobility pattern of locals and non-locals and the popularity of the locations they visited. The team then applied this to the test dataset to see whether their algorithm was able to predict where locals and non-locals were likely to visit.
They found that their best results came from analysing the pattern of behaviour of a particular individual and estimating the extent to which this person behaves like a local. That produced a weighting called the indigenization coefficient that the researchers could then use to determine the mobility patterns this person was likely to follow in future.
In fact, Zimo and co say they can spot non-locals in this way without even knowing their home location. “Because non-natives tend to visit popular locations, like the Imperial Palace in Beijing and the Bund in Shanghai, while natives usually check in around their homes and workplaces,” they add.
The team say this approach considerably outperforms the mixed algorithms that use only individual visiting history and location popularity. “To our surprise, a hybrid algorithm weighted by the indigenization coefficients outperforms the mixed algorithm accounting for additional demographical information.”
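The article doesn’t reproduce the paper’s formulas, so the following is only a plausible reading of the idea, not the authors’ actual definition: treat a user’s indigenization coefficient as the share of their check-ins at venues dominated by locals, then use it to blend a locals-trained and a visitors-trained popularity model.

```python
# Toy sketch of an "indigenization coefficient". NOT the paper's actual
# definition; a hypothetical illustration in plain Python.
from collections import Counter

def local_share_by_location(check_ins, local_users):
    """check_ins: (user, location) pairs; local_users: users whose
    hometown matches the city. Returns each location's local share."""
    total, local = Counter(), Counter()
    for user, loc in check_ins:
        total[loc] += 1
        if user in local_users:
            local[loc] += 1
    return {loc: local[loc] / total[loc] for loc in total}

def indigenization(user, check_ins, local_users, threshold=0.5):
    """Fraction of this user's check-ins at locally-dominated venues."""
    share = local_share_by_location(check_ins, local_users)
    visits = [loc for u, loc in check_ins if u == user]
    if not visits:
        return 0.5  # no evidence either way
    return sum(share[loc] > threshold for loc in visits) / len(visits)

def score(user, loc, check_ins, local_users, pop_local, pop_visitor):
    """Blend local and visitor popularity by the user's coefficient."""
    theta = indigenization(user, check_ins, local_users)
    return theta * pop_local[loc] + (1 - theta) * pop_visitor[loc]
```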
It’s easy to imagine how such an algorithm might be useful for businesses who want to target certain types of travelers or local people. But there is a more interesting application too.
Zimo and co say that it is possible to monitor the way an individual’s mobility patterns change over time. So if a person moves to a new city, it should be possible to see how long it takes them to settle in.
One way of measuring this is in their mobility patterns: whether they are more like those of a local or a non-local. “We may be able to estimate whether a non-native person will behave like a native person after a time period and if so, how long in average a person takes to become a native-like one,” say Zimo and co.
That could have a fascinating impact on the way anthropologists study migration and the way immigrants become part of a local community. This is computational anthropology, a science that is clearly in its early stages but one that has huge potential for the future.”
Ref: arxiv.org/abs/1405.7769 : Indigenization of Urban Mobility
How Long Is Too Long? The 4th Amendment and the Mosaic Theory
Law and Liberty Blog: “Volume 8.2 of the NYU Journal of Law and Liberty has been sent to the printer and physical copies will be available soon, but the articles in the issue are already available online here. One article that has gotten a lot of attention so far is by Steven Bellovin, Renee Hutchins, Tony Jebara, and Sebastian Zimmeck titled “When Enough is Enough: Location Tracking, Mosaic Theory, and Machine Learning.” A direct link to the article is here.
The mosaic theory is a modern corollary accepted by some academics – and by the D.C. Circuit Court of Appeals in U.S. v. Maynard – as a twenty-first century extension of the Fourth Amendment’s prohibition on unreasonable searches and seizures. Proponents of the mosaic theory argue that at some point enough individual data collections, compiled and analyzed together, become a Fourth Amendment search. Thirty years ago the Supreme Court upheld the use of a tracking device for three days without a warrant; since then, however, the proliferation of GPS tracking in cars and smartphones has made it significantly easier for the police to access a treasure trove of information about our location at any given time.
It is easy to see why this theory has attracted some support. Humans are creatures of habit – if our public locations are tracked for a few days, weeks, or a month, it is pretty easy for machines to learn our ways and assemble a fairly detailed report for the government about our lives. Machines could basically predict when you will leave your house for work, what route you will take, and when and where you go grocery shopping, all before you even do it, once they know your habits. A policeman could observe you moving about in public without a warrant, of course, but limited manpower will always reduce the probability of continuous mass surveillance. With current technology, a handful of trained experts could easily monitor hundreds of people at a time from behind a computer screen, and gather even more information than most searches requiring a warrant. The Supreme Court indicated a willingness to consider the mosaic theory in U.S. v. Jones, but has yet to embrace it…
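To see how little machinery “learning our ways” takes, here is a toy habit model in plain Python (hypothetical data): a frequency table keyed on (weekday, hour) already predicts where a tracked person will be at a given time.

```python
# Toy habit model: predict location from (weekday, hour) frequencies.
# Hypothetical data; the point is how little is needed, not the method.
from collections import Counter, defaultdict
from datetime import datetime

def train(track):
    """track: (timestamp, location) pairs from passive location logs."""
    buckets = defaultdict(Counter)
    for ts, loc in track:
        buckets[(ts.weekday(), ts.hour)][loc] += 1
    return buckets

def predict(buckets, when):
    counts = buckets[(when.weekday(), when.hour)]
    return counts.most_common(1)[0][0] if counts else None

track = [
    (datetime(2014, 6, 2, 8), "home"), (datetime(2014, 6, 2, 9), "office"),
    (datetime(2014, 6, 9, 8), "home"), (datetime(2014, 6, 9, 9), "office"),
]
print(predict(train(track), datetime(2014, 6, 16, 9)))  # -> "office"
```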
The article in Law & Liberty details the need to determine at which point machine learning creates an intrusion into our reasonable expectations of privacy, and even discusses an experiment that could be run to determine how long data collection can proceed before it becomes an intrusion. If there is a line at which individual data collection becomes a search, we need to discover where that line is. One of the article’s authors, Steven Bellovin, has argued that the line is probably at one week – at that point your weekday and weekend habits would be known. The nation’s leading legal expert on criminal law, Professor Orin Kerr, fired back on the Volokh Conspiracy that Bellovin’s one-week argument is not in line with previous iterations of the mosaic theory.
Open Government Will Reshape Latin America
Alejandro Guerrero at Medium: “When people think about where innovation takes place, they typically think of innovation being spurred by large firms and small startups based in the US, and particularly in that narrow stretch of land and water called Silicon Valley.
However, the flux of innovation taking place at the intersection between technology and government is phenomenal and emerging everywhere. From the marble hallways of parliaments everywhere —including Latin America’s legislative houses— to office hubs of tech-savvy non-profits full of enthusiastic social changers —also including Latin American startups— a driving force is starting to challenge our conception of how government and citizens can and should interact. And few people are discussing or analyzing these developments.
Open Government in Latin America
The potential for Open Government to improve government’s decision-making and performance is huge. And it is particularly immense in middle income countries such as the ones in Latin America, where the combination of growing incomes, more sophisticated citizens’ demands, and broken public services is generating a large bottom-up pressure and requesting more creative solutions from governments to meet the enormous social needs, while cutting down corruption and improving governance.
It is unsurprising that citizens from all over Latin America are increasingly taking to the streets and demanding better public services and more transparent institutions.
While these protests are necessarily short-lived and unarticulated – a product of growing frustration with government – they are a symptom of deeper causes that won’t easily go away. These protests will most likely come back with increasing frequency, and the unresolved frustration may eventually transmute into political platforms with more radical ideas to challenge the status quo.
Behind the scenes, governments across the region still face enormous weaknesses in public management, ill-prepared and underpaid public officials carry on with their duties as the platonic idea of a demotivated workforce, and the opportunities for corruption, waste, and nepotism are plenty. The growing segment of more affluent citizens simply opts out of government and resorts to private alternatives, thus exacerbating inequalities in the already most unequal region in the world. The crumbling middle classes and the poor can only resort to voicing their complaints. And they are increasingly doing so.
And here is where open government initiatives might play a transformative role, disrupting the way governments make decisions and work while empowering citizens in the process.
The preconditions for OpenGov are almost here
In Latin America, connectivity rates are growing fast (reaching 61% in 2013 for the Americas as a whole), close to 90% of the population owns a cellphone, and access to higher levels of education keeps growing (as an example, the latest PISA report indicates that the share of high-schoolers in Mexico went from 58% in 2003 to 70% in 2012). The social conditions for a stronger role of citizens in government are increasingly there.
Moreover, most Latin American countries passed transparency laws during the 2000s, creating the enabling environment for open government initiatives to flourish. It is thus unsurprising that the next generation of young government bureaucrats, on average more internet-savvy and better educated than its predecessors, is taking over and embracing innovations in government. And they are finding echo (and suppliers of ideas and apps!) among local startups and civil society groups, while also being courted by large tech corporations (think of Google or Microsoft) behind succulent government contracts associated with this form of “doing good”.
This is an emerging galaxy of social innovators, technologically-savvy bureaucrats, and engaged citizens providing a large crowd-sourcing community and an opportunity to test different approaches. And the underlying tectonic shifts are pushing governments towards that direction. For a sampler, check out the latest developments for Brazil, Argentina, Peru, Mexico, Colombia, Paraguay, Chile, Panama, Costa Rica, Guatemala, Honduras, Dominican Republic, Uruguay and (why not?) my own country, which I will include in the review often for the surprisingly limited progress of open government in this OECD member, which shares similar institutions and challenges with Latin America.
A Road Full of Promise…and Obstacles
Most of the progress in Latin America is quite recent, and the real impact is still often more limited once you abandon the halls of the Digital Government directorates and secretarías or look beyond the typical government data portal. The resistance to change is as human as laughing, but it is particularly intense among the public-sector side of human beings. Politics also typically plays an enormous role in resisting transparency and open government, and in a context of weak institutions and pervasive corruption, the temptation to politically block or water down open data/open government projects is just too high. Selective release of data (if any) is too frequent, government agencies often act as silos by not sharing information with other government departments, and irrational fears by policy-makers combined with adoption barriers (well explained here) all contribute to deter the progress of the open government promise in Latin America…”
US Secret Service seeks Twitter sarcasm detector
BBC: “The agency has put out a work tender looking for a software system to analyse social media data.
The software should have, among other things, the “ability to detect sarcasm and false positives”.
A spokesman for the service said it currently used the Federal Emergency Management Agency’s Twitter analytics and needed its own, adding: “We aren’t looking solely to detect sarcasm.”
The Washington Post quoted Ed Donovan as saying: “Our objective is to automate our social media monitoring process. Twitter is what we analyse.
“This is real-time stream analysis. The ability to detect sarcasm and false positives is just one of 16 or 18 things we are looking at.”…
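For context on what such a system might involve: a sarcasm detector in its simplest form is a supervised text classifier. A minimal, hypothetical baseline using scikit-learn, unrelated to whatever the Secret Service eventually procured:

```python
# Naive sarcasm-detection baseline: bag-of-words + logistic regression.
# Toy labeled tweets; assumes scikit-learn is installed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "great, another fare hike, just what I needed",
    "oh sure, because waiting two hours is SO fun",
    "the parade downtown was wonderful today",
    "thanks to the crew who fixed our power so quickly",
]
labels = [1, 1, 0, 0]  # 1 = sarcastic, 0 = sincere (toy annotations)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(tweets, labels)

print(model.predict(["yeah, I LOVE standing in line all day"]))
```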
The tender was put out earlier this week on the US government’s Federal Business Opportunities website.
It sets out the objectives of automating social media monitoring and “synthesising large sets of social media data”.
Specific requirements include “audience and geographic segmentation” and analysing “sentiment and trend”.
The software also has to have “compatibility with Internet Explorer 8”. The browser was released more than five years ago.
The agency does not detail the purpose of the analysis but does set out its mission, which includes “preserving the integrity of the economy and protecting national leaders and visiting heads of state and government”.
OSTP’s Own Open Government Plan
“The White House Office of Science and Technology Policy (OSTP) today released its 2014 Open Government Plan. The OSTP plan highlights three flagship efforts as well as the team’s ongoing work to embed the open government principles of transparency, participation, and collaboration into its activities.
OSTP advises the President on the effects of science and technology on domestic and international affairs. The work of the office includes policy efforts encompassing science, environment, energy, national security, technology, and innovation. This plan builds off of the 2010 and 2012 Open Government Plans, updating progress on past initiatives and adding new subject areas based on 2014 guidance.
Agencies began releasing biennial Open Government Plans in 2010, with direction from the 2009 Open Government Directive. These plans serve as a roadmap for agency openness efforts, explaining existing practices and announcing new endeavors to be completed over the coming two years. Agencies build these plans in consultation with civil society stakeholders and the general public. Open government is a vital component of the President’s Management Agenda and our overall effort to ensure the government is expanding economic growth and opportunity for all Americans.
OSTP’s 2014 flagship efforts include:
- Access to Scientific Collections: OSTP is leading agencies in developing policies that will improve the management of and access to scientific collections that agencies own or support. Scientific collections are assemblies of physical objects that are valuable for research and education—including drilling cores from the ocean floor and glaciers, seeds, space rocks, cells, mineral samples, fossils, and more. Agency policies will help make scientific collections and information about scientific collections more transparent and accessible in the coming years.
- We the Geeks: We the Geeks Google+ Hangouts feature informal conversations with experts to highlight the future of science, technology, and innovation in the United States. Participants can join the conversation on Twitter by using the hashtag #WeTheGeeks and asking questions of the presenters throughout the hangout.
- “All Hands on Deck” on STEM Education: OSTP is helping lead President Obama’s commitment to an “all-hands-on-deck approach” to providing students with skills they need to excel in science, technology, engineering, and math (STEM). In support of this goal, OSTP is bringing together government, industry, non-profits, philanthropy, and others to expand STEM education engagement and awareness through events like the annual White House Science Fair and the upcoming White House Maker Faire.
OSTP looks forward to implementing the 2014 Open Government Plan over the coming two years to continue building on its strong tradition of transparency, participation, and collaboration—with and for the American people.”