Global Internet Policy Observatory (GIPO)

European Commission Press Release: “The Commission today unveiled plans for the Global Internet Policy Observatory (GIPO), an online platform to improve knowledge of and participation of all stakeholders across the world in debates and decisions on Internet policies. GIPO will be developed by the Commission and a core alliance of countries and Non Governmental Organisations involved in Internet governance. Brazil, the African Union, Switzerland, the Association for Progressive Communication, Diplo Foundation and the Internet Society have agreed to cooperate or have expressed their interest to be involved in the project.
The Global Internet Policy Observatory will act as a clearinghouse for monitoring Internet policy, regulatory and technological developments across the world.
It will:

  • automatically monitor Internet-related policy developments at the global level, making full use of “big data” technologies;
  • identify links between different fora and discussions, with the objective to overcome “policy silos”;
  • help contextualise information, for example by collecting existing academic information on a specific topic, highlighting the historical and current position of the main actors on a particular issue, identifying the interests of different actors in various policy fields;
  • identify policy trends, via quantitative and qualitative methods such as semantic and sentiment analysis;
  • provide easy-to-use briefings and reports by incorporating modern visualisation techniques;”
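The release does not say how GIPO will implement the semantic and sentiment analysis it mentions; as a minimal sketch of the lexicon-based approach such systems often start from (the word lists and sample statements below are invented for illustration), sentiment can be scored by counting positive and negative terms:

```python
# Minimal lexicon-based sentiment scoring, illustrating the kind of
# "sentiment analysis" the press release mentions. The lexicons and
# sample statements are invented for illustration only.
POSITIVE = {"support", "agree", "open", "benefit", "cooperate"}
NEGATIVE = {"oppose", "restrict", "censor", "concern", "risk"}

def sentiment_score(text):
    """Return (positive - negative) word counts, normalized by text length."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

statements = [
    "Delegates agree to cooperate on open Internet standards.",
    "Regulators oppose the plan, citing censorship risk.",
]
for s in statements:
    print(round(sentiment_score(s), 3), s)
```

Real deployments would use far richer lexicons or trained models, but the principle of mapping text to a signed score is the same.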

The Commodification of Patient Opinion: the Digital Patient Experience Economy in the Age of Big Data


Paper by Deborah Lupton of the University of Sydney’s Department of Sociology and Social Policy. Abstract: “As part of the digital health phenomenon, a plethora of interactive digital platforms have been established in recent years to elicit lay people’s experiences of illness and healthcare. The function of these platforms, as expressed on the main pages of their websites, is to provide the tools and forums whereby patients and caregivers, and in some cases medical practitioners, can share their experiences with others, benefit from the support and knowledge of other contributors and contribute to large aggregated data archives as part of developing better medical treatments and services and conducting medical research.
However what may not always be readily apparent to the users of these platforms are the growing commercial uses by many of the platforms’ owners of the archives of the data they contribute. This article examines this phenomenon of what I term ‘the digital patient experience economy’. In so doing I discuss such aspects as prosumption, the phenomena of big data and metric assemblages, the discourse and ethic of sharing and the commercialisation of affective labour via such platforms. I argue that via these online platforms patients’ opinions and experiences may be expressed in more diverse and accessible forums than ever before, but simultaneously they have become exploited in novel ways.”

Bringing the deep, dark world of public data to light

Venturebeat: “The realm of public data is like a vast cave. It is technically open to all, but it contains many secrets and obstacles within its walls.
Enigma launched out of beta today to shed light on this hidden world. This “big data” startup focuses on data in the public domain, such as those published by governments, NGOs, and the media….
The company describes itself as “Google for public data.” Using a combination of automated web crawlers and directly reaching out to government agencies, Enigma’s database contains billions of public records across more than 100,000 datasets. Pulling them all together breaks down the barriers that exist between various local, state, federal, and institutional search portals. On top of this information is an “entity graph” which searches through the data to discover relevant results. Furthermore, once the information is broken out of the silos, users can filter, reshape, and connect various datasets to find correlations….
The technology has a wide range of applications, including professional services, finance, news media, big data, and academia. Enigma has formed strategic partnerships in each of these verticals with Deloitte, Gerson Lehrman Group, The New York Times, S&P Capital IQ, and Harvard Business School, respectively.”

The Big Data Debate: Correlation vs. Causation

Gil Press: “In the first quarter of 2013, the stock of big data has experienced sudden declines followed by sporadic bouts of enthusiasm. The volatility—a new big data “V”—continues and Ted Cuzzillo summed up the recent negative sentiment in “Big data, big hype, big danger” on SmartDataCollective:
“A remarkable thing happened in Big Data last week. One of Big Data’s best friends poked fun at one of its cornerstones: the Three V’s. The well-networked and alert observer Shawn Rogers, vice president of research at Enterprise Management Associates, tweeted his eight V’s: ‘…Vast, Volumes of Vigorously, Verified, Vexingly Variable Verbose yet Valuable Visualized high Velocity Data.’ He was quick to explain to me that this is no comment on Gartner analyst Doug Laney’s three-V definition. Shawn’s just tired of people getting stuck on V’s.”…
Cuzzillo is joined by a growing chorus of critics that challenge some of the breathless pronouncements of big data enthusiasts. Specifically, it looks like the backlash theme-of-the-month is correlation vs. causation, possibly in reaction to the success of Viktor Mayer-Schönberger and Kenneth Cukier’s recent big data book in which they argued for dispensing “with a reliance on causation in favor of correlation”…
In “Steamrolled by Big Data,” The New Yorker’s Gary Marcus declares that “Big Data isn’t nearly the boundless miracle that many people seem to think it is.”…
Matti Keltanen at The Guardian agrees, explaining “Why ‘lean data’ beats big data.” Writes Keltanen: “…the lightest, simplest way to achieve your data analysis goals is the best one…The dirty secret of big data is that no algorithm can tell you what’s significant, or what it means. Data then becomes another problem for you to solve. A lean data approach suggests starting with questions relevant to your business and finding ways to answer them through data, rather than sifting through countless data sets. Furthermore, purely algorithmic extraction of rules from data is prone to creating spurious connections, such as false correlations… today’s big data hype seems more concerned with indiscriminate hoarding than helping businesses make the right decisions.”
In “Data Skepticism,” O’Reilly Radar’s Mike Loukides adds this gem to the discussion: “The idea that there are limitations to data, even very big data, doesn’t contradict Google’s mantra that more data is better than smarter algorithms; it does mean that even when you have unlimited data, you have to be very careful about the conclusions you draw from that data. It is in conflict with the all-too-common idea that, if you have lots and lots of data, correlation is as good as causation.”
Isn’t more-data-is-better the same as correlation-is-as-good-as-causation? Or, in the words of Chris Anderson, “with enough data, the numbers speak for themselves.”
“Can numbers actually speak for themselves?” non-believer Kate Crawford asks in “The Hidden Biases in Big Data” on the Harvard Business Review blog and answers: “Sadly, they can’t. Data and data sets are not objective; they are creations of human design…
And David Brooks in The New York Times, while probing the limits of “the big data revolution,” takes the discussion to yet another level: “One limit is that correlations are actually not all that clear. A zillion things can correlate with each other, depending on how you structure the data and what you compare. To discern meaningful correlations from meaningless ones, you often have to rely on some causal hypothesis about what is leading to what. You wind up back in the land of human theorizing…”

The Rise of Big Data

Kenneth Neil Cukier and Viktor Mayer-Schoenberger in Foreign Affairs: “Everyone knows that the Internet has changed how businesses operate, governments function, and people live. But a new, less visible technological trend is just as transformative: “big data.” Big data starts with the fact that there is a lot more information floating around these days than ever before, and it is being put to extraordinary new uses. Big data is distinct from the Internet, although the Web makes it much easier to collect and share data. Big data is about more than just communication: the idea is that we can learn from a large body of information things that we could not comprehend when we used only smaller amounts.”
Gideon Rose, editor of Foreign Affairs, sits down with Kenneth Cukier, data editor of The Economist (video).

Crowd diagnosis could spot rare diseases doctors miss

New Scientist: “Diagnosing rare illnesses could get easier, thanks to new web-based tools that pool information from a wide variety of sources…CrowdMed, launched on 16 April at the TedMed conference in Washington DC, uses crowds to solve tough medical cases.

Anyone can join CrowdMed and analyse cases, regardless of their background or training. Participants are given points that they can then use to bet on the correct diagnosis from lists of suggestions. This creates a prediction market, with diagnoses falling and rising in value based on their popularity, like stocks in a stock market. Algorithms then calculate the probability that each diagnosis will be correct. In 20 initial test cases, around 700 participants identified each of the mystery diseases as one of their top three suggestions….
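The article does not describe CrowdMed’s actual algorithm; one simple way a prediction market can turn wagers into probabilities is a parimutuel-style pool, in which each diagnosis’s implied probability is its share of all points bet. A hypothetical sketch (diagnoses and point values invented):

```python
# Parimutuel-style prediction market sketch: implied probability of a
# diagnosis = points wagered on it / total points in the pool.
# This is an illustrative mechanism, not CrowdMed's published method.
from collections import Counter

def implied_probabilities(bets):
    """bets: list of (diagnosis, points) wagers -> {diagnosis: probability}."""
    totals = Counter()
    for diagnosis, points in bets:
        totals[diagnosis] += points
    pool = sum(totals.values())
    return {d: pts / pool for d, pts in totals.items()}

# Hypothetical wagers from three participants.
wagers = [
    ("Lyme disease", 50),
    ("lupus", 30),
    ("Lyme disease", 20),
]
probs = implied_probabilities(wagers)
print(probs)  # Lyme disease: 0.7, lupus: 0.3
```

A production market would also adjust prices dynamically as bets arrive (for example, with a market-scoring rule) so that diagnoses “rise and fall in value” as the article describes, but the pooled-share idea is the core of how popularity becomes a probability estimate.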

Frustrated patients and doctors can also turn to FindZebra, a recently launched search engine for rare diseases. It lets users search an index of rare disease databases looked after by a team of researchers. In initial trials, FindZebra returned more helpful results than Google on searches within this same dataset.”

White House: Unleashing the Power of Big Data

Tom Kalil, Deputy Director for Technology and Innovation at OSTP: “As we enter the second year of the Big Data Initiative, the Obama Administration is encouraging multiple stakeholders, including federal agencies, private industry, academia, state and local government, non-profits, and foundations to develop and participate in Big Data initiatives across the country. Of particular interest are partnerships designed to advance core Big Data technologies; harness the power of Big Data to advance national goals such as economic growth, education, health, and clean energy; use competitions and challenges; and foster regional innovation.
The National Science Foundation has issued a request for information encouraging stakeholders to identify Big Data projects they would be willing to support to achieve these goals.  And, later this year, OSTP, NSF, and other partner agencies in the Networking and Information Technology R&D (NITRD) program plan to convene an event that highlights high-impact collaborations and identifies areas for expanded collaboration between the public and private sectors.”

Work-force Science and Big Data

Steve Lohr from the New York Times: “Work-force science, in short, is what happens when Big Data meets H.R….Today, every e-mail, instant message, phone call, line of written code and mouse-click leaves a digital signal. These patterns can now be inexpensively collected and mined for insights into how people work and communicate, potentially opening doors to more efficiency and innovation within companies.

Digital technology also makes it possible to conduct and aggregate personality-based assessments, often using online quizzes or games, in far greater detail and numbers than ever before. In the past, studies of worker behavior were typically based on observing a few hundred people at most. Today, studies can include thousands or hundreds of thousands of workers, an exponential leap ahead.

“The heart of science is measurement,” says Erik Brynjolfsson, director of the Center for Digital Business at the Sloan School of Management at M.I.T. “We’re seeing a revolution in measurement, and it will revolutionize organizational economics and personnel economics.”

The data-gathering technology, to be sure, raises questions about the limits of worker surveillance. “The larger problem here is that all these workplace metrics are being collected when you as a worker are essentially behind a one-way mirror,” says Marc Rotenberg, executive director of the Electronic Privacy Information Center, an advocacy group. “You don’t know what data is being collected and how it is used.”

David Brooks on Big Data

David Brooks in NYT: “Over the past few centuries, there have been many efforts to come up with methods to help predict human behavior — what Leon Wieseltier of The New Republic calls mathematizing the subjective. The current one is the effort to understand the world by using big data.

Other efforts to predict behavior were based on models of human nature. The people using big data don’t presume to peer deeply into people’s souls. They don’t try to explain why people are doing things. They just want to observe what they are doing. The theory of big data is to have no theory, at least about human nature. You just gather huge amounts of information, observe the patterns and estimate probabilities about how people will act in the future….

One of my take-aways is that big data is really good at telling you what to pay attention to. It can tell you what sort of student is likely to fall behind. But then to actually intervene to help that student, you have to get back in the world of causality, back into the world of responsibility, back in the world of advising someone to do x because it will cause y.”

Big Data, Big Brains

“This report on Big Data is the first MeriTalk Beacon, a new series of reports designed to shed light and provide direction on far-reaching issues in government and technology. Since Beacons are designed to tackle broad concepts, each Beacon report relies on insight from a small number of big thinkers in the topic area. Less data. More insight. Real knowledge…Mankind created 150 exabytes (billion gigabytes) of data in 2005, and 1,800 exabytes in 2011; growth that only continues to accelerate. Every minute, users: Upload 48 hours of video to YouTube; Send 204 million emails; Spend $207,000 via the web; Create 571 new websites. Within the Federal government, U.S. drone aircraft sent back 24 years’ worth of video footage in just 2009. Every 24 hours, NASA’s Curiosity rover can send nearly three gigabytes of data, collecting in mere days the equivalent of all human knowledge through the death of Augustus Caesar – from Mars.”