The Commodification of Patient Opinion: the Digital Patient Experience Economy in the Age of Big Data


Paper by Deborah Lupton, from the University of Sydney’s Department of Sociology and Social Policy. Abstract: “As part of the digital health phenomenon, a plethora of interactive digital platforms have been established in recent years to elicit lay people’s experiences of illness and healthcare. The function of these platforms, as expressed on the main pages of their websites, is to provide the tools and forums whereby patients and caregivers, and in some cases medical practitioners, can share their experiences with others, benefit from the support and knowledge of other contributors and contribute to large aggregated data archives as part of developing better medical treatments and services and conducting medical research.
However what may not always be readily apparent to the users of these platforms are the growing commercial uses by many of the platforms’ owners of the archives of the data they contribute. This article examines this phenomenon of what I term ‘the digital patient experience economy’. In so doing I discuss such aspects as prosumption, the phenomena of big data and metric assemblages, the discourse and ethic of sharing and the commercialisation of affective labour via such platforms. I argue that via these online platforms patients’ opinions and experiences may be expressed in more diverse and accessible forums than ever before, but simultaneously they have become exploited in novel ways.”

Bringing the deep, dark world of public data to light


Venturebeat: “The realm of public data is like a vast cave. It is technically open to all, but it contains many secrets and obstacles within its walls.
Enigma launched out of beta today to shed light on this hidden world. This “big data” startup focuses on data in the public domain, such as those published by governments, NGOs, and the media….
The company describes itself as “Google for public data.” Using a combination of automated web crawlers and directly reaching out to government agencies, Enigma’s database contains billions of public records across more than 100,000 datasets. Pulling them all together breaks down the barriers that exist between various local, state, federal, and institutional search portals. On top of this information is an “entity graph” which searches through the data to discover relevant results. Furthermore, once the information is broken out of the silos, users can filter, reshape, and connect various datasets to find correlations….
The technology has a wide range of applications, including professional services, finance, news media, big data, and academia. Enigma has formed strategic partnerships in each of these verticals with Deloitte, Gerson Lehrman Group, The New York Times, S&P Capital IQ, and Harvard Business School, respectively.”
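
The “filter, reshape, and connect various datasets” workflow described above is, at its core, joining records from separate silos on a shared entity and then looking for relationships across them. Here is a minimal sketch of that idea in Python; the datasets, column names, and figures are invented for illustration, not Enigma data:

```python
# A minimal sketch (not Enigma's actual pipeline) of connecting two hypothetical
# public datasets on a shared entity and checking how they relate.
import pandas as pd

# Hypothetical extracts from two separate government portals.
spending = pd.DataFrame({
    "county": ["Alpha", "Beta", "Gamma", "Delta"],
    "transit_spend_per_capita": [120.0, 85.0, 240.0, 60.0],
})
outcomes = pd.DataFrame({
    "county": ["Alpha", "Beta", "Gamma", "Delta"],
    "avg_commute_minutes": [28.0, 33.0, 22.0, 37.0],
})

# "Connecting" the silos: merge on the shared entity (the county),
# then measure how the two series move together.
merged = spending.merge(outcomes, on="county")
corr = merged["transit_spend_per_capita"].corr(merged["avg_commute_minutes"])
print(merged)
print(f"Correlation between spending and commute time: {corr:.2f}")
```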

Personal Information Is the Currency of the 21st Century


Tom Cochran (CTO at Atlantic Media) in All Things D: “The currency of the 21st century digital economy is your personal information. It has no transaction costs and does not decrease in value when the supply increases. Contrary to the laws of economics, it may even increase in value with greater supply. The more information you provide to companies, the more value they can extract from it….
Conversely, we tend to ignore this process because the most magnificent, technologically advanced and socially connected digital city is being built from it.
You are living in this growing digital city, and I’m guessing that you really like it here. Unfortunately, you can’t live in this city for free. Your rent is due in the form of your personal information, and you have to accept a certain loss of your privacy….
As a society, we need to define the rules under which our personal information can be mined. Our collective unease is largely the result of not having clear parameters to create an equilibrium between privacy and personalization.
These parameters will help shift our focus from the negatives to the positives, because in return for your personal information, you realize a net benefit with tremendous value.”

Human-Based Evolutionary Computing


Abstract of new paper by Jeffrey V. Nickerson on Human-Based Evolutionary Computing (in Handbook of Human Computation, P. Michelucci, ed., Springer, Forthcoming): “Evolution explains the way the natural world changes over time. It can also explain changes in the artificial world, such as the way ideas replicate, alter, and merge. This analogy has led to a family of related computer procedures called evolutionary algorithms. These algorithms are being used to produce product designs, art, and solutions to mathematical problems. While for the most part these algorithms are run on computers, they also can be performed by people. Such human-based evolutionary algorithms are useful when many different ideas, designs, or solutions need to be generated, and human cognition is called for.”
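
The generate, evaluate, and vary loop behind such algorithms is compact enough to sketch. The toy Python below is not Nickerson’s procedure; it is a generic evolutionary loop in which the fitness function stands in for human judgment, which is exactly the step a human-based variant would hand over to people (who might also perform the mutation and recombination themselves):

```python
# A minimal sketch of an evolutionary loop over "ideas" represented as strings.
# human_fitness() is a placeholder for human ratings; in a human-based system,
# people would supply the scores (and possibly the variations).
import random

POPULATION = ["a tall chair", "a folding chair", "a rocking stool", "a bar stool"]

def human_fitness(idea: str) -> float:
    # Stand-in for a crowd rating; swap in real human scores here.
    return random.random()

def mutate(idea: str) -> str:
    words = idea.split()
    words[random.randrange(len(words))] = random.choice(["light", "wooden", "stackable"])
    return " ".join(words)

def crossover(a: str, b: str) -> str:
    wa, wb = a.split(), b.split()
    cut = random.randrange(1, min(len(wa), len(wb)))
    return " ".join(wa[:cut] + wb[cut:])

population = POPULATION
for generation in range(5):
    scored = sorted(population, key=human_fitness, reverse=True)
    parents = scored[:2]                      # selection: keep the best-rated ideas
    children = [mutate(random.choice(parents)) for _ in range(2)]
    children.append(crossover(*parents))      # recombination: merge two ideas
    population = parents + children
print(population)
```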

Civilized Discourse Construction Kit


Jeff Atwood at “Coding Horror”: “Forum software? Maybe. Let’s see, it’s 2013, has forum software advanced at all in the last ten years? I’m thinking no.
Forums are the dark matter of the web, the B-movies of the Internet. But they matter. To this day I regularly get excellent search results on forum pages for stuff I’m interested in. Rarely a day goes by that I don’t end up on some forum, somewhere, looking for some obscure bit of information. And more often than not, I find it there….

At Stack Exchange, one of the tricky things we learned about Q&A is that if your goal is to have an excellent signal to noise ratio, you must suppress discussion. Stack Exchange only supports the absolute minimum amount of discussion necessary to produce great questions and great answers. That’s why answers get constantly re-ordered by votes, that’s why comments have limited formatting and length and only a few display, and so forth….
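
Those two mechanics, vote-ordered answers and a capped, lightly formatted comment stream, are simple enough to illustrate. A toy sketch (this is not Stack Exchange’s code, and the five-comment display limit is an assumption for the example):

```python
# Toy illustration: answers re-ordered by votes; only a few comments displayed.
answers = [
    {"text": "Use a generator.", "votes": 12},
    {"text": "Loop twice.", "votes": 3},
    {"text": "Use itertools.", "votes": 27},
]
comments = ["nice", "why not recursion?", "see the docs", "duplicate?", "thanks", "+1"]

MAX_COMMENTS_SHOWN = 5  # assumed limit for illustration

for answer in sorted(answers, key=lambda a: a["votes"], reverse=True):
    print(f'{answer["votes"]:>3} votes  {answer["text"]}')
for comment in comments[:MAX_COMMENTS_SHOWN]:
    print("  >", comment)
print(f"  (+{len(comments) - MAX_COMMENTS_SHOWN} hidden)")
```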

Today we announce the launch of Discourse, a next-generation, 100% open source discussion platform built for the next decade of the Internet.


The goal of the company we formed, Civilized Discourse Construction Kit, Inc., is exactly that – to raise the standard of civilized discourse on the Internet through seeding it with better discussion software:

  • 100% open source and free to the world, now and forever.
  • Feels great to use. It’s fun.
  • Designed for hi-resolution tablets and advanced web browsers.
  • Built-in moderation and governance systems that let discussion communities protect themselves from trolls, spammers, and bad actors – even without official moderators.”

New NAS Report: Copyright in the Digital Era: Building Evidence for Policy


National Academies of Sciences: “Over the course of several decades, copyright protection has been expanded and extended through legislative changes occasioned by national and international developments. The content and technology industries affected by copyright and its exceptions, and in some cases balancing the two, have become increasingly important as sources of economic growth, relatively high-paying jobs, and exports. Since the expansion of digital technology in the mid-1990s, they have undergone a technological revolution that has disrupted long-established modes of creating, distributing, and using works ranging from literature and news to film and music to scientific publications and computer software.

In the United States and internationally, these disruptive changes have given rise to a strident debate over copyright’s proper scope and terms and means of its enforcement–a debate between those who believe the digital revolution is progressively undermining the copyright protection essential to encourage the funding, creation, and distribution of new works and those who believe that enhancements to copyright are inhibiting technological innovation and free expression.

Copyright in the Digital Era: Building Evidence for Policy examines a range of questions regarding copyright policy by using a variety of methods, such as case studies, international and sectoral comparisons, and experiments and surveys. This report is especially critical in light of digital age developments that may, for example, change the incentive calculus for various actors in the copyright system, impact the costs of voluntary copyright transactions, pose new enforcement challenges, and change the optimal balance between copyright protection and exceptions.”

Is Privacy Algorithmically Impossible?


MIT Technology Review: “In 1995, the European Union introduced privacy legislation that defined “personal data” as any information that could identify a person, directly or indirectly. The legislators were apparently thinking of things like documents with an identification number, and they wanted them protected just as if they carried your name.
Today, that definition encompasses far more information than those European legislators could ever have imagined—easily more than all the bits and bytes in the entire world when they wrote their law 18 years ago.
Here’s what happened. First, the amount of data created each year has grown exponentially (see figure)…
Much of this data is invisible to people and seems impersonal. But it’s not. What modern data science is finding is that nearly any type of data can be used, much like a fingerprint, to identify the person who created it: your choice of movies on Netflix, the location signals emitted by your cell phone, even your pattern of walking as recorded by a surveillance camera. In effect, the more data there is, the less any of it can be said to be private. We are coming to the point that if the commercial incentives to mine the data are in place, anonymity of any kind may be “algorithmically impossible,” says Princeton University computer scientist Arvind Narayanan.”
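
The fingerprint-like quality of such data can be shown with a deliberately tiny example: given a few publicly observable data points, find the “anonymous” record they best match. The following is a toy sketch with invented data, not Narayanan’s actual method:

```python
# A toy illustration of re-identification: a few observed points are enough to
# single out one record in an "anonymized" dataset. All data below is invented.
anonymized_movie_ratings = {
    "user_001": {"Alien", "Amelie", "Brazil", "Clue"},
    "user_002": {"Alien", "Heat", "Clue", "Fargo"},
    "user_003": {"Amelie", "Heat", "Brazil", "Up"},
}

# A handful of films someone mentioned publicly (e.g., on a blog).
observed = {"Alien", "Brazil", "Clue"}

# Score each anonymous record by its overlap with the public trace.
best_match = max(anonymized_movie_ratings,
                 key=lambda user: len(anonymized_movie_ratings[user] & observed))
print("Most likely identity behind the public trace:", best_match)
```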

6 Things You May Not Know About Open Data


GovTech: “On Friday, May 3, Palo Alto, Calif., CIO Jonathan Reichental …said that when it comes to making data more open, “The invisible becomes visible,” and he outlined six major points that identify and define what open data really is:

1.  It’s the liberation of peoples’ data

The public sector collects data that pertains to government, such as employee salaries, trees or street information, and government entities are therefore responsible for liberating that data so the constituent can view it in an accessible format. Though this practice has become more commonplace in recent years, Reichental said government should have been doing this all along.

2.  Data has to be consumable by a machine

Piecing data together from a spreadsheet to a website or containing it in a PDF isn’t the easiest way to retrieve data. To make data more open, it needs to be in a readable format so users don’t have to go through additional trouble of finding or reading it. [A minimal sketch of consuming such a machine-readable file appears after this list.]

3.  Data has a derivative value

When data is made available to the public, people like app developers, architects or others are able to analyze the data. In some cases, data can be used in city planning to understand what’s happening at the city scale.

4.  It eliminates the middleman

For many states, public records laws require them to provide data when a public records request is made. But oftentimes, complying with such request regulations involves long and cumbersome processes. Lawyers and other government officials must process paperwork, and it can take weeks to complete a request. By having data readily available, these processes can be eliminated, thus also eliminating the middleman responsible for processing the requests. Direct access to the data saves time and resources.

5.  Data creates deeper accountability

Since government is expected to provide accessible data, it is therefore being watched, making it more accountable for its actions — everything from emails, salaries and city council minutes can be viewed by the public.

6.  Open Data builds trust

When the community can see what’s going on in its government through the access of data, Reichental said individuals begin to build more trust in their government and feel less like the government is hiding information.”
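
On point 2 above, the practical difference machine-readability makes is easy to show: structured data can be loaded and queried in a few lines of code, with no scraping of PDFs or web pages. A minimal Python sketch, using an invented sample of the kind of “street trees” file a city portal might publish:

```python
# Minimal sketch: consuming a machine-readable open dataset (CSV) directly.
# The records below are invented sample data, not a real city's file.
import csv
import io

published_csv = """tree_id,species,street,planted_year
1001,London Plane,University Ave,1998
1002,Coast Live Oak,Hamilton Ave,2005
1003,Ginkgo,Forest Ave,2011
"""

reader = csv.DictReader(io.StringIO(published_csv))
trees = list(reader)

# Because the format is structured, reuse is one line of code away.
oaks = [t for t in trees if "Oak" in t["species"]]
print(f"{len(trees)} trees on file, {len(oaks)} oaks")
```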

Linking open data to augmented intelligence and the economy


Open Data Institute and Professor Nigel Shadbolt (@Nigel_Shadbolt) interviewed by (@digiphile): “…there are some clear learnings. One that I’ve been banging on about recently has been that yes, it really does matter to turn the dial so that governments have a presumption to publish non-personal public data. If you would publish it anyway, under a Freedom of Information request or whatever your local legislative equivalent is, why aren’t you publishing it anyway as open data? That, as a behavioral change, is a big one for many administrations where either the existing workflow or culture is, “Okay, we collect it. We sit on it. We do some analysis on it, and we might give it away piecemeal if people ask for it.” We should construct the publication process from the outset to presume to publish openly. That’s still something that we are two or three years away from, working hard with the public sector to work out how to do it and how to do it properly.
We’ve also learned that in many jurisdictions, the amount of [open data] expertise within administrations and within departments is slight. There just isn’t really the skillset, in many cases, for people to know what it is to publish using technology platforms. So there’s a capability-building piece, too.
One of the most important things is it’s not enough to just put lots and lots of datasets out there. It would be great if the “presumption to publish” meant they were all out there anyway — but when you haven’t got any datasets out there and you’re thinking about where to start, the tough question is to say, “How can I publish data that matters to people?”
The data that matters is revealed if we look at the download stats on these various UK, US and other [open data] sites: there’s a very, very distinctive parallel curve. Some datasets are very, very heavily utilized. You suspect they have high utility to many, many people. Many of the others, if they can be found at all, aren’t being used particularly much. That’s not to say that, under that long tail, there aren’t large amounts of use. A particularly arcane open dataset may have exquisite use to a small number of people.
The real truth is that it’s easy to republish your national statistics. It’s much harder to do a serious job on publishing your spending data in detail, publishing police and crime data, publishing educational data, publishing actual overall health performance indicators. These are tough datasets to release. As people are fond of saying, it holds politicians’ feet to the fire. It’s easy to build a site that’s full of stuff — but does the stuff actually matter? And does it have any economic utility?”
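
The skew Shadbolt describes can be made concrete with a few lines of arithmetic over download counts; the numbers below are purely illustrative, not real portal statistics:

```python
# Illustrative sketch of a heavily skewed usage curve: a few datasets account
# for most downloads, with a long tail of rarely used (but still potentially
# valuable) ones. Counts are invented.
downloads = sorted([12500, 9800, 7200, 640, 310, 150, 90, 45, 20, 12], reverse=True)

total = sum(downloads)
top3_share = sum(downloads[:3]) / total
print(f"Top 3 datasets account for {top3_share:.0%} of all downloads")
print(f"The remaining {len(downloads) - 3} datasets share the long tail")
```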

The Big Data Debate: Correlation vs. Causation


Gil Press: “In the first quarter of 2013, the stock of big data has experienced sudden declines followed by sporadic bouts of enthusiasm. The volatility—a new big data “V”—continues and Ted Cuzzillo summed up the recent negative sentiment in “Big data, big hype, big danger” on SmartDataCollective:
“A remarkable thing happened in Big Data last week. One of Big Data’s best friends poked fun at one of its cornerstones: the Three V’s. The well-networked and alert observer Shawn Rogers, vice president of research at Enterprise Management Associates, tweeted his eight V’s: ‘…Vast, Volumes of Vigorously, Verified, Vexingly Variable Verbose yet Valuable Visualized high Velocity Data.’ He was quick to explain to me that this is no comment on Gartner analyst Doug Laney’s three-V definition. Shawn’s just tired of people getting stuck on V’s.”…
Cuzzillo is joined by a growing chorus of critics that challenge some of the breathless pronouncements of big data enthusiasts. Specifically, it looks like the backlash theme-of-the-month is correlation vs. causation, possibly in reaction to the success of Viktor Mayer-Schönberger and Kenneth Cukier’s recent big data book in which they argued for dispensing “with a reliance on causation in favor of correlation”…
In “Steamrolled by Big Data,” The New Yorker’s Gary Marcus declares that “Big Data isn’t nearly the boundless miracle that many people seem to think it is.”…
Matti Keltanen at The Guardian agrees, explaining “Why ‘lean data’ beats big data.” Writes Keltanen: “…the lightest, simplest way to achieve your data analysis goals is the best one…The dirty secret of big data is that no algorithm can tell you what’s significant, or what it means. Data then becomes another problem for you to solve. A lean data approach suggests starting with questions relevant to your business and finding ways to answer them through data, rather than sifting through countless data sets. Furthermore, purely algorithmic extraction of rules from data is prone to creating spurious connections, such as false correlations… today’s big data hype seems more concerned with indiscriminate hoarding than helping businesses make the right decisions.”
In “Data Skepticism,” O’Reilly Radar’s Mike Loukides adds this gem to the discussion: “The idea that there are limitations to data, even very big data, doesn’t contradict Google’s mantra that more data is better than smarter algorithms; it does mean that even when you have unlimited data, you have to be very careful about the conclusions you draw from that data. It is in conflict with the all-too-common idea that, if you have lots and lots of data, correlation is as good as causation.”
Isn’t more-data-is-better the same as correlation-is-as-good-as-causation? Or, in the words of Chris Anderson, “with enough data, the numbers speak for themselves.”
“Can numbers actually speak for themselves?” non-believer Kate Crawford asks in “The Hidden Biases in Big Data” on the Harvard Business Review blog and answers: “Sadly, they can’t. Data and data sets are not objective; they are creations of human design…”
And David Brooks in The New York Times, while probing the limits of “the big data revolution,” takes the discussion to yet another level: “One limit is that correlations are actually not all that clear. A zillion things can correlate with each other, depending on how you structure the data and what you compare. To discern meaningful correlations from meaningless ones, you often have to rely on some causal hypothesis about what is leading to what. You wind up back in the land of human theorizing…”
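
Brooks’ point that “a zillion things can correlate with each other” is easy to demonstrate: compare enough unrelated random series against a random outcome and strong correlations appear by chance alone, which is also the “spurious connections” risk Keltanen flags. A small self-contained Python sketch, with all data randomly generated:

```python
# Demonstration of chance correlations under multiple comparisons: 1,000 random
# series compared against one random "outcome". Several correlate strongly even
# though none has any causal link to it.
import random

random.seed(1)
N_POINTS, N_SERIES = 20, 1000

def corr(xs, ys):
    # Pearson correlation, computed directly.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

outcome = [random.gauss(0, 1) for _ in range(N_POINTS)]
candidates = [[random.gauss(0, 1) for _ in range(N_POINTS)] for _ in range(N_SERIES)]

strongest = max(abs(corr(series, outcome)) for series in candidates)
spurious = sum(abs(corr(series, outcome)) > 0.5 for series in candidates)
print(f"Strongest chance correlation: {strongest:.2f}")
print(f"Series with |r| > 0.5 despite no causal link: {spurious}")
```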