Tim Berners-Lee: we need to re-decentralise the web


Wired:  “Twenty-five years on from the web’s inception, its creator has urged the public to re-engage with its original design: a decentralised internet that at its very core, remains open to all.
Speaking with Wired editor David Rowan at an event launching the magazine’s March issue, Tim Berners-Lee said that although part of this is about keeping an eye on for-profit internet monopolies such as search engines and social networks, the greatest danger is the emergence of a balkanised web.
“I want a web that’s open, works internationally, works as well as possible and is not nation-based,” Berners-Lee told the audience… “What I don’t want is a web where the  Brazilian government has every social network’s data stored on servers on Brazilian soil. That would make it so difficult to set one up.”
It’s the role of governments, startups and journalists to keep that conversation at the fore, he added, because the pace of change is not slowing — it’s going faster than ever before. For his part Berners-Lee drives the issue through his work at the Open Data Institute, World Wide Web Consortium and World Wide Web Foundation, but also as an MIT professor whose students are “building new architectures for the web where it’s decentralised”. On the issue of monopolies, Berners-Lee did say it’s concerning to be “reliant on big companies, and one big server”, something that stalls innovation, but that competition has historically resolved these issues and will continue to do so.
The kind of balkanised web he spoke about, as typified by Brazil’s home-soil servers argument or Iran’s emerging intranet, is partially being driven by revelations of NSA and GCHQ mass surveillance. The distrust that it has brewed, from a political level right down to the threat of self-censorship among ordinary citizens, threatens an open web and is, said Berners-Lee,  a greater threat than censorship. Knowing the NSA  may be breaking commercial encryption services could result in the emergence of more networks like China’s Great Firewall, to “protect” citizens. This is why we need a bit of anti-establishment push back, alluded to by Berners-Lee.”

New format of congressional documents creates transparency, opportunity


Colby Hochmuth in FedScoop:  “Today, in a major win for the open data community, the Government Printing Office partnered with the Library of Congress to make summaries for House bills available in XML format. Bill summaries can now be downloaded in bulk from GPO’s Federal Digital System or FDsys. This format allows the data to be repurposed and reused by third-party providers, whether for mobile apps, analytical purposes or data mashups.

“This is the result of an effective ongoing collaboration between the Library and GPO to provide legislative information in modern, widely used formats,” said Librarian of Congress James Billington. “We are pleased the popular bill summaries, which provide objective descriptions of complex legislative text, are now available through the Federal Digital System.”
The movement for more transparent government data gained traction in the House Appropriations Committee and its support of the task force on bulk data established by the House.
House bills, the Federal Register, the Code of Federal Regulations as well as various executive branch documents are currently available in XML downloadable format, but this latest development is different. These bill summaries are prepared by LOC’s Congressional Research Service and describe the key provisions of a piece of legislation, and explain the potential implications the legislation may have on current federal programs and laws.”

Open data: Strategies for impact


Havey Lewis at Open Government Partnership Blog: “When someone used to talk about “data for good”, chances are they were wondering whether the open data stream they relied on was still going to be available in the future. Similarly, “good with data” meant that experienced data scientists were being sought for a deeply technical project. Both interpretations reflect a state of being rather than of doing: data being around for good; people being good with data.
Important though these considerations are, they miss what should be an obvious and more profound alternative.
Right now, organisations like DataKind™  and Periscopic, and many other entrepreneurs, innovators and established social enterprises that use open data, see things differently. They are using these straplines to shake up the status quo, to demonstrate that data-driven businesses can do well by doing good.
And it’s the confluence of the many national and international open data initiatives, and the growing number of technically able, socially responsible organisations that provide the opportunity for social as well as economic growth. The World Wide Web Foundation now estimates that there are over 370 open data initiatives around the world. Collectively, and through portals such as Quandl and and datacatalogs.org, these initiatives have made a staggering quantity of data available – in excess of eight million data sets. In addition, several successful and data-rich companies are entering into a new spirit of philanthropy – by donating their data for the public good. There’s no doubt that opening up data signals a new willingness by governments and businesses all over the world to engage with their citizens and customers in a new and more transparent way.
The challenge, though, is ensuring that these popular national and international open data initiatives are cohesive and impactful. And that the plans drawn up by public sector bodies to release specific data sets are based on the potential the data has to achieve a beneficial outcome, not – or, at least, not solely – based on the cost or ease of publication. Despite the best of intentions, only a relatively small proportion of open data sets now available has the latent potential to create significant economic or social impact. In our push to open up data and government, it seems that we may have fallen into the trap of believing the ends are the same as the means; that effect is the same as cause…”

How Open Data Are Turned into Services?


New Paper by Muriel Foulonneau, Sébastien Martin, Slim Turki: “The Open Data movement has mainly been a data provision movement. The release of Open Data is usually motivated by (i) government transparency (citizen access to government data), (ii) the development of services by third parties for the benefit for citizens and companies (typically smart city approach), or (iii) the development of new services that stimulate the economy. The success of the Open Data movement and its return on investment should therefore be assessed among other criteria by the number and impact of the services created based on those data. In this paper, we study the development of services based on open data and means to make the data opening process more effective.”

Give the Data to the People


Harlan Krumholz in the New York Times: “LAST week, Johnson & Johnson announced that it was making all of its clinical trial data available to scientists around the world. It has hired my group, Yale University Open Data Access Project, or YODA, to fully oversee the release of the data. Everything in the company’s clinical research vaults, including unpublished raw data, will be available for independent review.

This is an extraordinary donation to society, and a reversal of the industry’s traditional tendency to treat data as an asset that would lose value if exposed to public scrutiny.

Today, more than half of the clinical trials in the United States, including many sponsored by academic and governmental institutions, are not published within two years of their completion. Often they are never published at all. The unreported results, not surprisingly, are often those in which a drug failed to perform better than a placebo. As a result, evidence-based medicine is, at best, based on only some of the evidence. One of the most troubling implications is that full information on a drug’s effects may never be discovered or released.

Even when studies are published, the actual data are usually not made available. End users of research — patients, doctors and policy makers — are implicitly told by a single group of researchers to “take our word for it.” They are often forced to accept the report without the prospect of other independent scientists’ reproducing the findings — a violation of a central tenet of the scientific method.

To be fair, the decision to share data is not easy. Companies worry that their competitors will benefit, that lawyers will take advantage, that incompetent scientists will misconstrue the data and come to mistaken conclusions. Researchers feel ownership of the data and may be reluctant to have others use it. So Johnson & Johnson, as well as companies like GlaxoSmithKline and Medtronic that have made more cautious moves toward transparency, deserve much credit. The more we share data, however, the more we find that many of these problems fail to materialize….

This program doesn’t mean that just anyone can gain access to the data without disclosing how they intend to use it. We require those who want the data to submit a proposal and identify their research team, funding and any conflicts of interest. They have to complete a short course on responsible conduct and sign an agreement that restricts them to their proposed research question. Most important, they must agree to share whatever they find. And we exclude applicants who seek data for commercial or legal purposes. Our intent is not to be tough gatekeepers, but to ensure that the data are used in a transparent way and contribute to overall scientific knowledge.

There are many benefits to this kind of sharing. It honors the contributions of the subjects and scientists who participated in the research. It is proof that an organization, whether it is part of industry or academia, wants to play a role as a good global citizen. It demonstrates that the organization has nothing to hide. And it enables scientists to use the data to learn new ways to help patients. Such an approach can even teach a company like Johnson & Johnson something it didn’t know about its own products.

For the good of society, this is a breakthrough that should be replicated throughout the research world.”

Why SayIt is (partly) a statement about the future of Open Data


Tom Steinberg from MySociety: “This is where SayIt comes in, as an example of a relatively low-cost approach to making sure that the next generation of government IT systems do produce Open Data.
SayIt is a newly launched open source tool for publishing transcripts of trials, debates, interviews and so on. It publishes them online in a way that matches modern expectations about how stuff should work on the web – responsive, searchable and so on. It’s being built as a Poplus Component, which means it’s part of an international network of groups collaborating on shared technologies. Here’s JK Rowling being interviewed, published via SayIt.
But how does this little tool relate to the business of getting governments to release more Open Data? Well, SayIt isn’t just about publishing data, it’s about making it too – in a few months we’ll be sharing an authoring interface for making new transcripts from whatever source a user has access to.
We hope that having iterated and improved this authoring interface, SayIt can become the tool of choice for public sector transcribers, replacing whatever tool they use today (almost certainly Word). Then, if they use SayIt to make a transcript, instead of Word, then it will produce new, instantly-online Open Data every time they use it….
But we can’t expect the public sector to use a tool like SayIt to make new Open Data unless it is cheaper, better and less burdensome than whatever they’re using now. We can’t – quite simply – expect to sell government procurement officers a new product mainly on the virtues of Open Data.  This means the tough task of persuading government employees that there is a new tool that is head-and-shoulders better than Excel or Word for certain purposes: formidable, familiar products that are much better than their critics like to let on.
So in order for SayIt to replace the current tools used by any current transcriber, it’s going to have to be really, really good. And really trustworthy. And it’s going to have to be well marketed. And that’s why we’ve chosen to build SayIt as an international, open source collaboration – as a Poplus Component. Because we think that without the billions of dollars it takes to compete with Microsoft, our best hope is to develop very narrow tools that do 0.01% of what Word does, but which do that one thing really really well. And our key strategic advantage, other than the trust that comes with Open Source and Open Standards, is the energy of the global civic hacking and government IT reform sector. SayIt is far more likely to succeed if it has ideas and inputs from contributors from around the world.

Regardless of whether or not SayIt ever succeeds in penetrating inside governments, this post is about an idea that such an approach represents. The idea is that people can advance the Open Data agenda not just by lobbying, but also by building and popularising tools that mean that data is born open in the first place. I hope this post will encourage more people to work on such tools, either on your own, or via collaborations like Poplus.”

Visual Insights: A Practical Guide to Making Sense of Data


New book by Katy Börner and David E. Polley: “In the age of Big Data, the tools of information visualization offer us a macroscope to help us make sense of the avalanche of data available on every subject. This book offers a gentle introduction to the design of insightful information visualizations. It is the only book on the subject that teaches nonprogrammers how to use open code and open data to design insightful visualizations. Readers will learn to apply advanced data mining and visualization techniques to make sense of temporal, geospatial, topical, and network data.

The book, developed for use in an information visualization MOOC, covers data analysis algorithms that enable extraction of patterns and trends in data, with chapters devoted to “when” (temporal data), “where” (geospatial data), “what” (topical data), and “with whom” (networks and trees); and to systems that drive research and development. Examples of projects undertaken for clients include an interactive visualization of the success of game player activity in World of Warcraft; a visualization of 311 number adoption that shows the diffusion of non-emergency calls in the United States; a return on investment study for two decades of HIV/AIDS research funding by NIAID; and a map showing the impact of the HiveNYC Learning Network.
Visual Insights will be an essential resource on basic information visualization techniques for scholars in many fields, students, designers, or anyone who works with data.”

Check out also the Information Visualization MOOC at http://ivmooc.cns.iu.edu/
 

It’s the Neoliberalism, Stupid: Why instrumentalist arguments for Open Access, Open Data, and Open Science are not enough.


Eric Kansa at LSE Blog: “…However, I’m increasingly convinced that advocating for openness in research (or government) isn’t nearly enough. There’s been too much of an instrumentalist justification for open data an open access. Many advocates talk about how it will cut costs and speed up research and innovation. They also argue that it will make research more “reproducible” and transparent so interpretations can be better vetted by the wider community. Advocates for openness, particularly in open government, also talk about the wonderful commercial opportunities that will come from freeing research…
These are all very big policy issues, but they need to be asked if the Open Movement really stands for reform and not just a further expansion and entrenchment of Neoliberalism. I’m using the term “Neoliberalism” because it resonates as a convenient label for describing how and why so many things seem to suck in Academia. Exploding student debt, vanishing job security, increasing compensation for top administrators, expanding bureaucracy and committee work, corporate management methodologies (Taylorism), and intensified competition for ever-shrinking public funding all fall under the general rubric of Neoliberalism. Neoliberal universities primarily serve the needs of commerce. They need to churn out technically skilled human resources (made desperate for any work by high loads of debt) and easily monetized technical advancements….
“Big Data,” “Data Science,” and “Open Data” are now hot topics at universities. Investments are flowing into dedicated centers and programs to establish institutional leadership in all things related to data. I welcome the new Data Science effort at UC Berkeley to explore how to make research data professionalism fit into the academic reward systems. That sounds great! But will these new data professionals have any real autonomy in shaping how they conduct their research and build their careers? Or will they simply be part of an expanding class of harried and contingent employees- hired and fired through the whims of creative destruction fueled by the latest corporate-academic hype-cycle?
Researchers, including #AltAcs and “data professionals”, need  a large measure of freedom. Miriam Posner’s discussion about the career and autonomy limits of Alt-academic-hood help highlight these issues. Unfortunately, there’s only one area where innovation and failure seem survivable, and that’s the world of the start-up. I’ve noticed how the “Entrepreneurial Spirit” gets celebrated lots in this space. I’m guilty of basking in it myself (10 years as a quasi-independent #altAc in a nonprofit I co-founded!).
But in the current Neoliberal setting, being an entrepreneur requires a singular focus on monetizing innovation. PeerJ and Figshare are nice, since they have business models that less “evil” than Elsevier’s. But we need to stop fooling ourselves that the only institutions and programs that we can and should sustain are the ones that can turn a profit. For every PeerJ or Figshare (and these are ultimately just as dependent on continued public financing of research as any grant-driven project), we also need more innovative organizations like the Internet Archive, wholly dedicated to the public good and not the relentless pressure to commoditize everything (especially their patrons’ privacy). We need to be much more critical about the kinds of programs, organizations, and financing strategies we (as a society) can support. I raised the political economy of sustainability issue at a recent ThatCamp and hope to see more discussion.
In reality so much of the Academy’s dysfunctions are driven by our new Gilded Age’s artificial scarcity of money. With wealth concentrated in so few hands, it is very hard to finance risk taking and entreprenurialism in the scholarly community, especially to finance any form of entrepreneurialism that does not turn a profit in a year or two.
Open Access and Open Data will make so much more of a difference if we had the same kind of dynamism in the academic and nonprofit sector as we have in the for-profit start-up sector. After all, Open Access and Open Data can be key enablers to allow much broader participation in research and education. However, broader participation still needs to be financed: you cannot eat an open access publication. We cannot gloss over this key issue.
We need more diverse institutional forms so that researchers can find (or found) the kinds of organizations that best channel their passions into contributions that enrich us all. We need more diverse sources of financing (new foundations, better financed Kickstarters) to connect innovative ideas with the capital needed to see them implemented. Such institutional reforms will make life in the research community much more livable, creative, and dynamic. It would give researchers more options for diverse and varied career trajectories (for-profit or not-for-profit) suited to their interests and contributions.
Making the case to reinvest in the public good will require a long, hard slog. It will be much harder than the campaign for Open Access and Open Data because it will mean contesting Neoliberal ideologies and constituencies that are deeply entrenched in our institutions. However, the constituencies harmed by Neoliberalism, particularly the student community now burdened by over $1 trillion in debt and the middle class more generally, are much larger and very much aware that something is badly amiss. As we celebrate the impressive strides made by the Open Movement in the past year, it’s time we broaden our goals to tackle the needs for wider reform in the financing and organization of research and education.
This post originally appeared on Digging Digitally and is reposted under a CC-BY license.”

Report “Big and open data in Europe: A growth engine or a missed opportunity?”


Press Release: “Big data and open data are not just trendy issues, they are the concern of the government institutions at the highest level. On January 29th, 2014 a Conference concerning Big & Open Data in Europe 2020 was held in the European Parliament.
Questions were asked and discussed like: Is Big & Open Data a truly transformative phenomena or just a ‘hot air’? Does it matter for Europe? How big is the economic potential of Big and Open Data for Europe till 2020? How each of the 28 Member States may benefit from it?…
The conference complemented a research project by demosEUROPA – Centre for European Strategy on Big and Open Data in Europe that aims at fostering and facilitating policy debate on the socioeconomic impact of data. The key outcome of the project, a pan-European macroeconomic study titledBig and open data In Europe: A growth engine or a missed opportunity?” carried out by the Warsaw Institute for Economic Studies (WISE) was presented.
We have the pleasure to be one of the first to present some of the findings of the report and offer the report for download.
The report analyses how technologies have the potential to influence various aspects of the European society, about their substantial, long term impact on our wealth and quality of life, but also about the new developmental challenges for the EU as a whole – as well as for its member states and their regions.
You will learn from the report:
–  the resulting economic gains of business applications of big data
– how to structure big data to move from Big Trouble to Big Value
– the costs and benefits of opening data to holders
– 3 challenges that  Europeans face with respect to big and open data
– key areas, growth opportunities and challenges for big and open data in Europe per particular regions.
The study also elaborates on the key principle of open data philosophy, which is open by default.
Europe by 2020. What will happen?
The report contains a prognosis for the 28 countries from the EU about the impact of big and open data from 2020 and its additional output and how it will affect trade, health, manufacturing, information and communication, finance & insurance and public administration in different regions. It foresees that the EU economy will grow by 1.9% by 2020 thanks to big and open data and describes the increase of the general GDP level by countries and sectors.
One of the many interesting findings of the report is that the positive impact of the data revolution will be felt more acutely in Northern Europe, while most of the New Member States and Southern European economies will benefit significantly less, with two notable exceptions being the Czech Republic and Poland. If you would like to have first-hand up-to-date information about the impact of big and open data on the future of Europe – download the report.”

Open Data and Clinical Trials


Editorial by Jeffrey M. Drazen, M.D at NEJM.org :”In the fall of 2013, the Institute of Medicine (IOM) convened a committee, on which I serve, to examine the sharing of data in the setting of clinical trials. The committee is charged with reviewing current practices on data sharing in the context of randomized, controlled trials and with making recommendations for future data-sharing standards. Over the past few months, the committee has prepared a draft report that reviews current practices on data sharing and lays out a number of potential data-sharing models. Full details regarding the committee’s charge and the interim report are available at www.iom.edu/activities/research/sharingclinicaltrialdata.aspx….
Open-data advocates argue that all the study data should be available to anyone at the time the first report is published or even earlier. Others argue that to maintain an incentive for researchers to pursue clinical investigations and to give those who gathered the data a chance to prepare and publish further reports, there should be a period of some specified length during which the data gatherers would have exclusive access to the information. Since these researchers could always agree to collaborate with others who were not involved in the study in order to use the data to help answer a scientific question, the period of exclusivity would really apply only to noncollaborative use of the data. That is, there would be a defined period during which the data would not be available to those who wanted to perform their own analyses and draw conclusions that could, for example, provide them with a scientific or commercial competitive advantage over the researchers who had originally gathered the data or allow them to derive conclusions that are potentially at odds with those drawn in the original publication.
As members of a community that either produces or uses data, what approach do you think serves our community best? There is no need to reply to the Journal, but please read the interim report and let the IOM know how you feel about this and the many other critical issues related to data sharing that are reviewed in the document. The IOM is collecting comments until March 24, 2014, at www8.nationalacademies.org/cp/projectview.aspx?key=49578.”