New Paper by Muriel Foulonneau, Sébastien Martin, Slim Turki: “The Open Data movement has mainly been a data provision movement. The release of Open Data is usually motivated by (i) government transparency (citizen access to government data), (ii) the development of services by third parties for the benefit of citizens and companies (typically the smart city approach), or (iii) the development of new services that stimulate the economy. The success of the Open Data movement and its return on investment should therefore be assessed among other criteria by the number and impact of the services created based on those data. In this paper, we study the development of services based on open data and means to make the data opening process more effective.”
Give the Data to the People
Harlan Krumholz in the New York Times: “LAST week, Johnson & Johnson announced that it was making all of its clinical trial data available to scientists around the world. It has hired my group, the Yale University Open Data Access Project, or YODA, to fully oversee the release of the data. Everything in the company’s clinical research vaults, including unpublished raw data, will be available for independent review.
This is an extraordinary donation to society, and a reversal of the industry’s traditional tendency to treat data as an asset that would lose value if exposed to public scrutiny.
Today, more than half of the clinical trials in the United States, including many sponsored by academic and governmental institutions, are not published within two years of their completion. Often they are never published at all. The unreported results, not surprisingly, are often those in which a drug failed to perform better than a placebo. As a result, evidence-based medicine is, at best, based on only some of the evidence. One of the most troubling implications is that full information on a drug’s effects may never be discovered or released.
Even when studies are published, the actual data are usually not made available. End users of research — patients, doctors and policy makers — are implicitly told by a single group of researchers to “take our word for it.” They are often forced to accept the report without the prospect of other independent scientists’ reproducing the findings — a violation of a central tenet of the scientific method.
To be fair, the decision to share data is not easy. Companies worry that their competitors will benefit, that lawyers will take advantage, that incompetent scientists will misconstrue the data and come to mistaken conclusions. Researchers feel ownership of the data and may be reluctant to have others use it. So Johnson & Johnson, as well as companies like GlaxoSmithKline and Medtronic that have made more cautious moves toward transparency, deserve much credit. The more we share data, however, the more we find that many of these problems fail to materialize….
This program doesn’t mean that just anyone can gain access to the data without disclosing how they intend to use it. We require those who want the data to submit a proposal and identify their research team, funding and any conflicts of interest. They have to complete a short course on responsible conduct and sign an agreement that restricts them to their proposed research question. Most important, they must agree to share whatever they find. And we exclude applicants who seek data for commercial or legal purposes. Our intent is not to be tough gatekeepers, but to ensure that the data are used in a transparent way and contribute to overall scientific knowledge.
There are many benefits to this kind of sharing. It honors the contributions of the subjects and scientists who participated in the research. It is proof that an organization, whether it is part of industry or academia, wants to play a role as a good global citizen. It demonstrates that the organization has nothing to hide. And it enables scientists to use the data to learn new ways to help patients. Such an approach can even teach a company like Johnson & Johnson something it didn’t know about its own products.
For the good of society, this is a breakthrough that should be replicated throughout the research world.”
Why SayIt is (partly) a statement about the future of Open Data
Tom Steinberg from MySociety: “This is where SayIt comes in, as an example of a relatively low-cost approach to making sure that the next generation of government IT systems do produce Open Data.
SayIt is a newly launched open source tool for publishing transcripts of trials, debates, interviews and so on. It publishes them online in a way that matches modern expectations about how stuff should work on the web – responsive, searchable and so on. It’s being built as a Poplus Component, which means it’s part of an international network of groups collaborating on shared technologies. Here’s JK Rowling being interviewed, published via SayIt.
But how does this little tool relate to the business of getting governments to release more Open Data? Well, SayIt isn’t just about publishing data, it’s about making it too – in a few months we’ll be sharing an authoring interface for making new transcripts from whatever source a user has access to.
We hope that, having iterated and improved this authoring interface, SayIt can become the tool of choice for public sector transcribers, replacing whatever tool they use today (almost certainly Word). Then, if they use SayIt instead of Word to make a transcript, it will produce new, instantly-online Open Data every time they use it….
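To make the idea of data being “born open” concrete, here is a loose sketch of what a machine-readable transcript record might look like. The field names and structure below are illustrative assumptions, not SayIt’s actual data model (SayIt works with structured formats such as Akoma Ntoso); the point is simply that a transcript authored as structured data can be republished and reused without scraping a Word file.

```python
import json

# Illustrative sketch only: this structure is an assumption for
# demonstration, not SayIt's actual schema.
transcript = {
    "title": "Committee hearing, 29 January 2014",
    "speeches": [
        {"speaker": "Chair", "text": "The meeting will come to order."},
        {"speaker": "Witness", "text": "Thank you for the invitation."},
    ],
}

# Serialising to JSON is what makes the transcript reusable Open Data:
# any third party can fetch and parse it directly.
as_json = json.dumps(transcript, indent=2)

# Structured data supports simple queries that a .doc file does not,
# e.g. listing every speaker in order of appearance.
speakers = [s["speaker"] for s in transcript["speeches"]]
# speakers == ["Chair", "Witness"]
```

A Word document, by contrast, would have to be converted and parsed before any of this reuse were possible, which is exactly the burden an authoring tool like SayIt removes.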
But we can’t expect the public sector to use a tool like SayIt to make new Open Data unless it is cheaper, better and less burdensome than whatever they’re using now. We can’t – quite simply – expect to sell government procurement officers a new product mainly on the virtues of Open Data. This means the tough task of persuading government employees that there is a new tool that is head-and-shoulders better than Excel or Word for certain purposes: formidable, familiar products that are much better than their critics like to let on.
So in order for SayIt to replace the current tools used by any current transcriber, it’s going to have to be really, really good. And really trustworthy. And it’s going to have to be well marketed. And that’s why we’ve chosen to build SayIt as an international, open source collaboration – as a Poplus Component. Because we think that without the billions of dollars it takes to compete with Microsoft, our best hope is to develop very narrow tools that do 0.01% of what Word does, but which do that one thing really really well. And our key strategic advantage, other than the trust that comes with Open Source and Open Standards, is the energy of the global civic hacking and government IT reform sector. SayIt is far more likely to succeed if it has ideas and inputs from contributors from around the world.
Regardless of whether or not SayIt ever succeeds in penetrating inside governments, this post is about an idea that such an approach represents. The idea is that people can advance the Open Data agenda not just by lobbying, but also by building and popularising tools that mean that data is born open in the first place. I hope this post will encourage more people to work on such tools, either on their own, or via collaborations like Poplus.”
Visual Insights: A Practical Guide to Making Sense of Data
New book by Katy Börner and David E. Polley: “In the age of Big Data, the tools of information visualization offer us a macroscope to help us make sense of the avalanche of data available on every subject. This book offers a gentle introduction to the design of insightful information visualizations. It is the only book on the subject that teaches nonprogrammers how to use open code and open data to design insightful visualizations. Readers will learn to apply advanced data mining and visualization techniques to make sense of temporal, geospatial, topical, and network data.
Visual Insights will be an essential resource on basic information visualization techniques for scholars in many fields, students, designers, or anyone who works with data.”
Check out also the Information Visualization MOOC at http://ivmooc.cns.iu.edu/
It’s the Neoliberalism, Stupid: Why instrumentalist arguments for Open Access, Open Data, and Open Science are not enough.
Eric Kansa at LSE Blog: “…However, I’m increasingly convinced that advocating for openness in research (or government) isn’t nearly enough. There’s been too much of an instrumentalist justification for open data and open access. Many advocates talk about how it will cut costs and speed up research and innovation. They also argue that it will make research more “reproducible” and transparent so interpretations can be better vetted by the wider community. Advocates for openness, particularly in open government, also talk about the wonderful commercial opportunities that will come from freeing research…
These are all very big policy issues, but they need to be asked if the Open Movement really stands for reform and not just a further expansion and entrenchment of Neoliberalism. I’m using the term “Neoliberalism” because it resonates as a convenient label for describing how and why so many things seem to suck in Academia. Exploding student debt, vanishing job security, increasing compensation for top administrators, expanding bureaucracy and committee work, corporate management methodologies (Taylorism), and intensified competition for ever-shrinking public funding all fall under the general rubric of Neoliberalism. Neoliberal universities primarily serve the needs of commerce. They need to churn out technically skilled human resources (made desperate for any work by high loads of debt) and easily monetized technical advancements….
“Big Data,” “Data Science,” and “Open Data” are now hot topics at universities. Investments are flowing into dedicated centers and programs to establish institutional leadership in all things related to data. I welcome the new Data Science effort at UC Berkeley to explore how to make research data professionalism fit into the academic reward systems. That sounds great! But will these new data professionals have any real autonomy in shaping how they conduct their research and build their careers? Or will they simply be part of an expanding class of harried and contingent employees, hired and fired through the whims of creative destruction fueled by the latest corporate-academic hype-cycle?
Researchers, including #AltAcs and “data professionals”, need a large measure of freedom. Miriam Posner’s discussion about the career and autonomy limits of Alt-academic-hood helps highlight these issues. Unfortunately, there’s only one area where innovation and failure seem survivable, and that’s the world of the start-up. I’ve noticed how the “Entrepreneurial Spirit” gets celebrated lots in this space. I’m guilty of basking in it myself (10 years as a quasi-independent #altAc in a nonprofit I co-founded!).
But in the current Neoliberal setting, being an entrepreneur requires a singular focus on monetizing innovation. PeerJ and Figshare are nice, since they have business models that are less “evil” than Elsevier’s. But we need to stop fooling ourselves that the only institutions and programs that we can and should sustain are the ones that can turn a profit. For every PeerJ or Figshare (and these are ultimately just as dependent on continued public financing of research as any grant-driven project), we also need more innovative organizations like the Internet Archive, wholly dedicated to the public good and not the relentless pressure to commoditize everything (especially their patrons’ privacy). We need to be much more critical about the kinds of programs, organizations, and financing strategies we (as a society) can support. I raised the political economy of sustainability issue at a recent ThatCamp and hope to see more discussion.
In reality, so much of the Academy’s dysfunctions are driven by our new Gilded Age’s artificial scarcity of money. With wealth concentrated in so few hands, it is very hard to finance risk-taking and entrepreneurialism in the scholarly community, especially to finance any form of entrepreneurialism that does not turn a profit in a year or two.
Open Access and Open Data would make so much more of a difference if we had the same kind of dynamism in the academic and nonprofit sector as we have in the for-profit start-up sector. After all, Open Access and Open Data can be key enablers of much broader participation in research and education. However, broader participation still needs to be financed: you cannot eat an open access publication. We cannot gloss over this key issue.
We need more diverse institutional forms so that researchers can find (or found) the kinds of organizations that best channel their passions into contributions that enrich us all. We need more diverse sources of financing (new foundations, better financed Kickstarters) to connect innovative ideas with the capital needed to see them implemented. Such institutional reforms will make life in the research community much more livable, creative, and dynamic. It would give researchers more options for diverse and varied career trajectories (for-profit or not-for-profit) suited to their interests and contributions.
Making the case to reinvest in the public good will require a long, hard slog. It will be much harder than the campaign for Open Access and Open Data because it will mean contesting Neoliberal ideologies and constituencies that are deeply entrenched in our institutions. However, the constituencies harmed by Neoliberalism, particularly the student community now burdened by over $1 trillion in debt and the middle class more generally, are much larger and very much aware that something is badly amiss. As we celebrate the impressive strides made by the Open Movement in the past year, it’s time we broaden our goals to tackle the needs for wider reform in the financing and organization of research and education.
This post originally appeared on Digging Digitally and is reposted under a CC-BY license.”
Report “Big and open data in Europe: A growth engine or a missed opportunity?”
Press Release: “Big data and open data are not just trendy issues; they are the concern of government institutions at the highest level. On January 29th, 2014 a conference concerning Big & Open Data in Europe 2020 was held in the European Parliament.
Questions asked and discussed included: Is Big & Open Data a truly transformative phenomenon, or just hot air? Does it matter for Europe? How big is the economic potential of Big and Open Data for Europe through 2020? How might each of the 28 Member States benefit from it?…
The conference complemented a research project by demosEUROPA – Centre for European Strategy on Big and Open Data in Europe that aims at fostering and facilitating policy debate on the socioeconomic impact of data. The key outcome of the project, a pan-European macroeconomic study titled “Big and open data In Europe: A growth engine or a missed opportunity?” carried out by the Warsaw Institute for Economic Studies (WISE) was presented.
We have the pleasure of being among the first to present some of the findings of the report and to offer it for download.
The report analyses how these technologies have the potential to influence various aspects of European society, their substantial long-term impact on our wealth and quality of life, and the new developmental challenges they pose for the EU as a whole, as well as for its member states and their regions.
You will learn from the report:
– the resulting economic gains of business applications of big data
– how to structure big data to move from Big Trouble to Big Value
– the costs and benefits, for data holders, of opening their data
– 3 challenges that Europeans face with respect to big and open data
– key areas, growth opportunities and challenges for big and open data in Europe per particular regions.
The study also elaborates on the key principle of the open data philosophy: open by default.
Europe by 2020. What will happen?
The report contains a prognosis, for the 28 EU countries, of the impact of big and open data through 2020, the additional output it will generate, and how it will affect trade, health, manufacturing, information and communication, finance & insurance, and public administration in different regions. It foresees that the EU economy will grow by 1.9% by 2020 thanks to big and open data, and describes the increase in the general GDP level by country and sector.
One of the many interesting findings of the report is that the positive impact of the data revolution will be felt more acutely in Northern Europe, while most of the New Member States and Southern European economies will benefit significantly less, with two notable exceptions being the Czech Republic and Poland. If you would like to have first-hand up-to-date information about the impact of big and open data on the future of Europe – download the report.”
Open Data and Clinical Trials
Editorial by Jeffrey M. Drazen, M.D., at NEJM.org: “In the fall of 2013, the Institute of Medicine (IOM) convened a committee, on which I serve, to examine the sharing of data in the setting of clinical trials. The committee is charged with reviewing current practices on data sharing in the context of randomized, controlled trials and with making recommendations for future data-sharing standards. Over the past few months, the committee has prepared a draft report that reviews current practices on data sharing and lays out a number of potential data-sharing models. Full details regarding the committee’s charge and the interim report are available at www.iom.edu/activities/research/sharingclinicaltrialdata.aspx….
Open-data advocates argue that all the study data should be available to anyone at the time the first report is published or even earlier. Others argue that to maintain an incentive for researchers to pursue clinical investigations and to give those who gathered the data a chance to prepare and publish further reports, there should be a period of some specified length during which the data gatherers would have exclusive access to the information. Since these researchers could always agree to collaborate with others who were not involved in the study in order to use the data to help answer a scientific question, the period of exclusivity would really apply only to noncollaborative use of the data. That is, there would be a defined period during which the data would not be available to those who wanted to perform their own analyses and draw conclusions that could, for example, provide them with a scientific or commercial competitive advantage over the researchers who had originally gathered the data or allow them to derive conclusions that are potentially at odds with those drawn in the original publication.
As members of a community that either produces or uses data, what approach do you think serves our community best? There is no need to reply to the Journal, but please read the interim report and let the IOM know how you feel about this and the many other critical issues related to data sharing that are reviewed in the document. The IOM is collecting comments until March 24, 2014, at www8.nationalacademies.org/cp/projectview.aspx?key=49578.”
How Government Can Make Open Data Work
Joel Gurin in Information Week: “At the GovLab at New York University, where I am senior adviser, we’re taking a different approach than McKinsey’s to understand the evolving value of government open data: We’re studying open data companies from the ground up. I’m now leading the GovLab’s Open Data 500 project, funded by the John S. and James L. Knight Foundation, to identify and examine 500 American companies that use government open data as a key business resource.
Our preliminary results show that government open data is fueling companies both large and small, across the country, and in many sectors of the economy, including health, finance, education, energy, and more. But it’s not always easy to use this resource. Companies that use government open data tell us it is often incomplete, inaccurate, or trapped in hard-to-use systems and formats.
It will take a thorough and extended effort to make government data truly useful. Based on what we are hearing and the research I did for my book, here are some of the most important steps the federal government can take, starting now, to make it easier for companies to add economic value to the government’s data.
1. Improve data quality
The Open Data Policy not only directs federal agencies to release more open data; it also requires them to release information about data quality. Agencies will have to begin improving the quality of their data simply to avoid public embarrassment. We can hope and expect that they will do some data cleanup themselves, demand better data from the businesses they regulate, or use creative solutions like turning to crowdsourcing for help, as USAID did to improve geospatial data on its grantees.
2. Keep improving open data resources
The government has steadily made Data.gov, the central repository of federal open data, more accessible and useful, including a significant relaunch last week. To the agency’s credit, the GSA, which administers Data.gov, plans to keep working to make this key website still better. As part of implementing the Open Data Policy, the administration has also set up Project Open Data on GitHub, the world’s largest community for open-source software. These resources will be helpful for anyone working with open data either inside or outside of government. They need to be maintained and continually improved.
3. Pass DATA
The Digital Accountability and Transparency Act would bring transparency to federal government spending at an unprecedented level of detail. The Act has strong bipartisan support. It passed the House with only one dissenting vote and was unanimously approved by a Senate committee, but still needs full Senate approval and the President’s signature to become law. DATA is also supported by technology companies who see it as a source of new open data they can use in their businesses. Congress should move forward and pass DATA as the logical next step in the work that the Obama administration’s Open Data Policy has begun.
4. Reform the Freedom of Information Act
Since it was passed in 1966, the federal Freedom of Information Act has gone through two major revisions, both of which strengthened citizens’ ability to access many kinds of government data. It’s time for another step forward. Current legislative proposals would establish a centralized web portal for all federal FOIA requests, strengthen the FOIA ombudsman’s office, and require agencies to post more high-interest information online before they receive formal requests for it. These changes could make more information from FOIA requests available as open data.
5. Engage stakeholders in a genuine way
Up to now, the government’s release of open data has largely been a one-way affair: Agencies publish datasets that they hope will be useful without consulting the organizations and companies that want to use them. Other countries, including the UK, France, and Mexico, are building in feedback loops from data users to government data providers, and the US should, too. The Open Data Policy calls for agencies to establish points of contact for public feedback. At the GovLab, we hope that the Open Data 500 will help move that process forward. Our research will provide a basis for new, productive dialogue between government agencies and the businesses that rely on them.
6. Keep using federal challenges to encourage innovation
The federal Challenge.gov website applies the best principles of crowdsourcing and collective intelligence. Agencies should use this approach extensively, and should pose challenges using the government’s open data resources to solve business, social, or scientific problems. Other approaches to citizen engagement, including federally sponsored hackathons and the White House Champions of Change program, can play a similar role.
Through the Open Data Policy and other initiatives, the Obama administration has set the right goals. Now it’s time to implement and move toward what US CTO Todd Park calls “data liberation.” Thousands of companies, organizations, and individuals will benefit.”
New Open Data Tool Helps Countries Compare Progress on Education
World Bank Group: “The World Bank Group today launched a new open data tool that provides in-depth, comparative, and easily accessible data on education policies around the world. The Systems Approach for Better Education Results (SABER) web tool helps countries collect and analyze information on their education policies, benchmark themselves against other countries, and prioritize areas for reform, with the goal of ensuring that all children and youth go to school and learn….
To date, the Bank Group, through SABER, has analyzed more than 100 countries to guide more effective reforms and investments in education at all levels, from pre-primary to tertiary education and workforce development.
Through SABER, the Bank Group aims to improve education quality by supplying policymakers, civil society, school administrators, teachers, parents, and students with more, and more meaningful, data about key education policy areas, including early childhood development, student assessment, teachers, school autonomy and accountability, and workforce development, among others.
SABER helps countries improve their education systems in three ways:
- Providing new data on policies and institutions. SABER collects comparable country data on education policies and institutions that are publicly available at http://worldbank.org/education/saber, allowing governments, researchers, and other stakeholders to measure and monitor progress.
- Benchmarking education policies and institutions. Each policy area is rated on a four-point scale, from “Latent” through “Emerging” and “Established” to “Advanced.” These ratings highlight a country’s areas of strength and weakness while promoting cross-country learning.
- Highlighting key policy choices. SABER data collection and analysis produce an objective snapshot of how well a country’s education system is performing in relation to global good practice. This helps highlight the most important policy choices to spur learning.”
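The four-point benchmarking scale described above can be sketched in a few lines of code. This is a loose illustration of the idea of mapping numeric scores to SABER-style labels; the function and field names are assumptions for demonstration, not SABER’s actual schema or methodology.

```python
# Illustrative sketch of SABER-style benchmarking (names and scoring
# are assumptions, not the actual SABER schema).
RATINGS = ["Latent", "Emerging", "Established", "Advanced"]

def benchmark(country_scores):
    """Map each policy area's 1-4 score to its SABER-style label."""
    return {area: RATINGS[score - 1] for area, score in country_scores.items()}

example = benchmark({"Early Childhood Development": 2, "Student Assessment": 4})
# example == {"Early Childhood Development": "Emerging",
#             "Student Assessment": "Advanced"}
```

Publishing ratings on a shared ordinal scale like this is what makes the cross-country comparison and benchmarking described above possible.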
Opening up open data: An interview with Tim O’Reilly
McKinsey: “The tech entrepreneur, author, and investor looks at how open data is becoming a critical tool for business and government, as well as what needs to be done for it to be more effective.
…
We’re increasingly living in a world of black boxes. We don’t understand the way things work. And open-source software and open data are critical tools. We see this in the field of computer security. People say, “Well, we have to keep this secret.” Well, it turns out that the strongest security protocols are those that are secure even when people know how they work.
…
It seems to me that almost every great advance is a platform advance. When we have common standards, so much more happens.
And you think about the standardization of railroad gauges, the standardization of communications protocols. Think about the standardization of roads, how fundamental those are to our society. And that’s actually kind of a bridge to my work on open government, because I’ve been thinking a lot about the notion of government as a platform.
…
We should define a little bit what we mean by “open,” because there’s open as in it’s open source. Anybody can take it and reuse it in whatever way they want. And I’m not sure that’s always necessary. There’s a pragmatic open and there’s an ideological open. And the pragmatic open is that it’s available. It’s available in a timely way, in a nonpreferential way, so that some people don’t get better access than others.
And if you look at so many of our apps now on the web, because they are ad-supported and free, we get a lot of the benefits of open. When the cost is low enough, it does in fact create many of the same conditions as a commons. That being said, that requires great restraint, as I said earlier, on the part of companies, because it becomes easy for them to say, “Well, actually we just need to take a little bit more of the value for ourselves. And oh, we just need a bit more of that.” And before long, it really isn’t open at all.
…
Eric Ries, of Lean Startup fame, talks about a start-up as a machine for learning under conditions of extreme uncertainty.
He said it doesn’t have to do with being a small company, or with being new. He says it’s just whenever you’re trying to do something new, where you don’t know the answers, you have to experiment. You have to have a mechanism for measuring. You have to have mechanisms for changing what you do based on the response to that measurement…
That’s one of the biggest problems, I think, in our government today, that we put out programs. Somebody has a theory about what’s going to work and what the benefit will be. We don’t measure it. We don’t actually see if it did what we thought it was going to do. And we keep doing it. And then it doesn’t work, so we do something else. And then we layer on program after program that doesn’t actually meet its objectives. And if we actually brought in the mind-set that said, “No, actually we’re going to figure out if we actually accomplish what we set out to accomplish; and if we don’t, we’re going to change it,” that would be huge.”