How Open Is University Data?


Daniel Castro at GovTech: “Many states now support open data, or data that’s made freely available without restriction in a nonproprietary, machine-readable format, to increase government transparency, improve public accountability and participation, and unlock opportunities for civic innovation. To date, 10 states have adopted open data policies via executive order or legislation, and 24 states have built open data portals. But while many agencies have joined the open data movement, state colleges and universities have largely ignored this opportunity. To remedy this, policymakers should consider how to extend open data policies to state colleges and universities.

There are many potential benefits of open data for higher education. First, it can help prospective students and their parents better understand the value of different degree programs. One way to control rising higher ed costs is to create more informed consumers. The feds are already pushing for such changes. President Obama and Education Secretary Arne Duncan called for schools to make more information publicly available about the costs of obtaining a college degree, and the White House launched the College Scorecard, an online tool to compare data about the average tuition cost, size of loan payments and loan default rate for different schools.

But students deserve more detailed information. Prospective students should be able to decide where to attend and what to study based on historical data like program costs, percentage of students completing the program and how long they take to do so, and what kind of earning power they have after graduating.

Second, open data can aid better fiscal oversight and accountability of university operations. In 2014, states provided about $76 billion in support for higher ed, yet few colleges and universities have adopted open data policies to increase the transparency of their budgets. Contrast this with California cities like Oakland, Palo Alto and Los Angeles, which created online tools to let others explore and visualize their budgets. Additional oversight, including from the public, could help reduce fraud, waste and abuse in higher education, save taxpayers money and create more opportunities for public participation in state budgeting.

Third, open data can be a valuable resource for producing innovations that make universities a better place to work and study. Large campuses are basically small cities, and many cities have found open data useful for improving public safety and optimizing transportation services. Universities hold much untapped data: course catalogs, syllabi, bus schedules, campus menus, campus directories, faculty evaluations, etc. Creating portals to release these data sets and building application programming interfaces to access this information would give developers direct access to data that students, faculty, alumni and other stakeholders could use to build apps and services to improve the college experience….(More)”
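To make the idea concrete, here is a minimal sketch of how a developer might query such a campus data portal; the endpoint, dataset, and field names are hypothetical.

```python
import requests  # third-party HTTP client: pip install requests

# Hypothetical campus open data endpoint. Real portals (CKAN- or
# Socrata-based, for example) expose similar JSON APIs, but this URL,
# dataset, and its field names are illustrative only.
BASE_URL = "https://data.example-university.edu/api/v1"

def list_program_outcomes(limit=50):
    """Fetch program-level outcome records and print a summary."""
    resp = requests.get(
        f"{BASE_URL}/datasets/program-outcomes/records",
        params={"limit": limit},
        timeout=10,
    )
    resp.raise_for_status()
    for rec in resp.json()["records"]:  # assumed response shape
        # Assumed fields: program, total_cost, completion_rate
        print(f"{rec['program']}: ${rec['total_cost']:,} total cost, "
              f"{rec['completion_rate']:.0%} completion")

if __name__ == "__main__":
    list_program_outcomes()
```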

Philadelphia’s Newly Upgraded Open Data Portal


Michael Grass at Government Executive: “If you’re looking for streets where vending is prohibited in the city of Philadelphia, the city’s newly upgraded open data portal has that information. If you’re looking for information on reported bicycle thefts, the city’s open data portal has that information, too. Same goes for the city’s budget.

Philadelphia’s recently relaunched open data portal, OpenDataPhilly, has 264 data sets, applications and APIs available for the public to access and use. Much of that information comes from municipal sources.

“The redesign of OpenDataPhilly will increase access to available data, thereby enabling our citizens to become more engaged and knowledgeable and our government more accountable,” Mayor Michael Nutter said in a statement last month.

But Philadelphia’s open data portal isn’t just designed to unlock datasets at City Hall.

The city’s universities, cultural and non-profit organizations and commercial entities are part of the portal as well. Portal users interested in historic maps of the city can access the Philadelphia GeoHistory Network, a project of Philadelphia’s Athenaeum Museum, which maintains a tool where layers of historic maps can be overlaid on an interactive Google map.

You can even find a list of current happy hour specials, courtesy of DrinkPhilly….(More)”

“Data on the Web” Best Practices


W3C First Public Working Draft: “…The best practices described below have been developed to encourage and enable the continued expansion of the Web as a medium for the exchange of data. The growth of open data by governments across the world [OKFN-INDEX], the increasing publication of research data encouraged by organizations like the Research Data Alliance [RDA], the harvesting and analysis of social media, crowd-sourcing of information, the provision of important cultural heritage collections such as at the Bibliothèque nationale de France [BNF] and the sustained growth in the Linked Open Data Cloud [LODC], provide some examples of this phenomenon.

In broad terms, data publishers aim to share data either openly or with controlled access. Data consumers (who may also be producers themselves) want to be able to find and use data, especially if it is accurate, regularly updated and guaranteed to be available at all times. This creates a fundamental need for a common understanding between data publishers and data consumers. Without this agreement, data publishers’ efforts may be incompatible with data consumers’ desires.

Publishing data on the Web creates new challenges, such as how to represent, describe and make data available in a way that makes it easy to find and to understand. In this context, it becomes crucial to provide guidance to publishers that will improve consistency in the way data is managed, thus promoting the re-use of data, fostering trust in the data among developers whatever technology they choose to use, and increasing the potential for genuine innovation.

This document sets out a series of best practices that will help publishers and consumers face the new challenges and opportunities posed by data on the Web.

Best practices cover different aspects related to data publishing and consumption, like data formats, data access, data identification and metadata. In order to delimit the scope and elicit the required features for Data on the Web Best Practices, the DWBP working group compiled a set of use cases [UCR] that represent scenarios of how data is commonly published on the Web and how it is used. The set of requirements derived from these use cases was used to guide the development of the best practices.
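As an illustration of the metadata practice, here is a minimal sketch that describes a CSV distribution using the W3C DCAT vocabulary, serialized as JSON-LD; the dataset, licence, and URLs are invented for the example.

```python
import json

# A sketch of the "provide metadata" practice: describing a CSV
# distribution with the W3C DCAT vocabulary, serialized as JSON-LD.
# The dataset, licence, and URLs are invented for the example.
dataset_metadata = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "City operating budget, 2015",
    "dct:description": "Approved operating budget, broken down by department.",
    "dct:license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "dcat:distribution": {
        "@type": "dcat:Distribution",
        "dcat:downloadURL": "https://data.example.gov/budget-2015.csv",
        "dcat:mediaType": "text/csv",
    },
}

# Publish the metadata file alongside the CSV it describes.
with open("budget-2015.jsonld", "w", encoding="utf-8") as f:
    json.dump(dataset_metadata, f, indent=2)
```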

The Best Practices proposed in this document are intended to serve a more general purpose than the practices suggested in Best Practices for Publishing Linked Data [LD-BP]: they are domain-independent and, whilst they recommend the use of Linked Data, they also promote best practices for data on the web in formats such as CSV and JSON. The Best Practices related to the use of vocabularies incorporate practices that stem from Best Practices for Publishing Linked Data where appropriate….(More)”

CrowdFlower Launches Open Data Project


Anthony Ha at Techcrunch: “Crowdsourcing company CrowdFlower allows businesses to tap into a distributed workforce of 5 million contributors for basic tasks like sentiment analysis. Today it’s releasing some of that data to the public through its new Data for Everyone initiative…. The hope is to turn CrowdFlower into a central repository where open data can be found by researchers and entrepreneurs. (Factual was another startup trying to become a hub for open data, though in recent years, it’s become more focused on gathering location data to power mobile ads.)…

As for the data that’s available now… There’s a lot of Twitter sentiment analysis covering things like attitudes towards brands and products, yogurt (?), and climate change. Among the more recent data sets, I was particularly taken by the gender breakdown of who’s been on the cover of Time magazine and, yes, the analysis of who thought the dress (you know the one) was gold and white versus blue and black…. (More)”
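For a sense of how lightweight it is to work with such releases, here is a sketch that tallies labels in a crowd-labelled sentiment CSV; the file name and column names are assumptions, not CrowdFlower’s actual schema.

```python
import csv
from collections import Counter

# Tally crowd-assigned labels in a Data for Everyone-style CSV. The file
# name and the "sentiment" column are assumptions, not CrowdFlower's
# actual schema.
with open("brand_sentiment_tweets.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

counts = Counter(row["sentiment"] for row in rows)
total = sum(counts.values())
for label, n in counts.most_common():
    print(f"{label}: {n} tweets ({n / total:.1%})")
```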

If Data Sharing is the Answer, What is the Question?


Christine L. Borgman at ERCIM News: “Data sharing has become policy enforced by governments, funding agencies, journals, and other stakeholders. Arguments in favor include leveraging investments in research, reducing the need to collect new data, addressing new research questions by reusing or combining extant data, and reproducing research, which would lead to greater accountability and transparency and less fraud. Arguments against data sharing rarely are expressed in public fora, so popular is the idea. Much of the scholarship on data practices attempts to understand the socio-technical barriers to sharing, with the goal of designing infrastructures, policies, and cultural interventions that will overcome these barriers.
However, data sharing and reuse are common practice in only a few fields. Astronomy and genomics in the sciences, survey research in the social sciences, and archaeology in the humanities are the typical exemplars, which remain the exceptions rather than the rule. The lack of success of data sharing policies, despite accelerating enforcement over the last decade, indicates the need not just for a much deeper understanding of the roles of data in contemporary science but also for developing new models of scientific practice. Science progressed for centuries without data sharing policies. Why is data sharing deemed so important to scientific progress now? How might scientific practice be different if these policies were in place several generations ago?
Enthusiasm for “big data” and for data sharing is obscuring the complexity of data in scholarship and the challenges for stewardship. Data practices are local, varying from field to field, individual to individual, and country to country. Studying data is a means to observe how rapidly the landscape of scholarly work in the sciences, social sciences, and the humanities is changing. Inside the black box of data is a plethora of research, technology, and policy issues. Data are best understood as representations of observations, objects, or other entities used as evidence of phenomena for the purposes of research or scholarship. Rarely do they stand alone, separable from software, protocols, lab and field conditions, and other context. The lack of agreement on what constitutes data underlies the difficulties in sharing, releasing, or reusing research data.
Concerns for data sharing and open access raise broader questions about what data to keep, what to share, when, how, and with whom. Open data is sometimes viewed simply as releasing data without payment of fees. In research contexts, open data may pose complex issues of licensing, ownership, responsibility, standards, interoperability, and legal harmonization. To scholars, data can be assets, liabilities, or both. Data have utilitarian value as evidence, but they also serve social and symbolic purposes for control, barter, credit, and prestige. Incentives for scientific advancement often run counter to those for sharing data.
….
Rather than assume that data sharing is almost always a “good thing” and that doing so will promote the progress of science, more critical questions should be asked: What are the data? What is the utility of sharing or releasing data, and to whom? Who invests the resources in releasing those data and in making them useful to others? When, how, why, and how often are those data reused? Who benefits from what kinds of data transfer, when, and how? What resources must potential re-users invest in discovering, interpreting, processing, and analyzing data to make them reusable? Which data are most important to release, when, by what criteria, to whom, and why? What investments must be made in knowledge infrastructures, including people, institutions, technologies, and repositories, to sustain access to data that are released? Who will make those investments, and for whose benefit?
Only when these questions are addressed by scientists, scholars, data professionals, librarians, archivists, funding agencies, repositories, publishers, policy makers, and other stakeholders in research will satisfactory answers arise to the problems of data sharing…(More)”.

Breaking Public Administrations’ Data Silos. The Case of Open-DAI, and a Comparison between Open Data Platforms.


Paper by Raimondo Iemma, Federico Morando, and Michele Osella: “Open reuse of public data and tools can turn government into a powerful ‘platform’ that also involves external innovators. However, the typical information system of a public agency is not open by design. Several public administrations have started adopting technical solutions to overcome this issue, typically in the form of middleware layers operating as ‘buses’ between data centres and the outside world. Open-DAI is an open source platform designed to expose data as services, directly pulling from legacy databases of the data holder. The platform is the result of an ongoing project funded under the EU ICT PSP call 2011. We present the rationale and features of Open-DAI, also through a comparison with three other open data platforms: the Socrata Open Data portal, CKAN, and ENGAGE….(More)”
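The middleware-‘bus’ idea can be illustrated with a toy service layer that pulls rows from a legacy database and exposes them as JSON, without modifying the database itself. This is a minimal sketch in Python with Flask and SQLite, not Open-DAI’s actual stack, and the table and column names are invented.

```python
import sqlite3
from flask import Flask, jsonify  # pip install flask

# Toy version of the middleware "bus": a thin service layer that pulls
# rows from a legacy database and exposes them as JSON, leaving the
# database itself untouched. The table and column names are invented.
app = Flask(__name__)
DB_PATH = "legacy.db"  # stand-in for an agency's legacy data store

@app.route("/api/permits")
def permits():
    conn = sqlite3.connect(DB_PATH)
    conn.row_factory = sqlite3.Row  # rows become dict-like objects
    rows = conn.execute(
        "SELECT permit_id, status, issued_on FROM permits LIMIT 100"
    ).fetchall()
    conn.close()
    return jsonify(records=[dict(row) for row in rows])

if __name__ == "__main__":
    app.run(port=8000)
```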

Open data could turn Europe’s digital desert into a digital rainforest


Joanna Roberts interviews Dirk Helbing, Professor of Computational Social Science at ETH Zurich, for Horizon: “…If we want to be competitive, Europe needs to find its own way. How can we differentiate ourselves and make things better? I believe Europe should not engage in the locked data strategy that we see in all these huge IT giants. Instead, Europe should engage in open data, open innovation, and value-sensitive design, particularly approaches that support informational self-determination. So everyone can use this data, generate new kinds of data, and build applications on top. This is going to create ever more possibilities for everyone else, so in a sense that will turn a digital desert into a digital rainforest full of opportunities for everyone, with a rich information ecosystem.’…
The Internet of Things is the next big emerging information and communication technology. It’s based on sensors. In smartphones there are about 15 sensors: for light, for noise, for location, for all sorts of things. You could also buy additional external sensors for humidity, for chemical substances and almost anything that comes to mind. So basically this allows us to measure the environment and all the features of our physical, biological, economic, social and technological environment.
‘Imagine if there was one company in the world controlling all the sensors and collecting all the information. I think that might potentially be a dystopian surveillance nightmare, because you couldn’t take a single step or speak a single word without it being recorded. Therefore, if we want the Internet of Things to be consistent with a stable democracy then I believe we need to run it as a citizen web, which means to create and manage the planetary nervous system together. The citizens themselves would buy the sensors and activate them or not, would decide themselves what sensor data they would share with whom and for what purpose, so informational self-determination would be at the heart, and everyone would be in control of their own data.’….
A lot of exciting things will become possible. We would have a real-time picture of the world and we could use this data to be more aware of what the implications of our decisions and actions are. We could avoid mistakes and discover opportunities we would otherwise have missed. We will also be able to measure what’s going on in our society and economy and why. In this way, we will eventually identify the hidden forces that determine the success or failure of a company, of our economy or even our society….(More)”

Making emotive games from open data


Katie Collins at WIRED: “Microsoft researcher Kati London’s aim is “to try to get people to think of data in terms of personalities, relationships and emotions”, she tells the audience at the Story Festival in London. Through Project Sentient Data, she uses her background in games development to create fun but meaningful experiences that bridge online interactions and things that are happening in the real world.
One such experience invited children to play against the real-time flow of London traffic through an online game called the Code of Everand. The aim was to test the road safety knowledge of 9-11 year olds and “make alertness something that kids valued”.
The core mechanic of the game was that of a normal world populated by little people, containing spirit channels that only kids could see and go through. Within these spirit channels, the lorries and cars from the streets became monsters. The children had to assess what kind of dangers the monsters posed and use their tools to dispel them.
“Games are great ways to blur and observe the ways people interact with real-world data,” says London.
In one of her earlier projects, back in 2005, London used her knowledge of horticulture to bring artificial intelligence to plants. “Almost every workspace I go into has a half-dead plant in it, so we gave plants the ability to tell us what they need.” It was, she says, an exercise in “humanising data” that led to further projects that saw her create self-aware street signs and a dynamic city map that expressed shame neighbourhood by neighbourhood, based on the open dataset of public complaints in New York.
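The aggregation behind such a map can be remarkably simple; this sketch counts complaints per neighbourhood from an open CSV export, with hypothetical file and column names standing in for a real 311-style dataset.

```python
import csv
from collections import Counter

# Count complaints per neighbourhood from an open CSV export. The file
# and column names are hypothetical stand-ins for a 311-style dataset.
counts = Counter()
with open("complaints.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        counts[row["neighbourhood"]] += 1

# Rank neighbourhoods by complaint volume, most complained-about first.
for neighbourhood, n in counts.most_common(10):
    print(f"{neighbourhood}: {n} complaints")
```
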
A further project turned complaint data into cartoons on Instagram every week. London praised the open data initiative in New York, but added that for people to access it, they had to know it existed and know where to find it. The cartoons were a “lightweight” form of “civic engagement” that helped to integrate hyperlocal issues into everyday conversation.
London also gamified community engagement through a project commissioned by the Knight Foundation called Macon Money….(More)”.

Data for good


NESTA: “This report explores how capturing, sharing and analysing data in new ways can transform how charities work and how social action happens.

Key Findings

  • Citizens Advice (CAB) and DataKind partnered to develop the Civic Dashboard, a tool that mines data from CAB consultations to understand emerging social issues in the UK.
  • Shooting Star Chase volunteers streamlined the referral paths by which children come to be at its hospices, a refinement of the referral system that could save children’s hospices around the country up to £90,000.
  • In a study of open grant funding data, NCVO identified 33,000 ‘below the radar’ organisations not currently captured in registers and databases on the third sector.
  • In their social media analysis of tweets related to the Somerset Floods, Demos found that 39,000 tweets were related to social action.

New ways of capturing, sharing and analysing data have the potential to transform how community and voluntary sector organisations work and how social action happens. However, while analysing and using data is core to how some of the world’s fastest growing businesses understand their customers and develop new products and services, civil society organisations are still some way off from making the most of this potential.
Over the last 12 months Nesta has grant funded a number of research projects that explore two dimensions of how big and open data can be used for the common good. Firstly, how it can be used by charities to develop better products and services and secondly, how it can help those interested in civil society better understand social action and civil society activity.

  • Citizens Advice Bureau (CAB) and DataKind, a global community of data scientists interested in how data can be used for a social purpose, were grant-funded to explore how a data-driven approach to mining the rich data that CAB holds on social issues in the UK could be used to develop a real-time dashboard to identify emerging social issues (a simplified sketch of this kind of trend detection follows this list). The project also explored how data-driven methods could better help other charities such as St Mungo’s and Buttle UK, and how data could be shared more effectively between charities as part of this process, to create collaborative data-driven projects.
  • Five organisations (the RSA, Cardiff University, the Demos Centre for Analysis of Social Media, NCVO and European Alternatives) were grant-funded to explore how data-driven methods, such as open data analysis and social media analysis, can help us understand informal social action, often referred to as ‘below the radar’ activity, in new ways.
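To illustrate the kind of trend detection such a dashboard might run, the sketch below flags issue categories whose latest weekly consultation count jumps well above their recent average. The numbers are invented; a real system would draw anonymised counts from case-management records.

```python
# Invented weekly consultation counts per issue category; the last value
# in each list is the current week. A real dashboard would pull
# anonymised counts from case-management records.
weekly_counts = {
    "debt":     [310, 295, 305, 300, 480],
    "housing":  [150, 160, 148, 155, 158],
    "benefits": [420, 410, 430, 415, 425],
}

def emerging_issues(history, threshold=1.5):
    """Flag categories whose latest count exceeds `threshold` times the
    mean of the preceding weeks."""
    flagged = []
    for category, counts in history.items():
        *past, latest = counts
        baseline = sum(past) / len(past)
        if latest > threshold * baseline:
            flagged.append((category, latest, baseline))
    return flagged

for category, latest, baseline in emerging_issues(weekly_counts):
    print(f"Emerging issue: {category} ({latest} vs ~{baseline:.0f}/week)")
```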

This paper is not the definitive story of the opportunities in using big and open data for the common good, but it can hopefully provide insight into what can be done and lessons for others interested in exploring the opportunities in these methods….(More).”

Unleashing the Power of Data to Serve the American People


Memorandum: Unleashing the Power of Data to Serve the American People
To: The American People
From: Dr. DJ Patil, Deputy U.S. CTO for Data Policy and Chief Data Scientist

….While there is a rich history of companies using data to their competitive advantage, the disproportionate beneficiaries of big data and data science have been Internet technologies like social media, search, and e-commerce. Yet transformative uses of data in other spheres are just around the corner. Precision medicine and other forms of smarter health care delivery, individualized education, and the “Internet of Things” (which refers to devices like cars or thermostats communicating with each other using embedded sensors linked through wired and wireless networks) are just a few of the ways in which innovative data science applications will transform our future.

The Obama administration has embraced the use of data to improve the operation of the U.S. government and the interactions that people have with it. On May 9, 2013, President Obama signed Executive Order 13642, which made open and machine-readable data the new default for government information. Over the past few years, the Administration has launched a number of Open Data Initiatives aimed at scaling up open data efforts across the government, helping make troves of valuable data — data that taxpayers have already paid for — easily accessible to anyone. In fact, I used data made available by the National Oceanic and Atmospheric Administration to improve numerical methods of weather forecasting as part of my doctoral work. So I know firsthand just how valuable this data can be — it helped get me through school!

Given the substantial benefits that responsibly and creatively deployed data can provide to us and our nation, it is essential that we work together to push the frontiers of data science. Given the importance this Administration has placed on data, along with the momentum that has been created, now is a unique time to establish a legacy of data supporting the public good. That is why, after a long time in the private sector, I am returning to the federal government as the Deputy Chief Technology Officer for Data Policy and Chief Data Scientist.

Organizations are increasingly realizing that in order to maximize their benefit from data, they require dedicated leadership with the relevant skills. Many corporations, local governments, federal agencies, and others have already created such a role, which is usually called the Chief Data Officer (CDO) or the Chief Data Scientist (CDS). The role of an organization’s CDO or CDS is to help their organization acquire, process, and leverage data in a timely fashion to create efficiencies, iterate on and develop new products, and navigate the competitive landscape.

The Role of the First-Ever U.S. Chief Data Scientist

Similarly, my role as the U.S. CDS will be to responsibly source, process, and leverage data in a timely fashion to enable transparency, provide security, and foster innovation for the benefit of the American public, in order to maximize the nation’s return on its investment in data.

So what specifically am I here to do? As I start, I plan to focus on these four activities:

…(More)”