Open data is shaking up civic life in eastern Europe


 in the Financial Times: “I often imagine how different the world would look if citizens and social activists were able to fully understand and use data, and new technologies. Unfortunately, the entry point to this world is often inaccessible for most civil society groups…

The concept of open data has revolutionised thinking about citizens’ participation in civic life. Since the fall of communism, citizens across central and eastern Europe have been fighting for more transparent and responsive governments and for better collaboration between civil society and the public sector. When an institution makes its data public, it is a sign that it is committed to being transparent and accountable. A few cities have opened up data about budget spending, for example, but these remain the exception rather than the rule. Open data provides citizens with a tool to directly engage in civic life. For example, they can analyse public expenses to check how their taxes are used, track their MP’s votes or monitor the legislative process….

One of the successful projects in Ukraine is the Open School app, which provides reviews and ratings of secondary schools based on indicators such as the number of pupils who go on to university, school subject specialisations and accessibility. It allows students and parents to make informed decisions about their educational path… Another example comes from the Serbian city of Pancevo, where a maths teacher and a tax inspector have worked together to help people navigate the tax system. The idea is simple: the more people know about taxes, the less likely they are to unconsciously violate the law. Open Taxes is a free, web-based, interactive guide to key national and local taxes…(More)”

Better Data for Better Policy: Accessing New Data Sources for Statistics Through Data Collaboratives


Medium Blog by Stefaan Verhulst: “We live in an increasingly quantified world, one where data is driving key business decisions. Data is claimed to be the new competitive advantage. Yet, paradoxically, even as our reliance on data increases and the call for agile, data-driven policy making becomes more pronounced, many Statistical Offices are confronted with shrinking budgets and an increased demand to adjust their practices to a data age. If Statistical Offices fail to find new ways to deliver “evidence of tomorrow” by leveraging new data sources, public policy may be formed without access to the full range of available and relevant intelligence — the kind of access most business leaders already have. At worst, a thinning evidence base and the lack of a rigorous data foundation could lead to errors and more “fake news,” with possibly harmful public policy implications.

While my talk was focused on the key ways data can inform and ultimately transform the full policy cycle (see full presentation here), a key premise I examined was the need to access, utilize and find insight in the vast reams of data and data expertise that exist in private hands through the creation of new kinds of public and private partnerships or “data collaboratives” to establish more agile and data-driven policy making.


Applied to statistics, such approaches have already shown promise in a number of settings and countries. Eurostat itself has, for instance, experimented together with Statistics Belgium with leveraging call detail records provided by Proximus to document population density. Statistics Netherlands (CBS) recently launched a Center for Big Data Statistics (CBDS) in partnership with companies like Dell-EMC and Microsoft. Other National Statistics Offices (NSOs) are considering using scanner data for monitoring consumer prices (Austria); leveraging smart meter data (Canada); or using telecom data for complementing transportation statistics (Belgium). We are now undeniably living in an era of data. Much of this data is held by private corporations. The key task is thus to find a way of utilizing this data for the greater public good.
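
As an illustration of how such privately held data can feed official statistics, here is a minimal Python sketch of the general idea behind call-detail-record-based population mapping: count distinct subscribers observed in each cell area during night hours and divide by the area covered. The records, column names and night-time heuristic are assumptions for illustration, not details of the Eurostat/Proximus pilot.

```python
import pandas as pd

# Synthetic call detail records: one row per call or SMS event.
# Column names are illustrative; real CDRs differ by operator.
cdr = pd.DataFrame({
    "subscriber_id": ["a", "a", "b", "c", "c", "d", "d", "e"],
    "cell_id":       ["C1", "C1", "C1", "C2", "C2", "C2", "C1", "C2"],
    "hour":          [23, 2, 1, 22, 3, 23, 14, 0],
})

# Approximate coverage area of each cell, in square kilometres (made up).
cell_area_km2 = pd.Series({"C1": 4.0, "C2": 10.0})

# Heuristic: events between 22:00 and 06:00 roughly indicate where people sleep.
night = cdr[(cdr["hour"] >= 22) | (cdr["hour"] < 6)]

# Distinct subscribers seen at night per cell serve as a resident-population proxy.
residents = night.groupby("cell_id")["subscriber_id"].nunique()

# Dividing by coverage area gives a relative population-density estimate.
density = residents / cell_area_km2
print(density.sort_values(ascending=False))
```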

Value Proposition — and Challenges

There are several reasons to believe that public policy making and official statistics could indeed benefit from access to privately collected and held data. Among the value propositions:

  • Using private data can increase the scope and breadth of available evidence, and thus the insights it offers to policymakers;
  • Using private data can increase the quality and credibility of existing data sets (for instance, by complementing or validating them);
  • Private data can increase the timeliness and thus relevance of often-outdated information held by statistical agencies (social media streams, for example, can provide real-time insights into public behavior); and
  • Private data can lower costs and increase other efficiencies (for example, through more sophisticated analytical methods) for statistical organizations….(More)”.

Civil, the blockchain-based journalism marketplace, is building its first batch of publications


“Civil is an idea that has started to turn heads, both among investors and within journalism circles. … It’s also attracted its first publication, Popula, an alternative news and politics site run by journalist… Maria Bustillos, and joined by Sasha Frere-Jones, Ryan Bradley, Aaron Bady, and other big-name reporters. The site goes live in January.

Built on top of blockchain (the same technology that underpins bitcoin), Civil promises to use the technology to build decentralized marketplaces for readers and journalists to work together to fund coverage of topics that interest them, or for those in the public interest. Readers will support reporters using “CVL” tokens, Civil’s cryptocurrency, giving them a speculative stake in the currency that will — hopefully — increase in value as more people buy in over time. This, Civil hopes, will encourage more people to invest in the marketplaces, creating a self-sustaining system that will help fund more reporting…

Civil is also pitching its technology as a way for journalists around the world to fight censorship. Because all changes made to the blockchain are public, it’s impossible for third parties — governmental or otherwise — to change or alter the information on it without notice. “No one will be able to remove the record or prevent her organization from publishing what it wants to publish,” said Matthew Iles, Civil’s founder, referring to Popula. He also pointed to the other key parts of Civil’s value proposition, which include direct connections with readers, ease of access to financing, and a reduced reliance on third-party tech companies….(More)”.
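
The censorship-resistance claim rests on a general property of blockchains rather than anything specific to Civil: each record commits to the hash of the record before it, so a retroactive edit breaks every later link. The Python sketch below is a minimal illustration of that property, not Civil's implementation, which the excerpt does not describe.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Deterministic hash of a block's contents."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append(chain: list, content: str) -> None:
    """Append a block that commits to the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "content": content})

def verify(chain: list) -> bool:
    """Recompute every link; any retroactive edit breaks the chain."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain: list = []
append(chain, "Popula article, version 1")
append(chain, "Popula article, version 2")
print(verify(chain))              # True: the record is intact

chain[0]["content"] = "censored"  # a third party tries to alter the record
print(verify(chain))              # False: the alteration is detectable
```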

Linux Foundation Debuts Community Data License Agreement


Press Release: “The Linux Foundation, the nonprofit advancing professional open source management for mass collaboration, today announced the Community Data License Agreement (CDLA) family of open data agreements. In an era of expansive and often underused data, the CDLA licenses are an effort to define a licensing framework to support collaborative communities built around curating and sharing “open” data.

Inspired by the collaborative software development models of open source software, the CDLA licenses are designed to enable individuals and organizations of all types to share data as easily as they currently share open source software code. Soundly drafted licensing models can help people form communities to assemble, curate and maintain vast amounts of data, measured in petabytes and exabytes, to bring new value to communities of all types, to build new business opportunities and to power new applications that promise to enhance safety and services.

The growth of big data analytics, machine learning and artificial intelligence (AI) technologies has allowed people to extract unprecedented levels of insight from data. Now the challenge is to assemble the critical mass of data for those tools to analyze. The CDLA licenses are designed to help governments, academic institutions, businesses and other organizations open up and share data, with the goal of creating communities that curate and share data openly.

For instance, if automakers, suppliers and civil infrastructure services can share data, they may be able to improve safety, decrease energy consumption and improve predictive maintenance. Self-driving cars are heavily dependent on AI systems for navigation, and need massive volumes of data to function properly. Once on the road, they can generate nearly a gigabyte of data every second. For the average car, that means two petabytes of sensor, audio, video and other data each year.

Similarly, climate modeling can integrate measurements captured by government agencies with simulation data from other organizations and then use machine learning systems to look for patterns in the information. It’s estimated that a single model can yield a petabyte of data, a volume that challenges standard computer algorithms, but is useful for machine learning systems. This knowledge may help improve agriculture or aid in studying extreme weather patterns.

And if government agencies share aggregated data on building permits, school enrollment figures, and sewer and water usage, their citizens benefit from the ability of commercial entities to anticipate their future needs and respond with infrastructure and facilities that arrive ahead of citizens’ demands.

“An open data license is essential for the frictionless sharing of the data that powers both critical technologies and societal benefits,” said Jim Zemlin, Executive Director of The Linux Foundation. “The success of open source software provides a powerful example of what can be accomplished when people come together around a resource and advance it for the common good. The CDLA licenses are a key step in that direction and will encourage the continued growth of applications and infrastructure.”…(More)”.

The role of policy entrepreneurs in open government data policy innovation diffusion: An analysis of Australian Federal and State Governments


Paper by Akemi Takeoka Chatfield and Christopher G. Reddick: “Open government data (OGD) policy differs substantially from the existing Freedom of Information policies. Consequently, OGD can be viewed as a policy innovation. Drawing on both innovation diffusion theory and its application to public policy innovation research, we examine Australia’s OGD policy diffusion patterns at both the federal and state government levels based on the policy adoption timing and CKAN portal “Organization” and “Category” statistics. We found that state governments that had adopted OGD policies earlier had active policy entrepreneurs (or lead departments/agencies) responsible for the policy innovation diffusion across the different government departments. We also found that their efficacy ranking was relatively high in terms of OGD portal openness, measured by the number of datasets proactively and systematically published through their OGD portals. These findings have important implications for the role played by OGD policy entrepreneurs in openly sharing the government-owned datasets with the public….(More)”.
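
Portal statistics of this kind can be gathered programmatically, since Australian government data portals typically run CKAN, which exposes a standard Action API. The Python sketch below is a hedged illustration of that approach rather than the authors' method; the base URL is an example that may need adjusting, and fields such as package_count can vary between CKAN versions.

```python
import requests

# Example CKAN-backed portal; substitute the portal being studied.
# (data.gov.au runs CKAN; the exact base path may need adjusting.)
BASE = "https://data.gov.au/data"

def ckan(action: str, **params) -> dict:
    """Call a CKAN Action API endpoint and return its 'result' payload."""
    resp = requests.get(f"{BASE}/api/3/action/{action}", params=params, timeout=30)
    resp.raise_for_status()
    body = resp.json()
    if not body.get("success"):
        raise RuntimeError(f"CKAN action '{action}' failed")
    return body["result"]

# Total number of datasets published on the portal.
total = ckan("package_search", rows=0)["count"]
print(f"Total datasets: {total}")

# Datasets per publishing organization, a rough proxy for how actively
# each department or agency shares data through the portal.
orgs = ckan("organization_list", all_fields="true")
orgs.sort(key=lambda o: o.get("package_count", 0), reverse=True)
for org in orgs[:10]:
    name = org.get("display_name") or org.get("name", "?")
    print(f"{name}: {org.get('package_count', 0)} datasets")
```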

Enabling Blockchain Innovation in the U.S. Federal Government


Primer by the American Council for Technology – Industry Advisory Council: “… intended to be a foundational tool in the understanding of blockchain and its use cases within the United States federal government. To that end, it should help allay the concerns that some may have about this new technology by providing an introduction to blockchain and its related technologies, and how blockchain can be safely and securely applied to the right government use cases. Blockchain has the potential to help government to reduce fraud, errors and the cost of paper-intensive processes, while enabling collaboration across multiple divisions and agencies to provide more efficient and effective services to citizens. Moreover, the adoption of blockchain may also allow governmental agencies to provide new value-added services to businesses and others which can generate new sources of revenue for these agencies….(More)”.

Our laws don’t do enough to protect our health data


at The Conversation: “A particularly sensitive type of big data is medical big data. Medical big data can consist of electronic health records, insurance claims, information entered by patients into websites such as PatientsLikeMe and more. Health information can even be gleaned from web searches, Facebook and your recent purchases.

Such data can be used for beneficial purposes by medical researchers, public health authorities, and healthcare administrators. For example, they can use it to study medical treatments, combat epidemics and reduce costs. But others who can obtain medical big data may have more selfish agendas.

I am a professor of law and bioethics who has researched big data extensively. Last year, I published a book entitled Electronic Health Records and Medical Big Data: Law and Policy.

I have become increasingly concerned about how medical big data might be used and who could use it. Our laws currently don’t do enough to prevent harm associated with big data.

What your data says about you

Personal health information could be of interest to many, including employers, financial institutions, marketers and educational institutions. Such entities may wish to exploit it for decision-making purposes.

For example, employers presumably prefer healthy employees who are productive, take few sick days and have low medical costs. However, there are laws that prohibit employers from discriminating against workers because of their health conditions. These laws are the Americans with Disabilities Act (ADA) and the Genetic Information Nondiscrimination Act. So, employers are not permitted to reject qualified applicants simply because they have diabetes, depression or a genetic abnormality.

However, the same is not true for most predictive information regarding possible future ailments. Nothing prevents employers from rejecting or firing healthy workers out of the concern that they will later develop an impairment or disability, unless that concern is based on genetic information.

What non-genetic data can provide evidence regarding future health problems? Smoking status, eating preferences, exercise habits, weight and exposure to toxins are all informative. Scientists believe that biomarkers in your blood and other health details can predict cognitive decline, depression and diabetes.

Even bicycle purchases, credit scores and voting in midterm elections can be indicators of your health status.

Gathering data

How might employers obtain predictive data? An easy source is social media, where many individuals publicly post very private information. Through social media, your employer might learn that you smoke, hate to exercise or have high cholesterol.

Another potential source is wellness programs. These programs seek to improve workers’ health through incentives to exercise, stop smoking, manage diabetes, obtain health screenings and so on. While many wellness programs are run by third party vendors that promise confidentiality, that is not always the case.

In addition, employers may be able to purchase information from data brokers that collect, compile and sell personal information. Data brokers mine sources such as social media, personal websites, U.S. Census records, state hospital records, retailers’ purchasing records, real property records, insurance claims and more. Two well-known data brokers are Spokeo and Acxiom.

Some of the data employers can obtain identify individuals by name. But even information that does not provide obvious identifying details can be valuable. Wellness program vendors, for example, might provide employers with summary data about their workforce but strip away particulars such as names and birthdates. Nevertheless, de-identified information can sometimes be re-identified by experts. Data miners can match information to data that is publicly available….(More)”.
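
To make the re-identification point concrete, here is a minimal Python sketch of a linkage attack on made-up records: a “de-identified” wellness extract is joined to a public record on quasi-identifiers such as ZIP code, birth date and sex, re-attaching names to health conditions. The data and column names are invented for illustration.

```python
import pandas as pd

# "De-identified" wellness data: names stripped, quasi-identifiers retained.
wellness = pd.DataFrame({
    "zip":        ["44106", "44107", "44106"],
    "birth_date": ["1970-03-02", "1985-11-20", "1992-07-08"],
    "sex":        ["F", "M", "F"],
    "condition":  ["diabetes", "hypertension", "depression"],
})

# Public or purchasable records (voter rolls, property records, data brokers)
# that carry names alongside the same quasi-identifiers.
public = pd.DataFrame({
    "name":       ["J. Smith", "A. Jones", "R. Lee"],
    "zip":        ["44106", "44107", "44106"],
    "birth_date": ["1970-03-02", "1985-11-20", "1992-07-08"],
    "sex":        ["F", "M", "F"],
})

# A simple join on the quasi-identifiers re-attaches names to health data.
reidentified = wellness.merge(public, on=["zip", "birth_date", "sex"])
print(reidentified[["name", "condition"]])
```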

Reboot for the AI revolution


Yuval Noah Harari in Nature: “The ongoing artificial-intelligence revolution will change almost every line of work, creating enormous social and economic opportunities — and challenges. Some believe that intelligent computers will push humans out of the job market and create a new ‘useless class’; others maintain that automation will generate a wide range of new human jobs and greater prosperity for all. Almost everybody agrees that we should take action to prevent the worst-case scenarios….

Governments might decide to deliberately slow down the pace of automation, to lessen the resulting shocks and allow time for readjustments. But it will probably be both impossible and undesirable to prevent automation and job loss completely. That would mean giving up the immense positive potential of AI and robotics. If self-driving vehicles drive more safely and cheaply than humans, it would be counterproductive to ban them just to protect the jobs of taxi and lorry drivers.

A more sensible strategy is to create new jobs. In particular, as routine jobs are automated, opportunities for new non-routine jobs will mushroom. For example, general physicians who focus on diagnosing known diseases and administering familiar treatments will probably be replaced by AI doctors. Precisely because of that, there will be more money to pay human experts to do groundbreaking medical research, develop new medications and pioneer innovative surgical techniques.

This calls for economic entrepreneurship and legal dexterity. Above all, it necessitates a revolution in education…Creating new jobs might prove easier than retraining people to fill them. A huge useless class might appear, owing to both an absolute lack of jobs and a lack of relevant education and mental flexibility….

With insights gleaned from early warning signs and test cases, scholars should strive to develop new socio-economic models. The old ones no longer hold. For example, twentieth-century socialism assumed that the working class was crucial to the economy, and socialist thinkers tried to teach the proletariat how to translate its immense economic power into political clout. In the twenty-first century, if the masses lose their economic value they might have to struggle against irrelevance rather than exploitation….The challenges posed in the twenty-first century by the merger of infotech and biotech are arguably bigger than those thrown up by steam engines, railways, electricity and fossil fuels. Given the immense destructive power of our modern civilization, we cannot afford more failed models, world wars and bloody revolutions. We have to do better this time….(More)”

Laboratories for news? Experimenting with journalism hackathons


Jan Lauren Boyles in Journalism: “Journalism hackathons are computationally based events in which participants create news product prototypes. In the ideal case, the gatherings are rooted in local community, enabling a wide set of institutional stakeholders (legacy journalists, hacker journalists, civic hackers, and the general public) to gather in conversation around key civic issues. This study explores how and to what extent journalism hackathons operate as a community-based laboratory for translating open data from practitioners to the public. Surfaced from in-depth interviews with event organizers in nine countries, the findings illustrate that journalism hackathons are most successful when collaboration integrates civic organizations and community leaders….(More)”.

How “Big Data” Went Bust


The problem with “big data” is not that data is bad. It’s not even that big data is bad: Applied carefully, massive data sets can reveal important trends that would otherwise go undetected. It’s the fetishization of data, and its uncritical use, that tends to lead to disaster, as Julia Rose West recently wrote for Slate. And that’s what “big data,” as a catchphrase, came to represent.

By its nature, big data is hard to interpret. When you’re collecting billions of data points—clicks or cursor positions on a website; turns of a turnstile in a large public space; hourly wind speed observations from around the world; tweets—the provenance of any given data point is obscured. This in turn means that seemingly high-level trends might turn out to be artifacts of problems in the data or methodology at the most granular level possible. But perhaps the bigger problem is that the data you have are usually only a proxy for what you really want to know. Big data doesn’t solve that problem—it magnifies it….

Aside from swearing off data and reverting to anecdote and intuition, there are at least two viable ways to deal with the problems that arise from the imperfect relationship between a data set and the real-world outcome you’re trying to measure or predict.

One is, in short: moar data. This has long been Facebook’s approach. When it became apparent that users’ “likes” were a flawed proxy for what they actually wanted to see more of in their feeds, the company responded by adding more and more proxies to its model. It began measuring other things, like the amount of time they spent looking at a post in their feed, the amount of time they spent reading a story they had clicked on, and whether they hit “like” before or after they had read the piece. When Facebook’s engineers had gone as far as they could in weighting and optimizing those metrics, they found that users were still unsatisfied in important ways. So the company added yet more metrics to the sauce: It started running huge user-survey panels, added new reaction emojis by which users could convey more nuanced sentiments, and started using A.I. to detect clickbait-y language in posts by pages and publishers. The company knows none of these proxies are perfect. But by constantly adding more of them to the mix, it can theoretically edge ever closer to an algorithm that delivers to users the posts that they most want to see.
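
Conceptually, the “moar data” approach amounts to scoring every post on many proxy signals at once and combining them with tuned weights. The Python sketch below is a toy illustration of that idea; the feature names and weights are invented and are not Facebook’s actual model.

```python
# Toy ranking score combining several engagement proxies.
# Feature names and weights are invented for illustration only.
WEIGHTS = {
    "predicted_like": 1.0,
    "expected_dwell_seconds": 0.05,
    "expected_read_seconds": 0.08,
    "survey_quality_score": 2.0,
    "clickbait_probability": -3.0,   # penalize clickbait-y language
}

def rank_score(post_features: dict) -> float:
    """Weighted sum of proxy signals; each new proxy nudges the estimate."""
    return sum(WEIGHTS[name] * post_features.get(name, 0.0) for name in WEIGHTS)

posts = {
    "thoughtful_longread": {"predicted_like": 0.3, "expected_dwell_seconds": 40,
                            "expected_read_seconds": 120, "survey_quality_score": 0.8,
                            "clickbait_probability": 0.05},
    "clickbait_listicle":  {"predicted_like": 0.7, "expected_dwell_seconds": 5,
                            "expected_read_seconds": 10, "survey_quality_score": 0.1,
                            "clickbait_probability": 0.9},
}

for name, feats in sorted(posts.items(), key=lambda kv: rank_score(kv[1]), reverse=True):
    print(f"{name}: {rank_score(feats):.2f}")
```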

One downside of the moar data approach is that it’s hard and expensive. Another is that the more variables are added to your model, the more complex, opaque, and unintelligible its methodology becomes. This is part of the problem Pasquale articulated in The Black Box Society. Even the most sophisticated algorithm, drawing on the best data sets, can go awry—and when it does, diagnosing the problem can be nigh-impossible. There are also the perils of “overfitting” and false confidence: The more sophisticated your model becomes, the more perfectly it seems to match up with all your past observations, and the more faith you place in it, the greater the danger that it will eventually fail you in a dramatic way. (Think mortgage crisis, election prediction models, and Zynga.)
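
The overfitting hazard is easy to demonstrate: give a model more parameters and it matches past observations ever more closely while typically doing worse on data it has not seen. The short Python example below uses synthetic data and NumPy only; the numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "past observations": a simple trend plus noise.
x_train = np.linspace(0, 1, 12)
y_train = 2 * x_train + rng.normal(scale=0.2, size=x_train.size)

# Fresh data from the same process, unseen when fitting.
x_test = np.linspace(0, 1, 50)
y_test = 2 * x_test + rng.normal(scale=0.2, size=x_test.size)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)           # fit the model
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Higher degrees track the past ever more closely...
    # ...while typically doing worse on new data.
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```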

Another possible response to the problems that arise from biases in big data sets is what some have taken to calling “small data.” Small data refers to data sets that are simple enough to be analyzed and interpreted directly by humans, without recourse to supercomputers or Hadoop jobs. Like “slow food,” the term arose as a conscious reaction to the prevalence of its opposite….(More)”