Public Data Is More Important Than Ever–And Now It’s Easier To Find


Meg Miller at Co.Design: “Public data, in theory, is meant to be accessible to everyone. But in practice, even finding it can be near impossible, to say nothing of figuring out what to do with it once you do. Government data websites are often clunky and outdated, and some data is still trapped on physical media–like CDs or individual hard drives.

Tens of thousands of these CDs and hard drives, full of data on topics from Arkansas amusement parks to fire incident reporting, have arrived at the doorstep of the New York-based start-up Enigma over the past four years. The company has obtained thousands upon thousands more datasets by way of Freedom of Information Act (FOIA) requests. Enigma specializes in open data: gathering it, curating it, and analyzing it for insights into a client’s industry, for example, or for public service initiatives.

Enigma also shares its 100,000 datasets with the world through an online platform called Public—the broadest collection of public data that is open and searchable by everyone. Public has been around since Enigma launched in 2013, but today the company is introducing a redesigned version of the site that’s fresher and more user-friendly, with easier navigation and additional features that allow users to drill further down into the data.

But while the first iteration of Public was mostly concerned with making Enigma’s enormous trove of data—which it was already gathering and reformatting for client work—accessible to the public, the new site focuses more on linking that data in new ways. For journalists, researchers, and data scientists, the tool will offer more sophisticated ways of making sense of the data that they have access to through Enigma….

…the new homepage also curates featured datasets and collections to reinforce a sense of discoverability. For example, an Enigma-curated collection of U.S. sanctions data from the U.S. Treasury Department’s Office of Foreign Assets Control (OFAC) shows which entities and individuals American companies can and can’t do business with, restrictions imposed in an effort to achieve specific national security or foreign policy objectives. A new round of sanctions against Russia has been in the news lately, as an effort by President Trump to loosen restrictions on blacklisted businesses and individuals in Russia was overruled by the Senate last week. Enigma’s curated data selection on U.S. sanctions could help journalists contextualize recent events with data that shows changes in sanctions lists over time by presidential administration, for instance, or they could compare the U.S. sanctions list to the European Union’s….(More).
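
To make that concrete, the sketch below tallies how many entries were added to a sanctions list under each presidential administration. The CSV file (ofac_sanctions.csv) and its listing_date column are hypothetical stand-ins for an exported dataset; Enigma’s actual formats may differ.

```python
# A rough sketch of the analysis described above: tally how many entries were
# added to a sanctions list under each presidential administration. The CSV
# file and its columns are hypothetical stand-ins for an exported dataset.
import pandas as pd

ADMINISTRATIONS = [                      # (name, inauguration date)
    ("Clinton", "1993-01-20"),
    ("Bush", "2001-01-20"),
    ("Obama", "2009-01-20"),
    ("Trump", "2017-01-20"),
]

def administration(date):
    """Return the administration in office on a given listing date."""
    label = None
    for name, start in ADMINISTRATIONS:
        if date >= pd.Timestamp(start):
            label = name
    return label

sanctions = pd.read_csv("ofac_sanctions.csv", parse_dates=["listing_date"])
sanctions["administration"] = sanctions["listing_date"].map(administration)
print(sanctions.groupby("administration").size())   # listings added per administration
```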

Blockchains, personal data and the challenge of governance


Theo Bass at NESTA: “…There are a number of dominant internet platforms (Google, Facebook, Amazon, etc.) that hoard, analyse and sell information about their users in the name of a more personalised and efficient service. This has become a problem.

People feel they are losing control over how their data is used and reused on the web. 500 million adblocker downloads is a symptom of a market which isn’t working well for people. As Irene Ng mentions in a recent guest blog on the Nesta website, the secondary data market is thriving (online advertising is a major player), as companies benefit from the opacity and lack of transparency about where profit is made from personal data.

It’s said that blockchain’s key characteristics could provide a foundational protocol for a fairer digital identity system on the web. Beyond its application as digital currency, blockchain could provide a new set of technical standards for transparency, openness, and user consent, on top of which a whole new generation of services might be built.

While the aim is ambitious, a handful of projects are rising to the challenge.

Blockstack is creating a global system of digital IDs, which are written into the bitcoin blockchain. Nobody can touch them other than the owner of that ID. Blockstack is building a new generation of applications on top of this infrastructure which promises to provide “a new decentralized internet where users own their data and apps run locally”.

Sovrin attempts to provide users with “self-sovereign identity”. The argument is that “centralized” systems for storing personal data make it a “treasure chest for attackers”. Sovrin argues that users should more easily be able to have “ownership” over their data, and the exchange of data should be made possible through a decentralised, tamper-proof ledger of transactions between users.

Our own DECODE project is piloting a set of collaboratively owned, local sharing economy platforms in Barcelona and Amsterdam. The blockchain aims to provide a public record of entitlements over where people’s data is stored, who can access it and for what purpose (with some additional help from new techniques in zero-knowledge cryptography to preserve people’s privacy).

There’s no doubt this is an exciting field of innovation. But the debate is characterised by a lot of hype. The following sections therefore discuss some of the challenges thrown up when we start thinking about implementations beyond bitcoin.

Blockchains and the challenge of governance

As mentioned above, bitcoin is a “bearer asset”. This is a necessary feature of decentralisation — all users maintain sole ownership over the digital money they hold on the network. If users get hacked (digital wallets sometimes do), or if a password gets lost, the money is irretrievable.

While the example of losing a password might seem trivial, it highlights some difficult questions for proponents of blockchain’s wider uses. What happens if there’s a dispute over an online transaction, but no intermediary to settle it? What happens if someone’s digital assets or digital identity is breached and sensitive data falls into the wrong hands? It might be necessary to assign responsibility to a governing actor to help resolve the issue, but of course this would require the introduction of a trusted middleman.

Bitcoin doesn’t try to answer these questions; its anonymous creators deliberately tried to avoid implementing a clear model of governance over the network, probably because they knew that bitcoin would be used by people as a method for subverting the law. Bitcoin still sees a lot of use in gray economies, including for the sale of drugs and gambling.

But if blockchains are set to enter the mainstream, providing for businesses, governments and nonprofits, then they won’t be able to function irrespective of the law. They will need to find use-cases that can operate alongside legal frameworks and jurisdictional boundaries. They will need to demonstrate regulatory compliance, create systems of rules and provide accountability when things go awry. This cannot just be solved through increasingly sophisticated coding.

All of this raises a potential paradox recently elaborated in a post by Vili Lehdonvirta of the Oxford Internet Institute: is it possible to successfully govern blockchains without undermining their entire purpose?….

If blockchain advocates only work towards purely technical solutions and ignore real-world challenges of trying to implement decentralisation, then we’ll only ever see flawed implementations of the technology. This is already happening in the form of centrally administered, proprietary or ‘half-baked’ blockchains, which don’t offer much more value than traditional databases….(More)”.

Regulation of Big Data: Perspectives on Strategy, Policy, Law and Privacy


Paper by Pompeu Casanovas, Louis de Koker, Danuta Mendelson and David Watts: “…presents four complementary perspectives stemming from governance, law, ethics, and computer science. Big, Linked, and Open Data constitute complex phenomena whose economic and political dimensions require a plurality of instruments to enhance and protect citizens’ rights. Some conclusions are offered at the end to foster a more general discussion.

This article contends that the effective regulation of Big Data requires a combination of legal tools and other instruments of a semantic and algorithmic nature. It commences with a brief discussion of the concept of Big Data and views expressed by Australian and UK participants in a study of Big Data use from a law enforcement and national security perspective. The second part of the article highlights the UN Special Rapporteur on the Right to Privacy’s interest in these themes and the focus of the Rapporteur’s new program on Big Data. UK law reforms regarding authorisation of warrants for the exercise of bulk data powers are discussed in the third part. Reflecting on these developments, the paper closes with an exploration of the complex relationship between law and Big Data and the implications for regulation and governance of Big Data….(More)”.

Open Data’s Effect on Food Security


Jeremy de Beer, Jeremiah Baarbé, and Sarah Thuswaldner at Open AIR: “Agricultural data is a vital resource in the effort to address food insecurity. This data is used across the food-production chain. For example, farmers rely on agricultural data to decide when to plant crops, scientists use data to conduct research on pests and design disease resistant plants, and governments make policy based on land use data. As the value of agricultural data is understood, there is a growing call for governments and firms to open their agricultural data.

Open data is data that anyone can access, use, or share. Open agricultural data has the potential to address food insecurity by making it easier for farmers and other stakeholders to access and use the data they need. Open data also builds trust and fosters collaboration among stakeholders that can lead to new discoveries to address the problems of feeding a growing population.

A network of partnerships is growing around agricultural data research. The Open African Innovation Research (Open AIR) network is researching open agricultural data in partnership with the Plant Phenotyping and Imaging Research Centre (P2IRC) and the Global Institute for Food Security (GIFS). This research builds on a partnership with the Global Open Data for Agriculture and Nutrition (GODAN) and they are exploring partnerships with Open Data for Development (OD4D) and other open data organizations.

…published two works on open agricultural data. Published in partnership with GODAN, “Ownership of Open Data” describes how intellectual property law defines ownership rights in data. Firms that collect data own the rights to data, which is a major factor in the power dynamics of open data. In July, Jeremiah Baarbé and Jeremy de Beer will be presenting “A Data Commons for Food Security” …The paper proposes a licensing model that allows farmers to benefit from the datasets to which they contribute. The license supports SME data collectors, who need sophisticated legal tools; contributors, who need engagement, privacy, control, and benefit sharing; and consumers, who need open access….(More)”.

Teaching machines to understand – and summarize – text


In The Conversation: “We humans are swamped with text. It’s not just news and other timely information: Regular people are drowning in legal documents. The problem is so bad we mostly ignore it. Every time a person uses a store’s loyalty rewards card or connects to an online service, his or her activities are governed by the equivalent of hundreds of pages of legalese. Most people pay no attention to these massive documents, often labeled “terms of service,” “user agreement” or “privacy policy.”

These are just part of a much wider societal problem of information overload. There is so much data stored – exabytes of it, as much stored as has ever been spoken by people in all of human history – that it’s humanly impossible to read and interpret everything. Often, we narrow down our pool of information by choosing particular topics or issues to pay attention to. But it’s important to actually know the meaning and contents of the legal documents that govern how our data is stored and who can see it.

As computer science researchers, we are working on ways artificial intelligence algorithms could digest these massive texts and extract their meaning, presenting it in terms regular people can understand….

Examining privacy policies

A modern internet-enabled life today more or less requires trusting for-profit companies with private information (like physical and email addresses, credit card numbers and bank account details) and personal data (photos and videos, email messages and location information).

These companies’ cloud-based systems typically keep multiple copies of users’ data as part of backup plans to prevent service outages. That means there are more potential targets – each data center must be securely protected both physically and electronically. Of course, internet companies recognize customers’ concerns and employ security teams to protect users’ data. But the specific and detailed legal obligations they undertake to do that are found in their impenetrable privacy policies. No regular human – and perhaps even no single attorney – can truly understand them.

In our study, we ask computers to summarize the terms and conditions regular users say they agree to when they click “Accept” or “Agree” buttons for online services. We downloaded the publicly available privacy policies of various internet companies, including Amazon AWS, Facebook, Google, HP, Oracle, PayPal, Salesforce, Snapchat, Twitter and WhatsApp….

Our software examines the text and uses information extraction techniques to identify key information specifying the legal rights, obligations and prohibitions identified in the document. It also uses linguistic analysis to identify whether each rule applies to the service provider, the user or a third-party entity, such as advertisers and marketing companies. Then it presents that information in clear, direct, human-readable statements….(More)”
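
In highly simplified form, that extraction step might look like the sketch below. The cue lists and party labels are illustrative assumptions, not the authors’ actual linguistic models.

```python
# A highly simplified version of the extraction step described above: classify
# each sentence as a right, obligation or prohibition, and guess which party
# it applies to. The cue lists are illustrative assumptions only.
import re

RULE_CUES = {                    # checked in order, so "must not" wins over "must"
    "prohibition": ["must not", "shall not", "may not"],
    "obligation": ["must", "shall", "is required to"],
    "right": ["may", "is entitled to"],
}
PARTY_CUES = {
    "provider": ["we", "our"],
    "user": ["you", "your"],
    "third party": ["advertiser", "partner"],
}

def classify(sentence):
    s = sentence.lower()
    rule = next((r for r, cues in RULE_CUES.items()
                 if any(c in s for c in cues)), None)
    party = next((p for p, cues in PARTY_CUES.items()
                  if any(re.search(rf"\b{c}\b", s) for c in cues)), None)
    return rule, party

print(classify("You must not share your account credentials."))  # ('prohibition', 'user')
print(classify("We may share information with advertisers."))    # ('right', 'provider')
```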

Artificial intelligence can predict which congressional bills will pass


Other algorithms have predicted whether a bill will survive a congressional committee, or whether the Senate or House of Representatives will vote to approve it—all with varying degrees of success. But John Nay, a computer scientist and co-founder of Skopos Labs, a Nashville-based AI company focused on studying policymaking, wanted to take things one step further. He wanted to predict whether an introduced bill would make it all the way through both chambers—and precisely what its chances were.

Nay started with data on the 103rd Congress (1993–1995) through the 113th Congress (2013–2015), downloaded from a legislation-tracking website called GovTrack. This included the full text of the bills, plus a set of variables, including the number of co-sponsors, the month the bill was introduced, and whether the sponsor was in the majority party of their chamber. Using data on Congresses 103 through 106, he trained machine-learning algorithms—programs that find patterns on their own—to associate bills’ text and contextual variables with their outcomes. He then predicted how each bill would do in the 107th Congress. Then, he trained his algorithms on Congresses 103 through 107 to predict the 108th Congress, and so on.
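
In code, that rolling train-and-predict scheme might look like the following sketch; the logistic model and the data layout are stand-ins for illustration, not Nay’s actual pipeline.

```python
# A sketch of the rolling scheme described above: train on every Congress up
# to N, then predict bills in Congress N + 1. The logistic model and the data
# layout are simple stand-ins, not Nay's actual pipeline.
from sklearn.linear_model import LogisticRegression

def rolling_predictions(bills_by_congress):
    """bills_by_congress maps a Congress number (e.g. 103) to a pair
    (feature_rows, outcomes) of already-featurized bills."""
    results = {}
    congresses = sorted(bills_by_congress)        # e.g. [103, 104, ..., 113]
    for i in range(4, len(congresses)):           # first target is the 107th
        X_train = [row for c in congresses[:i] for row in bills_by_congress[c][0]]
        y_train = [y for c in congresses[:i] for y in bills_by_congress[c][1]]
        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        target = congresses[i]
        results[target] = model.predict_proba(bills_by_congress[target][0])[:, 1]
    return results                                # per-bill probability of passage
```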

Nay’s most complex machine-learning algorithm combined several parts. The first part analyzed the language in the bill. It interpreted the meaning of words by how they were embedded in surrounding words. For example, it might see the phrase “obtain a loan for education” and assume “loan” has something to do with “obtain” and “education.” A word’s meaning was then represented as a string of numbers describing its relation to other words. The algorithm combined these numbers to assign each sentence a meaning. Then, it found links between the meanings of sentences and the success of bills that contained them. Three other algorithms found connections between contextual data and bill success. Finally, an umbrella algorithm used the results from those four algorithms to predict what would happen…. His program scored about 65% better than simply guessing that a bill wouldn’t pass, Nay reported last month in PLOS ONE…(More).
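
A toy version of the text component might look like the sketch below: word vectors learned from surrounding words, averaged into sentence vectors, then linked to outcomes by a base classifier. This illustrates the general approach only, not Nay’s architecture.

```python
# A toy version of the text component described above: each word's vector is
# learned from its surrounding words (word2vec), a sentence's meaning is the
# average of its word vectors, and a classifier links those averages to bill
# outcomes. Nay's actual architecture is considerably richer.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

corpus = [["obtain", "a", "loan", "for", "education"],     # toy bill sentences
          ["impose", "a", "tariff", "on", "imports"]]
outcomes = [1, 0]                                          # 1 = bill was enacted

w2v = Word2Vec(corpus, vector_size=50, min_count=1, seed=0)

def sentence_vector(tokens):
    """Average the word vectors to get a single vector for the sentence."""
    return np.mean([w2v.wv[t] for t in tokens if t in w2v.wv], axis=0)

X = np.stack([sentence_vector(s) for s in corpus])
text_model = LogisticRegression().fit(X, outcomes)         # one of several base models
# An "umbrella" model would then combine this text model's predictions with
# those of models trained on contextual variables (sponsors, timing, party).
```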

Why blockchain could be your next form of ID as a world citizen


At TechRepublic: “Blockchain is moving from banking to the refugee crisis, as Microsoft and Accenture on Monday announced a partnership to use the technology to provide a legal form of identification for 1.1 billion people worldwide as part of the global public-private partnership ID2020.

The two tech giants developed a prototype that taps Accenture’s blockchain capabilities and runs on Microsoft Azure. The tech tool uses a person’s biometric data, such as a fingerprint or iris scan, to unlock the record-keeping blockchain technology and create a legal ID. This will allow refugees to have a personal identity record they can access from an app on a smartphone to receive assistance at border crossings, or to access basic services such as healthcare, according to a press release.

The prototype is designed so that personally identifiable information (PII) always exists “off chain,” and is not stored in a centralized system. Citizens use their biometric data to access their information, and choose when to share it—preventing the system from being accessed by tyrannical governments that refugees are fleeing from, as ZDNet noted.
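
One common way to realize such an off-chain design is to put only a salted hash (a commitment) of the identity record on the ledger, as in the sketch below. This illustrates the general pattern, not Accenture’s prototype.

```python
# One common way to realize the "off chain" design described above: the ledger
# stores only a salted hash (a commitment) of the identity record, while the
# record itself stays in storage the user controls. General pattern only;
# this is not Accenture's prototype.
import hashlib
import json
import os

def commit(record):
    """Create (salt, commitment); only the commitment would go on chain."""
    salt = os.urandom(16)
    payload = json.dumps(record, sort_keys=True).encode()
    return salt, hashlib.sha256(salt + payload).hexdigest()

def verify(record, salt, commitment):
    """Check a disclosed record against the on-chain commitment."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(salt + payload).hexdigest() == commitment

record = {"name": "A. Refugee", "dob": "1990-01-01"}       # stays off chain
salt, onchain = commit(record)
print(verify(record, salt, onchain))                       # True
```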

Accenture’s platform is currently used in the Biometric Identity Management System operated by the United Nations High Commissioner for Refugees, which has enrolled more than 1.3 million refugees in 29 nations across Asia, Africa, and the Caribbean. The system is predicted to support more than 7 million refugees from 75 countries by 2020, the press release noted.

“People without a documented identity suffer by being excluded from modern society,” said David Treat, a managing director in Accenture’s global blockchain business, in the press release. “Our prototype is personal, private and portable, empowering individuals to access and share appropriate information when convenient and without the worry of using or losing paper documentation.”

ID is key for accessing education, healthcare, voting, banking, housing, and other family benefits, the press release noted. ID2020’s goal is to create a secure, established digital ID system for all citizens worldwide….

Blockchain will likely play an increasing role in both identification and security moving forward, especially as it relates to the Internet of Things (IoT). For example, Telstra, an Australian telecommunications company, is currently experimenting with a combination of blockchain and biometric security for its smart home products, ZDNet reported….(More)”.

Does democracy cause innovation? An empirical test of the Popper hypothesis


Yanyan Gao, Leizhen Zang, Antoine Roth, and Puqu Wang in Research Policy: “Democratic countries produce higher levels of innovation than autocratic ones, but does democratization itself lead to innovation growth either in the short or in the long run? The existing literature has extensively examined the relationship between democracy and growth but seldom explored the effect of democracy on innovation, which might be an important channel through which democracy contributes to economic growth. This article aims to fill this gap and contribute to the long-standing debate on the relationship between democracy and innovation by offering empirical evidence based on a data set covering 156 countries between 1964 and 2010. The results from the difference-in-differences method show that democracy itself has no direct positive effect on innovation measured with patent counts, patent citations and patent originality….(More)”.
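
For readers curious what such an estimate involves, below is a minimal two-way fixed-effects difference-in-differences specification of the kind used in such studies; the data file and column names are hypothetical.

```python
# A minimal two-way fixed-effects difference-in-differences specification of
# the kind used in such studies: country and year fixed effects plus an
# indicator that switches on after a country democratizes. The file and
# column names ("patents", "post_democratization") are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("country_year_panel.csv")    # one row per country-year
did = smf.ols(
    "np.log1p(patents) ~ post_democratization + C(country) + C(year)",
    data=panel,
).fit(cov_type="cluster", cov_kwds={"groups": panel["country"]})
print(did.params["post_democratization"])        # estimated effect on innovation
```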

Big Data, Data Science, and Civil Rights


Paper by Solon Barocas, Elizabeth Bradley, Vasant Honavar, and Foster Provost:  “Advances in data analytics bring with them civil rights implications. Data-driven and algorithmic decision making increasingly determine how businesses target advertisements to consumers, how police departments monitor individuals or groups, how banks decide who gets a loan and who does not, how employers hire, how colleges and universities make admissions and financial aid decisions, and much more. As data-driven decisions increasingly affect every corner of our lives, there is an urgent need to ensure they do not become instruments of discrimination, barriers to equality, threats to social justice, and sources of unfairness. In this paper, we argue for a concrete research agenda aimed at addressing these concerns, comprising five areas of emphasis: (i) Determining if models and modeling procedures exhibit objectionable bias; (ii) Building awareness of fairness into machine learning methods; (iii) Improving the transparency and control of data- and model-driven decision making; (iv) Looking beyond the algorithm(s) for sources of bias and unfairness—in the myriad human decisions made during the problem formulation and modeling process; and (v) Supporting the cross-disciplinary scholarship necessary to do all of that well…(More)”.
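
As a small illustration of item (i), the sketch below audits a model’s predictions for disparate impact across a protected group using the common “80% rule” ratio; the data are illustrative.

```python
# A small example of research-agenda item (i) above: auditing a trained
# model's predictions for disparate impact across a protected group, using
# the "80% rule" ratio of positive-outcome rates. The data are illustrative.
import numpy as np

def disparate_impact(y_pred, group):
    """Ratio of positive rates between the two groups (closer to 1 is fairer)."""
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return min(rate_a, rate_b) / max(rate_a, rate_b)

y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0])      # e.g. loan approvals
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])       # protected attribute
ratio = disparate_impact(y_pred, group)
print(f"disparate impact ratio: {ratio:.2f}")    # < 0.8 flags potential bias
```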

Can we predict political uprisings?


At The Conversation: “Forecasting political unrest is a challenging task, especially in this era of post-truth and opinion polls.

Several studies by economists such as Paul Collier and Anke Hoeffler in 1998 and 2002 describe how economic indicators, such as slow income growth and natural resource dependence, can explain political upheaval. More specifically, low per capita income has been a significant trigger of civil unrest.

Economists James Fearon and David Laitin have also followed this hypothesis, showing how specific factors played an important role in Chad, Sudan and Somalia in outbreaks of political violence.

According to the International Country Risk Guide index, the internal political stability of Sudan fell by 15% in 2014, compared to the previous year. This decrease followed a drop in its per capita income growth rate from 12% in 2012 to 2% in 2013.

By contrast, when per capita income growth increased in 1997 compared to 1996, Sudan’s political stability score increased by more than 100% in 1998. Political stability in any given year thus seems to be a function of income growth in the previous one.
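
That lag structure is straightforward to check once the two series are aligned, as in the minimal sketch below; the data file and column names are hypothetical.

```python
# A minimal check of the lag structure described above: correlate each year's
# political-stability score with the previous year's income growth.
# The data file and column names are hypothetical.
import pandas as pd

df = pd.read_csv("sudan_indicators.csv")          # columns: year, stability, income_growth
df = df.sort_values("year")
df["lagged_growth"] = df["income_growth"].shift(1)
print(df["stability"].corr(df["lagged_growth"]))  # positive if stability tracks prior growth
```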

When economics lie

But as the World Bank admitted, “economic indicators failed to predict Arab Spring”.

The usual economic performance indicators, such as gross domestic product, trade and foreign direct investment, showed rising economic development and globalisation in the Arab Spring countries over the preceding decade. Yet, in 2010, the region witnessed unprecedented uprisings that caused the collapse of regimes such as those in Tunisia, Egypt and Libya.

In our 2016 study we used data for more than 100 countries for the 1984–2012 period. We wanted to look at criteria other than economics to better understand the rise of political upheavals.

We found and quantified how corruption becomes a destabilising factor when the youth population (15-24 years old) exceeds 20% of the adult population.

Let’s examine the two main components of the study: demographics and corruption….

We are 90% confident that a youth bulge beyond 20% of the adult population, on average, combined with high levels of corruption, can significantly destabilise political systems within specific countries when the other factors described above are also taken into account. We are 99% confident about a youth bulge beyond the 30% level.
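
One standard way to estimate such an interaction is a logit with an interaction term, as sketched below; the data file and variable names are hypothetical, not the study’s actual specification.

```python
# A sketch of the relationship the study quantifies: instability as a function
# of the youth share of the adult population, corruption, and crucially their
# interaction. The data file and column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("country_year_panel.csv")
panel["youth_bulge"] = (panel["youth_share"] > 0.20).astype(int)
model = smf.logit(
    "instability ~ youth_bulge * corruption + np.log(gdp_per_capita)",
    data=panel,
).fit()
print(model.params["youth_bulge:corruption"])    # the destabilising interaction
```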

Our results can help explain the risk of internal conflict and the possible time window for it happening. They could guide policy makers and international organisations in allocating their anti-corruption budget better, taking into account the demographic structure of societies and the risk of political instability….(More).