Welcome to E-Estonia, the tiny nation that’s leading Europe in digital innovation


Article in The Conversation: “Big Brother does ‘just want to help’ – in Estonia, at least. In this small nation of 1.3 million people, citizens have overcome fears of an Orwellian dystopia with ubiquitous surveillance to become a highly digital society.

The government took nearly all its services online in 2003 with the e-Estonia State Portal. The country’s innovative digital governance was not the result of a carefully crafted master plan; it was a pragmatic and cost-efficient response to budget limitations.

It helped that citizens trusted their politicians after Estonia regained independence in 1991. And, in turn, politicians trusted the country’s engineers, who had no commitment to legacy hardware or software systems, to build something new.

This proved to be a winning formula that can now benefit all European countries.

The once-only principle

With its digital governance, Estonia introduced the “once-only” principle, mandating that the state is not allowed to ask citizens for the same information twice.

In other words, if you give your address or a family member’s name to the census bureau, the health insurance provider will not later ask you for it again. No department of any government agency can make citizens repeat information already stored in its database or that of some other agency…. The once-only principle has been such a big success that, based on Estonia’s common-sense innovation, the EU enacted a digital Once-Only Principle initiative early this year. It ensures that “citizens and businesses supply certain standard information only once, because public administration offices take action to internally share this data, so that no additional burden falls on citizens and businesses.”…
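To make the principle concrete, here is a minimal sketch of a once-only data flow. The names, identifiers, and registry are hypothetical illustrations, not Estonia’s actual X-Road infrastructure: the citizen supplies a fact to one agency, which writes it to a shared registry, and any other agency reads it from there instead of asking again.

```python
# Minimal sketch of the once-only principle (hypothetical names and data,
# not Estonia's actual X-Road implementation): agencies look information up
# in a shared registry instead of asking the citizen twice.

class PopulationRegistry:
    """Single authoritative store for citizen master data."""
    def __init__(self):
        self._records = {}

    def submit(self, citizen_id, field, value):
        self._records.setdefault(citizen_id, {})[field] = value

    def lookup(self, citizen_id, field):
        return self._records.get(citizen_id, {}).get(field)


class Agency:
    def __init__(self, name, registry):
        self.name = name
        self.registry = registry

    def get_address(self, citizen_id):
        # Once-only: consult the shared registry first; only if the state has
        # never been told may the agency turn to the citizen.
        address = self.registry.lookup(citizen_id, "address")
        if address is None:
            raise LookupError("Not on record - the only case where the citizen may be asked.")
        return address


registry = PopulationRegistry()
registry.submit("38001010001", "address", "Tallinn, Estonia")  # given once, to the census bureau

health_insurance = Agency("Health Insurance Fund", registry)
print(health_insurance.get_address("38001010001"))  # reused internally, never asked again
```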

‘Twice-mandatory’ principle

Governments should always be brainstorming, asking themselves, for example: if one government agency needs this information, who else might benefit from it? And beyond need, what insights could we glean from this data?

Financier Vernon Hill introduced an interesting “One to Say YES, Two to Say NO” rule when founding Metro Bank UK: “It takes only one person to make a yes decision, but it requires two people to say no. If you’re going to turn away business, you need a second check for that.”

Imagine how simple and powerful a policy it would be if governments learnt this lesson. What if every bit of information collected from citizens or businesses had to be used for two purposes (at least!) or by two agencies in order to merit requesting it?

The Estonian Tax and Customs Board is, perhaps unexpectedly given the reputation of tax offices, an example of the potential for such a paradigm shift. In 2014, it launched a new strategy to address tax fraud, requiring every business transaction of over €1,000 to be declared monthly by the entities involved.

To minimise the administrative burden of this, the government introduced an application programming interface (API) that allows information to be exchanged automatically between a company’s accounting software and the state’s tax system.
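A hedged sketch of what such machine-to-machine reporting could look like follows. The endpoint URL, field names, and authentication scheme are illustrative assumptions, not the Estonian Tax and Customs Board’s actual interface.

```python
# Hypothetical sketch of accounting software reporting transactions above the
# threshold to a tax authority's API. Endpoint, payload fields, and auth are
# illustrative assumptions, not the real Estonian e-Tax interface.
import requests

DECLARATION_THRESHOLD_EUR = 1000
TAX_BOARD_ENDPOINT = "https://tax.example.ee/api/v1/transaction-declarations"  # placeholder URL

def monthly_declaration(transactions, period, api_token):
    """Select transactions above the threshold and submit them in one batch."""
    declarable = [t for t in transactions if t["amount_eur"] > DECLARATION_THRESHOLD_EUR]
    payload = {
        "period": period,  # e.g. "2014-11"
        "transactions": [
            {
                "partner_reg_code": t["partner"],
                "amount_eur": t["amount_eur"],
                "invoice_no": t["invoice_no"],
            }
            for t in declarable
        ],
    }
    response = requests.post(
        TAX_BOARD_ENDPOINT,
        json=payload,
        headers={"Authorization": f"Bearer {api_token}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```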

Though companies pushed back in the media at first, and former president Toomas Hendrik Ilves even vetoed the initial version of the act, the system was a spectacular success: Estonia more than doubled its original estimate of €30 million in reduced tax fraud.

Latvia, Spain, Belgium, Romania, Hungary and several others have taken a similar path for controlling and detecting tax fraud. But analysing this data beyond fraud is where the real potential is hidden….(More).”

A Data-driven Approach to Assess the Potential of Smart Cities: The Case of Open Data for Brussels Capital Region


Paper by Miguel Angel Gomez Zotano and Hugues Bersini in Energy Procedia: “The success of smart city projects is intrinsically related to the existence of large volumes of data that could be processed to achieve their objectives. For this purpose, the plethora of data stored by public administrations becomes an incredibly rich source of insight and information due to its volume and diversity. However, it was only with the Open Government Movement that governments became concerned with the need to open their data to citizens and businesses. Thus, with the emergence of open data portals, this myriad of data enables the development of new business models. The achievement of the benefits sought by making this data available triggers new challenges in coping with the diversity of sources involved. The business potential could be jeopardized by the scarcity of relevant data in the different blocks and domains that make up a city and by the lack of a common approach to data publication, in terms of format, content, etc.

This paper introduces a holistic approach that relies on the Smart City Ontology as the cornerstone to standardise and structure data. This approach, which is proposed as an analytical tool to assess the potential of data in a given smart city, analyses three main aspects: the availability of data, the criteria that data should fulfil to be considered eligible, and the model used to structure and organise data. The approach has been applied to the case of the Brussels Capital Region, whose first results are presented and discussed in this paper. The main conclusion is that, despite its commitment to open data and smart cities, Brussels is not yet mature enough to fully exploit the real intelligence that smart cities could provide. This maturity should be achieved in the coming years with the implementation of the new Brussels Smart City Strategy…(More)”.

Access to New Data Sources for Statistics: Business Models and Incentives for the Corporate Sector


Report by Thilo Klein and Stefaan Verhulst: “New data sources, commonly referred to as “Big Data”, have attracted growing interest from National Statistical Institutes. They have the potential to complement official and more conventional statistics used, for instance, to determine progress towards the Sustainable Development Goals (SDGs) and other targets. However, it is often assumed that this type of data is readily available, which is not necessarily the case. This paper examines legal requirements and business incentives to obtain agreement on private data access, and more generally ways to facilitate the use of Big Data for statistical purposes. Using practical cases, the paper analyses the suitability of five generic data access models for different data sources and data uses in an emerging new data ecosystem. Concrete recommendations for policy action are presented in the conclusions….(More)”.

Prediction and Inference from Social Networks and Social Media


Book edited by Jalal Kawash, Nitin Agarwal, and Tansel Özyer: “This book addresses the challenges of social network and social media analysis in terms of prediction and inference. The chapters collected here tackle these issues by proposing new analysis methods and by examining mining methods for the vast amount of social content produced. Social Networks (SNs) have become an integral part of our lives; they are used for leisure, business, government, medical, educational purposes and have attracted billions of users. The challenges that stem from this wide adoption of SNs are vast. These include generating realistic social network topologies, awareness of user activities, topic and trend generation, estimation of user attributes from their social content, and behavior detection. This text has applications to widely used platforms such as Twitter and Facebook and appeals to students, researchers, and professionals in the field….(More)”

Standard Business Reporting: Open Data to Cut Compliance Costs


Report by the Data Foundation: “Imagine if U.S. companies’ compliance costs could be reduced by billions of dollars. Imagine if this could happen without sacrificing any transparency to investors and governments. Open data can make that possible.

This first-ever research report, co-published by the Data Foundation and PwC, explains how Standard Business Reporting (SBR), in which multiple regulatory agencies adopt a common open data structure for the information they collect, reduces costs for both companies and agencies.

SBR programs are in place in the Netherlands, Australia, and elsewhere – but the concept is unknown in the United States. Our report is intended to introduce SBR to U.S. policymakers and lay the groundwork for future change….(More)”.

Congress Takes Blockchain 101


Mike Orcutt at MIT Technology Review: “Congressman David Schweikert is determined to enlighten his colleagues in Washington about the blockchain. The opportunities the technology creates for society are vast, he says, and right now education is key to keeping the government from “screwing it up.”

Schweikert, a Republican from Arizona, co-chairs the recently launched Congressional Blockchain Caucus. He and fellow co-chair, Democratic Representative Jared Polis of Colorado, say they created it in response to increasing interest and curiosity on Capitol Hill about blockchain technology. “Members of Congress are starting to get visits from people that are doing things with the blockchain and talking about it,” says Polis. “They are interested in learning more, and we hope to provide the forum to do that.”

Blockchain technology is difficult to explain, and misconceptions among policymakers are almost inevitable. One important concept Schweikert says more people need to understand is that a blockchain is not necessarily Bitcoin, and there are plenty of applications of blockchains beyond transferring digital currency. Digital currencies, and especially Bitcoin, the most popular blockchain by far, make some policymakers and government officials wary. But focusing on currency keeps people from seeing the potential the blockchain has to reinvent how we control and manage valuable information, Schweikert argues.

A blockchain is a decentralized, online record-keeping system, or ledger, maintained by a network of computers that verify and record transactions using established cryptographic techniques. Bitcoin’s system, which is open-source, depends on people all around the world called miners. They use specialized computers to verify and record transactions, and receive Bitcoin currency as a reward. Several other digital currencies work in a similar fashion.
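The core data structure is easier to see in code than in prose. The sketch below is a conceptual toy, not Bitcoin’s actual protocol, and it omits mining and networking entirely: each block records the hash of its predecessor, so tampering with any earlier entry invalidates everything after it.

```python
# Conceptual sketch of a hash-linked ledger (not Bitcoin's real protocol):
# each block stores the hash of its predecessor, so altering any earlier
# record breaks the chain and is immediately detectable.
import hashlib
import json
import time

def block_hash(block):
    # Hash a canonical serialisation of the block's contents.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def new_block(transactions, previous_hash):
    return {
        "timestamp": time.time(),
        "transactions": transactions,
        "previous_hash": previous_hash,
    }

def chain_is_valid(chain):
    # Verify every block points to the true hash of the block before it.
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

genesis = new_block(["genesis"], previous_hash="0" * 64)
chain = [genesis]
chain.append(new_block(["alice pays bob 5"], previous_hash=block_hash(genesis)))
chain.append(new_block(["bob pays carol 2"], previous_hash=block_hash(chain[-1])))

print(chain_is_valid(chain))                        # True
chain[1]["transactions"] = ["alice pays bob 500"]   # tampering with history...
print(chain_is_valid(chain))                        # False - the tamper is detected
```

Real blockchains add consensus mechanisms such as proof-of-work mining and peer-to-peer replication on top of this hash-linking, but the tamper-evidence shown here is what lets a ledger be trusted without a central authority.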

Digital currency is not the main reason so many institutions have begun experimenting with blockchains in recent years, though. Blockchains can also be used to securely and permanently store other information besides currency transaction records. For instance, banks and other financial companies see this as a way to manage information vital to the transfer of ownership of financial assets more efficiently than they do now. Some experiments have involved the Bitcoin blockchain, some use the newer blockchain software platform called Ethereum, and others have used private or semi-private blockchains.

The government should adopt blockchain technology too, say the Congressmen. A decentralized ledger is better than a conventional database “whenever we need better consumer control of information and security” like in health records, tax returns, voting records, and identity management, says Polis. Several federal agencies and state governments are already experimenting with blockchain applications. The Department of Homeland Security, for example, is running a test to track data from its border surveillance devices in a distributed ledger….

Services for transferring money fall under the jurisdiction of several federal regulators and face a patchwork of state licensing laws. New blockchain-based business models are challenging traditional notions of money transmission, says Boring, and many companies are unsure where they fit in the complicated legal landscape.

Boring has argued that financial technology companies would benefit from a regulatory safe zone, or “sandbox”—like those that are already in place in the U.K. and Singapore—where they could test products without the risk of “inadvertent regulatory violations.” We don’t need any new legislation from Congress yet, though—that could stifle innovation even more, she says. “What Congress should be doing is educating themselves on the issues.”…(More)”

Dark Web


Kristin Finklea for the Congressional Research Service: “The layers of the Internet go far beyond the surface content that many can easily access in their daily searches. The other content is that of the Deep Web, content that has not been indexed by traditional search engines such as Google. The furthest corners of the Deep Web, segments known as the Dark Web, contain content that has been intentionally concealed. The Dark Web may be used for legitimate purposes as well as to conceal criminal or otherwise malicious activities. It is the exploitation of the Dark Web for illegal practices that has garnered the interest of officials and policymakers.

Individuals can access the Dark Web by using special software such as Tor (short for The Onion Router). Tor relies upon a network of volunteer computers to route users’ web traffic through a series of other users’ computers such that the traffic cannot be traced to the original user. Some developers have created tools—such as Tor2web—that may allow individuals access to Tor-hosted content without downloading and installing the Tor software, though accessing the Dark Web through these means does not anonymize activity. Once on the Dark Web, users often navigate it through directories such as the “Hidden Wiki,” which organizes sites by category, similar to Wikipedia. Individuals can also search the Dark Web with search engines, which may be broad, searching across the Deep Web, or more specific, searching for contraband like illicit drugs, guns, or counterfeit money.
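The “onion” in The Onion Router refers to layered encryption. The toy sketch below is only an illustration of that idea, using generic symmetric keys rather than Tor’s real circuit-building or cryptography: the sender wraps the message in one encryption layer per relay, and each relay can peel off only its own layer, learning nothing about the rest.

```python
# Toy illustration of onion routing (not Tor's actual protocol or ciphers):
# one encryption layer per relay; each relay removes exactly one layer.
from cryptography.fernet import Fernet

relay_keys = [Fernet.generate_key() for _ in range(3)]  # one key per relay
relays = [Fernet(k) for k in relay_keys]

# The sender encrypts for the last relay first, then wraps outward.
message = b"request for example.org"
onion = message
for relay in reversed(relays):
    onion = relay.encrypt(onion)

# Each relay in turn peels off only its own layer.
for i, relay in enumerate(relays):
    onion = relay.decrypt(onion)
    print(f"relay {i} peeled one layer")

print(onion == message)  # True - the payload is visible only after the final layer
```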

While on the Dark Web, individuals may communicate through means such as secure email, web chats, or personal messaging hosted on Tor. Though tools such as Tor aim to anonymize content and activity, researchers and security experts are constantly developing means by which certain hidden services or individuals could be identified or “deanonymized.” Anonymizing services such as Tor have been used for legal and illegal activities ranging from maintaining privacy to selling illegal goods—mainly purchased with Bitcoin or other digital currencies. They may be used to circumvent censorship, access blocked content, or maintain the privacy of sensitive communications or business plans. However, a range of malicious actors, from criminals to terrorists to state-sponsored spies, can also leverage cyberspace, and the Dark Web can serve as a forum for conversation, coordination, and action. It is unclear how much of the Dark Web is dedicated to serving a particular illicit market at any one time, and, because of the anonymity of services such as Tor, it is even less clear how much traffic is actually flowing to any given site.

Just as criminals can rely upon the anonymity of the Dark Web, so too can the law enforcement, military, and intelligence communities. They may, for example, use it to conduct online surveillance and sting operations and to maintain anonymous tip lines. Anonymity in the Dark Web can be used to shield officials from identification and hacking by adversaries. It can also be used to conduct a clandestine or covert computer network operation, such as taking down a website or launching a denial-of-service attack, or to intercept communications. Reportedly, officials are continuously working on expanding techniques to deanonymize activity on the Dark Web and identify malicious actors online….(More)”

Bit By Bit: Social Research in the Digital Age


Open Review of Book by Matthew J. Salganik: “In the summer of 2009, mobile phones were ringing all across Rwanda. In addition to the millions of calls between family, friends, and business associates, about 1,000 Rwandans received a call from Joshua Blumenstock and his colleagues. The researchers were studying wealth and poverty by conducting a survey of people who had been randomly sampled from a database of 1.5 million customers from Rwanda’s largest mobile phone provider. Blumenstock and colleagues asked the participants if they wanted to participate in a survey, explained the nature of the research to them, and then asked a series of questions about their demographic, social, and economic characteristics.

Everything I have said up until now makes this sound like a traditional social science survey. But, what comes next is not traditional, at least not yet. They used the survey data to train a machine learning model to predict someone’s wealth from their call data, and then they used this model to estimate the wealth of all 1.5 million customers. Next, they estimated the place of residence of all 1.5 million customers by using the geographic information embedded in the call logs. Putting these two estimates together—the estimated wealth and the estimated place of residence—Blumenstock and colleagues were able to produce high-resolution estimates of the geographic distribution of wealth across Rwanda. In particular, they could produce an estimated wealth for each of Rwanda’s 2,148 cells, the smallest administrative unit in the country.
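In outline, the workflow resembles standard supervised learning. The sketch below is a loose reconstruction under assumptions of my own, with illustrative features, toy data, and a random forest standing in for the authors’ actual model: train on the surveyed subscribers, predict for every subscriber, then aggregate the predictions by inferred home cell.

```python
# Hedged sketch of the general workflow (not Blumenstock et al.'s actual model
# or features): fit a model on the surveyed subscribers, predict wealth for all
# subscribers from call-record features, then aggregate by inferred home cell.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# One row per subscriber, with illustrative features derived from call logs
# and a home cell inferred from call locations.
call_features = pd.DataFrame({
    "subscriber_id": [1, 2, 3, 4],
    "calls_per_day": [3.1, 0.4, 7.8, 1.2],
    "n_contacts": [45, 8, 120, 20],
    "intl_call_share": [0.05, 0.0, 0.20, 0.01],
    "home_cell": ["cell_A", "cell_B", "cell_A", "cell_C"],
})

# Wealth index for the randomly sampled subscribers who answered the phone survey.
survey = pd.DataFrame({"subscriber_id": [1, 2], "wealth_index": [0.8, -0.3]})

feature_cols = ["calls_per_day", "n_contacts", "intl_call_share"]
train = call_features.merge(survey, on="subscriber_id")

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(train[feature_cols], train["wealth_index"])

# Predict for every subscriber, then average within each administrative cell.
call_features["predicted_wealth"] = model.predict(call_features[feature_cols])
cell_estimates = call_features.groupby("home_cell")["predicted_wealth"].mean()
print(cell_estimates)
```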

It was impossible to validate these estimates because no one had ever produced estimates for such small geographic areas in Rwanda. But, when Blumenstock and colleagues aggregated their estimates to Rwanda’s 30 districts, they found that their estimates were similar to estimates from the Demographic and Health Survey, the gold standard of surveys in developing countries. Although these two approaches produced similar estimates in this case, the approach of Blumenstock and colleagues was about 10 times faster and 50 times cheaper than the traditional Demographic and Health Surveys. These dramatically faster and lower cost estimates create new possibilities for researchers, governments, and companies (Blumenstock, Cadamuro, and On 2015).

In addition to developing a new methodology, this study is kind of like a Rorschach inkblot test; what people see depends on their background. Many social scientists see a new measurement tool that can be used to test theories about economic development. Many data scientists see a cool new machine learning problem. Many business people see a powerful approach for unlocking value in the digital trace data that they have already collected. Many privacy advocates see a scary reminder that we live in a time of mass surveillance. Many policy makers see a way that new technology can help create a better world. In fact, this study is all of those things, and that is why it is a window into the future of social research….(More)”

Big and open data are prompting a reform of scientific governance


Sabina Leonelli in Times Higher Education: “Big data are widely seen as a game-changer in scientific research, promising new and efficient ways to produce knowledge. And yet, large and diverse data collections are nothing new – they have long existed in fields such as meteorology, astronomy and natural history.

What, then, is all the fuss about? In my recent book, I argue that the true revolution is in the status accorded to data as research outputs in their own right. Along with this has come an emphasis on open data as crucial to excellent and reliable science.

Previously – ever since scientific journals emerged in the 17th century – data were private tools, owned by the scientists who produced them and scrutinised by a small circle of experts. Their usefulness lay in their function as evidence for a given hypothesis. This perception has shifted dramatically in the past decade. Increasingly, data are research components that can and should be made publicly available and usable.

Rather than the birth of a data-driven epistemology, we are thus witnessing the rise of a data-centric approach in which efforts to mobilise, integrate and visualise data become contributions to discovery, not just a by-product of hypothesis testing.

The rise of data-centrism highlights the challenges involved in gathering, classifying and interpreting data, and the concepts, technologies and social structures that surround these processes. This has implications for how research is conducted, organised, governed and assessed.

Data-centric science requires shifts in the rewards and incentives provided to those who produce, curate and analyse data. This challenges established hierarchies: laboratory technicians, librarians and database managers turn out to have crucial skills, subverting the common view of their jobs as marginal to knowledge production. Ideas of research excellence are also being challenged. Data management is increasingly recognised as crucial to the sustainability and impact of research, and national funders are moving away from citation counts and impact factors in evaluations.

New uses of data are forcing publishers to re-assess their business models and dissemination procedures, and research institutions are struggling to adapt their management and administration.

Data-centric science is emerging in concert with calls for increased openness in research….(More)”

A solution to the single-question crowd wisdom problem


Dražen Prelec, H. Sebastian Seung & John McCoy in Nature: “Once considered provocative, the notion that the wisdom of the crowd is superior to any individual has become itself a piece of crowd wisdom, leading to speculation that online voting may soon put credentialed experts out of business. Recent applications include political and economic forecasting, evaluating nuclear safety, public policy, the quality of chemical probes, and possible responses to a restless volcano. Algorithms for extracting wisdom from the crowd are typically based on a democratic voting procedure. They are simple to apply and preserve the independence of personal judgment. However, democratic methods have serious limitations. They are biased for shallow, lowest common denominator information, at the expense of novel or specialized knowledge that is not widely shared. Adjustments based on measuring confidence do not solve this problem reliably. Here we propose the following alternative to a democratic vote: select the answer that is more popular than people predict. We show that this principle yields the best answer under reasonable assumptions about voter behaviour, while the standard ‘most popular’ or ‘most confident’ principles fail under exactly those same assumptions. Like traditional voting, the principle accepts unique problems, such as panel decisions about scientific or artistic merit, and legal or historical disputes. The potential application domain is thus broader than that covered by machine learning and psychometric methods, which require data across multiple questions…(More).
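A minimal sketch of the “surprisingly popular” selection rule described in the abstract follows. The voting data and the Philadelphia example are illustrative, and the full method in the paper rests on additional assumptions about voter behaviour; the sketch only shows the core rule of picking the answer whose actual popularity most exceeds its predicted popularity.

```python
# Minimal sketch of the "surprisingly popular" rule: each respondent gives an
# answer and also predicts what fraction of others will endorse each answer;
# select the answer that is more popular than people predicted.
def surprisingly_popular(votes, predictions):
    """
    votes: list of chosen answers, e.g. ["yes", "no", ...]
    predictions: list of dicts, each mapping an answer to that respondent's
                 predicted fraction of the crowd choosing it.
    """
    options = set(votes)
    n = len(votes)
    actual = {opt: votes.count(opt) / n for opt in options}
    predicted = {
        opt: sum(p.get(opt, 0.0) for p in predictions) / len(predictions)
        for opt in options
    }
    # Pick the answer whose actual share most exceeds its predicted share.
    return max(options, key=lambda opt: actual[opt] - predicted[opt])

# Illustrative case: asked whether Philadelphia is the capital of Pennsylvania,
# most respondents vote "yes", yet nearly everyone predicts a "yes" majority -
# so the correct "no" is the surprisingly popular answer.
votes = ["yes"] * 6 + ["no"] * 4
predictions = [{"yes": 0.8, "no": 0.2}] * 10
print(surprisingly_popular(votes, predictions))  # -> "no"
```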