On the cultural ideology of Big Data


Nathan Jurgenson in The New Inquiry: “Modernity has long been obsessed with, perhaps even defined by, its epistemic insecurity, its grasping toward big truths that ultimately disappoint as our world grows only less knowable. New knowledge and new ways of understanding simultaneously produce new forms of nonknowledge, new uncertainties and mysteries. The scientific method, based in deduction and falsifiability, is better at proliferating questions than it is at answering them. For instance, Einstein’s theories about the curvature of space and motion at the quantum level provide new knowledge and generate new unknowns that previously could not be pondered.

Since every theory destabilizes as much as it solidifies in our view of the world, the collective frenzy to generate knowledge creates at the same time a mounting sense of futility, a tension looking for catharsis — a moment in which we could feel, if only for an instant, that we know something for sure. In contemporary culture, Big Data promises this relief.

As the name suggests, Big Data is about size. Many proponents of Big Data claim that massive databases can reveal a whole new set of truths because of the unprecedented quantity of information they contain. But the big in Big Data is also used to denote a qualitative difference — that aggregating a certain amount of information makes data pass over into Big Data, a “revolution in knowledge,” to use a phrase thrown around by startups and mass-market social-science books. Operating beyond normal science’s simple accumulation of more information, Big Data is touted as a different sort of knowledge altogether, an Enlightenment for social life reckoned at the scale of masses.

As with similarly inferential sciences like evolutionary psychology and pop-neuroscience, Big Data can be used to give any chosen hypothesis a veneer of science and the unearned authority of numbers. The data is big enough to entertain any story. Big Data has thus spawned an entire industry (“predictive analytics”) as well as reams of academic, corporate, and governmental research; it has also sparked the rise of “data journalism” like that of FiveThirtyEight, Vox, and the other multiplying explainer sites. It has shifted the center of gravity in these fields not merely because of its grand epistemological claims but also because it’s well-financed. Twitter, for example, recently announced that it is putting $10 million into a “social machines” Big Data laboratory.

The rationalist fantasy that enough data can be collected with the “right” methodology to provide an objective and disinterested picture of reality is an old and familiar one: positivism. This is the understanding that the social world can be known and explained from a value-neutral, transcendent view from nowhere in particular. The term comes from Positive Philosophy (1830-1842), by Auguste Comte, who also coined the term sociology in this image. As Western sociology began to congeal as a discipline (departments, paid jobs, journals, conferences), Émile Durkheim, another of the field’s founders, believed it could function as a “social physics” capable of outlining “social facts” akin to the measurable facts that could be recorded about the physical properties of objects. It’s an arrogant view, in retrospect — one that aims for a grand, general theory that can explain social life, a view that became increasingly rooted as sociology became focused on empirical data collection.

A century later, that unwieldy aspiration has been largely abandoned by sociologists in favor of reorienting the discipline toward recognizing complexities rather than pursuing universal explanations for human sociality. But the advent of Big Data has resurrected the fantasy of a social physics, promising a new data-driven technique for ratifying social facts with sheer algorithmic processing power…(More)”

Policy Analytics, Modelling, and Informatics


Book edited by J. Ramon Gil-Garcia, Theresa A. Pardo and Luis F. Luna-Reyes: “This book provides a comprehensive approach to the study of policy analytics, modelling and informatics. It includes theories and concepts for understanding tools and techniques used by governments seeking to improve decision making through the use of technology, data, modelling, and other analytics, and provides relevant case studies and practical recommendations. Governments around the world face policy issues that require strategies and solutions using new technologies, new access to data and new analytical tools and techniques such as computer simulation, geographic information systems, and social network analysis for the successful implementation of public policy and government programs. Chapters include cases, concepts, methodologies, theories, experiences, and practical recommendations on data analytics and modelling for public policy and practice, and address a diversity of data tools, applied to different policy stages in several contexts, and levels and branches of government. This book will be of interest to researchers, students, and practitioners in e-government, public policy, public administration, policy analytics and policy informatics….(More)”.

Open mapping from the ground up: learning from Map Kibera


Report by Erica Hagen for Making All Voices Count: “In Nairobi in 2009, 13 young residents of the informal settlement of Kibera mapped their community using OpenStreetMap, an online mapping platform. This was the start of Map Kibera, and eight years of ongoing work to date on digital mapping, citizen media and open data. In this paper, Erica Hagen – one of the initiators of Map Kibera – reflects on the trajectory of this work. Through research interviews with Map Kibera staff, participants and clients, and users of the data and maps the project has produced, she digs into what it means for citizens to map their communities, and examines the impact of open local information on members of the community. The paper begins by situating the research and Map Kibera in selected literature on transparency, accountability and mapping. It then presents three case studies of mapping in Kibera – in the education, security and water sectors – discussing evidence about the effects not only on project participants, but also on governmental and non-governmental actors in each of the three sectors. It concludes that open, community-based data collection can lead to greater trust, which is sorely lacking in marginalised places. In large-scale data gathering, it is often unclear to those involved why the data is needed or what will be done with it. But the experience of Map Kibera shows that by starting from the ground up and sharing open data widely, it is possible to achieve strong sector-wide ramifications beyond the scope of the initial project, including increased resources and targeting by government and NGOs. While debates continue over the best way to truly engage citizens in the ‘data revolution’ and tracking the Sustainable Development Goals, the research here shows that engaging people fully in the information value chain can be the missing link between data as a measurement tool, and information having an impact on social development….(More)”.

Nobody reads privacy policies – here’s how to fix that


At The Conversation: “…The key to turning privacy notices into something useful for consumers is to rethink their purpose. A company’s policy might show compliance with the regulations the firm is bound to follow, but it remains impenetrable to a regular reader.

The starting point for developing consumer-friendly privacy notices is to make them relevant to the user’s activity, understandable and actionable. As part of the Usable Privacy Policy Project, my colleagues and I developed a way to make privacy notices more effective.

The first principle is to break up the documents into smaller chunks and deliver them at times that are appropriate for users. Right now, a single multi-page policy might have many sections and paragraphs, each relevant to different services and activities. Yet people who are just casually browsing a website need only a little bit of information about how the site handles their IP addresses, whether what they look at is shared with advertisers, and whether they can opt out of interest-based ads. Those people don’t need to know about many of the other things listed in all-encompassing policies, like the rules associated with subscribing to the site’s email newsletter, or how the site handles personal or financial information belonging to people who make purchases or donations on the site.

When a person does decide to sign up for email updates or pay for a service through the site, an additional short privacy notice could tell them the additional information they need to know. These shorter documents should also offer users meaningful choices about what they want a company to do — or not do — with their data. For instance, a new subscriber might be allowed to choose whether the company can share their email address or other contact information with outside marketing companies by clicking a check box.
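The layered approach described in the excerpt can be sketched as a simple lookup from the user's current activity to the relevant notice chunks. This is an illustration only; the section names, activities, and wording below are invented, not taken from the Usable Privacy Policy Project.

```python
# Hypothetical sketch of activity-based privacy-notice delivery.
# Section names and notice text are invented for illustration.

NOTICE_SECTIONS = {
    "browsing": [
        "We log your IP address for security and delete it after 30 days.",
        "Pages you view may be shared with advertisers; opt out in Settings.",
    ],
    "newsletter_signup": [
        "Your email address is used only to send the newsletter.",
        "Check the box below to allow sharing with marketing partners.",
    ],
    "purchase": [
        "Payment details are processed by a third party and not stored here.",
    ],
}

def relevant_notice(activities):
    """Return only the short notice chunks for the user's current activities."""
    chunks = []
    for activity in activities:
        chunks.extend(NOTICE_SECTIONS.get(activity, []))
    return chunks

# A casual visitor sees two short lines instead of a multi-page policy;
# the purchase notice appears only when the user actually buys something.
print(relevant_notice(["browsing"]))
print(relevant_notice(["browsing", "purchase"]))
```

The point of the design is that the notice a user sees is scoped to what they are doing at that moment, which is exactly the "smaller chunks, delivered at appropriate times" principle from the excerpt.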

Understanding users’ expectations

Notices can be made even simpler if they focus particularly on unexpected or surprising types of data collection or sharing. For instance, in another study, we learned that most people know their fitness tracker counts steps – so they didn’t really need a privacy notice to tell them that. But they did not expect their data to be collected, aggregated, and shared with third parties. Customers should be asked for permission to do this, and allowed to restrict sharing or opt out entirely.

Most importantly, companies should test new privacy notices with users, to ensure final versions are understandable and not misleading, and that offered choices are meaningful….(More)”

Blockchain Could Help Us Reclaim Control of Our Personal Data


Michael Mainelli at Harvard Business Review: “…numerous smaller countries, such as Singapore, are exploring national identity systems that span government and the private sector. One of the more successful stories of governments instituting an identity system is Estonia, with its ID-kaarts. Reacting to cyber-attacks against the nation, the Estonian government decided that it needed to become more digital, and even more secure. It decided to use a distributed ledger to build its system, rather than a traditional central database. Distributed ledgers are used in situations where multiple parties need to share authoritative information with each other without a central third party, such as for data-logging clinical assessments or storing data from commercial deals. These are multi-organization databases with a super audit trail. As a result, the Estonian system provides its citizens with an all-digital government experience, significantly reduced bureaucracy, and high citizen satisfaction with their government dealings.

Cryptocurrencies such as Bitcoin have increased the awareness of distributed ledgers with their use of a particular type of ledger — blockchain — to hold the details of coin accounts among millions of users. Cryptocurrencies have certainly had their own problems with their wallets and exchanges — even ID-kaarts are not without their technical problems — but the distributed ledger technology holds firm for Estonia and for cryptocurrencies. These technologies have been working in hostile environments now for nearly a decade.

The problem with a central database like the ones used to house social security numbers, or credit reports, is that once it’s compromised, a thief has the ability to copy all of the information stored there. Hence the huge numbers of people that can be affected — more than 140 million people in the Equifax breach, and more than 50 million at Home Depot — though perhaps Yahoo takes the cake with more than three billion alleged customer accounts hacked.  Of course, if you can find a distributed ledger online, you can copy it, too. However, a distributed ledger, while available to everyone, may be unreadable if its contents are encrypted. Bitcoin’s blockchain is readable to all, though you can encrypt things in comments. Most distributed ledgers outside cryptocurrencies are encrypted in whole or in part. The effect is that while you can have a copy of the database, you can’t actually read it.

This characteristic of encrypted distributed ledgers has big implications for identity systems.  You can keep certified copies of identity documents, biometric test results, health data, or academic and training certificates online, available at all times, yet safe unless you give away your key. At a whole system level, the database is very secure. Each single ledger entry among billions would need to be found and then individually “cracked” at great expense in time and computing, making the database as a whole very safe.
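The property the excerpt describes (a ledger anyone may copy, yet cannot read without the owner's key) can be illustrated with a toy sketch. This is not the design of Estonia's system or any real ledger; the SHA-256 keystream below is for demonstration only, and a production system would use a vetted cipher such as AES-GCM.

```python
import hashlib
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream derived from SHA-256 (illustration only;
    a real system would use an authenticated cipher like AES-GCM)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_entry(key: bytes, plaintext: bytes):
    """Encrypt one ledger entry under its owner's key."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce, bytes(p ^ s for p, s in zip(plaintext, stream))

def decrypt_entry(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))

# Each entry is sealed under its owner's key, so anyone may hold a full
# copy of the ledger without being able to read individual entries.
alice_key = secrets.token_bytes(32)
record = b"passport no. X123, verified 2017"
nonce, sealed = encrypt_entry(alice_key, record)
assert decrypt_entry(alice_key, nonce, sealed) == record
assert sealed != record  # the copy alone reveals nothing
```

This is why, as the excerpt notes, an attacker who copies an encrypted distributed ledger gains little: each of billions of entries would have to be cracked individually, at great expense.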

Distributed ledgers seem ideal for private distributed identity systems, and many organizations are working to provide such systems to help people manage the huge amount of paperwork modern society requires to open accounts, validate themselves, or make payments.  Taken a small step further, these systems can help you keep relevant health or qualification records at your fingertips.  Using “smart” ledgers, you can forward your documentation to people who need to see it, while keeping control of access, including whether another party can forward the information. You can even revoke someone’s access to the information in the future….(More)”.

The Rise of Big Data Policing: Surveillance, Race, and the Future of Law Enforcement


Book by Andrew Guthrie Ferguson on “The consequences of big data and algorithm-driven policing and its impact on law enforcement…In a high-tech command center in downtown Los Angeles, a digital map lights up with 911 calls, television monitors track breaking news stories, surveillance cameras sweep the streets, and rows of networked computers link analysts and police officers to a wealth of law enforcement intelligence.
This is just a glimpse into a future where software predicts future crimes, algorithms generate virtual “most-wanted” lists, and databanks collect personal and biometric information.  The Rise of Big Data Policing introduces the cutting-edge technology that is changing how the police do their jobs and shows why it is more important than ever that citizens understand the far-reaching consequences of big data surveillance as a law enforcement tool.
Andrew Guthrie Ferguson reveals how these new technologies —viewed as race-neutral and objective—have been eagerly adopted by police departments hoping to distance themselves from claims of racial bias and unconstitutional practices.  After a series of high-profile police shootings and federal investigations into systemic police misconduct, and in an era of law enforcement budget cutbacks, data-driven policing has been billed as a way to “turn the page” on racial bias.
But behind the data are real people, and difficult questions remain about racial discrimination and the potential to distort constitutional protections.
In this first book on big data policing, Ferguson offers an examination of how new technologies will alter the who, where, when, and how of policing.  These new technologies also offer data-driven methods to improve police accountability and to remedy the underlying socio-economic risk factors that encourage crime….(More)”

Selected Readings on Blockchain and Identity


By Hannah Pierce and Stefaan Verhulst

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of blockchain and identity was originally published in 2017.

The potential of blockchain and other distributed ledger technologies to create positive social change has inspired enthusiasm, broad experimentation, and some skepticism. In this edition of the Selected Readings series, we explore and curate the literature on blockchain and how it impacts identity as a means to access services and rights. (In a previous edition we considered the Potential of Blockchain for Transforming Governance).

Introduction

In 2008, an unknown source calling itself Satoshi Nakamoto released a paper titled Bitcoin: A Peer-to-Peer Electronic Cash System, which introduced the blockchain. A blockchain is a distributed ledger that records transactions and is maintained and verified by its participants rather than by a central authority. Blockchain and other distributed ledger technologies (DLTs) rely on an ability to act as a vast, transparent, and secure public database.

Distributed ledger technologies (DLTs) have disruptive potential beyond innovation in products, services, revenue streams and operating systems within industry. By providing transparency and accountability in new and distributed ways, DLTs have the potential to positively empower underserved populations in myriad ways, including providing a means for establishing a trusted digital identity.

Consider the potential of DLTs for the 2.4 billion people worldwide (about 1.5 billion of whom are over the age of 14) who are unable to prove their identity to the satisfaction of authorities and other organizations, and who are often excluded from property ownership, free movement, and social protection as a result. At the same time, the transition to a DLT-led system of ID management involves various risks that, if not understood and mitigated properly, could harm potential beneficiaries.

Annotated Selected Reading List

Governance

Cuomo, Jerry, Richard Nash, Veena Pureswaran, Alan Thurlow, Dave Zaharchuk. “Building trust in government: Exploring the potential of blockchains.” IBM Institute for Business Value. January 2017.

This paper from the IBM Institute for Business Value culls findings from surveys conducted with over 200 government leaders in 16 countries regarding their experiences and expectations for blockchain technology. The report also identifies “Trailblazers”, or governments that expect to have blockchain technology in place by the end of the year, and details the views and approaches that these early adopters are taking to ensure the success of blockchain in governance. These Trailblazers also believe that there will be high yields from utilizing blockchain in identity management and that citizen services, such as voting, tax collection and land registration, will become increasingly dependent upon decentralized and secure identity management systems. Additionally, some of the Trailblazers are exploring blockchain application in borderless services, like cross-province or state tax collection, because the technology removes the need for intermediaries like notaries or lawyers to verify identities and the authenticity of transactions.

Mattila, Juri. “The Blockchain Phenomenon: The Disruptive Potential of Distributed Consensus Architectures.” Berkeley Roundtable on the International Economy. May 2016.

This working paper gives a clear introduction to blockchain terminology, architecture, challenges, applications (including use cases), and implications for digital trust, disintermediation, democratizing the supply chain, an automated economy, and the reconfiguration of regulatory capacity. As far as identification management is concerned, Mattila argues that blockchain can remove the need to go through a trusted third party (such as a bank) to verify identity online. This could strengthen the security of personal data, as the move from a centralized intermediary to a decentralized network lowers the risk of a mass data security breach. In addition, using blockchain technology for identity verification allows for a more standardized documentation of identity which can be used across platforms and services. In light of these potential capabilities, Mattila addresses the disruptive power of blockchain technology on intermediary businesses and regulating bodies.

Identity Management Applications

Allen, Christopher.  “The Path to Self-Sovereign Identity.” Coindesk. April 27, 2016.

In this Coindesk article, author Christopher Allen lays out the history of digital identities, then explains a concept of a “self-sovereign” identity, where trust is enabled without compromising individual privacy. His ten principles for self-sovereign identity (Existence, Control, Access, Transparency, Persistence, Portability, Interoperability, Consent, Minimization, and Protection) lend themselves to blockchain technology for administration. Although there are actors making moves toward the establishment of self-sovereign identity, there are a few challenges that face the widespread implementation of these tenets, including legal risks, confidentiality issues, immature technology, and a reluctance to change established processes.

Jacobovitz, Ori. “Blockchain for Identity Management.” Department of Computer Science, Ben-Gurion University. December 11, 2016.

This technical report discusses advantages of blockchain technology in managing and authenticating identities online, such as the ability for individuals to create and manage their own online identities, which offers greater control over access to personal data. Using blockchain for identity verification can also afford the potential of “digital watermarks” that could be assigned to each of an individual’s transactions, as well as eliminating the need for users to create unique usernames and passwords online. After arguing that this decentralized model will allow individuals to manage data on their own terms, Jacobovitz provides a list of companies, projects, and movements that are using blockchain for identity management.

Mainelli, Michael. “Blockchain Will Help Us Prove Our Identities in a Digital World.” Harvard Business Review. March 16, 2017.

In this Harvard Business Review article, author Michael Mainelli highlights a solution to identity problems for rich and poor alike: mutual distributed ledgers (MDLs), or blockchain technology. These multi-organizational databases with unalterable ledgers and a “super audit trail” have three parties that deal with digital document exchanges: subjects are individuals or assets, certifiers are organizations that verify identity, and inquisitors are entities that conduct know-your-customer (KYC) checks on the subject. This system will allow for a low-cost, secure, and global method of proving identity. After outlining some of the other benefits that this technology may have in creating secure and easily auditable digital documents, such as the greater tolerance that comes from widely viewable public ledgers, Mainelli questions whether these capabilities will turn out to be a boon or a burden to bureaucracy and societal behavior.

Personal Data Security Applications

Banafa, Ahmed. “How to Secure the Internet of Things (IoT) with Blockchain.” Datafloq. August 15, 2016.

This article details the data security risks that are emerging as the Internet of Things continues to expand, and how using blockchain technology can protect the personal data and identity information that is exchanged between devices. Banafa argues that, as the creation and collection of data is central to the functions of Internet of Things devices, there is an increasing need to better secure data that is largely confidential and often personally identifiable. Decentralizing IoT networks and then securing their communications with blockchain can allow them to remain scalable, private, and reliable. Enabling blockchain’s peer-to-peer, trustless communication may also enable smart devices to initiate personal data exchanges like financial transactions, as centralized authorities or intermediaries will not be necessary.

Shrier, David, Weige Wu and Alex Pentland. “Blockchain & Infrastructure (Identity, Data Security).” Massachusetts Institute of Technology. May 17, 2016.

This paper, the third of a four-part series on potential blockchain applications, covers the potential of blockchains to change the status quo of identity authentication systems, privacy protection, transaction monitoring, ownership rights, and data security. The paper also posits that, as personal data becomes more and more valuable, we should move towards a “New Deal on Data” which provides individuals data protection – through blockchain technology – and the option to contribute their data to aggregates that work towards the common good. In order to achieve this New Deal on Data, robust regulatory standards and financial incentives must be provided to entice individuals to share their data to benefit society.

Paraguay’s transparency alchemists


Story by the Open Contracting Partnership: “….The “Cocido de oro” scandal is seen as part of a well-organized and well-informed youth movement that has sprung up in Paraguay in recent years. An equally dramatic controversy involving alleged corruption and unfair staff appointments at one of the country’s top public universities led to the resignation of the Chancellor and other senior staff in September 2015. These activists, mostly high school and university students, are no longer willing to tolerate the waste and corruption in public spending — a hangover from 35 years of authoritarian rule. They expect their government to be more open and accountable, and public decision-making processes to be more inclusive and democratic.

Thanks to government initiatives that have sought to give citizens greater access to information about public institutions, these students, along with investigative journalists and other civil society groups, are starting to engage actively in civic affairs. And they are data-savvy, basing recommendations on empirical evidence about government policies and processes, how they are implemented, and whether they are working.

Leading the pack is the country’s public procurement office, which runs a portal that ranks among the most open government data sources in the world. Together with information about budgets, public bodies’ payrolls, and other government data, this is helping Paraguayans to tackle some of the biggest long-standing problems faced by the government, like graft, overpricing, nepotism and influence-peddling….

The government recognizes there’s still a long way to go in their quest to open up public data. Few institutions have opened their databases or publish their data on an open data portal, and use of the data that has been published is still limited, according to a report on the country’s third OGP Action Plan. Priority data sets aren’t accessible in ways that meet the needs of civil society, the report adds.

And yet, the tremors of a tectonic shift in transparency and accountability in Paraguay are already being felt. In a short time, armed with access to information, citizens have started engaging with how public money is and should be spent.

The government is now doubling down on its strategy of fostering public participation, using cutting-edge technology to increase citizens’ access to data about their state institutions. Health, education, and municipal-level government, and procurement spending across these areas are being prioritized….(More)”.

How We Can Stop Earthquakes From Killing People Before They Even Hit


Justin Worland in Time Magazine: “…Out of that realization came a plan to reshape disaster management using big data. Just a few months later, Wani worked with two fellow Stanford students to create a platform to predict the toll of natural disasters. The concept is simple but also revolutionary. The One Concern software pulls geological and structural data from a variety of public and private sources and uses machine learning to predict the impact of an earthquake down to individual city blocks and buildings. Real-time information input during an earthquake improves how the system responds. And earthquakes represent just the start for the company, which plans to launch a similar program for floods and eventually other natural disasters….

Previous software might identify a general area where responders could expect damage, but it would appear as a “big red blob” that wasn’t helpful when deciding exactly where to send resources, Dayton says. The technology also integrates information from many sources and makes it easy to parse in an emergency situation when every moment matters. The instant damage evaluations mean fast and actionable information, so first responders can prioritize search and rescue in areas most likely to be worst-hit, rather than responding to 911 calls in the order they are received.

One Concern is not the only company that sees an opportunity to use data to rethink disaster response. The mapping company Esri has built rapid-response software that shows expected damage from disasters like earthquakes, wildfires and hurricanes. And the U.S. government has invested in programs to use data to shape disaster response at agencies like the National Oceanic and Atmospheric Administration (NOAA)….(More)”.

A Better Way to Trace Scattered Refugees


Tina Rosenberg in The New York Times: “…No one knew where his family had gone. Then an African refugee in Ottawa told him about Refunite. He went on its website and opened an account. He gave his name, phone number and place of origin, and listed family members he was searching for.

Three-quarters of a century ago, while World War II still raged, the Allies created the International Tracing Service to help the millions who had fled their homes. Its central name index grew to 50 million cards, with information on 17.5 million individuals. The index still exists — and still gets queries — today.

Index cards have become digital databases, of course. And some agencies have brought tracing into the digital age in other ways. Unicef, for example, equips staff during humanitarian emergencies with software called Primero, which helps them get children food, medical care and other help — and register information about unaccompanied children. A parent searching for a child can register as well. An algorithm makes the connection — “like a date-finder or matchmaker,” said Robert MacTavish, who leads the Primero project.
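The article does not describe Primero's actual matching logic, so the sketch below is purely illustrative: one hypothetical way a tracing platform might rank candidate matches between a parent's search request and registered children, using simple string similarity. The field names, weights, and threshold are all invented.

```python
# Illustrative sketch only: ranking candidate family-tracing matches.
# This is NOT Primero's algorithm; fields and weights are hypothetical.

from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Rough string similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def score(request: dict, candidate: dict) -> float:
    """Weighted blend of name and place-of-origin similarity."""
    return (0.7 * similarity(request["name"], candidate["name"])
            + 0.3 * similarity(request["place"], candidate["place"]))

def best_matches(request: dict, candidates: list, threshold: float = 0.6) -> list:
    """Return candidates above the threshold, best match first."""
    scored = [(score(request, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for s, c in scored if s >= threshold]

request = {"name": "Amina Hassan", "place": "Mogadishu"}
candidates = [
    {"name": "Amina Hasan", "place": "Mogadisho"},   # near match, spelling variants
    {"name": "John Okello", "place": "Gulu"},        # unrelated record
]
print(best_matches(request, candidates))
```

Tolerating spelling variants matters in this setting, since names are often transliterated inconsistently across registration points; a human caseworker would still review any algorithmic match before reuniting a family.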

Most United Nations agencies rely for family tracing on the International Committee of the Red Cross, the global network of national Red Cross and Red Crescent societies. Florence Anselmo, who directs the I.C.R.C.’s Central Tracing Agency, said that the I.C.R.C. and United Nations agencies can’t look in one another’s databases. That’s necessary for privacy reasons, but it’s an obstacle to family tracing.

Another problem: Online databases allow the displaced to do their own searches. But the I.C.R.C. has these for only a few emergency situations. Anselmo said that most tracing is done by the staff of national Red Cross societies, who respond to requests from other countries. But there is no global database, so people looking for loved ones must guess which countries to search.

The organization is working on developing an algorithm for matching, but for now, the search engines are human. “When we talk about tracing, it’s not only about data matching,” Anselmo said. “There’s a whole part about accompanying families: the human aspect, professionals as well as volunteers who are able to look for people — even go house to house if needed.”

This is the mom-and-pop general store model of tracing: The customer makes a request at the counter, then a shopkeeper with knowledge of her goods and a kind smile goes to the back and brings it out, throwing in a lollipop. But the world has 65 million forcibly displaced people, a record number. Personalized help to choose from limited stock is appropriate in many cases. But it cannot possibly be enough.

Refunite seeks to become the eBay of family tracing….(More)”