New paper by Ewan Sutherland: “While attention has been given to the uses of big data by network operators and to the provision of open data by governments, there has been no systematic attempt to re-examine the regulatory systems for telecommunications. The power of public authorities to access the big data held by operators could transform regulation by simplifying proof of bias or discrimination, making operators more susceptible to behavioural remedies, while it could also be used to deliver much finer granularity of decision making. By opening up data held by government and its agencies to enterprises, think tanks and research groups it should be possible to transform market regulation.”
Smart Machines: IBM's Watson and the Era of Cognitive Computing
“In Smart Machines, John E. Kelly III, director of IBM Research, and Steve Hamm, a writer at IBM and a former business and technology journalist, introduce the fascinating world of “cognitive systems” to general audiences and provide a window into the future of computing. Cognitive systems promise to penetrate complexity and assist people and organizations in better decision making. They can help doctors evaluate and treat patients, augment the ways we see, anticipate major weather events, and contribute to smarter urban planning. Kelly and Hamm’s comprehensive perspective describes this technology inside and out and explains how it will help us conquer the harnessing and understanding of “big data,” one of the major computing challenges facing businesses and governments in the coming decades. Absorbing and impassioned, their book will inspire governments, academics, and the global tech industry to work together to power this exciting wave in innovation.”
See also Why cognitive systems?
And Data for All: On the Validity and Usefulness of Open Government Data
Paper presented at the 13th International Conference on Knowledge Management and Knowledge Technologies: “Open Government Data (OGD) stands for a relatively young trend to make data that is collected and maintained by state authorities available for the public. Although various Austrian OGD initiatives have been started in the last few years, less is known about the validity and the usefulness of the data offered. Based on the data-set on Vienna’s stock of trees, we address two questions in this paper. First of all, we examine the quality of the data by validating it according to knowledge from a related discipline. It shows that the data-set we used correlates with findings from meteorology. Then, we explore the usefulness and exploitability of OGD by describing a concrete scenario in which this data-set can be supportive for citizens in their everyday life and by discussing further application areas in which OGD can be beneficial for different stakeholders and even commercially used.”
Choose Your Own Route on Finland's Algorithm-Driven Public Bus
Brian Merchant at Motherboard: “Technology should probably be transforming public transit a lot faster than it is. Yes, apps like Hopstop have made finding stops easier and I’ve started riding the bus in unfamiliar parts of town a bit more often thanks to Google Maps’ route info. But these are relatively small steps, and it’s all limited to making scheduling information more widely available. Where’s the innovation on the other side? Where’s the Uber-like interactivity, the bus that comes to you after a tap on the iPhone?
In Finland, actually. The Kutsuplus is Helsinki’s groundbreaking mass transit hybrid program that lets riders choose their own routes, pay for fares on their phones, and summon their own buses. It’s a pretty interesting concept. With a ten-minute lead time, you summon a Kutsuplus bus to a stop using the official app, just as you’d call a livery cab on Uber. Each minibus in the fleet seats at least nine people, and there’s room for baby carriages and bikes.
You can call your own private Kutsuplus, but if you share the ride, you share the costs—it’s about half the price of a cab fare, and a dollar or two more expensive than old school bus transit. You can then pick your own stop, also using the app.
The interesting part is the scheduling, which is entirely automated. If you’re sharing the ride, an algorithm determines the most direct route, and you only get charged as though you were riding solo. You can pay with a Kutsuplus wallet on the app, or, eventually, bill the charge to your phone bill.”
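The article doesn’t detail Kutsuplus’s actual routing engine, but the scheduling step it describes — fitting a new rider’s pickup and drop-off into a shared minibus’s existing route with minimal detour — is commonly handled by greedy insertion. The sketch below is illustrative only: the coordinates, Euclidean distances, and function names are assumptions, not the Helsinki system.

```python
from math import hypot

def route_length(stops):
    """Total Euclidean length of an ordered list of (x, y) stops."""
    return sum(hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(stops, stops[1:]))

def best_insertion(route, pickup, dropoff):
    """Splice a new pickup/dropoff pair into an existing route at the
    positions that add the least distance, keeping pickup before dropoff.
    Returns (new_route, extra_distance). route[0] is the bus's position."""
    best_cost, best_route = None, None
    for i in range(1, len(route) + 1):            # candidate pickup slot
        with_pickup = route[:i] + [pickup] + route[i:]
        for j in range(i + 1, len(with_pickup) + 1):  # dropoff slot, after pickup
            cand = with_pickup[:j] + [dropoff] + with_pickup[j:]
            cost = route_length(cand)
            if best_cost is None or cost < best_cost:
                best_cost, best_route = cost, cand
    return best_route, best_cost - route_length(route)

# A bus heading from (0,0) to a scheduled stop at (10,0) can absorb a
# rider going from (2,0) to (5,0) with zero detour:
new_route, extra = best_insertion([(0.0, 0.0), (10.0, 0.0)], (2.0, 0.0), (5.0, 0.0))
```

A real dispatcher would also enforce seat capacity and each rider’s maximum acceptable delay before accepting an insertion, rejecting requests that would push any passenger past their promised arrival window.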
Seven Principles for Big Data and Resilience Projects
PopTech & Rockefeller Bellagio Fellows: “The following is a draft “Code of Conduct” that seeks to provide guidance on best practices for resilience-building projects that leverage Big Data and Advanced Computing. These seven core principles serve to guide data projects so that they are socially just, encourage local wealth- and skill-creation, require informed consent, and remain maintainable over long timeframes. This document is a work in progress, so we very much welcome feedback. Our aim is not to enforce these principles on others but rather to hold ourselves accountable and in the process encourage others to do the same. Initial versions of this draft were written during the PopTech & Rockefeller Foundation workshop in Bellagio, August 2013.
Open Source Data Tools – Wherever possible, data analytics and manipulation tools should be open source, architecture independent and broadly prevalent (R, python, etc.). Open source, hackable tools are generative, and building generative capacity is an important element of resilience….
Transparent Data Infrastructure – Infrastructure for data collection and storage should operate based on transparent standards to maximize the number of users that can interact with the infrastructure. Data infrastructure should strive for built-in documentation, be extensive and provide easy access. Data is only as useful to the data scientist as her/his understanding of its collection is correct…
Develop and Maintain Local Skills – Make “Data Literacy” more widespread. Leverage local data labor and build on existing skills. The key and most constrained ingredient in effective data solutions remains human skill and knowledge, which needs to be retained locally. In doing so, consider cultural issues and language. Catalyze the next generation of data scientists and generate new required skills in the cities where the data is being collected…
Local Data Ownership – Use Creative Commons and licenses that state that data is not to be used for commercial purposes. The community directly owns the data it generates, along with the learning algorithms (machine learning classifiers) and derivatives. Strong data protection protocols need to be in place to protect identities and personally identifying information…
Ethical Data Sharing – Adopt existing data sharing protocols like the ICRC’s (2013). Permission for sharing is essential. How the data will be used should be clearly articulated. An opt-in approach should be preferred wherever possible, and the ability for individuals to remove themselves from a data set after it has been collected must always be an option. Projects should always explicitly state which third parties will get access to data, if any, so that it is clear who will be able to access and use the data…
Right Not To Be Sensed – Local communities have a right not to be sensed. Large scale city sensing projects must have a clear framework for how people are able to be involved or choose not to participate. All too often, sensing projects are established without any ethical framework or any commitment to informed consent. It is essential that the collection of any sensitive data, from social and mobile data to video and photographic records of houses, streets and individuals, is done with full public knowledge, community discussion, and the ability to opt out…
Learning from Mistakes – Big Data and Resilience projects need to be open to face, report, and discuss failures. Big Data technology is still very much in a learning phase. Failure and the learning and insights resulting from it should be accepted and appreciated. Without admitting what does not work we are not learning effectively as a community. Quality control and assessment for data-driven solutions is notably harder than comparable efforts in other technology fields. The uncertainty about quality of the solution is created by the uncertainty inherent in data…”
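Several of the principles above (ethical sharing, the right to opt out, protection of personally identifying information) map directly onto code a project would run before releasing data. A minimal sketch, assuming hypothetical record fields (`user_id`, `reading`) and a locally held secret key — this is one common pattern (keyed pseudonymization plus opt-out filtering), not a protocol the Fellows prescribe:

```python
import hashlib
import hmac

def prepare_for_sharing(records, secret_key, opted_out):
    """Drop records from people who opted out, then replace the direct
    identifier with a keyed hash so the shared set carries no raw PII."""
    shared = []
    for rec in records:
        if rec["user_id"] in opted_out:
            continue  # honour the "Right Not To Be Sensed"
        # HMAC-SHA256 with a key kept by the community: third parties
        # cannot reverse the pseudonym, but records stay linkable locally.
        pseudonym = hmac.new(secret_key, rec["user_id"].encode(),
                             hashlib.sha256).hexdigest()[:16]
        shared.append({"id": pseudonym, "reading": rec["reading"]})
    return shared

records = [{"user_id": "alice", "reading": 1},
           {"user_id": "bob", "reading": 2}]
shared = prepare_for_sharing(records, b"community-held-secret", opted_out={"bob"})
```

Keyed hashing alone does not make a data set anonymous — quasi-identifiers and rare values in the remaining fields can still re-identify people — which is exactly why the principles insist on explicit sharing agreements on top of technical measures.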
Special issue of First Monday: “Making data — Big data and beyond”
Introduction by Rasmus Helles and Klaus Bruhn Jensen: “Data are widely understood as minimal units of information about the world, waiting to be found and collected by scholars and other analysts. With the recent prominence of ‘big data’ (Mayer–Schönberger and Cukier, 2013), the assumption that data are simply available and plentiful has become more pronounced in research as well as public debate. Challenging and reflecting on this assumption, the present special issue considers how data are made. The contributors take big data and other characteristic features of the digital media environment as an opportunity to revisit classic issues concerning data — big and small, fast and slow, experimental and naturalistic, quantitative and qualitative, found and made.
Data are made in a process involving multiple social agents — communicators, service providers, communication researchers, commercial stakeholders, government authorities, international regulators, and more. Data are made for a variety of scholarly and applied purposes, oriented by knowledge interests (Habermas, 1971). And data are processed and employed in a whole range of everyday and institutional contexts with political, economic, and cultural implications. Unfortunately, the process of generating the materials that come to function as data often remains opaque and certainly under–documented in the published research.
The following eight articles seek to open up some of the black boxes from which data can be seen to emerge. While diverse in their theoretical and topical focus, the articles generally approach the making of data as a process that is extended in time and across spatial and institutional settings. In the common culinary metaphor, data are repeatedly processed, rather than raw. Another shared point of attention is meta–data — the type of data that bear witness to when, where, and how other data such as Web searches, e–mail messages, and phone conversations are exchanged, and which have taken on new, strategic importance in digital media. Last but not least, several of the articles underline the extent to which the making of data as well as meta–data is conditioned — facilitated and constrained — by technological and institutional structures that are inherent in the very domain of analysis. Researchers increasingly depend on the practices and procedures of commercial entities such as Google and Facebook for their research materials, as illustrated by the pivotal role of application programming interfaces (API). Research on the Internet and other digital media also requires specialized tools of data management and analysis, calling, once again, for interdisciplinary competences and dialogues about ‘what the data show.’”
See Table of Contents
The move toward 'crowdsourcing' public safety
What is “crowdsourcing public safety” and why are public safety agencies moving toward this trend?
Crowdsourcing—the term coined by our own assistant professor of journalism Jeff Howe—involves taking a task or job traditionally performed by a distinct agent, or employee, and having that activity be executed by an “undefined, generally large group of people in an open call.” Crowdsourcing public safety involves engaging and enabling private citizens to assist public safety professionals in addressing natural disasters, terror attacks, organized crime incidents, and large-scale industrial accidents.
Public safety agencies have long recognized the need for citizen involvement. Tip lines and missing persons bulletins have been used to engage citizens for years, but with advances in mobile applications and big data analytics, the ability of public safety agencies to receive, process, and make use of high volumes of tips and leads makes crowdsourcing searches and investigations more feasible. You saw this in the FBI Boston Marathon Bombing web-based Tip Line. You see it in the “See Something Say Something” initiatives throughout the country. You see it in AMBER alerts or even remote search and rescue efforts. You even see it in more routine instances like Washington State’s HERO program to reduce traffic violations.
Have these efforts been successful, and what challenges remain?
There are a number of issues to overcome with regard to crowdsourcing public safety—such as maintaining privacy rights, ensuring data quality, and improving trust between citizens and law enforcement officers. Controversies over the National Security Agency’s surveillance program and neighborhood watch programs – particularly the shooting death of teenager Trayvon Martin by neighborhood watch captain George Zimmerman – reflect some of these challenges. Research has not yet established a precise set of success criteria, but the efforts that appear successful so far have tended to center on a particular crisis incident—such as a specific attack or missing person. But as more crowdsourcing public safety mobile applications are developed, adoption and use is likely to increase. One trend to watch is whether national public safety programs are able to tap into the existing social networks of community-based responders like American Red Cross volunteers, Community Emergency Response Teams, and United Way mentors.
The move toward crowdsourcing public safety is part of an overall trend toward improving community resilience, which refers to a system’s ability to bounce back after a crisis or disturbance. Stephen Flynn and his colleagues at Northeastern’s George J. Kostas Research Institute for Homeland Security are playing a key role in driving a national conversation in this area. Community resilience is inherently multi-disciplinary, so you see research being done regarding transportation infrastructure, social media use after a crisis event, and designing sustainable urban environments. Northeastern is a place where use-inspired research is addressing real-world problems. It will take a village to improve community resilience capabilities, and our institution is a vital part of thought leadership for that village.
Data Discrimination Means the Poor May Experience a Different Internet
MIT Technology Review: “Data analytics are being used to implement a subtle form of discrimination, while anonymous data sets can be mined to reveal health data and other private information, a Microsoft researcher warned this morning at MIT Technology Review’s EmTech conference.
Kate Crawford, principal researcher at Microsoft Research, argued that these problems could be addressed with new legal approaches to the use of personal data.
In a new paper, she and a colleague propose a system of “due process” that would give people more legal rights to understand how data analytics are used in determinations made against them, such as denial of health insurance or a job. “It’s the very start of a conversation about how to do this better,” Crawford, who is also a visiting professor at the MIT Center for Civic Media, said in an interview before the event. “People think ‘big data’ avoids the problem of discrimination, because you are dealing with big data sets, but in fact big data is being used for more and more precise forms of discrimination—a form of data redlining.”
During her talk this morning, Crawford added that with big data, “you will never know what those discriminations are, and I think that’s where the concern begins.”
The Best American Infographics 2013
New book by Gareth Cook: “The rise of infographics across virtually all print and electronic media—from a striking breakdown of classic cocktails to a graphic tracking 200 influential moments that changed the world to visually arresting depictions of Twitter traffic—reveals patterns in our lives and our world in fresh and surprising ways. In the era of big data, where information moves faster than ever, infographics provide us with quick, often influential bursts of art and knowledge—on the environment, politics, social issues, health, sports, arts and culture, and more—to digest, to tweet, to share, to go viral.
The Best American Infographics captures the finest examples from the past year, including the ten best interactive infographics, of this mesmerizing new way of seeing and understanding our world.”
See also a selection of them in Wired.
If big data is an atomic bomb, disarmament begins in Silicon Valley
Derrick Harris at GigaOM: “Big data is like atomic energy, according to scientist Albert-László Barabási in a Monday column on Politico. It’s very beneficial when used ethically, and downright destructive when turned into a weapon. He argues scientists can help resolve the damage done by government spying by embracing the principles of nuclear nonproliferation that helped bring an end to Cold War fears and distrust.
Barabási’s analogy is rather poetic:
“Powered by the right type of Big Data, data mining is a weapon. It can be just as harmful, with long-term toxicity, as an atomic bomb. It poisons trust, straining everything from human relations to political alliances and free trade. It may target combatants, but it cannot succeed without sifting through billions of data points scraped from innocent civilians. And when it is a weapon, it should be treated like a weapon.”
I think he’s right, but I think the fight to disarm the big data bomb begins in places like Silicon Valley and Madison Avenue. And it’s not just scientists; all citizens should have a role…
I write about big data and data mining for a living, and I think the underlying technologies and techniques are incredibly valuable, even if the applications aren’t always ideal. On the one hand, advances in machine learning from companies such as Google and Microsoft are fantastic. On the other hand, Facebook’s newly expanded Graph Search makes Europe’s proposed right-to-be-forgotten laws seem a lot more sensible.
But it’s all within the bounds of our user agreements, and beauty is in the eye of the beholder.
Perhaps the reason we don’t vote with our feet by moving to web platforms that embrace privacy, even though we suspect it’s being violated, is that we really don’t know what privacy means. Instead of regulating what companies can and can’t do, perhaps lawmakers can mandate a degree of transparency that actually lets users understand how data is being used, not just what data is being collected. Great, some company knows my age, race, ZIP code and web history: What I really need to know is how it’s using that information to target, discriminate against or otherwise serve me.
An intelligent national discussion about the role of the NSA is probably in order. For all anyone knows, it could even turn out we’re willing to put up with more snooping than the government might expect. But until we get a handle on privacy from the companies we choose to do business with, I don’t think most Americans have the stomach for such a difficult fight.”