Trust, Security, and Privacy in Crowdsourcing


Guest Editorial to the Special Issue of the IEEE Internet of Things Journal: “As we become increasingly reliant on intelligent, interconnected devices in every aspect of our lives, critical trust, security, and privacy concerns are raised as well.

First, the sensing data provided by individual participants is not always reliable. It may be noisy or even faked for various reasons, such as poor sensor quality, lack of sensor calibration, background noise, contextual effects, mobility, an incomplete view of observations, or malicious attacks. Crowdsourcing applications should therefore be able to evaluate the trustworthiness of collected data in order to filter out the noisy and fake data that may disturb or intrude upon a crowdsourcing system. Second, providing data (e.g., photographs taken with personal mobile devices) or using IoT applications may compromise data providers’ personal data privacy (e.g., location, trajectory, and activity privacy) and identity privacy. Therefore, it becomes essential to assess the trustworthiness of the data while preserving the data providers’ privacy. Third, data analytics and mining in crowdsourcing may disclose the privacy of data providers or related entities to unauthorized parties, which lowers the willingness of participants to contribute to the crowdsourcing system, impacts system acceptance, and greatly impedes its further development. Fourth, the identities of data providers could be forged by malicious attackers to compromise the whole crowdsourcing system. In this context, trust, security, and privacy have begun to attract special attention as prerequisites for high quality of service in each step of crowdsourcing: data collection, transmission, selection, processing, analysis and mining, and utilization.

Trust, security, and privacy in crowdsourcing have received increasing attention. Many methods have been proposed to protect privacy during data collection and processing. For example, data perturbation can be adopted to hide real data values during collection. When preprocessing the collected data, data anonymization (e.g., k-anonymization) and fusion can be applied to break the links between the data and their sources/providers. At the application layer, anonymity is used to mask the real identities of data sources/providers. To enable privacy-preserving data mining, secure multiparty computation (SMC) and homomorphic encryption provide options for protecting raw data when multiple parties jointly run a data mining algorithm: through cryptographic techniques, no party learns anything other than its own input and the expected results. For data truth discovery, applicable solutions include correlation-based data quality analysis and trust evaluation of data sources. But current solutions remain imperfect, incomplete, and inefficient….(More)”.
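
To make the truth-discovery idea concrete, here is a minimal sketch of iterative trust-weighted aggregation in the style of published truth-discovery algorithms such as CRH (not the editorial's own method, and with entirely invented sensor readings): truth estimates are the trust-weighted average of source reports, and each source's trust is updated from how far its reports deviate from those estimates.

```python
import numpy as np

def truth_discovery(reports, n_iters=20):
    """Iterative trust-weighted truth discovery over continuous-valued reports.

    reports: (n_sources, n_items) array; reports[s, i] is the value that
    source s claims for item i. Returns (estimated truths, source weights).
    """
    n_sources, _ = reports.shape
    weights = np.ones(n_sources) / n_sources  # start by trusting every source equally
    for _ in range(n_iters):
        truths = weights @ reports                       # trust-weighted truth estimate
        errors = ((reports - truths) ** 2).sum(axis=1) + 1e-12
        weights = np.log(errors.sum() / errors)          # low-error sources gain trust
        weights /= weights.sum()
    return weights @ reports, weights

# Invented example: four sources report three quantities; the last source is faulty.
reports = np.array([
    [20.1, 5.0, 99.8],
    [19.9, 5.2, 100.1],
    [20.0, 4.9, 100.0],
    [35.0, 9.0, 60.0],   # outlier: a noisy or malicious source
])
truths, trust = truth_discovery(reports)
print("estimated truths:", truths.round(2))
print("source trust:", trust.round(3))   # the outlier should end up with low trust
```

The same loop illustrates why trust evaluation and data filtering go together: once the outlier's weight collapses, its faked values barely move the estimated truths.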

Data Science Thinking: The Next Scientific, Technological and Economic Revolution


Book by Longbing Cao: “This book explores answers to the fundamental questions driving the research, innovation and practices of the latest revolution in scientific, technological and economic development: how does data science transform existing science, technology, industry, economy, profession and education?  How does one remain competitive in the data science field? What is responsible for shaping the mindset and skillset of data scientists?

Data Science Thinking paints a comprehensive picture of data science as a new scientific paradigm from the scientific evolution perspective, as data science thinking from the scientific-thinking perspective, as a trans-disciplinary science from the disciplinary perspective, and as a new profession and economy from the business perspective.

The topics cover an extremely wide spectrum of essential and relevant aspects of data science, spanning its evolution, concepts, thinking, challenges, discipline, and foundation, all the way to industrialization, profession, education, and the vast array of opportunities that data science offers. The book’s three parts each detail layers of these different aspects….(More)”.

The Risks of Dangerous Dashboards in Basic Education


Lant Pritchett at the Center for Global Development: “On June 1, 2009, Air France flight 447 from Rio de Janeiro to Paris crashed into the Atlantic Ocean, killing all 228 people on board. While the Airbus 330 was flying on auto-pilot, the different speed indicators received by the on-board navigation computers started to give conflicting speeds, almost certainly because the pitot tubes responsible for measuring air speed had iced over. Since the auto-pilot could not resolve the conflicting signals and hence did not know how fast the plane was actually going, it turned control of the plane over to the two first officers (the captain was out of the cockpit). Subsequent flight simulator trials replicating the conditions of the flight concluded that had the pilots done nothing at all, everyone would have lived—nothing was actually wrong; only the indicators were faulty, not the actual speed. But, tragically, the pilots didn’t do nothing….

What is the connection to education?

Many countries’ systems of basic education are in “stall” condition.

A recent paper by Beatty et al. (2018) uses information from the Indonesia Family Life Survey, a representative household survey that has been carried out in several waves with the same individuals since 2000 and contains information on whether individuals can answer simple arithmetic questions. Figure 1, which shows the relationship between the level of schooling and the probability of answering a typical question correctly, contains two shocking results.

First, the likelihood that a person can answer a simple mathematics question correctly differs by only 20 percentage points between individuals who have completed less than primary school (<PS)—who answer correctly (adjusted for guessing) about 20 percent of the time—and those who have completed senior secondary school or more (>=SSS), who answer correctly only about 40 percent of the time. These are simple multiple-choice questions, such as whether 56/84 is the same fraction as (can be reduced to) 2/3, and whether 1/3 - 1/6 equals 1/6. Spread across the roughly twelve grades separating these two groups, this means that in an entire year of schooling, fewer than 2 additional children per 100 gain the ability to answer simple arithmetic questions.
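
Both survey items and the back-of-envelope per-year figure are easy to check; here is a minimal sketch (the twelve-grade gap between the two schooling groups is our assumption for the division, not a number stated in the excerpt):

```python
from fractions import Fraction

# The two survey items: both equalities hold exactly.
assert Fraction(56, 84) == Fraction(2, 3)                 # 56/84 reduces to 2/3
assert Fraction(1, 3) - Fraction(1, 6) == Fraction(1, 6)  # 1/3 - 1/6 = 1/6

# Hypothetical back-of-envelope: a 20-percentage-point gap spread over the
# ~12 grades separating <PS from >=SSS (our assumption, not stated in the paper).
gap_points = 40 - 20
grades = 12
print(gap_points / grades)  # ~1.7 additional correct answerers per 100 per grade
```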

Second, this incredibly poor performance in 2000 got worse by 2014. …

What has this got to do with education dashboards? The way large bureaucracies prefer to work is to specify process compliance and inputs and then measure those as a means of driving performance. This logistical mode of managing an organization works best when both process compliance and inputs are easily “observable” in the economist’s sense: easily verifiable, contractible, adjudicated. This leads to attention to processes and inputs that are “thin” in the Clifford Geertz sense (adopted by James Scott as his primary definition of how a “high modern” bureaucracy, and hence the state, “sees” the world). So in education one would specify easily observable inputs like textbook availability, class size, and school infrastructure. Even if one were talking about the “quality” of schooling, a large bureaucracy would want this, too, reduced to “thin” indicators, like the fraction of teachers with a given type of formal degree, or to process compliance measures, like whether teachers were hired based on some formal assessment.

Those involved in schooling can then become obsessed with their dashboards and the “thin” progress that is being tracked and easily ignore the loud warning signals saying: Stall!…(More)”.

Searching for the Smart City’s Democratic Future


Article by Bianca Wylie at the Center for International Governance Innovation: “There is a striking blue building on Toronto’s eastern waterfront. Wrapped top to bottom in bright, beautiful artwork by Montreal illustrator Cecile Gariepy, the building — a former fish-processing plant — stands out alongside the neighbouring parking lots and a congested highway. It’s been given a second life as an office for Sidewalk Labs — a sister company to Google that is proposing a smart city development in Toronto. Perhaps ironically, the office is like the smart city itself: something old repackaged to be light, fresh and novel.

“Our mission is really to use technology to redefine urban life in the twenty-first century.”

Dan Doctoroff, CEO of Sidewalk Labs, shared this mission in an interview with Freakonomics Radio. The phrase is a variant of the marketing language used by the smart city industry at large. Put more simply, the term “smart city” typically describes the use of technology and data in cities.

No matter the words chosen to describe it, the smart city model has a flaw at its core: corporations are seeking to exert influence on urban spaces and democratic governance. And because most governments don’t have policies in place to regulate smart city development — in particular, projects driven by the fast-paced technology sector — this presents a growing global governance concern.

This is where the story usually descends into warnings of smart city dystopia or failure. Loads of recent articles have detailed the science fiction-style city-of-the-future and speculated about the perils of mass data collection, and for good reason — these are important concepts that warrant discussion. It’s time, however, to push past dystopian narratives and explore solutions for the challenges that smart cities present in Toronto and globally…(More)”.

Data Publics: Urban Protest, Analytics and the Courts


Article by Anthony McCosker and Timothy Graham in MC Journal: “There are many examples globally of the use of social media to engage publics in battles over urban development or similar issues (e.g. Fredericks and Foth). Some have asked how social media might be better used by neighbourhood organisations to mobilise protest and save historic buildings, cultural landmarks or urban sites (Johnson and Halegoua). And we can only note here the wealth of research literature on social movements, protest and social media. To emphasise Gerbaudo’s point, drawing on Mattoni, we “need to account for how exactly the use of these media reshapes the ‘repertoire of communication’ of contemporary movements and affects the experience of participants” (2). For us, this also means better understanding the role that social data plays in both aiding and reshaping urban protest, and in arming third-sector groups with evidence useful in social institutions such as the courts.

New modes of digital engagement enable forms of distributed digital citizenship, which Meikle sees as the creative political relationships that form through exercising rights and responsibilities. Associated with these practices is the transition from sanctioned, simple discursive forms of social protest in petitions, to new indicators of social engagement in more nuanced social media data and the more interactive forms of online petition platforms like change.org or GetUp (Halpin et al.). These technical forms code publics in specific ways that have implications for contemporary protest action. That is, they provide the operational systems and instructions that shape social actions and relationships for protest purposes (McCosker and Milne).

All protest and social movements are underwritten by explicit or implicit concepts of participatory publics as these are shaped, enhanced, or threatened by communication technologies. But participatory protest publics are uneven, and as Kelty asks: “What about all the people who are neither protesters nor Twitter users? In the broadest possible sense this ‘General Public’ cannot be said to exist as an actual entity, but only as a kind of virtual entity” (27). Kelty is pointing to the porous boundary between a general public and an organised public, or formal enterprise, as a reminder that we cannot take for granted representations of a public, or the public as a given, in relation to Like or follower data for instance.

If carefully gauged, the concept of data publics can be useful. To start with, the notions of publics and publicness are notoriously slippery. Baym and boyd explore the differences between these two terms, and the way social media reconfigures what “public” is. Does a Comment or a Like on a Facebook Page connect an individual sufficiently to an issues-public? As far back as the 1930s, John Dewey was seeking a pragmatic approach to similar questions regarding human association and the pluralistic space of “the public”. For Dewey, “the machine age has so enormously expanded, multiplied, intensified and complicated the scope of the indirect consequences [of human association] that the resultant public cannot identify itself” (157). To what extent, then, can we use data to constitute a public in relation to social protest in the age of data analytics?

There are numerous well-formulated approaches to studying publics in relation to social media and social networks. Social network analysis (SNA) determines publics, or communities, through links, ties and clustering, by measuring and mapping those connections and to an extent assuming that they constitute some form of sociality. Networked publics (Ito 6) are understood as an outcome of social media platforms and practices: of new digital media authoring and distribution tools or platforms and the particular actions, relationships or modes of communication they afford, to use James Gibson’s sense of that term. “Publics can be reactors, (re)makers and (re)distributors, engaging in shared culture and knowledge through discourse and social exchange as well as through acts of media reception” (Ito 6). Hashtags, for example, facilitate connectivity and visibility and aid in the formation and “coordination of ad hoc issue publics” (Bruns and Burgess 3). Gray et al., following Ruppert, argue that “data publics are constituted by dynamic, heterogeneous arrangements of actors mobilised around data infrastructures, sometimes figuring as part of them, sometimes emerging as their effect”. The individuals of data publics are neither subjugated by the logics and metrics of digital platforms and data structures, nor simply sovereign agents empowered by the expressive potential of aggregated data (Gray et al.).
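
To make the SNA approach concrete, here is a minimal sketch (Python with the third-party networkx library; the accounts and interactions are entirely invented) that partitions an interaction graph into clusters that might be read as candidate issue publics:

```python
import networkx as nx

# Hypothetical interaction data: edges are replies/retweets between accounts
# discussing a contested urban development (all names invented).
interactions = [
    ("ana", "ben"), ("ben", "cal"), ("ana", "cal"),   # cluster 1
    ("dee", "eli"), ("eli", "fay"), ("dee", "fay"),   # cluster 2
    ("cal", "dee"),                                   # weak tie between clusters
]

G = nx.Graph(interactions)

# Greedy modularity maximisation: one common way SNA partitions a network
# into densely connected groups that may be read as candidate publics.
communities = nx.algorithms.community.greedy_modularity_communities(G)
for i, c in enumerate(communities):
    print(f"community {i}: {sorted(c)}")

# Betweenness centrality flags accounts that broker between the groups.
print(nx.betweenness_centrality(G))
```

Kelty's caveat applies directly to such output: the partition is an artefact of whichever ties were measured, and nothing in it guarantees that a cluster constitutes a public in any thicker sense.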

Data publics are more than just aggregates of individual data points or connections. They are inherently unstable, dynamic (despite static analysis and visualisations), or vibrant, and ephemeral. We emphasise three key elements of active data publics. First, to be more than an aggregate of individual items, a data public needs to be consequential (in Dewey’s sense of issues or problem-oriented). Second, sufficient connection is visible over time. Third, affective or emotional activity is apparent in relation to events that lend coherence to the public and its prevailing sentiment. To these, we add critical attention to the affordising processes – or the deliberate and incidental effects of datafication and analysis, in the capacities for data collection and processing in order to produce particular analytical outcomes, and the data literacies these require. We return to the latter after elaborating on the Save the Palace case….(More)”.

To the smart city and beyond? Developing a typology of smart urban innovation


Maja Nilssen in Technological Forecasting and Social Change: “The smart city is an increasingly popular topic in urban development, arousing both excitement and skepticism. However, despite increasing enthusiasm regarding the smartness of cities, the concept is still regarded as somewhat evasive. Encouraged by the multifaceted character of the concept, this article examines how we can categorize the different dimensions often included in the smart city concept, and how these dimensions are coupled to innovation. Furthermore, the article examines the implications of the different understandings of the smart city concept for cities’ abilities to be innovative.

Building on existing scholarly contributions on the smartness of cities and innovation literature, the article develops a typology of smart city initiatives based on the extent and types of innovations they involve. The typology is structured as a smart city continuum, comprising four dimensions of innovation: (1) technological, (2) organizational, (3) collaborative, (4) experimental.

The smart city continuum is then utilized to analyze empirical data from a Norwegian urban development project triggered by a critical juncture. The empirical data shows that the case holds elements of different dimensions of the continuum, supporting the need for a typology of smart cities as multifaceted urban innovation. The continuum can be used as an analytical model for different types of smart city initiatives, and thus shed light on what types of innovation are central in the smart city. Consequently, the article offers useful insights for both practitioners and scholars interested in smart city initiatives….(More)”

How Taiwan’s online democracy may show future of humans and machines


Shuyang Lin at the Sydney Morning Herald: “Taiwanese citizens have spent the 30 years since the lifting of martial law in 1987 prototyping future democracy. Public participation in Taiwan has developed in several formats, from face-to-face meetings to deliberation over the internet. This trajectory coincides with the advancement of technology, and as new tools arrived, democracy evolved.

The launch of vTaiwan (v for virtual, vote, voice and verb), an experiment that prototypes an open consultation process for civil society, showed that by using technology creatively, humanity can facilitate deep and fair conversations, form collective consensus, and deliver solutions we can all live with.

It is a prototype that helps us envision what future democracy could look like….

Decision-making is not an easy task, especially when it involves a large group of people. Group decision-making can follow several protocols: mandate (decide, then take questions); advise (listen before deciding); consent (decide if no one objects); and consensus (decide only if everyone agrees). So there is a pressing need for us to be able to collaborate in large-scale decision-making processes to update outdated standards and regulations.
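
The last two protocols differ only in how abstentions are treated; a minimal sketch (hypothetical Python, with each vote encoded as "yes", "no", or "abstain") makes the contrast concrete:

```python
from typing import Iterable

def consent(votes: Iterable[str]) -> bool:
    """Consent: the proposal passes as long as nobody objects."""
    return all(v != "no" for v in votes)

def consensus(votes: Iterable[str]) -> bool:
    """Consensus: the proposal passes only if everyone actively agrees."""
    votes = list(votes)
    return bool(votes) and all(v == "yes" for v in votes)

votes = ["yes", "yes", "abstain", "yes"]
print(consent(votes))    # True: no one said "no"
print(consensus(votes))  # False: an abstention is not active agreement
```

At the scale of a city or country, the gap between these two rules is exactly what consultation processes like vTaiwan try to manage: full consensus rarely scales, while working out "something we can all live with" sometimes does.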

The future of human knowledge is on the web. Technology can help us learn, communicate, and make better decisions faster and at larger scale. The internet could be the facilitator and AI the catalyst. It is extremely important to be aware that decision-making is not a one-off interaction. The most important direction for decision-making technology is to allow humans to engage in the process at any time, with a standing invitation to request and submit changes.

Humans have started working with computers, and we will continue to work with them. They will help us in the decision-making process and some will even make decisions for us; the actors in collaboration don’t necessarily need to be just humans. While it is up to us to decide what and when to opt in or opt out, we should work together with computers in a transparent, collaborative and inclusive space.

Where shall we go as a society? What do we want from technology? As Audrey Tang, Digital Minister without Portfolio of Taiwan, puts it: “Deliberation — listening to each other deeply, thinking together and working out something that we can all live with — is magical.”…(More)”.

Introducing the (World’s First) Ethical Operating System


Article by Paula Goldman and Raina Kumra: “Is it possible for tech developers to anticipate future risks? Or are these future risks so unknowable to us here in the present that, try as we might to make our tech safe, continued exposure to risks is simply the cost of engagement?

Today, in collaboration with the Institute for the Future (IFTF), a leading non-profit strategic futures organization, Omidyar Network is excited to introduce the Ethical Operating System (or Ethical OS for short), a toolkit to help developers and designers anticipate the future impact of the technologies they’re working on today. We designed the Ethical OS to facilitate better product development, faster deployment, and more impactful innovation — all while striving to minimize technical and reputational risks. The hope is that, with the Ethical OS in hand, technologists can begin to build responsibility into core business and product decisions, and contribute to a thriving tech industry.

The Ethical OS is already being piloted by nearly 20 tech companies, schools, and startups, including Mozilla and Techstars. We believe it can better equip technologists to grapple with three of the most pressing issues facing our community today:

    • If the technology you’re building right now will someday be used in unexpected ways, how can you hope to be prepared?

    • What new categories of risk should you pay special attention to right now?

    • Which design, team, or business model choices can actively safeguard users, communities, society, and your company from future risk?

As large sections of the public grow weary of a seemingly constant stream of data safety and security issues, and with growing calls for heightened government intervention and oversight, the time is now for the tech community to get this right.

We created the Ethical OS as a pilot to help make ethical thinking and future risk mitigation integral components of all design and development processes. It’s not going to be easy. The industry has far more work to do, both inside individual companies and collectively. But with our toolkit as a guide, developers will have a practical means of beginning to ensure their tech is as good as their intentions…(More)”.

Mapping the Privacy-Utility Tradeoff in Mobile Phone Data for Development


Paper by Alejandro Noriega-Campero, Alex Rutherford, Oren Lederman, Yves A. de Montjoye, and Alex Pentland: “Today’s age of data holds high potential to enhance the way we pursue and monitor progress in the fields of development and humanitarian action. We study the relation between data utility and privacy risk in large-scale behavioral data, focusing on mobile phone metadata as a paradigmatic domain. To measure utility, we survey experts about the value of mobile phone metadata at various spatial and temporal granularity levels. To measure privacy, we propose a formal and intuitive measure of reidentification risk, the information ratio, and compute it at each granularity level. Our results confirm the existence of a stark tradeoff between data utility and reidentifiability, where the most valuable datasets are also most prone to reidentification. When data is specified at ZIP-code and hourly levels, outside knowledge of only 7% of a person’s data suffices for reidentification and retrieval of the remaining 93%. In contrast, in the least valuable dataset, specified at municipality and daily levels, reidentification requires on average outside knowledge of 51%, or 31 data points, of a person’s data to retrieve the remaining 49%. Overall, our findings show that coarsening data directly erodes its value, and highlight the need for using data coarsening not as a stand-alone mechanism, but in combination with data-sharing models that provide adjustable degrees of accountability and security….(More)”.
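
The reidentification logic behind these figures can be illustrated with a toy sketch (entirely synthetic traces and invented granularities; this is a qualitative illustration, not the paper's information-ratio computation): coarsen every trace, then count how many of a target's points an adversary must know before the target is the only consistent user.

```python
import random

random.seed(0)

def coarsen(point, loc_div, time_div):
    """Bin a fine-grained (location-cell, hour) point into coarser cells/periods."""
    loc, hour = point
    return (loc // loc_div, hour // time_div)

# Synthetic population: 1,000 users, each observed at 40 (location, hour) points
# drawn from 10,000 location cells over one week of hours.
users = {
    u: {(random.randrange(10_000), random.randrange(24 * 7)) for _ in range(40)}
    for u in range(1_000)
}

def points_needed(target, loc_div, time_div):
    """How many of the target's coarsened points an adversary must know
    before the target is the unique user consistent with them."""
    coarse = {u: {coarsen(p, loc_div, time_div) for p in pts} for u, pts in users.items()}
    known = set()
    for p in coarse[target]:
        known.add(p)
        matches = [u for u, pts in coarse.items() if known <= pts]
        if matches == [target]:
            return len(known)
    return None  # the target is never uniquely identifiable at this granularity

# Fine granularity (analogous to ZIP-code/hourly) vs coarse (municipality/daily).
print("fine bins  :", points_needed(target=0, loc_div=1, time_div=1))
print("coarse bins:", points_needed(target=0, loc_div=1_000, time_div=24))
```

On such synthetic data, the fine-grained setting typically pins down the target after only a point or two, while the coarse setting requires far more outside knowledge, mirroring the paper's tradeoff qualitatively.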

Buzzwords and tortuous impact studies won’t fix a broken aid system


The Guardian: “Fifteen leading economists, including three Nobel winners, argue that the many billions of dollars spent on aid can do little to alleviate poverty while we fail to tackle its root causes…. Donors increasingly want to see more impact for their money, practitioners are searching for ways to make their projects more effective, and politicians want more financial accountability behind aid budgets. One popular option has been to audit projects for results. The argument is that assessing “aid effectiveness” – a buzzword now ubiquitous in the UK’s Department for International Development – will help decide what to focus on.

Some go so far as to insist that development interventions should be subjected to the same kind of randomised control trials used in medicine, with “treatment” groups assessed against control groups. Such trials are being rolled out to evaluate the impact of a wide variety of projects – everything from water purification tablets to microcredit schemes, financial literacy classes to teachers’ performance bonuses.
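
The basic mechanics of such a trial are simple to express; here is a minimal sketch with simulated data (all numbers invented, and not drawn from any of the projects mentioned) of random assignment followed by a difference-in-means estimate of the treatment effect:

```python
import random
import statistics

random.seed(42)

# Hypothetical individual-level outcome, e.g. a test score, with a true +2.0
# effect of the intervention (an invented number, purely for illustration).
population = list(range(200))
treated = set(random.sample(population, k=100))   # random assignment

def outcome(i):
    base = random.gauss(50, 10)                   # baseline score, noisy
    return base + (2.0 if i in treated else 0.0)  # add the treatment effect

scores = {i: outcome(i) for i in population}
t = [scores[i] for i in population if i in treated]
c = [scores[i] for i in population if i not in treated]

effect = statistics.mean(t) - statistics.mean(c)  # difference in means
se = (statistics.variance(t) / len(t) + statistics.variance(c) / len(c)) ** 0.5
print(f"estimated effect: {effect:.2f} (SE ~ {se:.2f})")
```

Even in this idealised form, the estimate is only as good as the randomisation and the sample size, which is part of what the critique below is driving at.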

Economist Esther Duflo at MIT’s Poverty Action Lab recently argued in Le Monde that France should adopt clinical trials as a guiding principle for its aid budget, which has grown significantly under the Macron administration.

But truly random sampling with blinded subjects is almost impossible in human communities without creating scenarios so abstract as to tell us little about the real world. Trials are also expensive to carry out and fraught with ethical challenges – especially when it comes to health-related interventions. (Who gets the treatment and who doesn’t?)

But the real problem with the “aid effectiveness” craze is that it narrows our focus to micro-interventions at a local level whose results can be observed in the short term. At first glance this approach might seem reasonable and even beguiling. But it tends to ignore the broader macroeconomic, political and institutional drivers of impoverishment and underdevelopment. Aid projects might yield satisfying micro-results, but they generally do little to change the systems that produce the problems in the first place. What we need instead is to tackle the real root causes of poverty, inequality and climate change….(More)”.