How Can We Overcome the Challenge of Biased and Incomplete Data?


Knowledge@Wharton: “Data analytics and artificial intelligence are transforming our lives. Be it in health care, in banking and financial services, or in times of humanitarian crises — data determine the way decisions are made. But often, the way data is collected and measured can result in biased and incomplete information, and this can significantly impact outcomes.  

In a conversation with Knowledge@Wharton at the SWIFT Institute Conference on the Impact of Artificial Intelligence and Machine Learning in the Financial Services Industry, Alexandra Olteanu, a post-doctoral researcher at Microsoft Research, U.S. and Canada, discussed the ethical and people considerations in data collection and artificial intelligence and how we can work towards removing the biases….

….Knowledge@Wharton: Bias is a big issue when you’re dealing with humanitarian crises, because it can influence who gets help and who doesn’t. When you translate that into the business world, especially in financial services, what implications do you see for algorithmic bias? What might be some of the consequences?

Olteanu: A good example is a new law in New York state under which insurance companies can now use social media to decide the level of your premiums. But they could in fact end up using incomplete information. For instance, you might be buying your vegetables from the supermarket or a farmer’s market, but these retailers might not be tracking you on social media. So nobody knows that you are eating vegetables. On the other hand, a bakery that you visit might post something when you buy from there. Based on this, the insurance companies may conclude that you only eat cookies all the time. This shows how even incomplete data can affect you….(More)”.

A Taxonomy of Definitions for the Health Data Ecosystem


Announcement: “Healthcare technologies are rapidly evolving, producing new data sources, data types, and data uses, which precipitate more rapid and complex data sharing. Novel technologies—such as artificial intelligence tools and new internet of things (IOT) devices and services—are providing benefits to patients, doctors, and researchers. Data-driven products and services are deepening patients’ and consumers’ engagement and helping to improve health outcomes. Understanding the evolving health data ecosystem presents new challenges for policymakers and industry. There is an increasing need to better understand and document the stakeholders, the emerging data types and their uses.

The Future of Privacy Forum (FPF) and the Information Accountability Foundation (IAF) partnered to form the FPF-IAF Joint Health Initiative in 2018. Today, the Initiative is releasing A Taxonomy of Definitions for the Health Data Ecosystem; the publication is intended to enable a more nuanced, accurate, and common understanding of the current state of the health data ecosystem. The Taxonomy outlines the established and emerging language of the health data ecosystem. The Taxonomy includes definitions of:

  • The stakeholders currently involved in the health data ecosystem and examples of each;
  • The common and emerging data types that are being collected, used, and shared across the health data ecosystem;
  • The purposes for which data types are used in the health data ecosystem; and
  • The types of actions that are now being performed and which we anticipate will be performed on datasets as the ecosystem evolves and expands.

This report is an educational resource that will enable a deeper understanding of the current landscape of stakeholders and data types….(More)”.

Platforms that trigger innovation


Report by the Caixa Foundation: “…The Work4Progress programme thus supports the creation of “Open Innovation Platforms for the creation of employment in Peru, India and Mozambique” by means of collaborative partnerships between local civil society organisations, private sector, administration, universities and Spanish NGOs.

The main innovation of this programme is the incorporation of new tools and methodologies in: (1) listening and identification of community needs, (2) the co-creation and prototyping of new solutions, (3) the exploration of instruments for scaling, (4) governance, (5) evolving evaluation systems and (6) financing strategies. The goal of all of the above is to try to incorporate innovation strategies comprehensively in all components.

Work4Progress has been designed with a Think-and-Do-Tank mentality. The member organisations of the platforms are experimenting in the field, while a group of international experts helps us to obtain this knowledge and share it with centres of thought and action at international level. In fact, this is the objective of this publication: to share the theoretical framework of the programme, to connect these ideas with concrete examples and to continue to strengthen the meeting point between social innovation and development cooperation.

Work4Progress is offered as a ‘living lab’ to test new methodologies that may be useful for other philanthropic institutions, governments or entities specialising in international development….(More)”.

Commission publishes guidance on free flow of non-personal data


European Commission: “The guidance fulfils an obligation in the Regulation on the free flow of non-personal data (FFD Regulation), which requires the Commission to publish a guidance on the interaction between this Regulation and the General Data Protection Regulation (GDPR), especially as regards datasets composed of both personal and non-personal data. It aims to help users – in particular small and medium-sized enterprises – understand the interaction between the two regulations.

In line with the existing GDPR documents, prepared by the European Data Protection Board, this guidance document aims to clarify which rules apply when processing personal and non-personal data. It gives a useful overview of the central concepts of the free flow of personal and non-personal data within the EU, while explaining the relation between the two Regulations in practical terms and with concrete examples….

Non-personal data are distinct from personal data, as laid down in the GDPR Regulation. The non-personal data can be categorised in terms of origin, namely:

  • data which originally did not relate to an identified or identifiable natural person, such as data on weather conditions generated by sensors installed on wind turbines, or data on maintenance needs for industrial machines; or
  • data which was initially personal data, but later made anonymous.

While the guidance refers to more examples of non-personal data, it also explains the concepts of personal, anonymised and pseudonymised data to provide a better understanding, as well as describing the boundary between personal and non-personal data.

What are mixed datasets?

In most real-life situations, a dataset is very likely to be composed of both personal and non-personal data. This is often referred to as a “mixed dataset”. Mixed datasets represent the majority of datasets used in the data economy and are commonly gathered thanks to technological developments such as the Internet of Things (i.e. digitally connecting objects), artificial intelligence and technologies enabling big data analytics.

Examples of mixed datasets include a company’s tax records, mentioning the name and telephone number of the managing director of the company. This can also include a company’s knowledge of IT problems and solutions based on individual incident reports, or a research institution’s anonymised statistical data and the raw data initially collected, such as the replies of individual respondents to statistical survey questions….(More)”.

MegaPixels


About: “…MegaPixels is an art and research project first launched in 2017 for an installation at Tactical Technology Collective’s GlassRoom about face recognition datasets. In 2018 MegaPixels was extended to cover pedestrian analysis datasets for a commission by Elevate Arts festival in Austria. Since then MegaPixels has evolved into a large-scale interrogation of hundreds of publicly-available face and person analysis datasets, the first of which launched on this site in April 2019.

MegaPixels aims to provide a critical perspective on machine learning image datasets, one that might otherwise escape academia and industry-funded artificial intelligence think tanks, which are often supported by several of the same technology companies that created the datasets presented on this site.

MegaPixels is an independent project, designed as a public resource for educators, students, journalists, and researchers. Each dataset presented on this site undergoes a thorough review of its images, intent, and funding sources. Though the goals are similar to publishing an academic paper, MegaPixels is a website-first research project, with an academic publication to follow.

One of the main focuses of the dataset investigations presented on this site is to uncover where funding originated. Because of our emphasis on other researchers’ funding sources, it is important that we are transparent about our own….(More)”.

Principles and Policies for “Data Free Flow With Trust”


Paper by Nigel Cory, Robert D. Atkinson, and Daniel Castro: “Just as there was a set of institutions, agreements, and principles that emerged out of Bretton Woods in the aftermath of World War II to manage global economic issues, the countries that value the role of an open, competitive, and rules-based global digital economy need to come together to enact new global rules and norms to manage a key driver of today’s global economy: data. Japanese Prime Minister Abe’s new initiative for “data free flow with trust,” combined with Japan’s hosting of the G20 and leading role in e-commerce negotiations at the World Trade Organization (WTO), provides a valuable opportunity for many of the world’s leading digital economies (Australia, the United States, and European Union, among others) to rectify the gradual drift toward a fragmented and less-productive global digital economy. Prime Minister Abe is right in proclaiming, “We have yet to catch up with the new reality, in which data drives everything, where the D.F.F.T., the Data Free Flow with Trust, should top the agenda in our new economy,” and right in his call “to rebuild trust toward the system for international trade. That should be a system that is fair, transparent, and effective in protecting IP and also in such areas as e-commerce.”

The central premise of this effort should be a recognition that data and data-driven innovation are a force for good. Across society, data innovation, the use of data to create value, is creating more productive and innovative economies, transparent and responsive governments, and better social outcomes (improved health care, safer and smarter cities, etc.). But to maximize the innovative and productivity benefits of data, countries that support an open, rules-based global trading system need to agree on core principles and enact common rules. The benefits of a rules-based and competitive global digital economy are at risk as a diverse range of countries in various stages of political and economic development have policy regimes that undermine core processes, especially the flow of data and its associated legal responsibilities; the use of encryption to protect data and digital activities and technologies; and the blocking of data constituting illegal, pirated content….(More)”.

A Symphony, Not a Solo: How Collective Management Organisations Can Embrace Innovation and Drive Data Sharing in the Music Industry


Paper by David Osimo, Laia Pujol Priego, Turo Pekari and Ano Sirppiniemi: “…data is becoming a fundamental source of competitive advantage in music, just as in other sectors, and streaming services in particular are generating large volumes of new data offering unique insight around customer taste and behavior. (As the Financial Times recently put it, the music industry is having its “moneyball” moment.) But how are the different players getting ready for this change?

This policy brief aims to look at the question from the perspective of CMOs, the organisations charged with redistributing royalties from music users to music rightsholders (such as musical authors and publishers).

The paper is divided into three sections. Part I will look at the current positioning of CMOs in this new data-intensive ecosystem. Part II will discuss how greater data sharing and reuse can maximize innovation, comparing the music industry with other industries. Part III will make policy and business-model reform recommendations for CMOs to stimulate data-driven innovation, internally and in the industry as a whole….(More)”

Democracy in Retreat: Freedom in the World 2019


Freedom House: “In 2018, Freedom in the World recorded the 13th consecutive year of decline in global freedom. The reversal has spanned a variety of countries in every region, from long-standing democracies like the United States to consolidated authoritarian regimes like China and Russia. The overall losses are still shallow compared with the gains of the late 20th century, but the pattern is consistent and ominous. Democracy is in retreat.

In states that were already authoritarian, earning Not Free designations from Freedom House, governments have increasingly shed the thin façade of democratic practice that they established in previous decades, when international incentives and pressure for reform were stronger. More authoritarian powers are now banning opposition groups or jailing their leaders, dispensing with term limits, and tightening the screws on any independent media that remain. Meanwhile, many countries that democratized after the end of the Cold War have regressed in the face of rampant corruption, antiliberal populist movements, and breakdowns in the rule of law. Most troublingly, even long-standing democracies have been shaken by populist political forces that reject basic principles like the separation of powers and target minorities for discriminatory treatment.

Some light shined through these gathering clouds in 2018. Surprising improvements in individual countries—including Malaysia, Armenia, Ethiopia, Angola, and Ecuador—show that democracy has enduring appeal as a means of holding leaders accountable and creating the conditions for a better life. Even in the countries of Europe and North America where democratic institutions are under pressure, dynamic civic movements for justice and inclusion continue to build on the achievements of their predecessors, expanding the scope of what citizens can and should expect from democracy. The promise of democracy remains real and powerful. Not only defending it but broadening its reach is one of the great causes of our time….(More)”.

Data Stewardship on the map: A study of tasks and roles in Dutch research institutes


Report by Verheul, Ingeborg et al: “Good research requires good data stewardship. Data stewardship encompasses all the different tasks and responsibilities that relate to caring for data during the various phases of the whole research life cycle. The basic assumption is that the researcher himself/herself is primarily responsible for all data.

However, the researcher does need professional support to achieve this. To that end, diverse supportive data stewardship roles and functions have evolved in recent years. Often they have developed over the course of time.

Their functional implementation depends largely on their place in the organization. This comes as no surprise when one considers that data stewardship consists of many facets that are traditionally assigned to different departments. Researchers regularly take on data stewardship tasks as well, not only for themselves but also in a wider context for a research group. This data stewardship work often remains unnoticed….(More)”.

Social Media Monitoring: How the Department of Homeland Security Uses Digital Data in the Name of National Security


Report by the Brennan Center for Justice: “The Department of Homeland Security (DHS) is rapidly expanding its collection of social media information and using it to evaluate the security risks posed by foreign and American travelers. This year marks a major expansion. The visa applications vetted by DHS will include social media handles that the State Department is set to collect from some 15 million travelers per year. Social media can provide a vast trove of information about individuals, including their personal preferences, political and religious views, physical and mental health, and the identity of their friends and family. But it is susceptible to misinterpretation, and wholesale monitoring of social media creates serious risks to privacy and free speech. Moreover, despite the rush to implement these programs, there is scant evidence that they actually meet the goals for which they are deployed…(More)”