Using “Big Data” to forecast migration


Blog Post by Jasper Tjaden, Andres Arau, Muertizha Nuermaimaiti, Imge Cetin, Eduardo Acostamadiedo, Marzia Rango: Act 1 — High Expectations

“Data is the new oil,” they say. ‘Big Data’ is even bigger than that. The “data revolution” will contribute to solving societies’ problems and help governments adopt better policies and run more effective programs. In the migration field, digital trace data are seen as a potentially powerful tool to improve migration management processes (visa applications, asylum decisions and the geographic allocation of asylum seekers, integration facilitation, “smart borders,” etc.).

Forecasting migration is one particular area where big data seems to excite data nerds (like us) and policymakers alike. If there is one way big data has already made a difference, it is its ability to bring different actors together — data scientists, business people and policy makers — to sit through countless slides with numbers, tables and graphs. Traditional migration data sources, like censuses, administrative data and surveys, have never quite managed to generate the same level of excitement.

Many EU countries are currently heavily investing in new ways to forecast migration. Relatively large numbers of asylum seekers in 2014, 2015 and 2016 strained the capacity of many EU governments. Better forecasting tools are meant to help governments prepare in advance.

In a recent European Migration Network study, 10 out of the 22 EU governments surveyed said they make use of forecasting methods, many using open source data for “early warning and risk analysis” purposes. The 2020 European Migration Network conference was dedicated entirely to the theme of forecasting migration, hosting more than 15 expert presentations on the topic. The recently proposed EU Pact on Migration and Asylum outlines a “Migration Preparedness and Crisis Blueprint” which “should provide timely and adequate information in order to establish the updated migration situational awareness and provide for early warning/forecasting, as well as increase resilience to efficiently deal with any type of migration crisis.” (p. 4) The European Commission is currently finalizing a feasibility study on the use of artificial intelligence for predicting migration to the EU; Frontex — the EU Border Agency — is scaling up efforts to forecast irregular border crossings; EASO — the European Asylum Support Office — is devising a composite “push-factor index” and experimenting with forecasting asylum-related migration flows using machine learning and data at scale. In Fall 2020, during Germany’s EU Council Presidency, the German Interior Ministry organized a workshop series around “Migration 4.0,” highlighting the benefits of various ways to “digitalize” migration management. At the same time, the EU is investing substantial resources in migration forecasting research under its Horizon 2020 programme, including QuantMig, ITFLOWS, and HumMingBird.
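
What might such a forecasting pipeline look like in miniature? Below is a deliberately simple sketch of one family of approaches these projects explore — regressing arrivals on a lagged digital trace signal such as a search-intensity index. The data, the lag, and the model are all synthetic assumptions for illustration, far simpler than the machine-learning systems Frontex, EASO, or the Horizon 2020 projects are building.

```python
# A minimal, hypothetical sketch: forecast arrivals from a lagged
# search-intensity index. All data are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
months = 48
lag = 3  # assume searches lead arrivals by three months

# Synthetic search-intensity index (e.g., queries about visas or routes).
search_index = 50 + 10 * rng.random(months)

# Synthetic arrivals that loosely track the lagged index, plus noise.
arrivals = np.empty(months)
arrivals[:lag] = 200 + rng.normal(0, 20, lag)
arrivals[lag:] = 200 + 8 * search_index[:-lag] + rng.normal(0, 20, months - lag)

X = search_index[:-lag].reshape(-1, 1)  # index at month t
y = arrivals[lag:]                      # arrivals at month t + lag

model = LinearRegression().fit(X, y)
forecast = model.predict(search_index[-1:].reshape(1, -1))
print(f"R^2 on synthetic data: {model.score(X, y):.2f}")
print(f"arrivals forecast {lag} months ahead: {forecast[0]:.0f}")
```

Real systems add many more signals (administrative data, conflict indicators, prices), handle structural breaks, and quantify uncertainty — which is exactly where the hard methodological debates lie.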

Is all this excitement warranted?

Yes, it is….(More)” See also: Big Data for Migration Alliance

These crowdsourced maps will show exactly where surveillance cameras are watching


Mark Sullivan at FastCompany: “Amnesty International is producing a map of all the places in New York City where surveillance cameras are scanning residents’ faces.

The project will enlist volunteers to use their smartphones to identify, photograph, and locate government-owned surveillance cameras capable of shooting video that could be matched against people’s faces in a database through AI-powered facial recognition.

The map that will eventually result is meant to give New Yorkers the power of information against an invasive technology whose usage and purpose are often not fully disclosed to the public. It’s also meant to put pressure on the New York City Council to write and pass a law restricting or banning facial recognition. Other U.S. cities, such as Boston, Portland, and San Francisco, have already passed such laws.

Facial recognition technology can be developed by scraping millions of images from social media profiles and driver’s licenses without people’s consent, Amnesty says. Software from companies like Clearview AI can then use computer vision algorithms to match those images against facial images captured by closed-circuit television (CCTV) or other video surveillance cameras and stored in a database.
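
Mechanically, that matching step typically reduces to comparing numeric “embeddings” of faces. The sketch below illustrates the idea with synthetic vectors and cosine similarity; it is an assumed, simplified stand-in, not Clearview AI’s actual pipeline.

```python
# Embedding-based face matching in miniature, with synthetic vectors
# standing in for the output of a real face-encoder network.
import numpy as np

rng = np.random.default_rng(0)

# Pretend each scraped profile photo is a 128-dimensional embedding.
database = rng.normal(size=(10_000, 128))
database /= np.linalg.norm(database, axis=1, keepdims=True)

# A noisy CCTV capture of the person behind record 42.
probe = database[42] + rng.normal(scale=0.1, size=128)
probe /= np.linalg.norm(probe)

# Cosine similarity against every stored embedding; top score "matches".
scores = database @ probe
best = int(np.argmax(scores))
print(f"best match: record {best}, similarity {scores[best]:.3f}")
```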

Starting in May, volunteers will be able to use a software tool to identify all the facial recognition cameras within their view—like at an intersection where numerous cameras can often be found. The tool, which runs on a phone’s browser, lets users place a square around any cameras they see. The software integrates Google Street View and Google Earth to help volunteers label and attach geolocation data to the cameras they spot.
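
Each volunteer submission presumably reduces to a small structured record. The fields below are hypothetical — the excerpt does not describe the tool’s actual schema — but they show the kind of geolocated annotation such a map needs.

```python
# A hypothetical record for one volunteer-reported camera sighting.
from dataclasses import dataclass

@dataclass
class CameraReport:
    latitude: float    # geolocation via Street View / Earth lookup
    longitude: float
    camera_type: str   # e.g., "dome", "bullet", "unknown"
    attached_to: str   # e.g., "street light", "building facade"
    photo_url: str     # volunteer's photograph of the camera
    reported_at: str   # ISO 8601 timestamp

report = CameraReport(40.7527, -73.9772, "dome", "street light",
                      "https://example.org/photos/123.jpg",
                      "2021-05-02T14:30:00Z")
print(report)
```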

The map is part of a larger campaign called “Ban the Scan” that’s meant to educate people around the world on the civil rights dangers of facial recognition. Research has shown that facial recognition systems aren’t as accurate when it comes to analyzing dark-skinned faces, putting Black people at risk of being misidentified. Even when accurate, the technology exacerbates systemic racism because it is disproportionately used to identify people of color, who are already subject to discrimination by law enforcement officials. The campaign is sponsored by Amnesty in partnership with a number of other tech advocacy, privacy, and civil liberties groups.

In the initial phase of the project, which was announced last Thursday, Amnesty and its partners launched a website that New Yorkers can use to generate public comments on the New York Police Department’s (NYPD’s) use of facial recognition….(More)”.

Inside India’s booming dark data economy


Snigdha Poonam and Samarath Bansal at the Rest of the World: “…The black market for data, as it exists online in India, resembles those for wholesale vegetables or smuggled goods. Customers are encouraged to buy in bulk, and the variety of what’s on offer is mind-boggling: There are databases about parents, cable customers, pregnant women, pizza eaters, mutual funds investors, and almost any niche group one can imagine. A typical database consists of a spreadsheet with row after row of names and key details: Sheila Gupta, 35, lives in Kolkata, runs a travel agency, and owns a BMW; Irfaan Khan, 52, lives in Greater Noida, and has a son who just applied to engineering college. The databases are usually updated every three months (the older one is, the less it is worth), and if you buy several at the same time, you’ll get a discount. Business is always brisk, and transactions are conducted quickly. No one will ask you for your name, let alone inquire why you want the phone numbers of five million people who have applied for bank loans.

There isn’t a reliable estimate of the size of India’s data economy or of how much money it generates annually. Regarding the former, each broker we spoke to had a different guess: One said only about one or two hundred professionals make up the top tier, another that every big Indian city has at least a thousand people trading data. To find them, potential customers need only look for their ads on social media or run searches with industry keywords and hashtags — “data,” “leads,” “database” — combined with detailed information about the kind of data they want and the city they want it from.

Privacy experts believe that the data-brokering industry has existed since the early days of the internet’s arrival in India. “Databases have been bought and sold in India for at least 15 years now. I remember a case from way back in 2006 of leaked employee data from Naukri.com (one of India’s first online job portals) being sold on CDs,” says Nikhil Pahwa, the editor and publisher of MediaNama, which covers technology policy. By 2009, data brokers were running SMS-marketing companies that offered complementary services: procuring targeted data and sending text messages in bulk. Back then, there was simply less data, “and those who had it could sell it at whatever price,” says Himanshu Bhatt, a data broker who claims to be retired. That is no longer the case: “Today, everyone has every kind of data,” he said.

No broker we contacted would openly discuss their methods of hunting, harvesting, and selling data. But the day-to-day work generally consists of following the trails that people leave during their travels around the internet. Brokers trawl data storage websites armed with a digital fishing net. “I was shocked when I was surfing [cloud-hosted data sites] one day and came across Aadhaar cards,” Bhatt remarked, referring to India’s state-issued biometric ID cards. Images of them were available to download in bulk, alongside completed loan applications and salary sheets.

Again, the legal boundaries here are far from clear. Anybody who has ever filled out a form on a coupon website or requested a refund for a movie ticket has effectively entered their information into a database that can be sold without their consent by the company it belongs to. A neighborhood cell phone store can sell demographic information to a political party for hyperlocal campaigning, and a fintech company can stealthily transfer an individual’s details from an astrology app onto its own server, to gauge that person’s creditworthiness. When somebody shares employment history on LinkedIn or contact details on a public directory, brokers can use basic software such as web scrapers to extract that data.
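
The technical barrier is genuinely low. A sketch of such a “basic” scraper — pointed at an invented directory page, with the URL and HTML structure assumed purely for illustration — might look like this:

```python
# A minimal scraper sketch; the URL and page structure are invented.
import requests
from bs4 import BeautifulSoup

url = "https://example.org/public-directory"  # hypothetical listing page
html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

records = []
for row in soup.select("table.listings tr"):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) >= 2:  # assumed layout: [name, phone, ...]
        records.append({"name": cells[0], "phone": cells[1]})

print(f"extracted {len(records)} records")
```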

But why bother hacking into a database when you can buy it outright? More often, “brokers will directly approach a bank employee and tell them, ‘I need the high-end database’,” Bhatt said. And as demand for information increases, so, too, does data vulnerability. A 2019 survey found that 69% of Indian companies haven’t set up reliable data security systems; 44% have experienced at least one breach already. “In the past 12 months, we have seen an increasing trend of Indians’ data [appearing] on the dark web,” says Beenu Arora, the CEO of the global cyberintelligence firm Cyble….(More)”.

The Politics of Technology in Latin America


Book edited by Avery Plaw, Barbara Carvalho Gurgel and David Ramírez Plascencia: “This book analyses the arrival of emerging and traditional information technologies for public and economic use in Latin America. It focuses on governmental, economic, and security issues, and on the complex relationship between citizens and government.

The book is divided into three parts:

• ‘Digital data and privacy, prospects and barriers’ centers on the debate between the right to privacy and the loss of intimacy on the Internet,

• ‘Homeland security and human rights’ focuses on how novel technologies such as drones and autonomous weapons systems reconfigure the strategies of police authorities and organized crime,

• ‘Labor Markets, digital media and emerging technologies’ emphasizes the legal, economic and social perils and challenges caused by the increased presence of social media, blockchain-based applications, artificial intelligence and automation technologies in the Latin American economy….(More)”.

Enslaved.org


About: “As of December 2020, we have built a robust, open-source architecture to discover and explore nearly a half million people records and 5 million data points. From archival fragments and spreadsheet entries, we see the lives of the enslaved in richer detail. Yet there’s much more work to do, and with the help of scholars, educators, and family historians, Enslaved.org will be rapidly expanding in 2021. We are just getting started….

In recent years, a growing number of archives, databases, and collections that organize and make sense of records of enslavement have become freely and readily accessible for scholarly and public consumption. This proliferation of projects and databases presents a number of challenges:

  • Disambiguating and merging individuals across multiple datasets is nearly impossible given their current, siloed nature (a toy sketch of this matching problem follows the list);
  • Searching, browsing, and quantitative analysis across projects is extremely difficult;
  • It is often difficult to find projects and databases;
  • There are no best practices for digital data creation;
  • Many projects and datasets are in danger of going offline and disappearing.
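
As a toy illustration of the first challenge — not Enslaved.org’s actual method — consider how naive string matching struggles to decide whether variant spellings across two invented datasets describe the same person:

```python
# Toy record-linkage sketch with invented data: variant spellings of
# possibly the same individual in two siloed datasets.
from difflib import SequenceMatcher

dataset_a = [("Mariah Johnson", "Natchez", 1850)]
dataset_b = [("Maria Johnston", "Natchez", 1850),
             ("Mary Johnson", "New Orleans", 1851)]

def name_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for name_a, place_a, year_a in dataset_a:
    for name_b, place_b, year_b in dataset_b:
        score = name_similarity(name_a, name_b)
        print(f"{name_a!r} vs {name_b!r}: name score {score:.2f}, "
              f"place {'matches' if place_a == place_b else 'differs'}, "
              f"year gap {abs(year_a - year_b)}")
```

No similarity threshold cleanly separates the true match from the false one, which is one reason projects like this lean on the contextual judgment of scholars and family historians rather than string distance alone.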

In response to these challenges, Matrix: The Center for Digital Humanities & Social Sciences at Michigan State University (MSU), in partnership with the MSU Department of History, University of Maryland, and scholars at multiple institutions, developed Enslaved: Peoples of the Historical Slave Trade. Enslaved.org’s primary focus is people—individuals who were enslaved, owned slaves, or participated in slave trading….(More)”.

Reclaiming Free Speech for Democracy and Human Rights in a Digitally Networked World


Paper by Rebecca MacKinnon: “…divided into three sections. The first section discusses the relevance of international human rights standards to U.S. internet platforms and universities. The second section identifies three common challenges to universities and internet platforms, with clear policy implications. The third section recommends approaches to internet policy that can better protect human rights and strengthen democracy. The paper concludes with proposals for how universities can contribute to the creation of a more robust digital information ecosystem that protects free speech along with other human rights, and advances social justice.

1) International human rights standards are an essential complement to the First Amendment. While the First Amendment does not apply to how privately owned and operated digital platforms set and enforce rules governing their users’ speech, international human rights standards set forth a clear framework to which companies and any other type of private organization can and should be held accountable. Scholars of international law and freedom of expression point out that Article 19 of the International Covenant on Civil and Political Rights encompasses not only free speech, but also the right to access information and to formulate opinions without interference. Notably, this aspect of international human rights law is relevant in addressing the harms caused by disinformation campaigns aided by algorithms and targeted profiling. In protecting freedom of expression, private companies and organizations must also protect and respect other human rights, including privacy, non-discrimination, assembly, the right to political participation, and the basic right to security of person.

2) Three core challenges are common to universities and internet platforms. These common challenges must be addressed in order to protect free speech alongside other fundamental human rights including non-discrimination:

Challenge 1: The pretense of neutrality amplifies bias in an unjust world. In an inequitable and unjust world, “neutral” platforms and institutions will perpetuate and even exacerbate inequities and power imbalances unless they understand and adjust for those inequities and imbalances. This fundamental civil rights concept is better understood by the leaders of universities than by those in charge of social media platforms, which have a clear impact on public discourse and civic engagement.

Challenge 2: Rules and enforcement are inadequate without strong leadership and cultural norms. Rules governing speech, and their enforcement, can be ineffective and even counterproductive unless they are accompanied by values-based leadership. Institutional cultures should take into account the context and circumstances of unique situations, individuals, and communities. For rules to have legitimacy, communities that are governed by them must be actively engaged in building a shared culture of responsibility.

Challenge 3: Communities need to be able to shape how and where they enable discourse and conduct learning. Different types of discourse that serve different purposes require differently designed spaces—be they physical or digital. It is important for communities to be able to set their own rules of engagement and shape their spaces for different types of discourse. Overdependence upon a small number of corporate-controlled platforms does not serve communities well. Online free speech will be better served not only by policies that foster competition and strengthen antitrust law, but also by policies and resources that support the development of nonprofit, open source, and community-driven digital public infrastructure.

3) A clear and consistent policy environment that supports civil rights objectives and is compatible with human rights standards is essential to ensure that the digital public sphere evolves in a way that genuinely protects free speech and advances social justice. Analysis of twenty different consensus declarations, charters, and principles produced by international coalitions of civil society organizations reveals broad consensus with U.S.-based advocates of civil rights-compatible technology policy….(More)”.

Using artificial intelligence to make decisions: Addressing the problem of algorithmic bias (2020)


Foreword of a Report by the Australian Human Rights Commission: “Artificial intelligence (AI) promises better, smarter decision making.

Governments are starting to use AI to make decisions in welfare, policing and law enforcement, immigration, and many other areas. Meanwhile, the private sector is already using AI to make decisions about pricing and risk, to determine what sorts of people make the ‘best’ customers… In fact, the use cases for AI are limited only by our imagination.

However, using AI carries with it the risk of algorithmic bias. Unless we fully understand and address this risk, the promise of AI will be hollow.

Algorithmic bias is a kind of error associated with the use of AI in decision making, and often results in unfairness. Algorithmic bias can arise in many ways. Sometimes the problem is with the design of the AI-powered decision-making tool itself. Sometimes the problem lies with the data set that was used to train the AI tool, which could replicate or even make worse existing problems, including societal inequality.

Algorithmic bias can cause real harm. It can lead to a person being unfairly treated, or even suffering unlawful discrimination, on the basis of characteristics such as their race, age, sex or disability.

This project started by simulating a typical decision-making process. In this technical paper, we explore how algorithmic bias can ‘creep in’ to AI systems and, most importantly, how this problem can be addressed.

To ground our discussion, we chose a hypothetical scenario: an electricity retailer uses an AI-powered tool to decide how to offer its products to customers, and on what terms. The general principles and solutions for mitigating the problem, however, will be relevant far beyond this specific situation.
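
To make the “creep in” mechanism concrete, here is a minimal simulation loosely inspired by that scenario. The data, the proxy feature, and the model are all assumptions for illustration, not the report’s actual experiment — but they show how a model trained on biased historical decisions reproduces the bias even when the protected attribute itself is withheld.

```python
# Bias creeping in through training data: a synthetic sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

group = rng.integers(0, 2, n)              # protected attribute (0 or 1)
postcode = group + rng.normal(0, 0.5, n)   # proxy correlated with group
income = rng.normal(50, 10, n)

# Historical offers were biased: group 0 got favourable terms at the
# same income level.
hist_offer = income + 5 * (group == 0) + rng.normal(0, 5, n) > 52

# Train WITHOUT the protected attribute; the proxy carries the bias.
X = np.column_stack([postcode, income])
model = LogisticRegression(max_iter=1000).fit(X, hist_offer)
pred = model.predict(X)

for g in (0, 1):
    print(f"group {g}: favourable-offer rate {pred[group == g].mean():.2%}")
```

Dropping the sensitive field is therefore not, by itself, a fix — one reason the report emphasises rigorous design, testing and monitoring.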

Because algorithmic bias can result in unlawful activity, there is a legal imperative to address this risk. However, good businesses go further than the bare minimum legal requirements, to ensure they always act ethically and do not jeopardise their good name.

Rigorous design, testing and monitoring can avoid algorithmic bias. This technical paper offers some guidance for companies to ensure that when they use AI, their decisions are fair, accurate and comply with human rights….(More)”

Four Principles to Make Data Tools Work Better for Kids and Families


Blog by the Annie E. Casey Foundation: “Advanced data analytics are deeply embedded in the operations of public and private institutions and shape the opportunities available to youth and families. Whether these tools benefit or harm communities depends on their design, use and oversight, according to a report from the Annie E. Casey Foundation.

Four Principles to Make Advanced Data Analytics Work for Children and Families examines the growing field of advanced data analytics and offers guidance to steer the use of big data in social programs and policy….

The Foundation report identifies four principles — complete with examples and recommendations — to help steer the growing field of data science in the right direction.

Four Principles for Data Tools

  1. Expand opportunity for children and families. Most established uses of advanced analytics in education, social services and criminal justice focus on problems facing youth and families. Promising uses of advanced analytics go beyond mitigating harm and help to identify so-called odds beaters and new opportunities for youth.
    • Example: The Children’s Data Network at the University of Southern California is helping the state’s departments of education and social services explore why some students succeed despite negative experiences and what protective factors merit more investment.
    • Recommendation: Government and its philanthropic partners need to test if novel data science applications can create new insights and when it’s best to apply them.
       
  2. Provide transparency and evidence. Advanced analytical tools must earn and maintain a social license to operate. The public has a right to know what decisions these tools are informing or automating, how they have been independently validated, and who is accountable for answering and addressing concerns about how they work.
    • Recommendations: Local and state task forces can be excellent laboratories for testing how to engage youth and communities in discussions about advanced analytics applications and the policy frameworks needed to regulate their use. In addition, public and private funders should avoid supporting private algorithms whose design and performance are shielded by trade secrecy claims. Instead, they should fund and promote efforts to develop, evaluate and adapt transparent and effective models.
       
  3. Empower communities. The field of advanced data analytics often treats children and families as clients, patients and consumers. Put to better use, these same tools can help elucidate and reform the systems acting upon children and families. For this shift to occur, institutions must focus analyses and risk assessments on structural barriers to opportunity rather than individual profiles.
    • Recommendation: In debates about the use of data science, greater investment is needed to amplify the voices of youth and their communities.
       
  4. Promote equitable outcomes. Useful advanced analytics tools should promote more equitable outcomes for historically disadvantaged groups. New investments in advanced analytics are only worthwhile if they aim to correct the well-documented bias embedded in existing models.
    • Recommendations: Advanced analytical tools should only be introduced when they reduce the opportunity deficit for disadvantaged groups — a move that will take organizing and advocacy to establish and new policy development to institutionalize. Philanthropy and government also have roles to play in helping communities test and improve tools and examples that already exist….(More)”.

Right/Wrong: How Technology Transforms Our Ethics


Book by Juan Enriquez: “Most people have a strong sense of right and wrong, and they aren’t shy about expressing their opinions. But when we take a polarizing stand on something we regard as an eternal truth, we often forget that ethics evolve over time. Many shifts in the right versus wrong pendulum are driven by advances in technology. Our great-grandparents might be shocked by in vitro fertilization; our great-grandchildren might be shocked by the messiness of pregnancy, childbirth, and unedited genes. In Right/Wrong, Juan Enriquez reflects on what happens to our ethics as technology makes the once unimaginable a commonplace occurrence.

Evolving technology changes ethics. Enriquez points out that, contrary to common wisdom, technology often enables more ethical behaviors. Technology challenges old beliefs and upends institutions that do not grow and change. With wit and compassion, Enriquez takes on a series of technology-influenced ethical dilemmas, from sexual liberation to climate change to the “immortality” of mistakes on social media. (“Facebook, Twitter, Instagram, and Google are electronic tattoos.”) He cautions us to judge those who “should have known better,” given today’s vantage point, with less fury and more compassion. We need a quality often absent in today’s charged debates: humility. Judge those in the past as we hope to be judged in the future….(More)”.

The CARE Principles for Indigenous Data Governance


Paper by Stephanie Russo Carroll et al: “Concerns about secondary use of data and limited opportunities for benefit-sharing have focused attention on the tension that Indigenous communities feel between (1) protecting Indigenous rights and interests in Indigenous data (including traditional knowledges) and (2) supporting open data, machine learning, broad data sharing, and big data initiatives. The International Indigenous Data Sovereignty Interest Group (within the Research Data Alliance) is a network of nation-state based Indigenous data sovereignty networks and individuals that developed the ‘CARE Principles for Indigenous Data Governance’ (Collective Benefit, Authority to Control, Responsibility, and Ethics) in consultation with Indigenous Peoples, scholars, non-profit organizations, and governments. The CARE Principles are people- and purpose-oriented, reflecting the crucial role of data in advancing innovation, governance, and self-determination among Indigenous Peoples. The Principles complement the existing data-centric approach represented in the ‘FAIR Guiding Principles for scientific data management and stewardship’ (Findable, Accessible, Interoperable, Reusable). The CARE Principles build upon earlier work by the Te Mana Raraunga Māori Data Sovereignty Network, US Indigenous Data Sovereignty Network, Maiam nayri Wingara Aboriginal and Torres Strait Islander Data Sovereignty Collective, and numerous Indigenous Peoples, nations, and communities. The goal is that stewards and other users of Indigenous data will ‘Be FAIR and CARE.’ In this first formal publication of the CARE Principles, we articulate their rationale, describe their relation to the FAIR Principles, and present examples of their application….(More)” See also Selected Readings on Indigenous Data Sovereignty.