An Obsolete Paradigm


Blogpost by Paul Wormeli: “…Our national system of describing the extent of crime in the U.S. is broken beyond repair and deserves to be replaced by a totally new paradigm (system).

Since 1930, we have relied on the metrics generated by the Uniform Crime Reporting (UCR) Program to describe crime in the U.S., but it simply does not do so, even with its evolution into the National Incident-Based Reporting System (NIBRS). Criminologists have long recognized the limited scope of the UCR summary crime data, leading to the creation of the National Crime Victimization Survey (NCVS) and other supplementary crime data measurement vehicles. However, despite these measures, the United States still has no comprehensive national data on the amount of crime that has occurred. Even after decades of collecting data, the 1967 Presidential Crime Commission report, The Challenge of Crime in a Free Society, lamented the absence of sound and complete data on crime in the U.S., and called for the creation of a National Crime Survey (NCS) that eventually led to the creation of the NCVS. Since then, we have slowly attempted to make improvements that will lead to more robust data. Only in 2021 did the FBI end UCR summary-based crime data collection and move to NIBRS crime data collection on a national scale.

Admittedly, the shift to NIBRS will unleash a sea change in how we analyze crime data and use it for decision making. However, it still lacks the completeness of national crime reporting. In the landmark study by the National Academies' Committee on National Statistics (funded by the FBI and the Bureau of Justice Statistics to make recommendations on modernizing crime statistics), the panel members grappled with this reality and called out the absence of national statistics on crime that would fully inform policymaking on this critical subject….(More)”

The coloniality of collaboration: sources of epistemic obedience in data-intensive astronomy in Chile


Paper by Sebastián Lehuedé: “Data collaborations have gained currency over the last decade as a means for data- and skills-poor actors to thrive as a fourth paradigm takes hold in the sciences. Against this backdrop, this article traces the emergence of a collaborative subject position that strives to establish reciprocal and technical-oriented collaborations so as to catch up with the ongoing changes in research.

Combining insights from the modernity/coloniality group, political theory and science and technology studies, the article argues that this positionality engenders epistemic obedience by bracketing off critical questions regarding with whom and for whom knowledge is generated. In particular, a dis-embedding of the data producers, the erosion of local ties, and a data conformism are identified as fresh sources of obedience impinging upon the capacity to conduct research attuned to the needs and visions of the local context. A discursive-material analysis of interviews and field notes stemming from the case of astronomy data in Chile is conducted, examining the vision of local actors aiming to gain proximity to the mega observatories producing vast volumes of data in the Atacama Desert.

Given that these observatories are predominantly under the control of organisations from the United States and Europe, the adoption of a collaborative stance is now seen as the best means to ensure skills and technology transfer to local research teams. Delving into the epistemological dimension of data colonialism, this article warns that an increased emphasis on collaboration runs the risk of reproducing planetary hierarchies in times of data-intensive research….(More)”.

Inclusive SDG Data Partnerships


Learning report by Partners for Review (P4R/GIZ), the Danish Institute for Human Rights (DIHR), and the International Civil Society Centre: “The initiative brought together National SDG Units, National Statistics Offices, National Human Rights Institutions and civil society organisations from across six countries. Its purpose is to advance data partnerships for the SDGs and to strengthen multi-actor data ecosystems at the national level. The goal is to meet the SDG data challenge by improving the use of alternative data sources, particularly data produced by civil society and human rights institutions, as a complement to official statistics….(More)”.

The Open Data Policy Lab’s City Incubator


The GovLab: “Hackathons. Data Jams. Dashboards. Mapping, analyzing, and releasing open data. These are some of the essential first steps in building a data-driven culture in government. Yet, it’s not always easy to get data projects such as these off the ground. Governments often work in difficult situations under constrained resources. They have to manage various stakeholders and constituencies who have to be sold on the value that data can generate in their daily work.

Through the Open Data Policy Lab, The GovLab and Microsoft are providing various resources, such as the Data Stewards Academy and the Third Wave of Open Data Toolkit, to support this goal. Still, we recognize that more tailored guidance is needed so cities can build new sustainable data infrastructure and launch projects that meet their policy goals.

Today, we’re providing that resource in the form of the Open Data Policy Lab’s City Incubator. A first-of-its-kind program designed to help cities’ data innovations succeed and scale, the City Incubator will give 10 city officials hands-on training and access to mentors to take their ideas to the next level. It will enable cutting-edge work on various urban challenges and empower officials to create data collaboratives, data-sharing agreements, and other systems. This work is supported by Microsoft, Mastercard City Possible, Luminate, NYU CUSP and the Public Sector Network.

Our team is launching a call for ten city government intrapreneurs from around the world working on data-driven projects to apply to the City Incubator. Over the course of six months, participants will use start-up innovation and public sector problem-solving frameworks to develop and launch new data innovations. They will also receive support from a council of mentors from around the world.

Applications are due August 31, with an early application deadline of August 6 for applicants looking for feedback. Applicants are expected to present their idea and include information on the value their proposal will generate, the resources it will use, the partners it will involve, and the risks it might entail alongside other information in the form of a Data Innovation Canvas. Additional information can be found on the website.”

The Data Innovation Canvas

Household Financial Transaction Data


Paper by Scott R. Baker & Lorenz Kueng: “The growth of the availability and use of detailed household financial transaction microdata has dramatically expanded the ability of researchers to understand both household decision-making as well as aggregate fluctuations across a wide range of fields. This class of transaction data is derived from a myriad of sources including financial institutions, FinTech apps, and payment intermediaries. We review how these detailed data have been utilized in finance and economics research and the benefits they enable beyond more traditional measures of income, spending, and wealth. We discuss the future potential for this flexible class of data in firm-focused research, real-time policy analysis, and macro statistics….(More)”.

The Inevitable Weaponization of App Data Is Here


Joseph Cox at VICE: “…After years of warning from researchers, journalists, and even governments, someone used highly sensitive location data from a smartphone app to track and publicly harass a specific person. In this case, Catholic Substack publication The Pillar said it used location data ultimately tied to Grindr to trace the movements of a priest, and then outed him publicly as potentially gay without his consent. The Washington Post reported on Tuesday that the outing led to his resignation….

The data itself didn’t contain each mobile phone user’s real name, but The Pillar and its partner were able to pinpoint which device belonged to the priest, Monsignor Jeffrey Burrill, by observing one that appeared at the USCCB staff residence and headquarters, at the locations of meetings that he was in, as well as at his family lake house and an apartment that has him listed as a resident. In other words, they managed to unmask this specific person and his movements across time from a supposedly anonymous dataset, something experts have long said is easy to do.
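The re-identification pattern described here, intersecting a pseudonymous device's visit history with places publicly tied to one person, can be sketched in a few lines of Python. All IDs and places below are invented for illustration; this is not The Pillar's actual workflow.

```python
# Illustrative sketch of re-identification from "anonymous" location data:
# find the advertising ID whose visit history matches places publicly
# associated with one person. All IDs and place names here are made up.
from collections import defaultdict

# (ad_id, place) observations, as might appear in commercial location data
pings = [
    ("id-4f2a", "office"), ("id-4f2a", "residence"), ("id-4f2a", "lake_house"),
    ("id-9c31", "office"), ("id-9c31", "gym"),
    ("id-77be", "residence"), ("id-77be", "cafe"),
]

# Places publicly known to be associated with the target person
known_places = {"office", "residence", "lake_house"}

# Count, per device ID, which known places it appears at
hits = defaultdict(set)
for ad_id, place in pings:
    if place in known_places:
        hits[ad_id].add(place)

# The device seen at every known place is almost certainly the target's
candidates = [ad_id for ad_id, places in hits.items() if places == known_places]
print(candidates)  # -> ['id-4f2a']
```

With real commercial location data the "places" would be latitude/longitude clusters rather than labels, but the join logic is the same, which is why pseudonymous advertising IDs offer so little protection.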

A Grindr spokesperson told Motherboard in an emailed statement that “Grindr’s response is aligned with the editorial story published by the Washington Post which describes the original blog post from The Pillar as homophobic and full of unsubstantiated innuendo. The alleged activities listed in that unattributed blog post are infeasible from a technical standpoint and incredibly unlikely to occur. There is absolutely no evidence supporting the allegations of improper data collection or usage related to the Grindr app as purported.”…

“The research from The Pillar aligns to the reality that Grindr has historically treated user data with almost no care or concern, and dozens of potential ad tech vendors could have ingested the data that led to the doxxing,” Zach Edwards, a researcher who has closely followed the supply chain of various sources of data, told Motherboard in an online chat. “No one should be doxxed and outed for adult consenting relationships, but Grindr never treated their own users with the respect they deserve, and the Grindr app has shared user data to dozens of ad tech and analytics vendors for years.”…(More)”.

Financial data unbound: The value of open data for individuals and institutions


Paper by McKinsey Global Institute: “As countries around the world look to ensure rapid recovery once the COVID-19 crisis abates, improved financial services are emerging as a key element to boost growth, raise economic efficiency, and lift productivity. Robust digital financial infrastructure proved its worth during the crisis, helping governments cushion people and businesses from the economic shock of the pandemic. The next frontier is to create an open-data ecosystem for finance.

Already, technological, regulatory, and competitive forces are moving markets toward easier and safer financial data sharing. Open-data initiatives are springing up globally, including the United Kingdom’s Open Banking Implementation Entity, the European Union’s second payment services directive, Australia’s new consumer protection laws, Brazil’s drafting of open data guidelines, and Nigeria’s new Open Technology Foundation (Open Banking Nigeria). In the United States, the Consumer Financial Protection Bureau aims to facilitate a consumer-authorized data-sharing market, while the Financial Data Exchange consortium attempts to promote common, interoperable standards for secure access to financial data. Yet, even as many countries put in place stronger digital financial infrastructure and data-sharing mechanisms, COVID-19 has exposed limitations and gaps in their reach, a theme we explored in earlier research.

This discussion paper from the McKinsey Global Institute (download full text in 36-page PDF) looks at the potential value that could be created—and the key issues that will need to be addressed—by the adoption of open data for finance. We focus on four regions: the European Union, India, the United Kingdom, and the United States.

By open data, we mean the ability to share financial data through a digital ecosystem in a manner that requires limited effort or manipulation. Advantages include more accurate credit risk evaluation and risk-based pricing, improved workforce allocation, better product delivery and customer service, and stronger fraud protection.

Our analysis suggests that the boost to the economy from broad adoption of open-data ecosystems could range from about 1 to 1.5 percent of GDP in 2030 in the European Union, the United Kingdom, and the United States, to as much as 4 to 5 percent in India. All market participants benefit, be they institutions or consumers—either individuals or micro-, small-, and medium-sized enterprises (MSMEs)—albeit to varying degrees….(More)”.

Real-Time Incident Data Could Change Road Safety Forever


Skip Descant at GovTech: “Data collected from connected vehicles can offer near real-time insights into highway safety problem areas, identifying near-misses, troublesome intersections and other roadway dangers.

New research from Michigan State University and Ford Mobility, which tracked driving incidents on Ford vehicles outfitted with connected vehicle technology, points to a future of greatly expanded understanding of roadway events, far beyond simply reading crash data.

“Connected vehicle data allows us to know what’s happening now. And that’s a huge thing. And I think that’s where a lot of the potential is, to allow us to actively monitor the roadways,” said Meredith Nelson, connected and automated vehicles analyst with the Michigan Department of Transportation.

The research looked at data collected from Ford vehicles in the Detroit metro region equipped with connected vehicle technology from January 2020 to June 2020, drawing on data collected by Ford’s Safety Insights platform in partnership with StreetLight Data. The data offers insights into near-miss events like hard braking, hard acceleration and hard cornering. In 2020 alone, Ford measured more than a half-billion events from tens of millions of trips.
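For a sense of how near-miss events are derived from raw telemetry, here is a minimal, hypothetical sketch of hard-braking detection from speed samples. The 4 m/s² threshold and the data are assumptions for illustration, not Ford's Safety Insights implementation.

```python
# Hypothetical sketch: flag "hard braking" near-miss events from
# connected-vehicle speed telemetry. Threshold and data are illustrative.

# (timestamp_s, speed_m_per_s) samples at 1 Hz for one trip
samples = [(0, 20.0), (1, 19.5), (2, 19.0), (3, 12.0), (4, 6.0), (5, 5.5)]

HARD_BRAKE_MPS2 = 4.0  # assumed deceleration threshold, roughly 0.4 g

def hard_braking_events(samples, threshold=HARD_BRAKE_MPS2):
    """Return timestamps where deceleration exceeds the threshold."""
    events = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        decel = (v0 - v1) / (t1 - t0)  # m/s^2
        if decel >= threshold:
            events.append(t1)
    return events

print(hard_braking_events(samples))  # -> [3, 4]
```

Aggregating such events across millions of trips, and mapping where they cluster, is what turns individual telemetry into the roadway-level safety insight the researchers describe.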

Traditionally, researchers relied on police-reported crash data, which has its drawbacks, in part because of the delay in reporting, said Peter Savolainen, an engineering professor in the Department of Civil and Environmental Engineering at Michigan State University whose research focuses on road user behavior….(More)”.

On the forecastability of food insecurity


Paper by Pietro Foini, Michele Tizzoni, Daniela Paolotti, and Elisa Omodei: “Food insecurity, defined as the lack of physical or economic access to safe, nutritious and sufficient food, remains one of the main challenges included in the 2030 Agenda for Sustainable Development. Near real-time data on the food insecurity situation collected by international organizations such as the World Food Programme can be crucial to monitor and forecast time trends of insufficient food consumption levels in countries at risk.

Here, using food consumption observations in combination with secondary data on conflict, extreme weather events and economic shocks, we build a forecasting model based on gradient boosted regression trees to create predictions on the evolution of insufficient food consumption trends up to 30 days into the future in 6 countries (Burkina Faso, Cameroon, Mali, Nigeria, Syria and Yemen). Results show that the number of available historical observations is a key element for the forecasting model performance. Among the 6 countries studied in this work, for those with the longest food insecurity time series, the proposed forecasting model makes it possible to forecast the prevalence of people with insufficient food consumption up to 30 days into the future with higher accuracy than a naive approach based on the last measured prevalence only. The framework developed in this work could provide decision makers with a tool to assess how the food insecurity situation will evolve in the near future in countries at risk. Results clearly point to the added value of continuous near real-time data collection at the sub-national level…(More)”.
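A minimal sketch of this kind of pipeline, with assumed details rather than the authors' exact features or tuning: lagged observations of the prevalence series become supervised examples for gradient boosted regression trees, and the model is compared against the naive last-value baseline the paper mentions.

```python
# Sketch of a 30-day-ahead forecast with gradient boosted regression trees
# on a synthetic stand-in for an insufficient-food-consumption series.
# Lag count, horizon, and hyperparameters are assumptions for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
days = np.arange(400)
series = 0.3 + 0.05 * np.sin(days / 30) + rng.normal(0, 0.005, len(days))

LAGS, HORIZON = 14, 30  # use the last 14 days to predict 30 days ahead

# Build supervised examples: 14 lagged values -> value 30 days ahead.
# (The paper additionally uses conflict, weather and economic covariates.)
X, y = [], []
for t in range(LAGS, len(series) - HORIZON):
    X.append(series[t - LAGS:t])
    y.append(series[t + HORIZON])
X, y = np.array(X), np.array(y)

split = len(X) - 60  # hold out the last 60 examples for evaluation
model = GradientBoostingRegressor(n_estimators=200, max_depth=3)
model.fit(X[:split], y[:split])

pred = model.predict(X[split:])
naive = X[split:, -1]  # naive baseline: carry the last observed value forward
mae_model = np.abs(pred - y[split:]).mean()
mae_naive = np.abs(naive - y[split:]).mean()
print(f"model MAE {mae_model:.4f} vs naive MAE {mae_naive:.4f}")
```

On a smooth synthetic series like this one the learned model beats the last-value baseline, mirroring the paper's finding that longer historical series make the model worthwhile relative to the naive approach.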

What Is Behavioral Data Science and How to Get into It?


Blogpost by Ganna Pogrebna: “Behavioral Data Science is a new, emerging, interdisciplinary field, which combines techniques from the behavioral sciences, such as psychology, economics, sociology, and business, with computational approaches from computer science, statistics, data-centric engineering, information systems research and mathematics, all in order to better model, understand and predict behavior.

Behavioral Data Science lies at the interface of all these disciplines (and a growing list of others), all interested in combining deep knowledge about the questions underlying human, algorithmic, and systems behavior with increasing quantities of data. The kinds of questions this field engages are not only exciting and challenging, but also timely.

Behavioral Data Science is capable of addressing all these issues (and many more) partly because of the availability of new data sources and partly due to the emergence of new (hybrid) models, which merge behavioral science and data science models. The main advantage of these models is that they expand machine learning techniques that essentially operate as black boxes into fully tractable and explainable models. Specifically, while a deep learning model can generate accurate predictions of whether people will select one product or brand over another, it will not tell you what exactly drives people’s preferences; hybrid models, such as anthropomorphic learning, can provide this insight….(More)”
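As a generic illustration of the gap described here (not the author's hybrid "anthropomorphic learning" models): a black-box classifier can predict synthetic brand choices accurately while saying nothing about what drives them, so an extra attribution step such as permutation importance is needed to surface the drivers. All data and feature names below are invented.

```python
# A black-box model predicts choices well but is silent on *why*;
# permutation importance is one post-hoc way to recover the drivers.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
n = 1000
price_diff = rng.normal(0, 1, n)    # brand A price minus brand B price
color_code = rng.integers(0, 3, n)  # irrelevant attribute
X = np.column_stack([price_diff, color_code])
y = (price_diff + rng.normal(0, 0.3, n) < 0).astype(int)  # cheaper brand wins

clf = GradientBoostingClassifier().fit(X, y)
print("accuracy:", clf.score(X, y))  # high, but silent on the *why*

# Attribution reveals that price, not color, drives the simulated choices
imp = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
print(dict(zip(["price_diff", "color"], imp.importances_mean.round(3))))
```

Post-hoc attribution like this is only a partial answer; the blogpost's argument is that hybrid behavioral/data-science models build that explanation into the model itself rather than bolting it on afterwards.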