The Story of Goldilocks and Three Twitter’s APIs: A Pilot Study on Twitter Data Sources and Disclosure


Paper by Yoonsang Kim, Rachel Nordgren and Sherry Emery: “Public health and social science increasingly use Twitter for behavioral and marketing surveillance. However, few studies provide sufficient detail about Twitter data collection either to allow direct comparisons between studies or to support replication.

The three primary application programming interfaces (API) of Twitter data sources are Streaming, Search, and Firehose. To date, no clear guidance exists about the advantages and limitations of each API, or about the comparability of the amount, content, and user accounts of retrieved tweets from each API. Such information is crucial to the validity, interpretation, and replicability of research findings.

This study examines whether tweets collected using the same search filters over the same time period, but calling different APIs, would retrieve comparable datasets. We collected tweets about anti-smoking, e-cigarettes, and tobacco using the aforementioned APIs. The retrieved tweets largely overlapped between the three APIs, but each also retrieved unique tweets, and the extent of overlap varied over time and by topic, resulting in different trends and potentially supporting diverging inferences. Researchers need to understand how different data sources can influence the amount, content, and user accounts of the data they retrieve from social media, in order to assess the implications of their choice of data source…(More)”.
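The kind of comparison the abstract describes can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' method: it assumes each collection has already been reduced to a set of tweet IDs, and the function names (`overlap_summary`, `jaccard`) and the toy IDs are hypothetical.

```python
def jaccard(a, b):
    """Share of distinct IDs that two collections have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_summary(ids_by_source):
    """Count IDs shared by every source and IDs unique to each source."""
    all_ids = set().union(*ids_by_source.values())
    shared = set.intersection(*ids_by_source.values())
    summary = {"total_distinct": len(all_ids), "shared_by_all": len(shared)}
    for name, ids in ids_by_source.items():
        # Everything retrieved by any source other than this one.
        others = set().union(
            *(v for k, v in ids_by_source.items() if k != name)
        )
        summary[f"unique_to_{name}"] = len(ids - others)
    return summary


# Toy tweet IDs, invented for illustration (not data from the study).
collected = {
    "streaming": {1, 2, 3, 4},
    "search": {2, 3, 4, 5},
    "firehose": {1, 2, 3, 4, 5, 6},
}

summary = overlap_summary(collected)
print(summary)
print(jaccard(collected["streaming"], collected["search"]))
```

Running the same summary per day or per topic would show whether the overlap is stable, which is the question the study raises.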

Twitter might have a better read on floods than NOAA


Interview by Justine Calma: “Frustrated tweets led scientists to believe that tidal floods along the East Coast and Gulf Coast of the US are more annoying than official tide gauges suggest. Half a million geotagged tweets showed researchers that people were talking about disruptively high waters even when government gauges hadn’t recorded tide levels high enough to be considered a flood.

Capturing these reactions on social media can help authorities better understand and address the more subtle, insidious ways that climate change is playing out in people’s daily lives. Coastal flooding is becoming a bigger problem as sea levels rise, but a study published recently in the journal Nature Communications suggests that officials aren’t doing a great job of recording that.

The Verge spoke with Frances Moore, lead author of the new study and a professor at the University of California, Davis. This isn’t the first time that she’s turned to Twitter for her climate research. Her previous research also found that people tend to stop reacting to unusual weather after dealing with it for a while — sometimes in as little as two years. Similar data from Twitter has been used to study how people coped with earthquakes and hurricanes…(More)”.

The many perks of using critical consumer user data for social benefit


Sushant Kumar at LiveMint: “Business models that thrive on user data have created profitable global technology companies. For comparison, market capitalization of just three tech companies, Google (Alphabet), Facebook and Amazon, combined is higher than the total market capitalization of all listed firms in India. Almost 98% of Facebook’s revenue and 84% of Alphabet’s come from serving targeted advertising powered by data collected from the users. No doubt, these tech companies provide valuable services to consumers. It is also true that profits are concentrated with private corporations and societal value for contributors of data, that is, the user, can be much more significant….

In the existing economic construct, private firms are able to deploy top scientists and sophisticated analytical tools to collect data, derive value and monetize the insights.

Imagine if personalization at this scale was available for more meaningful outcomes, such as for administering personalized treatment for diabetes, recommending crop patterns, optimizing water management and providing access to credit to the unbanked. These socially beneficial applications of data can generate undisputedly massive value.

However, handling critical data with accountability to prevent misuse is a complex and expensive task. What’s more, private sector players do not have any incentives to share the data they collect. These challenges can be resolved by setting up specialized entities that can manage data—collect, analyse, provide insights, manage consent and access rights. These entities would function as a trusted intermediary with public purpose, and may be named “data stewards”….(More)”.

See also: http://datastewards.net/ and https://datacollaboratives.org/

Housing Search in the Age of Big Data: Smarter Cities or the Same Old Blind Spots?


Paper by Geoff Boeing et al: “Housing scholars stress the importance of the information environment in shaping housing search behavior and outcomes. Rental listings have increasingly moved online over the past two decades and, in turn, online platforms like Craigslist are now central to the search process. Do these technology platforms serve as information equalizers or do they reflect traditional information inequalities that correlate with neighborhood sociodemographics? We synthesize and extend analyses of millions of US Craigslist rental listings and find they supply significantly different volumes, quality, and types of information in different communities.

Technology platforms have the potential to broaden, diversify, and equalize housing search information, but they rely on landlord behavior and, in turn, likely will not reach this potential without a significant redesign or policy intervention. Smart cities advocates hoping to build better cities through technology must critically interrogate technology platforms and big data for systematic biases….(More)”.

International Humanitarian and Development Aid and Big Data Governance


Chapter by Andrej Zwitter: “Modern technology and innovations constantly transform the world. This also applies to humanitarian action and development aid, for example: humanitarian drones, crowd sourcing of information, or the utility of Big Data in crisis analytics and humanitarian intelligence. The acceleration of modernization in these adjacent fields can in part be attributed to new partnerships between aid agencies and new private stakeholders that increasingly become active, such as individual crisis mappers, mobile telecommunication companies, or technological SMEs.

These partnerships, however, must be described as simultaneously beneficial as well as problematic. Many private actors do not subscribe to the humanitarian principles (humanity, impartiality, independence, and neutrality), which govern UN and NGO operations, or are not even aware of them. Their interests are not solely humanitarian, but may include entrepreneurial agendas. The unregulated use of data in humanitarian intelligence has already caused negative consequences such as the exposure of sensitive data about aid agencies and of victims of disasters.

This chapter investigates the emergent governance trends around data innovation in the humanitarian and development field. It takes a look at the ways in which the field tries to regulate itself and the utility of the humanitarian principles for Big Data analytics and data-driven innovation. It will argue that it is crucially necessary to formulate principles for data governance in the humanitarian context in order to ensure the safeguarding of beneficiaries that are particularly vulnerable. In order to do that, the chapter proposes to reinterpret the humanitarian principles to accommodate the new reality of datafication of different aspects of society…(More)”.

The New City Regulators: Platform and Public Values in Smart and Sharing Cities


Paper by Sofia Ranchordás and Catalina Goanta: “Cities are increasingly influenced by novel and cosmopolitan values advanced by transnational technology providers and digital platforms. These values, which are often visible in the advancement of the sharing economy and smart cities, may differ from the traditional public values protected by national and local laws and policies. This article contrasts the public values created by digital platforms in cities with the democratic and social national values that the platform society is leaving behind.

It innovates by showing how co-regulation can balance public values with platform values. In this article, we argue that despite the value-creation benefits produced by the digital platforms under analysis, public authorities should be aware of the risks of technocratic discourses and potential conflicts between platform and local values. In this context, we suggest a normative framework which enhances the need for a new kind of knowledge-service creation in the form of local public-interest technology. Moreover, our framework proposes a negotiated contractual system that seeks to balance platform values with public values in an attempt to address the digital enforcement problem driven by the functional sovereignty role of platforms….(More)”.

What if you ask and they say yes? Consumers' willingness to disclose personal data is stronger than you think


Grzegorz Mazurek and Karolina Małagocka at Business Horizons: “Technological progress—including the development of online channels and universal access to the internet via mobile devices—has advanced both the quantity and the quality of data that companies can acquire. Private information such as this may be considered a type of fuel to be processed through the use of technologies, and represents a competitive market advantage.

This article describes situations in which consumers tend to disclose personal information to companies and explores factors that encourage them to do so. The empirical studies and examples of market activities described herein illustrate to managers just how rewards work and how important contextual integrity is to customer digital privacy expectations. Companies’ success in obtaining client data depends largely on three Ts: transparency, type of data, and trust. These three Ts—which, combined, constitute a main T (i.e., the transfer of personal data)—deserve attention when seeking customer information that can be converted to competitive advantage and market success….(More)”.

The State of Open Humanitarian Data


Report by Centre for Humanitarian Data: “The goal of this report is to increase awareness of the data available for humanitarian response activities and to highlight what is missing, as measured through OCHA’s Humanitarian Data Exchange (HDX) platform. We want to recognize the valuable and long-standing contributions of data-sharing organizations. We also want to be more targeted in our outreach on what data is required to understand crises so that new actors might be compelled to join the platform. Data is not an end in itself but a critical ingredient to the analysis that informs decision making. With nearly 168 million people in need of humanitarian assistance in 2020 — the highest figure in decades — there is no time, or data, to lose…(More)”.

Making Public Transit Fairer to Women Demands Way More Data


Flavie Halais at Wired: “Public transportation is sexist. This may be unintentional or implicit, but it’s also easy to see. Women around the world do more care and domestic work than men, and their resulting mobility habits are hobbled by most transport systems. The demands of running errands and caring for children and other family members mean repeatedly getting on and off the bus, meaning paying more fares. Strollers and shopping bags make travel cumbersome. A 2018 study of New Yorkers found women were harassed on the subway far more frequently than men were, and as a result paid more money to avoid transit in favor of taxis and ride-hail….

What is not measured is not known, and the world of transit data is still largely blind to women and other vulnerable populations. Getting that data, though, isn’t easy. Traditional sources like national censuses and user surveys provide reliable information that serves as the basis for policies and decisionmaking. But surveys are costly to run, and it can take years for a government to go through the process of adding a question to its national census.

Before pouring resources into costly data collection to find answers about women’s transport needs, cities could first turn to the trove of unconventional gender-disaggregated data that’s already produced. They include data exhaust, or the trail of data we leave behind as a result of our interactions with digital products and services like mobile phones, credit cards, and social media. Last year, researchers in Santiago, Chile, released a report based on their parsing of anonymized call detail records of female mobile phone users, to extract location information and analyze their mobility patterns. They found that women tended to travel to fewer locations than men, and within smaller geographical areas. When researchers cross-referenced location information with census data, they found a higher gender gap among lower-income residents, as poorer women made even shorter trips. And when using data from the local transit agency, they saw that living close to a public transit stop increased mobility for both men and women, but didn’t close the gender gap for poorer residents.

To encourage private companies to share such info, Stefaan Verhulst advocates for data collaboratives, flexible partnerships between data providers and researchers. Verhulst is the head of research and development at GovLab, a research center at New York University that contributed to the research in Santiago. And that’s how GovLab and its local research partner, Universidad del Desarrollo, got access to the phone records owned by the Chilean phone company, Telefónica. Data collaboratives can enhance access to private data without exposing companies to competition or privacy concerns. “We need to find ways to access data according to different shades of openness,” Verhulst says….(More)”.
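The Santiago finding that women travelled to fewer locations than men rests on a simple kind of aggregation over call detail records. The Python sketch below is only a guess at its general shape, not the researchers' pipeline. It assumes anonymized records arrive as (user, gender, cell tower) tuples; that record layout, the function name `mean_distinct_locations`, and the toy records are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_distinct_locations(records):
    """Average number of distinct towers visited per user, by gender.

    `records` is an iterable of (user_id, gender, tower_id) tuples,
    an assumed layout chosen for this illustration.
    """
    towers_by_user = defaultdict(set)
    gender_of = {}
    for user_id, gender, tower_id in records:
        towers_by_user[user_id].add(tower_id)
        gender_of[user_id] = gender

    # Group each user's distinct-tower count under that user's gender.
    counts_by_gender = defaultdict(list)
    for user_id, towers in towers_by_user.items():
        counts_by_gender[gender_of[user_id]].append(len(towers))

    return {g: mean(counts) for g, counts in counts_by_gender.items()}


# Toy records, invented for illustration (not data from the study).
records = [
    ("u1", "F", "t1"), ("u1", "F", "t2"), ("u1", "F", "t1"),
    ("u2", "F", "t3"),
    ("u3", "M", "t1"), ("u3", "M", "t2"), ("u3", "M", "t4"),
    ("u4", "M", "t2"), ("u4", "M", "t5"), ("u4", "M", "t6"),
]

result = mean_distinct_locations(records)
print(result)
```

Cross-referencing with census or transit data, as the article describes, would amount to joining each user's home area to an income or transit-stop attribute before grouping.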

The promise and perils of big gender data


Essay by Bapu Vaitla, Stefaan Verhulst, Linus Bengtsson, Marta C. González, Rebecca Furst-Nichols & Emily Courey Pryor in Special Issue on Big Data of Nature Medicine: “Women and girls are legally and socially marginalized in many countries. As a result, policymakers neglect key gendered issues such as informal labor markets, domestic violence, and mental health [1]. The scientific community can help push such topics onto policy agendas, but science itself is riven by inequality: women are underrepresented in academia, and gendered research is rarely a priority of funding agencies.

However, the critical importance of better gender data for societal well-being is clear. Mental health is a particularly striking example. Estimates from the Global Burden of Disease database suggest that depressive and anxiety disorders are the second leading cause of morbidity among females between 10 and 63 years of age [2]. But little is known about the risk factors that contribute to mental illness among specific groups of women and girls, the challenges of seeking care for depression and anxiety, or the long-term consequences of undiagnosed and untreated illness. A lack of data similarly impedes policy action on domestic and intimate-partner violence, early marriage, and sexual harassment, among many other topics.

‘Big data’ can help fill that gap. The massive amounts of information passively generated by electronic devices represent a rich portrait of human life, capturing where people go, the decisions they make, and how they respond to changes in their socio-economic environment. For example, mobile-phone data allow better understanding of health-seeking behavior as well as the dynamics of infectious-disease transmission [3]. Social-media platforms generate the world’s largest database of thoughts and emotions, information that, if leveraged responsibly, can be used to infer gendered patterns of mental health [4]. Remote sensors, especially satellites, can be used in conjunction with traditional data sources to increase the spatial and temporal granularity of data on women’s economic activity and health status [5].

But the risk of gendered algorithmic bias is a serious obstacle to the responsible use of big data. Data are not value free; they reproduce the conscious and unconscious attitudes held by researchers, programmers, and institutions. Consider, for example, the training datasets on which the interpretation of big data depends. Training datasets establish the association between two or more directly observed phenomena of interest—for example, the mental health of a platform user (typically collected through a diagnostic survey) and the semantic content of the user’s social-media posts. These associations are then used to develop algorithms that interpret big data streams. In the example here, the (directly unobserved) mental health of a large population of social-media users would be inferred from their observed posts….(More)”.