Five types of urban digital twins


Blog by Darrel Ronald: “The definition for urban digital twins is too vague — so it is important to create a clearer picture of the types of urban digital twins that are available. Not all digital twins are the same and each one comes with features and capabilities, strengths and weaknesses, as well as appropriate and inappropriate use cases….

Urban Twin taxonomy, Source: Darrel Ronald, Spatiomatics

As shown in my proposed Urban Digital Twin Taxonomy above, I propose that we classify these products first based on their Main Functionality (the Use Case), then based on their Technology Platform. I highlight some of the main products within the different categories and their product scope. Next, I detail the different types of twins and offer some brief strengths and weaknesses for each type. This taxonomy could apply to other industries such as architecture or manufacturing, but it is specifically applied to cities and urban development projects.

The main functionalities can be grouped by:

  • Modelling Twin
  • Computational Twin
  • Scenario Twin
  • Operational Twin
  • Experiential Twin

The technology platforms can be grouped by:

  • Computer Aided Design (CAD)
  • Web GIS
  • Geographic Information System (GIS)
  • Gaming…(More)”.

From Happiness Data to Economic Conclusions


Paper by Daniel J. Benjamin, Kristen Cooper, Ori Heffetz & Miles S. Kimball: “Happiness data—survey respondents’ self-reported well-being (SWB)—have become increasingly common in economics research, with recent calls to use them in policymaking. Researchers have used SWB data in novel ways, for example to learn about welfare or preferences when choice data are unavailable or difficult to interpret. Focusing on leading examples of this pioneering research, the first part of this review uses a simple theoretical framework to reverse-engineer some of the crucial assumptions that underlie existing applications. The second part discusses evidence bearing on these assumptions and provides practical advice to the agencies and institutions that generate SWB data, the researchers who use them, and the policymakers who may use the resulting research. While we advocate creative uses of SWB data in economics, we caution that their use in policy will likely require both additional data collection and further research to better understand the data…(More)”.

Evidence 2.0: The Next Era of Evidence-Based Policymaking


Interview with Nick Hart & Jason Saul: “One of the great—if largely unsung—bipartisan congressional acts of recent history was the passage in 2018 of the Foundations for Evidence-Based Policymaking Act. In essence, the “Evidence Act” codified the goal of using solid, consistent evidence as the basis for funding decisions on trillions of dollars of public money. Agencies use this data to decide on the most effective and most promising solutions for a vast array of issues, from early-childhood education to environmental protection.

Five years later, while most federal agencies have created fairly robust evidence bases, unlocking that evidence for practical use by decision makers remains challenging. One might argue that if Evidence 1.0 was focused on the production of evidence, then the next five years—let’s call it Evidence 2.0—will be focused on the effective use of that evidence. Now that evidence is readily available to policymakers, the question is, how can that data be standardized, aggregated, derived, applied, and used for predictive decision-making?…(More)”.

Essential requirements for the governance and management of data trusts, data repositories, and other data collaborations


Paper by Alison Paprica et al.: “Around the world, many organisations are working on ways to increase the use, sharing, and reuse of person-level data for research, evaluation, planning, and innovation while ensuring that data are secure and privacy is protected. As a contribution to broader efforts to improve data governance and management, in 2020 members of our team published 12 minimum specification essential requirements (min specs) to provide practical guidance for organisations establishing or operating data trusts and other forms of data infrastructure… We convened an international team, consisting mostly of participants from Canada and the United States of America, to test and refine the original 12 min specs. Twenty-three (23) data-focused organisations and initiatives recorded the various ways they address the min specs. Sub-teams analysed the results, used the findings to make improvements to the min specs, and identified materials to support organisations/initiatives in addressing the min specs.
Analyses and discussion led to an updated set of 15 min specs covering five categories: one min spec for Legal, five for Governance, four for Management, two for Data Users, and three for Stakeholder & Public Engagement. Multiple changes were made to make the min specs language more technically complete and precise. The updated set of 15 min specs has been integrated into a Canadian national standard that, to our knowledge, is the first to include requirements for public engagement and Indigenous Data Sovereignty…(More)”.

What Big Tech Knows About Your Body


Article by Yael Grauer: “If you were seeking online therapy from 2017 to 2021—and a lot of people were—chances are good that you found your way to BetterHelp, which today describes itself as the world’s largest online-therapy purveyor, with more than 2 million users. Once you were there, after a few clicks, you would have completed a form—an intake questionnaire, not unlike the paper one you’d fill out at any therapist’s office: Are you new to therapy? Are you taking any medications? Having problems with intimacy? Experiencing overwhelming sadness? Thinking of hurting yourself? BetterHelp would have asked you if you were religious, if you were LGBTQ, if you were a teenager. These questions were just meant to match you with the best counselor for your needs, small text would have assured you. Your information would remain private.

Except BetterHelp isn’t exactly a therapist’s office, and your information may not have been completely private. In fact, according to a complaint brought by federal regulators, for years, BetterHelp was sharing user data—including email addresses, IP addresses, and questionnaire answers—with third parties, including Facebook and Snapchat, for the purposes of targeting ads for its services. It was also, according to the Federal Trade Commission, poorly regulating what those third parties did with users’ data once they got them. In July, the company finalized a settlement with the FTC and agreed to refund $7.8 million to consumers whose privacy, regulators claimed, had been compromised. (In a statement, BetterHelp admitted no wrongdoing and described the alleged sharing of user information as an “industry-standard practice.”)

We leave digital traces about our health everywhere we go: by completing forms like BetterHelp’s. By requesting a prescription refill online. By clicking on a link. By asking a search engine about dosages or directions to a clinic or ‘pain in chest dying’. By shopping, online or off. By participating in consumer genetic testing. By stepping on a smart scale or using a smart thermometer. By joining a Facebook group or a Discord server for people with a certain medical condition. By using internet-connected exercise equipment. By using an app or a service to count your steps or track your menstrual cycle or log your workouts. Even demographic and financial data unrelated to health can be aggregated and analyzed to reveal or infer sensitive information about people’s physical or mental-health conditions…(More)”.

The Man Who Trapped Us in Databases


McKenzie Funk in The New York Times: “One of Asher’s innovations — or more precisely one of his companies’ innovations — was what is now known as the LexID. My LexID, I learned, is 000874529875. This unique string of digits is a kind of shadow Social Security number, one of many such “persistent identifiers,” as they are called, that have been issued not by the government but by data companies like Acxiom, Oracle, Thomson Reuters, TransUnion — or, in this case, LexisNexis.

My LexID was created sometime in the early 2000s in Asher’s computer room in South Florida, as many still are, and without my consent it began quietly stalking me. One early data point on me would have been my name; another, my parents’ address in Oregon. From my birth certificate or my driver’s license or my teenage fishing license — and from the fact that the three confirmed one another — it could get my sex and my date of birth. At the time, it would have been able to collect the address of the college I attended, Swarthmore, which was small and expensive, and it would have found my first full-time employer, the National Geographic Society, quickly amassing more than enough data to let someone — back then, a human someone — infer quite a bit more about me and my future prospects…(More)”

Data Repurposing through Compatibility: A Computational Perspective


Paper by Asia Biega: “Reuse of data in new contexts beyond the purposes for which it was originally collected has contributed to technological innovation and to reducing the consent burden on data subjects. One of the legal mechanisms that makes such reuse possible is purpose compatibility assessment. In this paper, I offer an in-depth analysis of this mechanism through a computational lens. I moreover consider what should qualify as repurposing apart from using data for a completely new task, and argue that typical purpose formulations are an impediment to meaningful repurposing. Overall, the paper positions compatibility assessment as a constructive practice beyond an ineffective standard…(More)”

Machine-assisted mixed methods: augmenting humanities and social sciences with artificial intelligence


Paper by Andres Karjus: “The increasing capacities of large language models (LLMs) present an unprecedented opportunity to scale up data analytics in the humanities and social sciences, augmenting and automating qualitative analytic tasks previously typically allocated to human labor. This contribution proposes a systematic mixed methods framework to harness qualitative analytic expertise, machine scalability, and rigorous quantification, with attention to transparency and replicability. Sixteen machine-assisted case studies are showcased as proof of concept. Tasks include linguistic and discourse analysis, lexical semantic change detection, interview analysis, historical event cause inference and text mining, detection of political stance, text and idea reuse, genre composition in literature and film, social network inference, automated lexicography, missing metadata augmentation, and multimodal visual cultural analytics. In contrast to the focus on English in the emerging LLM applicability literature, many examples here deal with scenarios involving smaller languages and historical texts prone to digitization distortions. In all but the most difficult tasks requiring expert knowledge, generative LLMs can demonstrably serve as viable research instruments. LLM (and human) annotations may contain errors and variation, but the agreement rate can and should be accounted for in subsequent statistical modeling; a bootstrapping approach is discussed. The replications among the case studies illustrate how tasks previously requiring potentially months of team effort and complex computational pipelines can now be accomplished by an LLM-assisted scholar in a fraction of the time. Importantly, this approach is not intended to replace, but to augment researcher knowledge and skills. With these opportunities in sight, qualitative expertise and the ability to pose insightful questions have arguably never been more critical…(More)”.
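The abstract says annotation agreement should be carried into later statistical modelling and mentions a bootstrapping approach. As an illustration only, the sketch below computes the raw agreement rate between hypothetical human and LLM labels, plus a percentile-bootstrap interval obtained by resampling annotated items with replacement. The function names, the example labels and the choice of a percentile interval are assumptions for this sketch, not Karjus's actual procedure.

```python
import random
from typing import List, Sequence, Tuple


def agreement_rate(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of items on which two annotators assigned the same label."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("annotation lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def bootstrap_agreement(
    a: Sequence[str],
    b: Sequence[str],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Point estimate plus a percentile-bootstrap interval for the agreement rate.

    Whole items are resampled with replacement, so each resample keeps the
    pairing between the human label and the LLM label for the same text.
    """
    rng = random.Random(seed)
    n = len(a)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(agreement_rate([a[i] for i in idx], [b[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return agreement_rate(a, b), (lo, hi)


# Hypothetical stance labels for six texts, for illustration only.
human = ["pro", "anti", "pro", "neutral", "pro", "anti"]
llm = ["pro", "anti", "anti", "neutral", "pro", "anti"]

point, (low, high) = bootstrap_agreement(human, llm)
print(f"agreement = {point:.2f}, 95% interval = [{low:.2f}, {high:.2f}]")
```

With only six items the interval is wide, which is the point: agreement measured on a small validation set is itself uncertain, and how that uncertainty is then propagated into downstream models is a separate design choice.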

Get a rabbit: Don’t trust the numbers


Article by John Lanchester: “At a dinner​ with the American ambassador in 2007, Li Keqiang, future premier of China, said that when he wanted to know what was happening to the country’s economy, he looked at the numbers for electricity use, rail cargo and bank lending. There was no point using the official GDP statistics, Li said, because they are ‘man-made’. That remark, which we know about thanks to WikiLeaks, is fascinating for two reasons. First, because it shows a sly, subtle, worldly humour – a rare glimpse of the sort of thing Chinese Communist Party leaders say in private. Second, because it’s true. A whole strand in contemporary thinking about the production of knowledge is summed up there: data and statistics, all of them, are man-made.

They are also central to modern politics and governance, and the ways we talk about them. That in itself represents a shift. Discussions that were once about values and beliefs – about what a society wants to see when it looks at itself in the mirror – have increasingly turned to arguments about numbers, data, statistics. It is a quirk of history that the politician who introduced this style of debate wasn’t Harold Wilson, the only prime minister to have had extensive training in statistics, but Margaret Thatcher, who thought in terms of values but argued in terms of numbers. Even debates that are ultimately about national identity, such as the referendums about Scottish independence and EU membership, now turn on numbers.

Given the ubiquity of this style of argument, we are nowhere near as attentive to its misuses as we should be. As the House of Commons Treasury Committee said dryly in a 2016 report on the economic debate about EU membership, ‘many of these claims sound factual because they use numbers.’ The best short book about the use and misuse of statistics is Darrell Huff’s How to Lie with Statistics, first published in 1954, a devil’s-advocate guide to the multiple ways in which numbers are misused in advertising, commerce and politics. (Single best tip: ‘up to’ is always a fib. It means somebody did a range of tests and has artfully chosen the most flattering number.) For all its virtues, though, even Huff’s book doesn’t encompass the full range of possibilities for statistical deception. In politics, the numbers in question aren’t just man-made but are often contentious, tendentious or outright fake.

Two fake numbers have been decisively influential in British politics over the baleful last thirteen years. The first was an outright lie: Vote Leave’s assertion that £350 million a week extra ‘for the NHS’ would be available if we left the EU. The real number for the UK’s net contribution to the EU was £110 million a week, but that didn’t matter, since the crucial thing for the Leave campaign was to make the number the focus of debate. The Treasury Committee said the number was fake, and so did the UK Statistics Authority. This had no, or perhaps even a negative, effect. In politics it doesn’t really matter what the numbers are, so much as whose they are. If people are arguing about your numbers, you’re winning…(More)”.

On the culture of open access: the Sci-hub paradox


Paper by Abdelghani Maddi and David Sapinho: “Shadow libraries, also known as ‘pirate libraries’, are online collections of copyrighted publications that have been made available for free without the permission of the copyright holders. They have gradually become key players in scientific knowledge dissemination, despite their illegality in most countries of the world. Many publishers and scientist-editors decry such libraries for their copyright infringement and loss of publication usage information, while some scholars and institutions support them, sometimes in a roundabout way, for their role in reducing inequalities of access to knowledge, particularly in low-income countries. Although there is a wealth of literature on shadow libraries, none of it has focused on their potential role in knowledge dissemination through the open access movement. Here we analyze how shadow libraries can affect researchers’ citation practices, highlighting some counter-intuitive findings about their impact on the Open Access Citation Advantage (OACA). Based on a large randomized sample, this study first shows that OA publications, including those in fully OA journals, receive more citations than their subscription-based counterparts do. However, the OACA has slightly decreased over the last seven years. Distinguishing, among subscription-based publications, between those that are and are not accessible via the Sci-hub platform suggests that the widespread use of Sci-hub cancels out the positive effect of OA publishing. The results show that publications in fully OA journals are victims of the success of Sci-hub. Thus, paradoxically, although Sci-hub may seem to facilitate access to scientific knowledge, it negatively affects the OA movement as a whole by reducing the comparative advantage of OA publications in terms of visibility for researchers.
The democratization of the use of Sci-hub may therefore lead to a vicious cycle, hindering efforts to develop full OA strategies without proposing a credible and sustainable alternative model for the dissemination of scientific knowledge…(More)”.
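To make the paradox concrete, here is a minimal sketch that assumes a simplified definition of the citation advantage: the mean citations of OA papers divided by the mean citations of a comparison group, minus one. The paper's actual indicator (for example, field-normalised citation scores or a regression model) may differ, and the citation counts below are invented for illustration. Splitting subscription papers by whether they are on Sci-hub shows how the same OA papers can look strongly advantaged against one comparison group and barely advantaged against the other.

```python
from statistics import mean
from typing import Sequence


def citation_advantage(oa: Sequence[float], reference: Sequence[float]) -> float:
    """Relative citation advantage: mean(OA) / mean(reference) - 1.

    A simplified, assumed definition for illustration; not the paper's indicator.
    """
    ref = mean(reference)
    if ref == 0:
        raise ValueError("reference group has zero mean citations")
    return mean(oa) / ref - 1


# Hypothetical citation counts per paper, for illustration only.
oa = [12, 8, 15, 10]
closed_on_scihub = [11, 9, 14, 10]
closed_not_on_scihub = [6, 4, 7, 5]

print(f"OA vs closed, on Sci-hub:     {citation_advantage(oa, closed_on_scihub):+.2f}")
print(f"OA vs closed, not on Sci-hub: {citation_advantage(oa, closed_not_on_scihub):+.2f}")
```

Under these made-up numbers, the OA advantage nearly disappears when the comparison group is subscription papers that readers can already obtain through Sci-hub.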