Online Bettors Can Sniff Out Weak Psychology Studies


Ed Yong at the Atlantic: “Psychologists are in the midst of an ongoing, difficult reckoning. Many believe that their field is experiencing a “reproducibility crisis,” because they’ve tried and failed to repeat experiments done by their peers. Even classic results—the stuff of textbooks and TED talks—have proven surprisingly hard to replicate, perhaps because they’re the results of poor methods and statistical tomfoolery. These problems have spawned a community of researchers dedicated to improving the practices of their field and forging a more reliable way of doing science.

These attempts at reform have met resistance. Critics have argued that the so-called crisis is nothing of the sort, and that researchers who have failed to repeat past experiments were variously incompetentprejudiced, or acting in bad faith.

But if those critiques are correct, then why is it that scientists seem to be remarkably good at predicting which studies in psychology and other social sciences will replicate, and which will not?

Consider the new results from the Social Sciences Replication Project, in which 24 researchers attempted to replicate social-science studies published between 2010 and 2015 in Nature and Science—the world’s top two scientific journals. The replicators ran much bigger versions of the original studies, recruiting around five times as many volunteers as before. They did all their work in the open, and ran their plans past the teams behind the original experiments. And ultimately, they could only reproduce the results of 13 out of 21 studies—62 percent.

As it turned out, that finding was entirely predictable. While the SSRP team was doing their experimental re-runs, they also ran a “prediction market”—a stock exchange in which volunteers could buy or sell “shares” in the 21 studies, based on how reproducible they seemed. They recruited 206 volunteers—a mix of psychologists and economists, students and professors, none of whom were involved in the SSRP itself. Each started with $100 and could earn more by correctly betting on studies that eventually panned out.

At the start of the market, shares for every study cost $0.50 each. As trading continued, those prices soared and dipped depending on the traders’ activities. And after two weeks, the final price reflected the traders’ collective view on the odds that each study would successfully replicate. So, for example, a stock price of $0.87 would mean a study had an 87 percent chance of replicating. Overall, the traders thought that studies in the market would replicate 63 percent of the time—a figure that was uncannily close to the actual 62-percent success rate.

The traders’ instincts were also unfailingly sound when it came to individual studies. Look at the graph below. The market assigned higher odds of success for the 13 studies that were successfully replicated than the eight that weren’t—compare the blue diamonds to the yellow diamonds….(More)”.

Citizen science, public policy


Paper by Christi J. GuerriniMary A. Majumder,  Meaganne J. Lewellyn, and Amy L. McGuire in Science: “Citizen science initiatives that support collaborations between researchers and the public are flourishing. As a result of this enhanced role of the public, citizen science demonstrates more diversity and flexibility than traditional science and can encompass efforts that have no institutional affiliation, are funded entirely by participants, or continuously or suddenly change their scientific aims.

But these structural differences have regulatory implications that could undermine the integrity, safety, or participatory goals of particular citizen science projects. Thus far, citizen science appears to be addressing regulatory gaps and mismatches through voluntary actions of thoughtful and well-intentioned practitioners.

But as citizen science continues to surge in popularity and increasingly engage divergent interests, vulnerable populations, and sensitive data, it is important to consider the long-term effectiveness of these private actions and whether public policies should be adjusted to complement or improve on them. Here, we focus on three policy domains that are relevant to most citizen science projects: intellectual property (IP), scientific integrity, and participant protections….(More)”.

Data Science Thinking: The Next Scientific, Technological and Economic Revolution


Book by Longbing Cao: “This book explores answers to the fundamental questions driving the research, innovation and practices of the latest revolution in scientific, technological and economic development: how does data science transform existing science, technology, industry, economy, profession and education?  How does one remain competitive in the data science field? What is responsible for shaping the mindset and skillset of data scientists?

Data Science Thinking paints a comprehensive picture of data science as a new scientific paradigm from the scientific evolution perspective, as data science thinking from the scientific-thinking perspective, as a trans-disciplinary science from the disciplinary perspective, and as a new profession and economy from the business perspective.

The topics cover an extremely wide spectrum of essential and relevant aspects of data science, spanning its evolution, concepts, thinking, challenges, discipline, and foundation, all the way to industrialization, profession, education, and the vast array of opportunities that data science offers. The book’s three parts each detail layers of these different aspects….(More)”.

Behavioural science and policy: where are we now and where are we going?


Michael Sanders et al in Behavioral Public Policy: “The use of behavioural sciences in government has expanded and matured in the last decade. Since the Behavioural Insights Team (BIT) has been part of this movement, we sketch out the history of the team and the current state of behavioural public policy, recognising that other works have already told this story in detail. We then set out two clusters of issues that have emerged from our work at BIT. The first cluster concerns current challenges facing behavioural public policy: the long-term effects of interventions; repeated exposure effects; problems with proxy measures; spillovers and general equilibrium effects and unintended consequences; cultural variation; ‘reverse impact’; and the replication crisis. The second cluster concerns opportunities: influencing the behaviour of government itself; scaling interventions; social diffusion; nudging organisations; and dealing with thorny problems. We conclude that the field will need to address these challenges and take these opportunities in order to realise the full potential of behavioural public policy….(More)”.

Odd Numbers: Algorithms alone can’t meaningfully hold other algorithms accountable


Frank Pasquale at Real Life Magazine: “Algorithms increasingly govern our social world, transforming data into scores or rankings that decide who gets credit, jobs, dates, policing, and much more. The field of “algorithmic accountability” has arisen to highlight the problems with such methods of classifying people, and it has great promise: Cutting-edge work in critical algorithm studies applies social theory to current events; law and policy experts seem to publish new articles daily on how artificial intelligence shapes our lives, and a growing community of researchers has developed a field known as “Fairness, Accuracy, and Transparency in Machine Learning.”

The social scientists, attorneys, and computer scientists promoting algorithmic accountability aspire to advance knowledge and promote justice. But what should such “accountability” more specifically consist of? Who will define it? At a two-day, interdisciplinary roundtable on AI ethics I recently attended, such questions featured prominently, and humanists, policy experts, and lawyers engaged in a free-wheeling discussion about topics ranging from robot arms races to computationally planned economies. But at the end of the event, an emissary from a group funded by Elon Musk and Peter Thiel among others pronounced our work useless. “You have no common methodology,” he informed us (apparently unaware that that’s the point of an interdisciplinary meeting). “We have a great deal of money to fund real research on AI ethics and policy”— which he thought of as dry, economistic modeling of competition and cooperation via technology — “but this is not the right group.” He then gratuitously lashed out at academics in attendance as “rent seekers,” largely because we had the temerity to advance distinctive disciplinary perspectives rather than fall in line with his research agenda.

Most corporate contacts and philanthrocapitalists are more polite, but their sense of what is realistic and what is utopian, what is worth studying and what is mere ideology, is strongly shaping algorithmic accountability research in both social science and computer science. This influence in the realm of ideas has powerful effects beyond it. Energy that could be put into better public transit systems is instead diverted to perfect the coding of self-driving cars. Anti-surveillance activism transmogrifies into proposals to improve facial recognition systems to better recognize all faces. To help payday-loan seekers, developers might design data-segmentation protocols to show them what personal information they should reveal to get a lower interest rate. But the idea that such self-monitoring and data curation can be a trap, disciplining the user in ever finer-grained ways, remains less explored. Trying to make these games fairer, the research elides the possibility of rejecting them altogether….(More)”.

Searching for the Smart City’s Democratic Future


Article by Bianca Wylie at the Center for International Governance Innovation: “There is a striking blue building on Toronto’s eastern waterfront. Wrapped top to bottom in bright, beautiful artwork by Montreal illustrator Cecile Gariepy, the building — a former fish-processing plant — stands out alongside the neighbouring parking lots and a congested highway. It’s been given a second life as an office for Sidewalk Labs — a sister company to Google that is proposing a smart city development in Toronto. Perhaps ironically, the office is like the smart city itself: something old repackaged to be light, fresh and novel.

“Our mission is really to use technology to redefine urban life in the twenty-first century.”

Dan Doctoroff, CEO of Sidewalk Labs, shared this mission in an interview with Freakonomics Radio. The phrase is a variant of the marketing language used by the smart city industry at large. Put more simply, the term “smart city” is usually used to describe the use of technology and data in cities.

No matter the words chosen to describe it, the smart city model has a flaw at its core: corporations are seeking to exert influence on urban spaces and democratic governance. And because most governments don’t have the policy in place to regulate smart city development — in particular, projects driven by the fast-paced technology sector — this presents a growing global governance concern.

This is where the story usually descends into warnings of smart city dystopia or failure. Loads of recent articles have detailed the science fiction-style city-of-the-future and speculated about the perils of mass data collection, and for good reason — these are important concepts that warrant discussion. It’s time, however, to push past dystopian narratives and explore solutions for the challenges that smart cities present in Toronto and globally…(More)”.

A roadmap for restoring trust in Big Data


Mark Lawler et al in the Lancet: “The fallout from the Cambridge Analytica–Facebook scandal marks a significant inflection point in the public’s trust concerning Big Data. The health-science community must use this crisis-in-confidence to redouble its commitment to talk openly and transparently about benefits and risks and to act decisively to deliver robust effective governance frameworks, under which personal health data can be responsibly used. Activities such as the Innovative Medicines Initiative’s Big Data for Better Outcomes emphasise how a more granular data-driven understanding of human diseases including cancer could underpin innovative therapeutic intervention.
 Health Data Research UK is developing national research expertise and infrastructure to maximise the value of health data science for the National Health Service and ultimately British citizens.
Comprehensive data analytics are crucial to national programmes such as the US Cancer Moonshot, the UK’s 100 000 Genomes project, and other national genomics programmes. Cancer Core Europe, a research partnership between seven leading European oncology centres, has personal data sharing at its core. The Global Alliance for Genomics and Health recently highlighted the need for a global cancer knowledge network to drive evidence-based solutions for a disease that kills more than 8·7 million citizens annually worldwide. These activities risk being fatally undermined by the recent data-harvesting controversy.
We need to restore the public’s trust in data science and emphasise its positive contribution in addressing global health and societal challenges. An opportunity to affirm the value of data science in Europe was afforded by Digital Day 2018, which took place on April 10, 2018, in Brussels, and where European Health Ministers signed a declaration of support to link existing or future genomic databanks across the EU, through the Million European Genomes Alliance.
So how do we address evolving challenges in analysis, sharing, and storage of information, ensure transparency and confidentiality, and restore public trust? We must articulate a clear Social Contract, where citizens (as data donors) are at the heart of decision-making. We need to demonstrate integrity, honesty, and transparency as to what happens to data and what level of control people can, or cannot, expect. We must embed ethical rigour in all our data-driven processes. The Framework for Responsible Sharing of Genomic and Health Related Data represents a practical global approach, promoting effective and ethical sharing and use of research or patient data, while safeguarding individual privacy through secure and accountable data transfer…(More)”.

Americans Want to Share Their Medical Data. So Why Can’t They?


Eleni Manis at RealClearHealth: “Americans are willing to share personal data — even sensitive medical data — to advance the common good. A recent Stanford University study found that 93 percent of medical trial participants in the United States are willing to share their medical data with university scientists and 82 percent are willing to share with scientists at for-profit companies. In contrast, less than a third are concerned that their data might be stolen or used for marketing purposes.

However, the majority of regulations surrounding medical data focus on individuals’ ability to restrict the use of their medical data, with scant attention paid to supporting the ability to share personal data for the common good. Policymakers can begin to right this balance by establishing a national medical data donor registry that lets individuals contribute their medical data to support research after their deaths. Doing so would help medical researchers pursue cures and improve health care outcomes for all Americans.

Increased medical data sharing facilitates advances in medical science in three key ways. First, de-identified participant-level data can be used to understand the results of trials, enabling researchers to better explicate the relationship between treatments and outcomes. Second, researchers can use shared data to verify studies and identify cases of data fraud and research misconduct in the medical community. For example, one researcher recently discovered a prolific Japanese anesthesiologist had falsified data for almost two decades. Third, shared data can be combined and supplemented to support new studies and discoveries.

Despite these benefits, researchers, research funders, and regulators have struggled to establish a norm for sharing clinical research data. In some cases, regulatory obstacles are to blame. HIPAA — the federal law regulating medical data — blocks some sharing on grounds of patient privacy, while federal and state regulations governing data sharing are inconsistent. Researchers themselves have a proprietary interest in data they produce, while academic researchers seeking to maximize publications may guard data jealously.

Though funding bodies are aware of this tension, they are unable to resolve it on their own. The National Institutes of Health, for example, requires a data sharing plan for big-ticket funding but recognizes that proprietary interests may make sharing impossible….(More)”.

#TrendingLaws: How can Machine Learning and Network Analysis help us identify the “influencers” of Constitutions?


Unicef: “New research by scientists from UNICEF’s Office of Innovation — published today in the journal Nature Human Behaviour — applies methods from network science and machine learning to constitutional law.  UNICEF Innovation Data Scientists Alex Rutherford and Manuel Garcia-Herranz collaborated with computer scientists and political scientists at MIT, George Washington University, and UC Merced to apply data analysis to the world’s constitutions over the last 300 years. This work sheds new light on how to better understand why countries’ laws change and incorporate social rights…

Data science techniques allow us to use methods like network science and machine learning to uncover patterns and insights that are hard for humans to see. Just as we can map influential users on Twitter — and patterns of relations between places to predict how diseases will spread — we can identify which countries have influenced each other in the past and what are the relations between legal provisions.

Why The Science of Constitutions?

One way UNICEF fulfills its mission is through advocacy with national governments — to enshrine rights for minorities, notably children, formally in law. Perhaps the most renowned example of this is the International Convention on the Rights of the Child (ICRC).

Constitutions, such as Mexico’s 1917 constitution — the first to limit the employment of children — are critical to formalizing rights for vulnerable populations. National constitutions describe the role of a country’s institutions, its character in the eyes of the world, as well as the rights of its citizens.

From a scientific standpoint, the work is an important first step in showing that network analysis and machine learning technique can be used to better understand the dynamics of caring for and protecting the rights of children — critical to the work we do in a complex and interconnected world. It shows the significant, and positive policy implications of using data science to uphold children’s rights.

What the Research Shows:

Through this research, we uncovered:

  • A network of relationships between countries and their constitutions.
  • A natural progression of laws — where fundamental rights are a necessary precursor to more specific rights for minorities.
  • The effect of key historical events in changing legal norms….(More)”.

The Democratization of Data Science


Jonathan Cornelissen at Harvard Business School: “Want to catch tax cheats? The government of Rwanda does — and it’s finding them by studying anomalies in revenue-collection data.

Want to understand how American culture is changing? So does a budding sociologist in Indiana. He’s using data science to find patterns in the massive amounts of text people use each day to express their worldviews — patterns that no individual reader would be able to recognize.

Intelligent people find new uses for data science every day. Still, despite the explosion of interest in the data collected by just about every sector of American business — from financial companies and health care firms to management consultancies and the government — many organizations continue to relegate data-science knowledge to a small number of employees.

That’s a mistake — and in the long run, it’s unsustainable. Think of it this way: Very few companies expect only professional writers to know how to write. So why ask onlyprofessional data scientists to understand and analyze data, at least at a basic level?

Relegating all data knowledge to a handful of people within a company is problematic on many levels. Data scientists find it frustrating because it’s hard for them to communicate their findings to colleagues who lack basic data literacy. Business stakeholders are unhappy because data requests take too long to fulfill and often fail to answer the original questions. In some cases, that’s because the questioner failed to explain the question properly to the data scientist.

Why would non–data scientists need to learn data science? That’s like asking why non-accountants should be expected to stay within budget.

These days every industry is drenched in data, and the organizations that succeed are those that most quickly make sense of their data in order to adapt to what’s coming. The best way to enable fast discovery and deeper insights is to disperse data science expertise across an organization.

Companies that want to compete in the age of data need to do three things: share data tools, spread data skills, and spread data responsibility…(More)”.