Paper by Xuan Zhang et al: “Health care fraud is a serious problem that impacts every patient and consumer. This fraudulent behavior causes excessive financial losses and significant patient harm every year. Healthcare fraud includes health insurance fraud, fraudulent billing of insurers for services not provided, and exaggeration of medical services. Identifying healthcare fraud thus becomes an urgent task in preventing the abuse and waste of public funds. Existing methods in this research field usually rely on classified government data, which greatly compromises their generalizability and scope of application. This paper introduces a methodology that uses publicly available data sources to identify potentially fraudulent behavior among physicians. The research involved pairing multiple datasets, selecting useful features, comparing classification models, and analyzing useful predictors. Our performance evaluation results clearly demonstrate the efficacy of the proposed method….(More)”.
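As a rough illustration of the workflow the abstract describes (data pairing, feature selection, and model comparison), here is a minimal sketch; the file names, column names, and fraud label are hypothetical, not the paper’s actual data or schema:

```python
# Hypothetical sketch (not the paper's actual schema): pair public datasets
# on a physician identifier, build features, and compare classifiers.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Pair two publicly available datasets on a shared physician ID.
payments = pd.read_csv("open_payments.csv")            # hypothetical file
utilization = pd.read_csv("medicare_utilization.csv")  # hypothetical file
data = payments.merge(utilization, on="physician_id")

# Select candidate features and a (hypothetical) fraud label, e.g. derived
# from a public exclusions list.
X = data[["total_payment_usd", "num_claims", "avg_charge_per_service"]]
y = data["fraud_label"]

# Compare classification models with cross-validated AUC.
for model in (LogisticRegression(max_iter=1000), RandomForestClassifier()):
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(type(model).__name__, round(auc, 3))
```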
Big Data: the End of the Scientific Method?
Paper by S. Succi and P.V. Coveney at arXiv: “We argue that the boldest claims of Big Data are in need of revision and toning-down, in view of a few basic lessons learned from the science of complex systems. We point out that, once the most extravagant claims of Big Data are properly discarded, a synergistic merging of BD with big theory offers considerable potential to spawn a new scientific paradigm capable of overcoming some of the major barriers confronted by the modern scientific method originating with Galileo. These obstacles are due to the presence of nonlinearity, nonlocality and hyperdimensions which one encounters frequently in multiscale modelling….(More)”.
We Need Transparency in Algorithms, But Too Much Can Backfire
Kartik Hosanagar and Vivian Jair at Harvard Business Review: “In 2013, Stanford professor Clifford Nass faced a student revolt. Nass’s students claimed that those in one section of his technology interface course received higher grades on the final exam than counterparts in another. Unfortunately, they were right: two different teaching assistants had graded the two different sections’ exams, and one had been more lenient than the other. Students with similar answers had ended up with different grades.
Nass, a computer scientist, recognized the unfairness and created a technical fix: a simple statistical model to adjust scores, where students got a certain percentage boost on their final mark when graded by a TA known to give grades that percentage lower than average. In the spirit of openness, Nass sent out emails to the class with a full explanation of his algorithm. Further complaints poured in, some even angrier than before. Where had he gone wrong?…
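The article does not give Nass’s exact formula, but the adjustment it describes might look like the following sketch (a hypothetical reconstruction, not his actual model):

```python
# Hypothetical reconstruction of the adjustment described above (not
# Nass's actual model): boost a student's mark by the percentage that
# their TA grades below the class-wide average.
from statistics import mean

grades = [("ta_a", 78), ("ta_a", 82), ("ta_b", 90), ("ta_b", 88)]

overall = mean(score for _, score in grades)
by_ta = {}
for ta, score in grades:
    by_ta.setdefault(ta, []).append(score)
ta_avg = {ta: mean(scores) for ta, scores in by_ta.items()}

def adjusted(ta: str, raw: float) -> float:
    # If this TA grades x% below the overall mean, boost the raw mark by x%.
    boost = max(0.0, (overall - ta_avg[ta]) / overall)
    return raw * (1 + boost)

for ta, raw in grades:
    print(ta, raw, "->", round(adjusted(ta, raw), 1))
```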
Kizilcec had in fact tested three levels of transparency: low and medium but also high, where the students got not only a paragraph explaining the grading process but also their raw peer-graded scores and how these were each precisely adjusted by the algorithm to get to a final grade. And this is where the results got more interesting. In the experiment, while medium transparency increased trust significantly, high transparency eroded it completely, to the point where trust levels were either equal to or lower than those of students experiencing low transparency.
Making Modern AI Transparent: A Fool’s Errand?
What are businesses to take home from this experiment? It suggests that technical transparency – revealing the source code, inputs, and outputs of the algorithm – can build trust in many situations. But most algorithms in the world today are created and managed by for-profit companies, and many businesses regard their algorithms as highly valuable forms of intellectual property that must remain in a “black box.” Some lawmakers have proposed a compromise, suggesting that the source code be revealed to regulators or auditors in the event of a serious problem, with this adjudicator assuring consumers that the process is fair.
This approach merely shifts the burden of belief from the algorithm itself to the regulators. This may be a palatable solution in many arenas: for example, few of us fully understand financial markets, so we trust the SEC to take on oversight. But in a world where decisions large and small, personal and societal, are being handed over to algorithms, this becomes less acceptable.
Another problem with technical transparency is that it makes algorithms vulnerable to gaming. If an instructor releases the complete source code for an algorithm grading student essays, it becomes easy for students to exploit loopholes in the code: maybe, for example, the algorithm seeks evidence that the students have done research by looking for phrases such as “according to published research.” A student might then deliberately use this language at the start of every paragraph in her essay.
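A toy version of that loophole, assuming an entirely hypothetical phrase-matching grader, shows how easily public source code can be gamed:

```python
# Entirely hypothetical grading rule, shown only to illustrate the loophole:
# once the source is public, the phrase check is trivial to game.
RESEARCH_PHRASES = ("according to published research", "studies show")

def research_score(essay: str) -> int:
    """Award a point for each paragraph that appears to cite research."""
    paragraphs = essay.lower().split("\n\n")
    return sum(any(p in para for p in RESEARCH_PHRASES) for para in paragraphs)

honest = "My methodology was as follows.\n\nThe data suggest a trend."
gamed = ("According to published research, claim one.\n\n"
         "According to published research, claim two.")
print(research_score(honest), research_score(gamed))  # 0 2
```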
But the biggest problem is that modern AI is making source code – transparent or not – less relevant compared with other factors in algorithmic functioning. Specifically, machine learning algorithms – and deep learning algorithms in particular – are usually built on just a few hundred lines of code. The algorithm’s logic is mostly learned from training data and is rarely reflected in its source code. Which is to say, some of today’s best-performing algorithms are often the most opaque. High transparency might involve getting our heads around reams and reams of data – and then still only being able to guess at what lessons the algorithm has learned from it.
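A minimal sketch makes the point concrete: the listing below is the complete “source code” of a learned classifier, yet it reveals nothing about the decisions the model will make, because those are determined by the (here synthetic) training data:

```python
# The complete "source code" of a learned classifier. Reading it reveals
# almost nothing about how it will decide: swap in different training data
# and this identical listing produces entirely different behaviour.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for a real training set.
X_train, y_train = make_classification(n_samples=1000, n_features=20,
                                        random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
print(model.predict(X_train[:5]))  # the logic lives in the data, not here
```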
This is where Kizilcec’s work becomes relevant – a way to embrace rather than despair over deep learning’s impenetrability. His work shows that users will not trust black box models, but they don’t need – or even want – extremely high levels of transparency. That means responsible companies need not fret over what percentage of source code to reveal, or how to help users “read” massive datasets. Instead, they should work to provide basic insights on the factors driving algorithmic decisions….(More)”
What top technologies should the next generation know how to use?
Lottie Waters at Devex: “Technology provides some great opportunities for global development, and a promising future. But for the next generation of professionals to succeed, it’s vital they stay up to date with the latest tech, innovations, and tools.
In a recent report produced by Devex in collaboration with the United States Agency for International Development and DAI, some 86 percent of survey respondents believe the technology, skills, and approaches development professionals will be using in 10 years’ time will be significantly different to today’s.
In fact, “technology for development” is regarded as the sector that will see the most development progress, but is also cited as the one that will see the biggest changes in skills required, according to the survey.
“As different technologies develop, new possibilities will open up that we may not even be aware of yet. These opportunities will bring new people into the development sector and require those in it to be more agile in adapting technologies to meet development challenges,” said one survey respondent.
While “blockchain,” “artificial intelligence,” and “drones” may be the current buzzwords surrounding tech in global development, geographical information systems, or GIS, and big data are actually the top technologies respondents believe the next generation of development professionals should learn how to utilize.
So, how are these technologies currently being used in development, how might this change in the near future, and what will their impact be in the next 10 years? Devex spoke with experts in the field who are already integrating these technologies into their work to find out….(More)”
How games can help craft better policy
Shrabonti Bagchi at LiveMint: “I have never seen economists having fun!” Anantha K. Duraiappah, director of Unesco-MGIEP (Mahatma Gandhi Institute of Education for Peace and Sustainable Development), was heard exclaiming during a recent conference. The academics in question were a group of environmental economists at an Indian Society for Ecological Economics conference in Thrissur, Kerala, and they were playing a game called Cantor’s World, in which each player assumes the role of the supreme leader of a country and gets to decide the fate of his or her nation.
Well, it’s not quite as simple as that (this is not Settlers Of Catan!). Players have to take decisions on long-term goals like education and industrialization based on data such as GDP, produced capital, human capital, and natural resources while adhering to the UN’s sustainable development goals. The game is probably the most accessible and enjoyable way of seeing how long-term policy decisions change and impact the future of countries.
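A toy turn of such a simulation might look like the sketch below; the state variables follow the article’s description, but the update rules are invented for illustration and are not the game’s actual mechanics:

```python
# Toy turn of a nation-building simulation, loosely inspired by the
# article's description of Cantor's World. The state variables follow the
# description; the update rules are invented for illustration only.
from dataclasses import dataclass

@dataclass
class Nation:
    gdp: float
    produced_capital: float
    human_capital: float
    natural_capital: float

    def take_turn(self, edu_share: float, industry_share: float) -> None:
        """Allocate shares of GDP to long-term goals (hypothetical rules)."""
        self.human_capital += edu_share * self.gdp
        self.produced_capital += industry_share * self.gdp
        # Industrialisation depletes natural resources in this toy model.
        self.natural_capital -= 0.5 * industry_share * self.gdp
        self.gdp = 0.1 * (self.produced_capital + self.human_capital
                          + self.natural_capital)

country = Nation(gdp=100, produced_capital=500, human_capital=400,
                 natural_capital=300)
country.take_turn(edu_share=0.3, industry_share=0.4)
print(country)
```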
That’s what Fields Of View does. The Bengaluru-based non-profit creates games, simulations and learning tools for the better understanding of policy and its impact. Essentially, their work is to make sure economists like the ones at the Thrissur conference actually have some fun while thrashing out crucial issues of public policy.
![A screen grab from ‘Cantor’s World’.](https://www.livemint.com/rf/Image-621x414/LiveMint/Period2/2018/06/23/Photos/Processed/game2.jpg)
Can policymaking be made more relevant to the lives of people affected by it? Can policymaking be more responsive to a dynamic social-economic-environmental context? Can we reduce the time taken for a policy to go from the drawing board to implementation? These were some of the questions the founders of Fields Of View, Sruthi Krishnan and Bharath M. Palavalli, set out to answer. “There are no binaries in policymaking. There are an infinite set of possibilities,” says Palavalli, who was named an Ashoka fellow in May for his work at the intersection of technology, social sciences and design.
Earlier this year, Fields Of View organized a session of one of its earliest games, City Game, for a group of 300 female college students in Mangaluru. City Game is a multiplayer offline game designed to explore urban infrastructure and help groups and individuals understand the dynamics of urban governance…(More)”.
Doing good data science
(This post is the first in a series on data ethics.) The hard thing about being an ethical data scientist isn’t understanding ethics. It’s the junction between ethical ideas and practice. It’s doing good data science.
There has been a lot of healthy discussion about data ethics lately. We want to be clear: that discussion is good, and necessary. But it’s also not the biggest problem we face. We already have good standards for data ethics. The ACM’s code of ethics, which dates back to 1993, is clear, concise, and surprisingly forward-thinking; 25 years later, it’s a great start for anyone thinking about ethics. The American Statistical Association has a good set of ethical guidelines for working with data. So, we’re not working in a vacuum.
And, while there are always exceptions, we believe that most people want to be fair. Data scientists and software developers don’t want to harm the people using their products. There are exceptions, of course; we call them criminals and con artists. Defining “fairness” is difficult, and perhaps impossible, given the many crosscutting layers of “fairness” that we might be concerned with. But we don’t have to solve that problem in advance, and it’s not going to be solved in a simple statement of ethical principles, anyway.
The problem we face is different: how do we put ethical principles into practice? We’re not talking about an abstract commitment to being fair. Ethical principles are worse than useless if we don’t allow them to change our practice, if they don’t have any effect on what we do day-to-day. For data scientists, whether you’re doing classical data analysis or leading-edge AI, that’s a big challenge. We need to understand how to build the software systems that implement fairness. That’s what we mean by doing good data science.
Any code of data ethics will tell you that you shouldn’t collect data from experimental subjects without informed consent. But that code won’t tell you how to implement “informed consent.” Informed consent is easy when you’re interviewing a few dozen people in person for a psychology experiment. Informed consent means something different when someone clicks on an item in an online catalog (hello, Amazon), and ads for that item start following them around ad infinitum. Do you use a pop-up to ask for permission to use their choice in targeted advertising? How many customers would you lose? Informed consent means something yet again when you’re asking someone to fill out a profile for a social site, and you might (or might not) use that data for any number of experimental purposes. Do you pop up a consent form in impenetrable legalese that basically says “we will use your data, but we don’t know for what”? Do you phrase this agreement as an opt-out, and hide it somewhere on the site where nobody will find it?…
To put ethical principles into practice, we need space to be ethical. We need the ability to have conversations about what ethics means, what it will cost, and what solutions to implement. As technologists, we frequently share best practices at conferences, write blog posts, and develop open source technologies—but we rarely discuss problems such as how to obtain informed consent.
There are several facets to this space that we need to think about.
First, we need corporate cultures in which discussions about fairness, about the proper use of data, and about the harm that can be done by inappropriate use of data can take place. In turn, this means that we can’t rush products out the door without thinking about how they’re used. We can’t allow “internet time” to mean ignoring the consequences. Indeed, computer security has shown us the consequences of ignoring the consequences: many companies that have never taken the time to implement good security practices and safeguards are now paying with damage to their reputations and their finances. We need to do the same when thinking about issues like fairness, accountability, and unintended consequences….(More)”.
Cloud Communities: The Dawn of Global Citizenship?
Robert Schuman Centre for Advanced Studies Research Paper by Liav Orgad and Rainer Baubock: “New digital technologies are rapidly changing the global economy and have connected billions of people in deterritorialised social networks. Will they also create new opportunities for global citizenship and alternatives to state-based political communities?
In his kick-off essay, Liav Orgad takes an optimistic view. Blockchain technology makes it possible to give every human being a unique legal persona and allows individuals to associate in ‘cloud communities’ that may take on several functions of territorial states. Fourteen commentators discuss this vision.
Sceptics assume that states or business corporations have always found ways to capture and use new technologies for their purposes. They emphasise that the political functions of states, including their task of protecting human rights, require territorial monopolies of legitimate coercion that cannot be provided by cloud communities.
Others point out that individuals would sort themselves into cloud communities that are internally homogeneous, which risks deepening political cleavages within territorial societies.
Finally, some authors are concerned that digital political communities will deepen global social inequalities by excluding those who are already worse off in the birthright lottery of territorial citizenship.
Optimists see instead the great potential of blockchain technology to overcome exclusion and marginalisation based on statelessness or sheer lack of civil registries; they regard it as a tool for enhancing individual freedom, since people are self-sovereign in controlling their personal data; and they emphasise the possibilities for emancipatory movements to mobilise for global justice across territorial borders or to create their own internally democratic political utopias.
In the boldest vision, the deficits of cloud communities as voluntary political associations with limited scope of power could be overcome in a global cryptodemocracy that lets all individuals participate on a one-person-one-vote basis in global political decisions….(More)”.
The ‘Datasphere’, Data Flows Beyond Control, and the Challenges for Law and Governance
Paper by Jean-Sylvestre Bergé, Stephane Grumbach and Vincenzo Zeno-Zencovich: “The flows of people, goods and capital, which have considerably increased in recent history, are leading to crises (e.g., migrants, tax evasion, food safety) which reveal the failure to control them. Much less visible, and not yet included in economic measurements, data flows have increased exponentially in the last two decades, with the digitisation of social and economic activities. A new space – Datasphere – is emerging, mostly supported by digital platforms which provide essential services reaching half of the world’s population directly. Their control over data flows raises new challenges to governance, and increasingly conflicts with public administration.
In this paper, we consider the need for and the difficulty of regulating this emerging space, and the different approaches followed on both sides of the Atlantic. We distinguish between three situations. We first consider data at rest, viewed from the standpoint of the location where data are physically stored. We then consider data in motion and the issues related to their combination. Finally, we investigate data in action, that is, data as vectors of command for legal or illegal activities over territories, with impacts on the economy and society as well as security, and we raise the governance challenges these entail.
The notion of ‘Datasphere’ proposes a holistic comprehension of all the ‘information’ existing on earth, originating both in natural and socio-economic systems, which can be captured in digital form, flows through networks, and is stored, processed and transformed by machines. It differs from ‘Cyberspace’, which is mostly concerned with the networks and technical instruments (from software and protocols to cables and data centers), together with the social activities they allow and the extent to which those activities could or should be allowed.
The paper suggests one – out of the many possible – approach to this new world. Clearly it would be impossible to delve in depth into all its facets, which are as many as those of the physical world. Rather, it attempts to show how traditional legal notions could usefully be deployed to bring order to a highly complex environment, avoiding a piecemeal approach that looks only at details….(More)”.
Algorithms are taking over – and woe betide anyone they class as a ‘deadbeat’
Zoe Williams at The Guardian: “The radical geographer and equality evangelist Danny Dorling tried to explain to me once why an algorithm could be bad for social justice.
Imagine if email inboxes became intelligent: your messages would be prioritised on arrival, so if the recipient knew you and often replied to you, you’d go to the top; I said that was fine. That’s how it works already. If they knew you and never replied, you’d go to the bottom, he continued. I said that was fair – it would teach me to stop annoying that person.
If you were a stranger, but typically other people replied to you very quickly – let’s say you were Barack Obama – you’d sail right to the top. That seemed reasonable. And if you were a stranger who others usually ignored, you’d fall off the face of the earth.
“Well, maybe they should get an allotment and stop emailing people,” I said.
“Imagine how angry those people would be,” Dorling said. “They already feel invisible and they [would] become invisible by design.”…
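Dorling’s thought experiment amounts to a few lines of scoring logic. A sketch of that hypothetical inbox ranker (our invention, following his description) shows how ignored strangers sink out of sight:

```python
# Sketch of the hypothetical "intelligent inbox" in Dorling's thought
# experiment: senders are ranked by reply rates, so strangers whom nobody
# answers become invisible by design.
def priority(reply_rate_to_me: float, global_reply_rate: float,
             known: bool) -> float:
    """Higher scores surface first; ignored strangers score near zero."""
    return reply_rate_to_me if known else global_reply_rate

inbox = [
    ("friend you often answer", priority(0.9, 0.5, known=True)),
    ("friend you always ignore", priority(0.0, 0.5, known=True)),
    ("Barack Obama (stranger everyone answers)", priority(0.0, 0.99, known=False)),
    ("stranger everyone ignores", priority(0.0, 0.01, known=False)),
]
for sender, score in sorted(inbox, key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {sender}")
```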
All our debates about the use of big data have centred on privacy, and all seem a bit distant: I care, in principle, whether or not Ocado knows what I bought on Amazon. But in my truest heart, I don’t really care whether or not my Frube vendor knows that I also like dystopian fiction of the 1970s.
I do, however, care that a program exists that will determine my eligibility for a loan by how often I call my mother. I care if landlords are using tools to rank their tenants by compliant behaviour, to create a giant, shared platform of desirable tenants, who never complain about black mould and greet each rent increase with a basket of muffins. I care if the police in Durham are using Experian credit scores to influence their custodial decisions, an example – as you may have guessed by its specificity – that is already real. I care that the same credit-rating company has devised a Mosaic score, which splits households into comically bigoted stereotypes: if your name is Liam and you are an “avid texter”, that puts you in “disconnected youth”, while if you’re Asha you’re in “crowded kaleidoscope”. It’s not a privacy issue so much as a profiling one, although, as anyone who has ever been the repeated victim of police stop-and-search could have told me years ago, these are frequently the same thing.
Privacy isn’t the right to keep secrets: it’s the right to be an individual, not a type; the right to make a choice that’s entirely your own; the right to be private….(More)”.
Information Asymmetries, Blockchain Technologies, and Social Change
Reflections by Stefaan Verhulst on “the potential (and challenges) of Distributed Ledgers for “Market for Lemons” Conditions: We live in a data age, and it has become common to extol the transformative power of data and information. It is now conventional to assume that many of our most pressing public problems—everything from climate change to terrorism to mass migration—are amenable to a “data fix.”
The truth, though, is a little more complicated. While there is no doubt that data—when analyzed and used responsibly—holds tremendous potential, many factors affect whether, and to what extent, that potential will ultimately be fulfilled.
Our ability to address complex public problems using data depends vitally on how our respective data ecosystems are designed (as well as ongoing questions of representation in, power over, and stewardship of these ecosystems).
Flaws in our data ecosystem not only prevent us from addressing problems; they may also be responsible for many societal failures and inequalities. These result from the fact that:
- some actors have better access to data than others;
- data is of poor quality (or even “fake”); contains implicit bias; and/or is not validated and thus not trusted;
- only easily accessible data are shared and integrated (“open washing”) while important data remain carefully hidden or without resources for relevant research and analysis; and more generally that
- even in an era of big and open data, information too often remains stove-piped, siloed, and generally difficult to access.
Several observers have pointed to the relationship between these information asymmetries and, for example, corruption, financial exclusion, global pandemics, forced mass migration, human rights abuses, and electoral fraud.
Consider the transaction costs, power inequities and other obstacles that result from such information asymmetries, namely:
– At the individual level: too often someone who is trying to open a bank account (or sign up for new cell phone service) is unable to provide all the requisite information, such as credit history, proof of address or other confirmatory and trusted attributes of identity. As such, information asymmetries are in effect limiting this individual’s access to financial and communications services.
– At the corporate level, a vast body of literature in economics has shown how uncertainty over the quality and trustworthiness of data can impose transaction costs, limit the development of markets for goods and services, or shut them down altogether. This is the well-known “market for lemons” problem made famous in a 1970 paper of the same name by George Akerlof; a worked toy version follows this list.
– At the societal or governance level, information asymmetries don’t just affect the efficiency of markets or social inequality. They can also incentivize unwanted behaviors that cause substantial public harm. Tyrants and corrupt politicians thrive on limiting their citizens’ access to information (e.g., information related to bank accounts, investment patterns or disbursement of public funds). Likewise, criminals operate and succeed in the information-scarce corners of the underground economy.
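The unravelling Akerlof described can be made concrete with a standard textbook toy version (illustrative numbers of ours, not the 1970 paper’s): quality q is uniform on [0,1], a seller values a car at q, a buyer at 3q/2, and only the seller observes q. At any price p, only cars with quality q at or below p are offered, so:

```latex
% Toy "market for lemons" (illustrative numbers, not Akerlof's own):
% q ~ Uniform[0,1]; seller values the car at q, buyer at 3q/2; only the
% seller observes q. At price p, only cars with q <= p are offered, so
% the buyer's expected value of an offered car is
\[
  \mathbb{E}\!\left[\tfrac{3q}{2} \,\middle|\, q \le p\right]
  = \tfrac{3}{2}\cdot\tfrac{p}{2}
  = \tfrac{3p}{4} < p .
\]
```

Since the buyer’s expected value always falls short of the asking price, trade collapses even though every car is worth more to buyers than to sellers: exactly the kind of market failure that better disclosure mechanisms aim to prevent.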
Blockchain technologies and Information Asymmetries
This is where blockchain comes in. At their core, blockchain technologies are a new type of disclosure mechanism that have the potential to address some of the information asymmetries listed above. There are many types of blockchain technologies, and while I use the blanket term ‘blockchain’ below for simplicity’s sake, the nuances between different types of blockchain technologies can greatly impact the character and likelihood of success of a given initiative.
By leveraging a shared and verified database of ledgers stored in a distributed manner, blockchain seeks to redesign information ecosystems in a more transparent, immutable, and trusted manner. Solving information asymmetries may be the real potential of blockchain, and this—much more than the current hype over virtual currencies—is the real reason to assess its potential….(More)”.
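As a minimal sketch of that disclosure mechanism, the toy ledger below chains each entry to the hash of its predecessor, so any retroactive edit is detectable; this is an illustration only, since real blockchains add consensus, signatures, and distribution across many nodes:

```python
# Toy hash-chained ledger: each block commits to its predecessor, so any
# retroactive edit breaks verification. Illustrative only; real blockchains
# add consensus, signatures, and distribution across many nodes.
import hashlib
import json

def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append(chain: list, record: str) -> None:
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"record": record, "prev_hash": prev})

def verify(chain: list) -> bool:
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

ledger: list = []
append(ledger, "Alice pays Bob 5")
append(ledger, "Bob pays Carol 2")
print(verify(ledger))                        # True
ledger[0]["record"] = "Alice pays Bob 500"   # tamper with history
print(verify(ledger))                        # False: the chain exposes the edit
```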