Big data’s ‘streetlight effect’: where and how we look affects what we see


 at the Conversation: “Big data offers us a window on the world. But large and easily available datasets may not show us the world we live in. For instance, epidemiological models of the recent Ebola epidemic in West Africa using big data consistently overestimated the risk of the disease’s spread and underestimated the local initiatives that played a critical role in controlling the outbreak.

Researchers are rightly excited about the possibilities offered by the availability of enormous amounts of computerized data. But there’s reason to stand back for a minute to consider what exactly this treasure trove of information really offers. Ethnographers like me use a cross-cultural approach when we collect our data because family, marriage and household mean different things in different contexts. This approach informs how I think about big data.

We’ve all heard the joke about the drunk who is asked why he is searching for his lost wallet under the streetlight, rather than where he thinks he dropped it. “Because the light is better here,” he said.

This “streetlight effect” is the tendency of researchers to study what is easy to study. I use this story in my course on Research Design and Ethnographic Methods to explain why so much research on disparities in educational outcomes is done in classrooms and not in students’ homes. Children are much easier to study at school than in their homes, even though many studies show that knowing what happens outside the classroom is important. Nevertheless, schools will continue to be the focus of most research because they generate big data and homes don’t.

The streetlight effect is one factor that prevents big data studies from being useful in the real world – especially studies analyzing easily available user-generated data from the Internet. Researchers assume that this data offers a window into reality. It doesn’t necessarily.

Looking at WEIRDOs

Based on the number of tweets following Hurricane Sandy, for example, it might seem as if the storm hit Manhattan the hardest, not the New Jersey shore. Another example: the since-retired Google Flu Trends, which in 2013 tracked online searches relating to flu symptoms to predict doctor visits, but gave estimates twice as high as reports from the Centers for Disease Control and Prevention. Without checking facts on the ground, researchers may fool themselves into thinking that their big data models accurately represent the world they aim to study.

The problem is similar to the “WEIRD” issue in many research studies. Harvard professor Joseph Henrich and colleagues have shown that findings based on research conducted with undergraduates at American universities – whom they describe as “some of the most psychologically unusual people on Earth” – apply only to that population and cannot be used to make any claims about other human populations, including other Americans. Unlike the typical research subject in psychology studies, they argue, most people in the world are not from Western, Educated, Industrialized, Rich and Democratic societies, i.e., WEIRD.

Twitter users are also atypical compared with the rest of humanity, giving rise to what our postdoctoral researcher Sarah Laborde has dubbed the “WEIRDO” problem of data analytics: most people are not Western, Educated, Industrialized, Rich, Democratic and Online.

Context is critical

Understanding the differences between the vast majority of humanity and that small subset of people whose activities are captured in big data sets is critical to correct analysis of the data. Considering the context and meaning of data – not just the data itself – is a key feature of ethnographic research, argues Michael Agar, who has written extensively about how ethnographers come to understand the world….(More: https://theconversation.com/big-datas-streetlight-effect-where-and-how-we-look-affects-what-we-see-58122)”

Towards a critique of cybernetic urbanism: The smart city and the society of control


Maroš Krivý at Planning Theory: “The smart city has become a hegemonic notion of urban governance, transforming and supplanting planning. The first part of this article reviews current critiques of this notion. Scholars present three main arguments against the smart city: that it is incompatible with an informal character of the city, that it subjects the city to corporate power and that it reproduces social and urban inequalities. It is argued that these critiques either misunderstand how power functions in the smart city or fail to address it as a specific modality of entrepreneurial urban governance. The second part advances an alternative critique, contending that the smart city should be understood as an urban embodiment of the society of control (Deleuze). The smart city is embedded in the intellectual framework of second order cybernetics and articulates urban subjectivity in terms of data flows. Planning as a political practice is superseded by an environmental-behavioural control, in which subjectivity is articulated supra-individually (permeating the city with sensing nodes) and infra-individually (making citizens into sensing nodes)….(More)”

Where are Human Subjects in Big Data Research? The Emerging Ethics Divide


Paper by Jacob Metcalf and Kate Crawford: “There are growing discontinuities between the research practices of data science and established tools of research ethics regulation. Some of the core commitments of existing research ethics regulations, such as the distinction between research and practice, cannot be cleanly exported from biomedical research to data science research. These discontinuities have led some data science practitioners and researchers to move toward rejecting ethics regulations outright. These shifts occur at the same time as a proposal for major revisions to the Common Rule, the primary regulation governing human-subjects research in the U.S., is under consideration for the first time in decades. We contextualize these revisions in long-running complaints about regulation of social science research, and argue data science should be understood as continuous with social sciences in this regard. The proposed regulations are more flexible and scalable to the methods of non-biomedical research, but they problematically exclude many data science methods from human-subjects regulation, particularly uses of public datasets. The ethical frameworks for big data research are highly contested and in flux, and the potential harms of data science research are unpredictable. We examine several contentious cases of research harms in data science, including the 2014 Facebook emotional contagion study and the 2016 use of geographical data techniques to identify the pseudonymous artist Banksy. To address disputes about human-subjects research ethics in data science, critical data studies should offer a historically nuanced theory of “data subjectivity” responsive to the epistemic methods, harms and benefits of data science and commerce….(More)”

Jakarta’s plans for predictive government


 at GovInsider: “Jakarta is predicting floods and traffic using complaints data, and plans to do so for dengue as well.

Its Smart City Unit has partnered with startup Qlue to build a dashboard, analysing data from online complaints, sensors and traffic apps. “Our algorithms can predict several things related to our reports such as flood, traffic, and others”, Qlue co-founder and CEO Rama Raditya told GovInsider.

Take floods, for instance. Using trends in complaints from citizens, water level history from sensors and weather data, it can predict the intensity of floods in specific locations next year. “They can predict what will happen when they compare the weather with the flood conditions from last year”, he said.

The city will start to predict dengue hotspots from next year, Rama said. The dashboard was not originally looking at dengue, but after receiving “thousands of complaints on dengue locations”, the government is now looking into this data. “Next year our algorithm will allow the government to know before it happens so they can prepare the amount of medication and so on within each district,” he said.

The dashboard is paired with an app. The app started with collecting citizens’ complaints and has been expanding with new features. It now has a virtual reality section to explore tourist sites in the city. Next week it is launching an augmented reality feature giving directions to nearby ATMs, restaurants, mosques and parks, Rama said.

Qlue has become a strategic part of the Jakarta administration, with the Governor himself using it to decide who to fire and promote. Following its rise in the capital city, it is now being used by 12 other cities across Indonesia: Bandung, Makassar, Bali, Manado, Surabaya, Bogor, Depok, Palembang, Bekasi, Yogyakarta, Riau and Semarang….(More)”

The Biggest Hope for Ending Corruption Is Open Public Contracting


Gavin Hayman at the Huffington Post: “This week the British Prime Minister David Cameron is hosting an international anti-corruption summit. The scourge of anonymous shell companies and hidden identities rightly seizes the public’s imagination. We can all picture the suitcases of cash and tropical islands involved. As well as acting on offshore and onshore money laundering havens, world leaders at the summit should also be asking themselves where all this money is being stolen from in the first place.

The answer is mostly from public contracting: government spending through private companies to deliver works, goods and services to citizens. It is technical, dull and universally obscure. But it is the single biggest item of spending by government – amounting to a staggering $9,500,000,000,000 each year. This concentration of money, government discretion, and secrecy makes public contracting so vulnerable to corruption. Data on prosecutions tracked by the OECD Anti-Bribery Convention shows that roughly 60% of bribes were paid to win public contracts.

Corruption in contracting deprives ordinary people of vital goods and services, and sometimes even kills: I was one of many Londoners moved by Ai Weiwei’s installation that memorialised the names of thousands of children killed in China’s Sichuan earthquake in 2008. Their supposedly earthquake-proof schools collapsed on them like tofu.

Beyond corruption, inefficiency and mismanagement of public contracts cost countries billions. Governments just don’t seem to know what they are buying, when, from whom, and whether they got a good price.

This problem can be fixed. But it will require a set of innovations best described as open contracting: using accessible open data and better engagement so that citizens, government and business can follow the money in government contracts from planning to tendering to performance and closure. The coordination required can be hard work but it is achievable: any country can make substantial progress on open contracting with some political leadership. My organisation supports an open data standard and a free global helpdesk to assist governments, civil society, and business in this transition….(More)”

Using Tweets and Posts to Speed Up Organ Donation


David Bornstein in the New York Times: “…But there is a problem: Demand for organ transplants vastly outstrips supply, as my colleague Tina Rosenberg has reported. In 2015 in the United States, there were only about 9,000 deceased donors (each of whom can save up to eight lives) and 6,000 living donors (who most often donate a kidney or liver lobe). Today, more than 121,000 people are on waiting lists, roughly 100,000 for kidney transplants, 15,000 for livers, and 4,000 for hearts. And the lists keep getting longer — 3,000 people are added to the kidney list each month. Last year, more than 4,000 people died while waiting for a new kidney; 3,600 dropped off the waiting list because they became too sick to qualify for a transplant.

Although 95 percent of Americans support organ donation, fewer than half of American adults are registered as donors. Research suggests that the number who donate organs after death could be increased greatly. Moreover, surveys indicate untapped support for living donation, too; nearly one in four people have told pollsters they would be willing to donate a kidney to save the life of a friend, community member or stranger. “If one in 10,000 Americans decided to donate each year, there wouldn’t be a shortage,” said Josh Morrison, who donated a kidney to a stranger and founded WaitList Zero, an organization that works to increase living kidney donation.

What could be done to harness people’s generous impulses more effectively to save lives?

One group attacking the question is Organize, which was founded in 2014 by Rick Segal’s son Greg, and Jenna Arnold, a media producer and educator who has worked with MTV and the United Nations in engaging audiences in social issues. Organize uses technology, open data and insights from behavioral economics to simplify becoming an organ donor.

This approach is shaking up longstanding assumptions.

For example, in the last four decades, people have most often been asked to register as an organ donor as part of renewing or obtaining a driver’s license. This made sense in the 1970s, when the nation’s organ procurement system was being set up, says Blair Sadler, the former president and chief executive of Rady Children’s Hospital in San Diego. He helped draft the Uniform Anatomical Gift Act in 1967, which established a national legal framework for organ donation. “Health care leaders were asking, ‘How do we make this more routine?’” he recalled. “It’s hard to get people to put it in their wills. Oh, there’s a place where people have to go every five years”: their state Department of Motor Vehicles.

Today, governments allow individuals to initiate registrations online, but the process can be cumbersome. For example, New York State required me to fill out a digital form on my computer, then print it out and mail it to Albany. Donate Life America, by contrast, allows individuals to register online as an organ donor just by logging in with email or a Facebook or Google account — much easier.

In practice, legal registration may be overemphasized. It may be just as important to simply make your wishes known to your loved ones. When people tell relatives, “If something happens to me, I want to be an organ donor,” families almost always respect their wishes. This is particularly important for minors, who cannot legally register as donors.

Using that insight, Organize is making it easier to conduct social media campaigns to both prompt and collect sentiments about organ donation from Facebook, Twitter and Instagram.

If you post or tweet about organ donation, or include a hashtag like #iwanttobeanorgandonor, #organdonor, #donatemyparts, or any of a number of other relevant terms, Organize captures the information and logs it in a registry. In a year, it has gathered the names of nearly 600,000 people who declare support for organ donation. Now the big question is: Will it actually increase organ donation rates?

We should begin getting an idea pretty soon. Organize has been working with the Nevada Donor Network to test its registry. And in the coming months, several other states will begin using it….(More)”

Regulatory Transformations: An Introduction


Chapter by Bettina Lange and Fiona Haines in the book Regulatory Transformations: “Regulation is no longer the prerogative of either states or markets. Increasingly citizens in association with businesses catalyse regulation which marks the rise of a social sphere in regulation. Around the world, in San Francisco, Melbourne, Munich and Mexico City, citizens have sought to transform how and to what end economic transactions are conducted. For instance, ‘carrot mob’ initiatives use positive economic incentives, not provided by a state legal system, but by a collective of civil society actors in order to change business behaviour. In contrast to ‘negative’ consumer boycotts, ‘carrotmob’ events use ‘buycotts’. They harness competition between businesses as the lever for changing how and for what purpose business transactions are conducted. Through new social media ‘carrotmobs’ mobilize groups of citizens to purchase goods at a particular time in a specific shop. The business that promises to spend the greatest percentage of its takings on, for instance, environmental improvements, such as switching to a supplier of renewable energy, will be selected for an organized shopping spree and financially benefit from the extra income it receives from the ‘carrot mob’ event. ‘Carrot mob’ campaigns chime with other fundamental challenges to conventional economic activity, such as the shared use of consumer goods through citizens’ collective consumption which questions traditional conceptions of private property….(More; Other Chapters)”

 

The promises and pitfalls of open urban data


Keynote by Robert M. Goerge at the 2016 Third International Conference on eDemocracy & eGovernment (ICEDEG): “Open data portals are springing up around the world. Municipalities, states and countries have made available data that has never been as accessible to the general public. These data have led to many applications that have informed the public of new urban conditions or provided information to make urban life easier. However, it should be clear that these data have limitations in the effort to solve many urban problems because in many cases they do not provide all of the information that is needed by government and NGOs to get at the cause or at least correlations of the problem at hand. It is still necessary to have access to data that cannot be made public to address some of the most serious urban problems. While this seems just to apply to public access, it is also the case that government employees or those with legitimate access to the necessary non-open data lack access because of legal, organizational, privacy, or bureaucratic issues. This limits the promise of increasing data-driven efforts to address the most critical urban issues. Solutions to these problems in the context of ethical behavior will be discussed….(More)”

‘Big data’ was supposed to fix education. It didn’t. It’s time for ‘small data’


Pasi Sahlberg and Jonathan Hasak in the Washington Post: “One thing that distinguishes schools in the United States from schools around the world is how data walls, which typically reflect standardized test results, decorate hallways and teacher lounges. Green, yellow, and red colors indicate levels of performance of students and classrooms. For serious reformers, this is the type of transparency that reveals more data about schools and is seen as part of the solution to how to conduct effective school improvement. These data sets, however, often don’t spark insight about teaching and learning in classrooms; they are based on analytics and statistics, not on emotions and relationships that drive learning in schools. They also report outputs and outcomes, not the impacts of learning on the lives and minds of learners….

If you are a leader of any modern education system, you probably care a lot about collecting, analyzing, storing, and communicating massive amounts of information about your schools, teachers, and students based on these data sets. This information is “big data,” a term that first appeared around 2000, which refers to data sets that are so large and complex that processing them by conventional data processing applications isn’t possible. Two decades ago, the type of data education management systems processed were input factors of the education system, such as student enrollments, teacher characteristics, or education expenditures handled by the education department’s statistical officer. Today, however, big data covers a range of indicators about teaching and learning processes, and increasingly reports on student achievement trends over time.

With the outpouring of data, international organizations continue to build regional and global data banks. Whether it’s the United Nations, the World Bank, the European Commission, or the Organization for Economic Cooperation and Development, today’s international reformers are collecting and handling more data about human development than before. Beyond government agencies, there are global education and consulting enterprises like Pearson and McKinsey that see business opportunities in big data markets.

Among the best known today is the OECD’s Program for International Student Assessment (PISA), which measures reading, mathematical, and scientific literacy of 15-year-olds around the world. OECD now also administers an Education GPS, or a global positioning system, that aims to tell policymakers where their education systems place in a global grid and how to move to desired destinations. OECD has clearly become a world leader in the big data movement in education.

Despite all this new information and benefits that come with it, there are clear handicaps in how big data has been used in education reforms. In fact, pundits and policymakers often forget that big data, at best, only reveals correlations between variables in education, not causality. As any introduction to statistics course will tell you, correlation does not imply causation….

We believe that it is becoming evident that big data alone won’t be able to fix education systems. Decision-makers need to gain a better understanding of what good teaching is and how it leads to better learning in schools. This is where information about details, relationships and narratives in schools becomes important. These are what Martin Lindstrom calls “small data”: small clues that uncover huge trends. In education, these small clues are often hidden in the invisible fabric of schools. Understanding this fabric must become a priority for improving education.

To be sure, there is not one right way to gather small data in education. Perhaps the most important next step is to realize the limitations of current big data-driven policies and practices. Too strong reliance on externally collected data may be misleading in policy-making. This is an example of what small data look like in practice:

  • It reduces census-based national student assessments to the necessary minimum and transfers saved resources to enhance the quality of formative assessments in schools and teacher education on other alternative assessment methods. Evidence shows that formative and other school-based assessments are much more likely to improve quality of education than conventional standardized tests.
  • It strengthens collective autonomy of schools by giving teachers more independence from bureaucracy and investing in teamwork in schools. This would enhance social capital, which has proved to be a critical aspect of building trust within education and enhancing student learning.
  • It empowers students by involving them in assessing and reflecting on their own learning and then incorporating that information into collective human judgment about teaching and learning (supported by national big data). Because there are different ways students can be smart in schools, no one way of measuring student achievement will reveal success. Students’ voices about their own growth may be those tiny clues that can uncover important trends of improving learning.

W. Edwards Deming once said that “without data you are another person with an opinion.” But Deming couldn’t have imagined the size and speed of data systems we have today….(More)”

Critics allege big data can be discriminatory, but is it really bias?


Pradip Sigdyal at CNBC: “…The often cited case of big data discrimination points to research conducted a few years ago by Latanya Sweeney, who heads the Data Privacy Lab at Harvard University.

The case involves Google ad results when searching for certain kinds of names on the internet. In her research, Sweeney found that distinct-sounding names often associated with blacks showed up with a disproportionately higher number of arrest record ads than white-sounding names, by roughly 18 percent of the time. Google has since fixed the issue, although they never publicly stated what they did to correct the problem.

The proliferation of big data in the last few years has seen other allegations of improper use and bias. These allegations run the gamut, from online price discrimination and consequences of geographic targeting to the controversial use of crime predicting technology by law enforcement, and lack of sufficient representative [data] sample used in some public works decisions.

The benefits of big data need to be balanced with the risks associated with applying modern technologies to address societal issues. Yet data advocates believe that democratization of data has in essence given power to the people to effect change by transferring ‘tribal knowledge’ from experts to data-savvy practitioners.

Big data is here to stay

According to some advocates, the problem is not so much that ‘big data discriminates’, but that failures by data professionals risk misinterpreting the findings at the heart of data mining and statistical learning. They add that the benefits far outweigh the concerns.

“In my academic research and industry consulting, I have seen tremendous benefits accruing to firms, organizations and consumers alike from the use of data-driven decision-making, data science, and business analytics,” Anindya Ghose, the director of Center for Business Analytics at New York University’s Stern School of Business, said.

“To be perfectly honest, I do not at all understand these big-data cynics who engage in fear mongering about the implications of data analytics,” Ghose said.

“Here is my message to the cynics and those who keep cautioning us: ‘Deal with it, big data analytics is here to stay forever’.”…(More)”