Translator Gator


Yulistina Riyadi & Lalitia Apsar at Global Pulse: “Today Pulse Lab Jakarta launches Translator Gator, a new language game to support research initiatives in Indonesia. Players can earn phone credit by translating words between English and six common Indonesian languages. The database of keywords generated by the game will be used by researchers on topics ranging from computational social science to public policy.

Translator Gator is inspired by the need to socialise the 17 Sustainable Development Goals (SDGs), currently being integrated into the Government of Indonesia’s programme, and the need to better monitor progress against the varied indicators. Thus, Translator Gator will raise awareness of the SDGs and develop a taxonomy of keywords to inform research.

An essential element of public policy research is to pay attention to citizens’ feedback, both active and passive, for instance citizens’ complaints to governments through official channels and on social media. To do this in a computational manner, researchers need a set of keywords, or ‘taxonomy’, organised by topic or government priorities, for example.

But Indonesia’s rich linguistic and cultural diversity poses some difficulties: many languages and dialects are used in different provinces and islands. On social media, such variations – including jargon – make building a list of keywords more challenging as words, context and, by extension, meaning change from region to region. …(More)”

Core Concepts: Computational social science


Adam Mann at PNAS: “Cell phone tower data predicts which parts of London can expect a spike in crime (1). Google searches for polling place information on the day of an election reveal the consequences of different voter registration laws (2). Mathematical models explain how interactions among financial investors produce better yields, and even how they generate economic bubbles (3).

[Figure: Using cell-phone and taxi GPS data, researchers classified people in San Francisco into “tribal networks,” clustering them according to their behavioral patterns. Students, tourists, and businesspeople all travel through the city in various ways, congregating and socializing in different neighborhoods. Image courtesy of Alex Pentland (Massachusetts Institute of Technology, Cambridge, MA).]

[Figure: Where people hail from in the Mexico City area, here indicated by different colors, feeds into a crime-prediction model devised by Alex Pentland and colleagues (6). Image courtesy of Alex Pentland (Massachusetts Institute of Technology, Cambridge, MA).]

 These are just a few examples of how a suite of technologies is helping bring sociology, political science, and economics into the digital age. Such social science fields have historically relied on interviews and survey data, as well as censuses and other government databases, to answer important questions about human behavior. These tools often produce results based on individuals—showing, for example, that a wealthy, well-educated, white person is statistically more likely to vote (4)—but struggle to deal with complex situations involving the interactions of many different people.


A growing field called “computational social science” is now using digital tools to analyze the rich and interactive lives we lead. The discipline uses powerful computer simulations of networks, data collected from cell phones and online social networks, and online experiments involving hundreds of thousands of individuals to answer questions that were previously impossible to investigate. Humans are fundamentally social creatures, and these new tools and huge datasets are giving social scientists insights into exactly how connections among people create societal trends or heretofore undetected patterns related to everything from crime to economic fortunes to political persuasions. Although the field provides powerful ways to study the world, it’s an ongoing challenge to ensure that researchers collect and store the requisite information safely, and that they and others use that information ethically….(More)”
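
As one illustration of the “computer simulations of networks” mentioned above, here is a minimal sketch of a simple contagion model, where an idea spreads from adopters to their neighbors on a random social graph. All parameters (network size, edge density, transmission probability) are invented for illustration:

```python
import random
import networkx as nx

random.seed(42)
G = nx.erdos_renyi_graph(n=1000, p=0.01, seed=42)  # toy social network

adopted = {0}  # one initial adopter
for step in range(10):
    newly = set()
    for person in adopted:
        # Each adopter passes the idea to each friend with probability 0.3.
        for friend in G.neighbors(person):
            if friend not in adopted and random.random() < 0.3:
                newly.add(friend)
    adopted |= newly
    print(f"step {step}: {len(adopted)} adopters")
```

Varying the network structure or the transmission probability shows how connections among people, not just individual attributes, drive the aggregate trend.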

The Crusade Against Multiple Regression Analysis


Richard Nisbett at the Edge: (VIDEO) “…The thing I’m most interested in right now has become a kind of crusade against correlational statistical analysis—in particular, what’s called multiple regression analysis. Say you want to find out whether taking Vitamin E is associated with lower prostate cancer risk. You look at the correlational evidence and indeed it turns out that men who take Vitamin E have lower risk for prostate cancer. Then someone says, “Well, let’s see if we do the actual experiment, what happens.” And what happens when you do the experiment is that Vitamin E contributes to the likelihood of prostate cancer. How could there be differences? These happen a lot. The correlational—the observational—evidence tells you one thing, the experimental evidence tells you something completely different.

In the case of health data, the big problem is something that’s come to be called the healthy user bias, because the guy who’s taking Vitamin E is also doing everything else right. A doctor or an article has told him to take Vitamin E, so he does that, but he’s also the guy who’s watching his weight and his cholesterol, gets plenty of exercise, drinks alcohol in moderation, doesn’t smoke, has a high level of education, and a high income. All of these things are likely to make you live longer, to make you less subject to morbidity and mortality risks of all kinds. You pull one thing out of that correlate and it’s going to look like Vitamin E is terrific because it’s dragging all these other good things along with it.
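
Nisbett’s point is easy to reproduce in a toy simulation (a minimal sketch, not from the talk; all numbers are invented): a hidden “health-consciousness” trait drives both Vitamin E use and good outcomes, so the naive observational comparison shows a benefit that randomized assignment makes disappear.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hidden confounder: general health-consciousness.
conscious = rng.random(n) < 0.5

# Disease risk depends only on the confounder; in this toy world
# Vitamin E has no causal effect at all.
sick = rng.random(n) < np.where(conscious, 0.05, 0.15)

# Observational study: health-conscious people choose Vitamin E far more often.
chose_e = rng.random(n) < np.where(conscious, 0.8, 0.2)
print("observational risk, E vs no E:",
      sick[chose_e].mean(), sick[~chose_e].mean())        # ~0.07 vs ~0.13

# Randomized trial: a coin flip assigns Vitamin E, severing the
# link between treatment and the confounder.
assigned_e = rng.random(n) < 0.5
print("randomized risk,   E vs no E:",
      sick[assigned_e].mean(), sick[~assigned_e].mean())  # ~0.10 vs ~0.10
```

Pulling Vitamin E out of the correlate makes it look protective only because it drags health-consciousness along with it, which is exactly the healthy user bias described above.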

This is not, by any means, limited to health issues. A while back, I read a government report in The New York Times on the safety of automobiles. The measure that they used was the deaths per million drivers of each of these autos. It turns out that, for example, there are enormously more deaths per million drivers who drive Ford F150 pickups than for people who drive Volvo station wagons. Most people’s reaction, and certainly my initial reaction to it was, “Well, it sort of figures—everybody knows that Volvos are safe.”

Let’s describe two people and you tell me who you think is more likely to be driving the Volvo and who is more likely to be driving the pickup: a suburban matron in the New York area and a twenty-five-year-old cowboy in Oklahoma. It’s obvious that people are not assigned their cars. We don’t say, “Billy, you’ll be driving a powder blue Volvo station wagon.” Because of this self-selection problem, you simply can’t interpret data like that. You know virtually nothing about the relative safety of cars based on that study.

I saw in The New York Times recently an article by a respected writer reporting that people who have elaborate weddings tend to have marriages that last longer. How would that be? Maybe it’s just all the darned expense and bother—you don’t want to get divorced. It’s a cognitive dissonance thing.

Let’s think about who makes elaborate plans for expensive weddings: people who are better off financially, which is by itself a good prognosis for marriage; people who are more educated, also a better prognosis; people who are richer; people who are older—the later you get married, the more likelihood that the marriage will last, and so on.

The truth is you’ve learned nothing. It’s like saying men who are a “Somebody III” or “Somebody IV” have longer-lasting marriages. Is it because of the suffix there? No, it’s because those people are the types who have a good prognosis for a lengthy marriage.

A huge range of science projects are done with multiple regression analysis. The results are often somewhere between meaningless and quite damaging….(More)”

What World Are We Building?


danah boyd at Points: “….Knowing how to use data isn’t easy. One of my colleagues at Microsoft Research — Eric Horvitz — can predict with startling accuracy whether someone will be hospitalized based on what they search for. What should he do with that information? Reach out to people? That’s pretty creepy. Do nothing? Is that ethical? No matter how good our predictions are, figuring out how to use them is a complex social and cultural issue that technology doesn’t solve for us. In fact, as it stands, technology is just making it harder for us to have a reasonable conversation about agency and dignity, responsibility and ethics.

Data is power. Increasingly we’re seeing data being used to assert power over people. It doesn’t have to be this way, but one of the things that I’ve learned is that, unchecked, new tools are almost always empowering to the privileged at the expense of those who are not.

For most media activists, unfettered Internet access is at the center of the conversation, and that is critically important. Today we’re standing on a new precipice, and we need to think a few steps ahead of the current fight.

We are moving into a world of prediction. A world where more people are going to be able to make judgments about others based on data. Data analysis that can mark the value of people as worthy workers, parents, borrowers, learners, and citizens. Data analysis that has been underway for decades but is increasingly salient in decision-making across numerous sectors. Data analysis that most people don’t understand.

Many activists will be looking to fight the ecosystem of prediction — and to regulate when and where prediction can be used. This is all fine and well when we’re talking about how these technologies are designed to do harm. But more often than not, these tools will be designed to be helpful, to increase efficiency, to identify people who need help. Their positive uses will exist alongside uses that are terrifying. What do we do?

One of the most obvious issues is the limited diversity of people who are building and using these tools to imagine our future. Statistical and technical literacy isn’t even part of the curriculum in most American schools. In our society where technology jobs are high-paying and technical literacy is needed for citizenry, less than 5% of high schools offer AP computer science courses. Needless to say, black and brown youth are much less likely to have access, let alone opportunities. If people don’t understand what these systems are doing, how do we expect people to challenge them?

We must learn how to ask hard questions of technology and of those making decisions based on data-driven tech. And opening the black box isn’t enough. Transparency of data, algorithms, and technology isn’t enough. We need to build assessment into any system that we roll out. You can’t just put millions of dollars of surveillance equipment into the hands of the police in the hope of creating police accountability, yet, with police body-worn cameras, that’s exactly what we’re doing. And we’re not even trying to assess the implications. This is probably the fastest roll-out of a technology out of hope, and it won’t be the last. How do we get people to look beyond their hopes and fears and actively interrogate the trade-offs?

Technology plays a central role — more and more — in every sector, every community, every interaction. It’s easy to screech in fear or dream of a world in which every problem magically gets solved. To make the world a better place, we need to start paying attention to the different tools that are emerging and learn to frame hard questions about how they should be put to use to improve the lives of everyday people.

We need those who are thinking about social justice to understand technology and those who understand technology to commit to social justice….(More)”

Don’t let transparency damage science


Stephan Lewandowsky and Dorothy Bishop explain in Nature “how the research community should protect its members from harassment, while encouraging the openness that has become essential to science:…

Transparency has hit the headlines. In the wake of evidence that many research findings are not reproducible, the scientific community has launched initiatives to increase data sharing, transparency and open critique. As with any new development, there are unintended consequences. Many measures that can improve science — shared data, post-publication peer review and public engagement on social media — can be turned against scientists. Endless information requests, complaints to researchers’ universities, online harassment, distortion of scientific findings and even threats of violence: these were all recurring experiences shared by researchers from a broad range of disciplines at a Royal Society-sponsored meeting last year that we organized to explore this topic. Orchestrated and well-funded harassment campaigns against researchers working in climate change and tobacco control are well documented. Some hard-line opponents to other research, such as that on nuclear fallout, vaccination, chronic fatigue syndrome or genetically modified organisms, although less resourced, have employed identical strategies….(More)”


The power of crowds


Pietro Michelucci and Janis L. Dickinson in Science: “Human computation, a term introduced by Luis von Ahn, refers to distributed systems that combine the strengths of humans and computers to accomplish tasks that neither can do alone. The seminal example is reCAPTCHA, a Web widget used by 100 million people a day when they transcribe distorted text into a box to prove they are human. This free cognitive labor provides users with access to Web content and keeps websites safe from spam attacks, while feeding into a massive, crowd-powered transcription engine that has digitized 13 million articles from The New York Times archives. But perhaps the best known example of human computation is Wikipedia. Despite initial concerns about accuracy, it has become the key resource for all kinds of basic information. Information science has begun to build on these early successes, demonstrating the potential to evolve human computation systems that can model and address wicked problems (those that defy traditional problem-solving methods) at the intersection of economic, environmental, and sociopolitical systems….(More)”

What Is Citizen Science? – A Scientometric Meta-Analysis


Christopher Kullenberg and Dick Kasperowski at PLOS One: “The concept of citizen science (CS) is currently referred to by many actors inside and outside science and research. Several descriptions of this purportedly new approach to science are often heard in connection with large datasets and the possibilities of mobilizing crowds outside science to assist with observations and classifications. However, other accounts refer to CS as a way of democratizing science, aiding concerned communities in creating data to influence policy and as a way of promoting political decision processes involving environment and health.

Objective

In this study we analyse two datasets (N = 1935, N = 633) retrieved from the Web of Science (WoS) with the aim of giving a scientometric description of what the concept of CS entails. We account for its development over time, which strands of research have adopted CS, and what scientific output has been achieved in CS-related projects. To attain this, scientometric methods have been combined with qualitative approaches to render more precise search terms.

Results

Results indicate that there are three main focal points of CS. The largest is composed of research on biology, conservation and ecology, and utilizes CS mainly as a methodology of collecting and classifying data. A second strand of research has emerged through geographic information research, where citizens participate in the collection of geographic data. Thirdly, there is a line of research relating to the social sciences and epidemiology, which studies and facilitates public participation in relation to environmental issues and health. In terms of scientific output, the largest body of articles is to be found in biology and conservation research. In absolute numbers, the number of publications generated by CS is low (N = 1935), but over the past decade a new and very productive line of CS based on digital platforms has emerged for the collection and classification of data….(More)”
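
The tallies behind such a description can be produced with a few lines of code. A minimal sketch follows; the file name is hypothetical, and it assumes a standard WoS comma-delimited export where PY and WC are the field tags for publication year and subject category:

```python
import csv
from collections import Counter

per_year, per_field = Counter(), Counter()
with open("citizen_science_wos.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        per_year[row["PY"]] += 1                         # publication year
        per_field[row["WC"].split(";")[0].strip()] += 1  # first WoS category

print("publications per year:", sorted(per_year.items()))
print("largest research strands:", per_field.most_common(5))
```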

Distributed ledger technology: beyond block chain


UK Government Office for Science: “In a major report on distributed ledgers published today (19 January 2016), the Government Chief Scientist, Sir Mark Walport, sets out how this technology could transform the delivery of public services and boost productivity.

A distributed ledger is a database that can securely record financial, physical or electronic assets for sharing across a network through entirely transparent updates of information.

Its first incarnation was ‘Blockchain’ in 2008, which underpinned digital cash systems such as Bitcoin. The technology has now evolved into a variety of models that can be applied to different business problems and dramatically improve the sharing of information.
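
The core idea can be sketched in a few lines: each block commits to the hash of its predecessor, so any retroactive edit invalidates every later block. This is an illustrative toy only; real distributed ledgers add consensus protocols, digital signatures, and replication across many nodes.

```python
import hashlib
import json

def block_hash(body):
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def make_block(records, prev_hash):
    body = {"records": records, "prev": prev_hash}
    return {**body, "hash": block_hash(body)}

chain = [make_block(["genesis"], "0" * 64)]
chain.append(make_block(["alice pays bob 5"], chain[-1]["hash"]))
chain.append(make_block(["bob pays carol 2"], chain[-1]["hash"]))

def verify(chain):
    for prev, block in zip(chain, chain[1:]):
        body = {k: v for k, v in block.items() if k != "hash"}
        if block["prev"] != prev["hash"] or block["hash"] != block_hash(body):
            return False
    return True

print(verify(chain))                          # True
chain[1]["records"] = ["alice pays bob 500"]  # tamper with history...
print(verify(chain))                          # ...and verification fails: False
```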

Distributed ledger technology could provide government with new tools to reduce fraud, error and the cost of paper intensive processes. It also has the potential to provide new ways of assuring ownership and provenance for goods and intellectual property.

Distributed ledgers are already being used in the diamond markets and in the disbursing of international aid payments.

Sir Mark Walport said:

Distributed ledger technology has the potential to transform the delivery of public and private services. It has the potential to redefine the relationship between government and the citizen in terms of data sharing, transparency and trust and make a leading contribution to the government’s digital transformation plan.

Any new technology creates challenges, but with the right mix of leadership, collaboration and sound governance, distributed ledgers could yield significant benefits for the UK.

The report makes a number of recommendations which focus on ministerial leadership, research, standards and the need for proof of concept trials.

They include:

  • government should provide ministerial leadership to ensure that it provides the vision, leadership and the platform for distributed ledger technology within government; this group should consider governance, privacy, security and standards
  • government should establish trials of distributed ledgers in order to assess the technology’s usability within the public sector
  • government could support the creation of distributed ledger demonstrators for local government that will bring together all the elements necessary to test the technology and its application
  • the UK research community should invest in the research required to ensure that distributed ledgers are scalable, secure and provide proof of correctness of their contents….View the report ‘Distributed ledger technology: beyond block chain’.”

The impact of open access scientific knowledge


Jack Karsten and Darrell M. West at Brookings: “In spite of technological advancements like the Internet, academic publishing has operated in much the same way for centuries. Scientists voluntarily review their peers’ papers for little or no compensation; the paper’s author likewise does not receive payment from academic publishers. Though most of the costs of publishing a journal are administrative, the cost of subscribing to scientific journals nevertheless increased 600 percent between 1984 and 2002. The funding for the research libraries that form the bulk of journal subscribers has not kept pace, leading to campaigns at universities including Harvard to boycott for-profit publishers.

Though the Internet has not yet brought down the price of academic journal subscriptions, it has led to some interesting alternatives. In 2011, the Twitter hashtag #icanhazPDF was created to request copies of papers located behind paywalls. Anyone with access to a specific paper can download it and then e-mail it to the requester. The practice violates the copyright of publishers, but puts papers in reach of researchers who would otherwise not be able to read them. If a researcher cannot read a journal article in the first place, they cannot go on to cite it; citations raise the profile of both the cited article and the journal that published it. The publisher is caught between two conflicting goals: increasing the number of citations for their articles and earning revenue to stay in business.

Thinking outside the journal

A trio of University of Chicago researchers examines this issue through the lens of Wikipedia in a paper titled “Amplifying the Impact of Open Access: Wikipedia and the Diffusion of Science.” Wikipedia makes a compelling subject for studying scientific diffusion given its status as one of the most visited websites in the world, attracting 374 million unique visitors monthly as of September 2015. The study found that in English-language articles, Wikipedia editors are 47 percent more likely to cite an article from an open access journal. Anyone using Wikipedia as a first source for information on a subject is more likely to read information from open access journals. If readers click through the links to cited articles, they can read the actual text of these open access journal articles.
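
To make the “47 percent more likely” figure concrete: it is the kind of ratio one gets from comparing citation rates between the two groups of journals. The counts below are invented purely for illustration:

```python
# Hypothetical counts, for illustration only.
open_access = {"articles": 1000, "cited_on_wikipedia": 147}
paywalled   = {"articles": 1000, "cited_on_wikipedia": 100}

rate_oa = open_access["cited_on_wikipedia"] / open_access["articles"]
rate_pw = paywalled["cited_on_wikipedia"] / paywalled["articles"]
print(f"relative likelihood: {rate_oa / rate_pw:.2f}")  # 1.47 -> 47% more likely
```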

Given how much the federal government spends on scientific research ($66 billion on nondefense R&D in 2015), it has a large role to play in the diffusion of scientific knowledge. Since 2008, the National Institutes of Health (NIH) has required researchers who publish in academic journals to also deposit their papers in PubMed Central, an online open access repository. Expanding provisions like the NIH Public Access Policy to other agencies and to recipients of federal grants at universities would give the public and other researchers a wealth of scientific information. Scientific literacy, even on cutting-edge research, is increasingly important when science informs policy on major issues such as climate change and health care….(More)”

Algorithmic Life: Calculative Devices in the Age of Big Data


Book edited by Louise Amoore and Volha Piotukh: “This book critically explores forms and techniques of calculation that emerge with digital computation, and their implications. The contributors demonstrate that digital calculative devices matter beyond their specific functions as they progressively shape, transform and govern all areas of our life. In particular, it addresses such questions as:

  • How does the drive to make sense of, and productively use, large amounts of diverse data, inform the development of new calculative devices, logics and techniques?
  • How do these devices, logics and techniques affect our capacity to decide and to act?
  • How do mundane elements of our physical and virtual existence become data to be analysed and rearranged in complex ensembles of people and things?
  • In what ways are conventional notions of public and private, individual and population, certainty and probability, rule and exception transformed and what are the consequences?
  • How does the search for ‘hidden’ connections and patterns change our understanding of social relations and associative life?
  • Do contemporary modes of calculation produce new thresholds of calculability and computability, allowing for the improbable or the merely possible to be embraced and acted upon?
  • As contemporary approaches to governing uncertain futures seek to anticipate future events, how are calculation and decision engaged anew?

Drawing together different strands of cutting-edge research that is both theoretically sophisticated and empirically rich, this book makes an important contribution to several areas of scholarship, including the emerging social science field of software studies, and will be a vital resource for students and scholars alike….(More)”