The death of data science – and rise of the citizen scientist


Ben Rossi at Information Age: “The notion of data science was born from the recent idea that if you have enough data, you don’t need much (if any) science to divine the truth and foretell the future – as opposed to the long-established rigours of statistical or actuarial science, which most times require painstaking efforts and substantial time to produce their version of ‘the truth’. …. Rather than embracing this untested and, perhaps, doomed form of science, and aimlessly searching for unicorns (also known as data scientists) to pay vast sums to, many organisations are now embracing the idea of making everyone data and analytics literate.

This leads me to what my column is really meant to focus on: the rise of the citizen scientist. 

The citizen scientist is not a new idea, having seen action in the space and earth sciences world for decades now, and has really come into its own as we enter the age of open data.

Cometh the hour

Given the exponential growth of open data initiatives across the world – the UK remains the leader, but has growing competition from all locations – the need for citizen scientists is now paramount. 

As governments open up vast repositories of new data of every type, the opportunity for these same governments (and commercial interests) to leverage the passion, skills and collective know-how of citizen scientists to help garner deeper insights into the scientific and civic challenges of the day is substantial. 

They can then take this knowledge and the collective energy of the citizen scientist community to develop common solution sets and applications to meet the needs of all their constituencies without expending much in terms of financial resources or suffering substantial development time lags. 

This can be a windfall of benefits for every level or type of government found around the world. The use of citizen scientists to tackle so-called ‘grand challenge’ problems has been a driving force behind many governments’ commitment to and investment in open data to date. 

There are so many challenges in governing today that it would be foolish not to employ these very capable resources to help tackle them. 

The benefits manifested from this approach are substantial and well proven. Many are well articulated in the open data success stories to date. 

Additionally, you only need to attend a local ‘hack fest’ to see how engaged citizen scientists of any age, gender and race can be, and to feel the sense of community that these events foster as everyone focuses on the challenges at hand and works diligently to surmount them using very creative approaches. 

As open data becomes pervasive in use and matures in respect to the breadth and richness of the data sets being curated, the benefits returned to both government and its constituents will be manifold. 

The catalyst to realising these benefits and achieving return on investment will be the role of citizen scientists, who are not going to be statisticians, actuaries or so-called data gurus, but ordinary people with a passion for science and learning and a desire to contribute to solving the many grand challenges facing society at large….(More)

Safecity: Combatting Sexual Violence Through Technology


Safecity, …. is a not for profit organization that provides a platform for people to share their personal stories of sexual harassment and abuse in public spaces. This data, which may be anonymous, gets aggregated as hot spots on a map indicating trends at a local level. The idea is to make this data useful for individuals, local communities and local administration for social and systemic change for safer cities. We launched on 26 Dec 2012 and since then have collected over 4000 stories from over 50 cities in India and Nepal.

How can Safecity help?
Safecity is a crowd map that converts these individual stories into data that is then plotted on a map. It is then easier to see trends at the location level (e.g. a street). The focus is taken away from the individual victim and instead we can focus on solving the problem at the local neighborhood level.

The Objectives:
• Create awareness on street harassment and abuse and get people, especially women, victims of hate and LGBTQ crimes to break their silence and report their personal experiences.
• Collate this information to showcase location based trends.
• Make this information available and useful for individuals, local communities and local administration to solve the problem at the local level through urban planning aimed at addressing infrastructural deficits
• Establish successful models of community engagement using crowd sourced data to solve civic and local issues.
• Reach out to women who do not have equal access to technology through our Missed dial facility for them to report any cases of abuse and harassment.

We wish to take this data forward to lobby for systemic change in terms of urban planning and infrastructure, reforms in our law that are premised on gender equity, and social changes to loosen the shackles that do not allow us otherwise to live the way we want to, with the freedom we want to, and with the rights that are fundamental to all of us, and it will just build our momentum further by having as many passionate, concerned and diverse genders on board.

We are trying to build a movement by collecting these reports through campaigns, workshops and awareness programs with schools, colleges, local communities and partners with shared vision. Crime against women has been rampant and largely remains unreported even till date. That silence needs to gain a voice and the time is now. We are determined to highlight this serious social issue and we believe we are taking a step towards changing the way our society thinks and reacts and are hopeful that so are you. In time we hope it will lead to a safe and non-violent environment for all.

Safecity uses technology to document sexual harassment and abuse in public spaces in the following way. People can report incidents of sexual abuse and street harassment, that they have experienced or witnessed. They can share solutions that can help avoid such situations and decide for themselves what works best for them, their geographic location or circumstances.

By allowing people to pin such incidents on a crowd-sourced map, we aim to let them highlight the “hotspots” of such activities. This accentuates the emerging trend in a particular area, enabling the citizens to acknowledge the problem, take personal precautions and devise a solution at the neighbourhood level.
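
As a rough illustration of how individual reports could be aggregated into location-level hotspots, the sketch below bins report coordinates into grid cells and keeps the cells with repeated reports. This is only a minimal sketch under stated assumptions: the function name `find_hotspots`, the cell size, the threshold and the sample coordinates are all hypothetical, and Safecity's actual method is not described in the text above.

```python
from collections import Counter
import math


def find_hotspots(reports, cell_size=0.01, min_reports=2):
    """Count reports per grid cell and return the cells that meet the threshold.

    reports: iterable of (latitude, longitude) pairs (hypothetical input format).
    cell_size: grid cell width in degrees (assumed; roughly 1 km of latitude).
    min_reports: minimum number of reports for a cell to count as a hotspot (assumed).
    """
    counts = Counter(
        (math.floor(lat / cell_size), math.floor(lon / cell_size))
        for lat, lon in reports
    )
    return {cell: n for cell, n in counts.items() if n >= min_reports}


# Made-up coordinates for illustration only; not real report data.
sample_reports = [
    (19.0761, 72.8775),
    (19.0765, 72.8779),
    (19.0768, 72.8771),
    (28.6139, 77.2090),
]
hotspots = find_hotspots(sample_reports)
```

Because only cell counts are kept, the result shows where reports cluster without exposing any individual story, which matches the stated aim of shifting the focus from the individual to the neighbourhood.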

Safecity believes in uniting millions of voices that can become a catalyst for change.

You can read the FAQs section for more information on how the data is used for public good. (More)”

Social Dimensions of Privacy


New book edited by Dorota Mokrosinska and Beate Roessler: “Written by a select international group of leading privacy scholars, Social Dimensions of Privacy endorses and develops an innovative approach to privacy. By debating topical privacy cases in their specific research areas, the contributors explore the new privacy-sensitive areas: legal scholars and political theorists discuss the European and American approaches to privacy regulation; sociologists explore new forms of surveillance and privacy on social network sites; and philosophers revisit feminist critiques of privacy, discuss markets in personal data, issues of privacy in health care and democratic politics. The broad interdisciplinary character of the volume will be of interest to readers from a variety of scientific disciplines who are concerned with privacy and data protection issues.

  • Takes an innovative approach to privacy which focuses on the social dimensions and value of privacy in contrast to the value of privacy for individuals
  • Addresses readers from a variety of disciplines, including law, philosophy, media studies, gender studies and political science
  • Addresses new privacy-sensitive areas triggered by recent technological developments (More)”

What Is Community Anyway?


David M. Chavis & Kien Lee at Stanford Social Innovation Review: “Community” is so easy to say. The word itself connects us with each other. It describes an experience so common that we never really take time to explain it. It seems so simple, so natural, and so human. In the social sector, we often add it to the names of social innovations as a symbol of good intentions (for example, community mental health, community policing, community-based philanthropy, community economic development).

But the meaning of community is complex. And, unfortunately, insufficient understanding of what a community is and its role in the lives of people in diverse societies has led to the downfall of many well-intended “community” efforts.

Adding precision to our understanding of community can help funders and evaluators identify, understand, and strengthen the communities they work with. There has been a great deal of research in the social sciences about what a human community is (see for example, Chavis and Wandersman, 1990; Nisbet, 1953; Putnam, 2000). Here, we blend that research with our experience as evaluators and implementers of community change initiatives.

It’s about people.

First and foremost, community is not a place, a building, or an organization; nor is it an exchange of information over the Internet. Community is both a feeling and a set of relationships among people. People form and maintain communities to meet common needs….

People live in multiple communities.

Since meeting common needs is the driving force behind the formation of communities, most people identify and participate in several of them, often based on neighborhood, nation, faith, politics, race or ethnicity, age, gender, hobby, or sexual orientation….

Communities are nested within each other.

Just like Russian Matryoshka dolls, communities often sit within other communities. For example, in a neighborhood—a community in and of itself—there may be ethnic or racial communities, communities based on people of different ages and with different needs, and communities based on common economic interests….

Communities have formal and informal institutions.

Communities form institutions—what we usually think of as large organizations and systems such as schools, government, faith, law enforcement, or the nonprofit sector—to more effectively fulfill their needs….

Communities are organized in different ways.

Every community is organized to meet its members’ needs, but they operate differently based on the cultures, religions, and other experiences of their members. For example, while the African American church is generally understood as playing an important role in promoting health education and social justice for that community, not all faith institutions such as the mosque or Buddhist temple are organized and operate in the same way….(More)

A New Source of Data for Public Health Surveillance: Facebook Likes


Paper by Steven Gittelman et al in the Journal of Medical Internet Research: “The development of the Internet and the explosion of social media have provided many new opportunities for health surveillance. The use of the Internet for personal health and participatory health research has exploded, largely due to the availability of online resources and health care information technology applications [1-8]. These online developments, plus a demand for more timely, widely available, and cost-effective data, have led to new ways epidemiological data are collected, such as digital disease surveillance and Internet surveys [8-25]. Over the past 2 decades, Internet technology has been used to identify disease outbreaks, track the spread of infectious disease, monitor self-care practices among those with chronic conditions, and to assess, respond, and evaluate natural and artificial disasters at a population level [6,8,11,12,14,15,17,22,26-28]. Use of these modern communication tools for public health surveillance has proven to be less costly and more timely than traditional population surveillance modes (eg, mail surveys, telephone surveys, and face-to-face household surveys).

The Internet has spawned several sources of big data, such as Facebook [29], Twitter [30], Instagram [31], Tumblr [32], Google [33], and Amazon [34]. These online communication channels and market places provide a wealth of passively collected data that may be mined for purposes of public health, such as sociodemographic characteristics, lifestyle behaviors, and social and cultural constructs. Moreover, researchers have demonstrated that these digital data sources can be used to predict otherwise unavailable information, such as sociodemographic characteristics among anonymous Internet users [35-38]. For example, Goel et al [36] found no difference by demographic characteristics in the usage of social media and email. However, the frequency with which individuals accessed the Web for news, health care, and research was a predictor of gender, race/ethnicity, and educational attainment, potentially providing useful targeting information based on ethnicity and income [36]. Integrating these big data sources into the practice of public health surveillance is vital to move the field of epidemiology into the 21st century as called for in the 2012 US “Big Data Research and Development Initiative” [19,39].

Understanding how big data can be used to predict lifestyle behavior and health-related data is a step toward the use of these electronic data sources for epidemiologic needs…(More)”

CrowdFlower Launches Open Data Project


Anthony Ha at Techcrunch: “Crowdsourcing company CrowdFlower allows businesses to tap into a distributed workforce of 5 million contributors for basic tasks like sentiment analysis. Today it’s releasing some of that data to the public through its new Data for Everyone initiative…. hope is to turn CrowdFlower into a central repository where open data can be found by researchers and entrepreneurs. (Factual was another startup trying to become a hub for open data, though in recent years, it’s become more focused on gathering location data to power mobile ads.)…

As for the data that’s available now, …There’s a lot of Twitter sentiment analysis covering things like attitudes towards brands and products, yogurt (?), and climate change. Among the more recent data sets, I was particularly taken with the gender breakdown of who’s been on the cover of Time magazine and, yes, the analysis of who thought the dress (you know the one) was gold and white versus blue and black…. (More)”

Pantheon: A Dataset for the Study of Global Cultural Production


Paper by Amy Zhao Yu, Shahar Ronen, Kevin Hu, Tiffany Lu, and César A. Hidalgo: “We present the Pantheon 1.0 dataset: a manually curated dataset of individuals that have transcended linguistic, temporal, and geographic boundaries. The Pantheon 1.0 dataset includes the 11,341 biographies present in more than 25 languages in Wikipedia and is enriched with: (i) manually curated demographic information (place of birth, date of birth, and gender), (ii) a cultural domain classification categorizing each biography at three levels of aggregation (i.e. Arts/Fine Arts/Painting), and (iii) measures of global visibility (fame) including the number of languages in which a biography is present in Wikipedia, the monthly page-views received by a biography (2008-2013), and a global visibility metric we name the Historical Popularity Index (HPI). We validate our measures of global visibility (HPI and Wikipedia language editions) using external measures of accomplishment in several cultural domains: Tennis, Swimming, Car Racing, and Chess. In all of these cases we find that measures of accomplishments and fame (HPI) correlate with an R² ≥ 50%, suggesting that measures of global fame are appropriate proxies for measures of accomplishment….(More)
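
The validation step in the abstract compares a fame measure with an external measure of accomplishment. As a minimal sketch of what such a check involves, the snippet below computes R² as the squared Pearson correlation between two lists. The function name and the numbers are made up for illustration; this is not the authors' code, and the abstract does not specify how their R² was calculated.

```python
def r_squared(x, y):
    """Return the squared Pearson correlation between two equal-length sequences.

    For a simple linear regression of y on x, this equals the coefficient
    of determination (R squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical scores for four individuals; illustration only.
fame = [1.0, 2.0, 3.0, 4.0]
accomplishment = [2.0, 3.0, 7.0, 8.0]
r2 = r_squared(fame, accomplishment)
```

An R² near 1 would mean the fame measure tracks the accomplishment measure closely, while a value near 0 would mean it carries almost no information about it.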

How to Convince Men to Help the Poor


at Pacific Standard: “Please give. It’s a plea we are confronted with constantly, as a variety of charities implore us to help them help the less fortunate.

Whether we get out our checkbook or throw the request in the recycling bin is determined, in part, by the specific way the request is framed. But a new study suggests non-profits might want to create two separate appeals: One aimed at men, and another at women.

A research team led by Stanford University sociologist Robb Willer reports empathy-based appeals tend to be effective with women. But as a rule, men—who traditionally give somewhat less to anti-poverty charities—need to be convinced that their self-interest aligns with that of the campaign.

“Framing poverty as an issue that negatively affects all Americans increased men’s willingness to donate to the cause, eliminating the gender gap,” the researchers write in the journal Social Science Research….

“While this reframing resonated with men, who were otherwise less likely to spontaneously express concern about poverty,” Willer and his colleagues write, “it had the opposite effect for women, who might have felt less motivated to express concern about poverty when doing so seemed inconsistent with feeling empathy for the poor.”…(More)”

The Data Manifesto


Development Initiatives: “Staging a Data Revolution

Accessible, useable, timely and complete data is core to sustainable development and social progress. Access to information provides people with a base to make better choices and have more control over their lives. Too often attempts to deliver sustainable economic, social and environmental results are hindered by the failure to get the right information, in the right format, to the right people, at the right time. Worse still, the most acute data deficits often affect the people and countries facing the most acute problems.

The Data Revolution should be about data grounded in real life. Data and information that gets to the people who need it at national and sub-national levels to help with the decisions they face – hospital directors, school managers, city councillors, parliamentarians. Data that goes beyond averages – that is disaggregated to show the different impacts of decisions, policies and investments on gender, social groups and people living in different places and over time.
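
To make "data that goes beyond averages" concrete, the sketch below contrasts a single overall average with the same figures broken down by region and by gender. It is a hedged illustration only: the `disaggregate` helper, the field names and every value are hypothetical and do not come from the manifesto.

```python
from collections import defaultdict
from statistics import mean


def disaggregate(records, key):
    """Group records by the given field and return the mean value per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["value"])
    return {group: mean(values) for group, values in groups.items()}


# Made-up indicator values (for example, a school enrolment rate in percent).
records = [
    {"region": "urban", "gender": "female", "value": 90},
    {"region": "urban", "gender": "male", "value": 94},
    {"region": "rural", "gender": "female", "value": 50},
    {"region": "rural", "gender": "male", "value": 70},
]

overall = mean(r["value"] for r in records)   # one headline number
by_region = disaggregate(records, "region")   # urban versus rural
by_gender = disaggregate(records, "gender")   # female versus male
```

In this toy data the headline average conceals a wide gap between urban and rural areas, which is the kind of difference a disaggregated view is meant to surface.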

We need a Data Revolution that sets a new political agenda, that puts existing data to work, that improves the way data is gathered and ensures that information can be used. To deliver this vision, we need the following steps.


12 steps to a Data Revolution

1. Implement a national ‘Data Pledge’ to citizens that is supported by governments, private and non-governmental sectors
2. Address real world questions with joined up and disaggregated data
3. Empower and up-skill data users of the future through education
4. Examine existing frameworks and publish existing data
5. Build an information bank of data assets
6. Allocate funding available for better data according to national and sub-national priorities
7. Strengthen national statistical systems’ capacity to collect data
8. Implement a policy that data is ‘open by default’
9. Improve data quality by subjecting it to public scrutiny
10. Put information users’ needs first
11. Recognise technology cannot solve all barriers to information
12. Invest in infomediaries’ capacity to translate data into information that policymakers, civil society and the media can actually use…”

Can We Build a Safer Internet?


in the New York Times: “We often take it as a given that the Internet is a cruel place, a natural haven for those who seek to harass and threaten others. But to some people, social networks are not mere conduits for our worst impulses. They’re structures whose design can influence how we behave, for good as well as for ill.

Right now, having a social media account can mean facing down a torrent of harassment — including, for some, attacks that are misogynist, racist or both. “Just as you create a space for people to use something in innovative, creative ways, there are also people who will use it for other means,” Moya Bailey, a postdoctoral fellow at Northeastern University who writes about race, gender and media, told Op-Talk. She mentioned Anita Sarkeesian, the video game critic who has faced harassment for critiquing the portrayal of women in games.

“Because she is doing that work, she becomes a target of a lot of violence and hate,” said Ms. Bailey. The rise of online communication is “a gift and a curse always. It’s always both/and.”

And the way we behave online may depend on which site we’re using. Ms. Bailey cites Tumblr as an example. “I think there’s something about Tumblr that is really attractive to social-justice folks, and the kinds of conversations that people have on Tumblr are very different from what’s possible on Facebook,” she explained. “The platforms themselves help shape the kind of content that people post to those different sites.”

The design of those platforms can also determine who sees what we post. Kate Losse, a writer on technology and culture and a former product manager at Facebook, told Op-Talk that Facebook has widened the scope of some of our conversations.

“Pre-Facebook there would be all these different kinds of interactions you might have socially,” she said. “You might talk to one person, you might talk to three people, you might talk to a hundred people. But Facebook’s interesting because you’re always talking to a hundred people when you post, or more.”

“You have to look at something like Facebook as structuring social interactions,” she added. And interacting via what Ms. Losse called “large-scale announcements” can introduce problems. “The Internet is the classic case of tragedy of the commons,” she said. “If something that’s important to me gets viewed by someone across the world, who has no attachment to me, doesn’t care about me at all, doesn’t have any reason to know me or have empathy for me, it’s much easier for that person to do something hateful with the content than to be respectful of it.”

But if platforms can structure our interactions, can they steer us toward kindness rather than toward bile? Batya Friedman, a professor at the University of Washington’s Information School who studies the relationship between technology and human priorities, thinks it’s possible. “Any time people talk to each other,” she told Op-Talk, “we have all kinds of social norms that check how we say things to each other. We give each other social cues, we tell each other when somebody’s starting to go too far.”

The question for designers of online communities, she said, is “how do we either create virtual norms that are comparable, or how do we represent those things so that people are getting those cues, so they modulate their behavior?”…”