Researcher uncovers inherent biases of big data collected from social media sites


Phys.org: “With every click, Facebook, Twitter and other social media users leave behind digital traces of themselves, information that can be used by businesses, government agencies and other groups that rely on “big data.”

But while the information derived from social network sites can shed light on social behavioral traits, some analyses based on this type of data collection are prone to bias from the get-go, according to new research by Northwestern University professor Eszter Hargittai, who heads the Web Use Project.

Since people don’t randomly join Facebook, Twitter or LinkedIn—they deliberately choose to engage—the data are potentially biased in terms of demographics, socioeconomic background or Internet skills, according to the research. This has implications for businesses, municipalities and other groups who use big data because it excludes certain segments of the population and could lead to unwarranted or faulty conclusions, Hargittai said.

The study, “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites,” was published last month in the journal The Annals of the American Academy of Political and Social Science and is part of a larger, ongoing study.

The buzzword “big data” refers to automatically generated information about people’s behavior. It’s called “big” because it can easily include millions of observations if not more. In contrast to surveys, which require explicit responses to questions, big data is created when people do things using a service or system.

“The problem is that the only people whose behaviors and opinions are represented are those who decided to join the site in the first place,” said Hargittai, the April McClain-Delaney and John Delaney Professor in the School of Communication. “If people are analyzing big data to answer certain questions, they may be leaving out entire groups of people and their voices.”

For example, a city could use Twitter to collect local opinion regarding how to make the community more “age-friendly” or whether more bike lanes are needed. In those cases, “it’s really important to know that people aren’t on Twitter randomly, and you would only get a certain type of person’s response to the question,” said Hargittai.

“You could be missing half the population, if not more. The same holds true for companies who only use Twitter and Facebook and are looking for feedback about their products,” she said. “It really has implications for every kind of group.”…

More information: “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites” The Annals of the American Academy of Political and Social Science May 2015 659: 63-76, DOI: 10.1177/0002716215570866

Transforming Government Information


Sharyn Clarkson at the (Interim) Digital Transformation Office (Australia): “Our challenge: How do we get the right information and services to people when and where they need it?

The public relies on Government for a broad range of information – advice for individuals and businesses, what services are available and how to access them, and how various rules and laws impact our lives.

The government’s digital environment has grown organically over the last couple of decades. At the moment, information is largely created and managed within agencies and published across more than 1200 disparate gov.au websites, plus a range of social media accounts, apps and other digital formats.

This creates some difficulties for people looking for government information. By publishing within agency silos we are presenting people with an agency-centric view of government information. This is a problem because people largely don’t understand or care about how government organises itself and the structure of government does not map to the needs of people. Having a baby or travelling overseas? Up to a dozen government agencies may have information relevant to you. And as people’s needs span more than one agency, they end up with a disjointed and confusing user experience as they have to navigate across disparate government sites. And even if you begin at your favourite search engine how do you know which of the many government search results is the right place to start?

There are two government entry points already in place to help users – Australia.gov.au and business.gov.au – but they largely act as an umbrella across the 1200+ sites and currently only provide a very thin layer of whole of government information and mainly refer people off to other websites.

The establishment of the DTO has provided the first opportunity for people to come together and better understand how our underlying structural landscape is impacting people’s experience with government. It’s also given us an opportunity to take a step back and ask some of the big questions about how we manage information and what problems can only really be solved through whole of government transformation.

How do we make information and services easier to find? How do we make sure we provide information that people can trust and rely upon at times of need? How should the gov.au landscape be organised to make it easier for us to meet users’ needs and expectations? How many websites should we have – assuming 1200 is too many? What makes up a better user experience – does it mean all sites should look and feel the same? How can we provide government information at the places people naturally go looking for assistance – even if these are not government sites?

As we asked these questions we started to come across some central ideas:

  • What if we could decouple the authoring and management of information from the publishing process, so the subject experts in government still manage their content but we have flexibility to present it in more user-centric ways?
  • What if we unleashed government information, making it possible for state and local governments, non-profit groups and businesses to deliver content and services alongside their own information to give better value to users?
  • Should we move the bureaucratic content (information about agencies and how they are managed such as annual reports, budget statements and operating rules) out of the way of core content and services for people? Can we simplify our environment and base it around topics and life events instead of agencies? What if we had people in government responsible for curating these topics and life events across agencies and creating simpler pathways for users?…(More)”

Forging Trust Communities: How Technology Changes Politics


Book by Irene S. Wu: “Bloggers in India used social media and wikis to broadcast news and bring humanitarian aid to tsunami victims in South Asia. Terrorist groups like ISIS pour out messages and recruit new members on websites. The Internet is the new public square, bringing to politics a platform on which to create community at both the grassroots and bureaucratic level. Drawing on historical and contemporary case studies from more than ten countries, Irene S. Wu’s Forging Trust Communities argues that the Internet, and the technologies that predate it, catalyze political change by creating new opportunities for cooperation. The Internet does not simply enable faster and easier communication, but makes it possible for people around the world to interact closely, reciprocate favors, and build trust. The information and ideas exchanged by members of these cooperative communities become key sources of political power akin to military might and economic strength.

Wu illustrates the rich world history of citizens and leaders exercising political power through communications technology. People in nineteenth-century China, for example, used the telegraph and newspapers to mobilize against the emperor. In 1970, Taiwanese cable television gave voice to a political opposition demanding democracy. Both Qatar (in the 1990s) and Great Britain (in the 1930s) relied on public broadcasters to enhance their influence abroad. Additional case studies from Brazil, Egypt, the United States, Russia, India, the Philippines, and Tunisia reveal how various technologies function to create new political energy, enabling activists to challenge institutions while allowing governments to increase their power at home and abroad.

Forging Trust Communities demonstrates that the way people receive and share information through network communities reveals as much about their political identity as their socioeconomic class, ethnicity, or religion. Scholars and students in political science, public administration, international studies, sociology, and the history of science and technology will find this to be an insightful and indispensable work…(More)”

Exploring Open Energy Data in Urban Areas


The World Bank: “…Energy efficiency – using less energy input to deliver the same level of service – has been described by many as the ‘first fuel’ of our societies. However, lack of adequate data to accurately predict and measure energy efficiency savings, particularly at the city level, has limited the realization of its promise over the past two decades.
Why Open Energy Data?
Open Data can be a powerful tool to reduce information asymmetry in markets, increase transparency and help achieve local economic development goals. Several sectors like transport, public sector management and agriculture have started to benefit from Open Data practices. Energy markets are often characterized by less-than-optimal conditions with high system inefficiencies, misaligned incentives and low levels of transparency. As such, the sector has a lot to potentially gain from embracing Open Data principles.
The United States is a leader in this field with its ‘Energy Data’ initiative. This initiative makes data easy to find, understand and apply, helping to fuel a clean energy economy. For example, the Energy Information Administration’s (EIA) open application programming interface (API) has more than 1.2 million time series of data and is frequently visited by users from the private sector, civil society and media. In addition, the Green Button initiative is empowering American citizens to have access to their own energy usage data, and OpenEI.org is an Open Energy Information platform to help people find energy information, share their knowledge and connect to other energy stakeholders.
Introducing the Open Energy Data Assessment
To address this data gap in emerging and developing countries, the World Bank is conducting a series of Open Energy Data Assessments in urban areas. The objective is to identify important energy-related data, raise awareness of the benefits of Open Data principles and improve the flow of data between traditional energy stakeholders and others interested in the sector.
The first cities we assessed were Accra, Ghana and Nairobi, Kenya. Both are among the fastest-growing cities in the world, with dynamic entrepreneurial and technology sectors, and both are capitals of countries with an ongoing National Open Data Initiative. The two cities have also been selected to be part of the Negawatt Challenge, a World Bank international competition supporting technology innovation to solve local energy challenges.
The ecosystem approach
The starting point for the exercise was to consider the urban energy sector as an ecosystem, comprised of data suppliers, data users, key datasets, a legal framework, funding mechanisms, and ICT infrastructure. The methodology that we used adapted the established World Bank Open Data Readiness Assessment (ODRA), which highlights valuable connections between data suppliers and data demand.  The assessment showcases how to match pressing urban challenges with the opportunity to release and use data to address them, creating a longer-term commitment to the process. Mobilizing key stakeholders to provide quick, tangible results is also key to this approach….(More) …See also World Bank Open Government Data Toolkit.”

Flawed Humans, Flawed Justice


Adam Benforado in the New York Times on using “…lessons from behavioral science to make police and courts more fair…. WHAT would it take to achieve true criminal justice in America?

Imagine that we got rid of all of the cops who cracked racist jokes and prosecutors blinded by a thirst for power. Imagine that we cleansed our courtrooms of lying witnesses and foolish jurors. Imagine that we removed every judge who thought the law should bend to her own personal agenda and every sadistic prison guard.

We would certainly feel just then. But we would be wrong.

We would still have unarmed kids shot in the back and innocent men and women sentenced to death. We would still have unequal treatment, disregarded rights and profound mistreatment.

The reason is simple and almost entirely overlooked: Our legal system is based on an inaccurate model of human behavior. Until recently, we had no way of understanding what was driving people’s thoughts, perceptions and actions in the criminal arena. So, we built our institutions on what we had: untested assumptions about what deceit looks like, how memories work and when punishment is merited.

But we now have tools — from experimental methods and data collection approaches to brain-imaging technologies — that provide an incredible opportunity to establish a new and robust foundation.

Our justice system must be reconstructed upon scientific fact. We can start by acknowledging what the data says about the fundamental flaws in our current legal processes and structures.

Consider the evidence that we treat as nearly unassailable proof of guilt at trial — an unwavering eyewitness, a suspect’s signed confession or a forensic match to the crime scene.

While we charge tens of thousands of people with crimes each year after they are identified in police lineups, research shows that eyewitnesses chose an innocent person roughly one-third of the time. Our memories can fail us because we’re frightened. They can be altered by the word choice of a detective. They can be corrupted by previously seeing someone’s image on a social media site.

Picking out lying suspects from their body language is ineffective. And trying then to gain a confession by exaggerating the strength of the evidence and playing down the seriousness of the offense can encourage people to admit to terrible things they didn’t do.

Even seemingly objective forensic analysis is far from incorruptible. Recent data shows that fingerprint — and even DNA — matches are significantly more likely when the forensic expert is aware that the sample comes from someone the police believe is guilty.

With the aid of psychology, we see there’s a whole host of seemingly extraneous forces influencing behavior and producing systematic distortions. But they remain hidden because they don’t fit into our familiar legal narratives.

We assume that the specific text of the law is critical to whether someone is convicted of rape, but research shows that the details of the criminal code — whether it includes a “force” requirement or excuses a “reasonably mistaken” belief in consent — can be irrelevant. What matters are the backgrounds and identities of the jurors.

When a black teenager is shot by a police officer, we expect to find a bigot at the trigger.

But studies suggest that implicit bias, rather than explicit racism, is behind many recent tragedies. Indeed, simulator experiments show that the biggest danger posed to young African-American men may not be hate-filled cops, but well-intentioned police officers exposed to pervasive, damaging stereotypes that link the concepts of blackness and violence.

Likewise, Americans have been sold a myth that there are two kinds of judges — umpires and activists — and that being unbiased is a choice that a person makes. But the truth is that all judges are swayed by countless forces beyond their conscious awareness or control. It should have no impact on your case, for instance, whether your parole hearing is scheduled first thing in the morning or right before lunch, but when scientists looked at real parole boards, they found that judges were far more likely to grant petitions at the beginning of the day than they were midmorning.

The choice of where to place the camera in an interrogation room may seem immaterial, yet experiments show that it can affect whether a confession is determined to be coerced. When people watch a recording with the camera behind the detective, they are far more likely to find that the confession was voluntary than when watching the interactions from the perspective of the suspect.

With such challenges to our criminal justice system, what can possibly be done? The good news is that an evidence-based approach also illuminates the path forward.

Once we have clear data that something causes a bias, we can then figure out how to remove that influence. …(More)

The Civic Organization and the Digital Citizen


New book by Chris Wells: “The powerful potential of digital media to engage citizens in political actions has now crossed our news screens many times. But scholarly focus has tended to be on “networked,” anti-institutional forms of collective action, to the neglect of advocacy and service organizations. This book investigates the changing fortunes of the citizen-civil society relationship by exploring how social changes and innovations in communication technology are transforming the information expectations and preferences of many citizens, especially young citizens. In doing so, it is the first work to bring together theories of civic identity change with research on civic organizations. Specifically, it argues that a shift in “information styles” may help to explain the disjuncture felt by many young people when it comes to institutional participation and politics.

The book theorizes two paradigms of information style: a dutiful style, which was rooted in the society, communication system and citizen norms of the modern era, and an actualizing style, which constitutes the set of information practices and expectations of the young citizens of late modernity for whom interactive digital media are the norm. Hypothesizing that civil society institutions have difficulty adapting to the norms and practices of the actualizing information style, two empirical studies apply the dutiful/actualizing framework to innovative content analyses of organizations’ online communications, both on their websites and through Facebook. Results demonstrate that with intriguing exceptions, most major civil society organizations use digital media more in line with dutiful information norms than actualizing ones: they tend to broadcast strategic messages to an audience of receivers, rather than encouraging participation or exchange among an active set of participants. The book concludes with a discussion of the tensions inherent in bureaucratic organizations trying to adapt to an actualizing information style, and recommendations for how they may more successfully do so….(More)”

The Code Issue: Special Multi-platform Package on Demystifying Code


 

“Bloomberg Businessweek released The Code Issue, a special double issue containing a single essay by writer and programmer Paul Ford. ….

Code directs the fate of everything from media to e-commerce to banking, and is arguably the most important phenomenon for the twenty-first century businessperson to understand. Yet it remains an intimidating mystery to most execs. In The Code Issue introduction, Bloomberg Businessweek editor Josh Tyrangiel writes, “Software has been around since the 1940s. Which means that people have been faking their way through meetings about software, and the code that builds it, for generations… ignorance is no longer acceptable.”

Tyrangiel says of The Code Issue, “There’s some technical language along with a few pretty basic mathematical concepts. There are also lots of solid jokes and lasting insights. It may take a few hours to read, but that’s a small price to pay for adding decades to your career.”

Chapters in The Code Issue include:

  • From Hardware to Software and How Does Code Become Software?
  • What Is an Algorithm?
  • What’s With All These Conferences, Anyway? (and why are there so many men in this field and why is it so hard for them to be in groups with female programmers and behave in a typical, adult way?)
  • Why Are Programmers So Intense About Languages?
  • What Do Different Languages Do? and The Importance of C
  • Why Are Coders Angry?
  • The Legend of the 10X Programmer (which details the accoutrements of the coder)
  • The Time You Attended the E-mail Address Validation Meeting
  • The Language of White Collars
  • Briefly on the Huge Subject Of Microsoft
  • What About JavaScript?
  • How Are Apps Made?
  • What Is Debugging?
  • Managing Programmers
  • Should You Learn to Code?

….An animated and interactive treatment of the essay allows web and mobile readers to dive deeper into code, to manipulate it and see the results. Among the demos and widgets are tinder for code, a fun Easter egg, and a certificate of completion you can share with friends. The code for the “What Is Code?” essay has been published on GitHub.”

How Crowdsourcing Can Help Us Fight ISIS


 at the Huffington Post: “There’s no question that ISIS is gaining ground. …So how else can we fight ISIS? By crowdsourcing data – i.e. asking a relevant group of people for their input via text or the Internet on specific ISIS-related issues. In fact, ISIS has been using crowdsourcing to enhance its operations since last year in two significant ways. Why shouldn’t we?

First, ISIS is using its crowd of supporters in Syria, Iraq and elsewhere to help strategize new policies. Last December, the extremist group leveraged its global crowd via social media to brainstorm ideas on how to kill 26-year-old Jordanian coalition fighter pilot Moaz al-Kasasba. ISIS supporters used the hashtag “Suggest a Way to Kill the Jordanian Pilot Pig” and “We All Want to Slaughter Moaz” to make their disturbing suggestions, which included decapitation, running al-Kasasba over with a bulldozer and burning him alive (which was the winner). Yes, this sounds absurd and was partly a publicity stunt to boost ISIS’ image. But the underlying strategy to crowdsource new strategies makes complete sense for ISIS as it continues to evolve – which is what the US government should consider as well.

In fact, in February, the US government tried to crowdsource more counterterrorism strategies. Via its official blog, DipNote, the State Department asked the crowd – in this case, US citizens – for their suggestions for solutions to fight violent extremism. This inclusive approach to policymaking was obviously important for strengthening democracy, with more than 180 entries posted over two months from citizens across the US. But did this crowdsourcing exercise actually improve US strategy against ISIS? Not really. What might help is if the US government asked a crowd of experts across varied disciplines and industries about counterterrorism strategies specifically against ISIS, also giving these experts the opportunity to critique each other’s suggestions to reach one optimal strategy. This additional, collaborative, competitive and interdisciplinary expert insight can only help President Obama and his national security team to enhance their anti-ISIS strategy.

Second, ISIS has been using its crowd of supporters to collect intelligence information to better execute its strategies. Since last August, the extremist group has crowdsourced data via a Twitter campaign specifically on Saudi Arabia’s intelligence officials, including names and other personal details. This apparently helped ISIS in its two suicide bombing attacks during prayers at a Shiite mosque last month; it also presumably helped ISIS infiltrate a Saudi Arabian border town via Iraq in January. This additional, collaborative approach to intelligence collection can only help President Obama and his national security team to enhance their anti-ISIS strategy.

In fact, last year, the FBI used crowdsourcing to spot individuals who might be travelling abroad to join terrorist groups. But what if we asked the crowd of US citizens and residents to give us information specifically on where they’ve seen individuals get lured by ISIS in the country, as well as on specific recruitment strategies they may have noted? This might also lead to more real-time data points on ISIS defectors returning to the US – who are they, why did they defect and what can they tell us about their experience in Syria or Iraq? Overall, crowdsourcing such data (if verifiable) would quickly create a clearer picture of trends in recruitment and defectors across the country, which can only help the US enhance its anti-ISIS strategies.

This collaborative approach to data collection could also be used in Syria and Iraq with texts and online contributions from locals helping us to map ISIS’ movements….(More)”

The Tragedy of the Digital Commons


J. Nathan Matias in the Atlantic: “….Milland and other regular Turkers navigate this precariously free market with Turkopticon, a DIY technology for rating employers created in 2008. To use it, workers install a browser plugin that extends Amazon’s website with special rating features. Before accepting a new task, workers check how others have rated the employer. After finishing, they can also leave their own rating of how well they were treated. Collective rating on Turkopticon is an act of citizenship in the digital world. This digital citizenship acknowledges that online experiences are as much a part of our common life as our schools, sidewalks, and rivers—requiring as much stewardship, vigilance, and improvement as anything else we share.

“How do you fix a broken system that isn’t yours to repair?” That’s the question that motivated the researchers Lilly Irani and Six Silberman to create Turkopticon, and it’s one that comes up frequently in digital environments dominated by large platforms with hands-off policies. (On social networks like Twitter, for example, harassment is a problem for many users.) Irani and Silberman describe Turkopticon as a “mutual aid for accountability” technology, a system that coordinates peer support to hold others accountable when platforms choose not to step in.

Mutual aid accountability is a growing response to the complex social problems people face online. On Twitter, systems like The Block Bot and BlockTogether coordinate collective judgments about alleged online harassers. The systems then collectively block tweets from accounts that a group prefers not to hear from. Last month, the advocacy organization Hollaback raised over $20,000 on Kickstarter to create support networks for people experiencing harassment. In November, I worked with the advocacy organization Women, Action, and the Media, which took a role as “authorized reporter” with Twitter. For three weeks WAM! accepted reports, sorted evidence, and forwarded serious cases to Twitter. In response, the company warned, suspended, and deleted the accounts of many alleged harassers.

These mutual aid technologies operate in the shadow of larger systems with gaps in how people are supported—even when platforms do step in, says Stuart Geiger, a Berkeley Ph.D. student. In other words, sometimes a platform’s system-wide solutions to a problem can create their own problems. For several years, Geiger and his colleague Aaron Halfaker, now a researcher at Wikimedia, were concerned that Wikipedia’s semi-automated anti-vandalism systems might be making the site unfriendly. As a graduate student unable to change Wikipedia’s code, Halfaker created Snuggle, a mutual-aid mentorship technology that tracks the site’s spam responders. When Snuggle users think a newcomer’s edits were mistakenly flagged as spam, the software coordinates Wikipedians to help those users recover from the negative experience of getting revoked.

By organizing peer support at scale, the designers of Turkopticon and its cousins draw attention to common problems, hoping to influence longer-term change on a complex issue. In time, the idea goes, requesters on Mechanical Turk might change their treatment of workers, Amazon might change its policies and software, or regulators might set new rules for digital labor. This is an approach with a long history in an area that might seem unlikely: the conservation movement. (Silberman and Irani cite the movement as inspiration for Turkopticon.)

To better understand how this approach might influence digital citizenship, I followed the history of mutual-aid accountability in a precious common network that the city of Boston enjoys every day: the Charles River. Planned, re-routed, exploited and contested, it has inspired and supported human life since before written history….(More)”