Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency


at Medium: “…So why, then, does granular, social data make people uncomfortable? Well, ultimately—and at the risk of stating the obvious—it’s because data of this sort brings up issues regarding ethics, privacy, bias, fairness, and inclusion. In turn, these issues make people uncomfortable because, at least as the popular narrative goes, these are new issues that fall outside the expertise of those aggregating and analyzing big data. But the thing is, these issues aren’t actually new. Sure, they may be new to computer scientists and software engineers, but they’re not new to social scientists.

This is why I think the world of big data and those working in it — ranging from the machine learning researchers developing new analysis tools all the way up to the end-users and decision-makers in government and industry — can learn something from computational social science….

So, if technology companies and government organizations — the biggest players in the big data game — are going to take issues like bias, fairness, and inclusion seriously, they need to hire social scientists — the people with the best training in thinking about important societal issues. Moreover, it’s important that this hiring is done not just in a token, “hire one social scientist for every hundred computer scientists” kind of way, but in a serious, “creating interdisciplinary teams” kind of way.



While preparing for my talk, I read an article by Moritz Hardt, entitled “How Big Data is Unfair.” In this article, Moritz notes that even in supposedly large data sets, there is always proportionally less data available about minorities. Moreover, statistical patterns that hold for the majority may be invalid for a given minority group. He gives, as an example, the task of classifying user names as “real” or “fake.” In one culture — comprising the majority of the training data — real names might be short and common, while in another they might be long and unique. As a result, the classic machine learning objective of “good performance on average” may actually be detrimental to those in the minority group….
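Hardt’s point about averages can be made concrete with a small sketch. The data below are invented for illustration (they are not from his article), and the “classifier” is a single hand-written rule, assumed to have been tuned on the majority group, that treats short names as real. Because the minority group makes up only a tenth of the sample, the rule looks excellent on average while getting every minority example wrong.

```python
from collections import defaultdict

# Hypothetical toy data, invented for illustration: (group, name_length, is_real).
# In the majority group real names are short; in the minority group real names are long.
SAMPLES = (
    [("majority", 8, True)] * 45 + [("majority", 20, False)] * 45
    + [("minority", 22, True)] * 8 + [("minority", 7, False)] * 2
)


def rule_short_is_real(name_length: int, threshold: int = 12) -> bool:
    """A single rule fitted to the majority pattern: short names are 'real'."""
    return name_length <= threshold


def accuracy_by_group(samples):
    """Return overall accuracy and a dict of per-group accuracies for the rule."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, length, label in samples:
        total[group] += 1
        correct[group] += int(rule_short_is_real(length) == label)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group


overall, per_group = accuracy_by_group(SAMPLES)
print(f"overall accuracy: {overall:.2f}")  # 0.90
for group, acc in sorted(per_group.items()):
    print(f"{group}: {acc:.2f}")  # majority: 1.00, minority: 0.00
```

Reporting accuracy per group, rather than only in aggregate, is what exposes the gap: the 90 percent headline figure hides a rule that never works for the minority.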

As an alternative, I would advocate prioritizing vital social questions over data availability — an approach more common in the social sciences. Moreover, if we’re prioritizing social questions, perhaps we should take this as an opportunity to prioritize those questions explicitly related to minorities and bias, fairness, and inclusion. Of course, putting questions first — especially questions about minorities, for whom there may not be much available data — means that we’ll need to go beyond standard convenience data sets and general-purpose “hammer” methods. Instead we’ll need to think hard about how best to instrument data aggregation and curation mechanisms that, when combined with precise, targeted models and tools, are capable of elucidating fine-grained, hard-to-see patterns….(More).”

MIT to Pioneer Science of Innovation


Irving Wladawsky-Berger in the Wall Street Journal: “‘Innovation – identified by MIT economist and Nobel laureate Robert Solow as the driver of long-term, sustainable economic growth and prosperity – has been a hallmark of the Massachusetts Institute of Technology since its inception.’ Thus starts The MIT Innovation Initiative: Sustaining and Extending a Legacy of Innovation, the preliminary report of a yearlong effort to define the innovation needed to address some of the world’s most challenging problems. Released earlier this month, the report was developed by the MIT Innovation Initiative, launched a year ago by MIT President Rafael Reif…. Its recommendations are focused on four key priorities.
Strengthen and expand idea-to-impact education and research. Students are asking for career preparation that enables them to make a positive difference early in their careers. Twenty percent of incoming students say that they want to launch a company or NGO during their undergraduate years…
The report includes a number of specific idea-to-impact recommendations. In education, these include new undergraduate minor programs focused on the engineering, scientific, economic and social dimensions of innovation projects. In research, the report calls for supplementing research activities with specific programs designed to extend the work beyond publication into practical solutions, including proof-of-concept grants.
Extend innovation communities. Conversations with students, faculty and other stakeholders revealed that the process of engaging with MIT’s innovation programs and activities is somewhat fragmented. The report proposes tighter integration and improved coordination with three key types of communities:

  • Students and postdocs with shared interests in innovation, including links to appropriate mentors;
  • External partners, focused on linking the MIT groups more closely to corporate partners and entrepreneurs; and
  • Global communities focused on linking MIT with key stakeholders in innovation hubs around the world.

Enhance innovation infrastructures. The report includes a number of recommendations for revitalizing innovation-centric infrastructures in four key areas….
Pioneer the development of the Science of Innovation. In my opinion, the report’s most important and far-reaching recommendation calls for MIT to create a new Laboratory for Innovation Science and Policy…”
 

Democracy makes itself at home online


Geoff Mulgan on the creation of new parties in 2015 at Nesta: “….On its own the Internet is an imperfect tool for making decisions or shaping options. Opening decisions up to large numbers of people doesn’t automatically make decisions better (the ‘wisdom of crowds’). But in the right circumstances the Internet can involve far more people in shaping policy and sharing their expertise.
Hybrid models that combine the openness of the Internet with a continuing role for parliaments, committees and leaders in making decisions and being held to account are showing great promise (something being pursued in Nesta’s D-CENT project in countries like Finland and Iceland, and in our work with Podemos in Spain).
My prediction is that the aftermath of the UK election will see the first Internet-age parties emerge in the UK, our own versions of Podemos or Democracy OS. My hope is that they will help to engage millions of people currently detached from politics, and to provide them with ways to directly influence ideas and decisions. UKIP has tapped into that alienation – but mainly offers a better yesterday rather than a plausible vision of the future. That leaves a gap for new parties that are more at home in the 21st century and can target a much younger age group.
If new parties do spring up, the old ones will have to respond. Before long, open primaries, deliberations on the Internet, and crowd-sourced policy processes could become the norm. As that happens, politics will become messier and more interesting. Leaders will have to be adept at responding to contradictory currents of opinion, with more conversation and fewer bland speeches. The huge power once wielded by newspaper owners, commentators and editors will almost certainly continue to decline.
The hope, in short, is that democracy could be reenergised…. (More).

Institutions, Innovation, and Industrialization: Essays in Economic History and Development


Book edited by Avner Greif, Lynne Kiesling & John V. C. Nye: “This book brings together a group of leading economic historians to examine how institutions, innovation, and industrialization have determined the development of nations. Presented in honor of Joel Mokyr—arguably the preeminent economic historian of his generation—these wide-ranging essays address a host of core economic questions. What are the origins of markets? How do governments shape our economic fortunes? What role has entrepreneurship played in the rise and success of capitalism? Tackling these and other issues, the book looks at coercion and exchange in the markets of twelfth-century China, sovereign debt in the age of Philip II of Spain, the regulation of child labor in nineteenth-century Europe, meat provisioning in pre–Civil War New York, aircraft manufacturing before World War I, and more. The book also features an essay that surveys Mokyr’s important contributions to the field of economic history, and an essay by Mokyr himself on the origins of the Industrial Revolution….(More)”

Open policy making in action: Empowering divorcing couples and separating families to create sustainable solutions


at Open Policy Making Blog (UK Cabinet Office): “Set up in April 2014, Policy Lab brings new tools and techniques, new insights and practical experimentation to policy-making. Over the past two months, this second demonstrator project has generated lessons about how policy professionals can work in a more open, user-centred way, engaging with others and generating novel solutions to policy issues.
The project, with the Ministry of Justice (MoJ), is concerned with family mediation during divorce and separation….
The main findings from the Lab’s perspective are in three areas.
Clarifying what user perspectives bring to policy-making.
The project gave us some insights into the potential value of ethnography in policy-making. It was centred around people’s whole experience of divorce or separation, not just their interactions with mediators or lawyers. The research explored what it was like for people now, and the creative activities in the workshop proposed what it could be like for people in the future. One unexpected insight was that some people going through separation and divorce lacked confidence in their ability to make decisions about their futures.
Using person-centred techniques in the workshop made participants accountable to the users. The users’ stories were read, interpreted and discussed at the start. Throughout the workshop, participants repeatedly raised questions about what a proposed new solution might be like for these personas. It was as if these participants were now accountable to these individuals.
Reconstituting the issue of family mediation.
Another result of this project was a shift from seeing policy-making as primarily the province of the MoJ towards seeing it as a collective activity in which many actors and different kinds of expertise needed to be involved. The project constituted policy-making as a complex configuration of socio-cultural, organizational and technological actors, processes, data and resources – more of a living system than a mechanical object with inputs, outputs and policy “levers”.
Starting and ending with people’s lives, not government-funded or delivered services, as the driver to innovate.  
Finally, this Lab project looked broadly at people’s lives, not just as users of mediation or court services…. (More)”

Climaps


Climaps: “This website presents the results of the EU research project EMAPS, as well as its process: an experiment to use computation and visualization to harness the increasing availability of digital data and mobilize it for public debate. To do so, EMAPS gathered a team of social and data scientists, climate experts and information designers. It also reached out beyond the walls of academia and engaged with the actors of the climate debate.

The climate is changing. Efforts to reduce greenhouse gas emissions have so far been ineffective or, at least, insufficient. As the impacts of global warming emerge, our societies are under unprecedented pressure. How can we live with climate change without giving up the fight against it? How should the burden of adaptation be shared among countries, regions and communities? How can we be fair to all human and non-human beings affected by such a planetary transition? Since our collective life depends on these questions, they deserve discussion, debate and even controversy. To help navigate the uncharted territories that lead to our future, here is an electronic atlas. It proposes a series of maps and stories related to climate adaptation issues. They are neither exhaustive nor error-proof. They are nothing but sketches of the new world in which we will have to live. Such a world remains undetermined and its atlas can be but tentative…(More)”

Making Futures – Marginal Notes on Innovation, Design, and Democracy


Book edited by Pelle Ehn, Elisabet M. Nilsson and Richard Topgaard: “Innovation and design need not be about the search for a killer app. Innovation and design can start in people’s everyday activities. They can encompass local services, cultural production, arenas for public discourse, or technological platforms. The approach is participatory, collaborative, and engaging, with users and consumers acting as producers and creators. It is concerned less with making new things than with making a socially sustainable future. This book describes experiments in innovation, design, and democracy, undertaken largely by grassroots organizations, non-governmental organizations, and multi-ethnic working-class neighborhoods.
These stories challenge the dominant perception of what constitutes successful innovations. They recount efforts at social innovation, opening the production process, challenging the creative class, and expanding the public sphere. The wide range of cases considered includes a collective of immigrant women who perform collaborative services, the development of an open-hardware movement, grassroots journalism, and hip-hop performances on city buses. They point to the possibility of democratized innovation that goes beyond solo entrepreneurship and crowdsourcing in the service of corporations to include multiple futures imagined and made locally by often-marginalized publics. (More)”

Senate moves to open-data format


Adam Mazmanian in FCW: “The U.S. Senate will begin making bills and other legislative information available for bulk XML download, following on efforts made by the House of Representatives in 2013. The Senate will include all summary and bill information from the 113th Congress, which just gaveled out, and legislation from the upcoming 114th Congress.
This is a very big deal for watchdog groups and private firms that use legislative data to make products for tracking Congress. Before the Senate decision was announced Dec. 18 at a meeting of the Legislative Branch Bulk Data Task Force, users of Senate data had to scrape information from multiple sources. That can be expensive and yield inaccurate data, said Hudson Hollister, executive director of the Data Transparency Coalition and a former Capitol Hill staffer.
“This change by the Senate means a crucial link in the chain will become more reliable. It’s great news for the ecosystem that wants to use government information to deliver transparency and deliver efficiency,” Hollister told FCW.
Having the House and Senate legislation available for bulk download means it’s possible to build services that track legislative language by keyword or topic as bills move through Congress, and follow the process as bills get marked up in committee, combined with other bills, or amended on the floor. Already a few firms are building services along these lines, such as (Leg)Cyte and FiscalNote.
Despite the recent move by the Senate, all of Congress isn’t exactly speaking with one voice regarding data standards. In the summer of 2013, the Office of the Law Revision Counsel, which maintains the U.S. Code as new legislation is signed into law, developed an XML information model called U.S. Legislative Markup (USLM) as a way of publishing laws and associated metadata in XML. This model could potentially be adapted for use across Congress for bills, summaries, reports and other information….(More)”
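A keyword tracker of the kind described above could start as simply as the sketch below, which uses Python’s standard xml.etree.ElementTree module. The sample records and their element and attribute names (bill, number, stage, title, text) are hypothetical and do not reproduce the schema the House or Senate actually publishes; a real tracker would first map the published schema onto fields like these.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified bill records. Element and attribute names are invented
# for illustration and are NOT the actual congressional bulk-data schema.
BULK_XML = """
<bills>
  <bill number="S. 101" stage="introduced">
    <title>Data Transparency Act</title>
    <text>Requires agencies to publish spending data in open formats.</text>
  </bill>
  <bill number="S. 102" stage="reported">
    <title>Highway Funding Act</title>
    <text>Authorizes appropriations for highway maintenance.</text>
  </bill>
  <bill number="S. 103" stage="amended">
    <title>Open Records Modernization Act</title>
    <text>Amends records law to require machine-readable open formats.</text>
  </bill>
</bills>
"""


def track_keyword(xml_text: str, keyword: str) -> list[tuple[str, str]]:
    """Return (bill number, stage) for each bill whose title or text mentions keyword."""
    root = ET.fromstring(xml_text.strip())
    needle = keyword.lower()
    hits = []
    for bill in root.iter("bill"):
        title = bill.findtext("title", default="")
        body = bill.findtext("text", default="")
        # Case-insensitive substring match over the title and the bill text.
        if needle in title.lower() or needle in body.lower():
            hits.append((bill.get("number"), bill.get("stage")))
    return hits


print(track_keyword(BULK_XML, "open formats"))
# [('S. 101', 'introduced'), ('S. 103', 'amended')]
```

Re-running the same query against each new bulk download, and comparing the recorded stage, is one simple way to follow a bill as it moves from introduction through amendment.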

Geneticists Begin Tests of an Internet for DNA


Antonio Regalado in MIT Technology Review: “A coalition of geneticists and computer programmers calling itself the Global Alliance for Genomics and Health is developing protocols for exchanging DNA information across the Internet. The researchers hope their work could be as important to medical science as HTTP, the protocol created by Tim Berners-Lee in 1989, was to the Web.
One of the group’s first demonstration projects is a simple search engine that combs through the DNA letters of thousands of human genomes stored at nine locations, including Google’s server farms and the University of Leicester, in the U.K. According to the group, which includes key players in the Human Genome Project, the search engine is the start of a kind of Internet of DNA that may eventually link millions of genomes together.
The technologies being developed are application programming interfaces, or APIs, that let different gene databases communicate. Pooling information could speed discoveries about what genes do and help doctors diagnose rare birth defects by matching children with suspected gene mutations to others who are known to have them.
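The pooling idea can be illustrated with a toy federated query. This is a sketch under stated assumptions: the site names, the variant records and the yes/no interface are all hypothetical and do not reproduce the Global Alliance’s actual protocols. Each site answers only whether it holds a given variant, so one question can span several databases without the raw genomes leaving their hosts.

```python
from dataclasses import dataclass, field


@dataclass
class GenomeSite:
    """A hypothetical data holder that answers yes/no questions about its own variants."""
    name: str
    # (chromosome, position, alternate allele) triples held privately at this site
    variants: set[tuple[str, int, str]] = field(default_factory=set)

    def has_variant(self, chrom: str, pos: int, allele: str) -> bool:
        # Only a yes/no answer leaves the site; the underlying records stay local.
        return (chrom, pos, allele) in self.variants


def federated_search(sites: list[GenomeSite], chrom: str, pos: int, allele: str) -> list[str]:
    """Ask every site the same question and collect the names of those answering yes."""
    return [site.name for site in sites if site.has_variant(chrom, pos, allele)]


# Invented example data, not real genomic coordinates.
sites = [
    GenomeSite("site_a", {("1", 12345, "G"), ("7", 55000, "T")}),
    GenomeSite("site_b", {("7", 55000, "T")}),
    GenomeSite("site_c", {("X", 1000, "A")}),
]
print(federated_search(sites, "7", 55000, "T"))  # ['site_a', 'site_b']
```

Keeping the answer to a bare yes or no is one way such a design can sidestep some of the privacy and consent barriers the article describes, at the cost of returning very little detail per query.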
The alliance was conceived two years ago at a meeting in New York of 50 scientists who were concerned that genome data was trapped in private databases, tied down by legal consent agreements with patients, limited by privacy rules, or jealously controlled by scientists to further their own scientific work. It styles itself after the World Wide Web Consortium, or W3C, a body that oversees standards for the Web.
“It’s creating the Internet language to exchange genetic information,” says David Haussler, scientific director of the genome institute at the University of California, Santa Cruz, who is one of the group’s leaders.
The group began releasing software this year. Its hope—as yet largely unrealized—is that any scientist will be able to ask questions about genome data possessed by other laboratories, without running afoul of technical barriers or privacy rules….(More)”

Will Organization Design Be Affected By Big Data?


Paper by Giles Slinger and Rupert Morrison in the Journal of Organization Design: “Computing power and analytical methods allow us to create, collate, and analyze more data than ever before. When datasets are unusually large in volume, velocity, and variety, they are referred to as “big data.” Some observers have suggested that in order to cope with big data (a) organizational structures will need to change and (b) the processes used to design organizations will be different. In this article, we differentiate big data from relatively slow-moving, linked people data. We argue that big data will change organizational structures as organizations pursue the opportunities presented by big data. The processes by which organizations are designed, however, will be relatively unaffected by big data. Instead, organization design processes will be more affected by the complex links found in people data.”