Open Traffic Data to Revolutionize Transport


World Bank: “Congestion in metropolitan Manila costs the economy more than $60 million per day, and it is not atypical to spend more than 2 hours to travel 8 km during the evening commute there. But beyond these statistics, until recently, very little was actually known about Manila’s congestion, because the equipment and manpower required to collect traffic data have far exceeded available resources. Most cities in developing countries face similar issues.

Traditional methods of collecting traffic data rely either on labor-intensive fieldwork or capital-intensive sensor data networks. The former is slow and results in low-quality data, and the latter requires substantial capital and maintenance outlays, while only covering a small portion of a metropolitan area. In the era of big data, shouldn’t we be able to do better?

Responding to this need, Easy Taxi, Grab, and Le.Taxi, three ridesharing companies—which, combined, cover more than 30 countries and millions of customers—are working with the World Bank and partners to make traffic data derived from their drivers’ GPS streams available to the public through an open data license. Through the new Open Transport Partnership, these companies, along with founding members Mapzen, the World Resources Institute, Miovision, and NDrive, will empower resource-constrained transport agencies to make better, evidence-based decisions that previously had been out of reach.
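
As a rough illustration of how driver GPS streams can be turned into open traffic statistics, here is a minimal sketch that averages observed speeds per road segment. The field names, sample values, and the assumption that pings have already been map-matched to road segments are all hypothetical; this is not the partnership’s actual pipeline.

```python
from collections import defaultdict

# Each ping: (driver_id, road_segment_id, timestamp_s, speed_kmh).
# In practice, speeds come from map-matching raw GPS fixes to road
# segments; here that step is assumed to have already happened.
pings = [
    ("d1", "EDSA_01", 1700000000, 12.5),
    ("d2", "EDSA_01", 1700000030, 9.8),
    ("d1", "EDSA_02", 1700000300, 41.0),
]

def mean_segment_speeds(pings):
    """Average observed speed (km/h) per road segment."""
    totals = defaultdict(lambda: [0.0, 0])  # segment -> [speed_sum, count]
    for _, segment, _, speed in pings:
        totals[segment][0] += speed
        totals[segment][1] += 1
    return {seg: s / n for seg, (s, n) in totals.items()}

print(mean_segment_speeds(pings))  # {'EDSA_01': 11.15, 'EDSA_02': 41.0}
```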

Issues that this data will help address include, among others, traffic signal timing plans, public transit provision, roadway infrastructure needs, emergency traffic management, and travel demand management. According to Alyssa Wright, president of the US OpenStreetMap Foundation, the partnership “seeks to improve the efficiency and efficacy of global transportation use and provision through open data and capacity building.” …(More)

See also http://opentraffic.io/

Fighting Ebola with information


Larissa Fast and Adele Waugaman at Global Innovation Exchange: “What can be learned from the use of data, information, and digital technologies, such as mobile-based systems and internet connectivity, during the Ebola outbreak response in West Africa? What worked, what didn’t, and how can we apply these lessons to improve data and information flows in the future? This report details key findings and recommendations about the collection, management, analysis, and use of paper-based and digital data and information, drawing upon the insights of more than 130 individuals and organizations who worked tirelessly to end the Ebola outbreak in West Africa in 2014 and 2015….(More)”

Crowdsourcing, Citizen Science, and Data-sharing


Sapien Labs: “The future of human neuroscience lies in crowdsourcing, citizen science, and data sharing, but it is not without its minefields.

A recent Scientific American article by Daniel Goodwin, “Why Neuroscience Needs Hackers,” makes the case that neuroscience, like many fields today, is drowning in data, begging for the application of advances in computer science like machine learning. Neuroscientists are able to gather reams of neural data, but often without big data mechanisms and frameworks to synthesize them.

The SA article describes the work of Sebastian Seung, a Princeton neuroscientist, who recently mapped the neural connections of the human retina from an “overwhelming mass” of electron microscopy data using state-of-the-art AI and massive crowdsourcing. Seung incorporated the AI into a game called “Eyewire,” in which thousands of volunteers scored points while improving the neural map. Although the article’s title emphasizes advanced AI, Dr. Seung’s experiment points even more to crowdsourcing and open science, avenues for improving research that have suddenly become easy and powerful with today’s internet. Eyewire perhaps epitomizes successful crowdsourcing — using an application that gathers, represents, and analyzes data uniformly according to researchers’ needs.

Crowdsourcing is seductive in its potential but risky for those who aren’t sure how to control it to get what they want. For researchers who don’t want to become hackers themselves, trying to turn the diversity of data produced by a crowd into conclusive results might seem like too much of a headache to be worthwhile. This is probably why the SA article’s title says we need hackers: the crowd is there, but using it depends on innovative software engineering. Many researchers could really use software designed to flexibly support a diversity of crowdsourcing approaches, AI to enable things like crowd validation, and big data tools.
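
To make “crowd validation” concrete, here is a minimal sketch, assuming a hypothetical labeling task and a simple majority-vote rule; real platforms such as Eyewire use far more sophisticated aggregation, so this is only an illustration of the idea.

```python
from collections import Counter

# Hypothetical crowd labels: item_id -> labels submitted by volunteers.
crowd_labels = {
    "segment_001": ["neuron", "neuron", "glia", "neuron"],
    "segment_002": ["glia", "neuron"],
}

def validate_by_majority(labels_per_item, min_agreement=0.75):
    """Keep items whose most common label reaches the agreement threshold."""
    validated = {}
    for item, labels in labels_per_item.items():
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            validated[item] = label
    return validated

print(validate_by_majority(crowd_labels))  # {'segment_001': 'neuron'}
```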

The Potential

The SA article also points to OpenBCI (brain-computer interface), mentioned here in other posts, as an example of how traditional divisions between institutional and amateur (or “citizen”) science are now crumbling; OpenBCI is a community of professional and citizen scientists doing principled research with cheap, portable EEG headsets that produce professional-quality research data. In communities of “neuro-hackers,” like NeurotechX, professional researchers, entrepreneurs, and citizen scientists are coming together to develop all kinds of applications, such as “telepathic” machine control, prostheses, and art. Other companies, like NeuroSky, sell EEG headsets and biosensors for bio-/neuro-feedback training and health monitoring at consumer-affordable prices. (Read more in Citizen Science and EEG)

Tan Le, whose company Emotiv Lifesciences also produces portable EEG headsets, says in an article in National Geographic that neuroscience needs “as much data as possible on as many brains as possible” to advance the diagnosis of conditions such as epilepsy and Alzheimer’s. Human neuroscience studies have typically involved 20 to 50 participants, an incredibly small sampling of a seven-billion-strong humanity. It is difficult for a single lab to collect larger datasets, and with diverse populations across the planet, real understanding may require data not just from thousands of brains but from millions. With cheap mobile EEG headsets, open-source software, and online collaboration, the potential for anyone to participate in such data collection is immense; the potential for crowdsourcing, unprecedented. There are, however, significant hurdles to overcome….(More)”

Cancer Research Orgs Release Big Data for Precision Medicine


 at HealthITAnalytics: “The American Association for Cancer Research (AACR) is releasing more than 19,000 de-identified genomic records to further the international research community’s explorations into precision medicine.

The big data dump, which includes information on 59 major types of cancer, including breast, colorectal, and lung cancer, is a result of the AACR Project Genomics Evidence Neoplasia Information Exchange (GENIE) initiative, and includes both genomic and some clinical data on consenting patients….

“These data were generated as part of routine patient care and without AACR Project GENIE they would likely never have been shared with the global cancer research community.”

Eight cancer research institutions, including five based in the United States, have contributed to the first phase of the GENIE project. Dana-Farber Cancer Institute in Boston, Memorial Sloan Kettering Cancer Center in New York City, and the University of Texas MD Anderson Cancer Center in Houston are among the collaborators.

Alongside institutions in Paris, the Netherlands, Toronto, Nashville, and Baltimore, these organizations aim to expand the research community’s knowledge of cancer and its potential treatments by continuing to make the exchange of high-grade clinical data a top priority.

“We are committed to sharing not only the real-world data within the AACR Project GENIE registry but also our best practices, from tips about assembling an international consortium to the best variant analysis pipeline, because only by working together will information flow freely and patients benefit rapidly,” Sawyers added…

Large-scale initiatives like the AACR Project GENIE, alongside separate data collection efforts like the VA’s Million Veteran Program, the CancerLinQ platform, Geisinger Health System’s MyCode databank, and the nascent PMI Cohort, will continue to make critical genomic and clinical data available to investigators across the country and around the world…(More)”.

Beyond IRBs: Designing Ethical Review Processes for Big Data Research


Conference Proceedings by Future of Privacy Forum: “The ethical framework applying to human subject research in the biomedical and behavioral research fields dates back to the Belmont Report. Drafted in 1976 and adopted by the United States government in 1991 as the Common Rule, the Belmont principles were geared towards a paradigmatic controlled scientific experiment with a limited population of human subjects interacting directly with researchers and manifesting their informed consent. These days, researchers in academic institutions, as well as in private sector businesses not subject to the Common Rule, conduct analysis of a wide array of data sources, from massive commercial or government databases to individual tweets or Facebook postings publicly available online, with little or no opportunity to directly engage human subjects to obtain their consent or even inform them of research activities.

Data analysis is now used in multiple contexts, such as combatting fraud in the payment card industry, reducing the time commuters spend on the road, detecting harmful drug interactions, improving marketing mechanisms, personalizing the delivery of education in K-12 schools, encouraging exercise and weight loss, and much more. And companies deploy data research not only to maximize economic gain but also to test new products and services to ensure they are safe and effective. These data uses promise tremendous societal benefits but at the same time create new risks to privacy, fairness, due process and other civil liberties.

Increasingly, corporate officers find themselves struggling to navigate unsettled social norms and make ethical choices that are more befitting of philosophers than business managers or even lawyers. The ethical dilemmas arising from data analysis transcend privacy and trigger concerns about stigmatization, discrimination, human subject research, algorithmic decision making and filter bubbles.

The challenge of fitting the round peg of data-focused research into the square hole of existing ethical and legal frameworks will determine whether society can reap the tremendous opportunities hidden in the data exhaust of governments and cities, health care institutions and schools, social networks and search engines, while at the same time protecting privacy, fairness, equality and the integrity of the scientific process. One commentator called this “the biggest civil rights issue of our time.”…(More)”

Group Privacy: New Challenges of Data Technologies


Book edited by Linnet Taylor, Luciano Floridi, and Bart van der Sloot: “The goal of the book is to present the latest research on the new challenges of data technologies. It will offer an overview of the social, ethical and legal problems posed by group profiling, big data and predictive analysis and of the different approaches and methods that can be used to address them. In doing so, it will help the reader to gain a better grasp of the ethical and legal conundrums posed by group profiling. The volume first maps the current and emerging uses of new data technologies and clarifies the promises and dangers of group profiling in real life situations. It then balances this with an analysis of how far the current legal paradigm grants group rights to privacy and data protection, and discusses possible routes to addressing these problems. Finally, an afterword gathers the conclusions reached by the different authors and discusses future perspectives on regulating new data technologies….(More and Table of Contents)

Big Data and the Paradox of Diversity


Bernhard Rieder at Digital Culture & Society: “This paper develops a critique of Big Data and associated analytical techniques by focusing not on errors – skewed or imperfect datasets, false positives, underrepresentation, and so forth – but on data mining that works. After a quick framing of these practices as interested readings of reality, I address the question of how data analytics and, in particular, machine learning reveal and operate on the structured and unequal character of contemporary societies, installing “economic morality” (Allen 2012) as the central guiding principle. Rather than critiquing the methods behind Big Data, I inquire into the way these methods make the many differences in decentred, non-traditional societies knowable and, as a consequence, ready for profitable distinction and decision-making. The objective, in short, is to add to our understanding of the “profound ideological role at the intersection of sociality, research, and commerce” (van Dijck 2014: 201) the collection and analysis of large quantities of multifarious data have come to play. Such an understanding needs to embed Big Data in a larger, more fundamental critique of the societal context it operates in….(More)”.

Group Privacy in Times of Big Data. A Literature Review


Paula Helm at Digital Culture & Society: “New technologies pose new challenges to the protection of privacy, and they stimulate new debates about the scope of privacy. Such debates usually concern the individual’s right to control the flow of his or her personal information. The article, however, discusses new challenges posed by new technologies in terms of their impact on groups and their privacy. Two main challenges are identified in this regard, both having to do with the formation of groups through the involvement of algorithms and the lack of civil awareness of the consequences of this involvement. On the one hand, there is the phenomenon of groups being created on the basis of big data without the members of such groups being aware of having been assigned to, and being treated as part of, a certain group. Here, the challenge concerns the limits of personal law, manifesting in the inability of individuals to address possible violations of their right to privacy, since they are not aware of them. On the other hand, commercially driven websites influence the way in which groups form, grow and communicate online, and they do this in such a subtle way that members oftentimes do not take this influence into account. This is why one could speak of a kind of domination here, which calls for legal regulation. The article presents different approaches to addressing and dealing with these two challenges, discussing their strengths and weaknesses. Finally, a conclusion gathers the insights reached by the different approaches discussed and reflects on future challenges for further research on group privacy in times of big data….(More)”

Using Geodata and Geolocation in the Social Sciences: Mapping our Connected World


Book by David Abernathy: “Big data is upon us. With the ‘internet of things’ now a reality, social scientists must get to grips with the complex network of location-based data in order to ask questions and address problems in an increasingly networked, globalizing world.

Using Geodata and Geolocation in the Social Sciences: Mapping our Connected World provides an engaging and accessible introduction to the Geoweb with clear, step-by-step guides for:

  • capturing Geodata from sources including GPS, sensor networks, and Twitter
  • visualizing Geodata using programmes including QGIS, GRASS and R

Packed with colour images and practical exercises, this book is the perfect guide for students and researchers looking to incorporate location-based data into their social science research….(More) (Companion Website)”
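
For readers who want a feel for this kind of workflow before opening the book, here is a minimal, hedged sketch of loading point geodata and plotting it. It uses Python with pandas and matplotlib rather than the book’s QGIS/GRASS/R toolchain, and the coordinates are invented; a real project would project the points and layer them over a basemap.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical geolocated observations, e.g. GPS fixes or geotagged tweets.
points = pd.DataFrame({
    "lon": [121.05, 121.03, 120.98, 121.00],
    "lat": [14.55, 14.60, 14.58, 14.62],
    "label": ["a", "b", "c", "d"],
})

# A basic longitude/latitude scatter plot of the points.
fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(points["lon"], points["lat"])
for _, row in points.iterrows():
    ax.annotate(row["label"], (row["lon"], row["lat"]))
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Point geodata (illustrative)")
fig.savefig("points.png")
```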

Artificial Intelligence Could Help Colleges Better Plan What Courses They Should Offer


Jeffrey R. Young at EdSurge: “Big data could help community colleges better predict how industries are changing so they can tailor their IT courses and other programs. After all, if Amazon can forecast what consumers will buy and prestock items in its warehouses to meet the expected demand, why can’t colleges do the same thing when planning their curricula, using predictive analytics to make sure new degree or certificate programs are started just in time for expanding job opportunities?

That’s the argument made by Gordon Freedman, president of the nonprofit National Laboratory for Education Transformation. He’s part of a new center that will do just that, by building a data warehouse that brings together up-to-date information on what skills employers need and what colleges currently offer—and then applying artificial intelligence to attempt to predict when sectors or certain employment needs might be expanding.
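
As a toy illustration of the kind of prediction being described, the sketch below fits a linear trend to monthly job-posting counts for each skill and flags skills whose trend is growing. The numbers and the one-line model are invented and far simpler than anything the center would actually use.

```python
import numpy as np

# Hypothetical monthly counts of job postings mentioning a given skill.
postings = {
    "cloud_security": [120, 135, 150, 170, 190, 220],
    "cobol": [80, 78, 75, 74, 70, 69],
}

def expanding_skills(monthly_counts, min_slope=1.0):
    """Flag skills whose fitted trend grows by at least min_slope postings/month."""
    flagged = {}
    for skill, counts in monthly_counts.items():
        months = np.arange(len(counts))
        slope, _ = np.polyfit(months, counts, 1)  # degree-1 least-squares fit
        if slope >= min_slope:
            flagged[skill] = round(float(slope), 1)
    return flagged

print(expanding_skills(postings))  # e.g. {'cloud_security': 19.6}
```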

He calls the approach “opportunity engineering,” and the center boasts some heavy-hitting players to assist in the effort, including the University of Chicago, the San Diego Supercomputer Center and Argonne National Laboratory. It’s called the National Center for Opportunity Engineering & Analysis.

Ian Roark, vice president of workforce development at Pima Community College in Arizona, is among those eager for this kind of “opportunity engineering” to emerge.

He explains that when colleges want to start new programs, they face a long haul—it takes time to develop a new curriculum, put it through an internal review, and then send it through an accreditor….

Other players are already trying to translate the job market into a giant data set to spot trends. LinkedIn sits on one of the biggest troves of data, with hundreds of millions of job profiles, and ambitions to create what it calls the “economic graph” of the economy. But not everyone is on LinkedIn, which attracts mainly those in white-collar jobs. And companies such as Burning Glass Technologies have scanned hundreds of thousands of job listings and attempt to provide real-time intelligence on what employers say they’re looking for. Those still don’t paint the full picture, Freedman argues; they miss, for example, what jobs are forming at companies.

“We need better information from the employer, better information from the job seeker and better information from the college, and that’s what we’re going after,” Freedman says…(More)”.