Big Data and the Well-Being of Women and Girls: Applications on the Social Scientific Frontier


Report by Bapu Vaitla et al for Data2X: “Conventional forms of data—household surveys, national economic accounts, institutional records, and so on—struggle to capture detailed information on the lives of women and girls. The many forms of big data, from geospatial information to digital transaction logs to records of internet activity, can help close the global gender data gap. This report profiles several big data projects that quantify the economic, social, and health status of women and girls…

This report illustrates the potential of big data in filling the global gender data gap. The rise of big data, however, does not mean that traditional sources of data will become less important. On the contrary, the successful implementation of big data approaches requires investment in proven methods of social scientific research, especially for validation and bias correction of big datasets. More broadly, the invisibility of women and girls in national and international data systems is a political, not solely a technical, problem. In the best case, the current “data revolution” will be reimagined as a step towards better “data governance”: a process through which novel types of information catalyze the creation of new partnerships to advocate for scientific, policy, and political reforms that include women and girls in all spheres of social and economic life….(More)”.

Software used to predict crime can now be scoured for bias


Dave Gershgorn in Quartz: “Predictive policing, or the idea that software can foresee where crime will take place, is being adopted across the country—despite being riddled with issues. These algorithms have been shown to disproportionately target minorities, and private companies won’t reveal how their software reached those conclusions.

In an attempt to stand out from the pack, predictive-policing startup CivicScape has released its algorithm and data online for experts to scour, according to Government Technology magazine. The company’s Github page is already populated with its code, as well as a variety of documents detailing how its algorithm interprets police data and what variables are included when predicting crime.

“By making our code and data open-source, we are inviting feedback and conversation about CivicScape in the belief that many eyes make our tools better for all,” the company writes on Github. “We must understand and measure bias in crime data that can result in disparate public safety outcomes within a community.”…

CivicScape claims to not use race or ethnic data to make predictions, although it is aware of other indirect indicators of race that could bias its software. The software also filters out low-level drug crimes, which have been found to be heavily biased against African Americans.

While this startup might be the first to publicly reveal the inner machinations of its algorithm and data practices, it’s not an assurance that predictive policing can be made fair and transparent across the board.

“Lots of research is going on about how algorithms can be transparent, accountable, and fair,” the company writes. “We look forward to being involved in this important conversation.”…(More)”.

Seeing Theory


Seeing Theory is a project designed and created by Daniel Kunin with support from Brown University’s Royce Fellowship Program. The goal of the project is to make statistics more accessible to a wider range of students through interactive visualizations.

Statistics is quickly becoming the most important and multi-disciplinary field of mathematics. According to the American Statistical Association, “statistician” is one of the top ten fastest-growing occupations and statistics is one of the fastest-growing bachelor degrees. Statistical literacy is essential to our data driven society. Yet, for all the increased importance and demand for statistical competence, the pedagogical approaches in statistics have barely changed. Using Mike Bostock’s data visualization software, D3.js, Seeing Theory visualizes the fundamental concepts covered in an introductory college statistics or Advanced Placement statistics class. Students are encouraged to use Seeing Theory as an additional resource to their textbook, professor and peers….(More)”

Geospatial big data and cartography: research challenges and opportunities for making maps that matter


International Journal Of Cartography; “Geospatial big data present a new set of challenges and opportunities for cartographic researchers in technical, methodological and artistic realms. New computational and technical paradigms for cartography are accompanying the rise of geospatial big data. Additionally, the art and science of cartography needs to focus its contemporary efforts on work that connects to outside disciplines and is grounded in problems that are important to humankind and its sustainability. Following the development of position papers and a collaborative workshop to craft consensus around key topics, this article presents a new cartographic research agenda focused on making maps that matter using geospatial big data. This agenda provides both long-term challenges that require significant attention and short-term opportunities that we believe could be addressed in more concentrated studies….(More)”.

Big data helps Belfort, France, allocate buses on routes according to demand


 in Digital Trends: “As modern cities smarten up, the priority for many will be transportation. Belfort, a mid-sized French industrial city of 50,000, serves as proof of concept for improved urban transportation that does not require the time and expense of covering the city with sensors and cameras.

Working with Tata Consultancy Services (TCS) and GFI Informatique, the Board of Public Transportation of Belfort overhauled bus service management of the city’s 100-plus buses. The project entailed a combination of ID cards, GPS-equipped card readers on buses, and big data analysis. The collected data was used to measure bus speed from stop to stop, passenger flow to observe when and where people got on and off, and bus route density. From start to finish, the proof of concept project took four weeks.

Using the TCS Intelligent Urban Exchange system, operations managers were able to detect when and where about 20 percent of all bus passengers boarded and got off on each city bus route. Utilizing big data and artificial intelligence the city’s urban planners were able to use that data analysis to make cost-effective adjustments including the allocation of additional buses on routes and during times of greater passenger demand. They were also able to cut back on buses for minimally used routes and stops. In addition, the system provided feedback on the effect of city construction projects on bus service….

Going forward, continued data analysis will help the city budget wisely for infrastructure changes and new equipment purchases. The goal is to put the money where the needs are greatest rather than just spending and then waiting to see if usage justified the expense. The push for smarter cities has to be not just about improved services, but also smart resource allocation — in the Belfort project, the use of big data showed how to do both….(More)”

Watchdog to launch inquiry into misuse of data in politics


, and Alice Gibbs in The Guardian: “The UK’s privacy watchdog is launching an inquiry into how voters’ personal data is being captured and exploited in political campaigns, cited as a key factor in both the Brexit and Trump victories last year.

The intervention by the Information Commissioner’s Office (ICO) follows revelations in last week’s Observer that a technology company part-owned by a US billionaire played a key role in the campaign to persuade Britons to vote to leave the European Union.

It comes as privacy campaigners, lawyers, politicians and technology experts express fears that electoral laws are not keeping up with the pace of technological change.

“We are conducting a wide assessment of the data-protection risks arising from the use of data analytics, including for political purposes, and will be contacting a range of organisations,” an ICO spokeswoman confirmed. “We intend to publicise our findings later this year.”

The ICO spokeswoman confirmed that it had approached Cambridge Analytica over its apparent use of data following the story in the Observer. “We have concerns about Cambridge Analytica’s reported use of personal data and we are in contact with the organisation,” she said….

In the US, companies are free to use third-party data without seeking consent. But Gavin Millar QC, of Matrix Chambers, said this was not the case in Europe. “The position in law is exactly the same as when people would go canvassing from door to door,” Millar said. “They have to say who they are, and if you don’t want to talk to them you can shut the door in their face.That’s the same principle behind the data protection act. It’s why if telephone canvassers ring you, they have to say that whole long speech. You have to identify yourself explicitly.”…

Dr Simon Moores, visiting lecturer in the applied sciences and computing department at Canterbury Christ Church University and a technology ambassador under the Blair government, said the ICO’s decision to shine a light on the use of big data in politics was timely.

“A rapid convergence in the data mining, algorithmic and granular analytics capabilities of companies like Cambridge Analytica and Facebook is creating powerful, unregulated and opaque ‘intelligence platforms’. In turn, these can have enormous influence to affect what we learn, how we feel, and how we vote. The algorithms they may produce are frequently hidden from scrutiny and we see only the results of any insights they might choose to publish.” …(More)”

AI, machine learning and personal data


Jo Pedder at the Information Commissioner’s Office Blog: “Today sees the publication of the ICO’s updated paper on big data and data protection.

But why now? What’s changed in the two and a half years since we first visited this topic? Well, quite a lot actually:

  • big data is becoming the norm for many organisations, using it to profile people and inform their decision-making processes, whether that’s to determine your car insurance premium or to accept/reject your job application;
  • artificial intelligence (AI) is stepping out of the world of science-fiction and into real life, providing the ‘thinking’ power behind virtual personal assistants and smart cars; and
  • machine learning algorithms are discovering patterns in data that traditional data analysis couldn’t hope to find, helping to detect fraud and diagnose diseases.

The complexity and opacity of these types of processing operations mean that it’s often hard to know what’s going on behind the scenes. This can be problematic when personal data is involved, especially when decisions are made that have significant effects on people’s lives. The combination of these factors has led some to call for new regulation of big data, AI and machine learning, to increase transparency and ensure accountability.

In our view though, whilst the means by which the processing of personal data are changing, the underlying issues remain the same. Are people being treated fairly? Are decisions accurate and free from bias? Is there a legal basis for the processing? These are issues that the ICO has been addressing for many years, through oversight of existing European data protection legislation….(More)”

When the Big Lie Meets Big Data


Peter Bruce in Scientific America: “…The science of predictive modeling has come a long way since 2004. Statisticians now build “personality” models and tie them into other predictor variables. … One such model bears the acronym “OCEAN,” standing for the personality characteristics (and their opposites) of openness, conscientiousness, extroversion, agreeableness, and neuroticism. Using Big Data at the individual level, machine learning methods might classify a person as, for example, “closed, introverted, neurotic, not agreeable, and conscientious.”

Alexander Nix, CEO of Cambridge Analytica (owned by Trump’s chief donor, Rebekah Mercer), says he has thousands of data points on you, and every other voter: what you buy or borrow, where you live, what you subscribe to, what you post on social media, etc. At a recent Concordia Summit, using the example of gun rights, Nix described how messages will be crafted to appeal specifically to you, based on your personality profile. Are you highly neurotic and conscientious? Nix suggests the image of a sinister gloved hand reaching through a broken window.

In his presentation, Nix noted that the goal is to induce behavior, not communicate ideas. So where does truth fit in? Johan Ugander, Assistant Professor of Management Science at Stanford, suggests that, for Nix and Cambridge Analytica, it doesn’t. In counseling the hypothetical owner of a private beach how to keep people off his property, Nix eschews the merely factual “Private Beach” sign, advocating instead a lie: “Sharks sighted.” Ugander, in his critique, cautions all data scientists against “building tools for unscrupulous targeting.”

The warning is needed, but may be too late. What Nix described in his presentation involved carefully crafted messages aimed at his target personalities. His messages pulled subtly on various psychological strings to manipulate us, and they obeyed no boundary of truth, but they required humans to create them.  The next phase will be the gradual replacement of human “craftsmanship” with machine learning algorithms that can supply targeted voters with a steady stream of content (from whatever source, true or false) designed to elicit desired behavior. Cognizant of the Pandora’s box that data scientists have opened, the scholarly journal Big Data has issued a call for papers for a future issue devoted to “Computational Propaganda.”…(More)”

Handbook of Big Data Technologies


Handbook by Albert Y. Zomaya and Sherif Sakr: “…offers comprehensive coverage of recent advancements in Big Data technologies and related paradigms.  Chapters are authored by international leading experts in the field, and have been reviewed and revised for maximum reader value. The volume consists of twenty-five chapters organized into four main parts. Part one covers the fundamental concepts of Big Data technologies including data curation mechanisms, data models, storage models, programming models and programming platforms. It also dives into the details of implementing Big SQL query engines and big stream processing systems.  Part Two focuses on the semantic aspects of Big Data management including data integration and exploratory ad hoc analysis in addition to structured querying and pattern matching techniques.  Part Three presents a comprehensive overview of large scale graph processing. It covers the most recent research in large scale graph processing platforms, introducing several scalable graph querying and mining mechanisms in domains such as social networks.  Part Four details novel applications that have been made possible by the rapid emergence of Big Data technologies such as Internet-of-Things (IOT), Cognitive Computing and SCADA Systems.  All parts of the book discuss open research problems, including potential opportunities, that have arisen from the rapid progress of Big Data technologies and the associated increasing requirements of application domains.
Designed for researchers, IT professionals and graduate students, this book is a timely contribution to the growing Big Data field. Big Data has been recognized as one of leading emerging technologies that will have a major contribution and impact on the various fields of science and varies aspect of the human society over the coming decades. Therefore, the content in this book will be an essential tool to help readers understand the development and future of the field….(More)”

Fighting Illegal Fishing With Big Data


Emily Matchar in Smithsonian: “In many ways, the ocean is the Wild West. The distances are vast, the law enforcement agents few and far between, and the legal jurisdiction often unclear. In this environment, illegal activity flourishes. Illegal fishing is so common that experts estimate as much as a third of fish sold in the U.S. was fished illegally. This illegal fishing decimates the ocean’s already dwindling fish populations and gives rise to modern slavery, where fishermen are tricked onto vessels and forced to work, sometimes for years.

A new use of data technology aims to help curb these abuses by shining a light on the high seas. The technology uses ships’ satellite signals to detect instances of transshipment, when two vessels meet at sea to exchange cargo. As transshipment is a major way illegally caught fish makes it into the legal supply chain, tracking it could potentially help stop the practice.

“[Transshipment] really allows people to do something out of sight,” says David Kroodsma, the research program director at Global Fishing Watch, an online data platform launched by Google in partnership with the nonprofits Oceana and SkyTruth. “It’s something that obscures supply chains. It’s basically being able to do things without any oversight. And that’s a problem when you’re using a shared resource like the oceans.”

Global Fishing Watch analyzed some 21 billion satellite signals broadcast by ships, which are required to carry transceivers for collision avoidance, from between 2012 and 2016. It then used an artificial intelligence system it created to identify which ships were refrigerated cargo vessels (known in the industry as “reefers”). They then verified this information with fishery registries and other sources, eventually identifying 794 reefers—90 percent of the world’s total number of such vessels. They tracked instances where a reefer and a fishing vessel were moving at similar speeds in close proximity, labeling these instances as “likely transshipments,” and also traced instances where reefers were traveling in a way that indicated a rendezvous with a fishing vessel, even if no fishing vessel was present—fishing vessels often turn off their satellite systems when they don’t want to be seen. All in all there were more than 90,000 likely or potential transshipments recorded.

Even if these encounters were in fact transshipments, they would not all have been for nefarious purposes. They may have taken place to refuel or load up on supplies. But looking at the patterns of where the potential transshipments happen is revealing. Very few are seen close to the coasts of the U.S., Canada and much of Europe, all places with tight fishery regulations. There are hotspots off the coast of Peru and Argentina, all over Africa, and off the coast of Russia. Some 40 percent of encounters happen in international waters, far enough off the coast that no country has jurisdiction.

The tracked reefers were flying flags from some 40 different countries. But that doesn’t necessarily tell us much about where they really come from. Nearly half of the reefers tracked were flying “flags of convenience,” meaning they’re registered in countries other than where the ship’s owners are from to take advantage of those countries’ lax regulations….(More)”

Read more: http://www.smithsonianmag.com/innovation/fighting-illegal-fishing-big-data-180962321/#7eCwGrGS5v5gWjFz.99
Give the gift of Smithsonian magazine for only $12! http://bit.ly/1cGUiGv
Follow us: @SmithsonianMag on Twitter