Crowdbreaks: Tracking Health Trends using Public Social Media Data and Crowdsourcing


Paper by Martin Mueller and Marcel Salathé: “In the past decade, tracking health trends using social media data has shown great promise, due to a powerful combination of massive adoption of social media around the world, and increasingly potent hardware and software that enables us to work with these new big data streams.

At the same time, many challenging problems have been identified. First, there is often a mismatch between how rapidly online data can change, and how rapidly algorithms are updated, which means that there is limited reusability for algorithms trained on past data as their performance decreases over time. Second, much of the work is focusing on specific issues during a specific past period in time, even though public health institutions would need flexible tools to assess multiple evolving situations in real time. Third, most tools providing such capabilities are proprietary systems with little algorithmic or data transparency, and thus little buy-in from the global public health and research community.

Here, we introduce Crowdbreaks, an open platform which allows tracking of health trends by making use of continuous crowdsourced labelling of public social media content. The system is built in a way which automates the typical workflow of data collection, filtering, labelling, and training of machine learning classifiers, and can therefore greatly accelerate the research process in the public health domain. This work introduces the technical aspects of the platform and explores its future use cases…(More)”.
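The workflow described — collect posts, filter them to a health topic, send a sample out for crowdsourced labelling, then retrain a classifier on the growing labelled pool — can be sketched roughly as follows. This is an illustrative outline in Python, not the Crowdbreaks codebase; the function names and the collection/crowdsourcing steps are assumptions.

```python
# Illustrative sketch of a Crowdbreaks-style loop (hypothetical names, not the
# platform's actual API): collect -> filter -> crowd-label -> retrain classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def filter_posts(posts, keywords):
    """Keep only posts that mention a tracked health topic (e.g. vaccination)."""
    return [p for p in posts if any(k in p.lower() for k in keywords)]

def retrain(labelled_pool):
    """Fit a fresh relevance/sentiment classifier on all labels collected so far."""
    texts, labels = zip(*labelled_pool)
    model = make_pipeline(TfidfVectorizer(min_df=2), LogisticRegression(max_iter=1000))
    model.fit(texts, labels)
    return model

# One iteration of the continuous workflow (collection and crowdsourcing steps
# are placeholders for external services):
# posts = stream_recent_posts()                       # hypothetical collector
# candidates = filter_posts(posts, ["vaccine", "flu"])
# labelled_pool += send_to_crowd(candidates)          # hypothetical labelling step
# classifier = retrain(labelled_pool)
# trend = classifier.predict(candidates)              # score the incoming stream
```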

Behavioral economics from nuts to ‘nudges’


Richard Thaler at Chicago Booth Review: “…Behavioral economics has come a long way from my initial set of stories. Behavioral economists of the current generation are using all the modern tools of economics, from theory to big data to structural models to neuroscience, and they are applying those tools to most of the domains in which economists practice their craft. This is crucial to making descriptive economics more accurate. As the last section of this lecture highlighted, they are also influencing public-policy makers around the world, with those in the private sector not far behind. Sunstein and I did not invent nudging—we just gave it a word. People have been nudging as long as they have been trying to influence other people.

And much as we might wish it to be so, not all nudging is nudging for good. The same passive behavior we saw among Swedish savers applies to nearly everyone agreeing to software terms, or mortgage documents, or car payments, or employment contracts. We click “agree” without reading, and can find ourselves locked into a long-term contract that can only be terminated with considerable time and aggravation, or worse. Some firms are actively making use of behaviorally informed strategies to profit from the lack of scrutiny most shoppers apply. I call this kind of exploitive behavior “sludge.” It is the exact opposite of nudging for good. But whether the use of sludge is a long-term profit-maximizing strategy remains to be seen. Creating the reputation as a sludge-free supplier of goods and services may be a winning long-term strategy, just like delivering free bottles of water to victims of a hurricane.

Although not every application of behavioral economics will make the world a better place, I believe that giving economics a more human dimension and creating theories that apply to humans, not just econs, will make our discipline stronger, more useful, and undoubtedly more accurate….(More)”.

Why Policymakers Should Care About “Big Data” in Healthcare


David W. Bates et al. in Health Policy and Technology: “The term “big data” has gotten increasing popular attention, and there is growing focus on how such data can be used to measure and improve health and healthcare. Analytic techniques for extracting information from these data have grown vastly more powerful, and they are now broadly available. But for these approaches to be most useful, large amounts of data must be available, and barriers to use should be low. We discuss how “smart cities” are beginning to invest in this area to improve the health of their populations; provide examples around model approaches for making large quantities of data available to researchers and clinicians among other stakeholders; discuss the current state of big data approaches to improve clinical care including specific examples, and then discuss some of the policy issues around and examples of successful regulatory approaches, including deidentification and privacy protection….(More)”.

The Future of Fishing Is Big Data and Artificial Intelligence


Meg Wilcox at Civil Eats: “New England’s groundfish season is in full swing, as hundreds of dayboat fishermen from Rhode Island to Maine take to the water in search of the region’s iconic cod and haddock. But this year, several dozen of them are hauling in their catch under the watchful eye of video cameras as part of a new effort to use technology to better sustain the area’s fisheries and the communities that depend on them.

Video observation on fishing boats—electronic monitoring—is picking up steam in the Northeast and nationally as a cost-effective means to ensure that fishing vessels aren’t catching more fish than allowed while informing local fisheries management. While several issues remain to be solved before the technology can be widely deployed—such as the costs of reviewing and storing data—electronic monitoring is beginning to deliver on its potential to lower fishermen’s costs, provide scientists with better data, restore trust where it’s broken, and ultimately help consumers gain a greater understanding of where their seafood is coming from….

Muto’s vessel was outfitted with cameras, at a cost of about $8,000, through a collaborative venture between NOAA’s regional office and science center, The Nature Conservancy (TNC), the Gulf of Maine Research Institute, and the Cape Cod Commercial Fishermen’s Alliance. Camera costs are currently subsidized by NOAA Fisheries and its partners.

The cameras run the entire time Muto and his crew are out on the water. They record how the fishermen handle their discards, the fish they’re not allowed to keep because of size or species type, but that count towards their quotas. The cost is lower than what he’d pay for an in-person monitor. The biggest cost of electronic monitoring, however, is the labor required to review the video. …

Another way to cut costs is to use computers to review the footage. McGuire says there’s been a lot of talk about automating the review, but the common refrain is that it’s still five years off.

To spur faster action, TNC last year spearheaded an online competition, offering a $50,000 prize to computer scientists who could crack the code—that is, teach a computer how to count fish, size them, and identify their species.

“We created an arms race,” says McGuire. “That’s why you do a competition. You’ll never get the top minds to do this because they don’t care about your fish. They all want to work for Google, and one way to get recognized by Google is to win a few of these competitions.” The contest exceeded McGuire’s expectations. “Winners got close to 100 percent in count and 75 percent accurate on identifying species,” he says. “We proved that automated review is now. Not in five years. And now all of the video-review companies are investing in machine learning.” It’s only a matter of time before a commercial product is available, McGuire believes….(More)”.
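The automated review the contest demonstrated amounts to running each video frame through an image classifier trained on labelled catch footage. A minimal sketch of that idea is below; the model choice, species list, and preprocessing are illustrative assumptions, not the winning entries.

```python
# Rough sketch of automated catch review: classify each video frame with a CNN
# fine-tuned on labelled footage. Model, labels, and preprocessing are assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T

SPECIES = ["cod", "haddock", "flounder", "other"]  # hypothetical label set

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = torch.nn.Linear(model.fc.in_features, len(SPECIES))  # new classifier head
# ... fine-tune on frames labelled by human reviewers, then:
model.eval()

preprocess = T.Compose([T.ToPILImage(), T.Resize(256), T.CenterCrop(224), T.ToTensor()])

def identify_species(frame):
    """Predict the species in one video frame (H x W x 3 uint8 array)."""
    with torch.no_grad():
        logits = model(preprocess(frame).unsqueeze(0))
    return SPECIES[int(logits.argmax())]
```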

Big Data in the Arts and Humanities: Theory and Practice


Book edited by Giovanni Schiuma and Daniela Carlucci: “As digital technologies occupy a more central role in working and everyday human life, individual and social realities are increasingly constructed and communicated through digital objects, which are progressively replacing and representing physical objects. They are even shaping new forms of virtual reality. This growing digital transformation, coupled with technological evolution and the development of computing power, is shaping a cyber society whose working mechanisms are grounded upon the production, deployment, and exploitation of big data. In the arts and humanities, however, the notion of big data is still in its embryonic stage, and only in the last few years have arts and cultural organizations and institutions, artists, and humanists started to investigate, explore, and experiment with the deployment and exploitation of big data as well as understand the possible forms of collaborations based on it.

Big Data in the Arts and Humanities: Theory and Practice explores the meaning, properties, and applications of big data. This book examines the relevance of big data to the arts and humanities, digital humanities, and management of big data with and for the arts and humanities. It explores the reasons and opportunities for the arts and humanities to embrace the big data revolution. The book also delineates managerial implications to successfully shape a mutually beneficial partnership between the arts and humanities and the big data- and computational digital-based sciences.

Big data and arts and humanities can be likened to the rational and emotional aspects of the human mind. This book attempts to integrate these two aspects of human thought to advance decision-making and to enhance the expression of the best of human life….(More)”.

The Efficiency Paradox: What Big Data Can’t Do


Book by Edward Tenner: “A bold challenge to our obsession with efficiency–and a new understanding of how to benefit from the powerful potential of serendipity

Algorithms, multitasking, the sharing economy, life hacks: our culture can’t get enough of efficiency. One of the great promises of the Internet and big data revolutions is the idea that we can improve the processes and routines of our work and personal lives to get more done in less time than we ever have before. There is no doubt that we’re performing at higher levels and moving at unprecedented speed, but what if we’re headed in the wrong direction?

Melding the long-term history of technology with the latest headlines and findings of computer science and social science, The Efficiency Paradox questions our ingrained assumptions about efficiency, persuasively showing how relying on the algorithms of digital platforms can in fact lead to wasted efforts, missed opportunities, and above all an inability to break out of established patterns. Edward Tenner offers a smarter way of thinking about efficiency, revealing what we and our institutions, when equipped with an astute combination of artificial intelligence and trained intuition, can learn from the random and unexpected….(More)”

Smart cities need thick data, not big data


Adrian Smith at The Guardian: “…The Smart City is an alluring prospect for many city leaders. Even if you haven’t heard of it, you may have already joined in by looking up bus movements on your phone, accessing Council services online or learning about air contamination levels. By inserting sensors across city infrastructures and creating new data sources – including citizens via their mobile devices – Smart City managers can apply Big Data analysis to monitor and anticipate urban phenomena in new ways, and, so the argument goes, efficiently manage urban activity for the benefit of ‘smart citizens’.

Barcelona has been a pioneering Smart City. The Council’s business partners have been installing sensors and opening data platforms for years. Not everyone is comfortable with this technocratic turn. After Ada Colau was elected Mayor on a mandate of democratising the city and putting citizens centre-stage, digital policy has sought to go ‘beyond the Smart City’. Chief Technology Officer Francesca Bria is opening digital platforms to greater citizen participation and oversight. Worried that the city’s knowledge was being ceded to tech vendors, the Council now promotes technological sovereignty.

On the surface, the noise project in Plaça del Sol is an example of such sovereignty. It even features in Council presentations. Look more deeply, however, and it becomes apparent that neighbourhood activists are really appropriating new technologies into the old-fashioned politics of community development….

What made Plaça del Sol stand out can be traced to a group of technology activists who got in touch with residents early in 2017. The activists were seeking participants in their project called Making Sense, which sought to resurrect a struggling ‘Smart Citizen Kit’ for environmental monitoring. The idea was to provide residents with the tools to measure noise levels, compare them with officially permissible levels, and reduce noise in the square. More than 40 neighbours signed up and installed 25 sensors on balconies and inside apartments.
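At its core, the kit’s job is to aggregate each sensor’s readings and compare them with permissible noise levels. A toy version of that check is sketched below; the decibel limits are placeholders rather than Barcelona’s actual ordinance, and the Smart Citizen Kit’s own software works differently.

```python
# Toy illustration of the residents' check: compare balcony readings against
# permissible noise levels. Limits below are placeholders, not the city ordinance.
from statistics import mean

DAY_LIMIT_DB = 55    # assumed daytime limit
NIGHT_LIMIT_DB = 45  # assumed night-time limit

def summarize(readings, night=False):
    """readings: list of (hour, decibel) pairs from one sensor."""
    limit = NIGHT_LIMIT_DB if night else DAY_LIMIT_DB
    over = [db for _, db in readings if db > limit]
    return {
        "average_db": round(mean(db for _, db in readings), 1),
        "hours_over_limit": len(over),
        "limit_db": limit,
    }

# e.g. summarize([(22, 68.5), (23, 71.0), (0, 64.2)], night=True)
# -> {'average_db': 67.9, 'hours_over_limit': 3, 'limit_db': 45}
```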

The neighbours had what project coordinator Mara Balestrini from Ideas for Change calls ‘a matter of concern’. The earlier Smart Citizen Kit had begun as a technological solution looking for a problem: a crowd-funded gadget for measuring pollution, whose data users could upload to a web-platform for comparison with information from other users. Early adopters found the technology trickier to install than developers had presumed. Even successful users stopped monitoring because there was little community purpose. A new approach was needed. Noise in Plaça del Sol provided a problem for this technology fix….

Anthropologist Clifford Geertz argued many years ago that situations can only be made meaningful through ‘thick description’. Applied to the Smart City, this means data cannot really be explained and used without understanding the contexts in which it arises and gets used. Data can only mobilise people and change things when it becomes thick with social meaning….(More)”

From Texts to Tweets to Satellites: The Power of Big Data to Fill Gender Data Gaps


From the UN Foundation Blog: “Twitter posts, credit card purchases, phone calls, and satellites are all part of our day-to-day digital landscape.

Detailed data, known broadly as “big data” because of the massive amounts of passively collected and high-frequency information that such interactions generate, are produced every time we use one of these technologies. These digital traces have great potential and have already developed a track record for application in global development and humanitarian response.

Data2X has focused particularly on what big data can tell us about the lives of women and girls in resource-poor settings. Our research, released today in a new report, Big Data and the Well-Being of Women and Girls, demonstrates how four big data sources can be harnessed to fill gender data gaps and inform policy aimed at mitigating global gender inequality. Big data can complement traditional surveys and other data sources, offering a glimpse into dimensions of girls’ and women’s lives that have otherwise been overlooked and providing a level of precision and timeliness that policymakers need to make actionable decisions.

Here are three findings from our report that underscore the power and potential offered by big data to fill gender data gaps:

  1. Social media data can improve understanding of the mental health of girls and women.

Mental health conditions, from anxiety to depression, are thought to be significant contributors to the global burden of disease, particularly for young women, though precise data on mental health is sparse in most countries. However, research by Georgia Tech University, commissioned by Data2X, finds that social media provides an accurate barometer of mental health status…..

  2. Cell phone and credit card records can illustrate women’s economic and social patterns – and track impacts of shocks in the economy.

Our spending priorities and social habits often indicate economic status, and these activities can also expose economic disparities between women and men.

By compiling cell phone and credit card records, our research partners at MIT traced patterns of women’s expenditures, spending priorities, and physical mobility. The research found that women have less mobility diversity than men, live further away from city centers, and report less total expenditure per capita…..
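Mobility diversity of this kind is commonly measured as the entropy of the distribution of locations a person visits, derived from call or transaction records. The sketch below shows one standard formulation; the MIT team’s exact metric may differ.

```python
# Sketch of a mobility-diversity metric: normalized entropy over the locations a
# person visits, computed from call or transaction records. A common formulation;
# the study's exact definition may differ.
from collections import Counter
from math import log

def mobility_diversity(visits):
    """visits: list of location IDs (e.g. cell-tower IDs) observed for one person."""
    counts = Counter(visits)
    total = len(visits)
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * log(p) for p in probs)
    # Normalize by the maximum possible entropy so scores are comparable across people.
    return entropy / log(len(counts)) if len(counts) > 1 else 0.0

# e.g. mobility_diversity(["home", "work", "home", "market", "home"])  # ~0.86
```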

  3. Satellite imagery can map rivers and roads, but it can also measure gender inequality.

Satellite imagery has the power to capture high-resolution, real-time data on everything from natural landscape features, like vegetation and river flows, to human infrastructure, like roads and schools. Research by our partners at the Flowminder Foundation finds that it is also able to measure gender inequality….(More)”.

Practical approaches to big data privacy over time


Micah Altman, Alexandra Wood, David R. O’Brien, and Urs Gasser in International Data Privacy Law: “

  • Governments and businesses are increasingly collecting, analysing, and sharing detailed information about individuals over long periods of time.
  • Vast quantities of data from new sources and novel methods for large-scale data analysis promise to yield deeper understanding of human characteristics, behaviour, and relationships and advance the state of science, public policy, and innovation.
  • The collection and use of fine-grained personal data over time, at the same time, is associated with significant risks to individuals, groups, and society at large.
  • This article examines a range of long-term research studies in order to identify the characteristics that drive their unique sets of risks and benefits and the practices established to protect research data subjects from long-term privacy risks.
  • We find that many big data activities in government and industry settings have characteristics and risks similar to those of long-term research studies, but are subject to less oversight and control.
  • We argue that the risks posed by big data over time can best be understood as a function of temporal factors comprising age, period, and frequency and non-temporal factors such as population diversity, sample size, dimensionality, and intended analytic use.
  • Increasing complexity in any of these factors, individually or in combination, creates heightened risks that are not readily addressable through traditional de-identification and process controls.
  • We provide practical recommendations for big data privacy controls based on the risk factors present in a specific case and informed by recent insights from the state of the art and practice….(More)”.

How Democracy Can Survive Big Data


Colin Koopman in The New York Times: “…The challenge of designing ethics into data technologies is formidable. This is in part because it requires overcoming a century-long ethos of data science: Develop first, question later. Datafication first, regulation afterward. A glimpse at the history of data science shows as much.

The techniques that Cambridge Analytica uses to produce its psychometric profiles are the cutting edge of data-driven methodologies first devised 100 years ago. The science of personality research was born in 1917. That year, in the midst of America’s fevered entry into war, Robert Sessions Woodworth of Columbia University created the Personal Data Sheet, a questionnaire that promised to assess the personalities of Army recruits. The war ended before Woodworth’s psychological instrument was ready for deployment, but the Army had envisioned its use according to the precedent set by the intelligence tests it had been administering to new recruits under the direction of Robert Yerkes, a professor of psychology at Harvard at the time. The data these tests could produce would help decide who should go to the fronts, who was fit to lead and who should stay well behind the lines.

The stakes of those wartime decisions were particularly stark, but the aftermath of those psychometric instruments is even more unsettling. As the century progressed, such tests — I.Q. tests, college placement exams, predictive behavioral assessments — would affect the lives of millions of Americans. Schoolchildren who may have once or twice acted out in such a way as to prompt a psychometric evaluation could find themselves labeled, setting them on an inescapable track through the education system.

Researchers like Woodworth and Yerkes (or their Stanford colleague Lewis Terman, who formalized the first SAT) did not anticipate the deep consequences of their work; they were too busy pursuing the great intellectual challenges of their day, much like Mr. Zuckerberg in his pursuit of the next great social media platform. Or like Cambridge Analytica’s Christopher Wylie, the twentysomething data scientist who helped build psychometric profiles of two-thirds of all Americans by leveraging personal information gained through uninformed consent. All of these researchers were, quite understandably, obsessed with the great data science challenges of their generation. Their failure to consider the consequences of their pursuits, however, is not so much their fault as it is our collective failing.

For the past 100 years we have been chasing visions of data with a singular passion. Many of the best minds of each new generation have devoted themselves to delivering on the inspired data science promises of their day: intelligence testing, building the computer, cracking the genetic code, creating the internet, and now this. We have in the course of a single century built an entire society, economy and culture that runs on information. Yet we have hardly begun to engineer data ethics appropriate for our extraordinary information carnival. If we do not do so soon, data will drive democracy, and we may well lose our chance to do anything about it….(More)”.