Protecting Privacy in Data Release
Book by Giovanni Livraga: “This book presents a comprehensive approach to protecting sensitive information when large data collections are released by their owners. It addresses three key requirements of data privacy: the protection of data explicitly released, the protection of information not explicitly released but potentially vulnerable due to a release of other data, and the enforcement of owner-defined access restrictions to the released data. It is also the first book with a complete examination of how to enforce dynamic read and write access authorizations on released data, applicable to the emerging data outsourcing and cloud computing situations. Private companies, public organizations and final users are releasing, sharing, and disseminating their data to take reciprocal advantage of the great benefits of making their data available to others. This book weighs these benefits against the potential privacy risks. A detailed analysis of recent techniques for privacy protection in data release and case studies illustrate crucial scenarios. Protecting Privacy in Data Release targets researchers, professionals and government employees working in security and privacy. Advanced-level students in computer science and electrical engineering will also find this book useful as a secondary text or reference….(More)”
The Art of Insight in Science and Engineering: Mastering Complexity
Book by Sanjoy Mahajan: “…shows us that the way to master complexity is through insight rather than precision. Precision can overwhelm us with information, whereas insight connects seemingly disparate pieces of information into a simple picture. Unlike computers, humans depend on insight. Based on the author’s fifteen years of teaching at MIT, Cambridge University, and Olin College, The Art of Insight in Science and Engineering shows us how to build insight and find understanding, giving readers tools to help them solve any problem in science and engineering.
To master complexity, we can organize it or discard it. The Art of Insight in Science and Engineering first teaches the tools for organizing complexity, then distinguishes the two paths for discarding complexity: with and without loss of information. Questions and problems throughout the text help readers master and apply these groups of tools. Armed with this three-part toolchest, and without complicated mathematics, readers can estimate the flight range of birds and planes and the strength of chemical bonds, understand the physics of pianos and xylophones, and explain why skies are blue and sunsets are red. (Public access version of the book).
Big Data. Big Obstacles.
Dalton Conley et al. in the Chronicle of Higher Education: “After decades of fretting over declining response rates to traditional surveys (the mainstay of 20th-century social research), an exciting new era would appear to be dawning thanks to the rise of big data. Social contagion can be studied by scraping Twitter feeds; peer effects are tested on Facebook; long-term trends in inequality and mobility can be assessed by linking tax records across years and generations; social-psychology experiments can be run on Amazon’s Mechanical Turk service; and cultural change can be mapped by studying the rise and fall of specific Google search terms. In many ways there has been no better time to be a scholar in sociology, political science, economics, or related fields.
However, what should be an opportunity for social science is now threatened by a three-headed monster of privatization, amateurization, and Balkanization. A coordinated public effort is needed to overcome all of these obstacles.
While the availability of social-media data may obviate the problem of declining response rates, it introduces all sorts of problems with the level of access that researchers enjoy. Although some data can be culled from the web—Twitter feeds and Google searches—other data sit behind proprietary firewalls. And as individual users tune up their privacy settings, the typical university or independent researcher is increasingly locked out. Unlike federally funded studies, there is no mandate for Yahoo or Alibaba to make its data publicly available. The result, we fear, is a two-tiered system of research. Scientists working for or with big Internet companies will feast on humongous data sets—and even conduct experiments—and scholars who do not work in Silicon Valley (or Alley) will be left with proverbial scraps….
To address this triple threat of privatization, amateurization, and Balkanization, public social science needs to be bolstered for the 21st century. In the current political and economic climate, social scientists are not waiting for huge government investment like we saw during the Cold War. Instead, researchers have started to knit together disparate data sources by scraping, harmonizing, and geocoding any and all information they can get their hands on.
Currently, many firms employ some well-trained social and behavioral scientists free to pursue their own research; likewise, some companies have programs by which scholars can apply to be in residence or work with their data extramurally. However, as Facebook states, its program is “by invitation only and requires an internal Facebook champion.” And while Google provides services like Ngram to the public, such limited efforts at data sharing are not enough for truly transparent and replicable science….(More)”
Selected Readings on Data Governance
Jos Berens (Centre for Innovation, Leiden University) and Stefaan G. Verhulst (GovLab)
The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data governance was originally published in 2015.
Context
The field of Data Collaboratives is premised on the idea that sharing and opening up private sector datasets has great – and yet untapped – potential for promoting social good. At the same time, the potential of data collaboratives depends on the level of societal trust in the exchange, analysis and use of the data exchanged. Strong data governance frameworks are essential to ensure responsible data use. Without such governance regimes, the emergent data ecosystem will be hampered and the (perceived) risks will dominate the (perceived) benefits. Further, without adopting a human-centered approach to the design of data governance frameworks, including iterative prototyping and careful consideration of the user experience, the responses may fail to be flexible and targeted to real needs.
Selected Readings List (in alphabetical order)
- Better Place Lab – Privacy, Transparency and Trust – a report looking specifically at the main risks development organizations should focus on to develop a responsible data use practice.
- The Brookings Institution – Enabling Humanitarian Use of Mobile Phone Data – this paper explores ways of mitigating privacy harms involved in using call detail records for social good.
- Center for Democracy & Technology – Health Big Data in the Commercial Context – a publication treating some of the risks involved in using new sources of health-related data, and how to mitigate those risks.
- Centre for Information Policy Leadership – A Risk-based Approach to Privacy: Improving Effectiveness in Practice – a whitepaper on the elements of a risk-based approach to privacy.
- Centre for Information Policy Leadership – Data Governance for the Evolving Digital Market Place – a paper describing the organizational reforms needed to effectively promote accountability within organizational structures.
- Crawford and Schultz – Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harms – a paper proposing a rigorous ‘procedural data due process’ mechanism.
- Data-Pop Alliance – The Ethics and Politics of Call Data Analytics – a paper exploring the risks involved in using call detail records for social good, and possible ways of mitigating those risks.
- Data for Development External Ethics Panel – Report of the External Review Panel – a report presenting the findings of the external expert panel overseeing the Data for Development Challenge.
- Federal Trade Commission – Mobile Privacy Disclosures: Building Trust Through Transparency – a report by the FTC looking at the privacy risks involved in mobile data sharing, and ways to mitigate these risks.
- Leo Mirani – How to use mobile phone data for good without invading anyone’s privacy – an article on the use of mobile phone data, and the steps that need to be taken to ensure that user privacy is not intruded upon.
- Lucy Bernholz – Several Examples of Digital Ethics and Proposed Practices – a literature review listing multiple sources compiled for the Stanford Ethics of Data conference, 2014.
- Martin Abrams – A Unified Ethical Frame for Big Data Analysis – a paper from the Information Accountability Foundation on developing a unified ethical frame for data analysis that goes beyond privacy.
- NYU Center for Urban Science and Progress – Privacy, Big Data and the Public Good – a book on the privacy issues surrounding the use of big data for promoting the public good.
- Neil M. Richards and Jonathan H. King – Big Data Ethics – a research paper arguing that the growing impact of big data on society calls for a set of ethical principles to guide big data use.
- OECD Revised Privacy Guidelines – a set of principles accompanied by explanatory text used globally to inform the governance and policy structures around data handling.
- White House Big Data and Privacy Working Group – Big Data: Seizing Opportunities, Preserving Values – a whitepaper documenting the findings of the White House big data and privacy working group.
- World Economic Forum – Pathways for Progress – a whitepaper considering the global data ecosystem and the constraints preventing data from flowing to those who need it most. A lack of well-defined and balanced governance mechanisms is considered one of the key obstacles.
Annotated Selected Readings List (in alphabetical order)
Better Place Lab, “Privacy, Transparency and Trust.” Mozilla, 2015. Available from: http://www.betterplace-lab.org/privacy-report.
- This report looks specifically at the risks involved in the social sector having access to datasets, and the main risks development organizations should focus on to develop a responsible data use practice.
- Focusing on five specific countries (Brazil, China, Germany, India and Indonesia), the report displays specific country profiles, followed by a comparative analysis centering around the topics of privacy, transparency, online behavior and trust.
- Some of the key findings mentioned are:
- A general concern about the importance of privacy, with cultural differences influencing conceptions of what privacy is.
- Cultural differences determining how transparency is perceived, and how much value is attached to achieving it.
- To build trust, individuals need to feel a personal connection or get a personal recommendation – it is hard to build trust regarding automated processes.
de Montjoye, Yves-Alexandre; Kendall, Jake; and Kerry, Cameron F. “Enabling Humanitarian Use of Mobile Phone Data.” The Brookings Institution, 2015. Available from: http://www.brookings.edu/research/papers/2014/11/12-enabling-humanitarian-use-mobile-phone-data.
- Focusing in particular on mobile phone data, this paper explores ways of mitigating privacy harms involved in using call detail records for social good.
- Key takeaways are the following recommendations for using data for social good:
- Engaging companies, NGOs, researchers, privacy experts, and governments to agree on a set of best practices for new privacy-conscientious metadata sharing models.
- Accepting that no framework for maximizing data for the public good will offer perfect protection for privacy, but there must be a balanced application of privacy concerns against the potential for social good.
- Establishing systems and processes for recognizing trusted third-parties and systems to manage datasets, enable detailed audits, and control the use of data so as to combat the potential for data abuse and re-identification of anonymous data.
- Simplifying the process for governments in developing countries regarding the collection and use of mobile phone metadata for research and public good purposes.
Center for Democracy & Technology, “Health Big Data in the Commercial Context.” Center for Democracy & Technology, 2015. Available from: https://cdt.org/insight/health-big-data-in-the-commercial-context/.
- Focusing particularly on the privacy issues related to using data generated by individuals, this paper explores the overlap in privacy questions this field has with other data uses.
- The authors note that although the Health Insurance Portability and Accountability Act (HIPAA) has proven a successful approach in ensuring accountability for health data, most of these standards do not apply to developers of the new technologies used to collect these new data sets.
- For customer-facing technologies not covered by HIPAA, the paper proposes an alternative framework for considering privacy issues, based on the Fair Information Practice Principles and three rounds of stakeholder consultations.
Centre for Information Policy Leadership, “A Risk-based Approach to Privacy: Improving Effectiveness in Practice.” Centre for Information Policy Leadership, Hunton & Williams LLP, 2015. Available from: https://www.informationpolicycentre.com/uploads/5/7/1/0/57104281/white_paper_1-a_risk_based_approach_to_privacy_improving_effectiveness_in_practice.pdf.
- This white paper is part of a project aiming to explain what is often referred to as a new, risk-based approach to privacy, and the development of a privacy risk framework and methodology.
- With the pace of technological progress often outstripping the capabilities of privacy officers to keep up, this method aims to offer the ability to approach privacy matters in a structured way, assessing privacy implications from the perspective of possible negative impact on individuals.
- With the intended outcomes of the project being “materials to help policy-makers and legislators to identify desired outcomes and shape rules for the future which are more effective and less burdensome”, insights from this paper might also feed into the development of innovative governance mechanisms aimed specifically at preventing individual harm.
Centre for Information Policy Leadership, “Data Governance for the Evolving Digital Market Place”, Centre for Information Policy Leadership, Hunton & Williams LLP, 2011. Available from: http://www.huntonfiles.com/files/webupload/CIPL_Centre_Accountability_Data_Governance_Paper_2011.pdf.
- This paper argues that as a result of the proliferation of large scale data analytics, new models governing data inferred from society will shift responsibility to the side of organizations deriving and creating value from that data.
- It is noted that, given the challenge corporations face in enabling agile and innovative data use, “In exchange for increased corporate responsibility, accountability [and the governance models it mandates, ed.] allows for more flexible use of data.”
- Proposed as a means to shift responsibility to the side of data-users, the accountability principle has been researched by a worldwide group of policymakers. Tracing the history of the accountability principle, the paper argues that it “(…) requires that companies implement programs that foster compliance with data protection principles, and be able to describe how those programs provide the required protections for individuals.”
- The following essential elements of accountability are listed:
- Organisation commitment to accountability and adoption of internal policies consistent with external criteria
- Mechanisms to put privacy policies into effect, including tools, training and education
- Systems for internal, ongoing oversight and assurance reviews and external verification
- Transparency and mechanisms for individual participation
- Means of remediation and external enforcement
Crawford, Kate; Schultz, Jason. “Big Data and Due Process: Toward a Framework to Redress Predictive Privacy Harms.” NYU School of Law, 2014. Available from: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2325784&download=yes.
- Considering the privacy implications of large-scale analysis of numerous data sources, this paper proposes the implementation of a ‘procedural data due process’ mechanism to arm data subjects against potential privacy intrusions.
- The authors acknowledge that some existing privacy protection frameworks already include similar mechanisms. However, due to the “inherent analytical assumptions and methodological biases” of big data systems, the authors argue for a more rigorous framework.
Letouzé, Emmanuel; and Vinck, Patrick. “The Ethics and Politics of Call Data Analytics.” Data-Pop Alliance, 2015. Available from: http://static1.squarespace.com/static/531a2b4be4b009ca7e474c05/t/54b97f82e4b0ff9569874fe9/1421442946517/WhitePaperCDRsEthicFrameworkDec10-2014Draft-2.pdf.
- Focusing on the use of Call Detail Records (CDRs) for social good in development contexts, this whitepaper explores both the potential of these datasets – in part by detailing recent successful efforts in the space – and political and ethical constraints to their use.
- Drawing from the Menlo Report Ethical Principles Guiding ICT Research, the paper explores how these principles might be unpacked to inform an ethics framework for the analysis of CDRs.
Data for Development External Ethics Panel, “Report of the External Ethics Review Panel.” Orange, 2015. Available from: http://www.d4d.orange.com/fr/content/download/43823/426571/version/2/file/D4D_Challenge_DEEP_Report_IBE.pdf.
- This report presents the findings of the external expert panel overseeing the Orange Data for Development Challenge.
- Several types of issues faced by the panel are described, along with the various ways in which the panel dealt with those issues.
Federal Trade Commission Staff Report, “Mobile Privacy Disclosures: Building Trust Through Transparency.” Federal Trade Commission, 2013. Available from: www.ftc.gov/os/2013/02/130201mobileprivacyreport.pdf.
- This report looks at ways to address privacy concerns regarding mobile phone data use. Specific advice is provided for the following actors:
- Platforms, or operating systems providers
- App developers
- Advertising networks and other third parties
- App developer trade associations, along with academics, usability experts and privacy researchers
Mirani, Leo. “How to use mobile phone data for good without invading anyone’s privacy.” Quartz, 2015. Available from: http://qz.com/398257/how-to-use-mobile-phone-data-for-good-without-invading-anyones-privacy/.
- This article considers the privacy implications of using call detail records for social good, and ways to mitigate risks of privacy intrusion.
- Taking the example of the Orange D4D challenge and the anonymization strategy employed there, the article describes how classic ‘anonymization’ is often not enough, and then lists further measures that can be taken to ensure adequate privacy protection.
Bernholz, Lucy. “Several Examples of Digital Ethics and Proposed Practices.” Stanford Ethics of Data Conference, 2014. Available from: http://www.scribd.com/doc/237527226/Several-Examples-of-Digital-Ethics-and-Proposed-Practices.
- This list of readings prepared for Stanford’s Ethics of Data conference lists some of the leading available literature regarding ethical data use.
Abrams, Martin. “A Unified Ethical Frame for Big Data Analysis.” The Information Accountability Foundation, 2014. Available from: http://www.privacyconference2014.org/media/17388/Plenary5-Martin-Abrams-Ethics-Fundamental-Rights-and-BigData.pdf.
- Going beyond privacy, this paper discusses the following elements as central to developing a broad framework for data analysis:
- Beneficial
- Progressive
- Sustainable
- Respectful
- Fair
Lane, Julia; Stodden, Victoria; Bender, Stefan; and Nissenbaum, Helen. “Privacy, Big Data and the Public Good.” Cambridge University Press, 2014. Available from: http://www.dataprivacybook.org.
- This book treats the privacy issues surrounding the use of big data for promoting the public good.
- The questions being asked include the following:
- What are the ethical and legal requirements for scientists and government officials seeking to serve the public good without harming individual citizens?
- What are the rules of engagement?
- What are the best ways to provide access while protecting confidentiality?
- Are there reasonable mechanisms to compensate citizens for privacy loss?
Richards, Neil M.; and King, Jonathan H. “Big Data Ethics.” Wake Forest Law Review, 2014. Available from: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2384174.
- This paper describes the growing impact of big data analytics on society, and argues that because of this impact, a set of ethical principles to guide data use is called for.
- The four proposed themes are: privacy, confidentiality, transparency and identity.
- Finally, the paper discusses how big data can be integrated into society, going into multiple facets of this integration, including the law, roles of institutions and ethical principles.
OECD, “OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data”. Available from: http://www.oecd.org/sti/ieconomy/oecdguidelinesontheprotectionofprivacyandtransborderflowsofpersonaldata.htm.
- A globally used set of principles to inform thought about handling personal data, the OECD privacy guidelines serve as one of the leading standards for informing privacy policies and data governance structures.
- The basic principles of national application are the following:
- Collection Limitation Principle
- Data Quality Principle
- Purpose Specification Principle
- Use Limitation Principle
- Security Safeguards Principle
- Openness Principle
- Individual Participation Principle
- Accountability Principle
The White House Big Data and Privacy Working Group, “Big Data: Seizing Opportunities, Preserving Values.” The White House, 2014. Available from: https://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_5.1.14_final_print.pdf.
- Documenting the findings of the White House big data and privacy working group, this report lists, among others, the following key recommendations regarding data governance:
- Bringing greater transparency to the data services industry
- Stimulating international conversation on big data, with multiple stakeholders
- With regard to educational data: ensuring data is used for the purpose it is collected for
- Paying attention to the potential for big data to facilitate discrimination, and expanding technical understanding to stop discrimination
Hoffman, William. “Pathways for Progress.” World Economic Forum, 2015. Available from: http://www3.weforum.org/docs/WEFUSA_DataDrivenDevelopment_Report2015.pdf.
- This paper treats, among other issues, the lack of well-defined and balanced governance mechanisms as one of the key obstacles preventing corporate-sector data in particular from being shared in a controlled space.
- An approach that balances the benefits against the risks of large-scale data usage in a development context, building trust among all stakeholders in the data ecosystem, is viewed as key.
- Furthermore, this whitepaper notes that new governance models are required not just because of the growing amount of data, analytical capacity, and more refined methods of analysis; the current “super-structure” of information flows between institutions is also seen as one of the key reasons to develop alternatives to the current, outdated approaches to data governance.
Citizen Science in the Unexplored Terrain of the Brain
Aaron Krol at Bio-IT World: “The game is simple. On the left-hand side of the screen you see a cube containing a misshapen 3D figure, a bit like a tree branch with a gall infestation. To the right is a razor-thin cross-section of the cube, a grainy image of overlapping gray blobs. Clicking on a blob colors it in, like using the paint bucket tool in MS Paint, while also sending colorful extensions out from the branch to the left. Working your way through 256 of these cross-sections, your job is to extend the branch through the cube, identifying which blobs are continuous with the branch and which are nearby distractions.
It hardly sounds like a game at all, but strange to say, there’s something very compelling about playing EyeWire. Maybe it’s watching the branches grow and fork as you discover new connections. Maybe it’s how quickly you can rack up progress, almost not noticing time go by as you span your branches through cube after cube.
“It draws you in,” says Nikitas Serafetinidis ― or Nseraf, as he’s known in-game. “There’s an unexplained component that makes this game highly addictive.”
Serafetinidis is the world record holder in EyeWire, a game whose players are helping to build a three-dimensional map of brain cells in the retina. The images in EyeWire are in fact photos taken with an electron microscope at the Max Planck Institute of Medical Research in Heidelberg: each one represents a tiny sliver of a mouse’s retina, just 20 nanometers thick. The “blobs” are thin slices of closely adjoined neurons, and the “branch” shows the path of a single cell, which can cross through hundreds of thousands of those images….(More)”
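The “paint bucket” coloring that Krol describes is essentially a flood fill over a grayscale cross-section: starting from the clicked pixel, the selection spreads to neighboring pixels of similar intensity, stopping where the intensity changes sharply. Below is a toy sketch of that idea on a small 2D grid; the grid, threshold, and connectivity rule are invented for illustration and this is not EyeWire’s actual algorithm.

```python
from collections import deque

def flood_fill(image, seed, tolerance=10):
    """Toy 'paint bucket': collect the 4-connected region of pixels whose
    intensity stays within `tolerance` of the seed pixel's intensity."""
    rows, cols = len(image), len(image[0])
    start_value = image[seed[0]][seed[1]]
    selected, queue = set(), deque([seed])
    while queue:
        r, c = queue.popleft()
        if (r, c) in selected or not (0 <= r < rows and 0 <= c < cols):
            continue
        if abs(image[r][c] - start_value) > tolerance:
            continue  # too different from the clicked blob; a "nearby distraction"
        selected.add((r, c))
        queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return selected

# A made-up 4x4 grayscale "cross-section": clicking the top-left dark blob
# selects only the four dark pixels, not the brighter surroundings.
cross_section = [
    [ 10,  12, 200, 210],
    [ 11,  13, 205, 220],
    [200, 210,  12,  11],
    [210, 220,  10,  12],
]
print(sorted(flood_fill(cross_section, (0, 0))))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```

In the game itself, of course, the task spans 256 such slices, and the hard part players solve is deciding which blobs in adjacent slices belong to the same branch.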
Facebook’s Filter Study Raises Questions About Transparency
Will Knight in MIT Technology Review: “Facebook is an enormously valuable source of information about social interactions.
Facebook’s latest scientific research, about the way it shapes the political perspectives users are exposed to, has led some academics to call for the company to be more open about what it chooses to study and publish.
This week the company’s data science team published a paper in the prominent journal Science confirming what many had long suspected: that the network’s algorithms filter out some content that might challenge a person’s political leanings. However, the paper also suggested that the effect was fairly small, and less significant than a user’s own filtering behavior (see “Facebook Says You Filter News More Than Its Algorithm Does”).
Several academics have pointed to limitations of the study, such as the fact that the only people involved had indicated their political affiliation on their Facebook page. Critics point out that those users might behave in a different way from everyone else. But beyond that, a few academics have noted a potential tension between Facebook’s desire to explore the scientific value of its data and its own corporate interests….
In response to the controversy over that study, Facebook’s chief technology officer, Mike Schroepfer, wrote a Facebook post that acknowledged people’s concerns and described new guidelines for its scientific research. “We’ve created a panel including our most senior subject-area researchers, along with people from our engineering, research, legal, privacy and policy teams, that will review projects falling within these guidelines,” he wrote….(More)”
Toward a Research Agenda on Opening Governance
Members of the MacArthur Research Network on Opening Governance at Medium: “Society is confronted by a number of increasingly complex problems — inequality, climate change, access to affordable healthcare — that often seem intractable. Existing societal institutions, including government agencies, corporations and NGOs, have repeatedly proven themselves unable to tackle these problems in their current composition. Unsurprisingly, trust in existing institutions is at an all-time low.
At the same time, advances in technology and sciences offer a unique opportunity to redesign and reinvent our institutions. Increased access to data may radically transform how we identify problems and measure progress. Our capacity to connect with citizens could greatly increase the knowledge and expertise available to solve big public problems. We are witnessing, in effect, the birth of a new paradigm of governance — labeled “open governance” — where institutions share and leverage data, pursue collaborative problem-solving, and partner with citizens to make better decisions. All of these developments offer a potential solution to the crisis of trust and legitimacy confronting existing institutions.
But for all the promise of open governance, we actually know very little about its true impact, and about the conditions and contingencies required for institutional innovation to really work. Even less is known about the capabilities that institutions must develop in order to be able to take advantage of new technologies and innovative practices. The lack of evidence is holding back positive change. It is limiting our ability to improve people’s lives.
The MacArthur Foundation Research Network on Opening Governance seeks to address these shortcomings. Convened and organized by the GovLab, and made possible by a three-year, $5 million grant from the John D. and Catherine T. MacArthur Foundation, the Network seeks to build an empirical foundation that will help us understand how democratic institutions are being (and should be) redesigned, and how this in turn influences governance. At its broadest level, the Network seeks to create a new science of institutional innovation.
In what follows, we outline a research agenda and a set of deliverables for the coming years that can deepen our understanding of “open governance.” More specifically, the below seeks:
- to frame and contextualize the areas of common focus among the members;
- to guide the targeted advancement of Network activities;
- to catalyze opportunities for further collaboration and knowledge exchange between Network members and those working in the field at large.
A core objective of the Network is to conduct research based on, and that has relevance for, real-world institutions. Any research that is solely undertaken in the lab, far from the actual happenings the Network seeks to influence and study, is deemed to be insufficient. As such, the Network is actively developing flexible, scalable methodologies to help analyze the impact of opening governance. In the spirit of interdisciplinarity and openness that defines the Network, these methodologies are being developed collaboratively with partners from diverse disciplines.
The below seeks to provide a framework for those outside the Network — including those who would not necessarily characterize their research as falling under the banner of opening governance — to undertake empirical, agile research into the redesign and innovation of governance processes and the solving of public problems….(More)”
Global Diseases, Collective Solutions
New paper by Ben Ramalingam: “Environmental disruption, mass urbanization and the runaway globalization of trade and transport have created ideal conditions for infectious diseases to emerge and spread around the world. Rapid spill-overs from local into regional and global crises reveal major gaps in the global system for dealing with infectious diseases.
A number of Global Solution Networks have emerged that address failures of systems, of institutions and of markets. At their most ambitious, they aim to change the rules of the global health game—opening up governance structures, sharing knowledge and science, developing new products, creating markets—all with the ultimate aim of preventing and treating diseases, and saving lives.
These networks have emerged in an ad-hoc and opportunistic fashion. More strategic thinking and investment are needed to build networking competencies and to identify opportunities for international institutions to best leverage new forms of collaboration and partnership. (Read the paper here).”
Data Fusion Heralds City Attractiveness Ranking
Emerging Technology From the arXiv: “The ability of any city to attract visitors is an important metric for town planners, businesses based on tourism, traffic planners, residents, and so on. And there are increasingly varied ways of measuring this thanks to the growing volumes of city-related data generated by social media and location-based services.
So it’s only natural that researchers would like to draw these data sets together to see what kind of insight they can get from this form of data fusion.
And so it has turned out thanks to the work of Stanislav Sobolevsky at MIT and a few buddies. These guys have fused three wildly different data sets related to the attractiveness of a city, which allows them to rank these places and to understand why people visit them and what they do when they get there.
The work focuses exclusively on cities in Spain using data that is relatively straightforward to gather. The first data set consists of the number of credit and debit card transactions carried out by visitors to cities throughout Spain during 2011. This includes each card’s country of origin, which allows Sobolevsky and co to count only those transactions made by foreign visitors—a total of 17 million anonymized transactions from 8.6 million foreign visitors from 175 different countries.
The second data set consists of over 3.5 million photos and videos taken in Spain and posted to Flickr by people living in other countries. These pictures were taken between 2005 and 2014 by 16,000 visitors from 112 countries.
The last data set consists of around 700,000 geotagged tweets posted in Spain during 2012. These were posted by 16,000 foreign visitors from 112 countries.
Finally, the team defined a city’s attractiveness, at least for the purposes of this study, as the total number of pictures, tweets and card transactions that took place within it……
That’s interesting work that shows how the fusion of big data sets can provide insights into the way people use cities. It has its limitations of course. The study does not address the reasons why people find cities attractive and what draws them there in the first place. For example, are they there for tourism, for business, or for some other reason? That would require more specialized data.
But it does provide a general picture of attractiveness that could be a start for more detailed analyses. As such, this work is just a small part of a new science of cities based on big data, but one that shows how much is becoming possible with just a little number crunching.
Ref: arxiv.org/abs/1504.06003 : Scaling of city attractiveness for foreign visitors through big data of human economic and social media activity”
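As a rough illustration of the attractiveness measure described in the excerpt – the total count of foreign visitors’ card transactions, Flickr photos, and geotagged tweets recorded in a city – here is a minimal sketch; the per-city counts below are invented for illustration and are not figures from the paper.

```python
# Hypothetical per-city counts of foreign-visitor activity in the three
# fused data sets (card transactions, Flickr photos, geotagged tweets).
city_counts = {
    "Barcelona": {"card_transactions": 120_000, "flickr_photos": 45_000, "geotagged_tweets": 9_000},
    "Madrid":    {"card_transactions": 100_000, "flickr_photos": 30_000, "geotagged_tweets": 7_500},
    "Valencia":  {"card_transactions":  25_000, "flickr_photos":  8_000, "geotagged_tweets": 1_200},
}

def attractiveness(counts):
    """Attractiveness as defined in the study: total activity across the three data sets."""
    return sum(counts.values())

# Rank cities by total foreign-visitor activity.
for city in sorted(city_counts, key=lambda c: attractiveness(city_counts[c]), reverse=True):
    print(f"{city}: {attractiveness(city_counts[city]):,}")
```

As the article notes, such a raw total is only a starting point for more detailed analyses of why people visit and what they do there.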
Preparing for Responsible Sharing of Clinical Trial Data
Paper by Michelle M. Mello et al in the New England Journal of Medicine: “Data from clinical trials, including participant-level data, are being shared by sponsors and investigators more widely than ever before. Some sponsors have voluntarily offered data to researchers, some journals now require authors to agree to share the data underlying the studies they publish, the Office of Science and Technology Policy has directed federal agencies to expand public access to data from federally funded projects, and the European Medicines Agency (EMA) and U.S. Food and Drug Administration (FDA) have proposed the expansion of access to data submitted in regulatory applications. Sharing participant-level data may bring exciting benefits for scientific research and public health but may also have unintended consequences. Thus, expanded data sharing must be pursued thoughtfully.
We provide a suggested framework for broad sharing of participant-level data from clinical trials and related technical documents. After reviewing current data-sharing initiatives, potential benefits and risks, and legal and regulatory implications, we propose potential governing principles and key features for a system of expanded access to participant-level data and evaluate several governance structures….(More)”