Dharma Akmon and Susan Jekielek at IASSIST Quarterly: “As researchers consider making their data available to others, they are concerned with the responsible use of data. As a result, they often seek to place restrictions on secondary use. The Research Connections archive at ICPSR makes available the datasets of dozens of studies related to childcare and early education. Of the 103 studies archived to date, 20 have some restrictions on access. While ICPSR’s data access systems were designed primarily to accommodate public use data (i.e., data without disclosure concerns) and potentially disclosive data, our interactions with depositors reveal a more nuanced range of needs for restricting use. Some data present a relatively low risk of threatening participants’ confidentiality, yet the data producers still want to monitor who is accessing the data and how they plan to use them. Other studies contain data with such a high risk of disclosure that their use must be restricted to a virtual data enclave. Still other studies rest on agreements with participants that require continuing oversight of secondary use by data producers, funders, and participants. This paper describes data producers’ range of needs to restrict data access and discusses how systems can better accommodate these needs….(More)”.
Open Cities | Open Data: Collaborative Cities in the Information Era
Book edited by Scott Hawken, Hoon Han and Chris Pettit: “Today the world’s largest economies and corporations trade in data and its products to generate value in new disruptive markets. Within these markets vast streams of data are often inaccessible or untapped and controlled by powerful monopolies. Counter to this exclusive use of data is a promising world-wide “open-data” movement, promoting freely accessible information to share, reuse and redistribute. The provision and application of open data has enormous potential to transform exclusive, technocratic “smart cities” into inclusive and responsive “open-cities”.
This book argues that those who contribute urban data should benefit from its production. Like the city itself, the information landscape is a public asset produced through collective effort, attention, and resources. People produce data through their engagement with the city, creating digital footprints through social media, mobility applications, and city sensors. By opening up data there is potential to generate greater value by supporting unforeseen collaborations, spontaneous urban innovations and solutions, and improved decision-making insights. Yet achieving more open cities is made challenging by conflicting desires for urban anonymity, sociability, privacy and transparency. This book engages with these issues through a variety of critical perspectives, and presents strategies, tools and case studies that enable this transformation….(More)”.
Community Data Dialogues
Sunlight Foundation: “Community Data Dialogues are in-person events designed to share open data with community members in the most digestible way possible to start a conversation about a specific issue. The main goal of the event is to give residents who may not have technical expertise but have local experience a chance to participate in data-informed decision-making. Doing this work in person can open doors and let facilitators ask a broader range of questions. To achieve this, the event must be designed to be inclusive of people without a background in data analysis and/or using statistics to understand local issues. Carrying out this event will let decision-makers in government use open data to talk with residents who can add to data’s value with their stories of lived experience relevant to local issues.
These events can take several forms, and groups both in and outside of government have designed creative and innovative events tailored to engage community members who are actively interested in helping solve local issues but are unfamiliar with using open data. This guide will help clarify how exactly to make Community Data Dialogues non-technical, interactive events that are inclusive to all participants….
A number of groups both in and outside of government have facilitated accessible open data events to great success. Here are just a few examples from the field of what data-focused events tailored for a nontechnical audience can look like:
Data Days Cleveland
Data Days Cleveland is an annual one-day event designed to make data accessible to all. Programs are designed with inclusivity and learning in mind, making it a more welcoming space for people new to data work. Data experts and practitioners guide novices through the fundamentals of using data: making maps, reading spreadsheets, creating data visualizations, etc….
The Urban Institute’s Data Walks
The Urban Institute’s Data Walks are an innovative example of presenting data in an interactive and accessible way to communities. Data Walks are events gathering community residents, policymakers, and others to jointly review and analyze data presentations on specific programs or issues and collaborate to offer feedback based on their individual experiences and expertise. This feedback can be used to improve current projects and inform future policies….(More)”.
How technology can enable a more sustainable agriculture industry
Matt High at CSO: “…The sector also faces considerable pressure in terms of its transparency, largely driven by shifting consumer preferences for responsibly sourced and environmentally friendly goods. The UK, for example, has seen shoppers transition away from typical agricultural commodities towards ‘free-from’ or alternative options that combine health, sustainability and quality.
It means that farmers worldwide must work harder and smarter in embedding corporate social responsibility (CSR) practices into their operations. Davis, who through Anthesis delivers financially driven sustainability strategies, strongly believes that sustainability is no longer a choice. “The agricultural sector is intrinsic to a wide range of global systems, societies and economies,” he says, adding that those organisations that do not embed sustainability best practice into their supply chains will face “increasing risk of price volatility, security of supply, commodity shortages, fraud and uncertainty.” To counter this, he urges businesses to develop CSR founded on a core set of principles that enable sustainable practices to be adopted at a pace and scale that mitigates the risks discussed.
Data is proving a particularly useful tool in this regard. Take the Cool Farm Tool, for example, which is a global, free-to-access online greenhouse gas (GHG), water and biodiversity footprint calculator used by farmers in more than 115 countries to enable effective management of critical on-farm sustainability challenges. Member organisations such as Pepsi, Tesco and Danone aggregate their supply chain data to report total agricultural footprint against key sustainability metrics, outputs from which are used to share knowledge and best practice on carbon and water reduction strategies….(More)”.
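For a rough sense of what aggregating farm-level reports against sustainability metrics involves, here is a minimal sketch. The field names, crops and figures are invented for illustration and do not reflect the Cool Farm Tool’s actual methodology:

```python
# Hypothetical per-farm records reported through a footprint calculator.
farm_reports = [
    {"farm": "Farm A", "crop": "potato", "ghg_kg_co2e": 12_400, "water_m3": 3_100},
    {"farm": "Farm B", "crop": "potato", "ghg_kg_co2e": 9_800, "water_m3": 2_700},
    {"farm": "Farm C", "crop": "wheat", "ghg_kg_co2e": 15_200, "water_m3": 1_900},
]

# Aggregate supply-chain totals per crop against two sustainability metrics.
totals = {}
for report in farm_reports:
    crop = totals.setdefault(report["crop"], {"ghg_kg_co2e": 0, "water_m3": 0})
    crop["ghg_kg_co2e"] += report["ghg_kg_co2e"]
    crop["water_m3"] += report["water_m3"]

for crop, metrics in totals.items():
    print(crop, metrics)
```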
Guidance Note: Statistical Disclosure Control
Centre for Humanitarian Data: “Survey and needs assessment data, or what is known as ‘microdata’, is essential for providing adequate response to crisis-affected people. However, collecting this information does present risks. Even as great effort is taken to remove unique identifiers such as names and phone numbers from microdata so no individual persons or communities are exposed, combining key variables such as location or ethnicity can still allow for re-identification of individual respondents. Statistical Disclosure Control (SDC) is one method for reducing this risk.
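For illustration, below is a minimal sketch of one common SDC check, k-anonymity, which counts how many respondents share each combination of quasi-identifying variables. The records, variable names and threshold are hypothetical, not taken from the Guidance Note:

```python
from collections import Counter

# Hypothetical microdata records with direct identifiers already removed.
records = [
    {"location": "District A", "ethnicity": "Group 1", "age_band": "18-29"},
    {"location": "District A", "ethnicity": "Group 1", "age_band": "18-29"},
    {"location": "District B", "ethnicity": "Group 2", "age_band": "30-44"},
]

QUASI_IDENTIFIERS = ("location", "ethnicity", "age_band")

def k_anonymity(rows, keys):
    """Return the smallest group size across quasi-identifier combinations."""
    groups = Counter(tuple(row[k] for k in keys) for row in rows)
    return min(groups.values())

k = k_anonymity(records, QUASI_IDENTIFIERS)
if k < 5:  # an illustrative threshold, not an official one
    print(f"k = {k}: some respondents are re-identifiable; apply SDC "
          "(e.g., recode or suppress values) before sharing.")
```

A dataset is k-anonymous when every combination of quasi-identifiers is shared by at least k respondents; the lower the k, the easier re-identification becomes.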
The Centre has developed a Guidance Note on Statistical Disclosure Control that outlines the steps involved in the SDC process, potential applications for its use, case studies and key actions for humanitarian data practitioners to take when managing sensitive microdata. Along with an overview of what SDC is and what tools are available, the Guidance Note outlines how the Centre is using this process to mitigate risk for datasets shared on HDX. …(More)”.
Hacking for Housing: How open data and civic hacking creates wins for housing advocates
Krista Chan at Sunlight: “…Housing advocates have an essential role to play in protecting residents from the consequences of real estate speculation. But they’re often at a significant disadvantage; the real estate lobby has access to a wealth of data and technological expertise. Civic hackers and open data could play an essential role in leveling the playing field.
Civic hackers have facilitated wins for housing advocates by scraping data or submitting FOIA requests where data is not open and creating apps to help advocates gain insights that they can turn into action.
Hackers at New York City’s Housing Data Coalition created a host of civic apps that identify problematic landlords by exposing owners behind shell companies, or flagging buildings where tenants are at risk of displacement. In a similar vein, Washington DC’s Housing Insights tool aggregates a wide variety of data to help advocates make decisions about affordable housing.
Barriers and opportunities
Today, the degree to which housing data exists, is openly available, and is consistently reliable varies widely, even within individual cities. Cities with robust communities of affordable housing advocacy groups may not be connected to people who can help open up data and build usable tools. Even in cities with robust advocacy and civic tech communities, these groups may not know how to work together because of the significant institutional knowledge required to understand how to best support housing advocacy efforts.
In cities where civic hackers have tried to create useful open housing data repositories, similar data cleaning processes have been replicated, such as record linkage of building owners or identification of rent-controlled units. Civic hackers need to take on these data cleaning and “extract, transform, load” (ETL) processes in order to work with the data itself, even if it’s openly available. The Housing Data Coalition has assembled NYC-DB, a tool that builds a PostgreSQL database containing a variety of housing-related data on New York City, and Washington DC’s Housing Insights similarly ingests housing data into a PostgreSQL database and API for front-end access.
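As a sketch of what one of these shared cleaning steps might look like, consider normalizing owner names into a crude record-linkage key before loading them into a database. The names and schema below are invented, not NYC-DB’s or Housing Insights’ actual code, and SQLite stands in for PostgreSQL to keep the example self-contained:

```python
import re
import sqlite3  # stand-in for a PostgreSQL connection in this sketch

# Hypothetical scraped ownership records: the same landlord hiding
# behind slightly different shell-company spellings.
raw_owners = ["ACME HOLDINGS, L.L.C.", "Acme Holdings LLC", "123 Main St. Corp"]

def normalize(name):
    """Crude record-linkage key: uppercase, strip punctuation and suffixes."""
    name = re.sub(r"[^\w\s]", "", name.upper())
    return re.sub(r"\b(LLC|CORP|INC)\b", "", name).strip()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owners (raw TEXT, linkage_key TEXT)")
conn.executemany(
    "INSERT INTO owners VALUES (?, ?)",
    [(owner, normalize(owner)) for owner in raw_owners],
)

# Owners sharing a linkage key are candidates for being the same real entity.
for key, count in conn.execute(
    "SELECT linkage_key, COUNT(*) FROM owners GROUP BY linkage_key"
):
    print(key, count)
```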
Since these tools are open source, civic hackers in a multitude of cities can use existing work to develop their own, locally relevant tools to support local housing advocates….(More)”.
The value of data in Canada: Experimental estimates
Statistics Canada: “As data and information take on a far more prominent role in Canada and, indeed, all over the world, data, databases and data science have become a staple of modern life. When the electricity goes out, Canadians are as much in search of their data feed as they are food and heat. Consumers are using more and more data that is embodied in the products they buy, whether those products are music, reading material, cars and other appliances, or a wide range of other goods and services. Manufacturers, merchants and other businesses depend increasingly on the collection, processing and analysis of data to make their production processes more efficient and to drive their marketing strategies.
The increasing use of and investment in all things data is driving economic growth, changing the employment landscape and reshaping how and from where we buy and sell goods. Yet the rapid rise in the use and importance of data is not well measured in the existing statistical system. Given the ‘lack of data on data’, Statistics Canada has initiated new research to produce a first set of estimates of the value of data, databases and data science. The development of these estimates benefited from collaboration with the Bureau of Economic Analysis in the United States and the Organisation for Economic Co-operation and Development.
In 2018, Canadian investment in data, databases and data science was estimated to be as high as $40 billion. This was greater than the annual investment in industrial machinery, transportation equipment, and research and development and represented approximately 12% of total non-residential investment in 2018….
Statistics Canada recently released a conceptual framework outlining how one might measure the economic value of data, databases and data science. Thanks to this new framework, the growing role of data in Canada can be measured through time. This framework is described in a paper that was released in The Daily on June 24, 2019 entitled “Measuring investments in data, databases and data science: Conceptual framework.” That paper describes the concept of an ‘information chain’ in which data are derived from everyday observations, databases are constructed from data, and data science creates new knowledge by analyzing the contents of databases….(More)”.
How we can place a value on health care data
Report by E&Y: “Unlocking the power of health care data to fuel innovation in medical research and improve patient care is at the heart of today’s health care revolution. When curated or consolidated into a single longitudinal dataset, patient-level records will trace a complete story of a patient’s demographics, health, wellness, diagnosis, treatments, medical procedures and outcomes. Health care providers need to recognize patient data for what it is: a valuable intangible asset desired by multiple stakeholders, a treasure trove of information.
Among the universe of providers holding significant data assets, the United Kingdom’s National Health Service (NHS) is the single largest integrated health care provider in the world. Its patient records cover the entire UK population from birth to death.
We estimate that the 55 million patient records held by the NHS today may have an indicative market value of several billion pounds to a commercial organization. We estimate also that the value of the curated NHS dataset could be as much as £5bn per annum and deliver around £4.6bn of benefit to patients per annum, in potential operational savings for the NHS, enhanced patient outcomes and generation of wider economic benefits to the UK….(More)”.
The plan to mine the world’s research papers
Priyanka Pulla in Nature: “Carl Malamud is on a crusade to liberate information locked up behind paywalls — and his campaigns have scored many victories. He has spent decades publishing copyrighted legal documents, from building codes to court records, and then arguing that such texts represent public-domain law that ought to be available to any citizen online. Sometimes, he has won those arguments in court. Now, the 60-year-old American technologist is turning his sights on a new objective: freeing paywalled scientific literature. And he thinks he has a legal way to do it.
Over the past year, Malamud has — without asking publishers — teamed up with Indian researchers to build a gigantic store of text and images extracted from 73 million journal articles dating from 1847 up to the present day. The cache, which is still being created, will be kept on a 576-terabyte storage facility at Jawaharlal Nehru University (JNU) in New Delhi. “This is not every journal article ever written, but it’s a lot,” Malamud says. It’s comparable to the size of the core collection in the Web of Science database, for instance. Malamud and his JNU collaborator, bioinformatician Andrew Lynn, call their facility the JNU data depot.
No one will be allowed to read or download work from the repository, because that would breach publishers’ copyright. Instead, Malamud envisages, researchers could crawl over its text and data with computer software, scanning through the world’s scientific literature to pull out insights without actually reading the text.
The unprecedented project is generating much excitement because it could, for the first time, open up vast swathes of the paywalled literature for easy computerized analysis. Dozens of research groups already mine papers to build databases of genes and chemicals, map associations between proteins and diseases, and generate useful scientific hypotheses. But publishers control — and often limit — the speed and scope of such projects, which typically confine themselves to abstracts, not full text. Researchers in India, the United States and the United Kingdom are already making plans to use the JNU store instead. Malamud and Lynn have held workshops at Indian government laboratories and universities to explain the idea. “We bring in professors and explain what we are doing. They get all excited and they say, ‘Oh gosh, this is wonderful’,” says Malamud.
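To make the mining idea concrete, here is a minimal sketch of the kind of non-consumptive analysis described: counting sentence-level co-occurrences of genes and chemicals using small term lists. The dictionaries and text are invented for illustration, not drawn from the depot itself:

```python
import itertools
import re
from collections import Counter

# Hypothetical dictionaries of entities of interest.
GENES = {"BRCA1", "TP53"}
CHEMICALS = {"cisplatin", "tamoxifen"}

# Stand-in for full text extracted from articles in the depot.
documents = [
    "BRCA1 mutations alter response to cisplatin in several trials.",
    "TP53 status did not change outcomes; tamoxifen was well tolerated.",
]

cooccurrences = Counter()
for doc in documents:
    for sentence in re.split(r"[.!?;]", doc):
        tokens = set(re.findall(r"[\w-]+", sentence))
        genes = GENES & tokens
        chemicals = {t.lower() for t in tokens} & CHEMICALS
        for pair in itertools.product(genes, chemicals):
            cooccurrences[pair] += 1

# Aggregate counts, not readable text, are what leaves the analysis.
print(cooccurrences.most_common())
```

The point of such a pipeline is that only derived statistics are extracted; no human ever reads the copyrighted full text.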
But the depot’s legal status isn’t yet clear. Malamud, who contacted several intellectual-property (IP) lawyers before starting work on the depot, hopes to avoid a lawsuit. “Our position is that what we are doing is perfectly legal,” he says. For the moment, he is proceeding with caution: the JNU data depot is air-gapped, meaning that no one can access it from the Internet. Users have to physically visit the facility, and only researchers who want to mine for non-commercial purposes are currently allowed in. Malamud says his team does plan to allow remote access in the future. “The hope is to do this slowly and deliberately. We are not throwing this open right away,” he says….(More)”.
What Restaurant Reviews Reveal About Cities
Linda Poon at CityLab: “Online review sites can tell you a lot about a city’s restaurant scene, and they can reveal a lot about the city itself, too.
Researchers at MIT recently found that information about restaurants gathered from popular review sites can be used to uncover a number of socioeconomic factors of a neighborhood, including its employment rates and demographic profiles of the people who live, work, and travel there.
A report published last week in the Proceedings of the National Academy of Sciences explains how the researchers used information found on Dianping—a Yelp-like site in China—to find information that might usually be gleaned from an official government census. The model could prove especially useful for gathering information about cities that don’t have that kind of reliable or up-to-date government data, especially in developing countries with limited resources to conduct regular surveys….
Zheng and her colleagues tested out their machine-learning model using restaurant data from nine Chinese cities of various sizes—from crowded ones like Beijing, with a population of more than 10 million, to smaller ones like Baoding, a city of fewer than 3 million people.
They pulled data from 630,000 restaurants listed on Dianping, including each business’s location, menu prices, opening day, and customer ratings. Then they ran it through a machine-learning model with official census data and with anonymous location and spending data gathered from cell phones and bank cards. By comparing the information, they were able to determine where the restaurant data reflected the other data they had about neighborhoods’ characteristics.
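A hedged sketch of this kind of model, using scikit-learn to predict a neighborhood attribute from aggregated restaurant features, is shown below. The features, data and model choice are illustrative assumptions, not the paper’s actual specification:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical per-neighborhood features aggregated from restaurant listings:
# [restaurant count, mean menu price, mean rating, share of coffee shops]
X = rng.random((500, 4))
# Stand-in target: daytime population, as if derived from mobile-phone data.
y = 1000 * X[:, 0] + 200 * X[:, 1] + rng.normal(0, 20, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# R^2 on held-out neighborhoods, analogous to the accuracies reported below.
print(f"held-out R^2: {model.score(X_test, y_test):.2f}")
```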
They found that the local restaurant scene can predict, with 95 percent accuracy, variations in a neighborhood’s daytime and nighttime populations, which are measured using mobile phone data. They can also predict, with 90 and 93 percent accuracy, respectively, the number of businesses and the volume of consumer consumption. The types of cuisine offered and kinds of eateries available (coffee shops vs. traditional teahouses, for example) can also predict the proportion of immigrants or the age and income breakdown of residents. The predictions are more accurate for neighborhoods near urban centers as opposed to those near suburbs, and for smaller cities, where neighborhoods don’t vary as widely as those in bigger metropolises….(More)”.