Information literacy in the age of algorithms


Report by Alison J. Head, Ph.D., Barbara Fister, Margy MacMillan: “…Three sets of questions guided this report’s inquiry:

  1. What is the nature of our current information environment, and how has it influenced how we access, evaluate, and create knowledge today? What do findings from a decade of PIL research tell us about the information skills and habits students will need for the future?
  2. How aware are current students of the algorithms that filter and shape the news and information they encounter daily? What concerns do they have about how automated decision-making systems may influence us, divide us, and deepen inequalities?
  3. What must higher education do to prepare students to understand the new media landscape so they will be able to participate in sharing and creating information responsibly in a changing and challenged world?

To investigate these questions, we draw on qualitative data that PIL researchers collected from student focus groups and faculty interviews during fall 2019 at eight U.S. colleges and universities. Findings from a sample of 103 students and 37 professors reveal levels of awareness and concerns about the age of algorithms on college campuses. They are presented as research takeaways….(More)”.

The Future State CIO: How the Role will Drive Innovation


Report by Accenture/NASCIO: “…exploring the future role of the state CIO and how the state CIO will drive innovation.

The research included interviews and a survey of state CIOs to understand the role of state CIOs in promoting innovation in government.

  • The study explored how state IT organizations build the capacity to innovate and which best practices help in doing so.
  • We also examined how state CIOs embrace new and emerging technologies to create the best government outcomes.
  • Our report illuminates compelling opportunities, persistent obstacles, strategies for accelerating innovation and inspiring real-world case studies.
  • The report presents a set of practical recommendations for driving innovation…(More)”.

The wisdom of crowds: What smart cities can learn from a dead ox and live fish


Portland State University: “In 1906, Francis Galton was at a country fair where attendees had the opportunity to guess the weight of a dead ox. Galton took the guesses of 787 fair-goers and found that the average guess was only one pound off of the correct weight — even when individual guesses were off base.

This concept, known as “the wisdom of crowds” or “collective intelligence,” has been applied to many situations over the past century, from people estimating the number of jellybeans in a jar to predicting the winners of major sporting events — often with high rates of success. Whatever the problem, the average answer of the crowd seems to be an accurate solution.
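The averaging idea can be illustrated with a minimal sketch. The guesses below are simulated, and the true weight, the spread of the guesses, and the function name are assumptions made for illustration; they are not Galton's data. The one property that holds for any set of guesses is that the error of the averaged guess can never exceed the average error of the individual guesses.

```python
import random
from statistics import mean

def crowd_vs_individuals(guesses, truth):
    """Return (error of the averaged guess, average error of the individual guesses)."""
    crowd_error = abs(mean(guesses) - truth)
    typical_individual_error = mean(abs(g - truth) for g in guesses)
    return crowd_error, typical_individual_error

# Hypothetical data: 787 noisy, unbiased guesses around an assumed true weight.
rng = random.Random(0)
TRUE_WEIGHT = 1200.0  # assumed value in pounds, for illustration only
guesses = [rng.gauss(TRUE_WEIGHT, 75.0) for _ in range(787)]

crowd_error, individual_error = crowd_vs_individuals(guesses, TRUE_WEIGHT)
print(f"crowd error: {crowd_error:.1f} lb, typical individual error: {individual_error:.1f} lb")
```

The crowd does well here because the simulated errors are unbiased and largely cancel out; if every guesser shared the same bias, averaging would not remove it.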

But does this also apply to knowledge about systems, such as ecosystems, health care, or cities? Do we always need in-depth scientific inquiries to describe and manage them — or could we leverage crowds?

This question has fascinated Antonie J. Jetter, associate professor of Engineering and Technology Management, for many years. Now, there’s an answer. A recent study, which was co-authored by Jetter and published in Nature Sustainability, shows that diverse crowds of local natural resource stakeholders can collectively produce complex environmental models very similar to those of trained experts.

For this study, about 250 anglers, water guards and board members of German fishing clubs were asked to draw connections showing how ecological relationships influence the pike stock from the perspective of the anglers and how factors like nutrients and fishing pressures help determine the number of pike in a freshwater lake ecosystem. The individuals’ drawings — or their so-called mental models — were then mathematically combined into a collective model representing their averaged understanding of the ecosystem and compared with the best scientific knowledge on the same subject.

The result is astonishing. If you combine the ideas from many individual anglers by averaging their mental models, the final outcomes correspond more or less exactly to the scientific knowledge of pike ecology — local knowledge of stakeholders produces results that are in no way inferior to lengthy and expensive scientific studies….(More)”.
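As a rough illustration of what "mathematically combined" could mean, the sketch below stores each mental model as a set of signed cause-and-effect links and averages the link strengths across individuals. The excerpt does not describe the study's actual aggregation procedure, so the representation, the rule that a missing link counts as zero, and all names and numbers here are assumptions.

```python
from collections import defaultdict

def average_mental_models(models):
    """Average link strengths across individuals; a link someone did not draw counts as 0."""
    totals = defaultdict(float)
    for model in models:
        for edge, weight in model.items():
            totals[edge] += weight
    return {edge: total / len(models) for edge, total in totals.items()}

# Hypothetical mental models: (cause, effect) -> signed strength in [-1, 1].
angler_a = {("nutrients", "pike"): 0.5, ("fishing pressure", "pike"): -1.0}
angler_b = {("nutrients", "pike"): 1.0, ("fishing pressure", "pike"): -0.5}
angler_c = {("fishing pressure", "pike"): -0.75, ("stocking", "pike"): 0.25}

collective = average_mental_models([angler_a, angler_b, angler_c])
for (cause, effect), strength in sorted(collective.items()):
    print(f"{cause} -> {effect}: {strength:+.2f}")
```

Treating an undrawn link as zero means a relationship only one person believes in is diluted in the collective model, which is one simple way individual quirks can average out.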

The Wild Wild West of Data Hoarding in the Federal Government


ActiveNavigation: “There is a strong belief, both in the public and private sector, that the worst thing you can do with a piece of data is to delete it. The government stores all sorts of data, from traffic logs to home ownership statistics. Data is obviously incredibly important to the Federal Government – but storing large amounts of it poses significant compliance and security risks – especially with the rise of Nation State hackers. As the risk of being breached continues to rise, why is the government not tackling its data storage problem head on?

The Myth of “Free” Storage

Storage is cheap, especially compared to 10-15 years ago. Cloud storage has made it easier than ever to store swaths of information, creating what some call “digital landfills.” However, the true cost of storage isn’t in the ones and zeros sitting on the server somewhere. It’s the business cost.

As information stores continue to grow, moving information to the correct place gets harder and harder for the Federal Government, not to mention more expensive. The U.S. Government has a duty to provide accurate, up-to-date information to its taxpayers – meaning that sharing “bad data” is not an option.

The Association for Information and Image Management (AIIM) reports that half of an organization’s retained data has no value. So far, in 2019, through our work with Federal Agencies, we have discovered that this number is, in fact, low. Over 66% of data we’ve indexed, by the client’s definition, has fallen into that “junk” category. Eliminating junk data paves the way for greater accessibility, transparency and major financial savings. But what is “junk” data?

Redundant, Obsolete and Trivial (ROT) Data

Data is important – but if you can’t assign a value to it, it can become impossible to manage. Simply put, ROT data is digital information that an organization retains, that has no business or legal value. To be efficient from both a cyber hygiene and business perspective, the government needs to get better at purging their ROT data.

Again, purging data doesn’t just help with the hard cost of storage and backups, etc. For example, think about what needs to be done to answer a Freedom of Information Act (FOIA) request. You have a petabyte of data. You have at least a billion documents you need to funnel through to be able to respond to that FOIA request. By eliminating 50% of your ROT data, you probably have also reduced your FOIA response time by 50%.

Records and information governance, taken at face value, might seem fairly esoteric. It may not be as fun or as sexy as the new Space Force, but the reality is, the only way to know if the government is doing what it says is through records and information. You can’t answer an FOIA request if there’s no material. You can’t answer Congress if the material isn’t accurate. Being able to access timely, accurate information is critical. That’s why NARA is advocating a move to electronic records.…(More)”.

Paging Dr. Google: How the Tech Giant Is Laying Claim to Health Data


Wall Street Journal: “Roughly a year ago, Google offered health-data company Cerner Corp. an unusually rich proposal.

Cerner was interviewing Silicon Valley giants to pick a storage provider for 250 million health records, one of the largest collections of U.S. patient data. Google dispatched former chief executive Eric Schmidt to personally pitch Cerner over several phone calls and offered around $250 million in discounts and incentives, people familiar with the matter say. 

Google had a bigger goal in pushing for the deal than dollars and cents: a way to expand its effort to collect, analyze and aggregate health data on millions of Americans. Google representatives were vague in answering questions about how Cerner’s data would be used, making the health-care company’s executives wary, the people say. Eventually, Cerner struck a storage deal with Amazon.com Inc. instead.

The failed Cerner deal reveals an emerging challenge to Google’s move into health care: gaining the trust of health care partners and the public. So far, that has hardly slowed the search giant.

Google has struck partnerships with some of the country’s largest hospital systems and most-renowned health-care providers, many of them vast in scope and few of their details previously reported. In just a few years, the company has achieved the ability to view or analyze tens of millions of patient health records in at least three-quarters of U.S. states, according to a Wall Street Journal analysis of contractual agreements. 

In certain instances, the deals allow Google to access personally identifiable health information without the knowledge of patients or doctors. The company can review complete health records, including names, dates of birth, medications and other ailments, according to people familiar with the deals.

The prospect of tech giants’ amassing huge troves of health records has raised concerns among lawmakers, patients and doctors, who fear such intimate data could be used without individuals’ knowledge or permission, or in ways they might not anticipate. 

Google is developing a search tool, similar to its flagship search engine, in which patient information is stored, collated and analyzed by the company’s engineers, on its own servers. The portal is designed for use by doctors and nurses, and eventually perhaps patients themselves, though some Google staffers would have access sooner. 

Google executives and some health systems say that detailed data sharing has the potential to improve health outcomes. Large troves of data help fuel algorithms Google is creating to detect lung cancer, eye disease and kidney injuries. Hospital executives have long sought better electronic record systems to reduce error rates and cut down on paperwork….

Legally, the information gathered by Google can be used for purposes beyond diagnosing illnesses, under laws enacted during the dial-up era. U.S. federal privacy laws make it possible for health-care providers, with little or no input from patients, to share data with certain outside companies. That applies to partners, like Google, with significant presences outside health care. The company says its intentions in health are unconnected with its advertising business, which depends largely on data it has collected on users of its many services, including email and maps.

Medical information is perhaps the last bounty of personal data yet to be scooped up by technology companies. The health data-gathering efforts of other tech giants such as Amazon and International Business Machines Corp. face skepticism from physician and patient advocates. But Google’s push in particular has set off alarm bells in the industry, including over privacy concerns. U.S. senators, as well as health-industry executives, are questioning Google’s expansion and its potential for commercializing personal data….(More)”.

On Digital Disinformation and Democratic Myths


David Karpf at MediaWell: “…How many votes did Cambridge Analytica affect in the 2016 presidential election? How much of a difference did the company actually make?

Cambridge Analytica has become something of a Rorschach test among those who pay attention to digital disinformation and microtargeted propaganda. Some hail the company as a digital Svengali, harnessing the power of big data to reshape the behavior of the American electorate. Others suggest the company was peddling digital snake oil, with outlandish marketing claims that bore little resemblance to their mundane product.

One thing is certain: the company has become a household name, practically synonymous with disinformation and digital propaganda in the aftermath of the 2016 election. It has claimed credit for the surprising success of the Brexit referendum and for the Trump digital strategy. Journalists such as Carole Cadwalladr and Hannes Grassegger and Mikael Krogerus have published longform articles that dive into the “psychographic” breakthroughs that the company claims to have made. Cadwalladr also exposed the links between the company and a network of influential conservative donors and political operatives. Whistleblower Chris Wylie, who worked for a time as the company’s head of research, further detailed how it obtained a massive trove of Facebook data on tens of millions of American citizens, in violation of Facebook’s terms of service. The Cambridge Analytica scandal has been a driving force in the current “techlash,” and has been the topic of congressional hearings, documentaries, mass-market books, and scholarly articles.

The reasons for concern are numerous. The company’s own marketing materials boasted about radical breakthroughs in psychographic targeting—developing psychological profiles of every US voter so that political campaigns could tailor messages to exploit psychological vulnerabilities. Those marketing claims were paired with disturbing revelations about the company violating Facebook’s terms of service to scrape tens of millions of user profiles, which were then compiled into a broader database of US voters. Cambridge Analytica behaved unethically. It either broke a lot of laws or demonstrated that old laws needed updating. When the company shut down, no one seemed to shed a tear.

But what is less clear is just how different Cambridge Analytica’s product actually was from the type of microtargeted digital advertisements that every other US electoral campaign uses. Many of the most prominent researchers warning the public about how Cambridge Analytica uses our digital exhaust to “hack our brains” are marketing professors, more accustomed to studying the impact of advertising in commerce than in elections. The political science research community has been far more skeptical. An investigation from Nature magazine documented that the evidence of Cambridge Analytica’s independent impact on voter behavior is basically nonexistent (Gibney 2018). There is no evidence that psychographic targeting actually works at the scale of the American electorate, and there is also no evidence that Cambridge Analytica in fact deployed psychographic models while working for the Trump campaign. The company clearly broke Facebook’s terms of service in acquiring its massive Facebook dataset. But it is not clear that the massive dataset made much of a difference.

At issue in the Cambridge Analytica case are two baseline assumptions about political persuasion in elections. First, what should be our point of comparison for digital propaganda in elections? Second, how does political persuasion in elections compare to persuasion in commercial arenas and marketing in general?…(More)”.

Copy, Paste, Legislate


The Center for Public Integrity: “Do you know if a bill introduced in your statehouse — it might govern who can fix your shattered iPhone screen or whether you can still sue a pedophile priest years later — was actually written by your elected lawmakers? Use this new tool to find out.

Spoiler alert: The answer may well be no.

Thousands of pieces of “model legislation” are drafted each year by business organizations and special interest groups and distributed to state lawmakers for introduction. These copycat bills influence policymaking across the nation, state by state, often with little scrutiny. This news application was developed by the Center for Public Integrity, part of a year-long collaboration with USA TODAY and the Arizona Republic to bring the practice into the light….(More)”.

Meaningful Inefficiencies: Civic Design in an Age of Digital Expediency


Book by Eric Gordon and Gabriel Mugar: “Public trust in the institutions that mediate civic life, from governing bodies to newsrooms, is low. In facing this challenge, many organizations assume that ensuring greater efficiency will build trust. As a result, these organizations are quick to adopt new technologies to enhance what they do, whether it’s a new app or dashboard. However, efficiency, or charting a path to a goal with the least amount of friction, is not itself always built on a foundation of trust.

Meaningful Inefficiencies is about the practices undertaken by civic designers that challenge the normative applications of “smart technologies” in order to build or repair trust with publics. Based on over sixty interviews with change makers in public-serving organizations throughout the United States, as well as detailed case studies, this book provides a practical and deeply philosophical picture of civic life in transition. The designers in this book are not professional designers, but practitioners embedded within organizations who have adopted an approach to public engagement Eric Gordon and Gabriel Mugar call “meaningful inefficiencies,” or the deliberate design of less efficient over more efficient means of achieving some ends. This book illustrates how civic designers are creating meaningful inefficiencies within public-serving organizations. It also encourages a rethinking of how innovation within these organizations is understood, applied, and sought after. Unlike market innovation, civic innovation is not just about invention and novelty; it is concerned with building communities around novelty, and cultivating deep and persistent trust.

At its core, Meaningful Inefficiencies underlines that good civic innovation will never just involve one single public good, but must instead negotiate a plurality of publics. In doing so, it creates the conditions for those publics to play, resulting in people truly caring for the world. Meaningful Inefficiencies thus presents an emergent and vitally needed approach to creating civic life at a moment when smart and efficient are the dominant forces in social and organizational change….(More)”.

What is My Data Worth?


Ruoxi Jia at Berkeley Artificial Intelligence Research: “People give massive amounts of their personal data to companies every day and these data are used to generate tremendous business value. Some economists and politicians argue that people should be paid for their contributions—but the million-dollar question is: by how much?

This article discusses methods proposed in our recent AISTATS and VLDB papers that attempt to answer this question in the machine learning context. This is joint work with David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Nick Hynes, Bo Li, Ce Zhang, Costas J. Spanos, and Dawn Song, as well as a collaborative effort between UC Berkeley, ETH Zurich, and UIUC. More information about the work in our group can be found here.

What are the existing approaches to data valuation?

Various ad-hoc data valuation schemes have been studied in the literature and some of them have been deployed in the existing data marketplaces. From a practitioner’s point of view, they can be grouped into three categories:

  • Query-based pricing attaches values to user-initiated queries. One simple example is to set the price based on the number of queries allowed during a time window. Other more sophisticated examples attempt to adjust the price to some specific criteria, such as arbitrage avoidance.
  • Data attribute-based pricing constructs a price model that takes into account various parameters, such as data age, credibility, potential benefits, etc. The model is trained to match market prices released in public registries.
  • Auction-based pricing designs auctions that dynamically set the price based on bids offered by buyers and sellers.

However, existing data valuation schemes do not take into account the following important desiderata:

  • Task-specificness: The value of data depends on the task it helps to fulfill. For instance, if Alice’s medical record indicates that she has disease A, then her data will be more useful to predict disease A as opposed to other diseases.
  • Fairness: The quality of data from different sources varies dramatically. In the worst-case scenario, adversarial data sources may even degrade model performance via data poisoning attacks. Hence, the data value should reflect the efficacy of data by assigning high values to data which can notably improve the model’s performance.
  • Efficiency: Practical machine learning tasks may involve thousands or billions of data contributors; thus, data valuation techniques should be capable of scaling up.

With the desiderata above, we now discuss a principled notion of data value and computationally efficient algorithms for data valuation….(More)”.
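The excerpt stops before naming the notion of value it has in mind; the AISTATS and VLDB papers it cites build on the Shapley value from cooperative game theory. The sketch below computes exact Shapley values for a toy case. The contributors, the `accuracy` utility, and all numbers are hypothetical, and the brute-force enumeration is shown only to make the definition concrete; it is not the efficient algorithm from the papers.

```python
from itertools import permutations

def shapley_values(contributors, utility):
    """Exact Shapley values: each contributor's marginal gain, averaged over all orderings."""
    values = {c: 0.0 for c in contributors}
    orderings = list(permutations(contributors))
    for order in orderings:
        coalition = frozenset()
        for c in order:
            with_c = coalition | {c}
            values[c] += utility(with_c) - utility(coalition)
            coalition = with_c
    return {c: v / len(orderings) for c, v in values.items()}

def accuracy(coalition):
    """Hypothetical task-specific utility: model accuracy when trained on a coalition's data."""
    score = 0.5  # assumed baseline with no data
    if "alice" in coalition or "bob" in coalition:
        score += 0.2  # Alice and Bob hold redundant records
    if "carol" in coalition:
        score += 0.1  # Carol holds independent, useful records
    if "mallory" in coalition:
        score -= 0.05  # Mallory's poisoned records hurt the model
    return score

values = shapley_values(["alice", "bob", "carol", "mallory"], accuracy)
for name, value in values.items():
    print(f"{name}: {value:+.3f}")
```

In this toy case the values reflect the task-specificness and fairness desiderata: the redundant contributors split their shared gain, and the harmful source receives a negative value. Enumerating every ordering grows factorially with the number of contributors, so it fails the efficiency desideratum, which is the gap that scalable approximations aim to close.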

Dollars for Profs: How to Investigate Professors’ Conflicts of Interest


ProPublica: “When professors moonlight, the income may influence their research and policy views. Although most universities track this outside work, the records have rarely been accessible to the public, potentially obscuring conflicts of interest.

That changed last month when ProPublica launched Dollars for Profs, an interactive database that, for the first time ever, allows you to look up more than 37,000 faculty and staff disclosures from about 20 public universities and the National Institutes of Health.

We believe there are hundreds of stories in this database, and we hope to tell as many as possible. Already, we’ve revealed how the University of California’s weak monitoring of conflicts has allowed faculty members to underreport their outside income, potentially depriving the university of millions of dollars. In addition, using a database of NIH records, we found that health researchers have acknowledged a total of at least $188 million in financial conflicts of interest since 2012.

We hope journalists all over the country will look into the database and find more. Here are tips for local education reporters, college newspaper journalists and anyone else who wants to hold academia accountable on how to dig into the disclosures….(More)”.