Can You Really Spot Cancer Through a Search Engine?


Michael Reilly at MIT Technology Review: “In the world of cancer treatment, early diagnosis can mean the difference between being cured and being handed a death sentence. At the very least, catching a tumor early increases a patient’s chances of living longer.

Researchers at Microsoft think they may know of a tool that could help detect cancers before you even think to go to a doctor: your search engine.

In a study published Tuesday in the Journal of Oncology Practice, the Microsoft team showed that it was able to mine the anonymized search queries of 6.4 million Bing users to find searches that indicated someone had been diagnosed with pancreatic cancer (such as “why did I get cancer in pancreas,” and “I was told I have pancreatic cancer what to expect”). Then, looking at those users’ search histories before their diagnosis, the researchers identified query patterns indicating that the users had been experiencing symptoms before they ever sought medical treatment.

Pancreatic cancer is a particularly deadly form of the disease. It’s the fourth-leading cause of cancer death in the U.S., and three-quarters of people diagnosed with it die within a year. But catching it early still improves the odds of living longer.

By looking for searches for symptoms—which include yellowing, itchy skin, and abdominal pain—and checking users’ search histories for signs of other risk factors such as alcoholism and obesity, the team was often able to identify symptom-related searches up to five months before a user was diagnosed.
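
To make the retrospective approach concrete, here is a minimal Python sketch of the kind of analysis described above: find queries that read like a first-person report of a diagnosis, then look back through the same user’s earlier searches for symptom terms and measure the lead time. The phrase lists and the log format are invented for illustration; the Microsoft team’s actual query classifiers and risk models were far more sophisticated.

```python
from collections import namedtuple
from datetime import datetime

# A single search-log entry: when the query was issued and its text.
Query = namedtuple("Query", ["time", "text"])

# Invented keyword lists for illustration; the study used much richer
# classifiers to identify "experiential" diagnostic queries and symptoms.
DIAGNOSTIC_PHRASES = ["i was told i have pancreatic cancer",
                      "why did i get cancer in pancreas"]
SYMPTOM_TERMS = ["yellowing skin", "itchy skin", "abdominal pain"]

def first_diagnostic_query(queries):
    """Earliest query that reads like a first-person report of a diagnosis."""
    hits = [q.time for q in queries
            if any(p in q.text.lower() for p in DIAGNOSTIC_PHRASES)]
    return min(hits) if hits else None

def symptom_lead_time(queries):
    """Estimate how long before the apparent diagnosis the user's first
    symptom-like search appeared, or None if no signal is found."""
    diagnosed_at = first_diagnostic_query(queries)
    if diagnosed_at is None:
        return None
    symptom_times = [q.time for q in queries
                     if q.time < diagnosed_at
                     and any(s in q.text.lower() for s in SYMPTOM_TERMS)]
    return (diagnosed_at - min(symptom_times)) if symptom_times else None

log = [Query(datetime(2016, 1, 10), "itchy skin and abdominal pain"),
       Query(datetime(2016, 5, 30),
             "i was told i have pancreatic cancer what to expect")]
print(symptom_lead_time(log))  # -> 141 days, 0:00:00
```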

In their paper, the team acknowledged the limitations of the work, saying that it is not meant to provide people with a diagnosis. Instead they suggested that it might one day be turned into a tool that warns users whose searches indicate they may have symptoms of cancer.

“The goal is not to perform the diagnosis,” said Ryen White, one of the researchers, in a post on Microsoft’s blog. “The goal is to help those at highest risk to engage with medical professionals who can actually make the true diagnosis.”…(More)”

Digital Keywords: A Vocabulary of Information Society and Culture


Book edited by Benjamin Peters: “In the age of search, keywords increasingly organize research, teaching, and even thought itself. Inspired by Raymond Williams’s 1976 classic Keywords, the timely collection Digital Keywords gathers pointed, provocative short essays on more than two dozen keywords by leading and rising digital media scholars from the areas of anthropology, digital humanities, history, political science, philosophy, religious studies, rhetoric, science and technology studies, and sociology. Digital Keywords examines and critiques the rich lexicon animating the emerging field of digital studies.

This collection broadens our understanding of how we talk about the modern world, particularly of the vocabulary at work in information technologies. Contributors scrutinize each keyword independently: for example, the recent pairing of digital and analog is separated, while classic terms such as community, culture, event, memory, and democracy are treated in light of their historical and intellectual importance. Metaphors of the cloud in cloud computing and the mirror in data mirroring combine with recent and radical uses of terms such as information, sharing, gaming, algorithm, and internet to reveal previously hidden insights into contemporary life. Bookended by a critical introduction and a list of over two hundred other digital keywords, these essays provide concise, compelling arguments about our current mediated condition.

Digital Keywords delves into what language does in today’s information revolution and why it matters…(More)”.

Why Didn’t E-Gov Live Up To Its Promise?


Excerpt from the report “Delivering on Digital: The Innovators and Technologies that are Transforming Government” by William Eggers: “Digital is becoming the new normal. Digital technologies have quietly and quickly pervaded every facet of our daily lives, transforming how we eat, shop, work, play and think.

An aging population, millennials assuming managerial positions, budget shortfalls and ballooning entitlement spending all will significantly impact the way government delivers services in the coming decade, but no single factor will alter citizens’ experience of government more than the pure power of digital technologies.

Ultimately, digital transformation means reimagining virtually every facet of what government does, from headquarters to the field, from health and human services to transportation and defense.

By now, some of you readers with long memories can’t be blamed for feeling a sense of déjà vu.

After all, technology was supposed to transform government 15 years ago; an “era of electronic government” was poised to make government faster, smaller, digitized and increasingly transparent.

Many analysts (including yours truly, in a book called “Government 2.0”) predicted that by 2016, digital government would already long be a reality. In practice, the “e-gov revolution” has been an exceedingly slow-moving one. Sure, technology has improved some processes, and scores of public services have moved online, but the public sector has hardly been transformed.

What initial e-gov efforts managed was to construct pretty storefronts—in the form of websites—as the entrance to government systems stubbornly built for the industrial age. Few fundamental changes altered the structures, systems and processes of government behind those websites.

With such halfhearted implementation, the promise of cost savings from information technology failed to materialize, instead disappearing into the black hole of individual agency and division budgets. Government websites mirrored departments’ short-term orientation rather than citizens’ long-term needs. In short, government became wired—but not transformed.

So why did the reality of e-gov fail to live up to the promise?

For one thing, we weren’t yet living in a digitized economy—our homes, cars and workplaces were still mostly analog—and the technology wasn’t as far along as we thought; without the innovations of cloud computing and open-source software, for instance, the process of upgrading giant, decades-old legacy systems proved costly, time-consuming and incredibly complex.

And not surprisingly, most governments—and private firms, for that matter—lacked deep expertise in managing digital services. What we now call “agile development”—an iterative development model that allows for constant evolution through recurrent testing and evaluation—was not yet mainstreamed.

Finally, most governments explicitly decided to focus first on the Hollywood storefront and postpone the bigger and tougher issues of reengineering underlying processes and systems. When budgets nosedived—even before the recession—staying solvent and providing basic services took precedence over digital transformation.

The result: Agencies automated some processes but failed to transform them; services were put online, but rarely were they focused logically and intelligently around the citizen.

Given this history, it’s natural to be skeptical after years of hype about government’s amazing digital future. But conditions on the ground (and in the cloud) are finally in place for change, and citizens are not only ready for digital government—many are demanding it.

Digital-native millennials are now consumers of public services, and millions of them work in and around government; they won’t tolerate balky and poorly designed systems, and they’ll let the world know through social media. Gen Xers and baby boomers, too, have become far more savvy consumers of digital products and services….(More)”

Soon Your City Will Know Everything About You


Currently, the biggest users of these sensor arrays are in cities, where city governments use them to collect large amounts of policy-relevant data. In Los Angeles, the crowdsourced traffic and navigation app Waze collects data that helps residents navigate the city’s choked highway networks. In Chicago, an ambitious program makes public data available to startups eager to build apps for residents. The city’s 49th ward has been experimenting with participatory budgeting and online voting to take the pulse of the community on policy issues. Chicago has also been developing the “Array of Things,” a network of sensors that track, among other things, the urban conditions that affect bronchitis.

Edmonton uses the cloud to track the condition of playground equipment. And a growing number of countries have purpose-built smart cities, like South Korea’s high-tech utopia city of Songdo, where pervasive sensor networks and ubiquitous computing generate immense amounts of civic data for public services.

The drive for smart cities isn’t restricted to the developed world. Rio de Janeiro coordinates the information flows of 30 different city agencies. In Beijing and Da Nang (Vietnam), mobile phone data is actively tracked in the name of real-time traffic management. Urban sensor networks, in other words, are also developing in countries with few legal protections governing the usage of data.

These services are promising and useful. But you don’t have to look far to see why the Internet of Things has serious privacy implications. Public data is used for “predictive policing” in at least 75 cities across the U.S., including New York City, where critics maintain that using social media or traffic data to help officers evaluate probable cause is a form of digital stop-and-frisk. In Los Angeles, the security firm Palantir scoops up publicly generated data on car movements, merges it with license plate information collected by the city’s traffic cameras, and sells analytics back to the city so that police officers can decide whether or not to search a car. In Chicago, concern is growing about discriminatory profiling because so much information is collected and managed by the police department — an agency with a poor reputation for handling data in consistent and sensitive ways. In 2015, video surveillance of the police shooting Laquan McDonald outside a Burger King was erased by a police employee who ironically did not know his activities were being digitally recorded by cameras inside the restaurant.

Since most national governments have bungled privacy policy, cities — which have a reputation for being better with administrative innovations — will need to fill this gap. A few countries, such as Canada and the U.K., have independent “privacy commissioners” who are responsible for advocating for the public when bureaucracies must decide how to use or give out data. It is pretty clear that cities need such advocates too.

What would Urban Privacy Commissioners do? They would teach the public — and other government staff — about how policy algorithms work. They would evaluate the political context in which city agencies make big data investments. They would help a city negotiate contracts that protect residents’ privacy while providing effective analysis to policy makers and ensuring that open data is consistently serving the public good….(more)”.

While governments talk about smart cities, it’s citizens who create them


Carlo Ratti at The Conversation: “The Australian government recently released an ambitious Smart Cities Plan, which suggests that cities should be first and foremost for people:

If our cities are to continue to meet their residents’ needs, it is essential for people to engage and participate in planning and policy decisions that have an impact on their lives.

Such statements are a good starting point – and should probably become central to Australia’s implementation efforts. A lot of knowledge has been collected over the past decade from successful and failed smart cities experiments all over the world; reflecting on them could provide useful information for the Australian government as it launches its national plan.

What is a smart city?

But, before embarking on such review, it would help to start from a definition of “smart city”.

The term has been used and abused in recent years, so much so that today it has lost meaning. It is often used to encompass disparate applications: we hear people talk and write about “smart city” when they refer to anything from citizen engagement to Zipcar, from open data to Airbnb, from smart biking to broadband.

Where to start with a definition? It is a truism to say the internet has transformed our lives over the past 20 years. Everything in the way we work, meet, mate and so on is very different today than it was just a few decades ago, thanks to a network of connectivity that now encompasses most people on the planet.

In a similar way, we are today at the beginning of a new technological revolution: the internet is entering physical space – the very space of our cities – and is becoming the Internet of Things; it is opening the door to a new world of applications that, as with the first wave of the internet, can incorporate many domains….

What should governments do?

In the above technological context, what should governments do? Over the past few years, the first wave of smart city applications followed technological excitement.

For instance, some of Korea’s early experiments such as Songdo City were engineered by the likes of Cisco, with technology deployment assisted by top-down policy directives.

In a similar way, in 2010, Rio de Janeiro launched the Integrated Centre of Command and Control, engineered by IBM. It’s a large control room for the city, which collects real-time information from cameras and myriad sensors suffused in the urban fabric.

Such approaches revealed many shortcomings, most notably the lack of civic engagement. It is as if they thought of the city simply as a “computer in open air”. These approaches led to several backlashes in the research and academic community.

A more interesting lesson can come from the US, where the focus is more on developing a rich Internet of Things innovation ecosystem. There are many initiatives fostering spaces – digital and physical – for people to come together and collaborate on urban and civic innovations….

That isn’t to say that governments should take a completely hands-off approach to urban development. Governments certainly have an important role to play. This includes supporting academic research and promoting applications in fields that might be less appealing to venture capital – unglamorous but nonetheless crucial domains such as municipal waste or water services.

The public sector can also promote the use of open platforms and standards in such projects, which would speed up adoption in cities worldwide.

Still, the overarching goal should always be to focus on citizens. They are in the best position to determine how to transform their cities and to make decisions that will have – as the Australian Smart Cities Plan puts it – “an impact on their lives”….(more)”

Private Data and the Public Good


Gideon Mann’s remarks on the occasion of the Robert Khan distinguished lecture at The City College of New York on 5/22/16 address a specific aspect of the broader need for computer science to engage with the real world. Right now, a key aspect of this relationship is being built around the risks and opportunities of the emerging role of data.

Ultimately, I believe that these relationships, between computer science and the real world, between data science and real problems, hold the promise to vastly increase our public welfare. And today, we, the people in this room, have a unique opportunity to debate and define a more moral data economy….

The hybrid research model proposes something different. The hybrid research model embeds, as it were, researchers as practitioners. The thought was always that you would be going about your regular run of business, would face a need to innovate to solve a crucial problem, and would do something novel. At that point, you might choose to work some extra time and publish a paper explaining your innovation. In practice, this model rarely works as expected. Tight deadlines mean the innovation that people do in their normal course of business is incremental.

This model separates research from scientific publication and shortens the time window of research to what can be realized within a few years. For me, this always felt like a tremendous loss with respect to the older, so-called “ivory tower” research model. It didn’t seem at all clear how this kind of model would produce the sea change of thought engendered by Shannon’s work, nor did it seem that Claude Shannon would ever want to work there. This kind of environment would never support a freestanding wonder like the robot mouse that Shannon worked on. Moreover, I always believed that publication and participation in the scientific community are crucial to research. Without this engagement, it feels like something different — innovation, perhaps.

It is clear that the monopolistic environment that enabled AT&T to support this ivory tower research doesn’t exist anymore.

Now, the hybrid research model was one model of research at Google, but there is another model as well: the moonshot model, as exemplified by Google X. Google X brought together focused research teams to drive research and development around a particular project — Google Glass and the self-driving car being two notable examples. Here the focus isn’t research but building a new product, with research as potentially a crucial blocking issue. Since the goal of Google X is directly to develop a new product, by definition they don’t publish papers along the way, but they’re not as tied to short-term deliverables as the rest of Google is. However, they are again decidedly un-Bell-Labs-like — a secretive, tightly focused, non-publishing group. DeepMind is a similarly constituted initiative — working, for example, on a best-in-the-world Go-playing algorithm, with publications happening sparingly.

Unfortunately, both of these approaches, the hybrid research model and the moonshot model, stack the deck toward a particular kind of research — research that leads to relatively short-term products that generate corporate revenue. While this kind of research is good for society, it isn’t the only kind of research that we need. We urgently need research that is long-term and that is undertaken even without a clear, local financial impact. In some sense this is a “tragedy of the commons,” where a shared public good (the commons) is not supported because everyone can benefit from it without giving back. Academic research is thus a non-rival, non-excludable good, and so will reasonably be underfunded. In certain cases, this takes on an ethical dimension — particularly in health care, where the choice of what diseases to study and address has a tremendous potential to affect human life. Should we research heart disease or malaria? This decision makes a huge impact on global human health, but is vastly informed by the potential profit from each of these various medicines….

Private Data means research is out of reach

The larger point that I want to make is that in the absence of places where long-term research can be done in industry, academia has a tremendous potential opportunity. Unfortunately, it is actually quite difficult to do the work that needs to be done in academia, since many of the resources needed to push the state of the art are only found in industry: in particular, data.

Of course, academia also lacks machine resources, but this is a simpler problem to fix — it’s a matter of money. Resources from the government could go toward enabling research groups to build their own data centers or to acquire computational resources from the market, e.g., Amazon. This is aided by the compute philanthropy that Google and Microsoft practice, granting compute cycles to academic organizations.

But the data problem is much harder to address. The data being collected and generated at private companies could enable amazing discoveries and research, but it is impossible for academics to access. The lack of access to private data from companies has effects far more significant than inhibiting research. In particular, the consumer-level data collected by social networks and internet companies could do much more than ad targeting.

Just for public health — suicide prevention, addiction counseling, mental health monitoring — there is enormous potential in the use of our online behavior to aid the most needy, and academia and non-profits are set-up to enable this work, while companies are not.

To give one example: anorexia and other eating disorders are vicious killers. 20 million women and 10 million men suffer from a clinically significant eating disorder at some time in their life, and sufferers of eating disorders have the highest mortality rate of any mental health disorder — with a jaw-dropping estimated mortality rate of 10%, both directly from injuries sustained through the disorder and from suicide resulting from it.

Eating disorders are distinctive in that sufferers often seek out confirmatory information: blogs, images and pictures that glorify and validate what sufferers see as “lifestyle” choices. Browsing behavior that seeks out images and guidance on how to starve yourself is a key indicator that someone is suffering. Tumblr, Pinterest and Instagram are places where people host and seek out this information. Tumblr has tried to help address this severe mental health issue by banning blogs that advocate for self-harm and by adding PSA announcements to searches for queries related to anorexia. But clearly this is not the be-all and end-all of work that could be done to detect and assist people at risk of dying from eating disorders. Moreover, this data could also help us understand the nature of those disorders themselves…
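
As a rough illustration of the PSA pattern described above, the sketch below checks whether a search query looks related to eating disorders and, if so, prepends a help resource to the results. The keyword list and message are placeholders, not Tumblr’s actual implementation.

```python
# Placeholder keyword list and helpline text, for illustration only.
ED_TERMS = {"anorexia", "thinspo", "pro-ana", "how to starve"}

PSA = ("If you or someone you know is struggling with an eating disorder, "
       "help is available from national helplines and local providers.")

def annotate_results(query, results):
    """Prepend a public-service announcement when the query matches."""
    q = query.lower()
    if any(term in q for term in ED_TERMS):
        return [{"type": "psa", "text": PSA}] + results
    return results

print(annotate_results("thinspo blogs",
                       [{"type": "result", "url": "https://example.com"}]))
```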

There is probably a role for a data ombudsman within private organizations — someone to protect the interests of the public’s data inside an organization, much like a “public editor” at a newspaper. Depending on how the role is set up, the ombudsman would protect and articulate the interests of the public, which probably means working both sides: making sure a company’s data is used for the public good where appropriate, making sure the public’s right to privacy is appropriately safeguarded, and probably making sure the public is informed when their data is compromised.

Next, we need a platform to enable collaboration around social good between companies, and between companies and academics. This platform would give trusted users access to a wide variety of data and speed the process of research.

Finally, I wonder if there is a way that government could support research sabbaticals inside of companies. Clearly, the opportunities for this research far outstrip what is currently being done…(more)”

Foundation Transparency: Game Over?


Brad Smith at Glass Pockets (Foundation Center): “The tranquil world of America’s foundations is about to be shaken, but if you read the Center for Effective Philanthropy’s (CEP) new study — Sharing What Matters, Foundation Transparency — you would never know it.

Don’t get me wrong. That study, like everything CEP produces, is carefully researched, insightful and thoroughly professional. But it misses the single biggest change in foundation transparency in decades: the imminent release by the Internal Revenue Service of foundation 990-PF (and 990) tax returns as machine-readable open data.

Clara Miller, President of the Heron Foundation, writes eloquently in her manifesto, Building a Foundation for the 21st Century: “…the private foundation model was designed to be protective and separate, much like a terrarium.”

Terrariums, of course, are highly “curated” environments over which their creators have complete control. The CEP study proves that point: much of it consists of interviews with foundation leaders and reviews of their websites, as if transparency were a kind of optional endeavor in which foundations may choose whether, and to what degree, to participate.

To be fair, CEP also interviewed the grantees of various foundations (sometimes referred to as “partners”), which helps convey the reality that foundations have stakeholders beyond their four walls. However, the terrarium metaphor is about to become far more relevant as the release of 990 tax returns as open data will literally make it possible for anyone to look right through those glass walls to the curated foundation world within.

What Is Open Data?

It is safe to say that most foundation leaders and a fair majority of their staff do not understand what open data really is. Open data is free, yes, but more importantly it is digital and machine-readable. This means it can be consumed in enormous volumes at lightning speed, directly by computers.

Once consumed, open data can be tagged, sorted, indexed and searched using statistical methods to make obvious comparisons while discovering previously undetected correlations. Anyone with a computer, some coding skills and a hard drive or cloud storage can access open data. In today’s world, a lot of people meet those requirements, and they are free to do whatever they please with your information once it is, as open data enthusiasts like to say, “in the wild.”

What is the Internal Revenue Service Releasing?

Thanks to the Aspen Institute’s leadership of a joint effort – funded by foundations and including Foundation Center, GuideStar, the National Center for Charitable Statistics, the Johns Hopkins Center for Civil Society Studies, and others – the IRS has started to make some 1,000,000 Form 990s and 40,000 Form 990-PFs available as machine-readable open data.

Previously, all Form 990s had been released as image (TIFF) files, essentially a picture, making it both time-consuming and expensive to extract useful data from them. Credit where credit is due: a kick in the butt in the form of a lawsuit from open data crusader Carl Malamud helped speed the process along.

The current test phase includes only those tax returns that were digitally filed by nonprofits and community foundations (990s) and private foundations (990-PFs). Over time, the IRS will phase in a mandatory digital filing requirement for all Form 990s, and the intent is to release them all as open data. In other words, that which is born digital will be opened up to the public in digital form. Because of variations in the 990 forms, getting the information from them into a database will still require some technical expertise, but will be far more feasible and faster than ever before.
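
For a sense of what “machine-readable” buys you, here is a minimal Python sketch of pulling a few fields out of an e-filed return once it is available as XML. The element names are assumptions for illustration; the actual IRS e-file schemas vary by form type and tax year, so a production parser would need per-year field mappings.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip any XML namespace so elements can be matched by local name."""
    return tag.rsplit("}", 1)[-1]

def extract_fields(path, wanted=("EIN", "BusinessNameLine1Txt", "TaxYr")):
    """Pull a few simple fields out of one e-filed return.

    The element names in `wanted` are illustrative assumptions; real IRS
    e-file schemas differ across form types and tax years.
    """
    found = {}
    for _, elem in ET.iterparse(path):
        name = local_name(elem.tag)
        if name in wanted and elem.text and name not in found:
            found[name] = elem.text.strip()
    return found

# Usage sketch (assumes a local directory of downloaded XML filings):
# import glob
# rows = [extract_fields(f) for f in glob.glob("filings/*.xml")]
```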

The Good

The work of organizations like Foundation Center, which have built expensive infrastructure to turn years of 990 tax returns into information that can be used by nonprofits looking for funding, by researchers trying to understand the role of foundations, and by foundations themselves seeking to benchmark against their peers, will be transformed.

Work will shift away from the mechanics of capturing and processing the data to higher-level analysis and visualization to stimulate the generation and sharing of new insights and knowledge. This will fuel greater collaboration between peer organizations, innovation, the merging of previously disparate bodies of data, better philanthropy, and a stronger social sector… (more)


How Open Data Is Creating New Opportunities in the Public Sector


Martin Yan at GovTech: “Increased availability of open data in turn increases the ease with which citizens and their governments can collaborate, as well as equipping citizens to be active in identifying and addressing issues themselves. Technology developers are able to explore innovative uses of open data in combination with digital tools, new apps or other products that can tackle recognized inefficiencies. Currently, both the public and private sectors are teeming with such apps and projects….

Open data has proven to be a catalyst for the creation of new tools across industries and public-sector uses. Examples of a few successful projects include:

  • Citymapper — The popular real-time public transport app uses open data from Apple, Google, Cyclestreets, OpenStreetMaps and more sources to help citizens navigate cities. Features include A-to-B trip planning with ETA, real-time departures, bike routing, transit maps, public transport line status, real-time disruption alerts and integration with Uber.
  • Dataverse Project — This project from Harvard’s Institute for Quantitative Social Science makes it easy to share, explore and analyze research data. By simplifying access to this data, the project allows researchers to replicate others’ work to the benefit of all.
  • Liveplasma — An interactive search engine, Liveplasma lets users listen to music and view a web-like visualization of similar songs and artists, seeing how they are related and enabling discovery. Content from YouTube is streamed into the data visualizations.
  • Provenance — The England-based online platform lets users trace the origin and history of a product, also providing its manufacturing information. The mission is to encourage transparency in the practices of the corporations that produce the products we all use.

These examples demonstrate open data’s reach, value and impact well beyond the public sector. As open data continues to be put to wider use, the results will not be limited to increased efficiency and reduced wasteful spending in government, but will also create economic growth and jobs due to the products and services using the information as a foundation.

However, in the end, it won’t be the data alone that solves issues. Rather, it will be dependent on individual citizens, developers and organizations to see the possibilities, take up the call to arms and use this available data to introduce changes that make our world better….(More)”

Legal Aid With a Digital Twist


Tina Rosenberg in the New York Times: “Matthew Stubenberg was a law student at the University of Maryland in 2010 when he spent part of a day doing expungements. It was a standard law school clinic where students learn by helping clients — in this case, he helped them to fill out and file petitions to erase parts of their criminal records. (Last week I wrote about the lifelong effects of these records, even if there is no conviction, and the expungement process that makes them go away.)

Although Maryland has a public database called Case Search, using that data to fill out the forms was tedious. “We spent all this time moving data from Case Search onto our forms,” Stubenberg said. “We spent maybe 30 seconds on the legal piece. Why could this not be easier? This was a problem that could be fixed by a computer.”

Stubenberg knew how to code. After law school, he set out to build software that automatically did that tedious work. By September 2014 he had a prototype for MDExpungement, which went live in January 2015. (The website is not pretty — Stubenberg is a programmer, not a designer.)

With MDExpungement, entering a case number brings it up on Case Search. The software then determines whether the case is expungeable. If so, the program automatically transfers the information from Case Search to the expungement form. All that’s left is to print, sign and file it with the court.
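
A minimal sketch of that workflow, with invented field names and a deliberately simplified eligibility test (the real Maryland rules, and MDExpungement’s actual logic, are considerably more involved), might look like this:

```python
# Illustrative only: a toy eligibility check keyed on the case disposition,
# followed by mapping parsed case fields onto the fields of a petition form.
ELIGIBLE_DISPOSITIONS = {"acquitted", "dismissed", "nolle prosequi",
                         "probation before judgment"}

def is_expungeable(case):
    """Simplified stand-in for the statutory eligibility rules."""
    return case["disposition"].lower() in ELIGIBLE_DISPOSITIONS

def fill_petition(case):
    """Transfer case data into a dict of (hypothetical) petition form fields."""
    if not is_expungeable(case):
        return None
    return {
        "petitioner_name": case["defendant_name"],
        "case_number": case["case_number"],
        "court": case["court"],
        "disposition": case["disposition"],
        "disposition_date": case["disposition_date"],
    }

example = {
    "case_number": "0B02345678",
    "defendant_name": "Jane Doe",
    "court": "District Court for Baltimore City",
    "disposition": "Nolle Prosequi",
    "disposition_date": "2014-06-01",
}
print(fill_petition(example))
```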

In October 2015 a change in Maryland law made more cases eligible for expungement. Between then and March 2016, people filed 7,600 petitions in Baltimore City District Court to have criminal records removed. More than two-thirds of them came from MDExpungement.

“With the ever-increasing amount of expungements we’re all doing, the app has just made it a lot easier,” said Mary-Denise Davis, a public defender in Baltimore. “I put in a case number and it fills the form out for me. Like magic.”

The rise of online legal forms may not be a gripping subject, but it matters. Tens of millions of Americans need legal help for civil problems — they need a divorce, child support or visitation, protection from abuse or a stay of eviction. They must hold off debt collectors or foreclosure, or get government benefits….(more)

Transparency reports make AI decision-making accountable


Phys.org: “Machine-learning algorithms increasingly make decisions about credit, medical diagnoses, personalized recommendations, advertising and job opportunities, among other things, but exactly how usually remains a mystery. Now, new measurement methods developed by Carnegie Mellon University researchers could provide important insights into this process.

 Was it a person’s age, gender or education level that had the most influence on a decision? Was it a particular combination of factors? CMU’s Quantitative Input Influence (QII) measures can provide the relative weight of each factor in the final decision, said Anupam Datta, associate professor of computer science and electrical and computer engineering.

“Demands for algorithmic transparency are increasing as the use of algorithmic decision-making systems grows and as people realize the potential of these systems to introduce or perpetuate racial or sex discrimination or other social harms,” Datta said.

“Some companies are already beginning to provide transparency reports, but work on the computational foundations for these reports has been limited,” he continued. “Our goal was to develop measures of the degree of influence of each factor considered by a system, which could be used to generate transparency reports.”
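
As a rough illustration of the underlying idea, the sketch below estimates a single feature’s influence by checking how often a toy model’s decision flips when just that feature is resampled from the data. This is a simplified stand-in, not the QII measure itself, which formalizes such interventions and aggregates them over sets of inputs.

```python
import random

def influence(model, rows, feature, trials=200):
    """Crude single-feature influence: fraction of trials in which the
    decision flips when only `feature` is randomly resampled from the data."""
    flips = 0
    for _ in range(trials):
        row = dict(random.choice(rows))      # copy one record
        original = model(row)
        row[feature] = random.choice(rows)[feature]  # intervene on one input
        flips += int(model(row) != original)
    return flips / trials

# Toy decision rule and synthetic data, purely for illustration.
def toy_model(row):
    return row["income"] > 50000 and row["age"] >= 25

rows = [{"age": random.randint(18, 70),
         "income": random.randint(20000, 120000),
         "gender": random.choice(["f", "m"])} for _ in range(500)]

for feat in ["age", "income", "gender"]:
    print(feat, influence(toy_model, rows, feat))
```

In this toy example the “gender” feature never affects the decision rule, so its measured influence is zero, which is exactly the kind of evidence a transparency report would surface.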

These reports might be generated in response to a particular incident—why an individual’s loan application was rejected, or why police targeted an individual for scrutiny or what prompted a particular medical diagnosis or treatment. Or they might be used proactively by an organization to see if an artificial intelligence system is working as desired, or by a regulatory agency to see whether a decision-making system inappropriately discriminated between groups of people….(More)”