
Stefaan Verhulst

Keynote by Robert M. Goerge at the 2016 Third International Conference on eDemocracy & eGovernment (ICEDEG): “Open data portals are springing up around the world. Municipalities, states, and countries have made available data that has never been as accessible to the general public. These data have led to many applications that inform the public of new urban conditions or provide information to make urban life easier. However, it should be clear that these data have limitations for solving many urban problems, because in many cases they do not provide all of the information that government and NGOs need to get at the causes, or at least the correlates, of the problem at hand. It is still necessary to have access to data that cannot be made public to address some of the most serious urban problems. And while this might seem to apply only to public access, government employees and others with legitimate need for the relevant non-open data often lack access as well, because of legal, organizational, privacy, or bureaucratic issues. This limits the promise of increasingly data-driven efforts to address the most critical urban issues. Solutions to these problems in the context of ethical behavior will be discussed….(More)”

The promises and pitfalls of open urban data

Christopher Mims at the Wall Street Journal: “When Kaspar Korjus was born, he was given a number before he was given a name, as are all babies in Estonia. “My name is 38712012796, which I got before my name of Kaspar,” says Mr. Korjus.

In Estonia, much of life—voting, digital signatures, prescriptions, taxes, bank transactions—is conducted with this number. The resulting services aren’t just more convenient, they are demonstrably better. It takes an Estonian three minutes to file his or her taxes.

Americans are unlikely to accept a unified national ID system. But Estonia offers an example of the kind of innovation possible around government services, a competitive factor for modern nations.

The former Soviet republic—with a population of 1.3 million, roughly the size of San Diego—is regularly cited as a world leader in e-governance. At base, e-governance is about making government function as well as private enterprise, mostly by adopting the same information-technology infrastructure and management techniques as the world’s most technologically savvy corporations.

It isn’t that Estonia devotes more people to the problem—it took only 60 to build the identity system. It is that the country’s leaders are willing to empower those engineers. “There is a need for politicians to not only show leadership but also there is a need to take risks,” says Estonia’s prime minister, Taavi Rõivas.

In the U.S., Matt Lira, senior adviser for House Majority Leader Kevin McCarthy, says the gap between the government’s information technology and the private sector’s has grown larger than ever. Americans want to access government services—paying property taxes or renewing a driver’s license—as easily as they look up a restaurant on Yelp or a business on Alphabet’s Google, says Neil Kleiman, a professor of policy at New York University who collaborates with cities in this subject area.

The government is unlikely to catch up soon. The Government Accountability Office last year estimated that about 25% of the federal government’s 738 major IT investments—projected to cost a total of $42 billion—were in danger of significant delays or cost overruns.

One reason for such overruns is the government’s reliance on big, monolithic projects based on proposal documents that can run to hundreds of pages. It is an approach to software development that is at least 20 years out of date. Modern development emphasizes small chunks of code accomplished in sprints and delivered to end users quickly so that problems can be identified and corrected.

Two years ago, the Obama administration devised a novel way to address these issues: assembling a crack team of coders and project managers from the likes of Google, Amazon.com and Microsoft and assigning them to big government boondoggles to help existing IT staff run more like the private sector. Known as 18F, this organization and its sister group, the U.S. Digital Service, are set to hit 500 staffers by the end of 2016….(More)”

Yelp, Google Hold Pointers to Fix Governments

Pasi Sahlberg and Jonathan Hasak in the Washington Post: “One thing that distinguishes schools in the United States from schools around the world is how data walls, which typically reflect standardized test results, decorate hallways and teacher lounges. Green, yellow, and red colors indicate levels of performance of students and classrooms. For serious reformers, this is the kind of transparency that reveals more data about schools and is seen as part of the solution to effective school improvement. These data sets, however, often don’t spark insight about teaching and learning in classrooms; they are based on analytics and statistics, not on the emotions and relationships that drive learning in schools. They also report outputs and outcomes, not the impacts of learning on the lives and minds of learners….

If you are a leader of any modern education system, you probably care a lot about collecting, analyzing, storing, and communicating massive amounts of information about your schools, teachers, and students based on these data sets. This information is “big data,” a term that first appeared around 2000, which refers to data sets so large and complex that processing them with conventional data processing applications isn’t possible. Two decades ago, the types of data education management systems processed were input factors of the education system, such as student enrollments, teacher characteristics, or education expenditures, handled by the education department’s statistical officer. Today, however, big data covers a range of indicators about teaching and learning processes, and increasingly reports on student achievement trends over time.

With the outpouring of data, international organizations continue to build regional and global data banks. Whether it’s the United Nations, the World Bank, the European Commission, or the Organization for Economic Cooperation and Development, today’s international reformers are collecting and handling more data about human development than before. Beyond government agencies, there are global education and consulting enterprises like Pearson and McKinsey that see business opportunities in big data markets.

Among the best known today is the OECD’s Program for International Student Assessment (PISA), which measures reading, mathematical, and scientific literacy of 15-year-olds around the world. OECD now also administers an Education GPS, or a global positioning system, that aims to tell policymakers where their education systems place in a global grid and how to move to desired destinations. OECD has clearly become a world leader in the big data movement in education.

Despite all this new information and the benefits that come with it, there are clear handicaps in how big data has been used in education reforms. In fact, pundits and policymakers often forget that big data, at best, only reveals correlations between variables in education, not causality. As any introduction to statistics course will tell you, correlation does not imply causation….
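(Editor's note: the correlation-versus-causation point can be made concrete in a few lines of code. The sketch below uses entirely synthetic numbers in which a hidden confounder — overall school funding — drives both library spending and test scores, so the two correlate strongly even though neither causes the other; all variable names and values are invented for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden confounder: overall school funding (synthetic, arbitrary units).
funding = rng.normal(100, 15, size=500)

# Both quantities are driven by funding, plus independent noise.
library_spending = 0.3 * funding + rng.normal(0, 2, size=500)
test_scores = 0.5 * funding + rng.normal(0, 5, size=500)

# The two correlate strongly...
r = np.corrcoef(library_spending, test_scores)[0, 1]
print(f"correlation(library spending, test scores) = {r:.2f}")

# ...but the association largely disappears once funding is held roughly
# constant, e.g. by looking only at schools with near-average funding.
near_average = np.abs(funding - 100) < 2
r_controlled = np.corrcoef(library_spending[near_average], test_scores[near_average])[0, 1]
print(f"correlation within similar-funding schools = {r_controlled:.2f}")
```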
We believe that it is becoming evident that big data alone won’t be able to fix education systems. Decision-makers need to gain a better understanding of what good teaching is and how it leads to better learning in schools. This is where information about details, relationships and narratives in schools become important. These are what Martin Lindstrom calls “small data”: small clues that uncover huge trends. In education, these small clues are often hidden in the invisible fabric of schools. Understanding this fabric must become a priority for improving education.

To be sure, there is not one right way to gather small data in education. Perhaps the most important next step is to realize the limitations of current big-data-driven policies and practices. Relying too heavily on externally collected data may be misleading in policy-making. Here are examples of what small data looks like in practice:

  • It reduces census-based national student assessments to the necessary minimum and transfers the saved resources to enhancing the quality of formative assessments in schools and to teacher education in alternative assessment methods. Evidence shows that formative and other school-based assessments are much more likely to improve the quality of education than conventional standardized tests.
  • It strengthens the collective autonomy of schools by giving teachers more independence from bureaucracy and by investing in teamwork in schools. This would enhance social capital, which has proved to be a critical aspect of building trust within education and enhancing student learning.
  • It empowers students by involving them in assessing and reflecting on their own learning and then incorporating that information into collective human judgment about teaching and learning (supported by national big data). Because there are different ways students can be smart in schools, no single way of measuring student achievement will reveal success. Students’ voices about their own growth may be those tiny clues that can uncover important trends for improving learning.

W. Edwards Deming once said that “without data you are another person with an opinion.” But Deming couldn’t have imagined the size and speed of data systems we have today….(More)”

‘Big data’ was supposed to fix education. It didn’t. It’s time for ‘small data’

Sean Captain at FastCompany: “These days GPS technology can get you as close as about 10 feet from your destination, close enough to see it—assuming you can see.

But those last few feet are a chasm for the blind (and GPS accuracy sometimes falls only within about 30 feet).

“Actually finding the bus stop, not the right street, but standing in the right place when the bus comes, is pretty hard,” says Dave Power, president and CEO of the Perkins School for the Blind near Boston. Helen Keller’s alma mater is developing a mobile app that will provide audio directions—contributed by volunteers—so that blind people can get close enough to the stop for the bus driver to notice them.

Perkins’s app is one of 29 projects that recently received a total of $20 million in funding from Google.org’s Google Impact Challenge: Disabilities awards. Several of the winning initiatives rely on crowdsourced information to help the disabled—be they blind, in a wheelchair, or cognitively impaired. It’s a commonsense approach to tackling big logistical projects in a world full of people who have snippets of downtime during which they might perform bite-size acts of kindness online. But moving these projects from being just clever concepts to extensive services, based on the goodwill of volunteers, is going to be quite a hurdle.

People with limited mobility may have trouble traversing the last few feet between them and a wheelchair ramp, automatic doors, or other accommodations that aren’t easy to find (or may not even exist in some places). Wheelmap, based in Berlin, is trying to help by building online maps of accessible locations. Its website incorporates crowdsourced data. The site lets users type in a city and search for accessible amenities such as restaurants, hotels, and public transit.

Paris-based J’accede (which received 500,000 euros, the equivalent of about $565,000, from Google) provides similar capabilities in both a website and an app, with a slicker design somewhat resembling TripAdvisor.

Both services have a long way to go. J’accede lists 374 accessible bars/restaurants in its hometown and a modest selection in other French cities like Marseille. “We still have a lot of work to do to cover France,” says J’accede’s president Damien Birambeau in an email. The goal is to go global though, and the site is available in English, German, and Spanish, in addition to French. Likewise, Wheelmap (which got 825,000 euros, or $933,000) performs best in the German capital of Berlin and cities like Hamburg, but is less useful in other places.

These sites face the same challenge as many other volunteer-based, crowdsourced projects: getting a big enough crowd to contribute information to the service. J’accede hopes to make the process easier. In June, it will connect itself with Google Places, so contributors will only need to supply details about accommodations at a site; information like the location’s address and phone number will be pulled in automatically. But both J’accede and Wheelmap recognize that crowdsourcing has its limits. They are now going beyond voluntary contributions, setting up automated systems to scrape information from other databases of accessible locations, such as those maintained by governments.

Wheelmap and J’accede are dwarfed by general-interest crowdsourced sites like TripAdvisor and Yelp, which offer some information about accessibility, too. For instance, among the many filters they offer users searching for restaurants—such as price range and cuisine type—TripAdvisor and Yelp both offer a Wheelchair Accessible checkbox. Applying that filter to Parisian establishments brings up about 1,000 restaurants on TripAdvisor and 2,800 in Yelp.

So what can Wheelmap and J’accede provide that the big players can’t? Details. “A person in a wheelchair, for example, will face different obstacles than a partially blind person or a person with cognitive disabilities,” says Birambeau. “These different needs and profiles means that we need highly detailed information about the accessibility of public places.”…(More)”

Can Crowdsourcing Help Make Life Easier For People With Disabilities?

Pradip Sigdyal at CNBC: “…The often-cited case of big data discrimination points to research conducted a few years ago by Latanya Sweeney, who heads the Data Privacy Lab at Harvard University.

The case involves Google ad results when searching for certain kinds of names on the internet. In her research, Sweeney found that distinct-sounding names often associated with black people showed up with a disproportionately higher number of arrest-record ads than white-sounding names—by roughly 18 percent. Google has since fixed the issue, although it never publicly stated what it did to correct the problem.
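(Editor's note: for readers wondering how such a disparity is typically quantified, the sketch below compares ad-display rates for two groups of names using invented counts. It is a generic rate comparison, not Sweeney's data or methodology.)

```python
from collections import Counter

# Hypothetical search log of (name_group, arrest_ad_shown) pairs.
# Group labels, counts, and outcomes are invented for illustration only.
search_log = (
    [("group_a", True)] * 60 + [("group_a", False)] * 40 +
    [("group_b", True)] * 48 + [("group_b", False)] * 52
)

shown, total = Counter(), Counter()
for group, ad_shown in search_log:
    total[group] += 1
    shown[group] += int(ad_shown)

rates = {group: shown[group] / total[group] for group in total}
print(rates)  # e.g. {'group_a': 0.6, 'group_b': 0.48}
print(f"gap: {rates['group_a'] - rates['group_b']:.0%}")  # gap: 12%
```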

The proliferation of big data in the last few years has seen other allegations of improper use and bias. These allegations run the gamut, from online price discrimination and the consequences of geographic targeting to the controversial use of crime-predicting technology by law enforcement, and the lack of sufficiently representative [data] samples used in some public works decisions.

The benefits of big data need to be balanced with the risks associated with applying modern technologies to address societal issues. Yet data advocates believe that the democratization of data has in essence given power to the people to effect change by transferring ‘tribal knowledge’ from experts to data-savvy practitioners.

Big data is here to stay

According to some advocates, the problem is not so much that ‘big data discriminates’, but that data professionals risk misinterpreting the findings at the heart of data mining and statistical learning. They add that the benefits far outweigh the concerns.

“In my academic research and industry consulting, I have seen tremendous benefits accruing to firms, organizations and consumers alike from the use of data-driven decision-making, data science, and business analytics,” Anindya Ghose, the director of the Center for Business Analytics at New York University’s Stern School of Business, said.

“To be perfectly honest, I do not at all understand these big-data cynics who engage in fear mongering about the implications of data analytics,” Ghose said.

“Here is my message to the cynics and those who keep cautioning us: ‘Deal with it, big data analytics is here to stay forever’.”…(More)”

Critics allege big data can be discriminatory, but is it really bias?

Clayton A. Davis et al. in a PeerJ preprint: “The study of social phenomena is becoming increasingly reliant on big data from online social networks. Broad access to social media data, however, requires software development skills that not all researchers possess. Here we present the IUNI Observatory on Social Media, an open analytics platform designed to facilitate computational social science. The system leverages a historical, ongoing collection of over 70 billion public messages from Twitter. We illustrate a number of interactive open-source tools to retrieve, visualize, and analyze derived data from this collection. The Observatory, now available at osome.iuni.iu.edu, is the result of a large, six-year collaborative effort coordinated by the Indiana University Network Science Institute.”…(More)”

OSoMe: The IUNI observatory on social media

César A. Hidalgo at Scientific American: “Imagine shopping in a supermarket where every item is stored in boxes that look exactly the same. Some are filled with cereal, others with apples, and others with shampoo. Shopping would be an absolute nightmare! The design of most open data sites—the (usually government) sites that distribute census, economic and other data to be used and redistributed freely—is not exactly equivalent to this nightmarish supermarket. But it’s pretty close.

During the last decade, such sites—data.gov, data.gov.uk, data.gob.cl, data.gouv.fr, and many others—have been created throughout the world. Most of them, however, still deliver data as sets of links to tables, or links to other sites that are also hard to comprehend. In the best cases, data is delivered through APIs, or application program interfaces, which are simple data query languages that require a user to have a basic knowledge of programming. So understanding what is inside each dataset requires downloading, opening, and exploring the set in ways that are extremely taxing for users. The analogy of the nightmarish supermarket is not that far off.
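(Editor's note: to make "delivered through APIs" concrete, here is a minimal sketch of querying a CKAN-style catalog such as data.gov from Python. The package_search action is part of CKAN's documented API, but treat the exact base URL, parameters, and response fields as assumptions to verify against the portal you actually use.)

```python
import requests

# CKAN-based catalogs (data.gov among them) expose a package_search action;
# verify the base URL and parameters against the catalog's own API docs.
BASE_URL = "https://catalog.data.gov/api/3/action/package_search"

response = requests.get(
    BASE_URL,
    params={"q": "median household income", "rows": 5},
    timeout=30,
)
response.raise_for_status()
datasets = response.json()["result"]["results"]

for dataset in datasets:
    # Even after finding a promising dataset, the user still has to download
    # and parse whatever file format each resource happens to be in.
    print(dataset["title"])
    for resource in dataset.get("resources", []):
        print("   ", resource.get("format"), resource.get("url"))
```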

[Image: the U.S. government’s open data site]

The consensus among those who have participated in the creation of open data sites is that current efforts have failed and we need new options. Pointing your browser to these sites should show you why. Most open data sites are badly designed, and here I am not talking about their aesthetics—which are also subpar—but about the conceptual model used to organize and deliver data to users. The design of most open data sites follows a throwing-spaghetti-against-the-wall strategy, where opening more data, instead of opening data better, has been the driving force.

Some of the design flaws of current open data sites are pretty obvious. The datasets that are more important, or could potentially be more useful, are not brought to the surface of these sites or properly organized. In our supermarket analogy, not only do all the boxes look the same, they are also sorted in the order they arrived. This cannot be the best we can do.

There are other design problems that are important, even though they are less obvious. The first one is that most sites deliver data in the way it is collected, rather than in the way it is used. People are often looking for data about a particular place, occupation, or industry, or about an indicator (such as income, or population). Whether the data they need comes from the national survey of X or the bureau of Y is secondary and often—although not always—irrelevant to the user. Yet, even though this is not the way we should be giving data back to users, this is often what open data sites do.

The second non-obvious design problem, which is probably the most important, is that most open data sites bury data in what is known as the deep web. The deep web is the fraction of the Internet that is not accessible to search engines, or that cannot be indexed properly. The surface of the web is made of text, pictures, and video, which search engines know how to index. But search engines are not good at knowing that the number that you are searching for is hidden in row 17,354 of a comma separated file that is inside a zip file linked in a poorly described page of an open data site. In some cases, pressing a radio button and selecting options from a number of dropdown menus can get you the desired number, but this does not help search engines either, because crawlers cannot explore dropdown menus. To make open data really open, we need to make it searchable, and for that we need to bring data to the surface of the web.
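(Editor's note: the "row 17,354 of a comma separated file inside a zip file" problem can also be shown in code. The sketch below walks through the steps — download, unzip, parse, scan — that neither a search-engine crawler nor a casual user is likely to perform; the URL, file name, and column names are placeholders, not a real portal.)

```python
import csv
import io
import zipfile

import requests

# Placeholder URL and file name standing in for a real open data download.
ZIP_URL = "https://example.org/opendata/income_tables.zip"
CSV_NAME = "table_b19013.csv"

payload = requests.get(ZIP_URL, timeout=60)
payload.raise_for_status()

with zipfile.ZipFile(io.BytesIO(payload.content)) as archive:
    with archive.open(CSV_NAME) as raw:
        reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8"))
        # The one number the user actually wanted is buried in a single row.
        for row in reader:
            if row.get("county") == "Example County":
                print(row.get("median_income"))
                break
```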

So how do we do that? The solution may not be simple, but it starts by taking design seriously. This is something that I’ve been doing for more than half a decade when creating data visualization engines at MIT. The latest iteration of our design principles is now embodied in DataUSA, a site we created in a collaboration between Deloitte, Datawheel, and my group at MIT.

So what is design, and how do we use it to improve open data sites? My definition of design is simple. Design is discovering the forms that best fulfill a function….(More)”

What’s Wrong with Open-Data Sites–and How We Can Fix Them

Dara M. Wald, Justin Longo, and A. R. Dobell at Conservation Biology: “Citizen science initiatives encourage volunteer participants to collect and interpret data and contribute to formal scientific projects. The growth of virtual citizen science (VCS), facilitated through websites and mobile applications since the mid-2000s, has been driven by a combination of software innovations and mobile technologies, growing scientific data flows without commensurate increases in resources to handle them, and the desire of internet-connected participants to contribute to collective outputs. However, the increasing availability of internet-based activities requires individual VCS projects to compete for the attention of volunteers and promote their long-term retention. We examined program and platform design principles that might allow VCS initiatives to compete more effectively for volunteers, increase productivity of project participants, and retain contributors over time. We surveyed key personnel engaged in managing a sample of VCS projects to identify the principles and practices they pursued for these purposes and led a team in a heuristic evaluation of volunteer engagement, website or application usability, and participant retention. We received 40 completed survey responses (33% response rate) and completed a heuristic evaluation of 20 VCS program sites. The majority of the VCS programs focused on scientific outcomes, whereas the educational and social benefits of program participation, variables that are consistently ranked as important for volunteer engagement and retention, were incidental. Evaluators indicated usability, across most of the VCS program sites, was higher and less variable than the ratings for participant engagement and retention. In the context of growing competition for the attention of internet volunteers, increased attention to the motivations of virtual citizen scientists may help VCS programs sustain the necessary engagement and retention of their volunteers….(More)”

Design principles for engaging and retaining virtual citizen scientists

Sarah Telford and Stefaan G. Verhulst at Understanding Risk Forum: “….In creating the policy, OCHA partnered with the NYU Governance Lab (GovLab) and Leiden University to understand the policy and privacy landscape, best practices of partner organizations, and how to assess the data it manages in terms of potential harm to people.

We seek to share our findings with the UR community to get feedback and start a conversation around the risks of using certain types of data in humanitarian and development efforts and in understanding risk.

What is High-Risk Data?

High-risk data is generally understood as data that includes attributes about individuals. This is commonly referred to as PII or personally identifiable information. Data can also create risk when it identifies communities or demographics within a group and ties them to a place (i.e., women of a certain age group in a specific location). The risk comes when this type of data is collected and shared without proper authorization from the individual or the organization acting as the data steward; or when the data is being used for purposes other than what was initially stated during collection.
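(Editor's note: one way to make that definition operational is a small-cell check: before sharing a dataset, look for combinations of quasi-identifiers — such as age group, gender, and location — that describe very few people. The sketch below is a generic illustration with assumed column names and threshold; it is not part of OCHA's policy.)

```python
from collections import Counter

# Quasi-identifiers: columns that, in combination, can single out a small group.
# Column names, records, and the threshold are assumptions for illustration.
QUASI_IDENTIFIERS = ("age_group", "gender", "district")
MIN_GROUP_SIZE = 10  # groups smaller than this are treated as high risk

records = [
    {"age_group": "18-25", "gender": "F", "district": "North", "aid_received": 120},
    {"age_group": "18-25", "gender": "F", "district": "North", "aid_received": 80},
    {"age_group": "26-40", "gender": "M", "district": "South", "aid_received": 60},
    # ...a real dataset would have many more rows
]

group_sizes = Counter(
    tuple(row[column] for column in QUASI_IDENTIFIERS) for row in records
)

for group, size in group_sizes.items():
    if size < MIN_GROUP_SIZE:
        print(f"high-risk cell {group}: only {size} record(s)")
```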

The potential harms of inappropriately collecting, storing or sharing personal data can affect individuals and communities that may feel exploited or vulnerable as the result of how data is used. This became apparent during the Ebola outbreak of 2014, when a number of data projects were implemented without appropriate risk management measures. One notable example was the collection and use of aggregated call data records (CDRs) to monitor the spread of Ebola, which not only had limited success in controlling the virus, but also compromised the personal information of those in Ebola-affected countries. (See Ebola: A Big Data Disaster).

A Data-Risk Framework

Regardless of an organization’s data requirements, it is useful to think through the potential risks and harms for its collection, storage and use. Together with the Harvard Humanitarian Initiative, we have set up a four-step data risk process that includes doing an assessment and inventory, understanding risks and harms, and taking measures to counter them.

  1. Assessment – The first step is to understand the context within which the data is being generated and shared. The key questions to ask include: What is the anticipated benefit of using the data? Who has access to the data? What constitutes the actionable information for a potential perpetrator? What could set off the threat to the data being used inappropriately?
  2. Data Inventory – The second step is to take inventory of the data and how it is being stored. Key questions include: Where is the data – is it stored locally or hosted by a third party? Where could the data be housed later? Who might gain access to the data in the future? How will we know – is data access being monitored?
  3. Risks and Harms – The next step is to identify potential ways in which risk might materialize. Thinking through various risk-producing scenarios will help prepare staff for incidents. Examples of risks include: your organization’s data being correlated with other data sources to expose individuals; your organization’s raw data being publicly released; and/or your organization’s data system being maliciously breached.
  4. Counter-Measures – The final step is to determine what measures would prevent risk from materializing. Methods and tools include developing data handling policies, implementing access controls to the data, and training staff on how to use data responsibly….(More)
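(Editor's note: teams that want to make the four steps above repeatable sometimes encode them as a structured checklist. The sketch below is one hypothetical way to do that in code; the field names are ours and are not part of OCHA's or the Harvard Humanitarian Initiative's framework.)

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataRiskReview:
    """Lightweight record of the four-step review for a single dataset."""
    dataset_name: str
    # 1. Assessment: context, anticipated benefit, who has access.
    anticipated_benefit: str = ""
    who_has_access: List[str] = field(default_factory=list)
    # 2. Data inventory: where the data lives and whether access is monitored.
    storage_location: str = ""
    access_monitored: bool = False
    # 3. Risks and harms: scenarios in which risk might materialize.
    risk_scenarios: List[str] = field(default_factory=list)
    # 4. Counter-measures: policies, access controls, training.
    counter_measures: List[str] = field(default_factory=list)

    def outstanding_gaps(self) -> List[str]:
        """Flag steps that have not been completed yet."""
        gaps = []
        if not self.anticipated_benefit:
            gaps.append("assessment incomplete")
        if not self.storage_location:
            gaps.append("inventory incomplete")
        if not self.risk_scenarios:
            gaps.append("no risk scenarios listed")
        if not self.counter_measures:
            gaps.append("no counter-measures listed")
        return gaps

review = DataRiskReview(dataset_name="household_survey_2016")
print(review.outstanding_gaps())
```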
A Framework for Understanding Data Risk

Latest White House report on Big Data charts pathways for fairness and opportunity but also cautions against re-encoding bias and discrimination into algorithmic systems: ” Advertisements tailored to reflect previous purchasing decisions; targeted job postings based on your degree and social networks; reams of data informing predictions around college admissions and financial aid. Need a loan? There’s an app for that.

As technology advances and our economic, social, and civic lives become increasingly digital, we are faced with ethical questions of great consequence. Big data and associated technologies create enormous new opportunities to revisit assumptions and instead make data-driven decisions. Properly harnessed, big data can be a tool for overcoming longstanding bias and rooting out discrimination.

The era of big data is also full of risk. The algorithmic systems that turn data into information are not infallible—they rely on the imperfect inputs, logic, probability, and people who design them. Predictors of success can become barriers to entry; careful marketing can be rooted in stereotype. Without deliberate care, these innovations can easily hardwire discrimination, reinforce bias, and mask opportunity.

Because technological innovation presents both great opportunity and great risk, the White House has released several reports on “big data” intended to prompt conversation and advance these important issues. The topics of previous reports on data analytics included privacy, prices in the marketplace, and consumer protection laws. Today, we are announcing the latest report on big data, one centered on algorithmic systems, opportunity, and civil rights.

The first big data report warned of “the potential of encoding discrimination in automated decisions”—that is, discrimination may “be the inadvertent outcome of the way big data technologies are structured and used.” A commitment to understanding these risks and harnessing technology for good prompted us to specifically examine the intersection between big data and civil rights.

Using case studies on credit lending, employment, higher education, and criminal justice, the report we are releasing today illustrates how big data techniques can be used to detect bias and prevent discrimination. It also demonstrates the risks involved, particularly how technologies can deliberately or inadvertently perpetuate, exacerbate, or mask discrimination.
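(Editor's note: one concrete instance of "using big data techniques to detect bias" is the simple comparison of selection rates across groups, such as the "four-fifths" rule of thumb used in U.S. employment analysis. The sketch below applies that rule to invented counts; it is not drawn from the report itself.)

```python
# Hypothetical outcomes of an automated screening step, by group.
# All counts are invented for illustration.
outcomes = {
    "group_a": {"selected": 90, "total": 300},
    "group_b": {"selected": 45, "total": 250},
}

selection_rates = {g: v["selected"] / v["total"] for g, v in outcomes.items()}
highest_rate = max(selection_rates.values())

for group, rate in selection_rates.items():
    ratio = rate / highest_rate
    status = "potential adverse impact" if ratio < 0.8 else "ok"
    print(f"{group}: selection rate {rate:.0%}, ratio to highest {ratio:.2f} ({status})")
```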

The purpose of the report is not to offer remedies to the issues it raises, but rather to identify these issues and prompt conversation, research—and action—among technologists, academics, policy makers, and citizens, alike.

The report includes a number of recommendations for advancing work in this nascent field of data and ethics. These include investing in research, broadening and diversifying technical leadership, cross-training, and expanded literacy on data discrimination, bolstering accountability, and creating standards for use within both the government and the private sector. It also calls on computer and data science programs and professionals to promote fairness and opportunity as part of an overall commitment to the responsible and ethical use of data.

Big data is here to stay; the question is how it will be used: to advance civil rights and opportunity, or to undermine them….(More)”

Big Risks, Big Opportunities: the Intersection of Big Data and Civil Rights
