Stefaan Verhulst

The Wisdom of Networks – and the Lessons of Wikipedia

Curated on March 23, 2016August 3, 2018 by Stefaan Verhulst

Philip Reitinger at the Analogies Project: “Douglas Merrill said “All of us are smarter than any of us.” This motto of crowdsourcing – looking to the information that can arise from the combined observation by and intelligence of many – is also the prescription for a more secure cyber future. Crowdsourcing security among machines – rather than people – is our best path forward.

Attackers have the advantage online for many reasons, including the ability to leverage a simple error into a significant compromise, to scale attacks more readily than defenses can scale, and to attack at a distance. While the maxim that defenders have to be right all the time, while attackers only have to be right once, is not literally true, it conveys the dilemma of defenders. The connectivity of our devices and agents is inexorably increasing, creating more targets for attack. The complexity of the software we use and the network we must defend is also increasing, making an attack on the individual target or the network easier. And the criticality of our connected systems to our lives is also growing and will continue to grow. Together, this means that we live in a world of steadily increasing risk.

In this environment, the good guys and gals have one significant but counter-intuitive advantage: the size of the network being defended. The soaring prevalence of smart devices is a risk only until it is not, until we combine the abilities of these devices to observe, to induce, and to act to defend the network itself. The cyber ecosystem is the greatest sensor network imaginable, and the data generated by its sensors can drive collective intelligence and collective action to stem threats and isolate infections. The ability of the network components to defend the network may make the future of cybersecurity on the Internet look very much like Wikipedia – one of the best known examples of crowdsourcing – with some obvious failures, but if of importance, generally quickly corrected….

What is necessary to enable the crowdsourcing of defense among network components? A few years ago, while I was at the Department of Homeland Security, it published a paper entitled “Enabling Distributed Security in Cyberspace: Building a Healthy and Resilient Cyber Ecosystem with Automated Collective Action.” This paper posits three requirements:

Automation so the network can act at Internet speed;
Interoperability so the barriers to effective collective (network or “crowd”) action are those we impose by policy, as opposed to those imposed on us by technology or process; and
Authentication to enhance the decision-making and action of the network against attacks.

It has been five years since the paper was published, and I still think these are the key elements of a more secure Internet future. Until we enable the network to defend itself, using its own wisdom of crowds (of agents), offense wins. People should do what people do best, adjust how the network defends itself, and take action when necessary based on intuition, rather than responding to alerts. So when you think about future Internet security problems, think about Stephen Colbert and Wikipedia….(More)”

Wikidata

Curated on March 22, 2016August 3, 2018 by Stefaan Verhulst

Wikidata aims to create a multilingual free knowledge base about the world that can be read and edited by humans and machines alike. It provides data in all the languages of the Wikimedia projects, and allows for the central access to data in a similar vein as Wikimedia Commons does for multimedia files, it is also used by many other websites. The data on Wikidata is added by a community of volunteers both manually and by using software, much like other Wikimedia projects including Wikipedia.

Wikidata has millions of items, each representing a human, a place, a painting, a concept, etc. Each item has statements (key-value pairs), each statement in turn consisting of a property such as “birth date”, and the appropriate value for the item. Likewise, there can be statements for external IDs, such as a VIAF identifier.

Wikidata is hosted by the Wikimedia Foundation, a nonprofit charitable organization dedicated to encouraging the growth, development and distribution of free, multilingual, educational content, and to providing the full content of these wiki-based projects to the public free of chargeWikidata focuses on a basic level of useful information about the world and links to other resources for specialized data on the subject. Sources for data on Wikidata must be:

There are many reasons to add data to Wikidata including:Why add data to Wikidata[edit]

Help more people to see your information[edit]

Data from Wikidata is used by many high traffic websites including Wikipedia which is one of the most used websites in the world receiving over 15 billion page views per month.

Improve open knowledge[edit]

Wikidata hosts data that can be used on Wikimedia projects and beyond. By adding data to Wikidata you can ensure the data on your subject is well covered and up to date in all Wikimedia project languages.

Increase traffic to your website[edit]

Anyone looking at Wikidata or other sites that use Wikidata including Wikipedia can see the reference link for the source of the data.

Make your data more useful for yourself and others[edit]

By adding data to Wikidata it becomes more useful. You can:

Combine it with other data
Use Wikidata tools to explore the data
Visualise your data along with data from other sources…(More)

Next Generation Crowdsourcing for Collective Intelligence

Curated on March 22, 2016August 3, 2018 by Stefaan Verhulst

Paper by John Prpić : “New techniques leveraging IT-mediated crowds such as Crowdsensing, Situated Crowdsourcing, Spatial Crowdsourcing, and Wearables Crowdsourcing have now materially emerged. These techniques, here termed next generation Crowdsourcing, serve to extend Crowdsourcing efforts beyond the heretofore dominant desktop computing paradigm. Employing new configurations of hardware, software, and people, these techniques represent new forms of organization for IT-mediated crowds. However, it is not known how these new techniques change the processes and outcomes of IT-mediated crowds for Collective Intelligence purposes? The aim of this exploratory work is to begin to answer this question. The work ensues by outlining the relevant findings of the first generation Crowdsourcing paradigm, before reviewing the emerging literature pertaining to the new generation of Crowdsourcing techniques. Premised on this review, a collectively exhaustive and mutually exclusive typology is formed, organizing the next generation Crowdsourcing techniques along two salient dimensions common to all first generation Crowdsourcing techniques. As a result, this work situates the next generation Crowdsourcing techniques within the extant Crowdsourcing literature, and identifies new research avenues stemming directly from the analysis….(More)”

States’ using iwaspoisoned.com for outbreak alerts

Curated on March 22, 2016August 3, 2018 by Stefaan Verhulst

Dan Flynn at Food Safety News: “The crowdsourcing site iwaspoisoned.com has collected thousands of reports of foodborne illnesses from individuals across the United States since 2009 and is expanding with a custom alert service for state health departments.

“There are now 26 states signed up, allowing government (health) officials and epidemiologists to receive real time, customized alerts for reported foodborne illness incidents,” said iwaspoisoned.com founder Patrick Quade.
Quade said he wanted to make iwaspoisoned.com data more accessible to health departments and experts in each state.

“This real time information provides a wider range of information data to help local agencies better manage food illness outbreaks,” he said. “It also supplements existing reporting channels and serves to corroborate their own reporting systems.”

The Florida Department of Health, Food and Waterborne Disease Program (FWDP) began receiving iwaspoisoned.com alerts beginning in December 2015.

“The FWDP has had an online complaint form for individuals to report food and waterborne illnesses,” a spokesman said. “However, the program has been looking for ways to expand their reach to ensure they are investigating all incidents. Partnering with iwaspoisoned.com was a logical choice for this expansion.”…

Quade established iwaspoisoned.com in New York City seven years ago to give people a place to report their experiences of being sickened by restaurant food. It gives such people a place to report the restaurants, locations, symptoms and other details and permits others to comment on the report….

The crowdsourcing site has played an increasing role in recent nationally known outbreaks, including those associated with Chipotle Mexican Grill in the last half of 2015. For example, CBS News in Los Angeles first reported on the Simi Valley, Calif., norovirus outbreak after noticing that about a dozen Chipotle customers had logged their illness reports on iwaspoisoned.com.

Eventually, health officials confirmed at least 234 norovirus illnesses associated with a Chipotle location in Simi Valley…(More)”

It’s not big data that discriminates – it’s the people that use it

Curated on March 22, 2016August 15, 2018 by Stefaan Verhulst

Reuben Binns in the Conversation: “Data can’t be racist or sexist, but the way it is used can help reinforce discrimination. The internet means more data is collected about us than ever before and it is used to make automatic decisions that can hugely affect our lives, from our credit scores to our employment opportunities.

If that data reflects unfair social biases against sensitive attributes, such as our race or gender, the conclusions drawn from that data might also be based on those biases.

But this era of “big data” doesn’t need to to entrench inequality in this way. If we build smarter algorithms to analyse our information and ensure we’re aware of how discrimination and injustice may be at work, we can actually use big data to counter our human prejudices.

This kind of problem can arise when computer models are used to make predictions in areas such as insurance, financial loans and policing. If members of a certain racial group have historically been more likely to default on their loans, or been more likely to be convicted of a crime, then the model can deem these people more risky. That doesn’t necessarily mean that these people actually engage in more criminal behaviour or are worse at managing their money. They may just be disproportionately targeted by police and sub-prime mortgage salesmen.

Excluding sensitive attributes

Data scientist Cathy O’Neil has written about her experience of developing models for homeless services in New York City. The models were used to predict how long homeless clients would be in the system and to match them with appropriate services. She argues that including race in the analysis would have been unethical.

If the data showed white clients were more likely to find a job than black ones, the argument goes, then staff might focus their limited resources on those white clients that would more likely have a positive outcome. While sociological research has unveiled the ways that racial disparities in homelessness and unemployment are the result of unjust discrimination, algorithms can’t tell the difference between just and unjust patterns. And so datasets should exclude characteristics that may be used to reinforce the bias, such as race.

But this simple response isn’t necessarily the answer. For one thing, machine learning algorithms can often infer sensitive attributes from a combination of other, non-sensitive facts. People of a particular race may be more likely to live in a certain area, for example. So excluding those attributes may not be enough to remove the bias….

An enlightened service provider might, upon seeing the results of the analysis, investigate whether and how racism is a barrier to their black clients getting hired. Equipped with this knowledge they could begin to do something about it. For instance, they could ensure that local employers’ hiring practices are fair and provide additional help to those applicants more likely to face discrimination. The moral responsibility lies with those responsible for interpreting and acting on the model, not the model itself.

So the argument that sensitive attributes should be stripped from the datasets we use to train predictive models is too simple. Of course, collecting sensitive data should be carefully regulated because it can easily be misused. But misuse is not inevitable, and in some cases, collecting sensitive attributes could prove absolutely essential in uncovering, predicting, and correcting unjust discrimination. For example, in the case of homeless services discussed above, the city would need to collect data on ethnicity in order to discover potential biases in employment practices….(More)

A new data viz tool shows what stories are being undercovered in countries around the world

Curated on March 21, 2016October 24, 2018 by Stefaan Verhulst

Jospeh Lichterman at NiemanLab: “It’s a common lament: Though the Internet provides us access to a nearly unlimited number of sources for news, most of us rarely venture beyond the same few sources or topics. And as news consumption shifts to our phones, people are using even fewer sources: On average, consumers access 1.52 trusted news sources on their phones, according to the 2015 Reuters Digital News Report, which studied news consumption across several countries.

To try and diversify people’s perspectives on the news, Jigsaw — the techincubator, formerly known as Google Ideas, that’s run by Google’s parentcompany Alphabet — this week launched Unfiltered.News, an experimentalsite that uses Google News data to show users what topics are beingunderreported or are popular in regions around the world.

Unfiltered.News’ main data visualization shows which topics are most reported in countries around the world. A column on the right side of the page highlights stories that are being reported widely elsewhere in the world, but aren’t in the top 100 stories on Google News in the selected country. In the United States yesterday, five of the top 10 underreported topics, unsurprisingly, dealt with soccer. In China, Barack Obama was the most undercovered topic….(More)”

To Make Cities More Efficient, Fix Procurement To Welcome Startups

Curated on March 21, 2016July 18, 2019 by Stefaan Verhulst

Jay Nath and Jeremy M. Goldberg at the Aspen Journal of Ideas: “In 2014, an amazing thing happened in government: In just 16 weeks, a new system to help guide visually impaired travelers through San Francisco International Airport was developed, going from a rough idea to ready-to-go-status, through a city program that brings startups and agencies together. Yet two and half years later, a request for proposals to expand this ground-breaking, innovative technology is yet to be finalized.

For people in government, that’s an all-too-familiar scenario. While procurement serves an important role in ensuring that government is a responsible steward of taxpayer dollars, there’s tremendous opportunity to improve the way the public sector has traditionally bought goods and services. And the stakes are higher than simply dealing with red tape. By limiting the pool of partners to those who know how to work the system, taxpayers are missing out on low-cost, innovative solutions. Essentially, RFPs are a Do Not Enter sign for startups — the engine of innovation across nearly every industry except the public sector.

Essentially, RFPs are a Do Not Enter sign for startups — the engine of innovation across nearly every industry except the public sector.

In San Francisco, under our Startup In Residence program, we’re experimenting with how to remove the friction associated with RFPs for both government staff and startups. For government staff, that means publishing an RFP in days, not months. For startups, it means responding to an RFP in hours not weeks.

So what did we learn from our experience with the airport? We combined 17 RFPs into one; utilized general “challenge statements” in place of highly detailed project specifications; leveraged modern technology; and created a simple guide to navigating the process. Here’s a look at how each of those innovations works:

The RFP bus: Today, most RFPs are like a single driver in a car — inefficient and resource-intensive. We should be looking at what might be thought of as mass-transit option, like a bus. By combining a number of RFPs for projects that have characteristics in common into a single procurement vehicle, we can spread the process costs over a number of RFPs.

Challenges, not prescriptions: Under the traditional procurement process, city staffers develop highly prescribed requirements that are often dozens of pages long, a practice that tends to favor existing approaches and established vendors. Shifting to brief challenge statements opens the door for residents, small businesses and entrepreneurs with new ideas. And it reduces the time required by government staff to develop an RFP from weeks or months to days.

Technology that enables the process: This was critical to enabling San Francisco to combine 17 RFPs into one. Without the right technology, we wouldn’t be able to automatically route bidders’ proposals to the appropriate evaluation committees for online scoring or let bidders easily submit their responses. While this kind of procurement technology is not new, it’s use is still uncommon. That needs to change, and it’s more than a question of efficiency. When citizens and entrepreneurs have a painful experience interacting with government, they wonder how we can address the big challenges if we can’t get the small stuff right…(More)“

Can Big Data Help Measure Inflation?

Curated on March 21, 2016October 10, 2018 by Stefaan Verhulst

Bourree Lam in The Atlantic: “…As more and more people are shopping online, calculating this index has gotten more difficult, because there haven’t been any great ways of recording prices from the sites disparate retailers.Data shared by retailers and compiled by the technology firm Adobe might help close this gap. The company is perhaps known best for its visual software,including Photoshop, but the company has also become a provider of software and analytics for online retailers. Adobe is now aggregating the sales data that flows through their software for its Digital Price Index (DPI) project, an initiative that’s meant to answer some of the questions that have been dogging researcher snow that e-commerce is such a big part of the economy.

The project, which tracks billions of online transactions and the prices of over a million products, was developed with the help of the economists Austan Goolsbee, the former chairman of Obama’s Council of Economic Advisors and a professor at the University of Chicago’s Booth School of Business, and Peter Klenow, a professor at Stanford University. “We’ve been excited to help them setup various measures of the digital economy, and of prices, and also to see what the Adobe data can teach us about some of the questions that everybody’s had about the CPI,” says Goolsbee. “People are asking questions like ‘How price sensitive is online commerce?’ ‘How much is it growing?’ ‘How substitutable is itf or non-electronic commerce?’ A lot issues you can address with this in a way that we haven’t really been able to do before.” These are some questions that the DPI has the potential to answer.

…While this new trove of data will certainly be helpful to economists and analysts looking at inflation, it surely won’t replace the CPI. Currently, the government sends out hundreds of BLS employees to stores around the country to collect price data. Online pricing is a small part of the BLS calculation, which is incorporated into its methodology as people increasingly report shopping from retailers online, but there’s a significant time lag. While it’s unlikely that the BLS would incorporate private sources of data into its inflation calculations, as e-commerce grows they might look to improve the way they include online prices.Still, economists are optimistic about the potential of Adobe’s DPI. “I don’t think we know the digital economy as well as we should,” says Klenow, “and this data can help us eventually nail that better.”…(More)

Pigeon patrol takes flight to tackle London’s air pollution crisis

Curated on March 18, 2016August 3, 2018 by Stefaan Verhulst

Adam Vaughan at The Guardian: They’ve been driven from Trafalgar square for being a nuisance, derided as rats with wings and maligned as a risk to public health.

But now pigeons could play a small part in helping Londoners overcome one of the capital’s biggest health problems – its illegal levels of air pollution blamed for thousands of deaths a year.

On Monday, a flock of half a dozen racing pigeons were set loose from a rooftop in Brick Lane by pigeon fancier, Brian Woodhouse, with one strapped with a pollution sensor to its back and one with a GPS tracker.

But while the 25g sensor records the nitrogen dioxide produced by the city’s diesel cars, buses, and trucks and tweets it at anyone who asks for a reading, its real purpose – and the use of the pigeons – is to raise awareness.

“It is a scandal. It is a health and environmental scandal for humans – and pigeons. We’re making the invisible visible,” said Pierre Duquesnoy, who won a London Design Festival award for the idea last year.

“Most of the time when we talk about pollution people think about Beijing or other places, but there are some days in the year when pollution was higher and more toxic in London than Beijing, that’s the reality.”

He said he was inspired by the use of pigeons in the first and second world wars to deliver information and save lives, but they were also a practical way of taking mobile air quality readings and beating London’s congested roads. They fly relatively low, at 100-150ft, and fast, at speeds up to 80mph.

“There’s something about taking what is seen as a flying rat and reversing that into something quite positive,” said Duquesnoy, who is creative director at marketing agency DigitasLBI.

Gary Fuller, an air quality expert at King’s College London, said it was the first time he had heard of urban animals being put to such use.

“It’s great that unemployed pigeons from Trafalgar Square are being put to work. Around 15 years ago tests were done on around 150 stray dogs in Mexico City, showing the ways in which air pollution was affecting lungs and heart health. But this is the first time that I’ve heard of urban wild animals being used to carry sensors to give us a picture of the air pollution over our heads.”

The release of the pigeons for three days this week, dubbed the Pigeon Air Patrol, came as moderate to high pollution affected much of the city, with Battersea recording ‘very high’, the top of the scale.

Elsewhere in the UK, Stockton-on-tees and Middlesbrough recorded high pollution readings and the forecast is for moderate and possibly high pollution in urban areas in northern England and Scotland on Tuesday. Other areas will have low pollution levels….(More).

Crowdlaw and open data policy: A perfect match?

Curated on March 16, 2016August 15, 2018 by Stefaan Verhulst

Stephen Larrick at Sunlight: “The open government community has long envisioned a future where all public policy is collaboratively drafted online and in the open — a future in which we (the people) don’t just have a say in who writes and votes on the rules that govern our society, but are empowered in a substantive way to participate, annotating or even crafting those rules ourselves. If that future seems far away, it’s because we’ve seen few successful instances of this approach in the United States. But an increasing amount of open and collaborative online approaches to drafting legislation — a set of practices the NYU GovLab and others have called “crowdlaw” — seem to have found their niche in open data policy.

This trend has taken hold at the local level, where multiple cities have employed crowdlaw techniques to draft or revise the regulations which establish and govern open data initiatives. But what explains this trend and the apparent connection between crowdlaw and the proactive release of government information online? Is it simply that both are “open government” practices? Or is there something more fundamental at play here?…

Since 2012, several high-profile U.S. cities have utilized collaborative tools such as Google Docs,GitHub, and Madison to open up the process of open data policymaking. The below chronology of notable instances of open data policy drafted using crowdlaw techniques gives the distinct impression of a good idea spreading in American cities:….

While many cities may not be ready to take their hands off of the wheel and trust the public to help engage in meaningful decisions about public policy, it’s encouraging to see some giving it a try when it comes to open data policy. Even for cities still feeling skeptical, this approach can be applied internally; it allows other departments impacted by changes that come about through an open data policy to weigh in, too. Cities can open up varying degrees of the process, retaining as much autonomy as they feel comfortable with. In the end, utilizing the crowdlaw process with open data legislation can increase its effectiveness and accountability by engaging the public directly — a win-win for governments and their citizens alike….(More)”