Twitter releasing trove of user data to scientists for research


Joe Silver at ArsTechnica: “Twitter has a 200-million-strong and ever-growing user base that broadcasts 500 million updates daily. It has been lauded for its ability to unsettle repressive political regimes, bring much-needed accountability to corporations that mistreat their customers, and combat other societal ills (whether such characterizations are, in fact, accurate). Now, the company has taken aim at disrupting another important sphere of human society: the scientific research community.
Back in February, the site announced its plan—in collaboration with Gnip—to provide a handful of research institutions with free access to its data sets from 2006 to the present. It’s a pilot program called “Twitter Data Grants,” with the hashtag #DataGrants. At the time, Twitter’s engineering blog explained the plan to solicit grant applications for access to its treasure trove of user data:

Twitter has an expansive set of data from which we can glean insights and learn about a variety of topics, from health-related information such as when and where the flu may hit to global events like ringing in the new year. To date, it has been challenging for researchers outside the company who are tackling big questions to collaborate with us to access our public, historical data. Our Data Grants program aims to change that by connecting research institutions and academics with the data they need.

In April, Twitter announced that, after reviewing the more than 1,300 proposals submitted from more than 60 different countries, it had selected six institutions to receive data access. Projects approved included a study of foodborne gastrointestinal illnesses, a study measuring happiness levels in cities based on images shared on Twitter, and a study using geosocial intelligence to model urban flooding in Jakarta, Indonesia. There’s even a project exploring the relationship between tweets and sports team performance.
Twitter did not directly respond to our questions on Tuesday afternoon regarding the specific amount and types of data the company is providing to the six institutions. But in its privacy policy, Twitter explains that most user information is intended to be broadcast widely. As a result, the company likely believes that sharing such information with scientific researchers is well within its rights, as its services “are primarily designed to help you share information with the world,” Twitter says. “Most of the information you provide us is information you are asking us to make public.”
While mining such data sets will undoubtedly aid scientists in conducting experiments for which similar data was previously either unavailable or quite limited, these applications raise some legal and ethical questions. For example, Scientific American has asked whether Twitter will be able to retain any legal rights to scientific findings and whether mining tweets (many of which are not publicly accessible) for scientific research when Twitter users have not agreed to such uses is ethically sound.
In response, computational epidemiologists Caitlin Rivers and Bryan Lewis have proposed guidelines for ethical research practices when using social media data, such as avoiding personally identifiable information and making all the results publicly available….”

Open government: getting beyond impenetrable online data


Jed Miller in The Guardian: “Mathematician Blaise Pascal famously closed a long letter by apologising that he hadn’t had time to make it shorter. Unfortunately, his pithy point about “download time” is regularly attributed to Mark Twain and Henry David Thoreau, probably because the public loves writers more than it loves statisticians. Scientists may make things provable, but writers make them memorable.
The World Bank confronted a similar reality of data journalism earlier this month when it revealed that, of the 1,600 bank reports posted online from 2008 to 2012, 32% had never been downloaded at all and another 40% were downloaded fewer than 100 times each.
Taken together, these cobwebbed documents represent millions of dollars in World Bank funds and hundreds of thousands of person-hours, spent by professionals who themselves represent millions of dollars in university degrees. It’s difficult to see the return on investment in producing expert research and organising it into searchable web libraries when almost three quarters of the output goes largely unseen.
The World Bank works at a scale unheard of by most organisations, but expert groups everywhere face the same challenges. Too much knowledge gets trapped in multi-page pdf files that are slow to download (especially in low-bandwidth areas), costly to print, and unavailable for computer analysis until someone manually or automatically extracts the raw data.
Even those who brave the progress bar find too often that urgent, incisive findings about poverty, health, discrimination, conflict or social change are presented in prose written by and for high-level experts, rendering it impenetrable to almost everyone else. Information isn’t just trapped in pdfs; it’s trapped in PhDs.
Governments and NGOs are beginning to realise that digital strategy means more than posting a document online, but what will it take for these groups to change not just their tools, but their thinking? It won’t be enough to partner with WhatsApp or hire GrumpyCat.
I asked strategists from the development, communications and social media fields to offer simple, “Tweetable” suggestions for how the policy community can become better communicators.

For nonprofits and governments that still publish 100-page pdfs on their websites and do not optimise the content to share in other channels such as social: it is a huge waste of time and ineffective. Stop it now.

– Beth Kanter, author and speaker. Beth’s Blog: How Nonprofits Can Use Social Media

Treat text as #opendata so infomediaries can mash it up and make it more accessible (see, for example, federalregister.gov) and don’t just post and blast: distribute information in a targeted way to those most likely to be interested.

– Beth Noveck, director at the Governance Lab and former director at White House Open Government Initiative

Don’t be boring. Sounds easy, actually quite hard, super-important.

– Eli Pariser, CEO of Upworthy

Surprise me. Uncover the key finding that inspired you, rather than trying to tell it all at once and show me how the world could change because of it.

– Jay Golden, co-founder of Wakingstar Storyworks

For the Bank or anyone who is generating policy information they actually want people to use, they must actually write it for the user, not for themselves. As Steve Jobs said, ‘Simple can be harder than complex’.

– Kristen Grimm, founder and president at Spitfire Strategies

The way to reach the widest audience is to think beyond content format and focus on content strategy.

– Laura Silber, director of public affairs at Open Society Foundations

Open the door to policy work with short, accessible pieces – a blog post, a video take, infographics – that deliver the ‘so what’ succinctly.

– Robert McMahon, editor at Council on Foreign Relations

Policy information is more usable if it’s linked to corresponding actions one can take, or if it helps stir debate.  Also, whichever way you slice it, there will always be a narrow market for raw policy reports … that’s why explainer sites, listicles and talking heads exist.

– Ory Okolloh, director of investments at Omidyar Network and former public policy and government relations manager at Google Africa
Ms Okolloh, who helped found the citizen reporting platform Ushahidi, also offered a simple reminder about policy reports: “‘Never gets downloaded’ doesn’t mean ‘never gets read’.” Just as we shouldn’t mistake posting for dissemination, we shouldn’t confuse popularity with influence….”

Democracy and open data: are the two linked?


Molly Shwartz at R Street: “Are democracies better at practicing open government than less free societies? To find out, I analyzed the 70 countries profiled in the Open Knowledge Foundation’s Open Data Index and compared the rankings against the 2013 Global Democracy Rankings. As a tenet of open government in the digital age, open data practices serve as one indicator of an open government. Overall, there is a strong relationship between democracy and transparency.
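The post does not include its code, but the core of the comparison it describes, lining up each country's Open Data Index score against its democracy-ranking score and testing how strongly the two move together, can be sketched in a few lines. The snippet below is a minimal illustration in Python (scipy assumed available) using invented placeholder scores for a handful of countries, not the actual 2013 values.

```python
# Illustrative sketch of comparing two country rankings with a rank correlation.
# The scores below are placeholders, NOT the actual 2013 index or democracy values.
from scipy.stats import spearmanr

open_data_index = {"United Kingdom": 940, "United States": 855, "Denmark": 835,
                   "Germany": 560, "Russia": 550, "China": 515}
democracy_score = {"United Kingdom": 85.0, "United States": 84.0, "Denmark": 88.0,
                   "Germany": 86.0, "Russia": 40.0, "China": 30.0}

countries = sorted(open_data_index)
rho, p = spearmanr([open_data_index[c] for c in countries],
                   [democracy_score[c] for c in countries])
print(f"Spearman rank correlation: {rho:.2f} (p = {p:.3f}, n = {len(countries)})")
```

A rank correlation such as Spearman's rho is a natural choice here because the two sources score countries on different scales; only the orderings are compared.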
Using data collected in October 2013, the top ten countries for openness include the usual bastion-of-democracy suspects: the United Kingdom, the United States, mainland Scandinavia, the Netherlands, Australia, New Zealand and Canada.
There are, however, some noteworthy exceptions. Germany ranks lower than Russia and China. All three rank well above Lithuania. Egypt, Saudi Arabia and Nepal all beat out Belgium. The chart (below) shows the democracy ranking of these same countries from 2008-2013 and highlights the obvious inconsistencies in the correlation between democracy and open data for many countries.
[Chart: democracy rankings of these countries, 2008–2013]
There are many reasons for such inconsistencies. The implementation of open-government efforts – for instance, opening government data sets – often can be imperfect or even misguided. Drilling down to some of the data behind the Open Data Index scores reveals that even countries that score very well, such as the United States, have room for improvement. For example, the judicial branch generally does not publish data and houses most information behind a pay-wall. The status of legislation and amendments introduced by Congress is also often not available in machine-readable form.
As internationally recognized markers of political freedom and technological innovation, open government initiatives are appealing political tools for politicians looking to gain prominence in the global arena, regardless of whether or not they possess a real commitment to democratic principles. In 2012, Russia made a public push to cultivate open government and open data projects that was enthusiastically endorsed by American institutions. In a June 2012 blog post summarizing a Russian “Open Government Ecosystem” workshop at the World Bank, one World Bank consultant professed the opinion that open government innovations “are happening all over Russia, and are starting to have genuine support from the country’s top leaders.”
Given the Russian government’s penchant for corruption, cronyism, violations of press freedom and increasing restrictions on public access to information, the idea that it was ever committed to government accountability and transparency is dubious at best. This was confirmed by Russia’s May 2013 withdrawal of its letter of intent to join the Open Government Partnership. As explained by John Wonderlich, policy director at the Sunlight Foundation:

While Russia’s initial commitment to OGP was likely a surprising boon for internal champions of reform, its withdrawal will also serve as a demonstration of the difficulty of making a political commitment to openness there.

Which just goes to show that, while a democratic government does not guarantee open government practices, a government that regularly violates democratic principles may be an impossible environment for implementing open government.
A cursory analysis of the ever-evolving international open data landscape reveals three major takeaways:

  1. Good intentions for government transparency in democratic countries are not always effectively realized.
  2. Politicians will gladly pay lip-service to the idea of open government without backing up words with actions.
  3. The transparency we’ve established can go away quickly without vigilant oversight and enforcement.”

The rise of open data driven businesses in emerging markets


Alla Morrison at the World Bank blog:

Key findings —

  • Many new data companies have emerged around the world in the last few years. Of these companies, the majority use some form of government data.
  • There are a large number of data companies in sectors with high social impact and tremendous development opportunities.
  • An actionable pipeline of data-driven companies exists in Latin America and in Asia. The most desired type of financing is equity, followed by quasi-equity, in amounts ranging from $100,000 to $5 million, with averages between $2 million and $3 million depending on the region. The total estimated need for financing may exceed $400 million.

“The economic value of open data is no longer a hypothesis
How can one make money with open data, which is akin to air – free and open to everyone? Should the World Bank Group play a catalytic role for a sector that is just emerging? And if so, what set of interventions would be the most effective? Can promoting open data-driven businesses contribute to the World Bank Group’s twin goals of fighting poverty and boosting shared prosperity?
These questions have been top of mind since the World Bank Open Finances team convened a group of open data entrepreneurs from across Latin America to share their business models, success stories and challenges at the Open Data Business Models workshop in Uruguay in June 2013. We were in Uruguay to find out whether open data could lead to the creation of sustainable new businesses and jobs. To do so, we tested a couple of hypotheses: open data has economic value, beyond the benefits of increased transparency and accountability; and open data companies with sustainable business models already exist in emerging economies.
Encouraged by our findings in Uruguay we set out to further explore the economic development potential of open data, with a focus on:

  • Contribution of open data to countries’ GDP;
  • Innovative solutions to tackle social problems in key sectors like agriculture, health, education, transportation, climate change, financial services, especially those benefiting low income populations;
  • Economic benefits of governments’ buy-in into the commercial value of open data and resulting release of new datasets, which in turn would lead to increased transparency in public resource management (reductions in misallocations, a more level playing field in procurement) and better service delivery; and
  • Creation of data-related private sector jobs, especially suited for the tech savvy young generation.

We proposed a joint IFC/World Bank approach (From open data to development impact – the crucial role of private sector) that envisages providing financing to data-driven companies through a dedicated investment fund, as well as loans and grants to governments to create a favorable enabling environment. The concept was received enthusiastically for the most part by a wide group of peers at the Bank, the IFC, as well as NGOs, foundations, DFIs and private sector investors.
Thanks also in part to a McKinsey report last fall stating that open data could help unlock more than $3 trillion in value every year, the potential value of open data is now better understood. The acquisition of Climate Corporation (whose business model holds enormous potential for agriculture and food security, if governments open up the right data) for close to a billion dollars last November and the findings of the Open Data 500 project led by the GovLab at NYU further substantiated the hypothesis. These days no one asks whether open data has economic value; the focus has shifted to finding ways for companies, both startups and large corporations, and governments to unlock it. The first question though is – is it still too early to plan a significant intervention to spur open data driven economic growth in emerging markets?”

New Research Suggests Collaborative Approaches Produce Better Plans


JPER: “In a previous blog post (see, http://goo.gl/pAjyWE), we discussed how many of the most influential articles in the Journal of Planning Education and Research (and in peer publications, like JAPA) over the last two decades have focused on communicative or collaborative planning. Proponents of these approaches, most notably Judith Innes, Patsy Healey, Larry Susskind, and John Forester, developed the idea that the collaborative and communicative structures that planners use impact the quality, legitimacy, and equity of planning outcomes. In practice, communicative theory has led to participatory initiatives, such as those observed in New Orleans (post-Katrina, http://goo.gl/A5J5wk), Chattanooga (to revitalize its downtown and riverfront, http://goo.gl/zlQfKB), and in many other smaller efforts to foment wider involvement in decision making. Collaboration has also impacted regional governance structures, leading to more consensus based forms of decision making, notably CALFED (SF Bay estuary governance, http://goo.gl/EcXx9Q) and transportation planning with Metropolitan Planning Organizations (MPOs)….
Most studies testing the implementation of collaborative planning have been case studies. Previous work by authors such as Innes and Booher has provided valuable qualitative data about collaboration in planning, but few studies have attempted to empirically test the hypothesis that consensus building and participatory practices lead to better planning outcomes.
Robert Deyle (Florida State) and Ryan Wiedenman (Atkins Global) build on previous case study research by surveying officials involved in developing long-range transportation plans at 88 U.S. MPOs about the process and outcomes of those plans. The study tests the hypothesis that collaborative processes provide better outcomes and enhanced long-term relationships in situations where “many stakeholders with different needs” have “shared interests in common resources or challenges” and where “no actor can meet his/her interests without the cooperation of many others” (Innes and Booher 2010, 7; Innes and Gruber 2005, 1985–2186). Current theory posits that consensus-based collaboration requires 1) the presence of all relevant interests, 2) mutual interdependence for goal achievement, and 3) honest and authentic dialog between participants (Innes and Booher 2010, 35–36; Deyle and Wiedenman 2014).

Figure 2, Deyle and Wiedenman (2014)
By surveying planning authorities, the authors found that most of the conditions (see Figure 2, above) posited in the collaborative planning literature had statistically significant impacts on planning outcomes. These included perceptions of plan quality and participant satisfaction with the plan, as well as intangible outcomes that benefit both the participants and their ongoing collaboration efforts. However, having a planning process in which all or most decisions were made by consensus did not improve outcomes. ….
Deyle, Robert E., and Ryan E. Wiedenman. “Collaborative Planning by Metropolitan Planning Organizations: A Test of Causal Theory.” Journal of Planning Education and Research (2014): 0739456X14527621.
To access this article FREE until May 31 click the following links: Online, http://goo.gl/GU9inf, PDF, http://goo.gl/jehAf1.”

The Emerging Science of Superspreaders (And How to Tell If You're One Of Them)


Emerging Technology From the arXiv: “Who are the most influential spreaders of information on a network? That’s a question that marketers, bloggers, news services and even governments would like answered. Not least because the answer could provide ways to promote products quickly, to boost the popularity of political parties above their rivals and to seed the rapid spread of news and opinions.
So it’s not surprising that network theorists have spent some time thinking about how best to identify these people and to check how the information they receive might spread around a network. Indeed, they’ve found a number of measures that spot so-called superspreaders, people who spread information, ideas or even disease more efficiently than anybody else.
But there’s a problem. Social networks are so complex that network scientists have never been able to test their ideas in the real world—it has always been too difficult to reconstruct the exact structure of Twitter or Facebook networks, for example. Instead, they’ve created models that mimic real networks in certain ways and tested their ideas on these instead.
But there is growing evidence that information does not spread through real networks in the same way as it does through these idealised ones. People tend to pass on information only when they are interested in a topic and when they are active, factors that are hard to take into account in a purely topological model of a network.
So the question of how to find the superspreaders remains open. That looks set to change thanks to the work of Sen Pei at Beihang University in Beijing and a few pals who have performed the first study of superspreaders on real networks.
These guys have studied the way information flows around various networks ranging from the LiveJournal blogging network to the scientific publishing network of the American Physical Society, as well as subsets of the Twitter and Facebook networks. And they’ve discovered the key indicator that identifies superspreaders in these networks.
In the past, network scientists have developed a number of mathematical tests to measure the influence that individuals have on the spread of information through a network. For example, one measure is simply the number of connections a person has to other people in the network, a property known as their degree. The thinking is that the most highly connected people are the best at spreading information.
Another measure uses the famous PageRank algorithm that Google developed for ranking webpages. This works by ranking somebody more highly if they are connected to other highly ranked people.
Then there is ‘betweenness centrality’, a measure of how many of the shortest paths across a network pass through a specific individual. The idea is that these people are more able to inject information into the network.
And finally there is a property of nodes in a network known as their k-core. This is determined by iteratively pruning the peripheries of a network to see what is left. The k-core is the step at which that node or person is pruned from the network. Obviously, the most highly connected survive this process the longest and have the highest k-core score.
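All four measures are standard quantities in network analysis and are available in common graph libraries. The sketch below is an illustration only (it assumes Python with the networkx library, which the article does not mention), computing degree, PageRank, betweenness centrality and the k-core number on a small example graph.

```python
# Compute the four influence measures discussed above on a toy network.
# networkx is an assumed tool here, not something named in the article.
import networkx as nx

G = nx.karate_club_graph()  # a small, well-known example social network

degree = dict(G.degree())                   # number of direct connections
pagerank = nx.pagerank(G)                   # high if connected to other highly ranked nodes
betweenness = nx.betweenness_centrality(G)  # share of shortest paths passing through a node
k_core = nx.core_number(G)                  # pruning round at which a node drops out of the graph

print("node  degree  pagerank  betweenness  k-core")
for node in sorted(G, key=pagerank.get, reverse=True)[:5]:
    print(f"{node:>4}  {degree[node]:>6}  {pagerank[node]:.3f}     "
          f"{betweenness[node]:.3f}        {k_core[node]}")
```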
The question that Sen and co set out to answer was which of these measures best picked out superspreaders of information in real networks.
They began with LiveJournal, a network of blogs in which individuals maintain lists of friends that represent social ties to other LiveJournal users. This network allows people to repost information from other blogs and to include a reference that links back to the original post. That lets Sen and co recreate not only the network of social links between LiveJournal users but also the way in which information spreads between them.
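A minimal sketch of that reconstruction follows, assuming a simplified record format (the field names and sample data are hypothetical, not LiveJournal's actual schema): one directed graph is built from friend lists, another from repost references, and the two can then be compared to see which diffusion edges coincide with a social tie.

```python
# Sketch: rebuild (1) the social graph from friend lists and (2) the diffusion
# graph from repost references, then check which reposts follow a social link.
# Field names and sample data are hypothetical, not LiveJournal's real schema.
import networkx as nx

friend_lists = {"alice": ["bob", "carol"], "bob": ["carol"], "dave": ["alice"]}
reposts = [("alice", "bob"),    # alice reposted a post originally written by bob
           ("dave", "carol"),
           ("bob", "carol")]

social = nx.DiGraph()
for follower, friends in friend_lists.items():
    social.add_edges_from((follower, friend) for friend in friends)

diffusion = nx.DiGraph()
diffusion.add_edges_from(reposts)

via_social = sum(1 for reposter, author in diffusion.edges()
                 if social.has_edge(reposter, author))
print(f"{via_social / diffusion.number_of_edges():.1%} of reposts follow an observable social link")
```

Run over the full data set, this is the kind of comparison behind the attribution figure the article quotes below.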
Sen and co collected all of the blog posts from February 2010 to November 2011, a total of more than 56 million posts. Of these, some 600,000 contain links to other posts published by LiveJournal users.
The data reveals two important properties of information diffusion. First, only some 250,000 users are actively involved in spreading information. That’s a small fraction of the total.
More significantly, they found that information did not always diffuse across the social network: information could spread between two LiveJournal users even when they had no social connection.
That’s probably because they find this information outside of the LiveJournal ecosystem, perhaps through web searches or via other networks. “Only 31.93% of the spreading posts can be attributed to the observable social links,” they say.
That’s in stark contrast to the assumptions behind many social network models. These simulate the way information flows by assuming that it travels directly through the network from one person to another, like a disease spread by physical contact.
The work of Sen and co suggests that influences outside the network are crucial too. In practice, information often spreads via several seemingly independent sources within the network at the same time. This has important implications for the way superspreaders can be spotted.
Sen and co say that a person’s degree – the number of other people he or she is connected to – is not as good a predictor of information diffusion as theorists have thought. “We find that the degree of the user is not a reliable predictor of influence in all circumstances,” they say.
What’s more, the PageRank algorithm is often ineffective in this kind of network as well. “Contrary to common belief, although PageRank is effective in ranking web pages, there are many situations where it fails to locate superspreaders of information in reality,” they say….
Ref: arxiv.org/abs/1405.1790 : Searching For Superspreaders Of Information In Real-World Social Media”

Can Big Data Stop Wars Before They Happen?


Foreign Policy: “It has been almost two decades exactly since conflict prevention shot to the top of the peace-building agenda, as large-scale killings shifted from interstate wars to intrastate and intergroup conflicts. What could we have done to anticipate and prevent the 100 days of genocidal killing in Rwanda that began in April 1994 or the massacre of thousands of Bosnian Muslims at Srebrenica just over a year later? The international community recognized that conflict prevention could no longer be limited to diplomatic and military initiatives, but that it also requires earlier intervention to address the causes of violence between nonstate actors, including tribal, religious, economic, and resource-based tensions.
For years, even as it was pursued as doggedly as personnel and funding allowed, early intervention remained elusive, a kind of Holy Grail for peace-builders. This might finally be changing. The rise of data on social dynamics and what people think and feel — obtained through social media, SMS questionnaires, increasingly comprehensive satellite information, news-scraping apps, and more — has given the peace-building field hope of harnessing a new vision of the world. But to cash in on that hope, we first need to figure out how to understand all the numbers and charts and figures now available to us. Only then can we expect to predict and prevent events like the recent massacres in South Sudan or the ongoing violence in the Central African Republic.
A growing number of initiatives have tried to make it across the bridge between data and understanding. They’ve ranged from small nonprofit shops of a few people to massive government-funded institutions, and they’ve been moving forward in fits and starts. Few of these initiatives have been successful in documenting incidents of violence actually averted or stopped. Sometimes that’s simply because violence or absence of it isn’t verifiable. The growing literature on big data and conflict prevention today is replete with caveats about “overpromising and underdelivering” and the persistent gap between early warning and early action. In the case of the Conflict Early Warning and Response Mechanism (CEWARN) system in central Africa — one of the earlier and most prominent attempts at early intervention — it is widely accepted that the project largely failed to use the data it retrieved for effective conflict management. It relied heavily on technology to produce large databases, while lacking the personnel to effectively analyze them or take meaningful early action.
To be sure, disappointments are to be expected when breaking new ground. But they don’t have to continue forever. This pioneering work demands not just data and technology expertise. Also critical is cross-discipline collaboration between the data experts and the conflict experts, who know intimately the social, political, and geographic terrain of different locations. What was once a clash of cultures over the value and meaning of metrics when it comes to complex human dynamics needs to morph into collaboration. This is still pretty rare, but if the past decade’s innovations are any prologue, we are hopefully headed in the right direction.
* * *
Over the last three years, the U.S. Defense Department, the United Nations, and the CIA have all launched programs to parse the masses of public data now available, scraping and analyzing details from social media, blogs, market data, and myriad other sources to achieve variations of the same goal: anticipating when and where conflict might arise. The Defense Department’s Information Volume and Velocity program is designed to use “pattern recognition to detect trends in a sea of unstructured data” that would point to growing instability. The U.N.’s Global Pulse initiative’s stated goal is to track “human well-being and emerging vulnerabilities in real-time, in order to better protect populations from shocks.” The Open Source Indicators program at the CIA’s Intelligence Advanced Research Projects Activity aims to anticipate “political crises, disease outbreaks, economic instability, resource shortages, and natural disasters.” Each looks to the growing stream of public data to detect significant population-level changes.
Large institutions with deep pockets have always been at the forefront of efforts in the international security field to design systems for improving data-driven decision-making. They’ve followed the lead of large private-sector organizations where data and analytics rose to the top of the corporate agenda. (In that sector, the data revolution is promising “to transform the way many companies do business, delivering performance improvements not seen since the redesign of core processes in the 1990s,” as David Court, a director at consulting firm McKinsey, has put it.)
What really defines the recent data revolution in peace-building, however, is that it is transcending size and resource limitations. It is finding its way to small organizations operating at local levels and using knowledge and subject experts to parse information from the ground. It is transforming the way peace-builders do business, delivering data-led programs and evidence-based decision-making not seen since the field’s inception in the latter half of the 20th century.
One of the most famous recent examples is the 2013 Kenyan presidential election.
In March 2013, the world was watching and waiting to see whether the vote would produce more of the violence that had left at least 1,300 people dead and 600,000 homeless during and after the 2007 elections. In the intervening years, a web of NGOs worked to set up early-warning and early-response mechanisms to defuse tribal rivalries, party passions, and rumor-mongering. Many of the projects were technology-based initiatives trying to leverage data sources in new ways — including a collaborative effort spearheaded and facilitated by a Kenyan nonprofit called Ushahidi (“witness” in Swahili) that designs open-source data collection and mapping software. The Umati (meaning “crowd”) project used an Ushahidi program to monitor media reports, tweets, and blog posts to detect rising tensions, frustration, calls to violence, and hate speech — and then sorted and categorized it all on one central platform. The information fed into election-monitoring maps built by the Ushahidi team, while mobile-phone provider Safaricom donated 50 million text messages to a local peace-building organization, Sisi ni Amani (“We are Peace”), so that it could act on the information by sending texts — which had been used to incite and fuel violence during the 2007 elections — aimed at preventing violence and quelling rumors.
The first challenges came around 10 a.m. on the opening day of voting. “Rowdy youth overpowered police at a polling station in Dandora Phase 4,” one of the informal settlements in Nairobi that had been a site of violence in 2007, wrote Neelam Verjee, programs manager at Sisi ni Amani. The young men were blocking others from voting, and “the situation was tense.”
Sisi ni Amani sent a text blast to its subscribers: “When we maintain peace, we will have joy & be happy to spend time with friends & family but violence spoils all these good things. Tudumishe amani [“Maintain the peace”] Phase 4.” Meanwhile, security officers, who had been called separately, arrived at the scene and took control of the polling station. Voting resumed with little violence. According to interviews collected by Sisi ni Amani after the vote, the message “was sent at the right time” and “helped to calm down the situation.”
In many ways, Kenya’s experience is the story of peace-building today: Data is changing the way professionals in the field think about anticipating events, planning interventions, and assessing what worked and what didn’t. But it also underscores the possibility that we might be edging closer to a time when peace-builders at every level and in all sectors — international, state, and local, governmental and not — will have mechanisms both to know about brewing violence and to save lives by acting on that knowledge.
Three important trends underlie the optimism. The first is the sheer amount of data that we’re generating. In 2012, humans plugged into digital devices managed to generate more data in a single year than over the course of world history — and that rate more than doubles every year. As of 2012, 2.4 billion people — 34 percent of the world’s population — had a direct Internet connection. The growth is most stunning in regions like the Middle East and Africa where conflict abounds; access has grown 2,634 percent and 3,607 percent, respectively, in the last decade.
The growth of mobile-phone subscriptions, which allow their owners to be part of new data sources without a direct Internet connection, is also staggering. In 2013, there were almost as many cell-phone subscriptions in the world as there were people. In Africa, there were 63 subscriptions per 100 people, and there were 105 per 100 people in the Arab states.
The second trend has to do with our expanded capacity to collect and crunch data. Not only do we have more computing power enabling us to produce enormous new data sets — such as the Global Database of Events, Language, and Tone (GDELT) project, which tracks almost 300 million conflict-relevant events reported in the media between 1979 and today — but we are also developing more-sophisticated methodological approaches to using these data as raw material for conflict prediction. New machine-learning methodologies, which use algorithms to make predictions (like a spam filter, but much, much more advanced), can provide “substantial improvements in accuracy and performance” in anticipating violent outbreaks, according to Chris Perry, a data scientist at the International Peace Institute.
This brings us to the third trend: the nature of the data itself. When it comes to conflict prevention and peace-building, progress is not simply a question of “more” data, but also different data. For the first time, digital media — user-generated content and online social networks in particular — tell us not just what is going on, but also what people think about the things that are going on. Excitement in the peace-building field centers on the possibility that we can tap into data sets to understand, and preempt, the human sentiment that underlies violent conflict.
Realizing the full potential of these three trends means figuring out how to distinguish between the information, which abounds, and the insights, which are actionable. It is a distinction that is especially hard to make because it requires cross-discipline expertise that combines the wherewithal of data scientists with that of social scientists and the knowledge of technologists with the insights of conflict experts.

United States federal government use of crowdsourcing grows six-fold since 2011


From E Pluribus Unum: “Citizensourcing and open innovation can work in the public sector, just as crowdsourcing can in the private sector. Around the world, the use of prizes to spur innovation has been booming for years. The United States of America has been significantly scaling up its use of prizes and challenges to solve grand national challenges since January 2011, when President Obama signed an updated version of the America COMPETES Act into law.
According to the third congressionally mandated report released by the Obama administration today (PDF/Text), the number of prizes and challenges conducted under the America COMPETES Act has increased by 50% since 2012, 85% since 2012, and nearly six-fold overall since 2011. Twenty-five different federal agencies offered prizes under COMPETES in fiscal year 2013, with 87 prize competitions in total. The size of the prize purses has grown as well, with 11 challenges over $100,000 in 2013. Nearly half of the prizes conducted in FY 2013 were focused on software, including applications, data visualization tools, and predictive algorithms. Challenge.gov, the award-winning online platform for crowdsourcing national challenges, now has tens of thousands of users who have participated in more than 300 public-sector prize competitions. Beyond the growth in prize numbers and amounts, the Obama administration highlighted four trends in public-sector prize competitions:

  • New models for public engagement and community building during competitions
  • Growth in software and information technology challenges, with nearly 50% of the total prizes in this category
  • More emphasis on sustainability and “creating a post-competition path to success”
  • Increased focus on identifying novel approaches to solving problems

The growth of open innovation in and by the public sector was directly enabled by Congress and the White House, working together for the common good. Congress reauthorized COMPETES in 2010 with an amendment to Section 105 of the act that added a Section 24 on “Prize Competitions,” providing all agencies with the authority to conduct prizes and challenges that only NASA and DARPA had previously enjoyed, and the White House Office of Science and Technology Policy (OSTP) has been guiding its implementation and providing guidance on the use of challenges and prizes to promote open government.
“This progress is due to important steps that the Obama Administration has taken to make prizes a standard tool in every agency’s toolbox,” wrote Cristin Dorgelo, assistant director for grand challenges in OSTP, in a WhiteHouse.gov blog post on engaging citizen solvers with prizes:

In his September 2009 Strategy for American Innovation, President Obama called on all Federal agencies to increase their use of prizes to address some of our Nation’s most pressing challenges. Those efforts have expanded since the signing of the America COMPETES Reauthorization Act of 2010, which provided all agencies with expanded authority to pursue ambitious prizes with robust incentives.
To support these ongoing efforts, OSTP and the General Services Administration have trained over 1,200 agency staff through workshops, online resources, and an active community of practice. And NASA’s Center of Excellence for Collaborative Innovation (COECI) provides a full suite of prize implementation services, allowing agencies to experiment with these new methods before standing up their own capabilities.

Sun Microsystems co-founder Bill Joy famously once said that “No matter who you are, most of the smartest people work for someone else.” This rings true, in and outside of government. The idea of governments using prizes like this to inspire technological innovation, however, is not reliant on Web services and social media, nor was it born from the fertile mind of a Silicon Valley entrepreneur. As the introduction to the third White House prize report notes:

“One of the most famous scientific achievements in nautical history was spurred by a grand challenge issued in the 18th Century. The issue of safe, long distance sea travel in the Age of Sail was of such great importance that the British government offered a cash award of £20,000 to anyone who could invent a way of precisely determining a ship’s longitude. The Longitude Prize, enacted by the British Parliament in 1714, would be worth some £30 million today, but even by that measure the value of the marine chronometer invented by British clockmaker John Harrison might be a deal.”

Centuries later, the Internet, World Wide Web, mobile devices and social media offer the best platforms in history for this kind of approach to solving grand challenges and catalyzing civic innovation, helping public officials and businesses find new ways to solve old problems. When a new idea, technology, or methodology challenges and improves upon existing processes and systems, it can improve the lives of citizens or the functioning of the society they live in….”

Thanks-for-Ungluing launches!


Blog from Unglue.it: “Great books deserve to be read by all of us, and we ought to be supporting the people who create these books. “Thanks for Ungluing” gives readers, authors, libraries and publishers a new way to build, sustain, and nourish the books we love.
“Thanks for Ungluing” books are Creative Commons licensed and free to download. You don’t need to register or anything. But when you download, the creators can ask for your support. You can pay what you want. You can just scroll down and download the book. But when that book has become your friend, your advisor, your confidante, you’ll probably want to show your support and tell all your friends.
We have some amazing creators participating in this launch….”

Findings of the Big Data and Privacy Working Group Review


John Podesta at the White House Blog: “Over the past several days, severe storms have battered Arkansas, Oklahoma, Mississippi and other states. Dozens of people have been killed and entire neighborhoods turned to rubble and debris as tornadoes have touched down across the region. Natural disasters like these present a host of challenges for first responders. How many people are affected, injured, or dead? Where can they find food, shelter, and medical attention? What critical infrastructure might have been damaged?
Drawing on open government data sources, including Census demographics and NOAA weather data, along with their own demographic databases, Esri, a geospatial technology company, has created a real-time map showing where the twisters have been spotted and how the storm systems are moving. They have also used these data to show how many people live in the affected area, and summarize potential impacts from the storms. It’s a powerful tool for emergency services and communities. And it’s driven by big data technology.
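As an illustration of the kind of overlay behind such a map (this is not Esri's implementation, and the coordinates and populations below are invented), one can intersect a storm footprint with census-tract locations and population counts to estimate how many people fall inside the affected area. The sketch assumes Python with the shapely geometry library.

```python
# Illustrative sketch: estimate population inside a storm footprint by
# overlaying it with census-tract centroids. Not Esri's implementation;
# the coordinates and population figures are invented placeholders.
from shapely.geometry import Point, Polygon

storm_footprint = Polygon([(-94.6, 36.0), (-94.0, 36.2), (-93.8, 36.8), (-94.5, 36.7)])
census_tracts = [
    {"tract": "A", "centroid": Point(-94.3, 36.3), "population": 4200},
    {"tract": "B", "centroid": Point(-94.1, 36.6), "population": 3100},
    {"tract": "C", "centroid": Point(-93.2, 36.4), "population": 5000},  # outside the footprint
]

affected = sum(t["population"] for t in census_tracts
               if storm_footprint.contains(t["centroid"]))
print(f"Estimated population in affected area: {affected:,}")
```

A production system would use the full tract polygons and live NOAA storm tracks rather than centroids and a fixed footprint, but the overlay logic is the same.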
In January, President Obama asked me to lead a wide-ranging review of “big data” and privacy—to explore how these technologies are changing our economy, our government, and our society, and to consider their implications for our personal privacy. Together with Secretary of Commerce Penny Pritzker, Secretary of Energy Ernest Moniz, the President’s Science Advisor John Holdren, the President’s Economic Advisor Jeff Zients, and other senior officials, our review sought to understand what is genuinely new and different about big data and to consider how best to encourage the potential of these technologies while minimizing risks to privacy and core American values.
Over the course of 90 days, we met with academic researchers and privacy advocates, with regulators and the technology industry, with advertisers and civil rights groups. The President’s Council of Advisors on Science and Technology conducted a parallel study of the technological trends underpinning big data. The White House Office of Science and Technology Policy jointly organized three university conferences at MIT, NYU, and U.C. Berkeley. We issued a formal Request for Information seeking public comment, and hosted a survey to generate even more public input.
Today, we presented our findings to the President. We knew better than to try to answer every question about big data in three months. But we are able to draw important conclusions and make concrete recommendations for Administration attention and policy development in a few key areas.
There are a few technological trends that bear drawing out. The declining cost of collection, storage, and processing of data, combined with new sources of data like sensors, cameras, and geospatial technologies, mean that we live in a world of near-ubiquitous data collection. All this data is being crunched at a speed that is increasingly approaching real-time, meaning that big data algorithms could soon have immediate effects on decisions being made about our lives.
The big data revolution presents incredible opportunities in virtually every sector of the economy and every corner of society.
Big data is saving lives. Infections are dangerous—even deadly—for many babies born prematurely. By collecting and analyzing millions of data points from a NICU, one study was able to identify factors, like slight increases in body temperature and heart rate, that serve as early warning signs an infection may be taking root—subtle changes that even the most experienced doctors wouldn’t have noticed on their own.
Big data is making the economy work better. Jet engines and delivery trucks now come outfitted with sensors that continuously monitor hundreds of data points and send automatic alerts when maintenance is needed. Utility companies are starting to use big data to predict periods of peak electric demand, adjusting the grid to be more efficient and potentially averting brown-outs.
Big data is making government work better and saving taxpayer dollars. The Centers for Medicare and Medicaid Services have begun using predictive analytics—a big data technique—to flag likely instances of reimbursement fraud before claims are paid. The Fraud Prevention System helps identify the highest-risk health care providers for waste, fraud, and abuse in real time and has already stopped, prevented, or identified $115 million in fraudulent payments.
But big data raises serious questions, too, about how we protect our privacy and other values in a world where data collection is increasingly ubiquitous and where analysis is conducted at speeds approaching real time. In particular, our review raised the question of whether the “notice and consent” framework, in which a user grants permission for a service to collect and use information about them, still allows us to meaningfully control our privacy as data about us is increasingly used and reused in ways that could not have been anticipated when it was collected.
Big data raises other concerns, as well. One significant finding of our review was the potential for big data analytics to lead to discriminatory outcomes and to circumvent longstanding civil rights protections in housing, employment, credit, and the consumer marketplace.
No matter how quickly technology advances, it remains within our power to ensure that we both encourage innovation and protect our values through law, policy, and the practices we encourage in the public and private sector. To that end, we make six actionable policy recommendations in our report to the President:
Advance the Consumer Privacy Bill of Rights. Consumers deserve clear, understandable, reasonable standards for how their personal information is used in the big data era. We recommend the Department of Commerce take appropriate consultative steps to seek stakeholder and public comment on what changes, if any, are needed to the Consumer Privacy Bill of Rights, first proposed by the President in 2012, and to prepare draft legislative text for consideration by stakeholders and submission by the President to Congress.
Pass National Data Breach Legislation. Big data technologies make it possible to store significantly more data, and further derive intimate insights into a person’s character, habits, preferences, and activities. That makes the potential impacts of data breaches at businesses or other organizations even more serious. A patchwork of state laws currently governs requirements for reporting data breaches. Congress should pass legislation that provides for a single national data breach standard, along the lines of the Administration’s 2011 Cybersecurity legislative proposal.
Extend Privacy Protections to non-U.S. Persons. Privacy is a worldwide value that should be reflected in how the federal government handles personally identifiable information about non-U.S. citizens. The Office of Management and Budget should work with departments and agencies to apply the Privacy Act of 1974 to non-U.S. persons where practicable, or to establish alternative privacy policies that apply appropriate and meaningful protections to personal information regardless of a person’s nationality.
Ensure Data Collected on Students in School is used for Educational Purposes. Big data and other technological innovations, including new online course platforms that provide students real time feedback, promise to transform education by personalizing learning. At the same time, the federal government must ensure educational data linked to individual students gathered in school is used for educational purposes, and protect students against their data being shared or used inappropriately.
Expand Technical Expertise to Stop Discrimination. The detailed personal profiles held about many consumers, combined with automated, algorithm-driven decision-making, could lead—intentionally or inadvertently—to discriminatory outcomes, or what some are already calling “digital redlining.” The federal government’s lead civil rights and consumer protection agencies should expand their technical expertise to be able to identify practices and outcomes facilitated by big data analytics that have a discriminatory impact on protected classes, and develop a plan for investigating and resolving violations of law.
Amend the Electronic Communications Privacy Act. The laws that govern protections afforded to our communications were written before email, the internet, and cloud computing came into wide use. Congress should amend ECPA to ensure the standard of protection for online, digital content is consistent with that afforded in the physical world—including by removing archaic distinctions between email left unread or over a certain age.
We also identify several broader areas ripe for further study, debate, and public engagement that, collectively, we hope will spark a national conversation about how to harness big data for the public good. We conclude that we must find a way to preserve our privacy values in both the domestic and international marketplace. We urgently need to build capacity in the federal government to identify and prevent new modes of discrimination that could be enabled by big data. We must ensure that law enforcement agencies using big data technologies do so responsibly, and that our fundamental privacy rights remain protected. Finally, we recognize that data is a valuable public resource, and call for continuing the Administration’s efforts to open more government data sources and make investments in research and technology.
While big data presents new challenges, it also presents immense opportunities to improve lives, and the United States is perhaps better suited to lead this conversation than any other nation on earth. Our innovative spirit, technological know-how, and deep commitment to values of privacy, fairness, non-discrimination, and self-determination will help us harness the benefits of the big data revolution and encourage the free flow of information while working with our international partners to protect personal privacy. This review is but one piece of that effort, and we hope it spurs a conversation about big data across the country and around the world.
Read the Big Data Report.
See the fact sheet from today’s announcement.