Introduction to Open Geospatial Consortium (OGC) Standards


Joseph McGenn; Dominic Taylor; Gail Millin-Chalabi (Editor); Kamie Kitmitto (Editor) at Jorum: “The onset of the Information Age and Digital Revolution has created a knowledge-based society where the internet acts as a global platform for the sharing of information. In a geospatial context, this has resulted in an advancement of techniques in how we acquire, study and share geographic information, and with the development of Geographic Information Systems (GIS), locational services, and online mapping, spatial data has never been more abundant. The transformation to this digital era has not been without its drawbacks, and a forty-year lack of common policies for data sharing has resulted in compatibility issues and great diversity in how software and data are delivered. Essential to the sharing of spatial information is interoperability, where different programmes can exchange and open data from various sources seamlessly. Applying universal standards across a sector provides interoperable solutions. The Open Geospatial Consortium (OGC) facilitates interoperability by providing open standard specifications which organisations can use to develop geospatial software. This means that two separate pieces of software or platforms, if developed using open standard specifications, can exchange data without compatibility issues. By defining these specifications and standards, the OGC plays a crucial role in how geospatial information is shared on a global scale. Standard specifications are the invisible glue that holds information systems together, without which data sharing would generally be an arduous task. On some level they keep the world spinning, and this course will instil some appreciation for them from a geospatial perspective. This course introduces users to the OGC and the common standards in the context of geoportals and mapping solutions.
These standards are defined and explored using a number of platforms and interoperability is demonstrated in a practical sense. Finally, users will implement these standards to develop their own platforms for sharing geospatial information.”
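The interoperability these specifications provide is concrete: OGC web services such as the Web Map Service (WMS) define a plain key-value HTTP interface that any conformant client can target. As a minimal sketch (the endpoint URL and layer name here are hypothetical), a WMS 1.1.1 GetMap request can be assembled from the standard's required parameters using nothing but the Python standard library:

```python
from urllib.parse import urlencode

def build_getmap_url(endpoint, layers, bbox, size, srs="EPSG:4326",
                     fmt="image/png", version="1.1.1"):
    """Assemble a WMS 1.1.1 GetMap request as a key-value-pair URL."""
    params = {
        "SERVICE": "WMS",
        "VERSION": version,
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),              # comma-separated layer names
        "STYLES": "",                            # default styles
        "SRS": srs,                              # spatial reference system (1.1.1 uses SRS, 1.3.0 uses CRS)
        "BBOX": ",".join(str(c) for c in bbox),  # minx,miny,maxx,maxy in SRS units
        "WIDTH": str(size[0]),
        "HEIGHT": str(size[1]),
        "FORMAT": fmt,
    }
    return endpoint + "?" + urlencode(params)

url = build_getmap_url("https://example.org/wms", ["coastlines"],
                       (-180, -90, 180, 90), (600, 300))
```

Because every conformant server accepts the same parameter set, the same client code works against any WMS endpoint — which is precisely the interoperability the OGC specifications are designed to guarantee.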

Brief survey of crowdsourcing for data mining


Paper by Guo Xintong, Wang Hongzhi, Yangqiu Song, and Gao Hong in Expert Systems with Applications: “Crowdsourcing allows large-scale and flexible invocation of human input for data gathering and analysis, which introduces a new paradigm of data mining process. Traditional data mining methods often require experts in analytic domains to annotate the data. However, this is expensive and usually takes a long time. Crowdsourcing enables the use of heterogeneous background knowledge from volunteers and distributes the annotation process across small portions of effort from different contributors. This paper reviews the state of the art of crowdsourcing for data mining in recent years. We first review the challenges and opportunities of data mining tasks using crowdsourcing, and summarize their framework. Then we highlight several exemplar works in each component of the framework, including question design, data mining and quality control. Finally, we conclude with the limitations of crowdsourcing for data mining and suggest related areas for future research.”
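The quality-control component the authors mention is commonly implemented by collecting redundant annotations for each item and aggregating them. A minimal sketch of the classic approach, simple majority voting (the items and labels here are invented for illustration):

```python
from collections import Counter

def majority_vote(labels_per_item):
    """Aggregate redundant crowd labels per item by simple majority vote."""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_per_item.items()}

crowd = {
    "img1": ["cat", "cat", "dog"],   # two of three workers say cat
    "img2": ["dog", "dog", "dog"],   # unanimous
}
consensus = majority_vote(crowd)
```

More sophisticated schemes weight each worker by an estimated reliability, but the redundancy-plus-aggregation pattern is the same.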

New Commerce Department report explores huge benefits, low cost of government data


Mark Doms, Under Secretary for Economic Affairs, in a blog: “Today we are pleased to roll out an important new Commerce Department report on government data. “Fostering Innovation, Creating Jobs, Driving Better Decisions: The Value of Government Data” arrives as our society increasingly focuses on how the intelligent use of data can make our businesses more competitive, our governments smarter, and our citizens better informed.

And when it comes to data, as the Under Secretary for Economic Affairs, I have a special appreciation for the Commerce Department’s two preeminent statistical agencies, the Census Bureau and the Bureau of Economic Analysis. These agencies inform us on how our $17 trillion economy is evolving and how our population (318 million and counting) is changing, data critical to our country. Although “Big Data” is all the rage these days, the government has been in this business for a long time: the first Decennial Census was in 1790, gathering information on close to four million people, a huge dataset for its day, and not too shabby by today’s standards as well.

Just how valuable is the data we provide? Our report seeks to answer this question by exploring the range of federal statistics and how they are applied in decision-making. Examples of our data include gross domestic product, employment, consumer prices, corporate profits, retail sales, agricultural supply and demand, population, international trade and much more.

Clearly, as shown in the report, the value of this information to our society far exceeds its cost – and not just because the price tag is shockingly low: three cents, per person, per day. Federal statistics guide trillions of dollars in annual investments at an average annual cost of $3.7 billion: just 0.02 percent of our $17 trillion economy covers the massive amount of data collection, processing and dissemination. With a statistical system that is comprehensive, consistent, confidential, relevant and accessible, the federal government is uniquely positioned to provide a wide range of statistics that complement the vast and growing sources of private sector data.

Our federally collected information is frequently “invisible,” because attribution is not required. But it flows daily into myriad commercial products and services. Today’s report identifies the industries that intensively use our data and provides a rough estimate of the size of this sector. The lower-bound estimate suggests government statistics help private firms generate revenues of at least $24 billion annually – more than six times what we spend for the data. The upper-bound estimate suggests annual revenues of $221 billion!
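The report's headline ratios are easy to verify from the figures quoted above:

```python
cost = 3.7e9           # annual cost of federal statistics (dollars)
gdp = 17e12            # size of the U.S. economy (dollars)
population = 318e6     # U.S. population

share_of_gdp = cost / gdp                                  # ~0.0002, i.e. about 0.02%
cents_per_person_per_day = cost / population / 365 * 100   # ~3.2 cents

revenue_low, revenue_high = 24e9, 221e9   # lower- and upper-bound industry revenues
multiple = revenue_low / cost             # ~6.5: "more than six times" the data budget
```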

This report takes a first crack at putting an actual dollars and cents value to government data. We’ve learned a lot from this initial study, and look forward to homing in even further on that figure in our next report.”

Forget The Wisdom of Crowds; Neurobiologists Reveal The Wisdom Of The Confident


Emerging Technology From the arXiv: “Way back in 1906, the English polymath Francis Galton visited a country fair in which 800 people took part in a contest to guess the weight of a slaughtered ox. After the fair, he collected the guesses and calculated their average which turned out to be 1208 pounds. To Galton’s surprise, this was within 1 per cent of the true weight of 1198 pounds.
This is one of the earliest examples of a phenomenon that has come to be known as the wisdom of the crowd. The idea is that the collective opinion of a group of individuals can be better than a single expert opinion.
This phenomenon is commonplace today on websites such as Reddit in which users vote on the importance of particular stories and the most popular are given greater prominence.
However, anyone familiar with Reddit will know that the collective opinion isn’t always wise. In recent years, researchers have spent a significant amount of time and effort teasing apart the factors that make crowds stupid. One important factor turns out to be the way members of a crowd influence each other.
It turns out that if a crowd offers a wide range of independent estimates, then it is more likely to be wise. But if members of the crowd are influenced in the same way, for example by each other or by some external factor, then they tend to converge on a biased estimate. In this case, the crowd is likely to be stupid.
Today, Gabriel Madirolas and Gonzalo De Polavieja at the Cajal Institute in Madrid, Spain, say they found a way to analyse the answers from a crowd which allows them to remove this kind of bias and so settle on a wiser answer.
The theory behind their work is straightforward. Their idea is that some people are more strongly influenced by additional information than others who are confident in their own opinion. So identifying these more strongly influenced people and separating them from the independent thinkers creates two different groups. The group of independent thinkers is then more likely to give a wise estimate. Or put another way, ignore the wisdom of the crowd in favour of the wisdom of the confident.
So how do you identify confident thinkers? Madirolas and De Polavieja began by studying the data from an earlier set of experiments in which groups of people were given tasks such as estimating the length of the border between Switzerland and Italy, the correct answer being 734 kilometres.
After one task, some groups were shown the combined estimates of other groups before beginning their second task. These experiments clearly showed how this information biased the answers from these groups in their second tasks.
Madirolas and De Polavieja then set about creating a mathematical model of how individuals incorporate this extra information. They assume that each person comes to a final estimate based on two pieces of information: first, their own independent estimate of the length of the border and second, the earlier combined estimate revealed to the group. Each individual decides on a final estimate depending on the weighting they give to each piece of information.
Those people who are heavily biased give a strong weighting to the additional information whereas people who are confident in their own estimate give a small or zero weighting to the additional information.
Madirolas and De Polavieja then take each person’s behaviour and fit it to this model to reveal how independent their thinking has been.
That allows them to divide the groups into independent thinkers and biased thinkers. Taking the collective opinion of the independent thinkers then gives a much more accurate estimate of the length of the border.
“Our results show that, while a simple operation like the mean, median or geometric mean of a group may not allow groups to make good estimations, a more complex operation taking into account individuality in the social dynamics can lead to a better collective intelligence,” they say.
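The weighting model described above is simple enough to simulate. In this sketch (the social cue, the weights, and the group sizes are invented for illustration; only the 734 km border length comes from the text), each person's final answer mixes a private estimate with a biased group estimate, the weight is recovered from the two answers, and the low-weight "independent thinkers" are then averaged separately:

```python
import random

random.seed(0)
TRUE_LENGTH = 734      # km, Switzerland-Italy border (from the text)
SOCIAL_CUE = 1000      # a biased earlier group estimate (assumed)
N = 100

people = []
for i in range(N):
    private = random.gauss(TRUE_LENGTH, 100)             # first, independent answer
    w = 0.0 if i % 2 == 0 else random.uniform(0.4, 0.9)  # weight given to the cue
    final = (1 - w) * private + w * SOCIAL_CUE           # second, revised answer
    people.append((private, final))

# Fit each person's weight from their two answers, as in the model:
# final = (1 - w) * private + w * cue  =>  w = (final - private) / (cue - private)
def fitted_weight(private, final, cue=SOCIAL_CUE):
    return (final - private) / (cue - private)

independents = [f for p, f in people if fitted_weight(p, f) < 0.2]
all_finals = [f for _, f in people]

err_crowd = abs(sum(all_finals) / len(all_finals) - TRUE_LENGTH)
err_confident = abs(sum(independents) / len(independents) - TRUE_LENGTH)
# err_confident comes out much smaller than err_crowd: averaging only the
# independent thinkers removes the bias the social cue introduced
```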

Ref: arxiv.org/abs/1406.7578 : Wisdom of the Confident: Using Social Interactions to Eliminate the Bias in Wisdom of the Crowds”

Incentivizing Peer Review


In Wired, on “The Last Obstacle for Open Access Science”: “The Galapagos Islands’ Charles Darwin Foundation runs on an annual operating budget of about $3.5 million. With this money, the center conducts conservation research, enacts species-saving interventions, and provides educational resources about the fragile island ecosystems. As a science-based enterprise whose work would benefit greatly from the latest research findings on ecological management, evolution, and invasive species, there’s one glaring hole in the Foundation’s budget: the $800,000 it would cost per year for subscriptions to leading academic journals.
According to Richard Price, founder and CEO of Academia.edu, this episode is symptomatic of a larger problem. “A lot of research centers” – NGOs, academic institutions in the developing world – “are just out in the cold as far as access to top journals is concerned,” says Price. “Research is being commoditized, and it’s just another aspect of the digital divide between the haves and have-nots.”
 
Academia.edu is a key player in the movement toward open access scientific publishing, with over 11 million participants who have uploaded nearly 3 million scientific papers to the site. It’s easy to understand Price’s frustration with the current model, in which academics donate their time to review articles, pay for the right to publish articles, and pay for access to articles. According to Price, journals charge an average of $4000 per article: $1500 for production costs (reformatting, designing), $1500 to orchestrate peer review (labor costs for hiring editors, administrators), and $1000 of profit.
“If there were no legacy in the scientific publishing industry, and we were looking at the best way to disseminate and view scientific results,” proposes Price, “things would look very different. Our vision is to build a complete replacement for scientific publishing,” one that would allow budget-constrained organizations like the CDF full access to information that directly impacts their work.
But getting to a sustainable new world order requires a thorough overhaul of the academic publishing industry. The alternative vision – of “open science” – has two key properties: the uninhibited sharing of research findings, and a new peer review system that incorporates the best of the scientific community’s feedback. Several groups have made progress on the former, but the latter has proven particularly difficult given the current incentive structure. The currency of scientific research is the number of papers you’ve published and their citation counts – the number of times other researchers have referred to your work in their own publications. The emphasis is on the creation of new knowledge – a worthy goal, to be sure – but substantial contributions to the quality, packaging, and contextualization of that knowledge in the form of peer review go largely unrecognized. As a result, researchers view their role as reviewers as a chore, a time-consuming task required to sustain the ecosystem of research dissemination.
“Several experiments in this space have tried to incorporate online comment systems,” explains Price, “and the result is that putting a comment box online and expecting high quality comments to flood in is just unrealistic. My preference is to come up with a system where you’re just as motivated to share your feedback on a paper as you are to share your own findings.” In order to make this lofty aim a reality, reviewers’ contributions would need to be recognized. “You need something more nuanced, and more qualitative,” says Price. “For example, maybe you gather reputation points from your community online.” Translating such metrics into tangible benefits up the food chain – hirings, tenure decisions, awards – is a broader community shift that will no doubt take time.
A more iterative peer review process could allow the community to better police faulty methods by crowdsourcing their evaluation. “90% of scientific studies are not reproducible,” claims Price, a problem exacerbated by the strong bias toward positive results. Journals may be unlikely to publish methodological refutations, but a flurry of well-supported comments attached to a paper online could convince the researchers to marshal more convincing evidence. Typically, this sort of feedback cycle takes years….”

U.S. Secretary of Commerce Penny Pritzker Announces Expansion and Enhancement of Commerce Data Programs


Press Release from the U.S. Secretary of Commerce: Department will hire first-ever Chief Data Officer

As “America’s Data Agency,” the Department of Commerce is prepared and well-positioned to foster the next phase in the open data revolution. In line with President Obama’s Year of Action, U.S. Secretary of Commerce Penny Pritzker today announced a series of steps taken to enhance and expand the data programs at the Department.
“Data is a key pillar of the Department’s “Open for Business Agenda,” and for the first time, we have made it a department-wide strategic priority,” said Secretary of Commerce Penny Pritzker. “No other department can rival the reach, depth and breadth of the Department of Commerce’s data programs. The Department of Commerce is working to unleash more of its data to strengthen the nation’s economic growth; make its data easier to access, understand, and use; and maximize the return of data investments for businesses, entrepreneurs, government, taxpayers, and communities.”
Secretary Pritzker made a number of major announcements today as a special guest speaker at the Environmental Systems Research Institute’s (Esri) User Conference in San Diego, California. She discussed the power and potential of open data, recognizing that data not only enable start-ups and entrepreneurs, move markets, and empower companies large and small, but also touch the lives of Americans every day.
In her remarks, Secretary Pritzker outlined new ways the Department of Commerce is working to unlock the potential of even more open data to make government smarter, including the following:
Chief Data Officer
Today, Secretary Pritzker announced the Commerce Department will hire its first-ever Chief Data Officer. This leader will be responsible for developing and implementing a vision for the future of the diverse data resources at Commerce.
The new Chief Data Officer will pull together a platform for all data sets; instigate and oversee improvements in data collection and dissemination; and ensure that data programs are coordinated, comprehensive, and strategic.
The Chief Data Officer will hold the key to unlocking more government data to help support a data-enabled Department and economy.
Trade Developer Portal
The International Trade Administration has launched its “Developer Portal,” an online toolkit that puts diverse sets of trade and investment data in a single place, making it easier for the business community to use and to better tap into the 95 percent of the world’s customers who live overseas.
In creating this portal, the Commerce Department is making its data public to software developers, giving them access to authoritative information on U.S. exports and international trade to help U.S. businesses export and expand their operations in overseas markets. The developer community will be able to integrate the data into applications and mashups to help U.S. business owners compete abroad while also creating more jobs here at home.
Data Advisory Council
Open data requires open dialogue. To facilitate this, the Commerce Department is creating a data advisory council, composed of 15 private-sector leaders who will advise the Department on the best use of government data.
This new advisory council will help Commerce maximize the value of its data by:

  • discovering how to deliver data in more usable, timely, and accessible ways;
  • improving how data is utilized and shared to make businesses and governments more responsive, cost-effective, and efficient;
  • better anticipating customers’ needs; and
  • collaborating with the private sector to develop new data products and services.

The council’s primary focus will be on the accessibility and usability of Commerce data, as well as the transformation of the Department’s supporting infrastructure and procedures for managing data.
These data leaders will represent a broad range of business interests—reflecting the wide range of scientific, statistical, and other data that the Department of Commerce produces. Members will serve two-year terms and will meet about four times a year. The advisory council will be housed within the Economics and Statistics Administration.”
Commerce data inform decisions that help make government smarter, keep businesses more competitive and better inform citizens about their own communities – with the potential to guide up to $3.3 trillion in investments in the United States each year.

Do We Choose Our Friends Because They Share Our Genes?


Rob Stein at NPR: “People often talk about how their friends feel like family. Well, there’s some new research out that suggests there’s more to that than just a feeling. People appear to be more like their friends genetically than they are to strangers, the research found.
“The striking thing here is that friends are actually significantly more similar to one another than we were expecting,” says James Fowler, a professor of medical genetics at the University of California, San Diego, who conducted the study with Nicholas A. Christakis, a social scientist at Yale University.
In fact, the study in Monday’s issue of the Proceedings of the National Academy of Sciences found that friends are as genetically similar as fourth cousins.
“It’s as if they shared a great- great- great-grandparent in common,” Fowler told Shots.
Some of the genes that friends were most likely to have in common involve smell. “We tend to smell things the same way that our friends do,” Fowler says. The study involved nearly 2,000 adults.
This suggests that as humans evolved, the ability to tolerate and be drawn to certain smells may have influenced where people hung out. Today we might call this the Starbucks effect.
“You may really love the smell of coffee. And you’re drawn to a place where other people have been drawn to who also love the smell of coffee,” Fowler says. “And so that might be the opportunity space for you to make friends. You’re all there together because you love coffee and you make friends because you all love coffee.”…”

Social Network Sites as a Mode to Collect Health Data: A Systematic Review


New paper by Fahdah Alshaikh, et al, in J Med Internet Research: “Background: To date, health research literature has focused on social network sites (SNS) either as tools to deliver health care, to study the effect of these networks on behavior, or to analyze Web health content. Less is known about the effectiveness of these sites as a method for collecting data for health research and the means to use such powerful tools in health research.
Objective: The objective of this study was to systematically review the available literature and explore the use of SNS as a mode of collecting data for health research. The review aims to answer four questions: Does health research employ SNS as method for collecting data? Is data quality affected by the mode of data collection? What types of participants were reached by SNS? What are the strengths and limitations of SNS?
Methods: The literature was reviewed systematically in March 2013 by searching the databases MEDLINE, Embase, and PsycINFO, using the Ovid and PubMed interface from 1996 to the third week of March 2013. The search results were examined by 2 reviewers, and exclusion, inclusion, and quality assessment were carried out based on a pre-set protocol.
Results: The inclusion criteria were met by 10 studies and results were analyzed descriptively to answer the review questions. There were four main results. (1) SNS have been used as a data collection tool by health researchers; all but 1 of the included studies were cross-sectional and quantitative. (2) Data quality indicators that were reported include response rate, cost, timeliness, missing data/completion rate, and validity. However, comparison was carried out only for response rate and cost, as it was unclear how other reported indicators were measured. (3) The populations most often reached were females and younger people. (4) All studies stated that SNS is an effective recruitment method but that it may introduce a sampling bias.
Conclusions: SNS has a role in health research, but we need to ascertain how to use it effectively without affecting the quality of research. The field of SNS is growing rapidly, and it is necessary to take advantage of the strengths of this tool and to avoid its limitations by effective research design. This review provides an important insight for scholars who plan to conduct research using SNS.”

i-teams


New Report and Site from NESTA: “Last year we were aware of the growing trend for governments to set up innovation teams, funds, and labs. Yet who are they? What do they do? And crucially, are they making any difference for their host and partner governments? Together Nesta and Bloomberg Philanthropies set out to answer these questions.
Drawing on an in-depth literature review, over 80 interviews, and surveys, i-teams tells the stories of 20 teams, units and funds, all established by government and all charged with making innovation happen. The i-teams studied are based in city, regional and national governments across six continents, and work across the spectrum of innovation – from focusing on incremental improvements to aiming for radical transformations.
The i-teams were all created in recognition that governments need dedicated structures, capabilities and space to allow innovation to happen. Beyond this, the i-teams work in different ways, drawing on a mix of methods, approaches, skills and resources, and tackling challenges ranging from reducing murder rates to improving educational attainment.
The i-teams report details the different ways in which these twenty i-teams operate, but to highlight a few:

  • The Behavioural Insights Team designs trials to test policy ideas, and achieved government savings of around 22 times the cost of the team in the first two years of operation.
  • MindLab is a Danish unit using human centred design as a way to identify problems and develop policy recommendations. One project helped businesses to find the right industry code for registrations and demonstrated a 21:1 return on investment in savings to government and businesses.
  • New Orleans Innovation Delivery Team is based in city hall and is tasked with solving mayoral challenges. Their public safety efforts led to a 20% reduction in the number of murders in 2013 compared to the previous year.
  • PS21 encourages staff to find better ways of improving Singaporean public services. An evaluation of PS21 estimated that over a year it generated 520,000 suggestions from staff, of which approximately 60 per cent were implemented, leading to savings of around £55 million.

Alongside the report we have launched theiteams.org, a living map to keep track of i-teams developing and emerging around the world, and to create a network of global government innovators. As James Anderson from Bloomberg Philanthropies says, “There’s no reason for every government to start its innovation efforts from scratch.” There is much we can learn from what is underway, what’s working and what’s not, to ensure all i-teams are using the most cutting edge techniques, methods and approaches….”

Crowdsourcing Parking Lot Occupancy using a Mobile Phone Application


Paper by Erfan Davami and Gita Sukthankar, available at ASE@360: “Participatory sensing is a specialized form of crowdsourcing for mobile devices in which the users act as sensors to report on local environmental conditions.

  • This poster describes the process of prototyping a mobile phone crowdsourcing app for monitoring parking availability on a large university campus.
  • We present a case study of how an agent-based urban model can be used to perform a sensitivity analysis of the comparative susceptibility of different data fusion paradigms to potentially troublesome user behaviors: (1) poor user enrollment, (2) infrequent usage, and (3) a preponderance of untrustworthy users.”
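Why a fusion paradigm's choice matters for the third behavior can be illustrated with a toy example (all numbers here are invented, not from the paper): naive averaging of occupancy reports is pulled off target by a couple of careless users, while a trust-weighted average that down-weights users with poor historical accuracy stays close to the truth:

```python
# Ten reports of the fraction of a lot that is occupied (true value 0.70):
# eight attentive users and two careless ones reporting junk values.
reports = [0.68, 0.72, 0.70, 0.69, 0.71, 0.73, 0.67, 0.70, 0.10, 0.95]
trust   = [0.9,  0.9,  0.9,  0.9,  0.9,  0.9,  0.9,  0.9,  0.1,  0.1]

# Paradigm 1: treat every report equally.
naive = sum(reports) / len(reports)                 # 0.665, dragged off target

# Paradigm 2: weight each report by the user's historical accuracy.
weighted = sum(r * t for r, t in zip(reports, trust)) / sum(trust)  # ~0.695

err_naive = abs(naive - 0.70)
err_weighted = abs(weighted - 0.70)   # an order of magnitude smaller
```

An agent-based model like the one in the poster essentially runs many such scenarios, sweeping enrollment, usage frequency, and the untrustworthy-user fraction to see how each fusion scheme degrades.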