Big Talk about Big Data: Discourses of ‘Evidence’ and Data in British Civil Society


From the Digital Economy “Communities and Culture” Network: “The term ‘Big Data’ carries a great deal of currency in business and academic spheres. Data and their subsequent analysis are obviously not new. ‘Bigness’ in this context often refers to three characteristics that differentiate it from so-called ‘small’ data: volume, variety, and velocity. These three attributes of ‘bigness’, promising to open novel, macro-level perspectives on complex issues (Boyd and Crawford 2011), led enthusiasts like Chris Anderson to claim that ‘with enough data, the numbers speak for themselves’. But is this actually the case? Critical voices like Manovich (2011) argue that data never exist in ‘raw’ forms but are rather influenced by humans who—whether intentionally or not—select and construct them in certain ways.
These debates about data are relevant to wider discussions about digital change in society because they point to a more general concern about the potential of all sizes of data to selectively reveal dimensions of social phenomena on which decisions or policies are based. Crucially, if data generation and analysis are not entirely neutral but rather carry assumptions about what is ‘worthwhile’ or ‘acceptable’ to measure in the first place, then it raises critical questions of whether preferences for certain types of research—particularly work conducted under the auspices of a Big Data ‘brand’—reflect coherent sets of values and worldviews. What assumptions underpin preferences for ‘evidence-based’ research based on data? What qualities does such a phrase signify or confer on research? Which ‘sizes’ of data qualify as ‘evidence’ in the first place, or, to play on Anderson’s words, what kinds of data are allowed to speak for themselves in the realms of policy, media, and advocacy?
Hosted at the ESRC Centre on Migration, Policy, and Society (COMPAS) and The Migration Observatory at the University of Oxford, this project critically interrogates the values that inform demands by civil society organisations for research that is ‘data-driven’ or ‘evidence-based’. Specifically, it aims to document the extent to which perceived advantages of data ‘bigness’ (volume, variety, and velocity) influence these demands.”
Read the report.

Smart Inclusive Cities: How New Apps, Big Data, and Collaborative Technologies Are Transforming Immigrant Integration


New report by Meghan Benton for the Migration Policy Institute: “The spread of smartphones—cellphones with high-speed Internet access and geolocation technology—is transforming urban life. While many smartphone apps are largely about convenience, policymakers are beginning to explore their potential to address social challenges from disaster response to public health. And cities, in North America and Europe alike, are in the vanguard in exploring creative uses for these apps, including how to improve engagement.
For disadvantaged and diverse populations, accessing city services through a smartphone can help overcome language or literacy barriers and thus increase interactions with city officials. For those with language needs, smartphones allow language training to be accessed anywhere and at any time. More broadly, cities have begun mining the rich datasets that smartphones collect, to help attune services to the needs of their whole population. A new crop of social and civic apps offer new tools to penetrate hard-to-reach populations, including newly arrived and transient groups.
While these digital developments offer promising opportunities for immigrant integration efforts, smartphone apps’ potential to address social problems should not be overstated. In spite of potential shortcomings, since immigrant integration requires a multipronged policy response, any additional tools—especially inexpensive ones—should be examined.
This report explores the kinds of opportunities smartphones and apps are creating for the immigrant integration field. It provides a first look at the opportunities and tradeoffs that smartphones and emerging technologies offer for immigrant integration, and how they might deepen—or weaken—city residents’ sense of belonging…”

An Infographic That Maps 2,000 Years of Cultural History in 5 Minutes


In Wired: “…Last week in the journal Science, the researchers (led by University of Texas art historian Maximilian Schich) published a study that looked at the cultural history of Europe and North America by mapping the births and deaths of more than 150,000 notable figures—including everyone from Leonardo da Vinci to Ernest Hemingway. That data was turned into an amazing animated infographic that looks strikingly similar to the illustrated flight paths you find in the back of your inflight magazine. Blue dots indicate a birth; red ones indicate a death.

The researchers used data from Freebase, which touts itself as a ‘community curated database of people, places and things.’ This gives the data a strong Western bent. You’ll notice that many parts of Asia and the Middle East (not to mention pre-colonized North America) are almost wholly ignored in this video. But to be fair, the abstract did acknowledge that the study was focused mainly on Europe and North America.
Still, mapping the geography of cultural migration does give you some insight into how the kind of culture we value has shifted over the centuries. It’s also a novel lens through which to view our more general history, as those migration trends likely illuminate bigger historical happenings like wars and the building of cross-country infrastructure.”

The Data Act's unexpected benefit


Adam Mazmanian at FCW: “The Digital Accountability and Transparency Act sets an aggressive schedule for creating governmentwide financial standards. The first challenge belongs to the Treasury Department and the Office of Management and Budget. They must come up with a set of common data elements for financial information that will cover just about everything the government spends money on and every entity it pays in order to give oversight bodies and government watchdogs a top-down view of federal spending from appropriation to expenditure. Those data elements are scheduled for completion by May 2015, one year after the act’s passage.
Two years after those standards are in place, agencies will be required to report their financial information following Data Act guidelines. The government currently supports more than 150 financial management systems but lacks a common data dictionary, so there are not necessarily agreed-upon definitions of how to classify and track government programs and types of expenditures.
“As far as systems today and how we can get there, they don’t necessarily map in the way that the act described,” U.S. CIO Steven VanRoekel said in June. “It’s going to be a journey to get to where the act aspires for us to be.”
However, an Obama administration initiative to encourage agencies to share financial services could be part of the solution. In May, OMB and Treasury designated four financial shared-services providers for government agencies: the Agriculture Department’s National Finance Center, the Interior Department’s Interior Business Center, the Transportation Department’s Enterprise Services Center and Treasury’s Administrative Resource Center.
There are some synergies between shared services and data standardization, but shared financial services alone will not guarantee Data Act compliance, especially considering that the government expects the migration to take 10 to 15 years. Nevertheless, the discipline required under the Data Act could boost agency efforts to prepare financial data when it comes time to move to a shared service….”
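The practical problem described above, many financial systems with no common data dictionary, can be illustrated with a minimal Python sketch. It assumes two hypothetical agency systems that record the same kind of spending under different field names, program codes and units. Every field name, code and the `SpendingRecord` layout below is invented for illustration; none of it is the actual set of Data Act data elements defined by Treasury and OMB.

```python
from dataclasses import dataclass


# Hypothetical "common data elements"; NOT the real Data Act element list.
@dataclass(frozen=True)
class SpendingRecord:
    awardee: str
    program: str
    amount_usd: float


# Per-system mapping from common element -> that system's own field name (hypothetical).
FIELD_MAPS = {
    "agency_a": {"awardee": "vendor_name", "program": "prog_cd", "amount": "obligation"},
    "agency_b": {"awardee": "recipient", "program": "activity", "amount": "amt_cents"},
}

# Each system's local program codes translated into one shared vocabulary (hypothetical).
PROGRAM_CODES = {
    "agency_a": {"HSG-01": "housing_assistance"},
    "agency_b": {"HOUSING": "housing_assistance"},
}

# Divisor that converts each system's stored amount into dollars.
AMOUNT_DIVISOR = {"agency_a": 1, "agency_b": 100}  # agency_b stores cents


def normalize(system: str, raw: dict[str, str]) -> SpendingRecord:
    """Translate one raw record from a given system into the common format."""
    fields = FIELD_MAPS[system]
    code = raw[fields["program"]]
    program = PROGRAM_CODES[system].get(code)
    if program is None:
        # Without an agreed definition there is no safe way to classify this record.
        raise ValueError(f"{system}: no common definition for program code {code!r}")
    return SpendingRecord(
        awardee=raw[fields["awardee"]].strip().upper(),
        program=program,
        amount_usd=float(raw[fields["amount"]]) / AMOUNT_DIVISOR[system],
    )


def total_by_program(records: list[SpendingRecord]) -> dict[str, float]:
    """Top-down view: total spending per common program category."""
    totals: dict[str, float] = {}
    for rec in records:
        totals[rec.program] = totals.get(rec.program, 0.0) + rec.amount_usd
    return totals


if __name__ == "__main__":
    a = normalize("agency_a", {"vendor_name": " Acme Builders ", "prog_cd": "HSG-01", "obligation": "2500.00"})
    b = normalize("agency_b", {"recipient": "acme builders", "activity": "HOUSING", "amt_cents": "150000"})
    print(a.awardee == b.awardee)    # True: both resolve to "ACME BUILDERS"
    print(total_by_program([a, b]))  # {'housing_assistance': 4000.0}
```

One design choice worth noting: an unmapped program code raises an error instead of being silently dropped, since a record that cannot be classified under an agreed definition cannot contribute to a trustworthy top-down view of spending.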

The Emerging Science of Computational Anthropology


Emerging Technology From the arXiv: “The increasing availability of big data from mobile phones and location-based apps has triggered a revolution in the understanding of human mobility patterns. This data shows the ebb and flow of the daily commute in and out of cities, the pattern of travel around the world and even how disease can spread through cities via their transport systems.
So there is considerable interest in looking more closely at human mobility patterns to see just how well they can be predicted and how these predictions might be used in everything from disease control and city planning to traffic forecasting and location-based advertising.
Today we get an insight into the kind of detail that is possible thanks to the work of Zimo Yang at Microsoft Research in Beijing and a few pals. These guys start with the hypothesis that people who live in a city have a pattern of mobility that is significantly different from that of people who are merely visiting. By dividing travelers into locals and non-locals, they dramatically improve their ability to predict where people are likely to visit.
Zimo and co begin with data from a Chinese location-based social network called Jiepang.com. This is similar to Foursquare in the US. It allows users to record the places they visit and to connect with friends at these locations and to find others with similar interests.
The data points are known as check-ins, and the team downloaded more than 1.3 million of them from five big cities in China: Beijing, Shanghai, Nanjing, Chengdu and Hong Kong. They then used 90 per cent of the data to train their algorithms and the remaining 10 per cent to test them. The Jiepang data includes the users’ hometowns, so it’s easy to see whether an individual is checking in in their own city or somewhere else.
The question that Zimo and co want to answer is the following: given a particular user and their current location, where are they most likely to visit in the near future? In practice, that means analysing the user’s data, such as their hometown and the locations recently visited, and coming up with a list of other locations that they are likely to visit based on the type of people who visited these locations in the past.
Zimo and co used their training dataset to learn the mobility pattern of locals and non-locals and the popularity of the locations they visited. The team then applied this to the test dataset to see whether their algorithm was able to predict where locals and non-locals were likely to visit.
They found that their best results came from analysing the pattern of behaviour of a particular individual and estimating the extent to which this person behaves like a local. That produced a weighting called the indigenization coefficient that the researchers could then use to determine the mobility patterns this person was likely to follow in future.
In fact, Zimo and co say they can spot non-locals in this way without even knowing their home location. “Because non-natives tend to visit popular locations, like the Imperial Palace in Beijing and the Bund in Shanghai, while natives usually check in around their homes and workplaces,” they add.
The team say this approach considerably outperforms the mixed algorithms that use only individual visiting history and location popularity. “To our surprise, a hybrid algorithm weighted by the indigenization coefficients outperforms the mixed algorithm accounting for additional demographical information.”
It’s easy to imagine how such an algorithm might be useful for businesses that want to target certain types of travelers or local people. But there is a more interesting application too.
Zimo and co say that it is possible to monitor the way an individual’s mobility patterns change over time. So if a person moves to a new city, it should be possible to see how long it takes them to settle in.
One way of measuring this is in their mobility patterns: whether they are more like those of a local or a non-local. “We may be able to estimate whether a non-native person will behave like a native person after a time period and if so, how long in average a person takes to become a native-like one,” say Zimo and co.
That could have a fascinating impact on the way anthropologists study migration and the way immigrants become part of a local community. This is computational anthropology, a science that is clearly in its early stages but one that has huge potential for the future.”
Ref: arxiv.org/abs/1405.7769: Indigenization of Urban Mobility
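The article describes the method only in outline, so the following Python sketch illustrates the idea under stated assumptions; it is not the algorithm from the paper. It treats check-ins as plain `(user, location)` pairs, uses the 90/10 train/test split mentioned above, and substitutes a simple hypothetical proxy for the indigenization coefficient: one minus the share of a user's check-ins that fall at the city's most popular locations, following the observation that non-natives tend to visit landmarks. The prediction step blends the user's own visit history (weighted by that coefficient) with overall location popularity, and `months_to_native` shows how the same proxy could track a newcomer settling in. All function names, the `top_k` cut-off and the 0.5 threshold are assumptions.

```python
import random
from collections import Counter

# A check-in is a (user_id, location_id) pair; timestamps are ignored in this sketch.
CheckIn = tuple[str, str]


def train_test_split(checkins: list[CheckIn], train_frac: float = 0.9, seed: int = 0):
    """Shuffle check-ins, then keep train_frac for training and the rest for testing."""
    shuffled = list(checkins)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def location_popularity(checkins: list[CheckIn]) -> dict[str, float]:
    """Share of all given check-ins that happened at each location."""
    counts = Counter(loc for _, loc in checkins)
    total = sum(counts.values())
    return {loc: n / total for loc, n in counts.items()}


def indigenization_coefficient(user_locs: list[str], popularity: dict[str, float], top_k: int = 2) -> float:
    """Hypothetical proxy, not the paper's definition: 1 minus the share of the
    user's check-ins at the top_k most popular locations. Near 0 = visitor-like,
    near 1 = local-like. No hometown information is needed."""
    if not user_locs:
        return 0.0
    hotspots = set(sorted(popularity, key=lambda loc: (-popularity[loc], loc))[:top_k])
    at_hotspots = sum(1 for loc in user_locs if loc in hotspots)
    return 1.0 - at_hotspots / len(user_locs)


def predict(user_locs: list[str], popularity: dict[str, float], k: int = 3) -> list[str]:
    """Rank locations by alpha * personal visit share + (1 - alpha) * popularity."""
    alpha = indigenization_coefficient(user_locs, popularity)
    history = Counter(user_locs)
    n = len(user_locs) or 1
    candidates = set(popularity) | set(history)
    scores = {
        loc: alpha * history[loc] / n + (1 - alpha) * popularity.get(loc, 0.0)
        for loc in candidates
    }
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(candidates, key=lambda loc: (-scores[loc], loc))[:k]


def months_to_native(monthly_locs: list[list[str]], popularity: dict[str, float], threshold: float = 0.5):
    """Index of the first month whose coefficient reaches threshold, or None."""
    for month, locs in enumerate(monthly_locs):
        if locs and indigenization_coefficient(locs, popularity) >= threshold:
            return month
    return None


if __name__ == "__main__":
    # Toy city in which two landmarks dominate the check-ins.
    city = ([("v", "palace")] * 10 + [("v", "bund")] * 8
            + [("l", "cafe")] * 3 + [("l", "office")] * 2 + [("l", "gym")])
    train, test = train_test_split(city)
    print(len(train), len(test))  # 21 3
    # For a deterministic demo, popularity uses all check-ins; in practice use `train`.
    pop = location_popularity(city)
    local = ["cafe", "cafe", "office", "gym", "palace"]
    visitor = ["palace", "bund", "palace", "cafe"]
    print(indigenization_coefficient(local, pop), indigenization_coefficient(visitor, pop))  # 0.8 0.25
    print(predict(local, pop))    # ['cafe', 'palace', 'office']
    print(predict(visitor, pop))  # ['palace', 'bund', 'cafe']
    newcomer = [["palace", "bund", "palace"],
                ["palace", "bund", "cafe", "palace"],
                ["cafe", "office", "gym", "palace"]]
    print(months_to_native(newcomer, pop))  # 2
```

The weighting means a local-like user's predictions lean on their own haunts, while a visitor-like user's predictions lean on the city's landmarks, which is the intuition the article attributes to the hybrid approach.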

Focus on Migration: A tech ‘wiki’ site could improve lives


Max Martin in SciDev: “Wikipedia is probably the best example of a website that allows users to share and edit information in real time. But several other sites based on the ‘wiki’ model provide a sharing platform specifically for technologies that could help improve lives in the developing world.
One such site, Appropedia, is aimed at collaborative solutions in sustainability, appropriate technology and poverty reduction. Appropedia has had 50 million hits since its 2006 inception and is getting a facelift that will allow it to reach more people.
Such a one-stop information point offers tremendous scope for informing people on the move about green, low-cost and locally owned technologies. A website like Appropedia could function as a clearing house for information on technologies that could make life easier for migrants who are forced to travel and live rough in poor settings — as long as the information is reliable.
For example, displaced people building new homes after a disaster has struck face many choices over the materials they use, as I’ve written previously. The wiki site could be a place for them to swap experiences and learn what has worked for others in different settings.
It could also host advice for people on the move about affordable transport, healthcare and humanitarian aid locations, plus tips for staying safe while travelling in unfamiliar territory and what to pack when camping out in the open.
It could also help channel relevant innovations from other settings to migrants. For example, some villagers in flood-prone areas of Bangladesh grow crops on ‘floating gardens’ made using bamboo-pole rafts lined with soil, water hyacinths and cow dung. [1] A local group in India’s frequently flooded Bihar state has shown how to make a life jacket using just plastic bottles, sticky tape, fast-drying cotton and thread. [2] Both of these concepts could be useful for other people affected by floods, and a dedicated wiki could help disseminate know-how and review the technologies’ safety, reliability and suitability for different locations.
Of course, an information wiki for migrants must offer reliable information. This could be achieved by involving a specialist agency or a consortium of humanitarian groups who could invite experts and local practitioners to review and edit posts.”

AU: Govt finds one third of open data was "junk"


IT News: “The number of datasets available on the Government’s open data website has slimmed by more than half after the agency discovered one third of the datasets were junk.
After its official launch in 2011, data.gov.au grew to hold 1200 datasets from government agencies for public consumption.
In July this year the Department of Finance migrated the portal to a new open source platform – the Open Knowledge Foundation CKAN platform – for greater ease of use and publishing ability.
Since July the number of datasets fell from 1200 to 500.
Australian Government CTO John Sheridan said in his blog late yesterday the agency had needed to review the 1200 datasets as a result of the CKAN migration, and discovered a significant amount of them were junk.
“We unfortunately found that a third of the ‘datasets’ were just links to webpages or files that either didn’t exist anymore, or redirected somewhere not useful to genuine seekers of data,” Sheridan said.
“In the second instance, the original 1200 number included each individual file. On the new platform, a dataset may have multiple files. In one case we have a dataset with 200 individual files where before it was counted as 200 datasets.”
The number of datasets following the clean out now sits at 529. Around 123 government bodies contributed data to the portal.
Sheridan said the number was still too low.
“A lot of momentum has built around open data in Australia, including within governments around the country and we are pleased to report that a growing number of federal agencies are looking at how they can better publish data to be more efficient, improve policy development and analysis, deliver mobile services and support greater transparency and public innovation,” he said….
The Federal Government’s approach to open data has previously been criticised as “patchy” and slow, due in part to several shortcomings in the data.gov.au website as well as slow progress in agencies adopting an open approach by default.
The Australian Information Commissioner’s February report on open data in government outlined the manual uploading and updating of datasets, lack of automated entry for metadata and a lack of specific search functions within data.gov.au as obstacles affecting the efforts pushing a whole-of-government approach to open data.
The introduction of the new CKAN platform is expected to go some way to addressing the highlighted concerns.”
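The two clean-up steps Sheridan describes, dropping entries whose links no longer work and counting a dataset once however many files it holds, can be sketched in Python. In CKAN terminology a dataset (a "package") can contain several files ("resources"). The `CatalogueEntry` record and its `link_ok` flag are hypothetical; the flag stands in for the result of a separate link check, which is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """One file-level entry from an old catalogue (hypothetical structure)."""
    dataset_id: str  # key identifying which dataset the file belongs to
    url: str
    link_ok: bool    # assumed result of an earlier link check; fetching is not shown


def group_into_datasets(entries: list[CatalogueEntry]) -> dict[str, list[str]]:
    """Drop dead links, then group the remaining files under their dataset."""
    datasets: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        if entry.link_ok:
            datasets[entry.dataset_id].append(entry.url)
    return dict(datasets)


if __name__ == "__main__":
    # One dataset published as 200 separate files, plus three single-file entries.
    entries = [CatalogueEntry("budget", f"https://example.org/budget/{i}.csv", True) for i in range(200)]
    entries += [
        CatalogueEntry("parks", "https://example.org/parks.csv", True),
        CatalogueEntry("old-report", "https://example.org/gone.html", False),
        CatalogueEntry("redirect", "https://example.org/home", False),
    ]
    datasets = group_into_datasets(entries)
    print(len(entries))   # 203: the old count, one "dataset" per file
    print(len(datasets))  # 2: 'budget' (200 files) and 'parks'
```

A dataset whose every link is dead disappears entirely, while a dataset with only some dead files keeps its live ones; both effects push the headline count down without any real data being lost.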