Setting High and Compatible Standards


Laura Bacon at Omidyar Network: “…Standards enable interoperability, replicability, and efficiency. Airplane travel would be chaotic at best and deadly at worst if flights and air traffic control did not use common codes for call signs, flight numbers, location, date, and time. Trains that cross national borders need tracks built to a standard gauge, as evidenced by Spain’s experience in making its trains interoperable with the rest of the continent’s.

Standards matter in data collection and publication as well. This is especially true for those datasets that matter most to people’s lives, such as health, education, agriculture, and water. Disparate standards for basic category definitions like geography and organizations mean that data sources cannot be easily or cost-effectively analyzed for cross-comparison and decision making.

Compatible data standards that enable data to be ‘joined up’ would support more effective logging and use of immunization records, help control the spread of infectious disease, help educators prioritize spending based on the greatest needs, and help identify the beneficial owners of companies to ensure transparent and legal business transactions.

Data: More Valuable When Joined Up

Lots of effort, time, and money are poured into the generation and publication of open data. And while open data is valuable in itself, the biggest return on investment potentially comes from the inter-linkages among datasets. However, this return is very difficult to realize because of missing standards and building blocks (e.g., geodata, organizational identifiers, project identifiers) that would enable the joining up of data.
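
To make the idea of ‘joining up’ concrete, here is a minimal sketch (hypothetical datasets and identifiers, assuming the pandas library) of how a shared organizational identifier, one of the building blocks mentioned above, lets two separately published datasets be merged directly:

```python
import pandas as pd

# Hypothetical example: two datasets published by different initiatives,
# each carrying the same organizational identifier.
spending = pd.DataFrame({
    "org_id": ["GB-GOV-1", "US-GOV-18", "NL-KVK-41198677"],
    "project": ["Immunization drive", "School grants", "Water points"],
    "spend_usd": [1_200_000, 850_000, 430_000],
})
results = pd.DataFrame({
    "org_id": ["GB-GOV-1", "NL-KVK-41198677"],
    "children_immunized": [54_000, None],
    "water_points_built": [None, 310],
})

# Because both tables share a common identifier, they can be joined directly;
# without that shared building block, records would have to be matched by
# fuzzy name comparison, which is error-prone and costly.
joined = spending.merge(results, on="org_id", how="left")
print(joined)
```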

Omidyar Network currently supports open data standards for contracting, extractives, budgets, and others. If “joining up” work is not considered and executed at early stages, these standards 1) could evolve in silos and 2) may not reach their full potential.

Interoperability will not happen automatically; specific investments and efforts must be made to develop the public good infrastructure for the joining up of key datasets….The two organizations leading this project have an impressive track record working in this area. Development Initiatives is a global organization working to empower people to make more effective use of information. In 2013, it commissioned Open Knowledge Foundation to publish a cross-initiative scoping study, Joined-Up Data: Building Blocks for Common Standards, which recommended focus areas, shared learning, and the adoption of joined-up data and common standards for all publishers. Partnering with Development Initiatives is Publish What You Fund,…(More)”

From Governmental Open Data Toward Governmental Open Innovation (GOI)


Chapter by Daniele Archibugi et al in The Handbook of Global Science, Technology, and Innovation: “Today, governments release governmental data that were previously hidden from the public. This democratization of governmental open data (OD) aims to increase transparency but also fuels innovation. Indeed, the release of governmental OD is a global trend, which has evolved into governmental open innovation (GOI). In GOI, governmental actors purposively manage the knowledge flows that span organizational boundaries and reveal innovation-related knowledge to the public with the aim of spurring innovation for higher economic and social welfare at regional, national, or global scale. GOI subsumes different revealing strategies, namely governmental OD, problem, and solution revealing. This chapter introduces the concept of GOI that has evolved from global OD efforts. It presents a historical analysis of the emergence of GOI in four different continents, namely Europe (UK and Denmark), North America (United States and Mexico), Australia, and China to highlight the emergence of GOI at a global scale….(More)”

Citizen Sensor Data Mining, Social Media Analytics and Applications


Paper by Amit P. Sheth: “With the rapid rise in the popularity of social media (1B+ Facebook users, 200M+ Twitter users), and near-ubiquitous mobile access (4+ billion actively used mobile phones), the sharing of observations and opinions has become commonplace (500M+ tweets a day). This has given us unprecedented access to the pulse of a populace and the ability to perform analytics on social data to support a variety of socially intelligent applications — be it for brand tracking and management, crisis coordination, organizing revolutions or promoting social development in underdeveloped and developing countries. I will review: 1) understanding and analysis of informal text, esp. microblogs (e.g., issues of cultural entity extraction and the role of semantic/background knowledge-enhanced techniques), and 2) how we built Twitris, a comprehensive social media analytics (social intelligence) platform. I will describe the analysis capabilities along three dimensions: spatio-temporal-thematic, people-content-network, and sentiment-emotion-intent. I will couple technical insights with identification of computational techniques and real-world examples using live demos of Twitris….(More)”

Local Governments Need Financial Transparency Tools


Cities of the Future: “Comprehensive financial transparency — allowing anyone to look up the allocation of budgets, expenses by department, and even the ledger of each individual expense as it happens — can help local governments restore residents’ confidence, help manage the budget efficiently and make more informed decisions for new projects and money allocation.

A few weeks ago, we had municipal elections in Spain. Many local governments changed hands, and the new administrations had to review the current budgets, see where money was being spent and, on occasion, discover expenses they were not expecting.

As costs rise and cities find it more difficult to provide the same services without raising taxes, citizens, among others, are demanding full disclosure of income and expenses.

Tools such as the OpenGov platform are helping cities accomplish that goal…Earlier this year the city of Beaufort (pop. 13,000), South Carolina’s second-oldest city, known for its historic charm and moss-laden oak trees, decided to implement OpenGov. It rolled out the platform to the public last February, becoming the first city in the state to provide the public with in-depth, comprehensive financial data (spanning five budget years).

The reception by the city council and residents was extremely positive. Residents can now look up where their tax money goes down to itemized expenses. They can also see up-to-date charts of every part of the budget, how it is being spent, and what remains to be used. City council members can monitor the administration’s performance and ask informed questions at town meetings about the budget use, right down to the smallest expenses….
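
As a rough illustration of the kind of roll-up such a platform performs (the ledger and budget figures below are invented, and this is not OpenGov’s actual implementation), itemized expenses can be aggregated by department and compared against the adopted budget:

```python
import pandas as pd

# Hypothetical ledger: one row per individual expense, plus an adopted budget per department.
ledger = pd.DataFrame({
    "department": ["Parks", "Parks", "Police", "Public Works"],
    "vendor": ["GreenScape LLC", "City Supply Co.", "Fleet Services", "Asphalt Inc."],
    "amount": [12_500.00, 3_240.50, 48_900.00, 87_300.00],
})
budget = pd.DataFrame({
    "department": ["Parks", "Police", "Public Works"],
    "adopted_budget": [250_000.00, 1_200_000.00, 600_000.00],
})

# Roll itemized expenses up by department and show what remains of each budget,
# the kind of view residents and council members can drill into.
spent = ledger.groupby("department", as_index=False)["amount"].sum()
report = budget.merge(spent, on="department", how="left").fillna({"amount": 0})
report["remaining"] = report["adopted_budget"] - report["amount"]
print(report)
```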

Many cities are now implementing open data tools to share information on different aspects of city services, such as transit information, energy use, water management, etc. But those tools are difficult to use and do not provide comprehensive financial information about the use of public money. …(More)”

‘Smart Cities’ Will Know Everything About You


Mike Weston in the Wall Street Journal: “From Boston to Beijing, municipalities and governments across the world are pledging billions to create “smart cities”—urban areas covered with Internet-connected devices that control citywide systems, such as transit, and collect data. Although the details can vary, the basic goal is to create super-efficient infrastructure, aid urban planning and improve the well-being of the populace.

A byproduct of a tech utopia will be a prodigious amount of data collected on the inhabitants. For instance, at the company I head, we recently undertook an experiment in which some staff volunteered to wear devices around the clock for 10 days. We monitored more than 170 metrics reflecting their daily habits and preferences—including how they slept, where they traveled and how they felt (a fast heart rate and no movement can indicate excitement or stress).
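
As a toy illustration of the kind of inference such metrics permit (the field names and thresholds here are invented, not the company’s actual model):

```python
from dataclasses import dataclass

@dataclass
class WearableSample:
    heart_rate_bpm: float   # beats per minute over the window
    movement_g: float       # average acceleration magnitude over the window

def label_state(sample: WearableSample) -> str:
    """Crude rule of thumb: a fast heart rate with little movement
    can indicate excitement or stress rather than exercise."""
    if sample.heart_rate_bpm > 100 and sample.movement_g < 0.05:
        return "possible excitement/stress"
    if sample.heart_rate_bpm > 100:
        return "likely physical activity"
    return "resting/normal"

print(label_state(WearableSample(heart_rate_bpm=112, movement_g=0.01)))
```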

If the Internet age has taught us anything, it’s that where there is information, there is money to be made. With so much personal information available and countless ways to use it, businesses and authorities will be faced with a number of ethical questions.

In a fully “smart” city, every movement an individual makes can be tracked. The data will reveal where she works, how she commutes, her shopping habits, places she visits and her proximity to other people. You could argue that this sort of tracking already exists via various apps and on social-media platforms, or is held by public-transport companies and e-commerce sites. The difference is that with a smart city this data will be centralized and easy to access. Given the value of this data, it’s conceivable that municipalities or private businesses that pay to create a smart city will seek to recoup their expenses by selling it….

Recent history—issues of privacy and security on social networks and chatting apps, and questions about how intellectual-property regulations apply online—has shown that the law has been slow to catch up with digital innovations. So businesses that can purchase smart-city data will be presented with many strategic and ethical concerns.

What degree of targeting is too specific and violates privacy? Should businesses limit the types of goods or services they offer to certain individuals? Is it ethical for data—on an employee’s eating habits, for instance—to be sold to employers or to insurance companies to help them assess claims? Do individuals own their own personal data once it enters the smart-city system?

With or without stringent controlling legislation, businesses in a smart city will need to craft their own policies and procedures regarding the use of data. A large-scale misuse of personal data could provoke a consumer backlash that could cripple a company’s reputation and lead to monster lawsuits. An additional problem is that businesses won’t know which individuals might welcome the convenience of targeted advertising and which will find it creepy—although data science could solve this equation eventually by predicting where each individual’s privacy line is.

A smart city doesn’t have to be as Orwellian as it sounds. If businesses act responsibly, there is no reason why what sounds intrusive in the abstract can’t revolutionize the way people live for the better by offering services that anticipate their needs; by designing ultraefficient infrastructure that makes commuting a (relative) dream; or by taking a revolutionary approach to how energy is generated and used by businesses and the populace at large….(More)”

Cities show how to make open data usable


Bianca Spinosa at GCN: “Government agencies have no shortage of shareable data. Data.gov, the open-data clearinghouse that launched in May 2009, had more than 147,331 datasets as of mid-July, and state and local governments are joining federal agencies in releasing ever-broader arrays of information.

The challenge, however, remains making all that data usable. Obama administration officials like to talk about how the government’s weather data supports forecasting and analysis that support businesses and help Americans every day. But relatively few datasets do more than just sit there, and fewer still are truly accessible for the average person.

At the federal level, that’s often because agency missions do not directly affect citizens the way that local governments do. Nevertheless, every agency has customers and communities of interest, and there are lessons feds can learn from how cities are sharing their data with the public.

One such model is Citygram. The app links to a city’s open-data platform and sends subscribers a weekly text or email message about selected activities in their neighborhoods. Charlotte officials worked closely with Code for America fellows to develop the software, and the app launched in December 2014 in that city and in Lexington, Ky.

Three other cities – New York, Seattle, and San Francisco – have since joined, and Orlando, Fla.; Honolulu; the Research Triangle area of North Carolina; and Montgomery County, Md., are considering doing so.

Citygram “takes open data and transforms it, curates it and translates it into human speech,” said Twyla McDermott, Charlotte’s corporate IT program manager. “People want to know what’s happening around them.”

Demonstrating real-world utility

People in the participating cities can go to Citygram.org, select their city and choose topics of interest (such as pending rezonings or new business locations). Then they enter their address and a radius to consider “nearby” and finally select either text or email for their weekly notifications.

Any city government can use the technology, which is open source and freely available on GitHub. San Francisco put its own unique spin on the app by allowing subscribers to sign up for notifications on tree plantings. With Citygram NYC, New Yorkers can find information on vehicle collisions within a radius of up to 4 miles….(More)”
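
The core idea, filtering a city’s open-data records to those within a subscriber’s chosen radius, can be sketched in a few lines. This is not Citygram’s actual code (which is available on GitHub), just an illustration of the geospatial filter, assuming simple latitude/longitude records:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3958.8 * asin(sqrt(a))  # Earth radius is roughly 3958.8 miles

# Hypothetical records pulled from a city's open-data feed (e.g., vehicle collisions).
records = [
    {"type": "collision", "lat": 40.7306, "lon": -73.9866},
    {"type": "collision", "lat": 40.6413, "lon": -73.7781},
]

subscriber = {"lat": 40.7281, "lon": -73.9947, "radius_miles": 4}

# Keep only the incidents inside the subscriber's chosen radius.
nearby = [
    r for r in records
    if haversine_miles(subscriber["lat"], subscriber["lon"], r["lat"], r["lon"])
       <= subscriber["radius_miles"]
]
print(f"{len(nearby)} incident(s) within {subscriber['radius_miles']} miles this week")
```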

The data or the hunch


Ian Leslie at Intelligent Life: “THE GIFT FOR talent-spotting is mysterious, highly prized and celebrated. We love to hear stories about the baseball coach who can spot the raw ability of an erratic young pitcher, the boss who sees potential in the guy in the post room, the director who picks a soloist out of the chorus line. Talent shows are a staple of the TV schedules. We like to believe that certain people—sometimes ourselves—can just sense when a person has something special. But there is another method of spotting talent which doesn’t rely on hunches. In place of intuition, it offers data and analysis. Rather than relying on the gut, it invites us to use our heads. It tends not to make for such romantic stories, but it is effective—which is why, despite our affection, the hunch is everywhere in retreat.

Strike one against the hunch was the publication of “Moneyball” by Michael Lewis (2003), which has attained the status of a management manual for many in sport and beyond. Lewis reported on a cash-strapped major-league baseball team, the Oakland A’s, who enjoyed unlikely success against bigger and better-funded competitors. Their secret sauce was data. Their general manager, Billy Beane, had realised that when it came to evaluating players, the gut instincts of experienced baseball scouts were unreliable, and he employed statisticians to identify talent overlooked by the big clubs…..

These days, when a football club is interested in a player, it considers the average distance he runs in a game, the number of passes and tackles or blocks he makes, his shots on goal, the ratio of goals to shots, and many other details nobody thought to measure a generation ago. Sport is far from the only industry in which talent-spotting is becoming a matter of measurement. Prithwijit Mukerji, a postgraduate at the University of Westminster in London, recently published a paper on the way the music industry is being transformed by “the Moneyball approach”. By harvesting data from Facebook and Twitter and music services like Spotify and Shazam, executives can track what we are listening to in far more detail than ever before, and use it as a guide to what we will listen to next….

This is the day of the analyst. In education, academics are working their way towards a reliable method of evaluating teachers, by running data on test scores of pupils, controlled for factors such as prior achievement and raw ability. The methodology is imperfect, but research suggests that it’s not as bad as just watching someone teach. A 2011 study led by Michael Strong at the University of California identified a group of teachers who had raised student achievement and a group who had not. They showed videos of the teachers’ lessons to observers and asked them to guess which were in which group. The judges tended to agree on who was effective and ineffective, but, 60% of the time, they were wrong. They would have been better off flipping a coin. This applies even to experts: the Gates Foundation funded a vast study of lesson observations, and found that the judgments of trained inspectors were highly inconsistent.
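
For a sense of how such value-added scoring works in principle, here is a minimal sketch with invented numbers (real studies control for far more factors and use multilevel models): regress current test scores on prior achievement, then average each teacher’s pupils’ residual gains.

```python
import numpy as np

# Hypothetical pupil records: prior-year score, current score, and teacher.
prior   = np.array([62, 71, 55, 80, 68, 74, 59, 77], dtype=float)
current = np.array([68, 75, 58, 85, 70, 83, 66, 86], dtype=float)
teacher = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

# Step 1: regress current scores on prior achievement (the control),
# giving each pupil an expected score given where they started.
X = np.column_stack([np.ones_like(prior), prior])
coef, *_ = np.linalg.lstsq(X, current, rcond=None)
expected = X @ coef

# Step 2: a teacher's "value added" is the average amount by which their
# pupils beat (or miss) those expectations.
residual = current - expected
for t in ("A", "B"):
    print(t, round(residual[teacher == t].mean(), 2))
```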

THE LAST STRONGHOLD of the hunch is the interview. Most employers and some universities use interviews when deciding whom to hire or admit. In a conventional, unstructured interview, the candidate spends half an hour or so in a conversation directed at the whim of the interviewer. If you’re the one deciding, this is a reassuring practice: you feel as if you get a richer impression of the person than from the bare facts on their résumé, and that this enables you to make a better decision. The first theory may be true; the second is not.

Decades of scientific evidence suggest that the interview is close to useless as a tool for predicting how someone will do a job. Study after study has found that organisations make better decisions when they go by objective data, like the candidate’s qualifications, track record and performance in tests. “The assumption is, ‘if I meet them, I’ll know’,” says Jason Dana, of Yale School of Management, one of many scholars who have looked into the interview’s effectiveness. “People are wildly over-confident in their ability to do this, from a short meeting.” When employers adopt a holistic approach, combining the data with hunches formed in interviews, they make worse decisions than they do going on facts alone….” (More)

Geek Heresy


Book by Kentaro Toyama: “…, an award-winning computer scientist, moved to India to start a new research group for Microsoft. Its mission: to explore novel technological solutions to the world’s persistent social problems. Together with his team, he invented electronic devices for under-resourced urban schools and developed digital platforms for remote agrarian communities. But after a decade of designing technologies for humanitarian causes, Toyama concluded that no technology, however dazzling, could cause social change on its own.

Technologists and policy-makers love to boast about modern innovation, and in their excitement, they exuberantly tout technology’s boon to society. But what have our gadgets actually accomplished? Over the last four decades, America saw an explosion of new technologies – from the Internet to the iPhone, from Google to Facebook – but in that same period, the rate of poverty stagnated at a stubborn 13%, only to rise in the recent recession. So, a golden age of innovation in the world’s most advanced country did nothing for our most prominent social ill.

Toyama’s warning resounds: Don’t believe the hype! Technology is never the main driver of social progress. Geek Heresy inoculates us against the glib rhetoric of tech utopians by revealing that technology is only an amplifier of human conditions. By telling the moving stories of extraordinary people like Patrick Awuah, a Microsoft millionaire who left his lucrative engineering job to open Ghana’s first liberal arts university, and Tara Sreenivasa, a graduate of a remarkable South Indian school that takes children from dollar-a-day families into the high-tech offices of Goldman Sachs and Mercedes-Benz, Toyama shows that even in a world steeped in technology, social challenges are best met with deeply social solutions….(More)”

Using Twitter as a data source: An overview of current social media research tools


Wasim Ahmed at the LSE Impact Blog: “I have a social media research blog where I find and write about tools that can be used to capture and analyse data from social media platforms. My PhD uses Twitter data to study health topics, such as the Ebola outbreak in West Africa. I am increasingly asked why I am looking at Twitter, and what tools and methods there are for capturing and analysing data from other platforms such as Facebook, or even less traditional platforms such as Amazon book reviews. Having brainstormed responses to this question with members of the New Social Media New Social Science network, I can offer at least six reasons:

  1. Twitter is a popular platform in terms of the media attention it receives and it therefore attracts more research due to its cultural status
  2. Twitter makes it easier to find and follow conversations (i.e., by both its search feature and by tweets appearing in Google search results)
  3. Twitter has hashtag norms which make it easier to gather, sort, and expand searches when collecting data
  4. Twitter data is easy to retrieve as major incidents, news stories and events on Twitter tend to be centred around a hashtag
  5. The Twitter API is more open and accessible compared to other social media platforms, which makes Twitter more favourable to developers creating tools to access data. This consequently increases the availability of tools to researchers (a minimal collection sketch follows this list).
  6. Many researchers themselves are using Twitter and because of their favourable personal experiences, they feel more comfortable with researching a familiar platform.
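
As a rough illustration of point 5, here is a minimal collection sketch. It assumes the requests library and a bearer token for Twitter’s v2 recent-search endpoint; the API has changed since this post was written, so treat the details purely as an illustration:

```python
import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder; issued via the developer portal
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"

def fetch_hashtag(tag: str, max_results: int = 100) -> list[dict]:
    """Fetch recent tweets containing a hashtag (illustrative only)."""
    resp = requests.get(
        SEARCH_URL,
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        params={
            "query": f"#{tag} -is:retweet",
            "max_results": max_results,
            "tweet.fields": "created_at,author_id",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("data", [])

tweets = fetch_hashtag("ebola")
print(f"Collected {len(tweets)} tweets")
```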

It is probable that a combination of response 1 to 6 have led to more research on Twitter. However, this raises another distinct but closely related question: when research is focused so heavily on Twitter, what (if any) are the implications of this on our methods?

As for the methods currently used in analysing Twitter data (i.e., sentiment analysis, time series analysis examining peaks in tweets, network analysis, etc.), can these be applied to other platforms, or are different tools, methods and techniques required? In addition to qualitative methods such as content analysis, I have used the following four methods in analysing Twitter data for the purposes of my PhD; below I consider whether these would work for other social media platforms:

  1. Sentiment analysis works well with Twitter data, as tweets are consistent in length (i.e., <= 140 characters); would sentiment analysis work as well with, for example, Facebook data, where posts may be longer?
  2. Time series analysis is normally used when examining tweets over time to see when a peak of tweets may occur; would examining time stamps in Facebook posts, or Instagram posts, for example, produce the same results? Or is this only a viable method because of the real-time nature of Twitter data? (A small sketch of this peak-counting, together with a mention network, appears after this list.)
  3. Network analysis is used to visualize the connections between people and to better understand the structure of the conversation. Would this work as well on other platforms whereby users may not be connected to each other i.e., public Facebook pages?
  4. Machine learning methods may work well with Twitter data due to the length of tweets (i.e., <= 140 characters), but would these work for longer posts and for platforms that are not text based, e.g., Instagram?
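
As one concrete instance of methods 2 and 3, here is a small sketch (invented tweet records, assuming only pandas and the standard library) that bins tweets by hour to locate peaks and extracts a crude mention network as an edge list:

```python
from collections import Counter
import re
import pandas as pd

# Hypothetical collected tweets (in practice these come from the API or an archive).
tweets = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2015-07-01 09:05", "2015-07-01 09:40", "2015-07-01 10:10",
        "2015-07-01 10:15", "2015-07-01 10:50",
    ]),
    "user": ["alice", "bob", "carol", "alice", "dave"],
    "text": [
        "Outbreak update from @who_team", "RT @alice latest figures",
        "Stay safe @bob", "New advisory via @who_team", "@carol any news?",
    ],
})

# Time series analysis: count tweets per hour to locate peaks in activity.
per_hour = tweets.set_index("created_at").resample("60min").size()
print(per_hour)

# Network analysis: each @mention becomes a directed edge (author -> mentioned user).
edges = Counter()
for _, row in tweets.iterrows():
    for mention in re.findall(r"@(\w+)", row["text"]):
        edges[(row["user"], mention)] += 1
print(edges.most_common())
```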

It may well be that at least some of these methods can be applied to other platforms, however they may not be the best methods, and may require the formulation of new methods, techniques, and tools.

So, what are some of the tools available to social scientists for social media data? In the table below I provide an overview of some the tools I have been using (which require no programming knowledge and can be used by social scientists):…(More)”

Transforming City Governments for Successful Smart Cities


New book edited by Manuel Pedro Rodríguez-Bolívar: “There has been much attention paid to the idea of Smart Cities as researchers have sought to define and characterize the main aspects of the concept, including the role of creative industries in urban growth, the importance of social capital in urban development, and the role of urban sustainability. This book develops a critical view of the Smart City concept, of the incentives and role of governments in promoting the development of Smart Cities, and of experiences with e-government projects intended to enhance Smart Cities. It further analyzes the perceptions of stakeholders, such as public managers and politicians, regarding those incentives and projects, making the book valuable to academics, researchers, policy-makers, public managers, international organizations and technical experts seeking to understand the role of government in enhancing Smart City projects….(More)”