Smart Cities, Smarter Citizens


Free eBook courtesy of PTC.com: “The smart city movement is on a roll. Technology leaders are looking to transform major cities through advanced computer technologies, sensors, high-speed data networks, predictive analytics, big data, and IoT. But, as Mike Barlow explains in this O’Reilly report, the story goes beyond technology. Citizens, too, will need to play a large role in turning cities into smart, livable environments.

According to a United Nations report, by 2050 two-thirds of humanity will live in urban areas, including more than 40 mega-cities of at least 10 million people each. All of them will need to determine how to deliver more services with fewer resources. Cities will have to improve efficiency and reduce expenditures wherever possible, through new technologies and other means.

To create a thriving environment where innovation can blossom, citizens will not only be called upon to generate much of the data, but they’ll also need to be at the center of decision-making, based on what that data reveals.

Download this report today, and learn about the progress that various groups and organizations have already made in major cities around the world, and what lies ahead for all of us….(More)”.

Government 3.0 – Next Generation Government Technology Infrastructure and Services


Book edited by Adegboyega Ojo and Jeremy Millard: “Historically, technological change has had a significant effect on the locus of administrative activity, the cost of carrying out administrative tasks, the skill sets officials need to function effectively, rules and regulations, and the types of interactions citizens have with their public authorities. The next generation of public sector innovation will be “Government 3.0,” powered by innovations related to open and big data, administrative and business process management, the Internet of Things and blockchains, driving improvements in service delivery, decision and policy making, and resource management. This book provides fresh insights into this transformation while also examining possible negative side effects of the increasing openness of governments through the adoption of these new innovations. The goal is for technology policy makers to engage with the visions of Government 3.0. Researchers should be able to critically examine some of the innovations described in the book as the basis for developing research agendas related to challenges associated with the adoption and use of some of the associated technologies. The book serves as a rich source of materials from leading experts in the field that enables public administration practitioners to better understand how these new technologies affect traditional public administration paradigms. The book is suitable for graduate courses in Public Sector Innovation, Innovation in Public Administration, E-Government and Information Systems. Public sector technology policy makers, e-government, information systems and public administration researchers and practitioners should all benefit from reading this book….(More).”

Out of the Syrian crisis, a data revolution takes shape


Amy Maxmen in Nature: “…Whenever war, hurricanes or other disasters ravage part of the globe, one of the biggest problems for aid organizations is a lack of reliable data. People die because front-line responders don’t have the information they need to act efficiently. Doctors and epidemiologists plod along with paper surveys and rigid databases in crisis situations, watching with envy as tech companies expertly mine big data for comparatively mundane purposes.

Three years ago, one frustrated first-responder decided to do something about it. The result is an innovative piece of software called the Dharma Platform, which almost anyone can use to rapidly collect information and share, analyse and visualize it so that they can act quickly. And although public-health veterans tend to be sceptical of technological fixes, Dharma is winning fans. Médecins Sans Frontières (MSF) and other organizations now use it in 22 countries. And so far, the Rise Fund, a ‘global impact fund’ whose board boasts U2 lead singer Bono, has invested US$14.3 million in the company behind it.

“I think Dharma is special because it has been developed by people who have worked in these chaotic situations,” says Jeremy Farrar, director of biomedical-funding charity the Wellcome Trust in London, “and it’s been road-tested and improved in the midst of reality.”

Now, the ultimate trial is in Syria: Salim, whose name has been changed in this story to protect him, started entering patient records into the Dharma Platform in March, and he is looking at health trends even as he shares his data securely with MSF staff in Amman.

It’s too soon to say that Dharma has transformed his hospital. And some aid organizations and governments may be reluctant to adopt it. But Aziz, who has deployed Dharma in Iraq, Syria, Jordan and Turkey, is confident that it will usher in a wave of platforms that accelerate evidence-based responses in emergencies, or even in health care generally. “This is like the first version of the iPhone or Yahoo! Messenger,” he says. “Maybe something better will come along, but this is the direction we’re going in.”…(More)”

China harnesses big data to buttress the power of the state


James Kynge in the Financial Times: “…Over the period of “reform and opening” since the late 1970s, China has generally sought to “bide its time and hide its strength”. But no longer. At the congress, Xi Jinping, the president, presented “socialism with Chinese characteristics” as “a new choice” for developing nations to follow. But what lends heft to this globalist intent are technological advances that are already invigorating the Chinese economy and may also help to address some of the historic failings of the country’s polity.

The data revolution is fusing with China’s party-state to create a potential “techno-tatorship”: a hybrid strain in which rigid political control can coexist with ample free-market flexibility….

First of all, he said, the big ecommerce companies, such as Alibaba, Tencent and JD.com, are obliged to share their data with central authorities such as the People’s Bank of China (PBoC), the central bank. Then the PBoC shares the data with about 50 state-owned banks, creating a database that covers about 400m people, detailing their payment history, creditworthiness and even networks of social contacts, the official said.

“We have already seen that the number of bad debts being built up by households has come down sharply since we launched this system,” said the official. “People really care about their credit scores because those with bad scores have reduced access to financial services.”…
To be sure, data-centric approaches to governance can have shortcomings. The data can be ignored or manipulated by humans, or privileged institutions can lobby for special treatment using old-fashioned political leverage. But some Chinese see a big opportunity. Economists Wang Binbin and Li Xiaoyan argue in a paper that the marriage of big data and central planning creates a potent new hybrid….(More)”.

Better Data for Better Policy: Accessing New Data Sources for Statistics Through Data Collaboratives


Medium Blog by Stefaan Verhulst: “We live in an increasingly quantified world, one where data is driving key business decisions. Data is claimed to be the new competitive advantage. Yet, paradoxically, even as our reliance on data increases and the call for agile, data-driven policy making becomes more pronounced, many Statistical Offices are confronted with shrinking budgets and an increased demand to adjust their practices to a data age. If Statistical Offices fail to find new ways to deliver the “evidence of tomorrow” by leveraging new data sources, public policy may be formed without access to the full range of available and relevant intelligence that most business leaders already enjoy. At worst, a thinning evidence base and a lack of rigorous data foundations could lead to errors and more “fake news,” with possibly harmful public policy implications.

While my talk focused on the key ways data can inform and ultimately transform the full policy cycle (see full presentation here), a key premise I examined was the need to access, utilize and find insight in the vast reams of data and data expertise that exist in private hands, through the creation of new kinds of public-private partnerships, or “data collaboratives,” to establish more agile and data-driven policy making.


Applied to statistics, such approaches have already shown promise in a number of settings and countries. Eurostat, for instance, has experimented together with Statistics Belgium with leveraging call detail records provided by Proximus to document population density. Statistics Netherlands (CBS) recently launched a Center for Big Data Statistics (CBDS) in partnership with companies like Dell-EMC and Microsoft. Other National Statistics Offices (NSOs) are considering using scanner data for monitoring consumer prices (Austria); leveraging smart meter data (Canada); or using telecom data to complement transportation statistics (Belgium). We are now living undeniably in an era of data. Much of this data is held by private corporations. The key task is thus to find a way of utilizing this data for the greater public good.

Value Proposition — and Challenges

There are several reasons to believe that public policy making and official statistics could indeed benefit from access to privately collected and held data. Among the value propositions:

  • Using private data can increase the scope and breadth of available evidence, and thus the insights it offers policymakers;
  • Using private data can increase the quality and credibility of existing data sets (for instance, by complementing or validating them);
  • Private data can increase the timeliness and thus relevance of often-outdated information held by statistical agencies (social media streams, for example, can provide real-time insights into public behavior); and
  • Private data can lower costs and increase other efficiencies (for example, through more sophisticated analytical methods) for statistical organizations….(More)”.

Linux Foundation Debuts Community Data License Agreement


Press Release: “The Linux Foundation, the nonprofit advancing professional open source management for mass collaboration, today announced the Community Data License Agreement (CDLA) family of open data agreements. In an era of expansive and often underused data, the CDLA licenses are an effort to define a licensing framework to support collaborative communities built around curating and sharing “open” data.

Inspired by the collaborative software development models of open source software, the CDLA licenses are designed to enable individuals and organizations of all types to share data as easily as they currently share open source software code. Soundly drafted licensing models can help people form communities to assemble, curate and maintain vast amounts of data, measured in petabytes and exabytes, to bring new value to communities of all types, to build new business opportunities and to power new applications that promise to enhance safety and services.

The growth of big data analytics, machine learning and artificial intelligence (AI) technologies has allowed people to extract unprecedented levels of insight from data. Now the challenge is to assemble the critical mass of data for those tools to analyze. The CDLA licenses are designed to help governments, academic institutions, businesses and other organizations open up and share data, with the goal of creating communities that curate and share data openly.

For instance, if automakers, suppliers and civil infrastructure services can share data, they may be able to improve safety, decrease energy consumption and improve predictive maintenance. Self-driving cars are heavily dependent on AI systems for navigation, and need massive volumes of data to function properly. Once on the road, they can generate nearly a gigabyte of data every second. For the average car, that means two petabytes of sensor, audio, video and other data each year.

Similarly, climate modeling can integrate measurements captured by government agencies with simulation data from other organizations and then use machine learning systems to look for patterns in the information. It’s estimated that a single model can yield a petabyte of data, a volume that challenges standard computer algorithms, but is useful for machine learning systems. This knowledge may help improve agriculture or aid in studying extreme weather patterns.

And if government agencies share aggregated data on building permits, school enrollment figures, sewer and water usage, their citizens benefit from the ability of commercial entities to anticipate their future needs and respond with infrastructure and facilities that arrive in anticipation of citizens’ demands.

“An open data license is essential for the frictionless sharing of the data that powers both critical technologies and societal benefits,” said Jim Zemlin, Executive Director of The Linux Foundation. “The success of open source software provides a powerful example of what can be accomplished when people come together around a resource and advance it for the common good. The CDLA licenses are a key step in that direction and will encourage the continued growth of applications and infrastructure.”…(More)”.

Our laws don’t do enough to protect our health data


Sharona Hoffman at The Conversation: “A particularly sensitive type of big data is medical big data. Medical big data can consist of electronic health records, insurance claims, information entered by patients into websites such as PatientsLikeMe and more. Health information can even be gleaned from web searches, Facebook and your recent purchases.

Such data can be used for beneficial purposes by medical researchers, public health authorities, and healthcare administrators. For example, they can use it to study medical treatments, combat epidemics and reduce costs. But others who can obtain medical big data may have more selfish agendas.

I am a professor of law and bioethics who has researched big data extensively. Last year, I published a book entitled Electronic Health Records and Medical Big Data: Law and Policy.

I have become increasingly concerned about how medical big data might be used and who could use it. Our laws currently don’t do enough to prevent harm associated with big data.

What your data says about you

Personal health information could be of interest to many, including employers, financial institutions, marketers and educational institutions. Such entities may wish to exploit it for decision-making purposes.

For example, employers presumably prefer healthy employees who are productive, take few sick days and have low medical costs. However, there are laws that prohibit employers from discriminating against workers because of their health conditions. These laws are the Americans with Disabilities Act (ADA) and the Genetic Information Nondiscrimination Act. So, employers are not permitted to reject qualified applicants simply because they have diabetes, depression or a genetic abnormality.

However, the same is not true for most predictive information regarding possible future ailments. Nothing prevents employers from rejecting or firing healthy workers out of the concern that they will later develop an impairment or disability, unless that concern is based on genetic information.

What non-genetic data can provide evidence regarding future health problems? Smoking status, eating preferences, exercise habits, weight and exposure to toxins are all informative. Scientists believe that biomarkers in your blood and other health details can predict cognitive decline, depression and diabetes.

Even bicycle purchases, credit scores and voting in midterm elections can be indicators of your health status.

Gathering data

How might employers obtain predictive data? An easy source is social media, where many individuals publicly post very private information. Through social media, your employer might learn that you smoke, hate to exercise or have high cholesterol.

Another potential source is wellness programs. These programs seek to improve workers’ health through incentives to exercise, stop smoking, manage diabetes, obtain health screenings and so on. While many wellness programs are run by third party vendors that promise confidentiality, that is not always the case.

In addition, employers may be able to purchase information from data brokers that collect, compile and sell personal information. Data brokers mine sources such as social media, personal websites, U.S. Census records, state hospital records, retailers’ purchasing records, real property records, insurance claims and more. Two well-known data brokers are Spokeo and Acxiom.

Some of the data employers can obtain identify individuals by name. But even information that does not provide obvious identifying details can be valuable. Wellness program vendors, for example, might provide employers with summary data about their workforce but strip away particulars such as names and birthdates. Nevertheless, de-identified information can sometimes be re-identified by experts. Data miners can match information to data that is publicly available….(More)”.
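As a rough illustration of the linkage technique alluded to above (not code from the article; the column names and sample records are hypothetical), the sketch below joins a “de-identified” health extract to a public, voter-roll-style file on shared quasi-identifiers such as ZIP code, birth date and sex, which is the classic way experts re-identify such data:

```python
# Illustrative sketch only: hypothetical data showing how quasi-identifiers
# (ZIP code, birth date, sex) can link a "de-identified" health record
# back to a named individual in a publicly available dataset.
import pandas as pd

# "De-identified" wellness extract: names stripped, quasi-identifiers kept
health = pd.DataFrame([
    {"zip": "44106", "birth_date": "1969-07-31", "sex": "F", "diagnosis": "diabetes"},
    {"zip": "44118", "birth_date": "1982-03-12", "sex": "M", "diagnosis": "depression"},
])

# Public record with names attached (e.g., voter rolls, property records)
public = pd.DataFrame([
    {"name": "Jane Roe", "zip": "44106", "birth_date": "1969-07-31", "sex": "F"},
    {"name": "John Doe", "zip": "44118", "birth_date": "1982-03-12", "sex": "M"},
])

# Joining on the shared quasi-identifiers re-attaches names to diagnoses
reidentified = health.merge(public, on=["zip", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```

The point of the sketch is that neither table is revealing on its own; it is the join on seemingly innocuous attributes that does the damage.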

How “Big Data” Went Bust


The problem with “big data” is not that data is bad. It’s not even that big data is bad: Applied carefully, massive data sets can reveal important trends that would otherwise go undetected. It’s the fetishization of data, and its uncritical use, that tends to lead to disaster, as Julia Rose West recently wrote for Slate. And that’s what “big data,” as a catchphrase, came to represent.

By its nature, big data is hard to interpret. When you’re collecting billions of data points—clicks or cursor positions on a website; turns of a turnstile in a large public space; hourly wind speed observations from around the world; tweets—the provenance of any given data point is obscured. This in turn means that seemingly high-level trends might turn out to be artifacts of problems in the data or methodology at the most granular level possible. But perhaps the bigger problem is that the data you have are usually only a proxy for what you really want to know. Big data doesn’t solve that problem—it magnifies it….

Aside from swearing off data and reverting to anecdote and intuition, there are at least two viable ways to deal with the problems that arise from the imperfect relationship between a data set and the real-world outcome you’re trying to measure or predict.

One is, in short: moar data. This has long been Facebook’s approach. When it became apparent that users’ “likes” were a flawed proxy for what they actually wanted to see more of in their feeds, the company responded by adding more and more proxies to its model. It began measuring other things, like the amount of time they spent looking at a post in their feed, the amount of time they spent reading a story they had clicked on, and whether they hit “like” before or after they had read the piece. When Facebook’s engineers had gone as far as they could in weighting and optimizing those metrics, they found that users were still unsatisfied in important ways. So the company added yet more metrics to the sauce: It started running huge user-survey panels, added new reaction emojis by which users could convey more nuanced sentiments, and started using A.I. to detect clickbait-y language in posts by pages and publishers. The company knows none of these proxies are perfect. But by constantly adding more of them to the mix, it can theoretically edge ever closer to an algorithm that delivers to users the posts that they most want to see.
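To make concrete what “adding more proxies” can look like in practice, here is a toy sketch (my own construction, not Facebook’s actual ranking logic; the signal names and weights are invented) of a feed-ranking score that blends several imperfect engagement signals into a single weighted value:

```python
# Toy sketch: blending several imperfect proxy signals into one ranking score.
# Signal names and weights are hypothetical, not any company's real model.
from dataclasses import dataclass

@dataclass
class PostSignals:
    clicked: bool               # did the user click through to the story?
    dwell_seconds: float        # time spent looking at the post in the feed
    read_seconds: float         # time spent reading after clicking
    liked_before_reading: bool  # a "like" given before reading is a weaker signal

def engagement_score(s: PostSignals) -> float:
    """Weighted combination of proxy signals; in practice each weight would be
    tuned against further proxies such as survey panels or A/B experiments."""
    score = 1.0 if s.clicked else 0.0
    score += 0.02 * min(s.dwell_seconds, 60.0)   # cap so dwell time can't dominate
    score += 0.05 * min(s.read_seconds, 120.0)
    if s.liked_before_reading:
        score -= 0.5                             # discount reflexive likes
    return score

print(engagement_score(PostSignals(True, 12.0, 90.0, False)))
```

Each signal is still only a stand-in for “what the user actually wants to see”; stacking more of them narrows the gap but never closes it.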

One downside of the moar data approach is that it’s hard and expensive. Another is that the more variables are added to your model, the more complex, opaque, and unintelligible its methodology becomes. This is part of the problem Frank Pasquale articulated in The Black Box Society. Even the most sophisticated algorithm, drawing on the best data sets, can go awry—and when it does, diagnosing the problem can be nigh-impossible. There are also the perils of “overfitting” and false confidence: The more sophisticated your model becomes, the more perfectly it seems to match up with all your past observations, and the more faith you place in it, the greater the danger that it will eventually fail you in a dramatic way. (Think mortgage crisis, election prediction models, and Zynga.)
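The overfitting risk is easy to demonstrate on synthetic data. In the minimal sketch below (my own example, not from the article), a very flexible model matches the past observations more closely than a simple one, yet typically does worse on new data drawn from the same process:

```python
# Minimal overfitting sketch on synthetic data: a flexible model fits past
# observations more closely but typically predicts new data worse.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + rng.normal(scale=0.2, size=x.size)        # "past observations"
x_new = np.linspace(0.05, 1.05, 20)                     # "future" data
y_new = 2.0 * x_new + rng.normal(scale=0.2, size=x_new.size)

for degree in (1, 9):                                   # simple vs. flexible model
    coefs = np.polyfit(x, y, degree)
    past_err = np.mean((np.polyval(coefs, x) - y) ** 2)
    new_err = np.mean((np.polyval(coefs, x_new) - y_new) ** 2)
    print(f"degree {degree}: error on past data {past_err:.3f}, on new data {new_err:.3f}")
```

The high-degree fit chases the noise in the sample it was trained on, which is exactly the false confidence the passage warns about.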

Another possible response to the problems that arise from biases in big data sets is what some have taken to calling “small data.” Small data refers to data sets that are simple enough to be analyzed and interpreted directly by humans, without recourse to supercomputers or Hadoop jobs. Like “slow food,” the term arose as a conscious reaction to the prevalence of its opposite….(More)”


Fraud Data Analytics Tools and Techniques in Big Data Era


Paper by Sara Makki et al: “Fraudulent activities (e.g., suspicious credit card transactions, financial reporting fraud, and money laundering) are critical concerns to various entities including banks, insurance companies, and public service organizations. Typically, these activities lead to detrimental effects on the victims, such as financial loss. Over the years, fraud analysis techniques have undergone rigorous development. Lately, however, the advent of Big Data has led to vigorous advancement of these techniques, since Big Data creates extensive opportunities to combat financial fraud. Given the massive amount of data that investigators need to sift through, integrating massive volumes of data from multiple heterogeneous sources (e.g., social media, blogs) to find fraudulent patterns is emerging as a feasible approach….(More)”.

On the cultural ideology of Big Data


Nathan Jurgenson in The New Inquiry: “Modernity has long been obsessed with, perhaps even defined by, its epistemic insecurity, its grasping toward big truths that ultimately disappoint as our world grows only less knowable. New knowledge and new ways of understanding simultaneously produce new forms of nonknowledge, new uncertainties and mysteries. The scientific method, based in deduction and falsifiability, is better at proliferating questions than it is at answering them. For instance, Einstein’s theories about the curvature of space and motion at the quantum level provide new knowledge and generate new unknowns that previously could not be pondered.

Since every theory destabilizes as much as it solidifies in our view of the world, the collective frenzy to generate knowledge creates at the same time a mounting sense of futility, a tension looking for catharsis — a moment in which we could feel, if only for an instant, that we know something for sure. In contemporary culture, Big Data promises this relief.

As the name suggests, Big Data is about size. Many proponents of Big Data claim that massive databases can reveal a whole new set of truths because of the unprecedented quantity of information they contain. But the big in Big Data is also used to denote a qualitative difference — that aggregating a certain amount of information makes data pass over into Big Data, a “revolution in knowledge,” to use a phrase thrown around by startups and mass-market social-science books. Operating beyond normal science’s simple accumulation of more information, Big Data is touted as a different sort of knowledge altogether, an Enlightenment for social life reckoned at the scale of masses.

As with the similarly inferential sciences like evolutionary psychology and pop-neuroscience, Big Data can be used to give any chosen hypothesis a veneer of science and the unearned authority of numbers. The data is big enough to entertain any story. Big Data has thus spawned an entire industry (“predictive analytics”) as well as reams of academic, corporate, and governmental research; it has also sparked the rise of “data journalism” like that of FiveThirtyEight, Vox, and the other multiplying explainer sites. It has shifted the center of gravity in these fields not merely because of its grand epistemological claims but also because it’s well-financed. Twitter, for example, recently announced that it is putting $10 million into a “social machines” Big Data laboratory.

The rationalist fantasy that enough data can be collected with the “right” methodology to provide an objective and disinterested picture of reality is an old and familiar one: positivism. This is the understanding that the social world can be known and explained from a value-neutral, transcendent view from nowhere in particular. The term comes from Positive Philosophy (1830-1842), by Auguste Comte, who also coined the term sociology in this image. As Western sociology began to congeal as a discipline (departments, paid jobs, journals, conferences), Emile Durkheim, another of the field’s founders, believed it could function as a “social physics” capable of outlining “social facts” akin to the measurable facts that could be recorded about the physical properties of objects. It’s an arrogant view, in retrospect — one that aims for a grand, general theory that can explain social life, a view that became increasingly rooted as sociology became focused on empirical data collection.

A century later, that unwieldy aspiration has been largely abandoned by sociologists in favor of reorienting the discipline toward recognizing complexities rather than pursuing universal explanations for human sociality. But the advent of Big Data has resurrected the fantasy of a social physics, promising a new data-driven technique for ratifying social facts with sheer algorithmic processing power…(More)”