Citi Bike System Data


Citi Bike: “Where do Citi Bikers ride? When do they ride? How far do they go? Which stations are most popular? What days of the week are most rides taken on? We’ve heard all of these questions and more from you and now we are happy to provide the datasets to help you discover the answers to these questions and more. We invite developers, engineers, statisticians, artists, academics and other members of the interested public to use the data we provide for analysis, development, visualization and whatever else moves you.
This data is provided according to the NYCBS Data Use Policy.
Citi Bike Trip Histories
Below are links to downloadable files of Citi Bike trip data. The data includes:

  • Trip Duration (seconds)
  • Start Time and Date
  • Stop Time and Date
  • Start Station Name
  • End Station Name
  • Station ID
  • Station Lat/Long
  • Bike ID
  • User Type (Customer = 24-hour pass or 7-day pass user; Subscriber = Annual Member)
  • Gender
  • Year of Birth”

Big Data’s Dangerous New Era of Discrimination


Michael Schrage in HBR blog: “Congratulations. You bought into Big Data and it’s paying off Big Time. You slice, dice, parse and process every screen-stroke, clickstream, Like, tweet and touch point that matters to your enterprise. You now know exactly who your best — and worst — customers, clients, employees and partners are.  Knowledge is power.  But what kind of power does all that knowledge buy?
Big Data creates Big Dilemmas. Greater knowledge of customers creates new potential and power to discriminate. Big Data — and its associated analytics — dramatically increase both the dimensionality and degrees of freedom for detailed discrimination. So where, in your corporate culture and strategy, does value-added personalization and segmentation end and harmful discrimination begin?
Let’s say, for example, that your segmentation data tells you the following:
Your most profitable customers by far are single women between the ages of 34 and 55 closely followed by “happily married” women with at least one child. Divorced women are slightly more profitable than “never marrieds.” Gay males — single and in relationships — are also disproportionately profitable. The “sweet spot” is urban and 28 to 50. These segments collectively account for roughly two-thirds of your profitability.  (Unexpected factoid: Your most profitable customers are overwhelmingly Amazon Prime subscriber. What might that mean?)
Going more granular, as Big Data does, offers even sharper ethno-geographic insight into customer behavior and influence:

  • Single Asian, Hispanic, and African-American women with urban post codes are most likely to complain about product and service quality to the company. Asian and Hispanic complainers happy with resolution/refund tend to be in the top quintile of profitability. African-American women do not.
  • Suburban Caucasian mothers are most likely to use social media to share their complaints, followed closely by Asian and Hispanic mothers. But if resolved early, they’ll promote the firm’s responsiveness online.
  • Gay urban males receiving special discounts and promotions are the most effective at driving traffic to your sites.

My point here is that these data are explicit, compelling and undeniable. But how should sophisticated marketers and merchandisers use them?
Campaigns, promotions and loyalty programs targeting women and gay males seem obvious. But should Asian, Hispanic and white females enjoy preferential treatment over African-American women when resolving complaints? After all, they tend to be both more profitable and measurably more willing to effectively use social media. Does it make more marketing sense encouraging African-American female customers to become more social media savvy? Or are resources better invested in getting more from one’s best customers? Similarly, how much effort and ingenuity flow should go into making more gay male customers better social media evangelists? What kinds of offers and promotions could go viral on their networks?…
Of course, the difference between price discrimination and discrimination positively correlated with gender, ethnicity, geography, class, personality and/or technological fluency is vanishingly small. Indeed, the entire epistemological underpinning of Big Data for business is that it cost-effectively makes informed segmentation and personalization possible…..
But the main source of concern won’t be privacy, per se — it will be whether and how companies and organizations like your own use Big Data analytics to justify their segmentation/personalization/discrimination strategies. The more effective Big Data analytics are in profitably segmenting and serving customers, the more likely those algorithms will be audited by regulators or litigators.
Tomorrow’s Big Data challenge isn’t technical; it’s whether managements have algorithms and analytics that are both fairly transparent and transparently fair. Big Data champions and practitioners had better be discriminating about how discriminating they want to be.”

The Moneyball Effect: How smart data is transforming criminal justice, healthcare, music, and even government spending


TED: “When Anne Milgram became the Attorney General of New Jersey in 2007, she was stunned to find out just how little data was available on who was being arrested, who was being charged, who was serving time in jails and prisons, and who was being released. It turns out that most big criminal justice agencies like my own didn’t track the things that matter,” she says in today’s talk, filmed at TED@BCG. “We didn’t share data, or use analytics, to make better decisions and reduce crime.”
Milgram’s idea for how to change this: “I wanted to moneyball criminal justice.”
Moneyball, of course, is the name of a 2011 movie starring Brad Pitt and the book it’s based on, written by Michael Lewis in 2003. The term refers to a practice adopted by the Oakland A’s general manager Billy Beane in 2002 — the organization began basing decisions not on star power or scout instinct, but on statistical analysis of measurable factors like on-base and slugging percentages. This worked exceptionally well. On a tiny budget, the Oakland A’s made it to the playoffs in 2002 and 2003, and — since then — nine other major league teams have hired sabermetric analysts to crunch these types of numbers.
Milgram is working hard to bring smart statistics to criminal justice. To hear the results she’s seen so far, watch this talk. And below, take a look at a few surprising sectors that are getting the moneyball treatment as well.

Moneyballing music. Last year, Forbes magazine profiled the firm Next Big Sound, a company using statistical analysis to predict how musicians will perform in the market. The idea is that — rather than relying on the instincts of A&R reps — past performance on Pandora, Spotify, Facebook, etc can be used to predict future potential. The article reads, “For example, the company has found that musicians who gain 20,000 to 50,000 Facebook fans in one month are four times more likely to eventually reach 1 million. With data like that, Next Big Sound promises to predict album sales within 20% accuracy for 85% of artists, giving labels a clearer idea of return on investment.”
Moneyballing human resources. In November, The Atlantic took a look at the practice of “people analytics” and how it’s affecting employers. (Billy Beane had something to do with this idea — in 2012, he gave a presentation at the TLNT Transform Conference called “The Moneyball Approach to Talent Management.”) The article describes how Bloomberg reportedly logs its employees’ keystrokes and the casino, Harrah’s, tracks employee smiles. It also describes where this trend could be going — for example, how a video game called Wasabi Waiter could be used by employers to judge potential employees’ ability to take action, solve problems and follow through on projects. The article looks at the ways these types of practices are disconcerting, but also how they could level an inherently unequal playing field. After all, the article points out that gender, race, age and even height biases have been demonstrated again and again in our current hiring landscape.
Moneyballing healthcare. Many have wondered: what about a moneyball approach to medicine? (See this call out via Common Health, this piece in Wharton Magazine or this op-ed on The Huffington Post from the President of the New York State Health Foundation.) In his TED Talk, “What doctors can learn from each other,” Stefan Larsson proposed an idea that feels like something of an answer to this question. In the talk, Larsson gives a taste of what can happen when doctors and hospitals measure their outcomes and share this data with each other: they are able to see which techniques are proving the most effective for patients and make adjustments. (Watch the talk for a simple way surgeons can make hip surgery more effective.) He imagines a continuous learning process for doctors — that could transform the healthcare industry to give better outcomes while also reducing cost.
Moneyballing government. This summer, John Bridgeland (the director of the White House Domestic Policy Council under President George W. Bush) and Peter Orszag (the director of the Office of Management and Budget in Barack Obama’s first term) teamed up to pen a provocative piece for The Atlantic called, “Can government play moneyball?” In it, the two write, “Based on our rough calculations, less than $1 out of every $100 of government spending is backed by even the most basic evidence that the money is being spent wisely.” The two explain how, for example, there are 339 federally-funded programs for at-risk youth, the grand majority of which haven’t been evaluated for effectiveness. And while many of these programs might show great results, some that have been evaluated show troubling results. (For example, Scared Straight has been shown to increase criminal behavior.) Yet, some of these ineffective programs continue because a powerful politician champions them. While Bridgeland and Orszag show why Washington is so averse to making data-based appropriation decisions, the two also see the ship beginning to turn around. They applaud the Obama administration for a 2014 budget with an “unprecendented focus on evidence and results.” The pair also gave a nod to the nonprofit Results for America, which advocates that for every $99 spent on a program, $1 be spent on evaluating it. The pair even suggest a “Moneyball Index” to encourage politicians not to support programs that don’t show results.
In any industry, figuring out what to measure, how to measure it and how to apply the information gleaned from those measurements is a challenge. Which of the applications of statistical analysis has you the most excited? And which has you the most terrified?”

The Power to Decide


Special Report by Antonio Regalado in MIT Technology Review: “Back in 1956, an engineer and a mathematician, William Fair and Earl Isaac, pooled $800 to start a company. Their idea: a score to handicap whether a borrower would repay a loan.
It was all done with pen and paper. Income, gender, and occupation produced numbers that amounted to a prediction about a person’s behavior. By the 1980s the three-digit scores were calculated on computers and instead took account of a person’s actual credit history. Today, Fair Isaac Corp., or FICO, generates about 10 billion credit scores annually, calculating 50 times a year for many Americans.
This machinery hums in the background of our financial lives, so it’s easy to forget that the choice of whether to lend used to be made by a bank manager who knew a man by his handshake. Fair and Isaac understood that all this could change, and that their company didn’t merely sell numbers. “We sell a radically different way of making decisions that flies in the face of tradition,” Fair once said.
This anecdote suggests a way of understanding the era of “big data”—terabytes of information from sensors or social networks, new computer architectures, and clever software. But even supercharged data needs a job to do, and that job is always about a decision.
In this business report, MIT Technology Review explores a big question: how are data and the analytical tools to manipulate it changing decision making today? On Nasdaq, trading bots exchange a billion shares a day. Online, advertisers bid on hundreds of thousands of keywords a minute, in deals greased by heuristic solutions and optimization models rather than two-martini lunches. The number of variables and the speed and volume of transactions are just too much for human decision makers.
When there’s a person in the loop, technology takes a softer approach (see “Software That Augments Human Thinking”). Think of recommendation engines on the Web that suggest products to buy or friends to catch up with. This works because Internet companies maintain statistical models of each of us, our likes and habits, and use them to decide what we see. In this report, we check in with LinkedIn, which maintains the world’s largest database of résumés—more than 200 million of them. One of its newest offerings is University Pages, which crunches résumé data to offer students predictions about where they’ll end up working depending on what college they go to (see “LinkedIn Offers College Choices by the Numbers”).
These smart systems, and their impact, are prosaic next to what’s planned. Take IBM. The company is pouring $1 billion into its Watson computer system, the one that answered questions correctly on the game show Jeopardy! IBM now imagines computers that can carry on intelligent phone calls with customers, or provide expert recommendations after digesting doctors’ notes. IBM wants to provide “cognitive services”—computers that think, or seem to (see “Facing Doubters, IBM Expands Plans for Watson”).
Andrew Jennings, chief analytics officer for FICO, says automating human decisions is only half the story. Credit scores had another major impact. They gave lenders a new way to measure the state of their portfolios—and to adjust them by balancing riskier loan recipients with safer ones. Now, as other industries get exposed to predictive data, their approach to business strategy is changing, too. In this report, we look at one technique that’s spreading on the Web, called A/B testing. It’s a simple tactic—put up two versions of a Web page and see which one performs better (see “Seeking Edge, Websites Turn to Experiments” and “Startups Embrace a Way to Fail Fast”).
Until recently, such optimization was practiced only by the largest Internet companies. Now, nearly any website can do it. Jennings calls this phenomenon “systematic experimentation” and says it will be a feature of the smartest companies. They will have teams constantly probing the world, trying to learn its shifting rules and deciding on strategies to adapt. “Winners and losers in analytic battles will not be determined simply by which organization has access to more data or which organization has more money,” Jennings has said.

Of course, there’s danger in letting the data decide too much. In this report, Duncan Watts, a Microsoft researcher specializing in social networks, outlines an approach to decision making that avoids the dangers of gut instinct as well as the pitfalls of slavishly obeying data. In short, Watts argues, businesses need to adopt the scientific method (see “Scientific Thinking in Business”).
To do that, they have been hiring a highly trained breed of business skeptics called data scientists. These are the people who create the databases, build the models, reveal the trends, and, increasingly, author the products. And their influence is growing in business. This could be why data science has been called “the sexiest job of the 21st century.” It’s not because mathematics or spreadsheets are particularly attractive. It’s because making decisions is powerful…”

A World Of Wikipedia And Bitcoin: Is That The Promise Of Open Collaboration?


Science 2.0: “Open Collaboration, defined in a new paper as “any system of innovation or production that relies on goal-oriented yet loosely coordinated participants who interact to create a product (or service) of economic value, which they make available to contributors and non-contributors alike” brought the world Wikipedia, Bitcoin and, yes, even Science 2.0.
But what does that mean, really? That’s the first problem with vague terms in an open environment. It is anything people want it to be and sometimes what people want it to be is money, but hidden behind a guise of public weal.
TED’s lesser cousin TEDx is a result of open collaboration but there is no doubt it has successfully leveraged the marketing of TED to sell seats in auditoriums, just as it was designed to do. Generally, Open Collaboration now is less like its early days, where a group of like-minded people got together to create an Open Source tool, and more like corporations. Only they avoid the label, they are not quite non-profits and not quite corporations.
And because they are neither they can operate free of the cultural stigma. Despite efforts to claim that Wikipedia is a hotbed of misogyny and blocks out minorities, the online encyclopedia has endured just fine. Their defense is a simple one; they have no idea what gender or race or religion anyone is and anyone can contribute – it is a true open collaboration. Open Collaboration is goal-oriented, they lack the infrastructure to obey demands that they become about social justice, so the environments can be less touchy-feely than corporations and avoid the social authoritarianism of academia.
Many open collaborations perform well even in ‘harsh’ environments, where some minorities are underrepresented and diversity is lacking or when products by different groups rival one another. It’s a real puzzle for sociologists. The authors conclude that open collaboration is likely to expand into new domains, displacing traditional organizations, because it is so mission-oriented. Business executives and civic leaders should take heed – the future could look a lot more like the 1940s.”
See also: Sheen S. Levine, Michael J. Prietula, ‘Open Collaboration for Innovation: Principles and Performance’, Organization Science December 30, 2014 DOI:10.1287/orsc.2013.0872

Phone Apps Help Government, Others Counter Violence Against Women


NextGov: “Smart and mobile phones have helped authorities solve crimes from beatings that occurred during the London riots to the Boston Marathon bombing. A panel of experts gathered on Monday said the devices can also help reduce and combat rapes and other gender-based violence.
Smartphone apps and text messaging services proliferated in India following a sharp rise in reported gang rapes, including the brutal 2012 rape and murder of a 23-year-old medical student in Delhi, according to panelists at the Wilson Center event on gender-based violence and innovative technologies.
The apps fall into four main categories, said Alex Dehgan, chief data scientist at the United States Agency for International Development: apps that aid sexual assault and domestic violence victims, apps that empower women to fight back against gender-based violence, apps focused on advocacy and apps that crowdsource and map cases of sexual assault.
The final category of apps is largely built on the Ushahidi platform, which was developed to track reports of missing people following the 2010 Haiti earthquake.
One of the apps, Safecity, offers real-time alerts about sexual assaults across India to help women identify unsafe areas.
Similar apps have been launched in Egypt and Syria, Dehgan said. In lower-tech countries the systems often operate using text messages rather than smartphone apps so they’re more widely accessible.
One of the greatest impediments to using mobile technology to reduce gender violence is third world nations in which women often don’t have access to their own mobile or smartphones and rural areas in the U.S. and abroad in which there is limited service or broadband, Christopher Burns, USAID’s team leader for mobile access, said.
Burns suggested international policymakers should align plans for expanding broadband and mobile service with crowdsourced reports of gender violence.
“One suggestion for policy makers to focus on is to take a look at the crowd maps we’ve talked about today and see where there are greater incidences of gender-based violence and violence against women,” he said. “In all likelihood, those pockets probably don’t have the connectivity, don’t have the infrastructure [and] don’t have the capacity in place for survivors to benefit from those tools.”
One tool that’s been used in the U.S. is Circle of 6, an app for women on college campuses to automatically draw on friends when they think they’re in danger. The app allows women to pick six friends they can automatically text if they think they’re in a dangerous situation, asking them to call with an excuse for them to leave.
The app is designed to look like a game so it isn’t clear women are using their phones to seek help, said Nancy Schwartzman, executive director of Tech 4 Good, which developed the app.
Schwartzman has heard reports of gay men on college campuses using the app as well, she said. The military has been in contact with Tech 4 Good about developing a version of the app to combat sexual assault on military bases, she said.”

5 Big Data Projects That Could Impact Your Life


Mashable: “We reached out to a few organizations using information, both hand- and algorithm-collected, to create helpful tools for their communities. This is only a small sample of what’s out there — plenty more pop up each day, and as more information becomes public, the trend will only grow….
1. Transit Time NYC
Transit Time NYC, an interactive map developed by WNYC, lets New Yorkers click a spot in any of the city’s five boroughs for an estimate of subway or train travel times. To create it, WNYC lead developer Steve Melendez broke the city into 2,930 hexagons, then pulled data from open source itinerary platform OpenTripPlanner — the Wikipedia of mapping software — and coupled it with the MTA’s publicly downloadable subway schedule….
2. Twitter’s ‘Topography of Tweets
In a blog post, Twitter unveiled a new data visualization map that displays billions of geotagged tweets in a 3D landscape format. The purpose is to display, topographically, which parts of certain cities most people are tweeting from…
3. Homicide Watch D.C.
Homicide Watch D.C. is a community-driven data site that aims to cover every murder in the District of Columbia. It’s sorted by “suspect” and “victim” profiles, where it breaks down each person’s name, age, gender and race, as well as original articles reported by Homicide Watch staff…
4. Falling Fruit
Can you find a hidden apple tree along your daily bike commute? Falling Fruit can.
The website highlights overlooked or hidden edibles in urban areas across the world. By collecting public information from the U.S. Department of Agriculture, municipal tree inventories, foraging maps and street tree databases, the site has created a network of 615 types of edibles in more than 570,000 locations. The purpose is to remind urban dwellers that agriculture does exist within city boundaries — it’s just more difficult to find….
5. AIDSvu
AIDSVu is an interactive map that illustrates the prevalence of HIV in the United States. The data is pulled from the U.S. Center for Disease Control’s national HIV surveillance reports, which are collected at both state and county levels each year…”

Why Big Data Is Not Truth


Quentin Hardy in the New York Times: “Kate Crawford, a researcher at Microsoft Research, calls the problem “Big Data fundamentalism — the idea with larger data sets, we get closer to objective truth.” Speaking at a conference in Berkeley, Calif., on Thursday, she identified what she calls “six myths of Big Data.”
Myth 1: Big Data is New
In 1997, there was a paper that discussed the difficulty of visualizing Big Data, and in 1999, a paper that discussed the problems of gaining insight from the numbers in Big Data. That indicates that two prominent issues today in Big Data, display and insight, had been around for awhile…..
Myth 2: Big Data Is Objective
Over 20 million Twitter messages about Hurricane Sandy were posted last year. … “These were very privileged urban stories.” And some people, privileged or otherwise, put information like their home addresses on Twitter in an effort to seek aid. That sensitive information is still out there, even though the threat is gone.
Myth 3: Big Data Doesn’t Discriminate
“Big Data is neither color blind nor gender blind,” Ms. Crawford said. “We can see how it is used in marketing to segment people.” …
Myth 4: Big Data Makes Cities Smart
…, moving cities toward digital initiatives like predictive policing, or creating systems where people are seen, whether they like it or not, can promote lots of tension between individuals and their governments.
Myth 5: Big Data Is Anonymous
A study published in Nature last March looked at 1.5 million phone records that had personally identifying information removed. It found that just four data points of when and where a call was made could identify 95 percent of individuals. …
Myth 6: You Can Opt Out
… given the ways that information can be obtained in these big systems, “what are the chances that your personal information will never be used?”
Before Big Data disappears into the background as another fact of life, Ms. Crawford said, “We need to think about how we will navigate these systems. Not just individually, but as a society.”