Daniel Sim at the Data.gov.sg Blog: “Singapore’s MRT Circle Line was hit by a spate of mysterious disruptions in recent months, causing much confusion and distress to thousands of commuters.
Like most of my colleagues, I take a train on the Circle Line to my office at one-north every morning. So on November 5, when my team was given the chance to investigate the cause, I volunteered without hesitation.
But the incidents — which first happened in August — seemed to occur at random, making it difficult for the investigation team to pinpoint the exact cause.
We were given a dataset compiled by SMRT that contained the following information:
- Date and time of each incident
- Location of incident
- ID of train involved
- Direction of train…
LTA and SMRT eventually published a joint press release on November 11 to share the findings with the public….
When we first started, my colleagues and I were hoping to find patterns that may be of interest to the cross-agency investigation team, which included many officers at LTA, SMRT and DSTA. The tidy incident logs provided by SMRT and LTA were instrumental in getting us off to a good start, as minimal cleaning up was required before we could import and analyse the data. We were also gratified by the effective follow-up investigations by LTA and DSTA that confirmed the hardware problems on PV46.
From the data science perspective, we were lucky that incidents happened so close to one another. That allowed us to identify both the problem and the culprit in such a short time. If the incidents were more isolated, the zigzag pattern would have been less apparent, and it would have taken us more time — and data — to solve the mystery….(More).”