This analysis is not provided by the MTA and is in no way comparable to any information or analytics provided by official transit agencies.

How it Works

The Metropolitan Transportation Authority (MTA) provides a public API (Application Programming Interface) that serves real-time GTFS (General Transit Feed Specification) data on subway train positions and schedules. This data is mostly used by third-party “countdown clock” applications that rely on the schedule information.

The position information is not exact, nor would greater precision really be possible with the NYC subway’s current signal system. It is, however, possible to estimate a train’s time-of-arrival at a station using a relatively simple algorithm.

The real-time GTFS spec defines three possible statuses for a train relative to a station:

  1. In Transit To

  2. Incoming At

  3. Stopped At

It can be reasonably inferred that the actual arrival at the station occurs sometime between ‘Incoming At’ and ‘Stopped At’, though, in practice, the ‘Incoming At’ status is occasionally skipped. For analytical purposes, the precision with which the moment of arrival is determined matters less than the consistency. The estimates can be slightly off from reality as long as the values being compared are all determined the same way and any biases are (preferably) balanced.

My algorithm takes the arrival time to be the final ‘Incoming At’ timestamp before the status changes to ‘Stopped At’. In cases where the ‘Incoming At’ status is skipped, the arrival time is calculated as the midpoint between the final ‘In Transit To’ and the first ‘Stopped At’. This midpoint may differ from the actual arrival but is no more likely to be early than late. In general, the results are pretty good, largely owing to the high frequency (roughly every 5 seconds) with which the MTA updates its feeds.
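The rule above can be sketched in a few lines of Python. This is a minimal illustration and not the site's actual code: the function name `estimate_arrival`, the input shape (time-ordered `(timestamp, status)` pairs for one train approaching one stop), and the choice to return `None` when there is nothing to work from are all assumptions. The status strings mirror the three statuses listed earlier.

```python
from typing import Iterable, Optional, Tuple

# Stand-ins for the three statuses a feed reports for a train.
IN_TRANSIT_TO = "IN_TRANSIT_TO"
INCOMING_AT = "INCOMING_AT"
STOPPED_AT = "STOPPED_AT"


def estimate_arrival(observations: Iterable[Tuple[float, str]]) -> Optional[float]:
    """Estimate arrival time from time-ordered (timestamp, status) pairs.

    Hypothetical helper: timestamps are seconds (e.g. Unix time), and all
    observations belong to one train approaching one stop.
    """
    last_incoming = None    # final 'Incoming At' timestamp seen so far
    last_in_transit = None  # final 'In Transit To' timestamp seen so far
    for timestamp, status in observations:
        if status == INCOMING_AT:
            last_incoming = timestamp
        elif status == IN_TRANSIT_TO:
            last_in_transit = timestamp
        elif status == STOPPED_AT:
            if last_incoming is not None:
                # Normal case: use the final 'Incoming At' before stopping.
                return float(last_incoming)
            if last_in_transit is not None:
                # 'Incoming At' was skipped: take the midpoint.
                return (last_in_transit + timestamp) / 2
            # Assumption: with no earlier status there is nothing to estimate.
            return None
    return None  # the train has not been seen stopped yet
```

For example, a train seen in transit at second 100 and stopped at second 110, with no ‘Incoming At’ in between, would get an estimated arrival of 105.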

Once an arrival has been determined, it can be analyzed. The time between one arrival and the previous arrival on the same line (the headway) is compared to the average headway for that station at that time of day. Each new headway is added to the longitudinal dataset immediately and measured against it.
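As a rough sketch of that bookkeeping, the snippet below keeps a running average headway per route, station, day of week, and hour. The class name `HeadwayTracker`, its in-memory dictionaries, and the choice to fold the new headway into the average before comparing are assumptions made for illustration; the real dataset is presumably stored differently.

```python
from collections import defaultdict
from datetime import datetime


class HeadwayTracker:
    """Hypothetical in-memory tracker of headways and their running averages."""

    def __init__(self):
        # (route, stop) -> datetime of the most recent arrival
        self.last_arrival = {}
        # (route, stop, weekday, hour) -> [sum of headways in seconds, count]
        self.totals = defaultdict(lambda: [0.0, 0])

    def record(self, route, stop, arrival):
        """Record an arrival; return (headway, average) in seconds, or None
        if this is the first arrival seen for the route at this stop."""
        key = (route, stop)
        previous = self.last_arrival.get(key)
        self.last_arrival[key] = arrival
        if previous is None:
            return None
        headway = (arrival - previous).total_seconds()
        bucket = (route, stop, arrival.weekday(), arrival.hour)
        # Add the new headway to the dataset first, then compare against it.
        self.totals[bucket][0] += headway
        self.totals[bucket][1] += 1
        total, count = self.totals[bucket]
        return headway, total / count


tracker = HeadwayTracker()
tracker.record("A", "X", datetime(2024, 1, 1, 8, 0))
first = tracker.record("A", "X", datetime(2024, 1, 1, 8, 5))
second = tracker.record("A", "X", datetime(2024, 1, 1, 8, 15))
print(first, second)
```

Here the route and stop identifiers `"A"` and `"X"` are placeholders. A headway well above its bucket's average is what would show up as a late train.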

Known Issues

This is a hobby project, so the maintenance cycle tends to lack urgency.

  • Path Changes and Route Closures

    • The system lists all headways from the last hour. When routes change path due to construction or incidents, or simply stop running for the day, their trains appear as late.

  • Ends of Lines

    • The algorithm looks for a train’s arrival at a stop to calculate headways. It currently doesn’t attempt to determine when the train departs. Departure is the relevant metric at end-of-route stations, as the trains arrive from the inbound direction, then sit for an indeterminate amount of time before departing on an outbound trip.

  • Holidays

    • The algorithm calculates average headways for each route at each station for every hour of every day of the week. Some holidays can fall on any day of the week and, for the more widely observed ones, the subway switches to a Sunday/Holiday schedule. I haven’t yet compiled a list of MTA holidays so that those days can be recoded as different from a normal weekday.
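
One possible way to handle the holiday issue, sketched here as an untested idea and not something the site currently does, is to file arrivals on listed holidays under Sunday when choosing the day-of-week and hour bucket. The function name `schedule_bucket` and the `holiday_dates` set are hypothetical, and the date used below is only a placeholder, not an official MTA holiday list.

```python
from datetime import date, datetime

SUNDAY = 6  # datetime.weekday() returns 6 for Sunday


def schedule_bucket(arrival, holiday_dates):
    """Return the (day_of_week, hour) bucket for an arrival.

    Assumption: any date in holiday_dates runs a Sunday/Holiday schedule,
    so its arrivals are grouped with Sundays.
    """
    if arrival.date() in holiday_dates:
        day = SUNDAY
    else:
        day = arrival.weekday()
    return day, arrival.hour


# Placeholder holiday set for illustration only.
holidays = {date(2024, 12, 25)}
print(schedule_bucket(datetime(2024, 12, 25, 9, 30), holidays))
print(schedule_bucket(datetime(2024, 12, 18, 9, 30), holidays))
```

The first call is bucketed with Sundays even though the date is a Wednesday; the second is an ordinary Wednesday and keeps its own day-of-week bucket.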