CRIM Blog - Predicting the response times of firefighters using data science

Published March 2, 2018


by Cyril Pecoraro
Originally published on Medium

Response time is one of the most important factors for firefighters because their ability to save lives and rescue people depends on it. Fire departments everywhere look for ways to reduce their response time, and several analyses have been conducted in recent years to determine what affects it. Meanwhile, fire departments have been collecting data on their interventions, yet few of them actually use data science to support data-driven decision-making.

In this article, I present a data science framework to predict turnout time and travel time, and demonstrate its effectiveness using data from the fire department of Montréal (Service de Sécurité Incendie de Montréal).

Definition of Response Time

Fire departments usually divide the response time of their firefighters into two main parts: turnout time, which corresponds to the seconds elapsed while firefighters prepare themselves at the station, and travel time, which refers to the time taken by the vehicle to arrive at the location of the incident.

Figure 1: Response time and its measurement

In many fire departments in North America, turnout time and travel time are measured manually: an officer presses a button located in the vehicle to signal departure and arrival. This process introduces irregularities and variation in the data, which must then be cleaned.

To predict the response time, the predicted turnout time is added to the predicted travel time; each of the two predictions requires a regression algorithm.

Prediction of Turnout Time

Several factors that affect turnout time have been studied in the literature, but at the time this project was implemented, no project applying artificial intelligence to the problem had been documented.


The main challenge was to clean the irregularities introduced by manual operations so that the algorithm could learn from consistent data. Although the cleaning step is not detailed in this article, the distribution of the turnout times obtained after cleaning is presented in Figure 2 below. It is important to note that these are the turnout times for all the units responding to an emergency, not only the first ones. For instance, a unit called on a fifth alarm in winter may take slightly longer to prepare because firefighters know that they will be in action for several hours in cold and wet conditions and will need an extra layer of clothing.

Figure 2: Distribution of turnout times after the cleaning process (vertical axis is in logarithmic scale)

This distribution shows that most turnout times fall between 40 and 125 seconds. However, the distribution has a tail that extends from 125 seconds to 700 seconds. Since invalid times are supposed to have been removed, it is fair to ask what these long times correspond to.

In fact, the Emerging Technologies and Data Science team at CRIM met with the firefighters of Montréal several times to acquire field knowledge in firefighting and especially to understand what happens while a unit is responding to an emergency. It turned out that, although most units respond in less than 80 seconds, some specialized units can take longer in some cases. Figure 3 below presents the mean turnout time (blue bars) and the standard deviation (black lines) for each type of unit of the fire department of Montréal.

Figure 3: Turnout time depending on the type of unit

This figure shows that some units have constant turnout times with almost no variation, such as units 2 (pumper/engine), 4 (ladder truck) or 5 (protection vehicle, light rescue), while other units have longer mean turnout times that vary greatly, such as units 9 and 13 (support and auxiliary units), 18 (boats), 21 and 22 (trailers).

Hence, two classes of units were created to account for this difference. Figure 4 below presents the distribution of turnout times, separated into slower and faster units.

Figure 4: Distribution of turnout times after the cleaning process, separated by type of unit (vertical axis is in logarithmic scale)


The objective was to predict the turnout time. The difficulty was to build a model that performed better than a baseline prediction consisting only of the mean or the median of the turnout times.

The baseline model always predicts the median of the historical turnout times between 2011 and 2016, which is 72 seconds.
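To make the baseline concrete, here is a minimal sketch of a median predictor and its mean absolute error. The turnout values are made up for illustration and are not the Montréal data; the function name `mean_absolute_error` is simply a local helper.

```python
from statistics import median


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative turnout times in seconds (made-up values, not the real dataset).
train_turnout = [55, 60, 68, 72, 75, 80, 95, 110, 300]
test_turnout = [50, 70, 74, 90, 400]

# The baseline ignores every feature and always predicts the training median.
baseline_prediction = median(train_turnout)
baseline_mae = mean_absolute_error(
    test_turnout, [baseline_prediction] * len(test_turnout)
)

print(f"baseline prediction: {baseline_prediction} s")
print(f"baseline MAE: {baseline_mae:.1f} s")
```

Any learned model has to beat this number on the same test set to be worth deploying. Note how a single very long turnout time in the test set dominates the error of a constant prediction.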

XGBoost was used to predict the turnout time. Features such as the time of day, season, unit type, incident type (5 types) and information about the fire station of origin were provided to the algorithm. The training set was composed of around 800,000 interventions that happened between 2009 and 2016. Interventions after 2016 formed the test set.
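As a rough illustration of the data preparation described above, the sketch below derives time-of-day and season features from a timestamp and splits interventions chronologically. The record fields (`dispatched_at`, `unit_type`, `incident_type`, `station_id`), the incident labels and the month-to-season mapping are all assumptions made for this example; the actual schema and encoding used in the project are not documented here.

```python
from datetime import datetime


def season_of(month):
    """Map a month number to a season (assumed meteorological convention)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "autumn"


def build_features(record):
    """Turn one raw intervention record into a flat feature dictionary."""
    ts = record["dispatched_at"]
    return {
        "hour": ts.hour,
        "season": season_of(ts.month),
        "unit_type": record["unit_type"],
        "incident_type": record["incident_type"],
        "station_id": record["station_id"],
    }


def chronological_split(records, first_test_year):
    """Train on records before first_test_year, test on the rest."""
    train = [r for r in records if r["dispatched_at"].year < first_test_year]
    test = [r for r in records if r["dispatched_at"].year >= first_test_year]
    return train, test


# Hypothetical records; field names and values are illustrative only.
records = [
    {"dispatched_at": datetime(2015, 1, 12, 7, 45), "unit_type": 2,
     "incident_type": "fire", "station_id": 30},
    {"dispatched_at": datetime(2016, 7, 3, 22, 10), "unit_type": 4,
     "incident_type": "alarm", "station_id": 5},
    {"dispatched_at": datetime(2017, 10, 20, 14, 5), "unit_type": 18,
     "incident_type": "rescue", "station_id": 15},
]

train, test = chronological_split(records, first_test_year=2017)
print(len(train), len(test))
print(build_features(records[0]))
```

Splitting by date rather than at random keeps future interventions out of the training set, which mirrors how the model would be used in practice. Categorical values such as the season or incident type would typically still need a numeric encoding before being passed to a gradient-boosting library.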

The mean absolute error (MAE) decreased from 17.8 seconds (baseline) to 15.7 seconds (XGBoost). When evaluating the model on the slower units only, the MAE decreased from 158 seconds to 85.6 seconds. When evaluating the faster units only, the error decreased from 16.9 seconds to 15.2 seconds. This shows the improvement brought by using machine learning to predict a turnout time that may vary based on several factors.
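The reported errors can also be read as relative reductions. The short sketch below only re-computes figures already given in the text; `relative_reduction` is a helper introduced for this example.

```python
def relative_reduction(baseline, model):
    """Fraction by which the model error is lower than the baseline error."""
    return (baseline - model) / baseline


# (baseline MAE, XGBoost MAE) in seconds, as reported in the article.
reported_mae = {
    "all units": (17.8, 15.7),
    "slower units": (158.0, 85.6),
    "faster units": (16.9, 15.2),
}

for group, (baseline, model) in reported_mae.items():
    print(f"{group}: {relative_reduction(baseline, model):.1%} lower MAE")
```

This gives roughly 12% for all units, 46% for the slower units and 10% for the faster units, so the gain is concentrated on the specialized units whose turnout times vary the most.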

Prediction of Travel Time

Many fire departments and emergency services rely on geographic information system (GIS) tools, such as Esri's ArcGIS and its Network Analyst extension, to estimate response time. These tools compute the shortest route on a graph representation of the road network, which usually gives an accurate estimate of the travel time. Their drawback is that they cannot always take into account external dynamic factors such as the weather, the traffic, or the type of unit or intervention. Hence, there is an opportunity for machine learning tools here.


Travel time suffered from the same imprecision as turnout time because it was measured between two manual button presses, one signaling the ENR status (departure) and one signaling the ARR status (arrival). The data were therefore carefully cleaned; that step is not detailed here. Figure 5 presents the distribution of travel times after cleaning.

Figure 5: Distribution of travel times after the cleaning process (vertical axis is in logarithmic scale)

From this distribution, we can observe that most travel times are around 220 seconds. We can also note that the range of times is wide, with very short times when the incident was located near the fire station and longer times when firefighters were joining other forces located further away in the city.


The challenge here was to build a predictive tool that would give better results than advanced software such as Esri’s ArcGIS Network Analyst. The predictions should also depend on external dynamic factors, such as the weather or the time of day.

The baseline was the estimate that the fire department of Montréal currently uses, which relies on software such as Network Analyst.

In 2015, Uber published a blog post explaining how they computed the Estimated Time of Arrival (ETA) of an ordered car in their early days. Uber’s data science team used machine learning to improve the predictions of OSRM, a well-known open source routing engine, so that the impact of external dynamic factors on the time predicted by OSRM could be taken into account. More specifically, their system, named Goldeta, was designed as follows: the origin and destination were submitted to OSRM, which returned an estimate of the trip time; this estimate was then fed to a machine learning regression algorithm that also took other external features into account. The features used by Goldeta are not documented, but it is reasonable to assume that the algorithm used the time of day, season or weather, as the literature shows that these factors influence travel time.

A similar approach was used in this project, as shown in Figure 6.

Figure 6: Combination of OSRM and XGBoost for the prediction of travel time
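The two-stage idea, a routing estimate refined by a learned model, can be illustrated with a deliberately simple stand-in. The sketch below is not the XGBoost model used in the project: it learns only the average gap between observed travel times and the routing estimate for each hour of the day, then adds that correction to new estimates. The routing estimate is assumed to come from a routing engine such as OSRM and is passed in as a plain number; all sample values are made up.

```python
from collections import defaultdict


def fit_hourly_correction(samples):
    """Learn the mean (observed - routing estimate) residual for each hour.

    samples: iterable of (routing_estimate_s, hour, observed_travel_s).
    """
    residuals = defaultdict(list)
    for routing_estimate, hour, observed in samples:
        residuals[hour].append(observed - routing_estimate)
    return {hour: sum(v) / len(v) for hour, v in residuals.items()}


def predict_travel_time(routing_estimate, hour, correction):
    """Routing estimate plus the learned correction (0 for unseen hours)."""
    return routing_estimate + correction.get(hour, 0.0)


# Made-up history: routing estimate, hour of day, observed travel time (seconds).
history = [
    (200, 8, 260), (180, 8, 230),  # morning rush: the estimate is too optimistic
    (200, 3, 190), (220, 3, 210),  # night: the estimate is slightly pessimistic
]

correction = fit_hourly_correction(history)
print(predict_travel_time(210, 8, correction))
print(predict_travel_time(210, 3, correction))
```

A gradient-boosted regressor plays the same role as `fit_hourly_correction` here, but it can combine many features at once (weather, season, unit type, distances) instead of a single lookup by hour.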

Moreover, the GPS coordinates of the departure and arrival points were used to create features describing the distance, such as the Manhattan distance and the Haversine distance. The historical mean speed was also provided as input, as well as the mean time between the ZIP code areas of the departure and arrival points. Eight features describing the weather conditions, along with features indicating the hour, month and holidays, also helped to account for traffic.
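For readers unfamiliar with these distance features, here is a minimal sketch of both. The Haversine formula is standard; the Manhattan variant shown (a north-south leg plus an east-west leg, each measured with Haversine) is one common way to adapt the idea to GPS coordinates and is an assumption, since the exact definition used in the project is not given. The coordinates are illustrative points in Montréal.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points in km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def manhattan_km(lat1, lon1, lat2, lon2):
    """Approximate grid distance: a north-south leg plus an east-west leg."""
    north_south = haversine_km(lat1, lon1, lat2, lon1)
    east_west = haversine_km(lat2, lon1, lat2, lon2)
    return north_south + east_west


# Illustrative coordinates (station, incident); not taken from the dataset.
station = (45.5017, -73.5673)
incident = (45.5300, -73.6000)

print(f"haversine: {haversine_km(*station, *incident):.2f} km")
print(f"manhattan: {manhattan_km(*station, *incident):.2f} km")
```

The Haversine distance is a lower bound on any route, while the Manhattan variant is usually closer to the driven distance on a grid-like street network, which is why providing both can help a model.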

The mean absolute error (MAE) decreased from 46.60 seconds (baseline given by the ETA used by the fire department) to 30.77 seconds (OSRM + XGBoost). The RMSE also decreased from 59.58 seconds to 43.03 seconds.
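Both metrics are reported because they react differently to large errors. The sketch below, on made-up values, shows two sets of predictions with the same MAE but different RMSE.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; squaring gives large errors more weight."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Illustrative travel times in seconds (made-up values).
observed = [200, 220, 240, 260]
steady = [210, 230, 250, 270]  # every prediction is off by 10 s
spiky = [200, 220, 240, 300]   # three exact predictions, one off by 40 s

print(mae(observed, steady), rmse(observed, steady))
print(mae(observed, spiky), rmse(observed, spiky))
```

Both prediction sets have an MAE of 10 seconds, but the RMSE of the second is 20 seconds because of its single large miss. A model that lowers both metrics, as reported above, is reducing typical errors and large misses alike.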

Towards Smarter Firefighting

This project shows the opportunity to bring data science into the firefighting field. With better estimates of response times, and models that take into account the time of day, season, weather, and the types of interventions and units, this tool could help fire departments better plan strategic responses.

By providing the GPS coordinates of a neighborhood or a building, response times could be simulated to see which situations and areas could be problematic. The real advantages of the system presented here are its increased precision and its modularity. With a simple change of parameter, a manager could create a brand new map in a few minutes and obtain statistics about the response times. As the pipeline is well defined, the user could simply feed the framework with new data and retrain the algorithm to obtain better predictions.

Building more complete tools around this prediction engine could also lead to solutions tailored to the fire department. During the meetings between CRIM’s Emerging Technologies and Data Science team and officers of the fire department of Montréal, the discussions revealed a need to understand how much a variation in response time could affect the lives of victims and the material damage to a building.

A complete simulation engine could be built by combining several predictive tools. An engine predicting the risk of fire in a building, similar to Firebird, could be associated with another engine predicting medical emergencies. Simply overlaying the buildings or areas of the city at risk on the response time map could greatly improve strategic planning. For example, a building predicted to be a fire risk and located in an area with a high predicted response time could be taken into account during strategic planning.

Then, a simulation for a given time frame could be run by triggering events based on these engines and on external dynamic factors (e.g. weather, musical events). One tool would decide which units to dispatch according to the interventions and the norms of the fire department, and another module would evaluate the damage. Finally, by combining all the modules in the simulation engine, a complete picture of how the fire department would handle such a situation could be created. This framework would help the strategic planning of the units. For instance, managers could plan complex scenarios such as relocating a unit or opening a new fire station and observe the effect on the service offered to citizens.
