Thesis: Added version sent to reviewers.
@@ -0,0 +1,36 @@
# Ignore LaTeX compile files
*.aux
*.glo
*.idx
*.log
*.toc
*.ist
*.acn
*.acr
*.alg
*.bbl
*.blg
*.dvi
*.glg
*.gls
*.ilg
*.ind
*.lof
*.lot
*.maf
*.mtc
*.mtc1
*.out
*.synctex.gz
*.run.xml
*.pdf
*.nav
*.snm
*-blx.bib
*.bcf
.vscode/
*.tdo


# Ignore local folder
local/

@@ -0,0 +1,120 @@
\chapter*{Introduction}
\addcontentsline{toc}{chapter}{Introduction}
\chaptermark{Introduction}

%\section*{Section of Introduction}
%\addcontentsline{toc}{section}{Section of Introduction}

In every city, urban dwellers need to travel within it, for example to commute to work, spend time with friends, or buy groceries.
These trips are made either on foot, by (motor)bike, by private car, or with public transportation services (buses, subways, trains).
However, with the increasing congestion of city streets and the need to reduce pollution and CO$_2$ emissions from transport modes such as private cars, new services have emerged, mostly in North America and Europe, with the objective of satisfying the need for trips inside the city while keeping the negative effects of private vehicle usage to a minimum.
Carsharing services are one of these new services: they offer cars that inhabitants can rent for private use at any time, for a fee lower than that of a taxi service.
Carsharing services allow any driver to rent a car for an urban trip at any time, without the need to book in advance or to pick up the keys at a standard car rental agency.
The cars are dedicated to a city, meaning that a user can only rent and drop off a car within that city.
With the objective of replacing the inconvenience of car usage inside cities, such services can reduce car ownership among city dwellers~\cite{martin_impact_2010,giesel_impact_2016} and can also reduce CO$_2$ emissions when the service offers electric vehicles~\cite{firnkorn_what_2011,chen_carsharings_2016}.

However, this new type of service has not yet reached maturity: \emph{Autolib} in Paris (Figure~\ref{fig:ch0_autolib}) is an example of a carsharing service that was deployed and then shut down after several years due to profitability issues.
The fleet of vehicles accessible to the customers of such services needs to be monitored every day, either to perform maintenance on the vehicles or to recharge the batteries of electric vehicles.
Cars that are put back into service inside the city have to be placed carefully, since the demand for this type of transportation is not uniform within the city.
The whole fleet therefore needs to be positioned correctly: the more the cars are used by the customers, the more profitable the service is.
Since the demand for this kind of service is not uniform across the city, hot spots and cold spots of demand shift the spatial distribution of the fleet towards a \textquote{sub-optimal} distribution, i.e., cars can be taken out of a busy city center towards peripheral areas where they are less used and tend to accumulate unused.
For this reason, the objective of this thesis is to help the operator of the carsharing service \emph{Free2Move} to better relocate its cars and make the service more profitable.

\begin{figure}[!b]
\centering
\includegraphics[width=0.8\textwidth]{figure/ch0_autolib.jpeg}
\caption[Autolib Paris Vehicles]{The one-way station-based carsharing service named \emph{Autolib} was available between 2011 and 2018. Each car icon on the map represents a station where cars can be returned; its color indicates the number of available reserved parking lots.\textsuperscript{1}}
\small\textsuperscript{1}\emph{Carte interactive des stations Autolib' en Île-de-France} from \emph{www.data.gouv.fr}
\label{fig:ch0_autolib}
\end{figure}

\paragraph{Types of Carsharing Services.}
There exist three different kinds of carsharing services.
Each type has been developed to alleviate constraints of the previous carsharing types.
They still share common characteristics, such as the possibility of offering multiple types of cars (city cars, sedans, etc.) or the pricing scheme, based either on the number of minutes of car usage, on the distance traveled by the customer, or on a combination of both.
Furthermore, carsharing services can take advantage of electric vehicles in urban areas, offer only combustion-engine vehicles, or combine both.

The first type of carsharing service is called \emph{round-trip}: it allows customers to pick up a car from any dedicated station of the service and return the vehicle \emph{to the same station}.
A dedicated station is a group of parking lots in the street reserved for the service and accessible to its customers.
The main example of this kind of carsharing is the brand \emph{ZipCar}, which operates mainly in the U.S. and Canada, in over a hundred cities with more than ten thousand vehicles distributed among those cities.
A typical use case is a user renting a car near his/her home to do some shopping in the city center before returning home a few hours later.
However, this kind of carsharing makes it impossible for a customer to take a car near his/her home, drive to his/her workplace, and return the car there.
The customer would have to either keep the car rented until coming back home or simply take another means of transportation for the trip.

This constraint was alleviated by the second type of carsharing system, called \emph{one-way} carsharing.
In this kind of service, the user is allowed to rent a vehicle from a station as in the \emph{round-trip} type of service.
But instead of being forced to return it to the same station, the customer can drop it off at any other station, including the station where the vehicle was first picked up.
A French example of this type of carsharing was \emph{Autolib}, which ran its service inside the urban area of Paris until 2018.
In the case of one-way carsharing, even if it is possible for a customer to commute with a car of the service from home to the workplace, the location of each station still limits his/her mobility.
The stations have to be placed \textquote{optimally} by the operator.
Indeed, the need to take and return the car at designated stations weakens the user's ability to move freely within the town: the customer cannot directly drop the car in front of the place where he/she wants to stop.

\begin{figure}[!b]
\centering
\includegraphics[width=0.8\textwidth]{figure/ch0_free2move.jpg}
\caption[Free2Move Paris Vehicles]{Example of a free-floating carsharing service: Free2Move cars in Paris parked on parking lots previously reserved for the former \emph{Autolib} station-based carsharing service.\textsuperscript{1}}
\small\textsuperscript{1}Tiraden, CC BY-SA 4.0, via Wikimedia Commons
\label{fig:ch0_free2move}
\end{figure}

Finally, \emph{free-floating} carsharing is the last type of carsharing; it removes the need for stations designated by the operator.
In this kind of service, the cars are parked on regular parking lots which are not reserved for the carsharing service.
Usually the operator has an agreement with the city council: the operator pays a fixed amount annually to be allowed to park cars inside the city, so customers do not have to worry about parking fees.
If electric vehicles are used, the service might still offer stations to its customers to recharge the vehicles, but it does not force the user to leave a car at those stations.
Furthermore, the service might offer discounts to customers who return a car with a low battery charge to a charging station.
Some services, such as \emph{Communauto} in Montréal, thus offer a hybrid service, with cars both in \emph{free-floating} mode and in \emph{round-trip} mode with dedicated stations.
The main examples of free-floating carsharing operators are \emph{Car2Go} and \emph{Free2Move}, with cars in numerous cities around the world, mainly in Europe and North America.
Even if this type of carsharing system is very customer-friendly because of the absence of stations, it does not remove the need for cars to be relocated within the city to counter a possible demand imbalance.
The relocation decisions in this kind of carsharing can be harder and more complex, since users may pick up or return cars anywhere in the city and not only at dedicated stations of the service.

\paragraph{Carsharing Context.}
This thesis has been motivated by the need of the company \emph{Stellantis}, the sponsor of this thesis, to improve the carsharing services of its brand \emph{Free2Move}.
\emph{Free2Move} offers a free-floating carsharing service in several cities, such as Paris, Washington and Madrid.
While other services might open in the future, the optimization of the parameters of a new service is outside the scope of this thesis.
The objective of this thesis concerns the improvement of an already existing free-floating carsharing service.
This kind of service can be improved in numerous ways, such as increasing the number of trips, improving customer satisfaction or even optimizing the number of vehicles.
In the case of \emph{Free2Move}, the aim is to increase the average car utilization.
Since car utilization is time-based, there exist two ways to increase this average car utilization.
First, it is possible to make customers rent cars for longer periods of time; second, it is possible to increase the number of bookings so that cars are generally used more often.
While the operator can propose price cuts for high utilization of the service in order to lengthen trips, if most of the vehicles are located in areas of the city with overall low demand, then their utilization might be hindered.
Thus, optimizing the position of the vehicle fleet is the main objective to be accomplished to increase the utilization of the service, for example by making sure that customers do not have to walk too far to get a car.

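To make this objective concrete, one possible formalization of the average car utilization over a period (the notation below is illustrative and is not taken from \emph{Free2Move}) is the share of time the fleet spends rented:
\begin{equation*}
\bar{u} = \frac{1}{|V| \cdot T} \sum_{v \in V} t_{v},
\end{equation*}
where $V$ is the set of cars of the fleet, $T$ is the length of the considered period (e.g., 24 hours) and $t_{v}$ is the total time car $v$ was rented during this period. Under this definition, $\bar{u}$ increases either when trips become longer or when cars are booked more often, which corresponds to the two levers mentioned above.
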
\paragraph{Contribution.}
The main contribution of this thesis lies in the proposition of a methodology to increase the car fleet utilization at the global level.
This methodology has been designed to be used by \emph{Free2Move}.
Since real constraints linked to the physical operation of the service had to be taken into account in the development of this methodology, cars can only be relocated \emph{during the night}, and the number of cars that can be relocated is limited by the number of dedicated staff members, also called \emph{jockeys}.
The main contribution is to compute the ideal car placement to be reached by the next morning, while taking into account constraints on the relocation costs and on the jockeys' relocation capacity, in order to optimize the fleet utilization of the free-floating carsharing service~\cite{martin_prediction_2021,martin_optimisation_2022}.
The purpose of this morning placement of the fleet is to counterbalance the demand imbalance that has changed the distribution of cars in the city since the last time the fleet was relocated by the operator's staff.
A two-step methodology is proposed to produce the result stated above.
The first step consists in predicting the car utilization of the next day according to the car position in the city.
The second step then takes into account the utilization prediction for all possible car positions, as well as the distribution of the fleet before the relocation phase, to propose the ideal car fleet distribution for the next morning.
The contributions of this thesis have been published in two conference articles:
\begin{enumerate}
\item Gregory Martin et al., \textquote{Prediction-Based Fleet Relocation for Free Floating Car Sharing Services}, \emph{2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI)}, IEEE, 2021, pp. 1187-1191.
\item Gregory Martin et al., \textquote{Optimisation du Positionnement de Voitures en Autopartage basée sur la Prédiction de leur Utilité}, \emph{Conférence Nationale en Intelligence Artificielle 2022 (CNIA 2022)}, 2022.
\end{enumerate}

\paragraph{Thesis Outline.}
This thesis is organized in five chapters.
Chapter~\ref{ch:background} describes how carsharing services are modeled in the literature and how this kind of service is used by customers, in order to model their behavior accurately.
Then a focus is made on the possible improvements of carsharing services, in particular free-floating ones, and on how numerical optimization can support these improvements.
After that, the state of the art on carsharing simulation and on other transportation simulations that can include carsharing is reviewed, with the aim of evaluating more precisely the results of the methodology developed during this thesis.
Finally, a last focus is made on the state-of-the-art regression models that are used in the proposed two-step methodology.

Chapter~\ref{ch:data_analysis} presents the details of the trip data provided by \emph{Free2Move} and of exogenous data for the three services located in \emph{Madrid}, \emph{Paris} and \emph{Washington}.
Since those three datasets are used in the experimental evaluation, multiple analyses are made of the daily utilization of each service and of its spatial distribution.
Furthermore, a finer analysis of the customers of each service is made in order to determine whether the prediction of the fleet utilization can be based on such information.

Chapter~\ref{ch:method} presents the main contribution of this thesis, i.e. the two-step method predicting the car utilization before providing the ideal distribution of these cars for the next morning.
First, the modeling of the car usage is explained, as well as how the next day's car utilization can be predicted from the car position in the city and other exogenous data.
Second, it is explained how the placement of cars the night before the relocation phase and all predicted car utilizations can be used in an integer linear programming model to find the next morning's car placement.
Finally, a first evaluation of this method is made, in particular to find out which prediction model should be used to predict the next day's car utilization, as well as to assess the effectiveness of the methodology when compared to the historical service utilization.

In Chapter~\ref{ch:simulation}, a second evaluation of the methodology proposed in Chapter~\ref{ch:method} is performed.
The objective is to evaluate the behavior and the performance of this methodology when it is used for several days in a row, which is not possible with the previous experimental setting.
Thus the simulator created for this experiment is presented, as well as how it actually simulates a carsharing service.
Several baselines are presented and their performance is tested against the previously presented methodology in order to carry out the evaluation.

Finally, Chapter~\ref{ch:ab_testing} details a short A/B testing study made on the real carsharing service of \emph{Free2Move} in Madrid.
Indeed, a simplified version of the methodology proposed in Chapter~\ref{ch:method} has been tested in the field with the help of the operational team managing the service in Madrid.
The A/B test compares a period A, during which the operational staff made the relocation decisions, with a period B, during which the simplified version of the methodology was used to relocate the cars.

@@ -0,0 +1,889 @@
\chapter{Background and State-of-the-Art}
\label{ch:background}

Many articles on the subject of carsharing have been published in the scientific literature, with objectives ranging from determining the reasons why customers use such a service to determining its economical or environmental impacts.
Since the aim of the current thesis is to optimize the usage of a free-floating carsharing service, the relevant state of the art has been divided into four categories.

The first category of articles addresses how it is possible to model a city and the carsharing service operating in it, depending on whether this service is station-based or free-floating.
Moreover, an additional focus is made on the approaches trying to model customers' usage patterns.

The second category is dedicated to works dealing with the improvement of existing carsharing services.
Before presenting any article specific to carsharing, a short explanation is given of integer linear programming, one of the techniques from the mathematical optimization field used in those works.
Then methods for the improvement of carsharing services are presented.
First, the methods leaning towards a user-based approach are presented, as those leverage customer incentives as a way to improve the service.
Then, the operator-based relocation methods are detailed; in this case the operator acts directly on the service to improve the fleet position.

The third category presents the state of the art of both urban mobility simulation and carsharing simulation, with the aim of evaluating the relevance of a given methodology.
Two approaches are presented: one based on events changing the internal state of the simulator, and another based on agents acting within the simulator according to their needs.

Finally, in the last part of this chapter, the state of the art of regression models is presented, since these models are used in Chapter~\ref{ch:method} to predict the utilization of cars in the service.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \toignore{
% \section{Urban Mobility}
% In order to understand the scientific literature about urban mobility and its intersection with shared vehicles-based mobility, this section is split into two subsections. In Section~\ref{sec:bg_vehicle_sharing_concept}, three close but different service types of shared-vehicle are presented. After all, carsharing is not the only service proposing users to rent vehicles by the minute and exploring the works in the surrounding field has interest \info{Elisa: il faut que tu fasses attention à rester informatif et pas "littéraire". Si tu mélanges des points de vue personnels avec des infos scientifiques, tu vas doubler le volume de ton manuscrit}. And then in Section~\ref{sec:bg_city_modeling}, the modeling of urban trips done by people in a city is presented depending on the type of vehicle sharing service studied. Indeed, being able to correctly model the city and its trips allows to use other abstract methodologies like graphs or regression models.
% \info{A mon avis, tes chapeaux de section sont trop longs et ce que tu dis est répété apres...}
% \subsection{Vehicle Sharing Concept}\label{sec:bg_vehicle_sharing_concept}
% When moving from one point to another within a metropolis area, the most common transportation means are: private car or bike, public transportation such as buses or subways, taxis or even foot. Between the mass-transport systems represented by the public transportation services and private owned vehicles, several types of vehicle sharing services exist. From shared bikes to autonomous mobility on demand through carsharing, every system has its constraints and its dedicated methodologies. %to either study them or improve them.
% \subsubsection{Bikesharing}\label{sec:bg_bikesharing}
% A bikesharing service is a transportation mean with bikes freely available against monetary payment, usually in predefined stations. The users ride the rented bike and once their trip is finished, they lock the bike on a dock in order to let other users have the possibility to rent the shared bike. Since numerous bikesharing services uses predefined stations within the city, the issue of demand and supply of bikes in each predefined station arise. Indeed if a person comes to rent a bike but find that none is available or if a person desires to end her bike trip but find no dock where to secure the rented bike, then the \emph{customer demand} is not met.
% This issue is often associated with the re-balancing needed to be done by the service operator to keep each bike station with enough bikes to serve customers and enough free docks to let other customers park their rented bikes. %Thus the focus of several works is made on this rebalancing issue.
% For example, the authors of~\cite{chen_dynamic_2016} propose to regroup stations of the service into clusters of same-behaving \info{Elisa: tu es sûr de ton mot là?} stations. Then, the authors propose to predict the probability of an over-demand for bikes or under-supply of free docks \info{Elisa:c'est la même chose ? Peut-etre ne pas dire les deux ?}. This study provides the knowledge that the bike demand is influenced by time and weather features, including the temperature, as well as other less frequent occurrences of traffic or social events. This is further acknowledged by the authors of~\cite{tomaras_modeling_2018} who propose to consider the large city events that affect the distribution of bikes in the stations around the city. This exogenous information will also be important in our carsharing problem. Finally, the authors of~\cite{hulot_towards_2018} anticipate the over-demand/under-supply by training regression methods to predict the number of trips to be demanded in every station as well as the number of bikes to be returned. This allows them to detect whether a station needs bikes to be added or removed by the staff. \info{Elisa: j'ai fait des petites coupes parfois pour simplifier tes phrases, je te laisse regarder dans l'historique...}
% By early detecting possible over-demand for bikes, the previously presented works suggest a first relocation strategy. However, knowledge about how many bikes should precisely be taken from which station to be delivered to stations in need of bikes could further improve the strategy. The authors of~\cite{raviv_static_2013} present a method to directly create the route for two vehicle services to re-balance the bikes. They consider that the bike relocation is done exclusively when the service is the least busy, e.g., during the night. In the same way, the authors of~\cite{ghosh_dynamic_2017} and~\cite{ghosh_improving_2019} adopt an optimization formulation in order to address the imbalance number of bikes in stations and create the routes for the staff vehicles. Compared to~\cite{raviv_static_2013} the authors add the support for relocation of bikes during the normal activity hours by globally anticipating the demand, with a probabilistic model of the demand.
% Even with the addition of a finer relocation process, e.g. by adding the areas from which bikes should be taken to which area they should be dropped off, the methodologies cannot be directly applied to the free-floating carsharing problem faced in this thesis. Indeed the assumption about the relocation capacity is different between bikesharing and carsharing, in the former case a truck can relocate dozens of bikes all at once while in the latter case one staff member needs to go from cars to relocate to other cars to relocate, with often the need for an additional sweeper car to take care of staff members.
% \info{Elisa: sinon c'est pas mal. Peut-etre que si tu reparles de ces refs plus tard tu remplacer cette section par une version plus courte et moins structurée dans l'intro...}
% \subsubsection{Autonomous Vehicles}
% \towrite{Those papers should be used:\cite{vosooghi_critical_2017,horl_dynamic_2019,volz_relocation_2022,marczuk_autonomous_2015}}
% We focus on a subpart of the autonomous vehicles research filed dedicated to the study of automated mobility-on-demand, abbreviated as AMoD. This kind of service, yet hypothetical, can be viewed as carsharing services with some tweaks. In a standard carsharing service the vehicles are not autonomous and cannot relocated themselves if needed, and in addition users of the service have to drive the car themselves too. However the global usage and its costs remains within the same boundaries when compared against taxis and busses whether the cars are operated by humans or autonomously. Indeed as shown by~\cite{bosch_cost_2017}, the cost of autonomous shared vehicles is lower than the cost of non-autonomous taxis while being more expensive than autonomous buses in a urban setting.
% The research in autonomous vehicles is vast, but a focus can be made on a subset of this field revolving around the study of automated mobility-on-demand, abbreviated as AMoD. As described in~\cite{bosch_cost_2017,horl_fleet_2019,horl_dynamic_2019}, this kind of service would offer to each customer a taxi-like service but without the need to have a human taxi driver, who has a big influence on the final price paid by the customer. The aim would be to propose an alternative to the private owned vehicle for (sub)urban trips, with the expected effects of reducing the global cost a person needs to pay of their transports and of reducing the space occupied due to park unused vehicles.
% In the paper~\cite{horl_fleet_2019} the work presented by its authors revolves around the automated mobility-on-demand
% }

\section{Carsharing Service Modeling}\label{sec:bg_city_modeling}

Optimizing the usage of a carsharing service first requires modeling the service, both its spatial dimension and its usage by the customers.
In this section both aspects are presented, starting with an overview of the models frequently encountered in the carsharing literature to spatially represent the service and its vehicles.
Then the focus is put on several studies about customer usage patterns and about the external factors influencing them.

\subsection{City Modeling}

Each vehicle sharing service generates data about customer usage.
On top of the data required from a practical point of view, such as driver licenses, payment information or personal information, the service registers the trips made by the users.
This trip data usually consists of at least the spatiotemporal point of the trip's start and the spatiotemporal point of the trip's end.
Those points are made of a GPS coordinate and a timestamp.
In the current section, the focus is on how to abstract the GPS coordinates to model the trips.
The aim is to be able to use additional methods on top of this model, either for the prediction of car usage or for their relocation.
It should be noted that the choice of the model is not trivial, mainly because the algorithms used to improve the carsharing service are going to be constrained by the chosen model.

\paragraph{Graphs.}

Station-based vehicle sharing services can be more straightforward to model than their free-floating counterparts.
Indeed, the backbone of the service is the set of stations where all the vehicles are allowed to park.
Thus, it is possible to directly use the set of stations, instead of the city itself, as the basis of the model.
In most articles relying on stations, such as~\cite{zimmermann_profiling_2015}, they are modeled as a graph: each station is represented by a node and each possible trip between stations is modeled by a link between the corresponding nodes.
An application of this model is presented in Figure~\ref{fig:ch1_singapore}, where the article~\cite{xu_new_2007} studies the vehicle sharing service located in Singapore.
The stations represented in Figure~\ref{fig:ch1_singapore_station} are modeled with the graph shown in Figure~\ref{fig:ch1_singapore_graph}.
Even if in this case it is a bike-sharing service, the modeling principle is the same for carsharing services.

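As an illustration of this graph model (this sketch is not taken from the cited works; the station identifiers and trip records are hypothetical), a station-based service can be represented with the \textit{networkx} Python library, each node being a station and each weighted arc counting the observed trips between two stations:
\begin{verbatim}
import networkx as nx

# Minimal sketch: station-based trips as a weighted directed graph.
# The trip records (origin station, destination station) are hypothetical.
trips = [("S1", "S2"), ("S2", "S3"), ("S1", "S2"), ("S3", "S1")]

G = nx.DiGraph()
for origin, destination in trips:
    if G.has_edge(origin, destination):
        G[origin][destination]["weight"] += 1  # one more observed trip on this arc
    else:
        G.add_edge(origin, destination, weight=1)

print(G.number_of_nodes(), G.number_of_edges())  # 3 stations, 3 distinct arcs
print(G["S1"]["S2"]["weight"])                   # 2 trips observed from S1 to S2
\end{verbatim}
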
\begin{figure}[!ht]
\centering
\begin{subfigure}[t]{0.44\linewidth}
\includegraphics[width=1\textwidth]{figure/ch1_singapore.jpeg}
\caption{Location of the carsharing service's stations in Singapore, from~\cite{xu_new_2007}.}
\label{fig:ch1_singapore_station}
\end{subfigure}
%
\begin{subfigure}[t]{0.54\linewidth}
\includegraphics[width=1\textwidth]{figure/ch1_singapore_graph.jpg}
\caption{Representation of a possible graph corresponding to the stations in Singapore, with only some trips represented.}
\label{fig:ch1_singapore_graph}
\end{subfigure}
\caption{Example of a model representing a station-based carsharing service located in Singapore. On the left is the map of Singapore with the location of each station. On the right is a graph modeling the stations of the service, with a subset of the possible trips (arcs) from one station (vertex) to another.}
\label{fig:ch1_singapore}
\end{figure}

However, it is not possible to directly use graphs as a model for free-floating carsharing services.
Indeed, since the cars are able to park in any parking lot, it would be theoretically possible to assign a node to each parking lot in the city, but this is impracticable.
A trip between two parking lots would be modeled as a link between the corresponding nodes.
Many issues arise with this kind of graph: first, it would be necessary to know the exact number of parking lots in the city and their location.
Then, because the number of parking lots in a city is huge and comparatively few trips are made, this would lead to a large but sparse graph, i.e., a high number of nodes but few arcs overall.
Finally, no abstraction would be performed at all: two parking lots side by side would be considered as two separate nodes in the model even though they are in practice equivalent because of their proximity.

\paragraph{Districts.}

Since graphs cannot be used directly for free-floating carsharing services, another model is needed to represent the city and its service.
It is possible to divide the city where the service is implemented into administrative districts.
For example, a service in Paris could use the \textquote{arrondissements}, as shown in Figure~\ref{fig:ch1_paris_arrondissement}.
The articles~\cite{febbraro_one_2012, schmoller_empirical_2015} are examples of how cities were divided into districts in order to create a model of the area where a free-floating carsharing service operates.
The aim is to entirely cover the area with districts, without overlapping them.
It would then be possible, for example, to take all the districts and all the trips between them and to create a graph representing the carsharing service.
In this model, a district would be represented by a node and a trip between two districts would be a link between the corresponding two nodes.

This type of model helps to understand and visualize how the city was divided, since it is usually based on real districts.
However, this type of division does not guarantee a pertinent representation of the service usage.
First, larger districts could hide finer hot spots of demand or trips, preventing the use of finer relocation strategies.
Second, it would not correctly represent the maximum distance a user is willing to travel to reach a car before renting it.
Indeed, according to the study of~\cite{seign_prescriptions_2013}, a user is willing to walk up to 500 meters to rent a car from a carsharing service, and if none is available within this distance, then he/she might simply not rent a car at all.
Thus, in order to satisfy the users, the operator of a carsharing service would also need to manage the distribution of the cars inside each district, which adds more steps to the methodological approach.

\begin{figure}[!ht]
\centering
\includegraphics[width=0.8\textwidth]{figure/ch1_paris_arrondissement.jpeg}
\caption[Paris arrondissements]{Paris \textquote{arrondissements} can be used to model the area serviced by a free-floating carsharing operator (Credits: ThePromenader).}
\label{fig:ch1_paris_arrondissement}
\end{figure}

\paragraph{Grids.}

Another possible way to model the trips of a vehicle sharing service is to divide the city into small areas with regular shapes.
The objective is to build a grid where each cell represents the maximum distance a user is willing to travel to reach a car.
This technique is used by the authors of~\cite{weikl_relocation_2012} as a first step towards their objective of introducing a two-step algorithm providing optimal positions of the car fleet and a relocation strategy.

In order to represent the maximum distance users are willing to travel, a circle would be the best shape to use.
However, it is not possible to build a grid made of circles: this would either introduce overlaps between circles or leave blank areas between them.
To pave the city with shapes regular enough to create a grid while staying as close as possible to the shape of a circle, it is possible to use hexagons.
Indeed, with hexagons it is possible to build grids and keep a good representation of the users' willingness to walk to a car.
While an obvious alternative to hexagons would be squares, using them to build grids hinders the representativeness of a cell's neighbors, as concluded by the authors of~\cite{schmoller_empirical_2015}.

However, using a grid standardizes the city districts and erases differences that a further analysis could need, such as district information.
For example, it could be interesting to know whether most of the trips are made between a residential district and a district with many offices.
With cells, marking each created cell as belonging to a certain type of district would add an overhead.
Furthermore, one of the main issues with the usage of a grid is determining which radius to use for the hexagons.
The smaller the cells, the finer the representation of the demand, but this implies creating more cells to cover the area serviced by the operator.
This makes the scaling of additional relocation methods difficult for a high number of cells.
It also might increase the probability of splitting a hot spot of trip departures or arrivals between two hexagons.
This \textquote{border effect}, in which the demand from the same source is split between several cells, might reduce the accuracy with which this source of demand is represented in the grid.

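As an illustration of this grid model (this sketch is not taken from the cited works; the grid origin and the 500-meter circumradius are assumptions), a trip endpoint can be assigned to a pointy-top hexagonal cell by projecting its GPS coordinates to local meters with an equirectangular approximation and rounding to axial hexagon coordinates:
\begin{verbatim}
import math

LAT0, LON0 = 48.8566, 2.3522   # hypothetical grid origin (central Paris)
CELL_RADIUS_M = 500.0          # hexagon circumradius, motivated by the
                               # 500 m walking distance discussed above

def to_local_meters(lat, lon):
    """Equirectangular approximation, accurate enough at city scale."""
    x = (lon - LON0) * 111320.0 * math.cos(math.radians(LAT0))
    y = (lat - LAT0) * 110540.0
    return x, y

def to_hex_cell(lat, lon, size=CELL_RADIUS_M):
    """Axial (q, r) coordinates of the hexagon containing the point."""
    x, y = to_local_meters(lat, lon)
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Cube rounding snaps the fractional coordinates to the nearest cell.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return int(rx), int(rz)

print(to_hex_cell(48.8738, 2.2950))  # cell containing a sample trip endpoint
\end{verbatim}
The resulting cells can then serve as the elementary spatial unit when counting trip departures and arrivals.
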
\paragraph{Artificial districts with grids.}

A third possibility to divide the city lies in the combination of both previous approaches.
The article~\cite{weikl_practice_2015} uses this approach: it first divides the city into artificial districts and then divides each artificial district into cells.
But, contrary to the administrative districts presented before, the authors of~\cite{weikl_practice_2015} choose to create districts out of the data, by creating virtual carsharing stations.
These stations are created by determining where they should be placed to cover the city and the utilization of the carsharing service.
Hence, each artificial district acts as the area of influence of its virtual carsharing station, such that every part of the service area is assigned to the closest virtual station and thus included in the corresponding artificial district.
This creates a homogeneous and uniform district cover of the city.
Figure~\ref{fig:ch1_cells_district} shows this technique applied to Munich, where there are more districts in the city center than in the suburbs because of the density of the carsharing demand.

\begin{figure}[!ht]
\centering
\includegraphics[width=0.8\textwidth]{figure/ch1_cells_district.jpeg}
\caption[City modeling by \cite{weikl_practice_2015}]{Example from \cite{weikl_practice_2015} of Munich being divided into districts and then into cells. Some cells are not represented since they may consist only of a park or another area where no car can be picked up or dropped off.}
\label{fig:ch1_cells_district}
\end{figure}

This approach creates a division which tries to reduce the length of the relocation trips made by the operator, since each district is built to reduce the imbalance between incoming and outgoing vehicles.
However, the pertinence of creating districts solely on the basis of the imbalance between the number of cars entering and leaving the district is questionable.
Indeed, if the idea is to reduce the relocation distance, then this reduction could have been directly incorporated into the optimization problem proposed by the authors by adding a relocation cost to their objective function, taking into account only the grid and not the districts.
The created districts would thus be unnecessary.

\paragraph{Conclusion.}

Among these three types of city modeling often used in carsharing improvement works, the first one can only be used if the service is station-based.
The second and third propositions can be used in the context of the current thesis, i.e., for a free-floating carsharing service, but the added value of creating districts in the third proposition remains unclear.
Thus, the choice made in the current thesis is to model cities with a hexagonal grid.


\subsection{User Behavior Modeling}
\label{sec:user_behavior}

Not only should the serviced area be modeled, but the users' usage patterns and other external features have to be modeled too.
Indeed, this makes it possible to model carsharing services and their trips more accurately.
Thus, articles from the literature that study users' usage patterns and other external features are the focus of this part.

\paragraph{Spatial Observations.}
If the serviced area of a city is modeled with a grid of squares or hexagons, one of the main parameters of the grid is the size of each cell.
It impacts how car groupings are represented: for example, a big cell might hide the fact that numerous cars are concentrated in the same spot, depriving the neighboring areas of cars for customers.
Thus, some works have assessed the maximum distance a carsharing user is willing to travel in order to take a vehicle.
For example, the authors of~\cite{costain_synopsis_2012} studied the users of the service named \textit{AutoShare}, a station-based carsharing service in Toronto.
The authors found that increasing the number of stations, and thus the coverage of the service, was more beneficial than simply adding cars.
They estimated that 65\% of the trips were made by people living closer than 1 km from a station.
Furthermore, as stated before, the authors of~\cite{seign_prescriptions_2013} expect users to walk a maximum distance of 500 m before reaching a car.
Thus, even if the exact distance a customer is willing to walk towards a car is difficult to obtain, 500 meters is a reasonable distance on which to base models of a carsharing service.

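As an illustration only (the coordinates below are hypothetical), this 500-meter criterion can be checked with the haversine great-circle distance between a user and a parked car:
\begin{verbatim}
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

user = (48.8530, 2.3499)  # hypothetical user position
car = (48.8566, 2.3522)   # hypothetical parked car

print(haversine_m(*user, *car) <= 500.0)  # True: the car is reachable on foot
\end{verbatim}
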
\paragraph{Usage Patterns.}
The usage that customers make of the service is one of the key components to understand their behavior.
In~\cite{becker_comparing_2017}, the authors surveyed the usage patterns of carsharing members.
Three populations of the Swiss city of Basel are studied: members of a free-floating carsharing service opened less than one year before the survey, members of an already well-established station-based carsharing service, and people who are not members of any carsharing service.
With this survey, the authors found that most of the trips made by station-based carsharing members were for leisure, shopping or the transport of large goods such as furniture.
In contrast, free-floating carsharing members used the service less for leisure and more for commuting trips or airport transfers.
Moreover, the authors found that free-floating services were used more spontaneously than their station-based counterparts: a majority of station-based members planned their trip at least a day in advance, while free-floating members usually planned their trip less than one hour in advance.
Finally, an additional difference between station-based and free-floating carsharing users was noticed: if a car is not available for the trip, the majority of the station-based group either postpone or cancel their trip, while the majority of the free-floating group use public transport instead of the carsharing service.
Overall, this study shows a large difference in usage patterns between station-based and free-floating members, raising the need to differentiate both types of carsharing when trying to model them.

\paragraph{Exogenous Influences.}
The usage of carsharing also depends on factors external to the service itself, such as traffic, weather conditions or the city topology.
As presented before, in~\cite{becker_modeling_2017} the authors note that users of station-based carsharing services rely more on public transportation than users of free-floating carsharing services.
Indeed, in their analysis, a higher concentration of public transport stations in an area correlates with a higher usage of the station-based carsharing service.
However, the same correlation is not present for free-floating carsharing services.
Instead, the authors found that free-floating customers use this kind of service to make trips that would be difficult with public transportation only.

Moreover, even if it is not clear whether weather is a determining factor in the utilization rate of a free-floating carsharing service, as stated by the authors of~\cite{schmoller_empirical_2015}, they investigated the role of sudden weather changes during the day, notably sudden rain.
According to this work, precipitation in the early evening, from 5 p.m. to 8 p.m., when there was no precipitation at all during the morning or early afternoon, has a small (6\%) but significant impact on the number of reservations compared to a day without any precipitation.

Thus, both papers show an impact of external factors such as public transport stations and, to a smaller extent, the weather, making those features interesting for the modeling of user behavior.

\paragraph{Temporal Analyses.}
In order to model carsharing data temporally, it is necessary to know the users' temporal habits, on a daily and weekly scale.
The authors of~\cite{ampudia_electric_2020} conducted a study on two free-floating carsharing services, \textit{Car2Go} and \textit{Emov}, in Madrid.
The results of this study show that the use case for this type of service differs depending on the day of the week.
During weekdays, from Monday to Friday included, one of the main usages is related to commuting to and from work.
Three peaks are observed during these days: a first one at 8 a.m., a second one around 1 p.m. and a last one around 8 p.m. It is worth noting that the authors of~\cite{ampudia_electric_2020} indicate that the second peak, at 1 p.m., is observable only for the carsharing services in Madrid and not for services in other cities.
Finally, a different use is made of the service at weekends: during these days the usage is stable, with small peaks at around 1 p.m. and 8 p.m., leading to the conclusion that a more leisure-oriented usage is made of these carsharing services.

For the services located in Munich and Berlin studied in~\cite{schmoller_empirical_2015}, the authors note that the days around Christmas see a decreased usage of the service and that New Year's Eve shows a significant drop in reservations too.
Furthermore, the authors observe a lower average utilization during the summer period, between June and September, which they assume is explained by members traveling more by bike or on foot.
On a weekly basis, both the Munich and Berlin services are more used on Fridays and Saturdays than during the rest of the week.
Finally, the authors examine in detail the service usage during the day, depending on whether the day is a weekday or a weekend day.
They observe that during weekdays there are two peaks of usage, one in the morning between 8 a.m. and 10 a.m. and one in the evening between 5 p.m. and 8 p.m.
During weekend days, the usage rate is constant throughout the day with a slight peak between 7 p.m. and 8 p.m.

Both articles show that it is necessary to take the temporal dimension into account at multiple scales.
Indeed, since the usage rate can differ because of holidays, because of the day of the week or even within the day, several features need to be created in order to model the variations of carsharing usage, as sketched below.

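As an illustration of such temporal features (the trip table and column names are assumptions, not the actual \emph{Free2Move} schema), the relevant indicators can be derived directly from the trip start timestamps:
\begin{verbatim}
import pandas as pd

# Minimal sketch: temporal features for modeling carsharing usage.
trips = pd.DataFrame({"start_time": pd.to_datetime([
    "2019-03-04 08:12", "2019-03-09 19:45", "2019-12-25 10:30"])})

features = pd.DataFrame({
    "hour": trips["start_time"].dt.hour,              # intra-day peaks
    "day_of_week": trips["start_time"].dt.dayofweek,  # 0 = Monday ... 6 = Sunday
    "is_weekend": trips["start_time"].dt.dayofweek >= 5,
    "month": trips["start_time"].dt.month,            # seasonal effects
})
print(features)
\end{verbatim}
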
\begin{figure}[!ht]
\centering
\includegraphics[width=0.8\textwidth]{figure/ch1_communauto_area.jpg}
\caption[Communauto Montreal Service]{This map from the study~\cite{wielinski_exploring_2019} shows that the \textit{Communauto} carsharing service is a combination of a station-based part and a free-floating part. Customers can use both depending on their needs, station-based cars being reserved for longer reservations and free-floating ones for shorter trips. The colored areas on the map designate the areas serviced by the free-floating part of the service (\textit{FFcs}). Different colors denote the spatial extensions added as the service expanded. The dots are stations of the other part of the service (\textit{SBcc}). Metro lines have been added by the authors since they are used in their data analysis.}
\label{fig:ch1_communauto_area}
\end{figure}

\paragraph{Activity Analysis.}
The utilization of a carsharing service depends on the consistency of the customers' usage.
The presence of regular users, i.e., people who often use the service, means that a non-negligible part of the demand can be estimated.
Indeed, the aim is to know whether carsharing trips can be modeled or not, i.e., whether the trips are not mainly random.
For a service that wants to answer this question, the study~\cite{wielinski_exploring_2019} helps to understand the habits of carsharing customers.
This study was made on the \textit{Communauto} carsharing service in Montreal, with trip and customer data from 2010 to 2018, knowing that the serviced area evolved during this period, as shown in Figure~\ref{fig:ch1_communauto_area}.
The authors segmented the customers into four categories according to their usage frequency.
The four categories are: Low Frequency (LF), Medium Frequency (MF), High Frequency (HF) and Ultra Frequency (UF).
The boundary between categories is computed from the number of active days over a period of 90 days for each customer, an active day being a day where the customer made at least one trip.
The limit between LF and MF was fixed by the authors at 4 days of activity over 90 days, the limit between MF and HF at 10 days of activity, and the limit between HF and UF at 26 days of activity.
For example, the Ultra Frequency group contains customers who use the service at least about once every three days on average.
This segmentation helps to differentiate the usage patterns of different types of customers.
For example, the authors found that the HF and UF categories are more likely to make chained trips, i.e., one trip after another with a time break between both trips, and more likely to make symmetric trips, i.e., chained trips where the start of the first trip corresponds to the end of the second trip.
Furthermore, the distribution of the time difference between two consecutive symmetric trips shows a spike for all categories at around 1.5 h between two trips, which the authors explain as leisure trips.
But the UF category has an additional spike in the distribution at around 8.5 h, corresponding to time spent at work between two trips, as explained by the authors.
This work shows in particular that studying customer regularity confirms the possibility of modeling the customer demand, since it does not consist only of random trips.

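As an illustration of this kind of segmentation (the thresholds are those reported in~\cite{wielinski_exploring_2019}; the trip table and column names are assumptions), the categories can be derived from the number of active days per customer over one 90-day window:
\begin{verbatim}
import pandas as pd

# Minimal sketch: frequency segmentation of customers over one 90-day window.
trips = pd.DataFrame({
    "customer_id": ["a", "a", "b", "b", "b", "c"],
    "start_time": pd.to_datetime([
        "2018-01-02 09:00", "2018-01-02 18:00", "2018-01-05 08:30",
        "2018-02-11 12:00", "2018-03-01 19:15", "2018-01-20 10:00"]),
})

# Number of active days (days with at least one trip) per customer.
active_days = (trips.assign(day=trips["start_time"].dt.date)
                    .groupby("customer_id")["day"].nunique())

# LF: fewer than 4 active days, MF: 4-9, HF: 10-25, UF: 26 or more.
segments = pd.cut(active_days, bins=[0, 3, 9, 25, 90],
                  labels=["LF", "MF", "HF", "UF"])
print(segments)
\end{verbatim}
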
\paragraph{Conclusion.}
On top of the spatial modeling of the carsharing service, the usage patterns of the customers have to be modeled too, in order to capture the information available in the customer trips.
Thus, external features influencing the usage of the service have to be taken into account, such as the availability of the public transport system and the weather in the city.
Moreover, additional temporal features are needed if the customer demand has to be predicted, such as the day of the week or the hour of the day.
The several articles presented also show that station-based and free-floating carsharing services cannot be modeled identically, as customer usage patterns differ.


% \subsection{Trip Prediction}
% \toadd{
% Ici il faut faire une passe sur l'état de l'art en ce qui concerne la prédiction de liens dans des graphes dynamiques. Il n'est pas nécessaire de rentrer trop dans le détail. Cependant il est important d'indiquer qu'on a essayé de faire de modéliser notre service avec un graph dynamique pour utiliser des méthodes de l'état de l'art. Bien que notre modèle de graph soit théoriquement possible, dans les faits cette modélisation n'était pas viable car une très grande partie du graph était juste constitué de vide et les méthode existantes pour la prédiction de lien sont principalement basées sur la topologie du graph ce qui n'est pas compatible avec notre problème en pratique.
% }
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Shared-Vehicle Service Improvements}
\label{sec:share_vehicle_service_improvements}

Carsharing services can be improved, for example to increase their usage or to make the system profitable for the operator, by numerous means, such as offering specific discounts, extending the area serviced by the system, augmenting the fleet with different types of vehicles for different types of usages, and so on.
Most of the experimental settings to test those possibilities require direct control of the service, to make the changes in the real service and observe them for reporting.
That is why the focus of this thesis regarding the improvement of shared-vehicle services is put on strategies that can be evaluated from past recorded data without the need to intervene on the real service, such as strategies optimizing the cars' placement in the city.

The main contribution of this thesis is based on the resolution of combinatorial problems, notably finding the best distribution of cars in the city according to a criterion.
Thus, this section first presents the mathematical prerequisites needed to understand the formulations often used in works about the optimal placement of vehicles in the city.
Then, in a second part, several works from the literature that study how operators of vehicle-sharing services can induce changes in the behavior of their users are explained.
The aim is to encourage certain types of behaviors to make the vehicle distribution more balanced without the direct intervention of the operator's staff.
Finally, other articles that do not intervene on the users' behavior but directly on the vehicle distribution through the operator's staff, with a focus on doing so effectively, are presented.

\subsection{Mathematical Optimization}

Mathematical optimization is a formal approach used for the resolution of practical problems.
For example, the \textit{Knapsack Problem} or the \textit{Traveling Salesman Problem} can be modeled using mathematical optimization.
If those problems are small enough, they can be solved by specialized software tools called \textit{solvers}, such as \textit{Gurobi} or \textit{SCIP}\footnote{\textit{SCIP} is the abbreviation of the solver called \textit{Solving Constraint Integer Programming}.}.

\paragraph{General Approach.}
In the general approach, a mathematical optimization model is made of three main parts: the set of decision variables, the set of constraints on the decision variables, and the objective function.
The purpose of the model is to describe the solution that should be found and returned by the solver.
First, the decision variables are the unknown variables whose values are needed for the resolution of the problem.
In the general approach of mathematical optimization, the value of each decision variable is a real number ($\in \mathbb{R}$).
For example, if a company needs to make concrete (made of water, sand, cement, aggregates and admixtures), the decision variables would be the proportions of each ingredient to pour into the mix, knowing that two different ingredients could be used for the same purpose, e.g., two different admixtures.
Then, a set of constraints can be declared in the model to restrict the domain of values of each decision variable.
Following the example, constraints could be added to create intervals of proportions for each ingredient, to make sure that the resulting concrete can be safely used for the construction of buildings.
Finally, the objective function is the function that should be either maximized or minimized in order to find the \textit{best} solution, i.e., the best values for the decision variables.
For the same example, one possible objective function could be the minimization of the ingredient costs.

To sum up with a mock-up problem, in the following example Equation~(\ref{eq:ch1_ga_4}) and Equation~(\ref{eq:ch1_ga_5}) are defining two decision variables $x$ and $y$ that should be strictly positive, i.e. zero is not a valid value for $x$ and $y$.
|
||||
The values of those decision variables are constrained to a subset of values such that they respect the inequalities defined by the constraints in Equation~(\ref{eq:ch1_ga_2}) and Equation~(\ref{eq:ch1_ga_3}).
|
||||
Finally, Equation~(\ref{eq:ch1_ga_1}) is the objective function, stating that the values of $x$ and $y$ are sought so as to maximize the value of this function.
|
||||
In this example, the maximum value of the objective function would be approximately $10.36$ with the values $x \approx 3.93$ and $y \approx 3.21$.
|
||||
As a practical representation of this example, the 2D space of solutions is shown in Figure~\ref{fig:ch1_ilp}.
|
||||
One can note that four areas are distinguishable.
|
||||
The first half-plane in blue is the space excluded by the domain definition of $x$ in Equation~(\ref{eq:ch1_ga_4}).
|
||||
The second half-plane in green is the space excluded by the domain definition of $y$ in Equation~(\ref{eq:ch1_ga_5}).
|
||||
Then the two additional Equation~(\ref{eq:ch1_ga_2}) and Equation~(\ref{eq:ch1_ga_3}) are respectively excluding the cyan and magenta half-plane. The resulting space of solutions is in the center of the figure and the blue point represents the value of $x$ and $y$ such that they maximize the objective function while staying within the bounds of each decision variable's domain of definition.
|
||||
|
||||
\begin{align}
|
||||
\underset{x, y}{argmax} \quad & x + 2 y \label{eq:ch1_ga_1}\\
|
||||
s.t. \quad & x + 5 y < 20 \label{eq:ch1_ga_2}\\
|
||||
& 3 x + y < 15 \label{eq:ch1_ga_3}\\
|
||||
& x \in \mathbb{R}^{+}_{*} \label{eq:ch1_ga_4}\\
|
||||
& y \in \mathbb{R}^{+}_{*} \label{eq:ch1_ga_5}
|
||||
\end{align}
|
||||
|
||||
\begin{figure}[!ht]
|
||||
\centering
|
||||
\includegraphics[width=0.8\textwidth]{figure/ch1_ilp.jpg}
|
||||
\caption[ILP Example]{Graphical representation in 2D of the optimization model stated by Equations~(\ref{eq:ch1_ga_1}) to~(\ref{eq:ch1_ga_5}). Each colored half-plane represents a part of the 2D space with invalid tuples $(x, y)$. The blue point is the optimal solution if $x$ and $y$ are in $\mathbb{R}$. The gray and red points are the set of point representing possible solutions if $x$ and $y$ are restricted to be integers, with the red point being the optimal solution in this case.}
|
||||
\label{fig:ch1_ilp}
|
||||
\end{figure}
|
||||
|
||||
The previous example was a mathematical optimization formulation with a unique best solution.
|
||||
However if the formulation were to be ill-defined, the problem could be either unbounded or over-constrained and no best solution could be found.
|
||||
In the former case, the maximization of the value of a decision variable without any upper bound is not possible since the value of the unbounded variable would tend towards $+\infty$ which is not a valid value.
|
||||
And in the latter case, the model could be infeasible, i.e. no assignment of the decision variables satisfies all constraints of the formulation.
|
||||
Furthermore, it should be noted that minimizing a function is the same as maximizing the opposite function, like shown in Equation~(\ref{eq:ch1_equivalence}), without the need to change other parameters of the mathematical model.
|
||||
|
||||
\begin{equation}
|
||||
\label{eq:ch1_equivalence}
|
||||
\underset{x \in \mathbb{R}}{\mathit{min}} \;\; f(x) \quad = \quad \underset{x \in \mathbb{R}}{\mathit{max}} \;\; -\!f(x)
|
||||
\end{equation}
|
||||
|
||||
\paragraph{Linear Programming.}
|
||||
It can be noted that the solution space drawn in Figure~\ref{fig:ch1_ilp} is convex.
|
||||
Indeed this example is a \textit{Linear Programming} (LP) formulation, a subtype of \textit{Mathematical optimization} formulations.
|
||||
Like in mathematical optimization, the aim is still to either minimize or maximize an objective function while choosing the values of decision variables.
|
||||
However unlike in the general setting of mathematical optimization, LP disallows the usage of nonlinear constraints and nonlinear objective function.
|
||||
Indeed, all constraints and the objective function should be linear combinations of the decision variables.
|
||||
\textit{Linear Programming} models have a major advantage since they can be reduced to convex optimization problems~\cite{boyd_convex_2004} with a solution that can be found with polynomial-time algorithms~\cite{nesterov_interior_1994}.
|
||||
Thus, they can be used for practical problems unlike most formulations for NP-Hard problems in the more general mathematical optimization field~\cite{murty_complete_1987}.
|
||||
|
||||
\paragraph{Decision Variable Types.}
|
||||
Some problems require the decision variables to be inside a different set than $\mathbb{R}$.
|
||||
In the case of the \textit{Knapsack Problem}, one could want to restrict each object to be put as a whole without being split; indeed putting half a computer would make no sense.
|
||||
Subtypes of mathematical optimization are thus defined, such as \textit{Integer Programming} in which all decision variables are constrained to be inside $\mathbb{N}$, \textit{Mixed-Integer Programming} in which some decision variables are integers and others are real, and \textit{Binary Integer Programming} restricting all decision variables to be either zero or one.
|
||||
|
||||
If the previous example from Equations~(\ref{eq:ch1_ga_1}) to~(\ref{eq:ch1_ga_5}) is used, it can be cast into an \textit{Integer Linear Programming} model by replacing $\mathbb{R}^{+}_{*}$ with $\mathbb{N}$ in Equations~(\ref{eq:ch1_ga_4}) and~(\ref{eq:ch1_ga_5}) while keeping the linear constraints and objective function.
|
||||
In Figure~\ref{fig:ch1_ilp} the half-planes are kept as is since the constraints have not changed.
|
||||
However, instead of the whole remaining 2D space being the set of possible solutions, the set of possible solutions is now reduced to the points in gray together with the best solution in red.
|
||||
Thus the maximum value of the objective function is now $9$ with $x=3$ and $y=3$.
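
As an illustration, the toy model above can be written with an off-the-shelf modeling library and solved both as an \textit{LP} and as an \textit{ILP}. The following sketch assumes the Python library \textit{PuLP} with its bundled \textit{CBC} solver; since solvers only handle non-strict inequalities, the strict constraints are adapted as explained in the comments.

\begin{verbatim}
import pulp

def solve_toy_model(integer=False):
    # With integer variables, the strict constraints "< 20" and "< 15"
    # are equivalent to "<= 19" and "<= 14"; with continuous variables
    # they are relaxed to "<= 20" and "<= 15" (the optimum then equals
    # the supremum of the original strict formulation).
    cat = "Integer" if integer else "Continuous"
    b1, b2 = (19, 14) if integer else (20, 15)
    prob = pulp.LpProblem("toy_model", pulp.LpMaximize)
    x = pulp.LpVariable("x", lowBound=0, cat=cat)
    y = pulp.LpVariable("y", lowBound=0, cat=cat)
    prob += x + 2 * y          # objective function
    prob += x + 5 * y <= b1    # first constraint
    prob += 3 * x + y <= b2    # second constraint
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return x.varValue, y.varValue, pulp.value(prob.objective)

print(solve_toy_model(integer=False))  # ~(3.93, 3.21, 10.36), the blue point
print(solve_toy_model(integer=True))   # (3.0, 3.0, 9.0), the red point
\end{verbatim}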
|
||||
|
||||
\paragraph{Conclusion.} The mathematical optimization and especially the variant called \textit{Integer Linear Programming} are tools that can be used to search for an optimal solution according to an objective function.
|
||||
In the case of carsharing, an operator could use this approach if the operator wants to find the best distribution of cars inside a city depending on a given criterion.
|
||||
|
||||
\subsection{User-Based Relocations}
|
||||
|
||||
Trips made by the customers move the vehicles around the city and change the distribution of the fleet in the city over time.
|
||||
However the global utilization of the cars by customers is imbalanced: for example commuting customers might take the car only for their morning trip or for their evening trip and thus drop the car in an area where the car will be left unused.
|
||||
The consequence is that the fleet distribution in the city becomes imbalanced as well.
|
||||
In the scientific literature, many works seek to influence the users of the service in order to reduce the imbalance of the fleet.
|
||||
This type of relocation approach is called \textit{user-based relocations}.
|
||||
For example, services could either set different prices depending on the start and/or end point of the trip or give price cuts for cars that have been unused for too long.
|
||||
|
||||
In the article~\cite{febbraro_one_2012}, the authors design a user-based relocation approach by first modeling a one-way carsharing service with a Discrete Event-based Simulation and then by creating an Integer Linear Programming model to take into account the relocation process.
|
||||
In this work, the authors decided to only act on the end location of a customer trip, i.e., they make the user change the station where to end his trip by offering to drop off the car at another close station right after finishing the trip.
|
||||
The practical study simulates a one-way carsharing service in the city center of Turin for thirty days, with each day split into three periods, in order to estimate the demand for such a service.
|
||||
Since the probability that a customer will accept a different end location for his trip is a parameter of the simulation, the authors have studied different values of this parameter with different fleet sizes to observe the impact of those two parameters on the number of trips served by the service.
|
||||
The authors demonstrate that the rate of reservations refused by the service decreases when the relocation acceptance probability and the fleet size increase.
|
||||
This allows them to approximate which fleet size would roughly optimize the rate of accepted reservations against the cost to run the service.
|
||||
However this methodology has a limitation for its applicability to a real carsharing service.
|
||||
Indeed, the probability that a customer is willing to accept a modification of the end location of his trip is set directly by the authors and is not linked to any study of how much it would actually cost to reach this probability of acceptance.
|
||||
|
||||
The article~\cite{brendel_decision_2017} proposes a methodology to design pricing areas to involve the user in the relocation process through incentives.
|
||||
The aim is to counterbalance the natural disequilibrium of the system; in standard carsharing services, cars can be freely picked up in areas with a high demand and dropped off in areas with a low demand without any repercussion for the customer.
|
||||
The main idea is that the customer should be rewarded by the operator if he chooses to take or park the car in an area rather than another.
|
||||
Thus the authors choose to implement different pricing areas within the studied city, by studying the demand in the city and creating three areas with different pricing: a low-demand area, a medium-demand area and a high-demand area.
|
||||
Customers taking cars in low-demand areas are given a high reward while customers putting cars in low-demand areas face a high penalty.
|
||||
Like for low-demand areas, in the case of medium areas a small reward is given to customers renting cars while a small penalty is given if a car is dropped off there.
|
||||
In high-demand areas, no reward or penalty is given.
|
||||
The aim is to encourage customers to take cars from low-demand areas to high-demand areas while keeping the incentive rewards and penalties balanced so that the operator's costs are not affected too much.
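
To make the mechanism concrete, the reward/penalty logic described above can be pictured with the following toy function; the area labels and amounts are hypothetical and not those used by the authors.

\begin{verbatim}
# Hypothetical per-rental fare adjustment (negative = reward,
# positive = penalty) mimicking the three pricing areas.
PICKUP_REWARD = {"low": -2.0, "medium": -0.5, "high": 0.0}
DROPOFF_PENALTY = {"low": 2.0, "medium": 0.5, "high": 0.0}

def fare_adjustment(pickup_area, dropoff_area):
    """Amount added to the trip price (hypothetical currency units)."""
    return PICKUP_REWARD[pickup_area] + DROPOFF_PENALTY[dropoff_area]

print(fare_adjustment("low", "high"))   # -2.0: rewarded relocation trip
print(fare_adjustment("high", "low"))   # +2.0: penalized unbalancing trip
\end{verbatim}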
|
||||
This methodology is tested by the authors on a carsharing service during a month and a half (Feb. 2017 and Mar. 2017), with a reference period of the month and a half just before (Jan. 2017 and Feb. 2017), with pricing areas as shown in Figure~\ref{fig:ch1_user_brendel}.
|
||||
The authors show that their methodology is capable of reducing the need for operator-made relocations by 15\% while increasing the overall number of trips by 11\%.
|
||||
In particular, the medium-area sees an important decrease in pick-up and drop-off of cars (resp. -11\% and -13\%) while the high-demand area sees an important increase in both pick-up and drop-off of cars (resp. +13\% and +15\%).
|
||||
However, the main limitation of this approach is that customers must be willing to make relocations without abusing the reward/penalty system.
|
||||
Indeed in the past years, the service named \textit{Multicity} in Berlin had made the decision to reward its customers for plugging-in the electric cars of the service for recharge.
|
||||
However, the operational team from Multicity discovered that several customers made a non-standard usage of the cars: they were taking cars only to recharge them and not to make normal trips, thus distorting the distribution of cars in the city since the already recharged cars were dropped off next to the recharge hubs.
|
||||
|
||||
\begin{figure}[!ht]
|
||||
\centering
|
||||
\includegraphics[width=0.8\textwidth]{figure/ch1_user_brendel.jpeg}
|
||||
\caption[Pricing Area Example]{Map from~\cite{brendel_decision_2017} of the pricing areas in the city studied by the authors. Three areas are defined by clustering the squares of a grid by taking into account the number of rentals departing the squares. Then the operator smooths the areas in order to make the areas fit geographical constraints and to make the pricing areas more understandable for customers. \textit{Area 1} is the high-demand area, \textit{Area 2} is the medium-demand area and \textit{Area 3} is the low-demand area.}
|
||||
\label{fig:ch1_user_brendel}
|
||||
\end{figure}
|
||||
|
||||
A third approach, by the authors of~\cite{schiffer_polynomial_2021}, is a methodology to compute how many trips could be served by a service if all the customers were willing to modify their trip to facilitate the pick-up of cars by the customers after them.
|
||||
This is similar to creating an upper-bound on the performance of the carsharing service if only the customer behavior could be modified.
|
||||
Knowing all the future true trips made by the customers, the main idea is to apply slight modifications to the customers' trips so that as many requests as possible are served by the cars.
|
||||
To do this, the authors define a sequence of trips as a sequence in which each trip must begin after the end of the previous trip and at a distance of less than $\delta$ meters from its end point, with the objective of serving all trips of the chain with the same car.
|
||||
In this work, the authors allow themselves to modify a trip so that only one of its characteristics is changed, i.e. either the spatial point of departure or arrival, or the time of departure or arrival.
|
||||
These sequences of trips are generated from all the customer trips through the use of an Integer Linear Programming model in which an additional hypothesis is made: customers do not refuse the modifications made to their trip.
|
||||
To avoid using a generic solver and to reduce the time needed to find a solution, the authors propose a graph-based reformulation with a custom solver to find its solution.
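
The chaining rule described above can be sketched as a simple predicate on pairs of trips; the field names and the planar distance computation are assumptions for illustration, not the authors' implementation.

\begin{verbatim}
from dataclasses import dataclass
from math import hypot

@dataclass
class Trip:
    start_xy: tuple    # departure point in planar coordinates (meters)
    end_xy: tuple      # arrival point in planar coordinates (meters)
    start_time: float  # departure timestamp (seconds)
    end_time: float    # arrival timestamp (seconds)

def can_chain(prev, nxt, delta):
    """True if `nxt` can be served by the same car right after `prev`:
    it must start after `prev` ends and within `delta` meters of the
    point where `prev` ended."""
    dist = hypot(nxt.start_xy[0] - prev.end_xy[0],
                 nxt.start_xy[1] - prev.end_xy[1])
    return nxt.start_time >= prev.end_time and dist <= delta
\end{verbatim}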
|
||||
In order to validate their methodology, the trips from the service \textit{Car2Go} in Vancouver are used to generate different scenarios by sampling.
|
||||
The aim is to assess the performance of the reformulation depending on the number of customer requests and the number of cars.
|
||||
With their experiments, the authors show that their proposed graph-based reformulation improves the revenue of the service as much as the initial \textit{ILP} formulation, while finding the solution in less time than the \textit{ILP} counterpart.
|
||||
However, this model works when all the parameters are known in advance and most notably the customer trips.
|
||||
Thus it is not usable to decide which incentives should be applied in a real carsharing service.
|
||||
Indeed it can only provide the upper-bound to expect if the service relies only on relocations made by modifying users' behavior.
|
||||
This method can still be useful for a service, since it can be used inside a simulator to compute the upper bound that can be expected if a user-based relocation method is adopted without any additional intervention of the carsharing operator.
|
||||
Moreover, if one is interested in a more global upper bound on the performance of the service, such as a combination of user-based and operator-based relocations, this method does not provide it.
|
||||
|
||||
\paragraph{Conclusion.} The methods presented here have the advantage of leveraging the user as a source for rebalancing the carsharing service.
|
||||
However while some works address one blocking point individually, such as defining areas to apply rewards or price penalties, or by studying how many more trips could be served if the customers were more or less forced to modify their trip, none provides a methodology directly applicable for a carsharing operator.
|
||||
Indeed, the amount of the incentives to give to customers, and how to apply them in the general case of a carsharing service, have to be studied with more precision, with the aim of having incentives appealing enough for normal customers to follow them but not so generous that they invite abuse by malicious customers.
|
||||
From a more operational point of view, such works could be integrated into the \textit{Free2Move} services only in the long term, since the operator does not yet have the capability to offer precise incentives to customers based on their trips.
|
||||
|
||||
\subsection{Operator-Based Relocation}
|
||||
|
||||
Instead of making the user perform the relocations to rebalance the fleet of vehicles inside the city, the operator can directly act on the fleet.
|
||||
Indeed by hiring a dedicated team, the operator can assign each team member to the relocation of misplaced cars.
|
||||
The main advantage of this kind of approach is the guarantee that when a car is in need of relocation, then a staff member will relocate it, whereas in user-based relocations the operator can only hope for the cars to be moved to a better position.
|
||||
Furthermore an operator-based relocation can combine maintenance trips and recharge trips for electric vehicles too discharged to be taken by customers.
|
||||
In this part, three types of works are going to be presented.
|
||||
The first one concerns the study of bike-sharing services as studied in~\cite{chen_dynamic_2016}, \cite{tomaras_modeling_2018}, \cite{hulot_towards_2018}, \cite{raviv_static_2013}, \cite{ghosh_dynamic_2017} and \cite{ghosh_improving_2019}.
|
||||
The second one is about station-based carsharing services as presented in~\cite{guo_vehicle_2020}, \cite{boyaci_integrated_2017} and \cite{zakaria_car_2014}.
|
||||
Lastly, works on free-floating carsharing, \cite{weikl_practice_2015} and \cite{folkestad_optimal_2020}, are going to be shown.
|
||||
|
||||
% \toignore{
|
||||
% The ability to efficiently relocate cars is often heavily associated with the optimized usage of the operator's resources. Indeed, a vehicle sharing operator often hires only small team of workers to relocate the misplaced vehicles : if the operator is able to perfectly forecast the customer demand, a poor vehicle relocation management leads to a poor performance overall. Thus, there is the need to develop algorithms effectively telling which worker need to relocate a designated car to a specific location. This kind of relocation strategy done by the operator can be called \emph{operator-based relocations} and is presented in first.
|
||||
|
||||
% }
|
||||
|
||||
\paragraph{Relocations for Bikesharing.} Extensive research has been done for operator-based relocations in the context of bike-sharing.
|
||||
Like a carsharing service, a bike-sharing service is a means of transportation with vehicles, in this case bikes, freely available against monetary payment, usually in predefined stations.
|
||||
The users ride the rented bike and once their trip is finished, they lock the bike on a dock in order to let other users have the possibility to rent the shared bike.
|
||||
Since numerous bike-sharing services use predefined stations within the city, the issue of demand and supply of bikes in each predefined station arises.
|
||||
Indeed, if a person comes to rent a bike but finds that none is available, or if a person wishes to end her bike trip but finds no dock where to secure the rented bike, then the \emph{customer demand} is not met.
|
||||
|
||||
The issue of demand and supply of bikes in a station is often associated with the rebalancing needed to be done by the service operator to keep each bike station with enough bikes to serve customers and enough free docks to let other customers park their rented bikes.
|
||||
For example, the authors of~\cite{chen_dynamic_2016} propose to regroup stations of the service into clusters so stations with the same behavior are regrouped into the same cluster.
|
||||
Then, the authors propose to predict the probability of an over-demand for bikes or under-supply of free docks, i.e whether there will not be enough bikes in a station or too many bikes in the station.
|
||||
This study provides the knowledge that the bike demand is influenced by time and weather features, including the temperature, as well as other less frequent occurrences of traffic or social events.
|
||||
This is further acknowledged by the authors of~\cite{tomaras_modeling_2018} who propose to consider the large city events that affect the distribution of bikes in the stations around the city.
|
||||
This exogenous information will also be important for a carsharing problem.
|
||||
Finally, the authors of~\cite{hulot_towards_2018} anticipate the over-demand/undersupply by training regression methods to predict the number of trips to be demanded in every station as well as the number of bikes to be returned.
|
||||
This allows them to detect whether a station needs bikes to be added or removed by the staff.
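
This kind of per-station forecasting can be illustrated with a generic regressor trained on simple time and weather features; the sketch below uses synthetic data and is not the exact model of~\cite{hulot_towards_2018}.

\begin{verbatim}
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Features: hour of day, day of week, temperature (synthetic data).
X = np.column_stack([rng.integers(0, 24, 1000),
                     rng.integers(0, 7, 1000),
                     rng.normal(15, 8, 1000)])
# Targets for one station: number of pick-ups and of returns per hour.
y = np.column_stack([rng.poisson(5, 1000), rng.poisson(5, 1000)])

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
pickups, returns = model.predict([[8, 0, 12.0]])[0]  # Monday 8 a.m., 12 C
# If forecast pick-ups greatly exceed returns plus bikes on hand, the
# station will need a refill by the staff (and conversely for removals).
print(pickups, returns)
\end{verbatim}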
|
||||
|
||||
By detecting possible over-demand for bikes early, the previously presented works suggest a first relocation strategy.
|
||||
However, knowledge about how many bikes should precisely be taken from which station to be delivered to stations in need of bikes could further improve the strategy.
|
||||
The authors of~\cite{raviv_static_2013} present a method to directly create the routes of two service vehicles used to rebalance the bikes.
|
||||
They consider that the bike relocation is done exclusively when the service is the least busy, e.g., during the night.
|
||||
In the same way, the authors of~\cite{ghosh_dynamic_2017} and~\cite{ghosh_improving_2019} adopt an optimization formulation in order to address the imbalance in the number of bikes in stations and create the routes for the staff vehicles.
|
||||
Compared to~\cite{raviv_static_2013} the authors add the support for relocation of bikes during the normal activity hours by globally anticipating the demand, with a probabilistic model of the demand.
|
||||
|
||||
Even with the addition of a finer relocation process, e.g. by adding the areas from which bikes should be taken to which area they should be dropped off, the methodologies cannot be directly applied to the free-floating carsharing problem faced in this thesis.
|
||||
Indeed the assumption about the relocation capacity is different between bike-sharing and carsharing services.
|
||||
|
||||
\paragraph{Relocations for Carsharing.} Unlike for bike-sharing services, the relocation process for carsharing service is constrained by physical limitations induced by the size of the vehicles.
|
||||
Indeed, while a single staff member of a bike-sharing service can relocate a dozen bikes all at once by driving a truck with the bikes inside, staff members of carsharing services need to drive each vehicle to be relocated.
|
||||
This adds an additional overhead as at the end of the relocation, the staff members have potentially no other means of transportation to reach the next cars to relocate, whereas in bike-sharing relocations the staff member would always use his truck.
|
||||
Thus the general concept of relocation is the same for bike-sharing and carsharing services, but the relocation processes from the bike-sharing literature cannot be directly applied to carsharing services.
|
||||
|
||||
For station-based carsharing services, in the article~\cite{guo_vehicle_2020}, the authors seek to rebalance a fleet of electric vehicles in order to minimize the waiting time of customers queuing at each \textit{passenger stations}.
|
||||
This methodology is split into two parts to take into account the imbalance of the demand and the need for recharge of the electric vehicles, made exclusively in \textit{charge stations} that customers cannot access.
|
||||
The first part concerns the relocation of vehicles between \textit{passenger stations} so that no station is in shortage or oversupply of cars.
|
||||
The authors propose a nonlinear integer programming formulation to find the number of cars to move from one station to another.
|
||||
The second part is about the movement by the staff of discharged vehicles from \textit{passenger stations} to \textit{charge stations}, or the inverse for fully charged vehicles present in \textit{charge stations}.
|
||||
This part is divided into two sub-parts, first the computation of the charging schedule for discharged electric vehicles, from \textit{passenger stations} to \textit{charge stations}, and second the placement of fully charged vehicles, from \textit{charge stations} to \textit{passenger stations}.
|
||||
To find the solution to the scheduling of vehicle recharging, the authors propose a nonlinear integer formulation, and a mixed-integer linear formulation for the placement of fully charged vehicles back in \textit{passenger stations}.
|
||||
It should be noted that both rebalancing problems are in real time, i.e. the relocations can be done at any time of the day.
|
||||
|
||||
In~\cite{boyaci_integrated_2017}, the objective of the authors is to rebalance electric vehicles too.
|
||||
However, contrary to~\cite{guo_vehicle_2020}, additional information about the staff member responsible for each relocation is given by the authors' model.
|
||||
Furthermore, the stations of the service are not split into two categories since each station can both serve customers with vehicles and recharge them, and the recharging problem is not addressed in this case.
|
||||
By using a hierarchical optimization formulation, i.e. an optimization formulation that has multiple objective functions with a hierarchical order of importance, the authors seek to improve the usage rate of the vehicles first and to reduce the cost of relocations second.
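
Such a hierarchical (also called lexicographic) formulation can be handled generically by optimizing the objectives in order of priority and freezing each optimum as a constraint before optimizing the next one; the sketch below, using \textit{PuLP}, illustrates this principle and not the authors' exact model.

\begin{verbatim}
import pulp

def solve_lexicographic(prob, objectives, tol=1e-6):
    """Optimize the expressions in `objectives` (most important first),
    freezing each optimal level as a constraint before moving on.
    The problem's sense (here maximization) applies to every level."""
    for i, obj in enumerate(objectives):
        prob.setObjective(obj)
        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        prob += obj >= pulp.value(obj) - tol, f"lexico_level_{i}"
    return prob

# Tiny example: maximize x + y first, then maximize x among the optima.
p = pulp.LpProblem("demo", pulp.LpMaximize)
x = pulp.LpVariable("x", 0, 10)
y = pulp.LpVariable("y", 0, 10)
p += x + y <= 12
solve_lexicographic(p, [x + y, x])
print(x.varValue, y.varValue)  # 10.0, 2.0
\end{verbatim}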
|
||||
In addition to this optimization formulation, the authors developed a simulator in order to confirm that the proposed relocation plan returned by the solver can be done with the electric charge level of the vehicles before the relocation.
|
||||
If not, constraints are added to force vehicles to stay in a station and recharge and the optimization is launched again.
|
||||
Thus this approach addresses both the relocation of vehicles and the trips made by the jockeys, i.e. a staff member making relocations, in order to relocate the vehicles.
|
||||
However this is done for a service that can relocate the cars during the whole day, as opposed to the operator of the services provided by \textit{Free2Move} which has an additional constraint of being able to relocate cars only once per day.
|
||||
|
||||
Another approach in~\cite{zakaria_car_2014} studies the effectiveness of a greedy algorithm in the relocation of vehicles for station-based carsharing services.
|
||||
Unlike previous articles, the recharging of electric vehicles is not taken into account and the objective of the authors is to elaborate a greedy algorithm to replace the usage of exact solvers for the \textit{ILP} proposed.
|
||||
This \textit{ILP} formulation seeks to minimize the number of rejected demands as well as the number of relocations the staff must make in order to reach this minimum of rejections.
|
||||
To do so, the service is modeled as a directed graph in which nodes represent a given station at a given time.
|
||||
In this graph, each arc between two nodes denotes a trip between a station at the departure time to another station at the arrival time of this trip.
|
||||
This representation yields an explicit \textit{Integer Linear Programming} formulation that can be given to an exact solver to obtain an optimal solution.
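
A minimal sketch of this time-expanded representation is given below, with nodes identified by (station, time step) pairs; the structure and names are illustrative and do not reproduce the authors' exact formulation, in which decision variables are typically attached to the arcs.

\begin{verbatim}
from collections import defaultdict

# Nodes are (station, time step) pairs; an arc from (o, t1) to (d, t2)
# means that a car leaving station o at t1 can arrive at station d at t2.
graph = defaultdict(list)

def add_trip_arc(origin, dep_t, destination, arr_t):
    graph[(origin, dep_t)].append((destination, arr_t))

def add_wait_arcs(stations, horizon):
    # A car may also simply stay at its station between consecutive steps.
    for s in stations:
        for t in range(horizon - 1):
            graph[(s, t)].append((s, t + 1))

add_wait_arcs(stations=["A", "B", "C"], horizon=4)
add_trip_arc("A", 0, "B", 2)   # a customer trip from station A to B
\end{verbatim}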
|
||||
However, the authors observe during their experiments that the time needed for the exact solver to find a solution can reach several hours for a service consisting of 50 stations with 10 parking lots each and 6 trips per car, starting from only 4 jockeys.
|
||||
Moreover, the greedy algorithm developed by the authors consistently runs in under a second on the same hardware configuration, while keeping the number of rejected demands comparable to the optimal solution found by the exact solver.
|
||||
Thus, for the current thesis, while the formulation cannot be taken as is because its main hypothesis is that the demand is known beforehand, the need for a greedy algorithm to find a solution should be taken into account in the thesis experiments.
|
||||
|
||||
\begin{figure}[!ht]
|
||||
\centering
|
||||
\includegraphics[width=0.7\textwidth]{figure/ch1_reloc_weikl.jpeg}
|
||||
\caption[Operator-based Relocation Example]{Example (from~\cite{weikl_practice_2015}) of relocations proposed by the methodology~\cite{weikl_practice_2015}. Two macro-zones are shown in dashed lines and the micro-zones are the hexagons. Red hexagons are considered as hotspots, with a high demand, while blue hexagons are cold spots, with a low demand. Green squares are vehicles selected for maintenance, e.g for recharging or refueling. Arrows show relocations between micro-zones either from two different macro-zones or from the same one.}
|
||||
\label{fig:ch1_reloc_weikl}
|
||||
\end{figure}
|
||||
|
||||
Using methods designed for station-based services can lead to different results on free-floating carsharing services.
|
||||
Indeed, as seen previously in Section~\ref{sec:user_behavior} the customer usage of services depends on the type of service, whether it's a station-based or free-floating one.
|
||||
Effects like the demand for cars for commute trips might make station-based methods suboptimal for free-floating carsharing services.
|
||||
|
||||
Thus, dedicated relocation methods have been developed specifically for free-floating carsharing services.
|
||||
For example, in~\cite{weikl_practice_2015}, the authors present a methodology developed for a carsharing operator in Munich.
|
||||
In this article, the authors propose an online five-step methodology that is supposed to monitor the distribution of cars continuously to keep it as close as possible to an ideal car distribution in the city.
|
||||
A preliminary step made before the five other steps is the discretization of the geographical area serviced.
|
||||
This discretization is first made by clustering the GPS points of trip departures into ten virtual stations; the objective is to create a small number of areas, also called \textit{macro-zones}, to be used in a later optimization step.
|
||||
Since each resulting macro-zone is wide, an additional grid consisting of hexagons, also called \textit{micro-zones}, is applied to represent the maximum distance a user is willing to travel to reach a car.
|
||||
Thus the operator is able to place a given number of cars inside the macro-zone by distributing them within the grid of micro-zones.
|
||||
Furthermore at the end of this preliminary step, the authors have discretized the geographical area serviced by the operator.
|
||||
The example of discretization given by the authors is illustrated in Figure~\ref{fig:ch1_cells_district}.
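
The clustering of departure points into a handful of virtual stations could, for instance, be done with a standard algorithm such as $k$-means; the sketch below is an assumption for illustration and not necessarily the clustering technique used by the authors.

\begin{verbatim}
import numpy as np
from sklearn.cluster import KMeans

# Departure points of historical trips, projected to planar
# coordinates in meters (synthetic data here).
departures = np.random.default_rng(0).normal(0.0, 2000.0, size=(5000, 2))

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(departures)
macro_zone_of_trip = kmeans.labels_          # macro-zone of each departure
macro_zone_centers = kmeans.cluster_centers_ # one center per macro-zone
\end{verbatim}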
|
||||
|
||||
After this preliminary step, the methodology is applied and its first step is run.
|
||||
The objective of the first step is to analyze historical data in order to categorize each micro-zone as either a hot spot, i.e. a zone with a short car idle time and a lower than average number of vehicles, a cold spot, i.e. a zone with a long car idle time and a higher than average number of vehicles, or a neutral spot, as shown in Figure~\ref{fig:ch1_cells_district}.
|
||||
Note that this categorization is done for every time period of the day, knowing that the day is split into periods of 3 hours.
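
This categorization can be pictured as a simple rule on the two indicators mentioned above, computed per micro-zone and per 3-hour period; the thresholds below (the fleet-wide averages) are placeholders for illustration.

\begin{verbatim}
def categorize_zone(mean_idle_h, n_vehicles, avg_idle_h, avg_vehicles):
    """Hot spots combine short idle times with fewer vehicles than
    average; cold spots combine long idle times with more vehicles."""
    if mean_idle_h < avg_idle_h and n_vehicles < avg_vehicles:
        return "hot"
    if mean_idle_h > avg_idle_h and n_vehicles > avg_vehicles:
        return "cold"
    return "neutral"
\end{verbatim}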
|
||||
Then knowing a period of time in which relocations should be made, the second step computes the lower and upper bounds of the number of vehicles that should be present inside each macro-zone based on the historical number of reservations.
|
||||
Depending on the number of cars present in each macro-zone for the given time period, this step makes relocations of vehicles between macro-zones through an optimization formulation and based on the additional profit made by relocating the cars.
|
||||
Then the third step is run and takes the relocations proposed in step two in order to specify from which micro-zone to which micro-zone each relocation should be made.
|
||||
The vehicles are individually selected at this step and the selection is based on the list of high-priority vehicles depending on their idle time or the need for recharging.
|
||||
Furthermore, depending on the status of the car, the relocation is potentially linked with a maintenance trip, such as plugging the selected vehicle into an EV charging station.
|
||||
Then the fourth step rebalances each macro-zone internally, with the aim of relocating cars from cold spots to hot spots.
|
||||
Unlike for step two, this is done based on operator-decided rules rather than an optimization formulation, with the objective of relocating an operator-decided maximum number of cars to take most idle cars in a macro-zone to the hotspots of the same macro-zone.
|
||||
It should be noted that a macro-zone is rebalanced internally only if it is visited by a relocation from step two, i.e. a staff member is actually moving to this macro-zone.
|
||||
Finally, the last step is about the remaining maintenance trips that could not be associated with a relocation trip, e.g. for an electric car with a battery too low.
|
||||
An example of output given by this methodology is shown in Figure~\ref{fig:ch1_reloc_weikl} on a subspace of the service.
|
||||
This methodology has been tested on a real service in Munich during the months of October 2013, February 2014 and May 2014 and shows an increase of 6\% in the profit made by the service.
|
||||
While this methodology is well designed for the operator running its service in Munich with its own constraints and capabilities, this methodology cannot be taken off the shelf to be directly applied and to leverage the capabilities of \textit{Free2Move} services, notably in the service located in Madrid.
|
||||
Indeed in the case of \textit{Free2Move}, the operator is capable of relocating the cars only during the night and EV charging stations of the city cannot be used, meaning that the operator has only one station available at its local headquarters from which all fully charged cars leave and all discharged cars arrive.
|
||||
Thus the constraints of this operator do not fit well the hypotheses of the presented methodology.
|
||||
|
||||
\begin{figure}[!ht]
|
||||
\centering
|
||||
\includegraphics[width=0.5\textwidth]{figure/ch1_weikl_practice.jpeg}
|
||||
\caption[Weikl Methodology Steps]{Summary of the methodology proposed in~\cite{weikl_practice_2015}. The methodology has one preliminary step to define the macro-zones and micro-zones used for the relocation of cars. Then all other steps are applied one after another for each period of three hours to generate the relocations to do by the operator. This process is repeated for each period in the day, such that the service is continuously rebalanced by the operator.}
|
||||
\label{fig:ch1_weikl_practice}
|
||||
\end{figure}
|
||||
|
||||
A different approach is taken by the authors of~\cite{folkestad_optimal_2020} to improve a free-floating carsharing service.
|
||||
The objective of the authors is to optimize the recharging of electric vehicles, in order to both recharge them and relocate them at the same time.
|
||||
The main idea is to relocate only the cars with a low battery state, i.e. those that cannot be used by the customers, to charging stations all around the city and not only to the nearest charging station.
|
||||
Furthermore, the authors make the hypothesis that the carsharing operator has vehicles dedicated to moving staff members around the city, e.g. to drop them near a battery-depleted car or to pick up a staff member who has finished a relocation job.
|
||||
Thus the methodology could propose to move a low battery vehicle to a station located far away if it allows the jockey to be taken by a staff vehicle right after.
|
||||
To do this, the authors propose a \textit{Mixed Integer Linear Programming} (\textit{MILP}) formulation to make the relocations while taking into account the route of the jockeys.
|
||||
As commercial \textit{MILP} solvers cannot solve instances of this problem, due to the complexity of managing both the route of jockeys and the route of staff vehicles, the authors propose a genetic algorithm to find a near-optimal solution.
|
||||
The experiments made show that the commercial solver used by the authors fails to find the optimal recharging and relocation for instances of eight vehicles with three staff vehicles and eight jockeys.
|
||||
Moreover, the algorithm proposed by the authors can solve instances with more cars, from 60 up to 200, and more staff vehicles and jockeys, while keeping the computation time under one hour.
|
||||
However this methodology does not fit the problem currently faced in the service in Madrid of \textit{Free2Move}.
|
||||
Indeed the authors only consider the relocation of cars with a low battery state, meaning that the cars to relocate can be picked up anywhere in the city but dropped only in charging stations or in their immediate vicinity once fully charged.
|
||||
But in the case of \textit{Free2Move}, the operator possesses only a single charging station, thus the optimization of car recharging locations cannot be done.
|
||||
Moreover, the authors consider only the relocation problem with the strong hypothesis that all the customer demand is known in advance, which is not the case of \textit{Free2Move}'s service.
|
||||
|
||||
\paragraph{Conclusion.} Fleet rebalance methods have been proposed for both bike-sharing and carsharing services.
|
||||
Bike-sharing-focused methods cannot be directly applied to carsharing services even if the global service can be similar, first because most works are designed around station-based services whose customer use cases differ from those of free-floating carsharing customers.
|
||||
Second, because the constraints of bike-sharing are not strictly similar to those encountered in carsharing services, for instance the fact that each vehicle has to be relocated separately by one staff member.
|
||||
Moreover methods from the station-based carsharing literature considered a low number of stations in their service.
|
||||
As shown later in this thesis, the number of ``virtual'' stations that would be needed for free-floating carsharing makes those methodologies not scalable.
|
||||
In all cases, none considered the placement of vehicles for the next morning as the only way to address the fleet imbalance.
|
||||
Indeed, most methods rely on ``online'' relocations, meaning that those relocations are made throughout the day, even if this might conflict with customer usage since customer usage is higher during the day than during the night.
|
||||
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
% SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION %
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\section{Transportation Simulation}
|
||||
|
||||
The evaluation of either user-based or operator-based relocations on the service itself is limited by its cost and by the difficulty of running a test each time a new strategy has to be evaluated.
|
||||
Indeed, the operator of the service needs to redesign internal tools to incorporate the methodology to evaluate, without knowing whether it will actually improve the service, and the operator needs to alter either how the service works for customers, in the case of user-based relocations, or how the staff is managed to perform the relocations in the other case.
|
||||
Furthermore, in all cases, the operator needs to be able to absorb a financial loss if the methodology performs poorly.
|
||||
Thus, the practical evaluation of a methodology can return results that confront the methodology with reality, but it needs resources to be carried out.
|
||||
It is in this context that the usage of simulators to emulate the functioning of real carsharing services has been developed.
|
||||
|
||||
The simulation of carsharing services can be carried out by two different approaches.
|
||||
The first can be done without an a priori knowledge of the carsharing service itself, but requires to model the city and its inhabitants.
|
||||
This approach is based on multi-agent simulators in which the objective is to observe how each agent, e.g city dwellers, reacts to the presence of a carsharing service as an alternative to other public transportation systems or private vehicles.
|
||||
A second approach exists and models the usage of the service not as the product of agents usage but as a chain of events that modify the state of the carsharing service itself.
|
||||
In this second approach, less contextual data about the city or its inhabitants is needed to obtain a usable carsharing simulator.
|
||||
However, the calibration of the simulator needs historical data about the carsharing service to choose which events, and how many of them, to take into account.
|
||||
Both approaches are presented here.
|
||||
|
||||
\toignore{
|
||||
Constraints:
|
||||
\begin{itemize}
|
||||
\item We don't have contextual datasets, e.g., fine population count.
|
||||
\item There is little data, at most 300 entries.
|
||||
\end{itemize}
|
||||
|
||||
Finally, the last part of the bibliographic review shows the current state of art about urban mobility simulator.
|
||||
Since the aim of the internship is to optimize a carsharing service, it is mandatory to be able to compare which relocation strategies are good candidates for the free floating carsharing type of service.
|
||||
|
||||
The simulation of car-sharing services can approached by two different types of methodologies.
|
||||
First, it is possible to use an event-based approach. In this case a series of events are prepared in advance and ordered chronologically to be applied to update an internal state.
|
||||
|
||||
A second approach consist in creating a model of the simulated environment and add agents with their own objectives and internal rules. Agent-based simulations work by making its agent react to their needs and the modeled environment. In~\cite{ciari_modeling_2016}, the authors model the transportation infrastructure of Berlin, e.g., roads and public transportation tracks and stops, and its inhabitants. Parts of the population is chosen to be able to use a car-sharing service by being members. In this work, the authors don't need to explain how the car-sharing trips have been generated beforehand since all agents decide on-the-fly which transportation mode is used.
|
||||
|
||||
Furthermore we lack data about contextual information, notably precise data about population residency, workplace and leisure places, precise spatial distribution of car owners in the city and so on. Thus works using such information to either directly generate city trips~\cite{goulias_recursive_1991}~\info{Il y a pas vraiment besoin de garder cette citation là.} or generate accurate agents to be used in multi-agent based simulation~\cite{ciari_modeling_2016} cannot be applicable in our current case.
|
||||
}
|
||||
|
||||
\subsection{Multi-agent Based Simulation}
|
||||
|
||||
Multi-agent simulations are used in transportation research in order to simulate the behavior of individuals facing the need to move from a spatial point to another.
|
||||
Different modes of transportation are offered such as buses, planes, trains, bikes or cars, with their own infrastructure such as bus/train lines, bus/train stops, city streets, highways or airports.
|
||||
By modeling as realistically as possible the real environment in which each agent will evolve, the aim is to compare the behavior of agents, e.g. people, when they need to plan their trips according to the transportation offers.
|
||||
Thus, by changing parameters of the transportation system, such as adding bus lines or adding roads, the behavior of the agents changes accordingly and helps to design better transportation infrastructure.
|
||||
With such simulators, the impact of adding a carsharing system can be studied, as well as how the number of cars or the served areas affect the customer behavior.
|
||||
Furthermore, different relocation strategies and different placements of the cars of the system can be simulated and their impacts studied.
|
||||
|
||||
\paragraph{SimMobility.} The first example of multi-agent simulator is \textit{SimMobility}~\cite{adnan_simmobility_2016,azevedo_simmobility_2017} which has been developed as a tool to understand the evolution of a city through time depending on its transportation infrastructure. This simulator is split into three scales of simulation, each scale receiving and feeding information to another.
|
||||
|
||||
The first and short-term scale simulates the behavior of the car traffic, its interaction with possible buses, bikes and pedestrians.
|
||||
This scale describes in detail the movements of the vehicles like lane changes, vehicles behaviors in intersections and so on.
|
||||
The aim of this short-term scale is to retrieve a realistic travel time needed for each simulated trip done by a person, for example to go from a building to another.
|
||||
Thus it is possible to simulate the details about the time needed for a person to travel from a certain building to another one in the city.
|
||||
The aim is to know how long it takes to travel through a given path at a certain hour of the day, e.g. for commute trips in the morning or evening, notably to be able to decide which path or what kind of transport to take.
|
||||
|
||||
The second scale is a medium-term one: it simulates the schedule of each inhabitant in order to motivate the trips made during the day.
|
||||
For example, if the simulated person is an office worker, the simulator decides at what time he needs to leave home to go to work and which path and transportation mode to take in order to be quickly at work.
|
||||
The simulator could decide that the person then needs to move from the office at the end of the day to the nearest shop to buy some groceries before going home.
|
||||
This medium-term scale highly relies on the short-term scale to compare the time needed to travel a path in the city for each trip.
|
||||
All the trips created in this medium-term scale are used by the short-term scale to simulate the traffic and the reasons behind it, i.e. each car has a spatiotemporal point of departure and a given desired arrival.
|
||||
|
||||
The last scale is a long-term scale which simulates the city real estate dynamics and the vehicle ownership dynamics.
|
||||
The aim of this scale is to simulate the change of home or office locations of the city inhabitants, depending on the inhabitants needs and financial capabilities, e.g if a worker's home is too far away from his workplace he might want to move to a nearer house.
|
||||
Thus, this scale needs the average schedule and the average time used to move in the city of each person to decide if this person needs either to buy a car or not, or to change for another job.
|
||||
Furthermore, it gives to the medium-term scale essential information about each inhabitant, such as where he lives or works, or whether he owns a car or needs to take the public transportation system.
|
||||
|
||||
All those interactions between the scales are summarized in Figure~\ref{fig:ch1_sim_mobility}, where all the information exchanged between each part of the simulation is represented.
|
||||
As is, the proposed simulator is not related to any carsharing service simulation since it only simulates the city's inhabitants trips dynamics.
|
||||
But it is possible to add a carsharing service to the simulator and offer it as a possible transportation system for the city dwellers.
|
||||
|
||||
\begin{figure}[!ht]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.4\textwidth]{figure/ch1_simu_simmob.jpeg}}
|
||||
\caption{Diagram from~\cite{adnan_simmobility_2016} showing the interactions between each scales in the simulator framework. The long-term scale gives the location of the inhabitants and their jobs to the medium scale while receiving information about the time needed for a person to do the daily trip from his home to his job. The short scale receives from the medium scale the path the inhabitant has to take and gives back to the medium scale the time needed to do this path.}
|
||||
\label{fig:ch1_sim_mobility}
|
||||
\end{figure}
|
||||
|
||||
Indeed as studied by the authors of the article~\cite{marczuk_autonomous_2015}, an application of this simulator is presented for the city of Singapore in which the inhabitants are simulated.
|
||||
A carsharing service is added to the city in order to evaluate its effectiveness, but with several conditions:
|
||||
\begin{itemize}
|
||||
\item The service is only working in the Central Business District of Singapore.
|
||||
\item Private car traffic is forbidden in the Central Business District, but the public transports and the taxis are allowed to freely run in this district.
|
||||
\item The carsharing service is run with \textit{autonomous} vehicles, meaning that the constraint of relocating the vehicles with jockeys is avoided. The vehicles move without the need for a human driver.
|
||||
\end{itemize}
|
||||
In this study, two types of carsharing are tested to compare which is the most efficient in this configuration.
|
||||
The first type is station-based, where vehicles always have to park at the nearest station after a trip is completed. The stations have no limit on the number of parking lots.
|
||||
The second type is the free-floating type, where vehicles start at the same stations but do not need to get back to any station after a trip. It is sufficient for the car to be parked in the nearest standard parking lot.
|
||||
In this setting, the authors show that the free-floating type of carsharing performed better than the station-based service in terms of the number of trips serviced.
|
||||
The explanation given is based on the fact that cars of the station-based service needed to get back to a station before serving the next trip, thus leading to many more empty vehicles on the road and increasing the overall street congestion.
|
||||
|
||||
\textit{SimMobility} provides a detailed simulation of how inhabitants react to the transportation offers in a city, as shown by~\cite{marczuk_autonomous_2015}, and allows studying the impact of a new (autonomous) carsharing system.
|
||||
Relocation strategies could be studied in the case of free-floating carsharing, for example through the number of trips served or the profitability of the service.
|
||||
However this kind of simulator needs a lot of data to be correctly calibrated and thus to be as realistic as possible.
|
||||
Indeed the city roads, the building types and capacity, the public transportation system, current inhabitants detailed statistics and many other parameters are needed as exogenous data to do the calibration of the simulator.
|
||||
Thus in practice for the current thesis, this methodology is not usable because of the lack of such exogenous data.
|
||||
|
||||
\paragraph{MATSim.} If one does not need to simulate the housing and workplace expectations of city inhabitants, MATSim~\cite{axhausen_multi_2016} is a second state-of-the-art approach to run multi-agent simulations and study how inhabitants use the transportation infrastructure.
|
||||
Instead of simulating housing and workplace expectations of each inhabitant according to their daily lives, MATSim takes as an input demand data about the population in the form of a place of residency and chains of activities to do in the city for each simulated agent, i.e a person.
|
||||
To do this, MATSim is split into five submodules with three main modules searching for the route each agent should take to do their chain of activities, as shown in Figure~\ref{fig:ch1_matsim}.
|
||||
|
||||
\begin{figure}[!ht]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch1_matsim.jpg}}
|
||||
\caption[MATSim modules]{Diagram showing how the modules of MATSim interact to simulate the inhabitants and their daily activities. Five modules are necessary to run the simulation, with three used in a main simulation loop to find, through a co-evolutionary method, the optimal way to perform the chain of activities of each agent.}
|
||||
\label{fig:ch1_matsim}
|
||||
\end{figure}
|
||||
|
||||
The first module \textit{Demand Initialization} is about the modeling of the data provided to MATSim to fit the simulator.
|
||||
Chains of activity are defined by this module as well as the agents.
|
||||
A chain of activity is defined by the ideal time of departure or arrival in each point of interest (\textit{POI}), such as \textit{Home} or \textit{Shop}, visited by the agent in the city, with additional information about the means of transportation used to go from \textit{POI} to \textit{POI}.
|
||||
Each agent is capable of memorizing up to a user-defined number of variations of the chain of activity with an associated score to reflect how optimal this particular chain of activity is for its owner.
|
||||
|
||||
The second module \textit{Mobility Simulation} simulates the actual movement of agents, with the aim of computing the time needed for each agent to reach their destination knowing their departure location.
|
||||
Each agent chooses from its memory the best plan to be used during the simulation, i.e. the chain of activity with the highest objective score.
|
||||
An agent can be either the driver of its vehicle, e.g. a driver of a private car, or be the passenger of a vehicle, e.g. a bus passenger, and each agent interacts with the other agents to share the transport infrastructure, e.g. the road or the number of available seats in a bus.
|
||||
|
||||
In the third module named \textit{Agent's Scoring}, each agent evaluates and updates the objective score of its trips and how well they allowed the agent to respect the initial chain of activity.
|
||||
It should be noted that each agent seeks to maximize the score of its own chain of activity.
|
||||
The score is computed by comparing the actual chain of activity with the ideal chain of activity provided at the beginning, through observations such as the waiting time (e.g. arriving too early) or whether activities are performed at the ideal time and with the ideal duration.
|
||||
This leads to a simulation with a co-evolutionary approach in which each agent optimizes its own objective score without the aim of optimizing a global score over all the other agents.
|
||||
|
||||
The fourth module \textit{Agent Activities Replanning} concerns 10\% of the agents, by default, who are able to copy the chain of activity previously simulated and modify the copy to be simulated afterward.
|
||||
The modification can be done only on the departure time from a \textit{POI}, the route taken from a \textit{POI} to the next one, the mode of transportation used between two \textit{POI}s and the destination.
|
||||
While the time of departure and the mode of transportation are modified randomly, the route can be modified by the agent to be the best route possible knowing the traffic conditions of the last simulation.
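
The replanning step can be pictured with a toy mutation of a stored plan, as sketched below; this is a simplified illustration and not MATSim's actual implementation.

\begin{verbatim}
import random

def mutate_plan(plan):
    """Return a copy of a plan in which a single leg has a single
    dimension (departure time or transport mode) perturbed at random."""
    new_plan = [dict(leg) for leg in plan]
    leg = random.choice(new_plan)
    if random.random() < 0.5:
        leg["departure_time"] += random.uniform(-1800, 1800)  # +/- 30 min
    else:
        leg["mode"] = random.choice(["car", "bus", "bike", "walk"])
    return new_plan

plan = [{"departure_time": 8 * 3600, "mode": "car"},
        {"departure_time": 18 * 3600, "mode": "car"}]
print(mutate_plan(plan))
\end{verbatim}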
|
||||
|
||||
Finally, the fifth module \textit{Analysis} exposes the solution found by the simulator with analyses about them, such as the average score of each agent through the rounds.
|
||||
This fifth module is reached only when the user-defined number of simulation loops are done or when the average score of each agent during the simulation has converged.
|
||||
From this module the analysis of carsharing service usage is possible.
|
||||
|
||||
This simulator has been used to study carsharing services as in~\cite{ciari_modeling_2016}.
|
||||
In this study the authors present a summary on how to model different types of carsharing services in \textit{MATSim}, either station-based (round-trip or one-way) or free-floating, with notably the management of members’ subscriptions.
|
||||
In order to check whether the simulator is realistic, the authors make a comparison between the rental data of a round-trip carsharing operator in Zurich and its simulated counterpart, using all the exogenous data needed to represent the city and its inhabitants.
|
||||
The comparison shows that, with a simulator calibrated with traffic count data from this city, both the distribution of rental lengths and the distribution of carsharing trip start times in the simulation matched the distributions observed in reality.
|
||||
In the study~\cite{ciari_modelling_2015}, \textit{MATSim} is used on the same simulated city and round-trip carsharing service to evaluate the impact of different pricing strategies and to observe how the spatial distribution of the trips as well as their purposes are affected.
|
||||
|
||||
However, even if \textit{MATSim} can be used as an accurate tool to simulate the utilization rate of carsharing services, several design constraints make it unsuitable for use in the current thesis.
|
||||
Indeed, census data about inhabitants' residences and workplaces are needed to build the chains of activity previously described.
|
||||
Furthermore, the simulator itself is used to simulate one day at a time: relocations acting on a weekly term cannot be accurately simulated without major modifications on the simulator or without using a computing intensive workaround to simulate all days at once.
|
||||
|
||||
\subsection{Event Based Simulation}
|
||||
|
||||
\textit{Discrete Event-based Simulation} (\textit{DES}) is a type of simulation in which the system is affected by a chain of events that modify the internal state of the simulator.
|
||||
Each event to be simulated has a timestamp attached, so that the chain of events corresponds to a temporal ordering of the events to resolve.
|
||||
During the simulation, the events are simulated one after another until there is either no event or no time left to simulate.
|
||||
Furthermore, each event changes the state of the simulation in a discrete manner, meaning that time skips can occur between two events since the simulation state does not change between two time-consecutive events.
|
||||
In this case the \textit{DES} is a \textit{next-event time progression}.
|
||||
Indeed the first event of the chain is simulated, then the internal clock of the simulation is set to the next event to simulate before simulating it.
|
||||
If time is instead incremented by fixed time slices, e.g. one minute, rather than jumping from event to event, with the events within each time slice being resolved by the simulator, then the \textit{DES} is called a \textit{fixed-increment time progression}.
|
||||
The flow of event resolution is summarized in Figure~\ref{fig:ch1_des}.
|
||||
It should be noted that some agent-based simulations are \textit{DES}, indeed the mesoscopic simulation of the vehicles of \textit{MATSim} is based on chains of events such as a car entering a road or leaving it.
|
||||
However other agent-based models are not \textit{DES} because time skips cannot happen due to the physical phenomena to simulate, such as the microscopic simulation of travel time on roads in \textit{SimMobility}, which replicates the physics involved in vehicles driving on the road, e.g. accelerating or braking, and interacting with others, i.e. traffic dynamics.
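To make the \textit{next-event time progression} concrete, the following minimal sketch (in Python, with hypothetical event names and a simplified state) pops events from a time-ordered queue and jumps the simulation clock directly from one event to the next; it is an illustration of the principle, not of any simulator cited above.
\begin{verbatim}
import heapq

def handle(name, state):
    # Example state change: a rental makes one car unavailable.
    if name == "rental_start":
        state["available_cars"] -= 1
    elif name == "rental_end":
        state["available_cars"] += 1

def run_des(events, state, horizon):
    queue = list(events)          # events are (timestamp, name) tuples
    heapq.heapify(queue)          # time-ordered event queue
    while queue:
        timestamp, name = heapq.heappop(queue)
        if timestamp > horizon:   # no simulated time left
            break
        clock = timestamp         # the clock jumps directly to the next event
        handle(name, state)       # resolve the event

state = {"available_cars": 10}
run_des([(8.0, "rental_start"), (9.5, "rental_end")], state, horizon=24.0)
print(state)                      # {'available_cars': 10}
\end{verbatim}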
|
||||
|
||||
\begin{figure}[!ht]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=1\textwidth]{figure/ch1_des.jpg}}
|
||||
\caption[\textit{DES} Event Resolve Flow]{Execution flow of \textit{Discrete Event-Based Simulation} (\textit{DES}) depending on whether it is a \textit{next-event time progression} (top) or a \textit{fixed-increment time progression} (bottom). In all cases, the events are in a time-ordered stack and drawn from the earliest to the latest event to resolve. In the \textit{next-event time progression}, a timestamp is kept to resolve the events, the next event's timestamp is set as the current timestamp before resolving the given event. This is repeated for all the events until no events are left in the stack. In the \textit{fixed-increment time progression} a period is initialized as the start, all events in this period are resolved simultaneously and the period is incremented after. This is repeated until there is no event left in the stack.}
|
||||
\label{fig:ch1_des}
|
||||
\end{figure}
|
||||
|
||||
Transportation simulation uses \textit{DES} and one example of \textit{DES} utilization is the article~\cite{cats_mesoscopic_2010} about the simulation of a public bus system in Tel Aviv.
|
||||
In this work, the authors use a simulator called \textit{Mezzo}, which simulates vehicles and roads at a mesoscopic scale: for example, a road is represented as two parts, the first one listing the moving vehicles and the second one listing the queued vehicles waiting to leave the road, fed by the running traffic of the first part.
|
||||
In the case of public buses and with this scale of representation, important phenomena such as traffic jams and bus bunching are preserved.
|
||||
The authors also define the events linked to the public bus behavior on top of the ones necessary for traffic simulation, such as \textquote{Enter Bus Stop} or \textquote{Trip Departure}, to take into account respectively a bus arriving at a bus stop to let people enter/leave the bus and a bus that begins its trip on a given line.
|
||||
Even if the bus simulation developed by the authors cannot be used by itself to simulate a free-floating carsharing service, it illustrates the state of the art regarding the design of a \textit{DES} and the nature and ordering of the events to take into account if a \textit{DES} has to be designed for a transportation simulator dedicated to carsharing.
|
||||
|
||||
Some carsharing works presented in Section~\ref{sec:share_vehicle_service_improvements} have developed their own \textit{Discrete Event-Based Simulation}, usually on a macroscopic level.
|
||||
For example, to evaluate their relocation and recharging methodology, the authors of~\cite{boyaci_integrated_2017} use a \textit{DES} developed by the authors of~\cite{repoux_simulation_2014}.
|
||||
In~\cite{repoux_simulation_2014}, the authors developed a \textit{DES} to simulate a station-based, one-way, carsharing service with its customers’ reservations, either spontaneous or in advance, and staff-based relocations.
|
||||
These actions on the system are modeled by events both for customer-based trips, such as \textquote{Rental Start} or \textquote{Rental End}, and for staff-based relocation events, such as \textquote{Personnel Availability Start} or \textquote{Relocation End}.
|
||||
It should be noted that the customer-based events and the staff-based relocation events are fed to the simulator as parameters.
|
||||
Thus, contrary to the previously mentioned simulators, to feed this simulator with rental events a user needs a historical dataset of the customers' trips.
|
||||
Indeed the creation of the rental events, notably \textquote{Rental Request}, is done before the simulator is run unlike agent-based simulators such as \textit{SimMobility} or \textit{MATSim}.
|
||||
Furthermore, the status for stations and their parking lots as well as for each electric vehicle are modeled in the internal state of the simulation.
|
||||
Thus the events act on those states and modify them, with for example an event \textquote{Rental Start} modifying the state of a car from \textquote{Available} to \textquote{Occupied} and the state of the parking lot where the car was located too.
|
||||
In this particular simulation, the authors of~\cite{repoux_simulation_2014} decided to loosen arrival constraints linked to station-based services by allowing customers to drop off vehicles near the station on extra parking lots, with these extra-spots unable to serve as charging points for electric vehicles.
|
||||
|
||||
Another example of \textit{Discrete Event-Based Simulation} usage for carsharing is presented in~\cite{fassi_evaluation_2012}.
|
||||
The authors seek to select the best growth strategy for a station-based, round-trip, carsharing service, i.e. how the set of stations and their size should evolve in the service.
|
||||
Several possibilities are studied by the authors such as the increase of the station's capacity, the creation of new stations or the merging of several stations.
|
||||
The objective is to answer the demand growth with the modification of the carsharing service, while keeping a high customer satisfaction and minimizing the number of vehicles required for that.
|
||||
Several assumptions are made for the \textit{DES} designed by the authors, such as reservations possibly made six days in advance, the customer station selection linked to his distance to the station or the absence of distinction between weekday and weekend demand.
|
||||
Once the events representing the actions that customers can do in the system and the internal state are defined by the authors, with similarities to~\cite{repoux_simulation_2014} without the need for relocation events since the simulated service is round-trip, the authors define the three actions the carsharing operator can make.
|
||||
The \textquote{increment capacity of existing stations} is the first action a carsharing operator can take when the demand increases (driven by a demand growth input given at the start of the simulation), and this action consists in adding additional cars to a station.
|
||||
This action is usually done when the station is in a dense area or when it would be viable to add a new station close by.
|
||||
The second action is the \textquote{merging of stations}, which merges at least two stations into a unique station whose capacity is the sum of the capacities of the merged stations.
|
||||
This action is taken by the operator when the stations of a close group each have a low utilization rate.
|
||||
Finally, the last action is the \textquote{new station implementation}, which creates a new station in the service with its size determined according to the demand in the area.
|
||||
This action is used when a new spatial demand emerges in a place not served by carsharing stations.
|
||||
These three actions are combined to create a growth scenario for the carsharing operator, with the objective of comparing the performance of different scenarios when simulating the same demand growth.
|
||||
This approach is tested on the case of \textit{Communauto}, a station-based service in Montreal.
|
||||
The authors evaluate four scenarios, after an initial data analysis phase to determine the current utilization and its expected growth in each region of the city.
|
||||
The first two scenarios are created from the data analysis made by the authors while the third scenario is based on actual service growth actions made by the operator \textit{Communauto}.
|
||||
The fourth scenario studies the actions made by \textit{Communauto} and actions added by the authors to counteract the negative effects observed for the third scenario.
|
||||
While the results of this study in themselves are not of interest in the scope of this thesis, the presentation of the \textit{DES} created by the authors, like for~\cite{repoux_simulation_2014}, is an accurate vision of how \textit{Discrete Event-Based Simulation} is used to study either relocation or growth strategy for carsharing services.
|
||||
|
||||
\paragraph{Conclusion.} \textit{Discrete Event-based Simulation} is a type of simulation in which each action is modeled by events that modify the internal state of the simulation.
|
||||
This type of simulation can be used to simulate transportation modes and in particular carsharing services, notably when contextual data about the reasons why each customer individually wants to use the service is lacking.
|
||||
Examples such as~\cite{fassi_evaluation_2012,repoux_simulation_2014,boyaci_integrated_2017} study actions that can be made by the operator of carsharing services such as relocations or station modifications to improve the utilization rate of the carsharing service.
|
||||
However the presented simulations have flaws about the generation of demand during the simulation; indeed the demand is fixed at the start of the simulation and does not adapt itself to a possible new state of the distribution of cars in the city: the demand is inflexible and constrained to historical data.
|
||||
Furthermore, in those studies the past trip data is considered de facto as demand data.
|
||||
While past trip data is close to the real demand data, the real demand is often unobserved: in the presented cases a station can be empty, so a demand cannot be satisfied and is never registered in the trip data, or a station can be full, so a demand for \textit{A to B} can be mis-registered in the trip data as \textit{A to C} if station \textit{B} is full and the customer needs to find another station to drop off his car.
|
||||
These studies serve as the basis for the design of a \textit{DES} specifically intended for free-floating carsharing and designed to reduce the flaws linked to the historical trips being considered as demand data.
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
% SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION %
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\section{Regression Algorithms}
|
||||
|
||||
The contribution of this thesis, presented in Chapter~\ref{ch:method}, is based on the utilization of regression algorithms to predict the daily usage of cars.
|
||||
Regression is a type of \textit{Supervised Learning}, meaning that the algorithm will use the data present in a \textit{training set} in order to learn a model, i.e. a mathematical function, to predict the numerical value of the \textit{label}.
|
||||
Each label is associated with a set of \textit{features}, i.e. values of interest, present in the data.
|
||||
The aim is to learn a function ($\hat{f}$) as close as possible to a real (and unknown) function ($f$) with only a set of observations, i.e. the values $f(x)$ of the label given a set of feature values $x$.
|
||||
This is done by tuning the internal parameters of the model used.
|
||||
For example, in the case of a simple linear regression $\hat{f}(x) = a \cdot x + b$, $a$ and $b$ are the only internal parameters of the model and their values need to be tuned so that the trained function $\hat{f}$ is close to the real function $f$.
|
||||
This \textquote{closeness} is assessed by the use of a loss function ($\mathcal{L}$) measuring the distance between the learned function's prediction and the real observations in the \textit{training set}.
|
||||
For example, the \textit{Mean Squared Error} (\textit{MSE}) is a common loss function and is defined as
|
||||
$$MSE = \mathcal{L}(f, \hat{f}) = \frac{1}{n} \sum^{n}_{i = 1} \left(f(x_i) - \hat{f}(x_i)\right)^{2}$$
|
||||
with $f$ being the real (unknown) function to model, with the approximate and learned function $\hat{f}$, with $n$ observations, with $x_i$ the $i^{th}$ set of features' value and with $f(x_i)$ the label of the $i^{th}$ observation.
|
||||
Thus, training the model means to find the best parameters of the models such that the loss function is minimized.
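As an illustration of this training step, the short sketch below (assuming synthetic observations of an unknown function) tunes the two parameters $a$ and $b$ of the simple linear regression by minimizing the \textit{MSE} with an ordinary least-squares solver.
\begin{verbatim}
import numpy as np

# Synthetic observations of an unknown f (here f(x) = 3x + 1 plus noise).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 1.0 + rng.normal(0, 0.5, size=100)

# Fit f_hat(x) = a*x + b by minimizing the MSE (ordinary least squares).
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

mse = np.mean((y - (a * x + b)) ** 2)
print(a, b, mse)   # a close to 3, b close to 1, mse close to the noise variance
\end{verbatim}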
|
||||
It should be noted that if there is only one label to predict ($\hat{f} : \mathbb{R}^n \to \mathbb{R}$), then it is a \textit{single output regression} model, and if multiple labels are to be predicted at the same time ($\hat{f} : \mathbb{R}^n \to \mathbb{R}^p$), then it is a \textit{multi-output regression} model.
|
||||
|
||||
Once the model is trained with the \textit{training set}, the performance of the model is then evaluated with a set of data called the \textit{test set}.
|
||||
This different set of data is used to assess the prediction behavior of the model on data it has never encountered before.
|
||||
This is to detect if it has \textit{underfitted} the data, i.e. it does not have enough data to learn a precise enough generalization, or \textit{overfitted} the data, i.e. it has \textit{learned by heart} all the observations.
|
||||
For regression models, the most used evaluation functions are the \textit{Mean Absolute Error} (\textit{MAE} in Equation~(\ref{eq:ch1_mae})), the \textit{Mean Squared Error} (\textit{MSE} in Equation~(\ref{eq:ch1_mse})) and the \textit{Root Mean Squared Error} (\textit{RMSE} in Equation~(\ref{eq:ch1_rmse})).
|
||||
In the three equations, $n$ is the number of instances of the label to predict in the test set, $y_i$ is the true value of the $i^{th}$ instance of the label and $\hat{y}_i$ is the value predicted by the model of the $i^{th}$ instance of the label.
|
||||
Each function is used depending on the wanted interpretation of the error.
|
||||
\textit{MAE} is used to linearly measure the mis-estimation made by the model, for example an error of $4$ between the real value and the prediction counts for twice the error of $2$ in the final \textit{MAE} value.
|
||||
With \textit{MSE} the impact of an error on the measure is quadratic, for example an error of $4$ between the real value and the prediction counts $4$ times more than an error of $2$ in the final \textit{MSE} value.
|
||||
This measure is used to give a high single error more importance than multiple small errors in the final measure.
|
||||
The measure \textit{RMSE} is the square root of the \textit{MSE}.
|
||||
In the next chapters, \textit{MSE} and \textit{RMSE} have not been chosen and only \textit{MAE} is kept because this measure is less sensitive to outliers in the errors.
|
||||
Indeed, higher errors due to outliers are amplified less by the absolute value than by the square.
|
||||
|
||||
\begin{equation}
|
||||
\label{eq:ch1_mae}
|
||||
MAE = \frac{1}{n} \sum^{n}_{i = 1} \left|y_i - \hat{y}_i\right|
|
||||
\end{equation}
|
||||
|
||||
\begin{equation}
|
||||
\label{eq:ch1_mse}
|
||||
MSE = \frac{1}{n} \sum^{n}_{i = 1} \left(y_i - \hat{y}_i\right)^2
|
||||
\end{equation}
|
||||
|
||||
\begin{equation}
|
||||
\label{eq:ch1_rmse}
|
||||
RMSE = \sqrt{\frac{1}{n} \sum^{n}_{i = 1} \left(y_i - \hat{y}_i\right)^2}
|
||||
\end{equation}
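For reference, the three evaluation functions can be computed directly from the vectors of true and predicted labels, as in the following short sketch.
\begin{verbatim}
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    return np.sqrt(mse(y_true, y_pred))

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 26.0])
print(mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred))
# approximately 2.67, 8.0, 2.83
\end{verbatim}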
|
||||
|
||||
For models having hyperparameters, i.e. parameters controlling the training process of the model, a \textit{validation set} different from the test set can be used: it plays a role similar to the test set, but with the sole objective of finding the best hyperparameters.
|
||||
The best hyperparameters are the ones maximizing the performance of the model in the prediction of the labels from the features, both taken from the validation set.
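A minimal sketch of this hyperparameter search, assuming a scikit-learn style regressor and pre-split training and validation sets, is given below: each candidate value is trained on the training set and the value giving the lowest error on the validation set is kept.
\begin{verbatim}
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

def select_max_depth(X_train, y_train, X_val, y_val, candidates=(2, 3, 5)):
    best_depth, best_mae = None, float("inf")
    for depth in candidates:
        model = GradientBoostingRegressor(max_depth=depth, random_state=0)
        model.fit(X_train, y_train)      # train on the training set only
        val_mae = mean_absolute_error(y_val, model.predict(X_val))
        if val_mae < best_mae:           # keep the best validation score
            best_depth, best_mae = depth, val_mae
    return best_depth
\end{verbatim}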
|
||||
|
||||
\begin{figure}[!ht]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.6\textwidth]{figure/ch1_decisiontree.jpg}}
|
||||
\caption[Example of Decision Tree]{An example of a \textit{Decision Tree}. The problem is to predict the number of inhabitants in a city district. Square nodes are decision nodes with their criterion and circles are the leaves with their returned value. This Decision Tree needs to know two \textit{features} of the district to predict the number of inhabitants: the type of the area, e.g. residential, and the number of bus stations. For example, if the district is a residential area with only two bus stations, then the Decision Tree will predict that $1205$ inhabitants live in this district.}
|
||||
\label{fig:ch1_decisiontree}
|
||||
\end{figure}
|
||||
|
||||
\paragraph{Gradient Boosting Decision Trees.} The first regression algorithm that is going to be used in the Chapter~\ref{ch:method} is the \textit{Gradient Boosting Decision Tree} in its regression variant.
|
||||
It is an \textit{ensemble method}, meaning that it combines multiple occurrences of the same type of simple regression model in order to create a model more accurate than one occurrence of this simple regression model alone.
|
||||
In the case of the \textit{Gradient Boosting Decision Tree}, an ensemble of \textit{Decision Tree} is made.
|
||||
A \textit{Decision Tree} is a machine learning model based on a binary tree in which each leaf is a numerical value in the case of a regression.
|
||||
All the other nodes, including the root node, are decision nodes with a criterion based on the data features to decide whether to select the next node on the left or on the right.
|
||||
If the next node is a decision node the same action is applied, if the next node is a leaf then its value is returned as the prediction.
|
||||
It is during the training of the model that the Decision Tree is created.
|
||||
Each decision node is created by taking the features and labels assigned to the node and finding the best test to separate all the values of the label; the whole training set is assigned to the root node, while each remaining decision node only receives the data answering (or not) the test of its parent node.
|
||||
The test is created such that the entropy of the data fitting the test on both sides, test true or false, is as low as possible, i.e. the data regrouped for the creation of the children decision node (or leaf) is as homogeneous as possible.
|
||||
If the maximum tree depth is reached, or if no criterion can be found, a leaf is created by taking the mean value of the label (in the case of the regression variant).
|
||||
In Figure~\ref{fig:ch1_decisiontree}, a mock-up decision tree of depth three is represented as an example of the prediction of the number of inhabitants living in a district according to the features \textquote{area} and \textquote{nb\_bus\_station} of the district.
|
||||
One can note that the tree is not required to be balanced, one branch is shorter than the others.
|
||||
In this example only the type of area and the number of bus stations are necessary to predict the number of inhabitants.
|
||||
|
||||
\textit{Gradient Boosting Decision Tree} is a set of \textit{Decision Trees} that are trained one after the other.
|
||||
The first Decision Tree is trained with the data from the training set as detailed previously, and then each following Decision Tree is trained to take into account the errors made by the previously trained Decision Trees.
|
||||
This means the data given to each subsequent Decision Tree for its training is altered by how well the preceding Decision Trees had trained on the data from its training set.
|
||||
When used, the first tree of the ensemble is given the values of the features to predict the value of the label.
|
||||
Then the second tree is given the same features' value to predict the \textit{correction} to apply to the value of the label predicted by the first tree.
|
||||
This operation is repeated until all the trees of the ensemble have been used.
|
||||
As explained previously, the main advantage of this approach is the increase in the accuracy of the ensemble model.
|
||||
However this regression model needs more computational power to be trained and used for the prediction.
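The boosting mechanism described above can be sketched with plain decision trees, each new tree being fitted on the residual errors of the current ensemble prediction. This is a simplified illustration with a constant learning rate, not the exact implementation used in the following chapters.
\begin{verbatim}
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbdt(X, y, n_trees=50, learning_rate=0.1, max_depth=3):
    y = np.asarray(y, dtype=float)
    prediction = np.zeros(len(y))
    trees = []
    for _ in range(n_trees):
        residual = y - prediction              # error left by the current ensemble
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)                  # each new tree learns the correction
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees

def predict_gbdt(trees, X, learning_rate=0.1):
    return sum(learning_rate * tree.predict(X) for tree in trees)
\end{verbatim}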
|
||||
|
||||
\paragraph{Support Vector Regressor.} The second regression algorithm used in the Chapter~\ref{ch:method} is the \textit{Support Vector Regressor} (\textit{SVR}).
|
||||
Instead of creating binary trees, which result in splitting the feature space into subspaces, the Support Vector Regressor algorithm aims to find a plane describing the data.
|
||||
It is inspired by the \textit{Support Vector Machine} (\textit{SVM}) which is its counterpart in classification, i.e. to predict categorical labels instead of numerical labels.
|
||||
The \textit{Support Vector Machine} can either have a linear or a nonlinear kernel.
|
||||
In the case of a linear kernel, the algorithm seeks a plane separating the points according to their features, such that one subspace of the feature space corresponds to one label and the other subspace to the other label.
|
||||
The plane is chosen so the margin between the plane and the data points of each type of label is symmetric and as wide as possible.
|
||||
In the case of a nonlinear kernel, the plane is chosen in a space of higher dimensionality than the space of features, creating a \textit{hyperplane} to separate the data points of the training set.
|
||||
|
||||
In the case of the \textit{Support Vector Regressor}, instead of separating the space of features into two distinct subspaces, one for each categorical label, the algorithm seeks a plane covering as many data points as possible within the user-given symmetric margins.
|
||||
The objective is to take into account as many data points as possible to create the plane whose function will serve as the function to make the prediction.
|
||||
Like the Support Vector Machine, a linear or a nonlinear kernel can be used depending on the complexity of the phenomenon to model.
|
||||
In the first case, the plane will be in the same dimension as the feature space while in the second case like for the SVM a hyperplane will be learned from the data.
|
||||
For both SVM and SVR models, categorical features, such as the type of area in the previous example, cannot be used as is since the space of features needs to be $\mathbb{R}^n$, with $n$ the number of features.
|
||||
In order to use those categorical features with SVM and SVR, they can be cast into multiple numerical features in a one-hot-encoding fashion.
|
||||
Thus for a categorical feature, each possible value of this feature will be transformed into one numerical feature being either $1$ if the categorical feature has this value else $0$.
|
||||
For example, if in the previous example there are three types of \textquote{area}, e.g. \textquote{residential}, \textquote{commercial} and \textquote{office}, then three numerical features are created: \textquote{area\_residential}, \textquote{area\_commercial} and \textquote{area\_office}.
|
||||
If a district is a residential area then the feature \textquote{area\_residential} is $1$ while the remaining two are $0$.
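A minimal sketch of this encoding combined with an \textit{SVR}, reusing the hypothetical \textquote{area} and \textquote{nb\_bus\_station} features of the earlier district example, could look as follows.
\begin{verbatim}
import pandas as pd
from sklearn.svm import SVR

# Hypothetical districts with a categorical "area" feature.
districts = pd.DataFrame({
    "area": ["residential", "commercial", "office", "residential"],
    "nb_bus_station": [2, 5, 3, 4],
})
inhabitants = [1205, 800, 650, 1500]

# One-hot encode "area" into area_residential / area_commercial / area_office.
X = pd.get_dummies(districts, columns=["area"])

model = SVR(kernel="rbf")   # a nonlinear kernel; kernel="linear" is also possible
model.fit(X, inhabitants)
print(model.predict(X[:1]))
\end{verbatim}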
|
||||
|
||||
\paragraph{Conclusion.} Regression algorithms seek to create a model capable of predicting a numerical label given a set of features and their values.
|
||||
The objective is to model a real phenomenon whose exact mathematical function is unknown, e.g. the number of inhabitants in a district according to the type of district and its public transportation offer.
|
||||
To create the model, a dataset is necessary and split into three subsets, one for the \textit{training} of the model, one for the \textit{validation} of the model's hyperparameters and one to \textit{test} the performance of the model.
|
||||
Two state-of-the-art regression models are the \textit{Gradient Boosting Decision Tree} and the \textit{Support Vector Regressor}; both model the label's values over the feature space by different means, the first by creating subspaces answering criteria while the second creates a (hyper)plane to model the data points.
|
||||
Other regression models have been tested (such as \textit{k-nearest neighbors} or \textit{multi-layer perceptrons}); however, since the \textit{Gradient Boosting Tree Regressor} and the \textit{Support Vector Regressor} are the best models found, only these two are going to be used and tested to predict the utilization of cars in a carsharing service, as presented in Chapter~\ref{ch:method}.
|
||||
|
|
|
|||
|
||||
\chapter{Free2Move Carsharing Data Analysis}
|
||||
\label{ch:data_analysis}
|
||||
|
||||
\textit{Free2Move} service improvement requires an increase in car usage while decreasing operator costs.
|
||||
This requires understanding car usage from the historical service data.
|
||||
However, before developing a methodology intended to be used by \textit{Free2Move}, it is necessary to make a preliminary analysis of the functioning services.
|
||||
First, an overview is given of the three available datasets used in this thesis as well as of the abstraction made of the serviced area.
|
||||
Then an exploration of each dataset is presented with the analysis of the characteristics of the fleet utilization, such as the hot spots of each service or the average usage during the day depending on the day of the week.
|
||||
Finally, exogenous datasets are explored: details about the creation of the weather features are presented, as well as the reason why characterizing the city with only open data can be difficult.
|
||||
|
||||
\section{Carsharing Service Modeling}
|
||||
|
||||
For this thesis, \textit{Free2Move} has provided three carsharing trip datasets, corresponding to customer trips of three cities.
|
||||
The first studied service is located in \textit{Madrid} with the dataset coming from the service \textit{Emov}.
|
||||
The second dataset has been provided by the service \textit{Free2Move} located in \textit{Paris}.
|
||||
The last is from \textit{Free2Move} for its service located in \textit{Washington}.
|
||||
Each dataset is selected such that the global usage is constant in time, i.e. the services are not in a warm-up phase with a noticeable and steady increase of usage, and that the perimeter of the service stays the same, i.e. no city area is added or removed.
|
||||
In all three cases, details on the customer trip dataset split and usage are given, as well as how the service area is abstracted into a grid made of hexagons to represent the maximum distance a customer is willing to travel to retrieve a car.
|
||||
|
||||
\subsection{Daily Trip Dataset Description}
|
||||
|
||||
The three datasets are tabular data containing one trip per row and one information about the trip per column.
|
||||
From all the datasets, the common columns have been kept: they are the \textit{timestamp of departure}, the \textit{timestamp of arrival}, the \textit{GPS coordinates of departure}, the \textit{GPS coordinates of arrival}, the \textit{car ID}, the \textit{customer ID} and the \textit{distance} traveled.
|
||||
It should be noted that the \textit{car ID} and \textit{customer ID} have been pseudonymized, i.e. the IDs are the hashes of the real IDs to comply with requirements about data treatment from the GDPR\footnote{GDPR stands for \textquote{General Data Protection Regulation} and is a European Union regulation about data protection and privacy.} while keeping the information about how many trips the same car has made during the day or the reoccurrence of trips from the same customer.
|
||||
|
||||
\paragraph{Dataset Content.} The first dataset comes from the service \textit{Emov} in Madrid and accounts for $1\,138\,246$ trips.
|
||||
They have been made between the 1$^{st}$ August 2018 and the 31$^{st}$ March 2019 included, which accounts for $243$ days of trip data.
|
||||
Thus, on average around $4\,700$ trips per day have been made during this period.
|
||||
They have been made with $578$ cars, mostly \textquote{Peugeot Ion} and \textquote{Citroën C-Zero} which are both electric city cars.
|
||||
From now on, this service and its dataset will be denoted by the name \textquote{Madrid}.
|
||||
|
||||
The second dataset is coming from the service \textit{Free2Move} located in Paris and records $130\,219$ trips made between the 1$^{st}$ April 2019 and the 31$^{st}$ January 2020 included.
|
||||
Thus for the $306$ days of trip data, there is an average of $425$ daily trips.
|
||||
They have been made with $475$ electric cars, with \textquote{Peugeot Ion} and \textquote{Citroën C-Zero} like in Madrid.
|
||||
From now on, this service and its dataset will be denoted by the name \textquote{Paris}.
|
||||
It should be noted that the \textit{Paris} dataset contains roughly 10 times fewer daily trips than the \textit{Madrid} one; this information is to be kept in mind as it might have an impact on the performance of the car utilization prediction.
|
||||
|
||||
The last dataset is coming from the service \textit{Free2Move} in Washington and holds the information of $136\,095$ trips.
|
||||
They have been made during the period between the 1$^{st}$ August 2019 and the 31$^{st}$ March 2020 included, which accounts for $244$ days of trip data.
|
||||
Thus, on average around $557$ daily trips have been made during this period.
|
||||
Those trips have been made with a fleet of $400$ \textquote{Chevrolet Cruze}, an internal combustion engine (ICE) compact car, and $200$ \textquote{Chevrolet Equinox}, an ICE crossover utility vehicle.
|
||||
From now on, \textit{Washington} is the name with which this service and its dataset are going to be denoted.
|
||||
Like the \textit{Paris} service, this service has been less used than the service in Madrid, and the performance of prediction algorithms based on this dataset might be poorer than in the Madrid case.
|
||||
Table~\ref{tab:ch2_summary_content} summarizes the content of the three datasets provided by \textit{Free2Move} with all previously given information.
|
||||
|
||||
Note that in both \textit{Madrid} and \textit{Paris} the fleet is homogeneous with small electric vehicles; \textquote{Peugeot Ion} and \textquote{Citroën C-Zero} are comparable cars.
|
||||
In the case of the \textit{Washington} service, the fleet is heterogeneous since the two types of vehicles are not comparable; however, in the remainder of this thesis the assumption is made that the difference in use case between these cars is negligible.
|
||||
Indeed, even if their subcategory is different their main purpose is still to transport people or small goods, contrary to utility vehicles used to move furniture between two houses for example.
|
||||
|
||||
\begin{table}[!tb]
|
||||
\centering
|
||||
\small
|
||||
\begin{tabular}{|l|c|c|c|c|c|}
|
||||
\hline
|
||||
City & Nb Trips & Period & Nb Days & Nb Daily Trips & Nb Cars \\ \hline
|
||||
Madrid & 1\,138\,246 & 2018-08-01 to 2019-03-31 & 243 & 4\,700 & 578 \\ \cline{1-1}
|
||||
Paris & 130\,219 & 2019-04-01 to 2020-01-31 & 306 & 425 & 475 \\ \cline{1-1}
|
||||
Washington & 136\,095 & 2019-08-01 to 2020-03-31 & 244 & 557 & 600 \\ \hline
|
||||
\end{tabular}
|
||||
\caption{Summary of the characteristics of the datasets, for the three services. The total number of trips, the dataset period, the number of days in the dataset, the average number of daily trips and the number of cars give an insight into the global usage of each service.}
|
||||
\label{tab:ch2_summary_content}
|
||||
\end{table}
|
||||
|
||||
|
||||
\paragraph{Dataset Split.}
|
||||
For the method explained in Chapter~\ref{ch:method} to work, the usage of cars has to be predicted by a machine learning model.
|
||||
However as detailed in Chapter~\ref{ch:background}, a dataset is needed to train the model, so it can make predictions related to the problem at hand, and then a dataset is needed to evaluate the prediction performance of the trained model.
|
||||
Additionally, to find the best hyperparameters of the model to train, another dataset might be required.
|
||||
Often this is done by splitting a dataset into three sub-datasets~\cite{shalev_understanding_2014}.
|
||||
First a \emph{training set} is taken as the first split of the global dataset, to train the model.
|
||||
Then a \emph{validation set} is given by the second slice of the global dataset, to search the best hyperparameters of the model to train.
|
||||
Finally a \emph{test set} is taken from the remaining global dataset, to evaluate the prediction performance of the model.
|
||||
Those three slices of the dataset are made such that the model cannot ``cheat'' by learning by heart the values to predict: the model should be trained on a part of the trip data distinct from the one used for the evaluation.
|
||||
To do so, a \textit{training set} is created as well as a \textit{validation set} and \textit{test set} from the whole customer trip dataset.
|
||||
To create balanced subsets with the least possible annual seasonality bias, the whole dataset for each city is first split into weekly trip data.
|
||||
Then each week is assigned to one of the three subsets, such that for every 8 continuous weeks, the first 6 weeks are assigned to the \textit{training set}, the next week is assigned to the \textit{validation set} and the last week is assigned to the \textit{test set}.
|
||||
If it is not possible to form a complete week at the end of the whole dataset, e.g there are only data for Monday to Wednesday, then this incomplete week is discarded.
|
||||
Moreover, if the dataset for a city begins on a Wednesday, the \textquote{week} goes from the Wednesday to the next Tuesday (included) for this dataset.
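A sketch of this assignment rule, assuming the complete weeks are given in chronological order, is shown below: within every block of 8 weeks, the first 6 go to the training set, the 7th to the validation set and the 8th to the test set.
\begin{verbatim}
def split_weeks(weeks):
    # Assign each complete week to training / validation / test (6/1/1 per 8 weeks).
    training, validation, test = [], [], []
    for index, week in enumerate(weeks):
        position = index % 8
        if position < 6:
            training.append(week)
        elif position == 6:
            validation.append(week)
        else:
            test.append(week)
    return training, validation, test

# With 34 complete weeks (e.g. Madrid), this yields 26 / 4 / 4 weeks.
train, val, test = split_weeks(list(range(34)))
print(len(train), len(val), len(test))   # 26 4 4
\end{verbatim}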
|
||||
|
||||
Thus for the service located in \textit{Madrid}, the daily trip dataset is split to have $26$ weeks of data assigned to the \textit{training set}, $4$ weeks of data for the \textit{validation set} and $4$ weeks of data for the \textit{test set}.
|
||||
For the service located in \textit{Paris}, the daily trip dataset is split such that $33$ weeks are assigned to the \textit{training set}, $5$ weeks of data are given for the \textit{validation set} as well as $5$ weeks of data for the \textit{test set}.
|
||||
Finally for the service present in \textit{Washington}, the daily trip dataset is split to have $26$ weeks of data assigned to the \textit{training set}, $4$ weeks of data for the \textit{validation set} and $4$ weeks of data for the \textit{test set}.
|
||||
Table~\ref{tab:ch2_summary_split} summarizes the number of weeks assigned to each set for each service.
|
||||
|
||||
\begin{table}[!ht]
|
||||
\centering
|
||||
\small
|
||||
\begin{tabular}{|l|l|l|l|}
|
||||
\hline
|
||||
\multirow{2}{*}{City} & \multicolumn{3}{l|}{Data Partition} \\ \cline{2-4}
|
||||
& Training Set & Validation Set & Test Set \\ \hline
|
||||
Madrid & 26 weeks & 4 weeks & 4 weeks \\ \cline{1-1}
|
||||
Paris & 33 weeks & 5 weeks & 5 weeks \\ \cline{1-1}
|
||||
Washington & 26 weeks & 4 weeks & 4 weeks \\ \hline
|
||||
\end{tabular}
|
||||
\caption{Summary of the data partition into training set, validation set and test set. Each set is made from multiple weeks of data, i.e. data pieces of seven consecutive days.}
|
||||
\label{tab:ch2_summary_split}
|
||||
\end{table}
|
||||
|
||||
\FloatBarrier
|
||||
\subsection{Service Area Modeling}
|
||||
|
||||
The datasets for each city have been temporally split into three subsets.
|
||||
However, in the trip dataset of each city, the GPS positions of the pick-up and drop-off of the car in each trip need to be modeled.
|
||||
Following the state-of-the-art methods, see Chapter~\ref{ch:background}, a discretization of all the GPS points in the datasets is made with the help of a grid.
|
||||
Thus for each city the serviced area, i.e the area where the customer can pick up and drop off reserved cars, is taken and discretized with a grid made of hexagons, with the objective of representing the maximum distance a customer is willing to walk.
|
||||
|
||||
The set of all the \textit{hexagons}, also called \textit{cells}, is noted $K$ and is made of hexagons with a radius of $500m$.
|
||||
Thus, all the parking spaces within the surface of one cell are labeled with the index $k$ of the corresponding hexagon.
|
||||
To create this grid of hexagonal cells of the city, a first GPS position is chosen.
|
||||
This GPS position $\theta$ is used as the origin and anchor of the grid.
|
||||
A 2D grid ($\theta$, $\vec{i}$, $\vec{j}$) is formed, such that $\vec{i}$ is a vector of length $\|\vec{i}\|=500m$ and pointing toward the east and $\vec{j}$ is a vector of length $\|\vec{j}\|=500m$ and pointing toward the south.
|
||||
The limit on the number of cells that can be placed horizontally is called $\lambda$, i.e it is not possible to have more than $\lambda$ ``columns'' in the 2D grid.
|
||||
Then to define a hexagon on the grid, the GPS position of its center is computed.
|
||||
That position is found through the translation of the point $\theta$, where $\Delta x$ and $\Delta y$ are the offsets in meters from the coordinates of the origin.
|
||||
It is possible to determine $\Delta x$ and $\Delta y$ for a cell of ID $k \in K$ with the formula:
|
||||
|
||||
\begin{equation}
|
||||
\begin{cases}
|
||||
\Delta x = \|\vec{i}\| \cdot \left(k_i + \frac{1}{2}\right) \text{~~,~~} \Delta y = \frac{3}{2} \cdot \|\vec{j}\| \cdot k_j &\text{~~~~if~} k_j \bmod{2} = 0\\
|
||||
\Delta x = \|\vec{i}\| \cdot k_i \text{~~,~~} \Delta y = \frac{3}{2} \cdot \|\vec{j}\| \cdot \left(k_j + 1\right) &\text{~~~~else}
|
||||
\end{cases}
|
||||
\end{equation}
|
||||
|
||||
$$\text{With: } k_i = k \bmod{\lambda} \text{~~~~and~~~~} k_j = \left\lfloor \frac{k}{\lambda} \right\rfloor$$
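A direct transcription of this formula, computing the offset in meters of the center of a cell $k$ from the origin $\theta$ (with both vector lengths fixed to $500$ m), could look like the following sketch.
\begin{verbatim}
CELL_RADIUS = 500.0   # metres, length of both grid vectors i and j

def cell_offset(k, nb_columns):
    # Offset (dx, dy) in metres of the centre of cell k from the grid origin.
    k_i = k % nb_columns            # column index
    k_j = k // nb_columns           # row index
    if k_j % 2 == 0:
        dx = CELL_RADIUS * (k_i + 0.5)
        dy = 1.5 * CELL_RADIUS * k_j
    else:
        dx = CELL_RADIUS * k_i
        dy = 1.5 * CELL_RADIUS * (k_j + 1)
    return dx, dy

# Example: centre of cell 21 in a grid with lambda = 19 columns.
print(cell_offset(21, 19))   # (1000.0, 1500.0)
\end{verbatim}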
|
||||
|
||||
The number of cells to use in the grid and the maximum number of columns of this grid ($\lambda$) have to be manually tuned by trial and error to cover the whole serviced area.
|
||||
In the case of \textit{Madrid}, the serviced area covers roughly a quarter of the total city surface and is centered on the city center.
|
||||
To discretize all the possible parking spaces within the serviced area, the covering grid has a width of $19$ hexagons and a height of $24$ hexagons for a total of 456 hexagons.
|
||||
However, this \textquote{rectangular} grid made of hexagons includes numerous hexagons that can be discarded since they are not included in the serviced area.
|
||||
Thus only 155 hexagons are kept as cells within the serviced area.
|
||||
Figure~\ref{fig:ch2_grid_madrid} shows the perimeter of the area serviced by \textit{Emov} in blue, with the black hexagons representing the used part of the grid.
|
||||
|
||||
\begin{figure}[!bt]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.55\textwidth]{figure/ch2_grid_madrid.jpeg}}
|
||||
\caption[Hexagonal Grid for Madrid]{Map of Madrid, in blue is the geographical limit of the \textit{Emov} service. The hexagonal grid covering the serviced area is displayed on top of Madrid and is made of 155 cells. Each cell has a radius of 500m.}
|
||||
\label{fig:ch2_grid_madrid}
|
||||
\end{figure}
|
||||
|
||||
With the same process in the case of \textit{Paris}, the grid should cover the whole city of Paris and include \textit{Issy-les-Moulineaux} in the south-west of Paris.
|
||||
Thus a grid $20$ hexagons wide and $15$ hexagons high has been made, for a total of 300 cells.
|
||||
Like for \textit{Madrid}, only a subset of $209$ cells actually covering the precise serviced area is kept.
|
||||
The precise perimeter of the serviced area is displayed with a blue line in Figure~\ref{fig:ch2_grid_paris} as well as the hexagon grid used to discretize it.
|
||||
|
||||
\begin{figure}[!bt]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.75\textwidth]{figure/ch2_grid_paris.jpeg}}
|
||||
\caption[Hexagonal Grid for Paris]{Map of Paris, in blue is the geographical limit of the \textit{Free2Move} service. The hexagonal grid covering the serviced area is displayed and is made of 209 cells. Each cell has a radius of 500m.}
|
||||
\label{fig:ch2_grid_paris}
|
||||
\end{figure}
|
||||
|
||||
Finally, for the case of the service present in~\textit{Washington}, the grid has to cover the whole District of Columbia as well as Arlington County, apart from a north-west part of Arlington County.
|
||||
To do so a grid of $27$ hexagons wide and $31$ hexagons high has been made, for a total of $837$ cells.
|
||||
To cover the serviced area, only $411$ cells have been kept.
|
||||
It should be noted that the dataset for \textit{Washington} has roughly the same number of trips as the dataset of \textit{Paris}.
|
||||
However, the serviced area in the city of \textit{Washington} is twice as wide as the one in \textit{Paris}, meaning that the average density of trip departures and arrivals might be lower than for both \textit{Madrid} and \textit{Paris}, thus offering three datasets for three carsharing services with different environments.
|
||||
The Figure~\ref{fig:ch2_grid_washington} displays with a blue line the limits of the area serviced by \textit{Free2Move} and the hexagonal grid covering it.
|
||||
Note that the hexagons seem smaller even if their size is the same across the three cities: the map's scale is not the same for the three visualizations.
|
||||
|
||||
\begin{figure}[!bt]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.75\textwidth]{figure/ch2_grid_washington.jpeg}}
|
||||
\caption[Hexagonal Grid for Washington]{Map of Washington, in blue is the geographical limit of the \textit{Free2Move} service. The hexagonal grid covering the serviced area is displayed and is made of 411 cells. Each cell has a radius of 500m.}
|
||||
\label{fig:ch2_grid_washington}
|
||||
\end{figure}
|
||||
|
||||
Once the grid is made, each GPS position in a trip is converted.
|
||||
It means that the GPS positions of the departure and the arrival are each transformed into a cell ID ($k \in K$).
|
||||
This is done by using a dichotomy to reduce the number of Euclidean distances to compute between the GPS point and the hexagon centers; more precisely, two dichotomies, on the north-south and west-east axes, are done simultaneously.
|
||||
The grid is thus split into four quarters, north-west, north-east, south-west and south-east of the hexagonal grid, and the GPS point is assigned to one of the sub-grids.
|
||||
This dichotomy continues by selecting one quarter of the remaining sub-grid until only one cell remains.
|
||||
In this case, the GPS point is assigned to this cell.
|
||||
However, if the distance between the GPS point and the assigned hexagon center is greater than $500m$, it means the point is not inside this hexagon and thus outside of the grid: it needs to be rejected.
|
||||
When a GPS point is rejected, the whole trip is rejected too and considered as an outlier, for example a customer not respecting the limits imposed by the operator and dropping off the vehicle outside of the allowed perimeter.
|
||||
It should be noted that it represents a negligible number of trips.
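Since the dichotomy only serves to avoid computing the distance to every hexagon center, a simplified (non-optimized) sketch of the assignment and rejection rule could look as follows, assuming the cell centers are already expressed in a metric frame.
\begin{verbatim}
import math

def assign_cell(point, cell_centers, radius=500.0):
    # point: (x, y) in metres; cell_centers: {cell_id: (x, y)} in the same frame.
    # The dichotomy described above only speeds up this nearest-centre search.
    best_id, best_dist = None, float("inf")
    for cell_id, (cx, cy) in cell_centers.items():
        dist = math.hypot(point[0] - cx, point[1] - cy)
        if dist < best_dist:
            best_id, best_dist = cell_id, dist
    if best_dist > radius:    # too far from any centre: outside the grid,
        return None           # the whole trip is then discarded as an outlier
    return best_id
\end{verbatim}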
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
% SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION %
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\FloatBarrier
|
||||
\section{Data Statistics}
|
||||
|
||||
The city modeling has been made to ensure the abstraction of the spatial data.
|
||||
With this model, the carsharing usage in each of the three cities has to be studied from a spatial and a temporal point of view in order to know whether knowledge can be extracted from those datasets.
|
||||
Indeed if the dataset consists only of random noise, for example because people have an erratic usage of the service, then nothing of value might be learned by modeling the usage of the fleet.
|
||||
This basic verification can be done by using visualizations in order to check the existence of patterns visible to the naked eye.
|
||||
The following section presents both a spatial analysis of the fleet usage patterns and a temporal analysis.
|
||||
Finally, customers are regrouped in order to check if differentiation can be made between regular and irregular users of the service.
|
||||
|
||||
\subsection{Spatial Usage Patterns}
|
||||
|
||||
To determine if the city modeling helps to model the user's usage patterns, a brief spatial analysis is done on the spatial distribution of the trip departures.
|
||||
For the whole period of the dataset, the number of times a trip has started from each cell has been counted.
|
||||
Then an additional indicator has been made to check the spatial imbalance in the customer demand, notably by taking the difference for each cell between the number of arrivals and departures.
|
||||
Thus, for the three cities, two heatmaps have been made to visualize where the hot spots of usage are located and where the staff have focused their efforts to counterbalance the unbalanced customer demand.
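Both indicators can be derived directly from the discretized trip table, as in the sketch below (assuming hypothetical column names \texttt{departure\_cell} and \texttt{arrival\_cell} holding the cell IDs).
\begin{verbatim}
import pandas as pd

def cell_usage_indicators(trips):
    # trips: DataFrame with one row per trip and the (hypothetical) columns
    # 'departure_cell' and 'arrival_cell' holding cell IDs.
    departures = trips["departure_cell"].value_counts()
    arrivals = trips["arrival_cell"].value_counts()
    # Positive values: more arrivals than departures (surplus of cars).
    disequilibrium = arrivals.sub(departures, fill_value=0)
    return departures, disequilibrium
\end{verbatim}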
|
||||
|
||||
In the case of \textit{Madrid}, the Figure~\ref{fig:ch2_depdes_madrid} shows on the left heatmap that two main hot spots exist in the city center.
|
||||
The presence of these two close hot spots is partially explained by the \textquote{Low Pollution Zone} that was created in the city center of Madrid in late 2018.
|
||||
This bans most of the private vehicles (cars \& motorcycles) from the city center, unless the vehicle is an electric one.
|
||||
The usage of carsharing in those cells is boosted by this \textquote{Low Pollution Zone} regulation.
|
||||
With the help of the departure/arrival disequilibrium heatmap (on the right) in the same figure, an excess of departures is on average observable in those hot spots, while the peripheral cells with lower departure counts see an unbalanced demand with more arrivals than departures.
|
||||
Thus in the case of \textit{Madrid} the city abstraction makes visible spatial differences in the customer demand that can be modeled.
|
||||
|
||||
For the service in \textit{Paris}, the Figure~\ref{fig:ch2_depdes_paris} presents two heatmaps.
|
||||
With the departure heatmap, two clear hot spots surrounded by high usage cells are visible.
|
||||
Overall, the west and north-west side of Paris sees a higher rate of departures than the east or south-east of Paris.
|
||||
The park \textquote{Bois de Vincennes} in the east of Paris sees fewer than one thousand trip departures over the whole dataset (>300 days), while a single cell in the district \textquote{XVI$^{eme}$ Arrondissement} can surpass this usage.
|
||||
The location of the hot spots can be explained by the socio-economic distribution of the population inside the city: in the west and north-west of the city reside wealthier dwellers than in the east.
|
||||
The customer demand imbalance is noticeable in three spots: the west and north-west, where on average the cells see more departures than arrivals, while the east and the south-west see more arrivals than departures.
|
||||
Overall like in the \textit{Madrid} case, the abstraction of the city space with a hexagonal grid makes visible the differences in the customer usage pattern that can be learned by using machine learning models.
|
||||
|
||||
In the city of \textit{Washington}, the same heatmaps are shown in Figure~\ref{fig:ch2_depdes_washington}.
|
||||
A large hot spot is observable mostly in the downtown of Washington, partially due to the high density of offices and commercial oriented buildings.
|
||||
Indeed as it will be shown in the next subsection, the use case for the service in Washington is both for commuting during the week and leisure during the weekend.
|
||||
As for the customer demand, most of the hot spots in Washington's downtown are balanced, with the exception of several cells with a deep imbalance where the service's jockeys focused their relocations.
|
||||
It is noticeable that the whole Arlington County is imbalanced, as almost all its cells see more trip arrivals than trip departures compared to the Washington area.
|
||||
Like for \textit{Madrid} and \textit{Paris}, spatial models of the customer demand can be made.
|
||||
|
||||
\begin{figure}[!b]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=1\textwidth]{figure/ch2_depdes_madrid.jpeg}}
|
||||
\caption[Heatmaps for Madrid]{Departure count (left) and departure/arrival disequilibrium (right) heatmaps for the trip dataset of \textit{Madrid}. In the left heatmap, the greener a cell is, the fewer departures have been recorded over the whole dataset. The number in each cell is the total number of departures from this cell. In the right heatmap, the greener a cell is, the less unbalanced the cell is between the number of cars rented by customers and the number of cars dropped off. On the contrary, redder cells tend to have a shortage of cars while bluer cells tend to have a surplus of cars. The number in each cell is the difference between the number of arrivals and the number of departures. Negative numbers denote more departures than arrivals and positive numbers the inverse. For example \textquote{-46} denotes 46 more departures than arrivals.}
|
||||
\label{fig:ch2_depdes_madrid}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[!p]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[height=0.75\textheight]{figure/ch2_depdes_paris.jpeg}}
|
||||
\caption[Heatmaps for Paris]{Departure count (top) and departure/arrival disequilibrium (bottom) heatmaps for the trip dataset of \textit{Paris}. In the top heatmap, the greener a cell is, the fewer departures have been recorded over the whole dataset. The number in each cell is the total number of departures from this cell. In the bottom heatmap, the greener a cell is, the less unbalanced the cell is between the number of cars rented by customers and the number of cars dropped off. On the contrary, redder cells tend to have a shortage of cars while bluer cells tend to have a surplus of cars. The number in each cell is the difference between the number of arrivals and the number of departures. Negative numbers denote more departures than arrivals and positive numbers the inverse. For example \textquote{22} denotes 22 more arrivals than departures.}
|
||||
\label{fig:ch2_depdes_paris}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[!p]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[height=0.75\textheight]{figure/ch2_depdes_washington.jpeg}}
|
||||
\caption[Heatmaps for Washington]{Departure count (top) and departure/arrival disequilibrium (bottom) heatmaps for the trip dataset of \textit{Washington}. In the top heatmap, the greener a cell is, the fewer departures have been recorded over the whole dataset. The number in each cell is the total number of departures from this cell. In the bottom heatmap, the greener a cell is, the less unbalanced the cell is between the number of cars rented by customers and the number of cars dropped off. On the contrary, redder cells tend to have a shortage of cars while bluer cells tend to have a surplus of cars. The number in each cell is the difference between the number of arrivals and the number of departures. Negative numbers denote more departures than arrivals and positive numbers the inverse. For example \textquote{-402} denotes 402 more departures than arrivals.}
|
||||
\label{fig:ch2_depdes_washington}
|
||||
\end{figure}
|
||||
|
||||
|
||||
\FloatBarrier
|
||||
\subsection{Weekly Customer Usage}
|
||||
|
||||
With the data abstracted on a spatial level thanks to the usage of a grid made of hexagons, there remains the need to split the data temporally.
|
||||
Indeed, splitting the data into hourly, daily or weekly periods does not model the same information.
|
||||
Hourly periods help to detect whether the service has been used more for commuting than leisure.
|
||||
Daily periods allow to better assess how a fleet of vehicles should be relocated inside the city to counteract the demand imbalance.
|
||||
Weekly periods make visible seasonal effects in the customer demand, such as holiday period decreasing the usage of the service.
|
||||
For the three cities, the average customer usage along the day is presented, in Figure~\ref{fig:ch2_tripdayhour_madrid} for \textit{Madrid}, in Figure~\ref{fig:ch2_tripdayhour_paris} for \textit{Paris} and in Figure~\ref{fig:ch2_tripdayhour_washington} for \textit{Washington}.
|
||||
In those figures, the plain curves represent the workdays and the dashed lines are the days of the weekend.
|
||||
The number of trips each day of the dataset is also shown in a second type of figure for all three cities, see Figure~\ref{fig:ch2_nbtrip_madrid} for \textit{Madrid}, Figure~\ref{fig:ch2_nbtrip_paris} for \textit{Paris} and Figure~\ref{fig:ch2_nbtrip_washington} for \textit{Washington}.
|
||||
|
||||
For \textit{Madrid}, the average number of trips made along each workday (Figure~\ref{fig:ch2_tripdayhour_madrid}) is linked in part to commute trips in the morning (around 9 a.m.) and in the evening (around 8 p.m.).
|
||||
There exists another peak in the middle of the day that could correspond to a leisure-oriented usage of the service.
|
||||
Unlike the other days of the week, the evening usage peak on Friday is spread over a longer period in the evening, meaning that the cars of the fleet are also used for a leisure purpose, with for example people coming back from bars or other similar amenities.
|
||||
During the weekend the service is also used for non-work-related trips, with a peak in the afternoon (around 2 p.m.) and in the late evening (10 p.m.).
|
||||
It should be noted that, overall, the service is used more during the workdays than during the weekends.
|
||||
This is further shown in Figure~\ref{fig:ch2_nbtrip_madrid} where each weekend sees a drop in customer usage.
|
||||
Seasonal effects, such as holidays, are also visible: the usage of the service decreases during the summer (in August 2018).
|
||||
|
||||
\begin{figure}[!p]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.95\textwidth]{figure/ch2_tripdayhour_madrid.jpeg}}
|
||||
\caption[Trip distribution in Madrid]{Distribution of the customer trips along the day of the week and the hour of the day for the service \textit{Emov} in \textit{Madrid}. Each curve represents the average number of trips (y-axis) for each hour (x-axis) depending on the day (color \& type of the curve). The weekdays, Monday to Friday, are the plain curves while the days of the weekend are the dotted curves.}
|
||||
\label{fig:ch2_tripdayhour_madrid}
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.95\textwidth]{figure/ch2_nbtrip_madrid.jpeg}}
|
||||
\caption[Trip Count Madrid]{Count of the number of trips made by the customers in \textit{Madrid} (y-axis) for each day of the dataset (x-axis), such that one point is the value for one day and each x-tick is a Sunday.}
|
||||
\label{fig:ch2_nbtrip_madrid}
|
||||
\end{figure}
|
||||
|
||||
For \textit{Paris}, Figure~\ref{fig:ch2_tripdayhour_paris} first shows that the service is used more during the weekends than during the workdays.
|
||||
On average, the customer usage never falls below 30 trips/h between 9 a.m. and 5 p.m. on Saturday and Sunday, while the peak of usage on Friday evening (5 p.m.) reaches around 32 trips/h only for this particular hour.
|
||||
Nevertheless, the service is also used for commuting, as confirmed by the morning peak (6 a.m.), early enough to avoid traffic jams, and the evening usage peak (5 p.m.).
|
||||
During the workdays, the fleet is also used for leisure-related trips between 8 p.m. and 11 p.m. as shown by a medium plateau rate of usage.
|
||||
The higher usage during the weekend is also visible in Figure~\ref{fig:ch2_nbtrip_paris}, with regular peaks of usage on Saturdays and Sundays.
|
||||
It should be noted that a drop in usage is visible for the last week of November 2019, but without any known explanation.
|
||||
Right after, during the months of December 2019 and January 2020, an increase in the service utilization is observed and can be linked to strikes affecting public transportation.
|
||||
|
||||
\begin{figure}[!p]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.95\textwidth]{figure/ch2_tripdayhour_paris.jpeg}}
|
||||
\caption[Trip distribution in Paris]{Distribution of the customer trips along the day of the week and the hour of the day for the service \textit{Free2Move} in \textit{Paris}. Each curve represents the average number of trips (y-axis) for each hour (x-axis) depending on the day (color \& type of the curve). The weekdays, Monday to Friday, are the plain curves while the days of the weekend are the dotted curves.}
|
||||
\label{fig:ch2_tripdayhour_paris}
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.95\textwidth]{figure/ch2_nbtrip_paris.jpeg}}
|
||||
\caption[Trip Count Paris]{Count of the number of trips made by the customers in \textit{Paris} (y-axis) for each day of the dataset (x-axis), such that one point is the value for one day and each x-tick is a Sunday.}
|
||||
\label{fig:ch2_nbtrip_paris}
|
||||
\end{figure}
|
||||
|
||||
Finally, for \textit{Washington}, the customer usage has some similarities with the usage in \textit{Paris} from a temporal point of view.
|
||||
The service is mostly used for commute trips during the workdays, with a peak of utilization in the morning (8 a.m.) and a peak of utilization in the evening (5 p.m.).
|
||||
Unlike in \textit{Paris}, the constant decrease in usage during the evening might indicate that leisure-related trips are rarer during the workdays.
|
||||
During each day of the weekends, from 10 a.m. to 4 p.m., the customer usage rate is on par with the peak utilization during the workdays, but for leisure-oriented usages.
|
||||
It should be noted that, overall, the usage of the service is higher on Fridays and Saturdays, as seen in Figure~\ref{fig:ch2_nbtrip_washington}.
|
||||
Several peaks or drops of utilization are observed as well; some are linked to holidays, such as the peak of utilization on 2019-12-31, but others cannot be linked to such events.
|
||||
Large social events may explain them, but no data about such events was available.
|
||||
|
||||
\begin{figure}[!p]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.95\textwidth]{figure/ch2_tripdayhour_washington.jpeg}}
|
||||
\caption[Trip distribution in Washington]{Distribution of the customer trips along the day of the week and the hour of the day for the service \textit{Free2Move} in \textit{Washington}. Each curve represents the average number of trips (y-axis) for each hour (x-axis) depending on the day (color \& type of the curve). The weekdays, Monday to Friday, are the plain curves while the days of the weekend are the dotted curves.}
|
||||
\label{fig:ch2_tripdayhour_washington}
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.95\textwidth]{figure/ch2_nbtrip_washington.jpeg}}
|
||||
\caption[Trip Count Washington]{Count of the number of trips made by the customers in \textit{Washington} (y-axis) for each day of the dataset (x-axis), such that one point is the value for one day and each x-tick is a Sunday.}
|
||||
\label{fig:ch2_nbtrip_washington}
|
||||
\end{figure}
|
||||
|
||||
Overall in the three services, the customers use the service for different purposes.
|
||||
There are always at least two peaks associated with commute-oriented trips, and one plateau of usage during the late morning and afternoon of the weekends.
|
||||
Seasonality effects can be observed, most notably in the case of \textit{Madrid}, as well as the impact of large social events or isolated holidays.
|
||||
Thus, from a temporal point of view, splitting the data to model the utilization of each day is an acceptable approach, provided that information such as whether the day is a holiday, a workday or a weekend day is kept.
|
||||
|
||||
\FloatBarrier
|
||||
\subsection{Customer Groups}
|
||||
|
||||
In order to complete the results from the previous analysis, customers have been regrouped according to the criteria presented in~\cite{wielinski_exploring_2019}.
|
||||
As detailed in Chapter~\ref{ch:background}, the authors of~\cite{wielinski_exploring_2019} have separated customers into four categories: \emph{Low Frequency} (\emph{LF}), \emph{Medium Frequency} (\emph{MF}), \emph{High Frequency} (\emph{HF}) and \emph{Ultra Frequency} (\emph{UF}).
|
||||
These categories are defined depending on the frequency at which the customer uses the service.
|
||||
Customers using the service the most are in \emph{Ultra Frequency} while occasional customers are in \emph{Low Frequency}.
|
||||
Since in the three trip datasets the customers in the \emph{LF} and \emph{MF} categories behave similarly, they have been regrouped into an \textit{LMF} (\textit{Low \& Medium Frequency}) category.
|
||||
Additionally, since the customers in the \emph{HF} and \emph{UF} categories behave in the same way, they have also been regrouped into a single category called \textit{HUF} (\textit{High \& Ultra Frequency}).
|
||||
Between the two categories used in this analysis, \emph{LMF} and \emph{HUF}, the limit is set such that customers active on less than 11\% of the days are put in \emph{LMF} while the others are put in \emph{HUF}.
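For illustration purposes only, a minimal sketch of this categorization rule is given below, assuming the trips are stored in a \texttt{pandas} DataFrame with hypothetical columns \texttt{id\_customer} and \texttt{date}; the actual implementation may differ.
\begin{verbatim}
import pandas as pd

def categorize_customers(trips: pd.DataFrame, threshold: float = 0.11) -> pd.Series:
    """Assign each customer to LMF or HUF depending on the fraction of the
    dataset days during which the customer made at least one trip."""
    n_days = trips["date"].nunique()                   # days covered by the dataset
    active_days = trips.groupby("id_customer")["date"].nunique()
    activity_rate = active_days / n_days               # fraction of active days
    return activity_rate.apply(lambda r: "LMF" if r < threshold else "HUF")
\end{verbatim}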
|
||||
|
||||
For the three cities, the spatial distribution of the trips made by the \textit{LMF} and \textit{HUF} categories is similar.
|
||||
However, as shown in the following figures, the usage patterns of active members of the service and of less active ones differ when looked at from a temporal point of view.
|
||||
In the following figures, all the days inside the dataset have been studied.
|
||||
|
||||
In the case of the service located in \textit{Madrid}, $86\,770$ customers ($90\%$ of customers) have been assigned to the \textit{LMF} category while $9\,696$ customers have been assigned to the \textit{HUF} category.
|
||||
Figure~\ref{fig:ch2_customer_study_madrid} reports the daily proportion of trips made by both categories in this city, as well as the total number of trips made each day.
|
||||
As for the proportion of trips made by both categories, it is possible to observe that during the summer and winter holidays the dominant category of users for the service is \textit{LMF}, while fewer \textit{HUF} users are using the service during these periods.
|
||||
The complementary observation is made for the other periods, where \textit{HUF} customers make more daily trips than \textit{LMF} customers.
|
||||
By checking the total number of trips made per day for both periods, the conclusion is that a drop in the total number of trips made by \textit{HUF} customers is what changes the proportion.
|
||||
That is, the trips often made for commuting purposes by \textit{HUF} customers, visible in the previous temporal analysis, are not made during those holiday periods.
|
||||
Thus, whether a day is a workday or not has an impact on the utilization of the service.
|
||||
Furthermore, a temporal pattern on a weekly basis confirms that \textit{HUF} customers are using the service for commute trips.
|
||||
Indeed, for most of the weeks in the non-holiday period, the category \textit{HUF} is the predominant one during the weekdays while the category \textit{LMF} is the predominant one during the weekends.
|
||||
This observation is associated with fewer trips being made during the weekends as observed previously.
|
||||
|
||||
\begin{figure}[!tb]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=1\textwidth]{figure/ch2_customer_study_madrid.jpeg}}
|
||||
\caption[Customer Study in Madrid]{Distribution of trips made with the service in \textit{Madrid} by two categories of customers: LMF (Low \& Medium Frequency) and HUF (High \& Ultra Frequency) as described by~\cite{wielinski_exploring_2019}. The top curves represent the distribution (as a percentage) of the number of trips made by the two customer types along the whole dataset for each day: the blue curve is the proportion (\%) of trips made by the LMF customers while the red curve is for the HUF customers. The bottom histogram is the absolute number of trips made each day by all the customers. Thus, for example, on 2019-03-31 around 3000 trips were made, 40\% of them by customers categorized as HUF and 60\% by customers categorized as LMF.}
|
||||
\label{fig:ch2_customer_study_madrid}
|
||||
\end{figure}
|
||||
|
||||
For the service located in \textit{Paris}, the \textit{LMF} category contains $6\,861$ customers ($87\%$ of customers) while the \textit{HUF} category has $1\,044$ customers.
|
||||
Figure~\ref{fig:ch2_customer_study_paris} reports the same information of daily trip proportions made by \textit{LMF} and \textit{HUF} customers.
|
||||
As for the service in \textit{Madrid}, the \textit{HUF} customers are more predominant during the weekdays than during the weekends.
|
||||
However, contrary to the service in \textit{Madrid}, the change in the ratio of trips made between \textit{HUF} and \textit{LMF} customers is due to an increase in usage from the \textit{LMF} category.
|
||||
This means that during the weekends, the demand for cars is more likely to come from a larger number of less frequent customers.
|
||||
Thus this demand might appear to be more \textquote{random}.
|
||||
Since the data contains the year 2019, the impact of public transportation strikes can be noticed too.
|
||||
For example, a strike by the workers of \textit{RATP}\footnote{\textit{RATP} stands for \textit{Régie Autonome des Transports Parisiens}} (a public transportation provider) on Friday 2019-09-13 had an impact on the demand for carsharing.
|
||||
During this day the demand for carsharing doubled: around 800 trips were made, compared to an average of 400 trips on the three previous Fridays.
|
||||
Moreover, the proportion of \textit{LMF} users leads to the conclusion that people who do not often use carsharing used the service as a replacement for public buses or subways.
|
||||
This increase in carsharing usage following a public transportation strike can also be seen during the whole months of December 2019 and January 2020, with numerous bus and subway lines being stopped.
|
||||
However, in this case, the increase in trips is predominantly due to \textit{HUF} customers, that is customers actively using the service.
|
||||
While information about public transportation disruptions should be noted, be they caused by strikes or natural phenomena, no precise annotation was available in the datasets about which days were concerned by such disruptions at the time the data was retrieved.
|
||||
Furthermore, it is not yet clear whether usage patterns can be learned for such exceptional events.
|
||||
For these reasons, strikes and other exceptional natural events are not taken into account for the following work.
|
||||
|
||||
\begin{figure}[!tb]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=1\textwidth]{figure/ch2_customer_study_paris.jpeg}}
|
||||
\caption[Customer Study in Paris]{Distribution of trips made with the service in \textit{Paris} by two categories of customers: LMF (Low \& Medium Frequency) and HUF (High \& Ultra Frequency) as described by~\cite{wielinski_exploring_2019}. The top curves represent the distribution (as a percentage) of the number of trips made by the two customer types along the whole dataset for each day: the blue curve is the proportion (\%) of trips made by the LMF customers while the red curve is for the HUF customers. The bottom histogram is the absolute number of trips made each day by all the customers. Thus, for example, on 2020-01-26 around 700 trips were made, 75\% of them by customers categorized as HUF and 25\% by customers categorized as LMF.}
|
||||
\label{fig:ch2_customer_study_paris}
|
||||
\end{figure}
|
||||
|
||||
The last service in \textit{Washington} has $5\,124$ customers ($75\%$ of customers) assigned to the \textit{LMF} category and $1\,665$ customers assigned to the \textit{HUF} category.
|
||||
As for the two other cities, Figure~\ref{fig:ch2_customer_study_washington} presents both the proportion of the daily number of trips made by \textit{HUF} and \textit{LMF} customers (top) and the total number of daily trips made (bottom).
|
||||
Contrary to the services in \textit{Madrid} and \textit{Paris}, the predominant category of customers is the regular one, since almost no day sees the \textit{LMF} customers make more trips than the \textit{HUF} ones in proportion.
|
||||
Weekly cycles can be observed, as for the two previous services, with a periodically higher proportion of trips made by \textit{LMF} customers and a lower proportion made by \textit{HUF} customers, even if the \textit{HUF} category stays predominant every day.
|
||||
For several isolated days, the usage of the carsharing service is either three times or one third of the average usage.
|
||||
Exceptional events can be the origin of those peaks and drops in the histogram of the total number of daily trips.
|
||||
However it is not possible to link those exceptional values to any events.
|
||||
Indeed even with holidays taken into account, the peak of usage cannot be explained for dates like the 31$^{st}$ January 2020.
|
||||
However, one can notice a decrease in the usage of the carsharing service during mid-March 2020, the month that saw the spread of COVID-19 in the US and thus in Washington too, leading to less demand for transportation, including carsharing services.
|
||||
|
||||
\begin{figure}[!tb]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=1\textwidth]{figure/ch2_customer_study_washington.jpeg}}
|
||||
\caption[Customer Study in Washington]{Distribution of trips made with the service in \textit{Washington} by two categories of customers: LMF (Low \& Medium Frequency) and HUF (High \& Ultra Frequency) as described by~\cite{wielinski_exploring_2019}. The top curves represent the distribution (as a percentage) of the number of trips made by the two customer types along the whole dataset for each day: the blue curve is the proportion (\%) of trips made by the LMF customers while the red curve is for the HUF customers. The bottom histogram is the absolute number of trips made each day by all the customers. Thus, for example, on 2020-03-29 around 400 trips were made, 70\% of them by customers categorized as HUF and 30\% by customers categorized as LMF.}
|
||||
\label{fig:ch2_customer_study_washington}
|
||||
\end{figure}
|
||||
|
||||
Overall for the three selected services several common properties of the customer usage are observed.
|
||||
First, the usage of those services is in accordance with observations made for other free-floating carsharing services, as presented in Chapter~\ref{ch:background} and Section~\ref{sec:user_behavior}: this kind of service is often used by regular customers, frequently for commute trips.
|
||||
Second, for the three services, the weekends see an increased usage by customers who do not use the service often, meaning that the global behavior of the demand might be more random during these days than during the weekdays and workdays.
|
||||
Third, holidays and exceptional events have an impact on the demand since commute trips are either more or less necessary on those days, depending on whether people commute less during holidays or need cars more when the public transport system is less available.
|
||||
This information is necessary to model the daily usage of the fleet.
|
||||
State-wide holidays can be found easily, but other exceptional events have not been precisely logged by the operator and are thus unavailable for study.
|
||||
Finally, even if the surfaces occupied by the services in \textit{Madrid} and \textit{Paris} are comparable, there is overall less usage of the service in \textit{Paris}.
|
||||
Furthermore, the usage of the services located in \textit{Paris} and \textit{Washington} is comparable, but the surface occupied is at least twice as large in \textit{Washington} as in \textit{Paris}.
|
||||
Thus, even if the demand in each service has similarities, it should not be forgotten that the demand for each service has a unique component that will have an impact on the performance of the algorithms modeling it.
|
||||
|
||||
\begin{table}[!tb]
|
||||
\begin{tabular}{|c|c|c|c|c|c|}
|
||||
\hline
|
||||
City & Category & NbCustomer & NbTrip & Trip/Customer & Avg TimePerTrip \\ \hline
|
||||
\multirow{2}{*}{Madrid} & LMF & 86 770 & 583 298 & 6.7 & 20 mins \\
|
||||
& HUF & 9 696 & 542 069 & 56 & 20 mins \\ \hline
|
||||
\multirow{2}{*}{Paris} & LMF & 6 861 & 51 863 & 7.5 & 29 mins \\
|
||||
& HUF & 1 044 & 77 191 & 74 & 24 mins \\ \hline
|
||||
\multirow{2}{*}{Washington} & LMF & 5 124 & 38 088 & 7.4 & 114 mins \\
|
||||
& HUF & 1 665 & 97 177 & 58 & 60 mins \\ \hline
|
||||
\end{tabular}
|
||||
\caption{Summary of the utilization made by the two types of customers (\textit{LMF} and \textit{HUF}) for the three services (\textit{Madrid}, \textit{Paris}, \textit{Washington}). The column \textit{NbCustomer} is the number of customers in each category for each service. The column \textit{NbTrip} gives the number of trips made by all the customers of each category. The column \textit{Trip/Customer} gives the average number of trips made by each customer of a category in a city. The last column gives the average duration of a trip for the customers of a category in a city.}
|
||||
\label{tab:ch2_allcustomers}
|
||||
\end{table}
|
||||
|
||||
The presented information about the customer usage of the service for the three cities is summarized in Table~\ref{tab:ch2_allcustomers}.
|
||||
For every service, a minority of customers makes a large share of the trips: about half of the trips in the case of \textit{Madrid} and 70\% of the trips in the case of \textit{Washington}.
|
||||
Thus, the usage of the carsharing service's fleet could be modeled, since it is not only made of \textquote{usage noise} produced by irregular users.
|
||||
This is further confirmed by the average number of trips made by each customer depending on their category: in all the cities, the average number of trips made by \textit{HUF} customers is around ten times that of \textit{LMF} customers.
|
||||
However the modeling of the fleet usage might be made difficult by the impact of the irregular users in the \textit{Washington} service.
|
||||
Indeed, in this case the irregular customers use a car for around 2 hours on average while regular users use it for only one hour.
|
||||
If a model had to predict the usage of the fleet, the randomness added by \textit{LMF} customers might have a higher impact since they use a car twice as long as regular users, while the spatial demand for cars does not differ between customer types, i.e. a car placed somewhere might see a higher variation in usage depending only on the type of customer renting it.
|
||||
|
||||
Moreover, it should be noted that this study is made only with data of past trips, i.e. only the trips that actually happened are recorded and used here.
|
||||
This differs from the real user demand.
|
||||
Indeed, the users might not be able to take the service because the fleet was not positioned well enough to offer a car nearby.
|
||||
This means that even if a customer actually demanded a trip, i.e. a car to move from somewhere to somewhere else, this demand did not translate into a recorded trip.
|
||||
For this reason, modeling the real customer demand from the information about typical customer behaviors in each empirical category can be tricky.
|
||||
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
% SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION %
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\FloatBarrier
|
||||
\section{Supplementary Datasets}
|
||||
|
||||
On top of the spatial and temporal distribution information of the customer usage, external factors have to be taken into account to model the general utilization of a carsharing service.
|
||||
To do so, two axes have been considered: the daily weather in the city and the building types in each cell of the hexagon grid.
|
||||
First is presented how the weather dataset has been retrieved and pretreated to be used in the methodology presented in Chapter~\ref{ch:method}.
|
||||
Second an approach to use open datasets from OpenStreetMap about the type of buildings is presented, as well as the reasons why this dataset could not be used for the methodology in Chapter~\ref{ch:method}.
|
||||
|
||||
\subsection{Weather Information}
|
||||
|
||||
Three additional weather datasets are used as exogenous data to train our prediction models.
|
||||
They have been retrieved from the hourly SYNOP weather broadcasts from the nearest weather station for each service and for each day in the trip dataset of the service.
|
||||
In the case of \textit{Madrid}, the weather from the station in the \textit{Adolfo-Suarez Madrid-Barajas Airport} is used.
|
||||
For \textit{Paris}, the station in the \textit{Orly Airport} provides the weather data.
|
||||
Finally, for \textit{Washington}, the \textit{Ronald Reagan Washington National Airport} has a station providing the weather data.
|
||||
|
||||
From the SYNOP broadcasts, several features have been extracted: the temperature ($^\circ$C), the relative humidity (\%), the pressure (hPa), the wind speed (km/h), the cloud cover (\%) and the amount of rain (mm).
|
||||
If hourly report entries were missing, the missing values were imputed by assuming a linear change between the hours before and after each missing entry.
|
||||
Then, for each feature, an average over all the hours of each day has been computed, in order to obtain one daily value per feature.
|
||||
Those particular features have been used since some combinations of them help to better characterize the weather occurring each day.
|
||||
For example, temperature and relative humidity together help to know whether high temperatures are bearable or not, e.g. 28$^\circ$C with a relative humidity of 20\% is more bearable than 28$^\circ$C with a relative humidity of 60\%.
|
||||
The aim is to take into account the days when customers are more inclined to use the service because using a bike-sharing service or other means of transport would be too inconvenient due to the weather.
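As an illustration of this pre-processing, the minimal sketch below (with hypothetical column names) linearly interpolates missing hourly SYNOP values and averages them into one daily value per feature; it is only a sketch of the procedure described above.
\begin{verbatim}
import pandas as pd

def daily_weather_features(hourly: pd.DataFrame) -> pd.DataFrame:
    """hourly: DataFrame indexed by timestamp, with one column per feature
    (e.g. temperature, humidity, pressure, wind_speed, cloud_cover, rain)."""
    # Re-index on a complete hourly range so missing broadcasts appear as NaN
    full_index = pd.date_range(hourly.index.min(), hourly.index.max(), freq="h")
    hourly = hourly.reindex(full_index)
    # Impute missing entries assuming a linear change between surrounding hours
    hourly = hourly.interpolate(method="linear")
    # Average each feature over every day to obtain one daily value per feature
    return hourly.resample("D").mean()
\end{verbatim}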
|
||||
|
||||
\FloatBarrier
|
||||
\subsection{City Buildings Usage}
|
||||
|
||||
Inside the city, trips are motivated by the need for customers to travel towards places where activities are done or to go back home.
|
||||
As described before, customer usage of a free-floating carsharing service is linked to either commuting to work or going to leisure places.
|
||||
An axis that has been explored is the characterization of each area of the city, i.e. knowing whether each cell includes more buildings with a residential/commercial orientation or more workplaces.
|
||||
To retrieve the usage type of each building, the \textit{OpenStreetMap} (\textit{OSM})\footnote{The \emph{OSM} database has been queried through the \emph{Overpass API}: \url{https://overpass-api.de/}} database can be used.
|
||||
It is an open-data equivalent of Google Maps, i.e. a map of the world where buildings, roads, rivers and other points of interest for cartography are filled in by volunteers.
|
||||
When volunteers decide to map real buildings into the OpenStreetMap database, additional information can be added such as the type of building, the number of floors or the usage made of the building.
|
||||
|
||||
For the following characterization, four categories of usage type for buildings have been selected: \textquote{Residential}, \textquote{Commercial}, \textquote{Office} and \textquote{Leisure}.
|
||||
The first category, \textquote{Residential}, is made up of apartments and individual houses.
|
||||
The category \textquote{Commercial} is made up of buildings dedicated to selling goods or services, e.g. hotels, marketplaces, shops or supermarkets.
|
||||
The \textquote{Office} category regroups the offices and other tertiary workplaces.
|
||||
The last category, \textquote{Leisure}, regroups all leisure-related buildings such as cafés, gyms, libraries or cinemas.
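To illustrate how such information can be retrieved, a minimal sketch using the \texttt{overpy} Python client for the \emph{Overpass API} is given below; the tag-to-category mapping is an assumption made for this example and does not necessarily correspond to the exact mapping used for the figures.
\begin{verbatim}
import overpy

# Hypothetical mapping from OSM tags to the four usage categories
CATEGORY_FILTERS = {
    "Residential": '["building"~"residential|apartments|house"]',
    "Commercial":  '["building"~"commercial|retail|hotel|supermarket"]',
    "Office":      '["building"="office"]',
    "Leisure":     '["leisure"]',
}

def building_centroids(category, south, west, north, east):
    """Query the Overpass API and return the (lat, lon) centroids of the
    buildings of one category inside a bounding box."""
    api = overpy.Overpass()
    query = (f'way{CATEGORY_FILTERS[category]}({south},{west},{north},{east});'
             '(._;>;);out body;')
    result = api.query(query)
    centroids = []
    for way in result.ways:
        lats = [float(node.lat) for node in way.nodes]
        lons = [float(node.lon) for node in way.nodes]
        centroids.append((sum(lats) / len(lats), sum(lons) / len(lons)))
    return centroids
\end{verbatim}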
|
||||
|
||||
One approach to characterize the city areas would use the hexagons of the grid: each hexagon could be characterized with the help of the buildings inside it.
|
||||
However, when the buildings with known usage types are displayed on a map, it becomes apparent that most of the cities are not documented enough.
|
||||
For example, Figure~\ref{fig:ch2_osm_madrid} is the representation of the information contained in the \textit{OpenStreetMap} database for \emph{Madrid}.
|
||||
The map itself comes from the available data and is provided by \textit{OSM}.
|
||||
The available information about the usage type of the buildings is displayed on top as colored points, and numerous areas are not covered with such points.
|
||||
This means that areas like the south-east of \textit{Madrid} are well documented while the west or the north-east are less documented.
|
||||
Figure~\ref{fig:ch2_osm_paris} shows the same information for \textit{Paris}: only buildings on highly frequented avenues and boulevards have their usage type given in the database.
|
||||
This can be observed as purple and blue dots, i.e. \textquote{Leisure} and \textquote{Commercial} buildings, which are much more represented than residential buildings.
|
||||
The same phenomenon can be observed in \textit{Washington} in Figure~\ref{fig:ch2_osm_washington}: large areas have buildings but no documented usage type, as seen in the north-east and south-east.
|
||||
This kind of incomplete data cannot be used to characterize cells.
|
||||
Indeed if a cell is categorized as composed mainly of \textquote{Commercial} or \textquote{Office} buildings in the center of \textit{Paris}, this might hide the high presence of residential buildings where inhabitants could be interested in carsharing services.
|
||||
Thus, even if, with complete data, an additional abstraction could theoretically be made to train models for \textquote{Residential}, \textquote{Commercial}, \textquote{Office} or \textquote{Leisure} oriented areas, in practice the lack of data makes this approach unreliable.
|
||||
|
||||
|
||||
\begin{figure}[!p]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.75\textwidth]{figure/ch2_osm_madrid.jpeg}}
|
||||
\caption[Building Type Madrid]{Map of \textit{Madrid} with the display of the building usage types according to the OpenStreetMap database. The perimeter of the service is the blue line. Each colored dot is a building whose usage type is filled out in the OpenStreetMap database. Green dots denote residential buildings, blue dots commercial buildings, cyan dots offices and purple dots leisure-oriented buildings.}
|
||||
\label{fig:ch2_osm_madrid}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[!p]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=1\textwidth]{figure/ch2_osm_paris.jpeg}}
|
||||
\caption[Building Type Paris]{Map of \textit{Paris} with the display of the building usage types according to the OpenStreetMap database. The perimeter of the service is the blue line. Each colored dot is a building whose usage type is filled out in the OpenStreetMap database. Green dots denote residential buildings, blue dots commercial buildings, cyan dots offices and purple dots leisure-oriented buildings.}
|
||||
\label{fig:ch2_osm_paris}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[!p]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=1\textwidth]{figure/ch2_osm_washington.jpeg}}
|
||||
\caption[Building Type Washington]{Map of \textit{Washington} with the display of the building usage types according to the OpenStreetMap database. The perimeter of the service is the blue line. Each colored dot is a building whose usage type is filled out in the OpenStreetMap database. Green dots denote residential buildings, blue dots commercial buildings, cyan dots offices and purple dots leisure-oriented buildings.}
|
||||
\label{fig:ch2_osm_washington}
|
||||
\end{figure}
|
||||
|
|
@ -0,0 +1,667 @@
|
|||
\chapter{Prediction-based Fleet Relocation}
|
||||
\label{ch:method}
|
||||
|
||||
The state-of-the-art presented in Chapter~\ref{ch:background} shows that one of the main axes to improve a carsharing service is to relocate cars from areas with less usage to areas with a higher rate of customer usage.
|
||||
The analysis of the carsharing trip datasets in Chapter~\ref{ch:data_analysis} unveils that the three services have hot spots and cold spots, i.e. areas with respectively high and low demand.
|
||||
On top of those spots, there are locations with a \textquote{deficit} or an \textquote{excess} of cars, meaning that relocation of cars in those cities is needed.
|
||||
Since two of the three services available for study in this thesis have a fleet of electric vehicles, with different recharging strategies, the focus was put only on designing a strategy for the relocation of misplaced vehicles.
|
||||
That is, discharged vehicles do not have to be taken into account and the operator still manages the recharging of electric vehicles manually.
|
||||
Thus, vehicles charged enough to be used by customers but placed in a low-demand area are the focus of the relocation methodology presented as the main contribution of this thesis.
|
||||
|
||||
In this Chapter is presented an improvement of the methodology published as the main contribution of this thesis in the \emph{IEEE International Conference on Tools with Artificial Intelligence 2021}~\cite{martin_prediction_2021} and in the \emph{Conférence Nationale en Intelligence Artificielle 2022}~\cite{martin_optimisation_2022}.
|
||||
This methodology is split into two parts that are detailed one after another.
|
||||
The first part is about the daily customer usage prediction of a car.
|
||||
That is, the number of minutes the car is going to be rented by all the customers during the day, depending on the place where the car has been located in the morning at 6 a.m.
|
||||
The second part is the automated decision of where cars should be placed.
|
||||
It relies on the knowledge of where the cars were located before the \emph{relocation} phase is made, i.e. the period when cars are being moved around by the staff.
|
||||
It also relies on predictions of the customer daily usage for the next day.
|
||||
Indeed, in the case of this thesis, one of the constraints of the \emph{Free2Move} service is the inability to relocate cars during the day, i.e. vehicles of the service can be moved only during the night.
|
||||
|
||||
\section{Prediction of the Utility}
|
||||
|
||||
The first part of the methodology is the prediction of the daily usage of a car depending on which area it is placed.
|
||||
Since the cars can only be relocated at night, i.e. once a car is relocated during the night the staff will not be able to act on it until the next night, the objective is to predict the daily usage of the car, under the hypothesis that a car can be used multiple times during the day.
|
||||
Placing the car at the right spot can enable it to make \textquote{chains of trips} during the day.
|
||||
First is presented how the daily customer usage is modeled by the concept of \emph{Utility}.
|
||||
Then the evaluation of different regression models is made, with the objective of finding the type of model to use for the second part of the methodology.
|
||||
|
||||
\subsection{Modeling the Utility}
|
||||
|
||||
From the historical datasets of \emph{Free2Move}, a set of tuples to represent the user trips (\emph{trips}) has been created such that:
|
||||
$$\mathit{trips} = \{(\mathit{id}_{\mathit{vehicle}}, \mathit{id}_{\mathit{customer}}, \mathit{ts}_{\mathit{from}}, \mathit{ts}_{\mathit{to}}, \mathit{gps}_{\mathit{from}}, \mathit{gps}_{\mathit{to}}, \mathit{duration})\}$$
|
||||
In these tuples, $\mathit{id}_{\mathit{vehicle}}$ is the unique ID of the vehicle used to make the trip.
|
||||
The unique ID of the customer doing the trip is denoted $\mathit{id}_{\mathit{customer}}$.
|
||||
The timestamps of departure and arrival of the trip are respectively referred to as $\mathit{ts}_{\mathit{from}}$ and $\mathit{ts}_{\mathit{to}}$.
|
||||
The departure and arrival locations are respectively noted $\mathit{gps}_{\mathit{from}}$ and $\mathit{gps}_{\mathit{to}}$.
|
||||
They are provided as GPS coordinates, which can be mapped to the geographical cells defined earlier.
|
||||
From the timestamps (e.g. $\mathit{ts}_{\mathit{from}}$) additional features can be derived, such as whether the day is a workday or a holiday, the day of the week or weather information.
|
||||
The number of minutes driven ($\mathit{duration}$) is an additional trip information that is deduced from the difference between $\mathit{ts}_{\mathit{from}}$ and $\mathit{ts}_{\mathit{to}}$.
|
||||
Trips made by the staff, i.e. trips by employees with the purpose of relocating vehicles overnight, are excluded from the data.
|
||||
|
||||
\paragraph{Actual utility.} Based on the historical data of a single day, the total utility $U_k(n)$ is computed for each cell $k$ given that there were $n$ cars available in that cell at the start of the day.
|
||||
It is insufficient to only count the utility of the trips from this cell to other cells, as the cars may have been used for subsequent trips.
|
||||
Based on this observation, the total utility of each vehicle is computed as the total time the vehicle has been rented during the day:
|
||||
$$U(\mathit{id}_{\mathit{vehicle}}) = \sum_{t \in \mathit{trips}(\mathit{id}_{\mathit{vehicle}})} \mathit{utility}(t)$$
|
||||
where $\mathit{trips}(\mathit{id}_{\mathit{vehicle}})$ is the set of trips that have this vehicle ID and the utility $\mathit{utility}(t)$ of a trip $t$ is its duration.
|
||||
|
||||
Let $\mathit{initial\_vehicles}(k)$ be the set of vehicle identifiers that were in a cell $k$ at the start of the day, with $|\mathit{initial\_vehicles}(k)|=n$.
|
||||
Then the \emph{actual utility} of a cell $k$ which had $n$ initial vehicles is measured as:
|
||||
$$U_k(n) = \sum_{\mathit{id}_{\mathit{vehicle}} \in \mathit{initial\_vehicles}(k)} U(\mathit{id}_{\mathit{vehicle}})$$
|
||||
Note that this value can only be computed if $n = |\mathit{initial\_vehicles}(k)|$, i.e. for a given day in a cell $k$ with $\mathit{initial\_vehicles}(k)$ it is not possible to know $U_k(x)$ if $x \neq |\mathit{initial\_vehicles}(k)|$.
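As a minimal sketch of this computation (assuming the trips of one day are available as dictionaries and that a hypothetical helper \texttt{initial\_cell} maps a vehicle ID to the cell where it started the day), the actual utilities could be obtained as follows.
\begin{verbatim}
from collections import defaultdict

def actual_utilities(trips, initial_cell):
    """trips: iterable of dicts with keys 'id_vehicle' and 'duration' (minutes),
    all belonging to the same day. initial_cell: function mapping a vehicle ID
    to the cell where the vehicle was located at the start of the day."""
    # U(id_vehicle): total rented minutes of the vehicle over the day
    utility_per_vehicle = defaultdict(float)
    for trip in trips:
        utility_per_vehicle[trip["id_vehicle"]] += trip["duration"]
    # U_k(n): sum of the utilities of the vehicles initially located in cell k
    utility_per_cell = defaultdict(float)
    for id_vehicle, utility in utility_per_vehicle.items():
        utility_per_cell[initial_cell(id_vehicle)] += utility
    return utility_per_vehicle, utility_per_cell
\end{verbatim}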
|
||||
|
||||
\begin{figure}[!p]
|
||||
\centering
|
||||
|
||||
\includegraphics[width=0.8\linewidth]{figure/ch3_method_summary.jpeg}
|
||||
|
||||
\caption[Graphical summary of the method]{Overview of the approach: a regression model is trained for each cell to predict, from a historical dataset, the utility of each car (by order of decreasing utility), knowing both time and weather-related features. The optimization solver uses both the set of models learned to predict the utility of each car and the night car positions to return the best next morning car positions.}
|
||||
\label{fig:method_summary}
|
||||
\end{figure}
|
||||
|
||||
\paragraph{Estimated utility.} The goal now is to predict $U_k(n)$ for any cell $k$ and value $n$, without knowing which trips will be demanded and which vehicles will be used for them.
|
||||
This can be cast as a regression problem, where the training data are the actual utilities defined above.
|
||||
However, only the $U_k(n)$ of one $n$ per cell is observed each day, which makes a very sparse training set, unlikely to lead to a good regression model for arbitrary values of $n$.
|
||||
|
||||
Thus the regression problem of $U_k(n)$ is decomposed, in order to have more training data and more refined estimates.
|
||||
First, the cars of a cell are ordered in a deterministic way, namely by decreasing daily utilization (most used car 1$^{st}$, second most used car $2^{nd}$, etc.).
|
||||
The rank of a car in this ordered list is denoted $i$, such that the 1$^{st}$ car in the order has a rank $i=1$.
|
||||
The gain made by adding an $i^{th}$ vehicle in the area can now be defined as $U^i_k = U(\mathit{id}_{\mathit{vehicle}})$ with $\mathit{id}_{\mathit{vehicle}}$ the initial vehicle of cell $k$ whose rank in the ordered car list is $i$.
|
||||
It is done such that: $U_k(n) = \sum_{i \in 1..n} U^i_k$.
|
||||
Based on this reformulation, the goal becomes predicting the \emph{gain of utility} $U^i_k$ brought by the $i^{th}$ car in a region $k$.
|
||||
This decomposition of the utility value into a more fine-grained regression problem provides more training data, and thus allows $U_k(n)$ to be better estimated for different values of $n$ (especially for $n < |\mathit{initial\_vehicles}(k)|$).
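The construction of the per-rank training labels can be sketched as follows, for one day and one cell; the structures used are only illustrative.
\begin{verbatim}
def rank_decomposition(vehicle_utilities):
    """vehicle_utilities: list of the daily utilities (minutes) of the vehicles
    initially present in one cell. Returns {rank i: U^i_k}, so that summing the
    first n values reconstructs U_k(n)."""
    ordered = sorted(vehicle_utilities, reverse=True)   # most used car first
    return {i + 1: utility for i, utility in enumerate(ordered)}

# Example: three cars used 120, 45 and 0 minutes in a cell gives
# {1: 120, 2: 45, 3: 0}, hence U_k(2) = 120 + 45 = 165 minutes.
\end{verbatim}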
|
||||
|
||||
\paragraph{Prediction}
|
||||
The objective is to predict $U^i_k$ for each cell $k$ and rank $i$ using the previously defined training labels; this prediction is called $\hat{U}^i_k$.
|
||||
As the car utilization behaves differently in each cell, one function $f_k$ is learned per cell.
|
||||
As features, the rank $i$ is used as well as a set $\mathit{fs}$ of features including weather features, temporal features and lagged observations of the previous days (as the data is sequential).
|
||||
Hence, the aim is to learn a regression function $f_k$ per cell such that:
|
||||
$$\hat{U}^i_k = f_k(i,\mathit{fs}).$$
|
||||
Regression methods to use are discussed in Section~\ref{sec:utility_experiment_regressors}.
|
||||
|
||||
\paragraph{Clustering cells}
|
||||
Instead of learning one predictor $f_k$ per cell, similarities in utility behavior for different cells can also be used.
|
||||
By grouping cells, more data becomes available for every function to be learned, and ideally more accurate predictions are made.
|
||||
Hence, the effect of clustering the cells which behave similarly in terms of their car utility, and of learning one function $f_c$ per cluster $c$, is investigated.
|
||||
The two clustering approaches used during the experiments are detailed in Section~\ref{sec:utility_experiment_clustering}.
|
||||
|
||||
\subsection{Utility Prediction Evaluation}
|
||||
\label{sec:utility_experiment}
|
||||
|
||||
The evaluation of the prediction of the utility of cars is done for the three services detailed in Chapter~\ref{ch:data_analysis}, that is \emph{Madrid}, \emph{Paris} and \emph{Washington}.
|
||||
The aim of this evaluation is to assess which kind of regression model and which parameters should be used for the second part of the methodology to be detailed in Section~\ref{sec:ch3_fleet_relocation}.
|
||||
First, the three cell-clustering strategies that are going to be used during the experiments are defined.
|
||||
Then the regression models to be used during the following experiments are detailed.
|
||||
In the third part, the metrics that are going to measure the performance of the regression models are detailed.
|
||||
Finally, the case study on the three cities is exposed.
|
||||
|
||||
\subsubsection{Cell clustering}
|
||||
\label{sec:utility_experiment_clustering}
|
||||
|
||||
For this case study, three cell clustering strategies are compared.
|
||||
A clustering strategy is the grouping of cells together to train a single regression model for all the cells of each cluster, instead of learning one regression model per cell.
|
||||
The first strategy is a baseline strategy where no clustering is done at all (\emph{No Cluster}).
|
||||
In this strategy, each function $f_k$ is modeled independently for all the cells $k$ of the city grid.
|
||||
|
||||
The second clustering strategy is based on a \emph{K-Means}~\cite{least_1982_lloyd} clustering where each cell is represented by a \emph{vector}.
|
||||
This vector is created such that the $i^{th}$ coordinate of the vector is the average utility (in minutes) of the car of rank $i$.
|
||||
The number of clusters that is optimal for prediction purposes is found using the validation set for each model and feature set.
|
||||
The predicted utility of a cell $k$ becomes $\hat{U}^i_k = f_c(i, \mathit{fs})$ where $f_c$ is the regression function learned for the cluster $c$ which contains the cell $k$.
|
||||
To compute the K-Means distances, all vectors have the same fixed size, which is the highest car rank $i$ encountered in the dataset.
|
||||
This size is set to $i=20$ for the three datasets.
|
||||
If a cell has no average driven time for a rank $i$, the $i^{th}$ coordinate is filled with a $0$ in the corresponding vector.
|
||||
In the experiments, the Scikit-learn implementation~\cite{scikit_learn} of K-Means is used with the Euclidean distance.
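A minimal sketch of this clustering step is given below, assuming the per-cell average utility vectors have already been computed and zero-padded up to rank 20; the number of clusters is a parameter to be selected on the validation set as explained above.
\begin{verbatim}
import numpy as np
from sklearn.cluster import KMeans

def cluster_cells_kmeans(cell_vectors, n_clusters):
    """cell_vectors: dict {cell_id: list of 20 average utilities, one per rank,
    padded with 0 for missing ranks}. Returns a dict {cell_id: cluster label}."""
    cell_ids = list(cell_vectors)
    X = np.array([cell_vectors[k] for k in cell_ids])    # shape (n_cells, 20)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    return dict(zip(cell_ids, labels))
\end{verbatim}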
|
||||
|
||||
The third clustering strategy is based on the same \emph{vectors} of utility.
|
||||
However, this approach, based on a \emph{K-Medoids} algorithm, uses a different distance computation strategy.
|
||||
First, the size of the vector describing each cell is the highest car rank encountered in this particular cell.
|
||||
Thus, the cell vectors do not all have the same size.
|
||||
The distance between two such vectors of different sizes $i$ and $i+h$ (with $h \geq 0$) is only computed on the first $i$ common car ranks between both vectors.
|
||||
It is the Euclidean distance between these two common parts divided by its length $i$.
|
||||
In the experiments, the Pyclustering~\cite{Novikov2019} implementation of the K-Medoids algorithm is used.
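The distance used by this K-Medoids variant can be sketched as follows: only the common prefix of two variable-length utility vectors is compared, and the Euclidean distance over this prefix is divided by its length.
\begin{verbatim}
import math

def partial_euclidean_distance(u, v):
    """Distance between two utility vectors of possibly different lengths,
    computed on the first common car ranks only and divided by that length."""
    common = min(len(u), len(v))                 # number of shared ranks
    squared = sum((u[i] - v[i]) ** 2 for i in range(common))
    return math.sqrt(squared) / common
\end{verbatim}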
|
||||
|
||||
|
||||
\subsubsection{Utility Prediction Algorithms}
|
||||
\label{sec:utility_experiment_regressors}
|
||||
|
||||
The performance of two baselines and three regression models is evaluated in this section: the two baselines are a \emph{Zero} predictor and the \emph{Mean} of historical values, while the regression models are \emph{Gradient Boosting Decision Tree} regressors (\emph{GBR})~\cite{GBT}, \emph{Support Vector Machine} regressors (\emph{SVR})~\cite{SVR} and linear \emph{ElasticNet} regressors~\cite{zou2005regularization}.
|
||||
Two sets of features (\emph{Rank} and \emph{All}) were used to train the models.
|
||||
Other machine learning algorithms and learning strategies were tested but the results were unconvincing and are not presented here.
|
||||
|
||||
\paragraph{Features} The first set of features \emph{Rank} contains no exogenous features, such that only the rank $i$ of each car is used to train all the regression models.
|
||||
The second set of features \emph{All} contains all the exogenous information available: the \emph{day of week}, whether this day is a \emph{workday or not}, \emph{weather information} and two \emph{historical features}.
|
||||
As detailed in Chapter~\ref{ch:data_analysis}, the weather information is obtained by pre-processing the hourly weather broadcasts data.
|
||||
To fit the daily prediction, an average view of the weather for each day has been created.
|
||||
For each day, one weather vector is then created containing the normalized values of the computed hourly averages (temperature, relative humidity, pressure, wind speed, cloud cover, rain and snow).
|
||||
Then these vectors are used in a clustering model to create $4$ different weather clusters, with the idea of associating utilization with general weather conditions rather than with each individual weather feature.
|
||||
For the regression methods, the cluster value is used as a feature in a one-hot encoding fashion.
|
||||
Finally, the two historical features are given for each cell $k$ and rank $i$: the last two (in time) utility values observed (not predicted) in the data for this cell and rank.
|
||||
|
||||
\paragraph{Models} The first baseline (\emph{Zero}) returns the value \textquote{0} whenever the prediction of the utility of a car is requested.
|
||||
This baseline predictor draws the line that should not be crossed by a predictor, i.e. a predictor whose performance is worse than always predicting a null value is not of interest.
|
||||
The second baseline model (\emph{Mean}) is the average of the utility values computed for each cell, for a given rank $i$ for all days in the training set.
|
||||
It is the equivalent of what is used in the previous works presented in Chapter~\ref{ch:background}, that is taking into account only the average past customer utilization.
|
||||
These two baselines only use the feature set \emph{Rank}.
|
||||
|
||||
The following experiments use the regression variants of the Scikit-learn implementations of the Gradient Boosting Tree (\emph{GBR}), of the Support Vector Machine (\emph{SVR}) and of the \emph{ElasticNet}, with their default hyperparameters; other values have been tested without significant performance change.
|
||||
Each model directly predicts the utility of the car in its corresponding (group of) cell(s) for the two previously mentioned feature sets.
|
||||
Each attribute has been normalized by subtracting the average and dividing by the standard deviation.
|
||||
Experiments were not done with other well-known machine learning models (such as Neural Networks or Random Forests) because of the small number of attributes available to describe the data and the presence of the mandatory \emph{car rank} attribute.
|
||||
Mandatory attributes are not well suited to the Random Forests algorithm, which samples a random subset of attributes and may thus ignore the car rank.
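A sketch of this training setup is given below; it fits one \emph{GBR} with default hyperparameters per (group of) cell(s), assuming the feature matrix (rank, one-hot weather cluster, day of week, workday flag and the two lagged utilities) has already been built as numerical columns.
\begin{verbatim}
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def train_cluster_models(training_data):
    """training_data: dict {cluster_id: (X, y)} where X holds the rank and the
    exogenous features of each sample and y the observed utility gains U^i_k.
    Returns one fitted regressor per (group of) cell(s)."""
    models = {}
    for cluster_id, (X, y) in training_data.items():
        # Attributes are standardized, then a GBR with default parameters is fitted
        model = make_pipeline(StandardScaler(), GradientBoostingRegressor())
        models[cluster_id] = model.fit(X, y)
    return models
\end{verbatim}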
|
||||
|
||||
Other machine learning algorithms and learning strategies were tested but gave unconvincing results.
|
||||
This includes an alternative two-step approach: first a classification model was used to predict whether a car leaves the cell or not and, in case it does leave, a second model is used to predict the utility of this car.
|
||||
Since the first classification model was far from perfect (around 70\% accurate), it propagated a lot of errors in the second step and the whole two-step process obtained worse results than the direct prediction strategy proposed above.
|
||||
The second approach tested did not use $i$ as a feature and directly predicted the whole vector of utility for each cell, i.e. it used multi-target regression models.
|
||||
These approaches both performed worse than the baseline model \emph{Mean}, and hence they are not reported in the following.
|
||||
It should also be noted that once a type of model is chosen, all the (groups of) cells use the same type of regressor to predict the utility.
|
||||
An approach selecting first the type of model to train for each cell, including \emph{Zero} and \emph{Mean}, has been tested but showed no better results than the best case where all the (groups of) cells use the same type of model.
|
||||
|
||||
\subsubsection{Evaluation Metrics}
|
||||
|
||||
The regression model performance is assessed through two different metrics, an absolute and a relative one.
|
||||
The first one, a \emph{Mean Absolute Error}~(MAE) is the average number of minutes either overestimated or underestimated by the trained regression models on each area for one day.
|
||||
In the following case study, the daily average of the MAE is computed such that the values reported in the tables are calculated by:
|
||||
|
||||
$$ \mathit{MAE} = \frac{1}{|T|} \cdot \frac{1}{|K|} \sum_{t \in T} \sum_{k \in K} \sum_{i \in I} |obs_{tki} - pred_{tki}| $$
|
||||
|
||||
With $T$ the set of timestamps (days) in the data, $K$ the set of cells, $I$ the set of car ranks, $obs_{tki}$ the real utility of the i$^{th}$ car in cell $k$ at timestamp $t$ and $pred_{tki}$ the predicted utility of the i$^{th}$ car in cell $k$ at timestamp $t$.
|
||||
|
||||
Then, as a relative measure to better understand the regression model performances, a ratio between the MAE and the average real driven time per cell of the same day is also used; it is called here the \emph{Ratio Mean Absolute Error}~(RMAE).
|
||||
As for the previous metric, the daily average of its values is reported such that the values are computed by:
|
||||
|
||||
$$\mathrm{RMAE} = \frac{1}{|T|} \cdot \sum_{t \in T} \frac{\frac{1}{|K|} \sum_{k \in K} \sum_{i \in I} |obs_{tki} - pred_{tki}|}{\frac{1}{|K|} \sum_{k \in K} \sum_{i \in I} obs_{tki}} $$
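For completeness, a minimal sketch of both metrics is given below, assuming the observations and predictions are stored as arrays of shape (days, cells, ranks), with zeros for absent ranks.
\begin{verbatim}
import numpy as np

def mae_rmae(obs, pred):
    """obs, pred: arrays of shape (n_days, n_cells, n_ranks), in minutes.
    Returns the daily-averaged MAE and RMAE defined above."""
    abs_err = np.abs(obs - pred)
    # MAE: per-cell sum over ranks, averaged over cells and days
    mae = abs_err.sum(axis=2).mean()
    # RMAE: per day, the mean absolute error per cell divided by the mean
    # observed utility per cell, then averaged over the days
    daily_ratio = abs_err.sum(axis=2).mean(axis=1) / obs.sum(axis=2).mean(axis=1)
    return mae, daily_ratio.mean()
\end{verbatim}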
|
||||
|
||||
\subsubsection{Case Studies}
|
||||
|
||||
The evaluation of the utility prediction is done for the three services that are presented in Chapter~\ref{ch:data_analysis}.
|
||||
The objective is to evaluate how well $f_k(i, \mathit{fs})$ can be predicted for a cell $k$, a car rank $i$ and an exogenous feature set $\mathit{fs}$.
|
||||
This is done for the three regression approaches (\emph{ElasticNet}, \emph{GBR} and \emph{SVR}), compared to the two baselines (\emph{Zero} and \emph{Mean}) described earlier.
|
||||
|
||||
\begin{table*}[!tbh]
|
||||
\hspace*{-1.75em}
|
||||
% \footnotesize
|
||||
\small
|
||||
\centering
|
||||
\begin{tabular}{|l|l|m{1.8em} m{2.8em}|m{1.8em} m{2.8em}|m{1.8em} m{2.8em}|m{1.8em} m{2.8em}|m{1.8em} m{2.8em}|}
|
||||
\hline
|
||||
\multirow{3}{*}{Clustering} & \multirow{3}{*}{Features} & \multicolumn{10}{l|}{Madrid} \\ \cline{3-12}
|
||||
& & \multicolumn{2}{l|}{Zero} & \multicolumn{2}{l|}{Mean} & \multicolumn{2}{l|}{ElasticNet} & \multicolumn{2}{l|}{GBR} & \multicolumn{2}{l|}{SVR} \\ \cline{3-12}
|
||||
& & MAE & RMAE & MAE & RMAE & MAE & RMAE & MAE & RMAE & MAE & RMAE \\ \hline
|
||||
\multirow{2}{*}{No Cluster} & Rank & 775 & 100\% & 263 & 36\% & 276 & 38\% & 263 & 36\% & 273 & 37\% \\ %\cline{2-6}
|
||||
& All & N/A & N/A & N/A & N/A & 247 & 34\% & 245 & 33\% & 288 & 39\% \\ \hline
|
||||
\multirow{2}{*}{KMeans} & Rank & \multirow{2}{*}{N/A} & \multirow{2}{*}{N/A} & 266 & 36\% & 280 & 38\% & 265 & 36\% & 266 & 36\% \\ %\cline{2-6}
|
||||
& All & & & N/A & N/A & 250 & 34\% & 240 & 33\% & 247 & 34\% \\ \hline
|
||||
\multirow{2}{*}{KMedoids} & Rank & \multirow{2}{*}{N/A} & \multirow{2}{*}{N/A} & 266 & 36\% & 280 & 38\% & 266 & 36\% & 266 & 36\% \\ %\cline{2-6}
|
||||
& All & & & N/A & N/A & 249 & 34\% & \textbf{239} & \textbf{33\%} & 243 & 33\% \\ \hline
|
||||
\hline
|
||||
\multirow{3}{*}{Clustering} & \multirow{3}{*}{Features} & \multicolumn{10}{l|}{Paris} \\ \cline{3-12}
|
||||
& & \multicolumn{2}{l|}{Zero} & \multicolumn{2}{l|}{Mean} & \multicolumn{2}{l|}{ElasticNet} & \multicolumn{2}{l|}{GBR} & \multicolumn{2}{l|}{SVR} \\ \cline{3-12}
|
||||
& & MAE & RMAE & MAE & RMAE & MAE & RMAE & MAE & RMAE & MAE & RMAE \\ \hline
|
||||
\multirow{2}{*}{No Cluster} & Rank & 85 & 100\% & 69 & 86\% & 83 & 104\% & 64 & 77\% & 66 & 79\% \\ %\cline{2-6}
|
||||
& All & N/A & N/A & N/A & N/A & 75 & 92\% & 64 & 76\% & 73 & 86\% \\ \hline
|
||||
\multirow{2}{*}{KMeans} & Rank & \multirow{2}{*}{N/A} & \multirow{2}{*}{N/A} & 69 & 86\% & 85 & 107\% & 64 & 77\% & 64 & 77\% \\ %\cline{2-6}
|
||||
& All & & & N/A & N/A & 77 & 94\% & \textbf{62} & \textbf{74\%} & 67 & 79\% \\ \hline
|
||||
\multirow{2}{*}{KMedoids} & Rank & \multirow{2}{*}{N/A} & \multirow{2}{*}{N/A} & 71 & 89\% & 87 & 110\% & 65 & 78\% & 65 & 78\% \\ %\cline{2-6}
|
||||
& All & & & N/A & N/A & 78 & 96\% & 64 & 76\% & 67 & 79\% \\ \hline
|
||||
\hline
|
||||
\multirow{3}{*}{Clustering} & \multirow{3}{*}{Features} & \multicolumn{10}{l|}{Washington} \\ \cline{3-12}
|
||||
& & \multicolumn{2}{l|}{Zero} & \multicolumn{2}{l|}{Mean} & \multicolumn{2}{l|}{ElasticNet} & \multicolumn{2}{l|}{GBR} & \multicolumn{2}{l|}{SVR} \\ \cline{3-12}
|
||||
& & MAE & RMAE & MAE & RMAE & MAE & RMAE & MAE & RMAE & MAE & RMAE \\ \hline
|
||||
\multirow{2}{*}{No Cluster} & Rank & 198 & 100\% & 226 & 121\% & 252 & 136\% & \textbf{173} & \textbf{87\%} & 176 & 88\% \\ %\cline{2-6}
|
||||
& All & N/A & N/A & N/A & N/A & 250 & 132\% & 174 & 88\% & 187 & 94\% \\ \hline
|
||||
\multirow{2}{*}{KMeans} & Rank & \multirow{2}{*}{N/A} & \multirow{2}{*}{N/A} & 227 & 122\% & 257 & 138\% & 175 & 88\% & 175 & 88\% \\ %\cline{2-6}
|
||||
& All & & & N/A & N/A & 249 & 130\% & \textbf{173} & \textbf{87\%} & 180 & 91\% \\ \hline
|
||||
\multirow{2}{*}{KMedoids} & Rank & \multirow{2}{*}{N/A} & \multirow{2}{*}{N/A} & 233 & 125\% & 262 & 141\% & 181 & 92\% & 181 & 92\% \\ %\cline{2-6}
|
||||
& All & & & N/A & N/A & 251 & 131\% & 176 & 89\% & 180 & 91\% \\ \hline
|
||||
\end{tabular}
|
||||
\caption{MAE (lower is better) and RMAE (lower is better) performances of the car utility prediction per day and per (group of) cell(s). One sub-table is used to describe the results for each service. Each time, two baselines (\emph{Zero} and \emph{Mean}) and three regression models (\emph{ElasticNet}, \emph{GBR} and \emph{SVR}) with two sets of features (\emph{Rank} and \emph{All}) are tested.}
|
||||
\label{tab:ch3_utility_regression}
|
||||
\end{table*}
|
||||
|
||||
The columns of Table~\ref{tab:ch3_utility_regression} represent the regression models used, i.e the \emph{Zero} and historical \emph{Mean}, the \emph{ElasticNet}, the \emph{Gradient Boosting Tree} (\emph{GBR}) and the \emph{Support Vector Machine} (\emph{SVR}).
|
||||
The rows describe the results with or without clustering, using the feature sets presented before.
|
||||
MAE (in min) and RMAE (in \%) are computed from the true number of minutes driven in each cell each day and the predicted number of minutes, on the validation set.
|
||||
The performance of the baseline \emph{Zero} has been computed only once since its output does not depend on any feature, and would return the same results for each configuration.
|
||||
The historical \emph{Mean} performance has not been computed while using the feature set \emph{All} since its statistical value could be questionable.
|
||||
It should be noted that for all the combinations of settings, the training and evaluation of each regression model took less than 5 minutes.
|
||||
Each combination is thus compatible with daily usage by an operational team in each service.
|
||||
|
||||
\paragraph{Madrid.}
|
||||
|
||||
One can notice that with the feature set \emph{Rank}, the linear regression model gives one of the worst MAE on average without clustering ($276$ mins) and with any kind of clustering approach ($280$ mins).
|
||||
This shows the potential of more complex machine learning models and of our chosen exogenous features when there is enough data.
|
||||
|
||||
Indeed, even if the models \emph{GBR} or \emph{SVR} do not perform better than the historical \emph{Mean} with only the feature set \emph{Rank}, the addition of the exogenous features from the feature set \emph{All} improves the prediction performance in most cases and allows them to beat the historical \emph{Mean}.
|
||||
This improvement is amplified by grouping cells together with either clustering approach, \emph{K-Medoids} being slightly better than \emph{K-Means} for \emph{Madrid}.
|
||||
|
||||
Overall, it can be noticed that compared to the other services, the regression models make a higher absolute error in the case of \emph{Madrid}.
|
||||
However the historical utilization of this service is higher, with on average $775$ minutes of daily historical utility per cell.
|
||||
This high absolute error should be put in perspective with the relative measure (\emph{RMAE}, in \%), which indicates that it represents on average only $\sim$33\% of the average daily historical utility in the best case.
|
||||
|
||||
These results show that the configuration to choose for the regression approach is the \emph{GBR} regression model with a \emph{K-Medoids} clustering of the similar cells while using the feature set \emph{All}.
|
||||
This combination offers a $3\%$ reduction in RMAE compared to the baseline \emph{Mean}, as well as a reduction of $24$ minutes (on average per cell and per day) in the MAE.
|
||||
This setting will be used for \emph{Madrid} in the following experiments.
|
||||
|
||||
\paragraph{Paris.}
|
||||
|
||||
This dataset poses a bigger challenge to the prediction algorithm than the first one.
|
||||
The first reason is that the number of cars is lower in this service ($475$ compared to $578$ in \emph{Madrid}) while the number of cells covered by the service is slightly larger ($209$ compared to $155$ in \emph{Madrid}), thus the spatial density of cars is lower.
|
||||
Moreover the global utilization of the service is lower than for \emph{Madrid}, with on average $85$ minutes of daily historical utilization per cell, and the daily utility of a car is null $55\%$ of the time in the dataset.
|
||||
|
||||
This could explain the relatively bad performance of the historical \emph{Mean} baseline ($\sim$69 min of MAE which corresponds to almost $86\%$ of the average per day and cell of the historical minutes driven) compared to the prediction of the machine learning models in Table~\ref{tab:ch3_utility_regression}.
|
||||
For this dataset, the \emph{GBR} algorithm performs better than \emph{SVR}. As for the first dataset, the results are better when clustering the cells except for the \emph{SVR} model.
|
||||
|
||||
It can be noted that with or without clustering and with the \emph{Rank} feature set, the linear regression model \emph{ElasticNet} performs poorly, with a relative error greater than $100\%$.
|
||||
For those settings, \emph{ElasticNet} is worse than using the model \emph{Zero}, which only predicts $0$ for each utility and would have on average a MAE of $86$ minutes per day and per cell and a RMAE of $100\%$ per day.
|
||||
|
||||
The best prediction results, obtained with the \emph{K-Means} clustering approach, the feature set \emph{All} and a \emph{GBR} regression model, provide a MAE of $\sim$62 minutes, which corresponds to $74\%$ of the average minutes driven per day and cell.
|
||||
As will be detailed in the next section, these far-from-perfect predictions still allow the overall service to be improved, which suggests an even larger margin of improvement with better predictions.
|
||||
|
||||
\paragraph{Washington.}
|
||||
|
||||
As for \emph{Paris}, this dataset also poses a bigger challenge for the regression models.
|
||||
Indeed, while the number of cars is around the same as in \emph{Madrid} with $600$ cars, the number of cells covered by the service is more than twice as large, with $411$ cells for \emph{Washington} against $155$ cells for \emph{Madrid}.
|
||||
Thus the spatial density of cars is less than half that of the first dataset.
|
||||
Besides, while the average utility per cell ($198$ minutes) is higher than for \emph{Paris}, this is explained by the fact that each individual trip is on average longer than in \emph{Paris} and that, on average, half of the cars have a daily utility of zero.
|
||||
Thus the regression models need to take into account a higher variance of utilities in the dataset.
|
||||
|
||||
As for \emph{Paris}, this explains the bad performance of the \emph{Mean} baseline, which has on average a RMAE of $121\%$ in its best setting, \emph{No Cluster} with the feature set \emph{Rank}: using the \emph{Mean} baseline to predict the utility of the cars is worse than simply predicting zeros.
As for the two previous datasets, the linear regression model \emph{ElasticNet} is not capable of modeling the utility better than the baselines.
|
||||
However both \emph{GBR} and \emph{SVR} perform better than the baselines and stay under a RMAE of $100\%$ in every setting.
|
||||
|
||||
Even if the utility is difficult to model in the case of \emph{Washington}, for the following experiments we select the setting using the \emph{K-Means} clustering approach with the feature set \emph{All} and the regression model \emph{GBR}.
|
||||
This setting performs the best with a MAE of $173$ minutes, reducing the MAE by $53$ minutes and the RMAE by $34$ points compared to the \emph{Mean} baseline.
|
||||
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
% SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION - SECTION %
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\section{Fleet Relocation}
|
||||
\label{sec:ch3_fleet_relocation}
|
||||
|
||||
The second part of the methodology to relocate the vehicles is the optimization of the placement of the fleet.
|
||||
It must take into account the main constraint that the number of possible relocations is limited by the number of employees, also called jockeys, and by the time and cost needed for each relocation.
|
||||
This also means that the previous placement of the vehicles has to be taken into account.
|
||||
This is done by using an \emph{Integer Linear Programming} formulation that is given to a solver in order to find the best placement of the cars.
|
||||
In order to design this optimization, a first and simpler formulation of the vehicle placement problem is presented.
|
||||
Then its extension, able to take into account the jockeys' capabilities and the previous fleet status, is detailed.
|
||||
A greedy algorithm to solve the formulation, and to act as a baseline, is also presented.
|
||||
Lastly, the impact of the fleet relocation is evaluated, in terms of revenue, on the historical dataset.
|
||||
The aim is to compare the proposed method, the thesis contribution, against a greedy baseline and the historical revenues.
|
||||
|
||||
\subsection{Naive Relocation}
|
||||
|
||||
As a first approach, the car relocation problem boils down to determining the best placement of the fleet over the different areas.
|
||||
Indeed given $\hat{U}^i_k$, the predicted utilization of the $i$th car located in the cell $k$, and a bound $\Gamma$ on the number of cars available,
|
||||
the determination of the best placement of the $\Gamma$ cars over the different cells can be formulated.
|
||||
Note that if $\hat{U}^i_k$ were linear in $i$, e.g.\ $\hat{U}^i_k = i \cdot w$ for some value $w$, then the relocation problem would be an unbounded knapsack problem with capacity $\Gamma$, where each cell is one item.
|
||||
However, since this simple linearity assumption does not hold, a different formulation is used to model the relocation problem.
|
||||
More specifically, an \emph{Integer Linear Programming} (\emph{ILP}) is used to model it as an assignment problem of vehicles to cells, where the total capacity $\Gamma$ and the order of the vehicles in each cell must be respected.
|
||||
|
||||
Boolean indicator variables $V_{ki}$ are used to represent whether there should be an $i$th vehicle in cell $k$ the next morning.
|
||||
The total number of vehicles in a cell $k$ is hence $\sum_i V_{ki}$.
|
||||
Let $K$ be the set of all cell identifiers defined earlier.
|
||||
Let $m$ be an upper bound on the number of vehicles allowed in one cell and $I=\{1,\dots,m\}$.
|
||||
Let $V=(V_{ki})_{k \in K, i \in I}$ be the matrix of indicator variables whose value is searched with a solver.
|
||||
The \emph{ILP} formulation of the problem is then:
|
||||
|
||||
\begin{align}
|
||||
\underset{V}{maximize}\quad &\sum_{k \in K} \sum_{i \in I} \hat{U}_k^i \cdot V_{ki} \label{eq:objectiveNoRelocCost}\\
|
||||
s.t.\quad &\sum_{k \in K} \sum_{i \in I} V_{ki} \leq \Gamma \label{eq:constraintSimpleCapacity}\\
|
||||
&V_{ki} \geq V_{ki+1} \quad &&\forall k \in K, i \in I\setminus\{m\} \label{eq:constraintSimpleGap} \\
|
||||
&V_{ki} \in \{0,1\} &&\forall k \in K, i \in I
|
||||
\end{align}
|
||||
|
||||
The objective function of this problem is Equation~\ref{eq:objectiveNoRelocCost}, stating that the values of $V$ are searched to maximize the sum of the utility of the placed vehicles.
|
||||
If the indicator variable $V_{ki}$ is equal to $1$ then the value of $\hat{U}_k^i$ is taken into account in the sum, else it is not since no vehicle would be placed in cell $k$ at rank $i$.
|
||||
Equation~\eqref{eq:constraintSimpleCapacity} is the capacity constraint, meaning that no more than $\Gamma$ cars can be placed in the city.
|
||||
Equation~\eqref{eq:constraintSimpleGap} ensures that there is no `gap' in the rank of the indicator variables, that is, if $V_{ki} = 0$ then any $j > i$ has to be $0$ too.
|
||||
For example, if there is no 3rd vehicle in a cell, there is also no 4th, 5th, etc. vehicle.
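To make the formulation concrete, the following sketch encodes Equations~\eqref{eq:objectiveNoRelocCost} to~\eqref{eq:constraintSimpleGap} with the open-source PuLP modelling library (the thesis experiments rely on the Gurobi solver; the toy utility matrix below is purely illustrative).
\begin{verbatim}
import pulp

# Toy data: 3 cells, at most m = 3 cars per cell, Gamma = 4 cars to place.
# U[k][i] is the predicted utility of an (i+1)-th car in cell k.
U = [[120, 60, 10],
     [200, 150, 40],
     [80, 20, 5]]
K, I = range(len(U)), range(len(U[0]))
Gamma = 4

prob = pulp.LpProblem("naive_relocation", pulp.LpMaximize)
V = pulp.LpVariable.dicts("V", (K, I), cat="Binary")

# Objective: maximize the total predicted utility of the placed cars.
prob += pulp.lpSum(U[k][i] * V[k][i] for k in K for i in I)

# Capacity: no more than Gamma cars placed in the whole city.
prob += pulp.lpSum(V[k][i] for k in K for i in I) <= Gamma

# No gap in the ranks: an (i+1)-th car requires an i-th car.
for k in K:
    for i in list(I)[:-1]:
        prob += V[k][i] >= V[k][i + 1]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
placement = {k: sum(int(V[k][i].varValue) for i in I) for k in K}
print(placement)  # e.g. {0: 1, 1: 2, 2: 1}
\end{verbatim}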
|
||||
|
||||
\subsection{Jockey-based Relocation}
|
||||
\label{sec:ch3_jockey_relocation}
|
||||
|
||||
The above formulation computes the optimal placement but does not consider the fact that 1) at the end of the day, the vehicles are already located in certain cells, and 2) cars need to be relocated by jockeys, raising the need to take into account their limited capacity and the associated costs of each relocation.
|
||||
|
||||
To address 1), $s_{ki} \in \{0,1\}$ denotes whether there is an $i$th vehicle in cell $k$ at the end of the previous day.
|
||||
To address 2), a budget $\gamma$ limits the total number of car relocations that the jockeys can perform.
|
||||
The number of ranks that are used in the formulation, the upper limit of $I$ called $m$, is redefined as $m = \mathit{max}(n, \nu)$, with $n$ the maximum number of cars observed in a single cell before relocation and $\nu$ a chosen upper limit to the number of cars to be placed inside each cell after relocation.
|
||||
This accounts both for the limit $\nu$ on the number of cars to be placed in each area and for the maximum number of cars $n$ actually present in a cell before relocation.
|
||||
Furthermore jockeys cannot teleport; hence they must be moved by a so-called `sweeper' car to the next car after dropping off a relocated vehicle.
|
||||
Operationally these sweeper trips should be minimized; $q_{kl}$ is therefore introduced as the cost of a jockey being moved by a sweeper car from cell $k$ to cell $l$.
|
||||
A price $p$ per minute paid by users renting a car is also defined so that revenues from cars being rented and costs from the relocation can be balanced against each other.
|
||||
|
||||
The goal now is to find the relocation that leads to the largest possible expected revenue minus the associated jockey sweeping costs, while performing at most $\gamma$ relocations.
|
||||
To model this as an \emph{Integer Linear Programming} problem, two sets of Boolean variables are used: $F_{ki}$ denotes that the $i$th car is relocated away \emph{from} cell $k$, while $T_{ki}$ denotes that a car is relocated \emph{to} cell $k$ to be its $i$th car.
|
||||
Using these binary variables, whether or not there is a car at the $i$th position in cell $k$ is computed by $s_{ki} - F_{ki} + T_{ki}$, which replaces $V_{ki}$ from the previous formulation.
|
||||
Finally, $E_{kl}$ represents the number of sweeper trips necessary between cell $k$ and $l$, with $q_{kl}$ the cost of one such sweeper trip.
|
||||
|
||||
Plugging that into the above optimization problem and adding appropriate extra constraints leads to:
|
||||
|
||||
\begin{align}
|
||||
\underset{F,T,E}{argmax}\quad &\sum_{k \in K} \sum_{i \in I} p \cdot \hat{U}_{ki} \cdot (s_{ki} - F_{ki} + T_{ki}) - \sum_{k \in K} \sum_{l \in K} q_{kl} \cdot E_{kl} \label{eq:relocObjective}\\
|
||||
% minimize_{E}\quad &\sum_{k \in K} \sum_{i \in I} \sum_{l \in K} \sum_{j \in I} E_{kilj} \cdot d_{kl} \label{eq:minDistance}\\
|
||||
s.t.\quad &s_{ki} - F_{ki} \geq 0 &&\hspace{-3.5em} \forall k \in K, i \in I \label{eq:relocDontRemoveNone}\\ % don't remove if not present
|
||||
&s_{ki} + T_{ki} \leq 1 &&\hspace{-3.5em} \forall k \in K, i \in I \label{eq:relocDontAddTooMany}\\ % don't add if already present
|
||||
&\sum_{k \in K} \sum_{i \in I} T_{ki} = \sum_{k \in K} \sum_{i \in I} F_{ki} \label{eq:relocAddSameRemove}\\
|
||||
&\sum_{k \in K} \sum_{i \in I} T_{ki} \leq \gamma, \sum_{k \in K} \sum_{i \in I} F_{ki} \leq \gamma \label{eq:relocLimitBudget}\\
|
||||
&(s_{ki} - F_{ki} + T_{ki}) \geq (s_{ki+1} - F_{ki+1} + T_{ki+1}) &&\hspace{-3.5em} \forall k \in K, i \in I\setminus\{m\} \label{eq:relocGap} \\
|
||||
&\sum_{i \in I} (s_{ki} - F_{ki} + T_{ki}) \leq \nu &&\hspace{-3.5em} \forall k \in K \label{eq:maxCarPerCell} \\
|
||||
&\sum_{l \in K} E_{kl} = \sum_{j \in I} T_{kj}&&\hspace{-3.5em} \forall k \in K \label{eq:outgoingJockey}\\
|
||||
&\sum_{k \in K} E_{kl} = \sum_{i \in I} F_{li} &&\hspace{-3.5em} \forall l \in K \label{eq:incomingJockey}\\
|
||||
&F_{ki} \in \{0,1\}, T_{ki} \in \{0,1\} &&\hspace{-3.5em} \forall k \in K, i \in I \\
|
||||
&E_{kl} \in \{0, \dots, |I|\} &&\hspace{-3.5em} \forall k, l \in K
|
||||
\end{align}
|
||||
|
||||
The objective function in Equation~\eqref{eq:relocObjective} maximizes the total expected revenue from the utility over the computed car placement while taking into account the costs to reposition jockeys between the cars to relocate.
|
||||
Equation~\eqref{eq:relocDontRemoveNone} states that only vehicles that are present at the cell and rank coordinate at the end of the day can be moved away overnight, while Equation~\eqref{eq:relocDontAddTooMany} states that if there is already a vehicle present at this cell and rank coordinate then another vehicle cannot be moved to the location.
|
||||
These two constraints effectively state that depending on $s_{ki}$ either $F_{ki}$ or $T_{ki}$ has to be 0.
|
||||
Equation~\eqref{eq:relocAddSameRemove} states that the number of pickups and drop-offs has to be the same, while Equation~\eqref{eq:relocLimitBudget} ensures that both the number of pickups and drop-offs is lower than the relocation budget $\gamma$.
|
||||
Equation~\eqref{eq:relocGap} ensures that there are no ``gaps'' in the final assignment, obtained by substituting $s_{ki} - F_{ki} + T_{ki}$ into Equation~\eqref{eq:constraintSimpleGap} from the previous formulation.
|
||||
The constraint proposed in Equation~\eqref{eq:maxCarPerCell} makes sure that it is not possible to have more than $\nu$ cars in each cell after the relocation process.
|
||||
However it should be noted that it may make the problem infeasible for low values of $\gamma$ (provided as a constant beforehand), i.e.\ too many cars may already be present above the $\nu$ limit and there may not be enough relocations available to move them away.
|
||||
Finally Equation~\eqref{eq:outgoingJockey} makes sure that each time a car is relocated \emph{to} $k$, then a sweeper car needs to pick up the jockey.
|
||||
Equation~\eqref{eq:incomingJockey} ensures that a sweeper car drops off a jockey for each car relocated \emph{from} $l$.
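A compact sketch of this second formulation, again with PuLP and on a toy instance (all the numbers, including the sweeper cost matrix, are made up for illustration and are not the values used in the thesis), could look as follows.
\begin{verbatim}
import pulp

# Toy instance: 3 cells, one car in cell 0 and one in cell 2 at the end of the day.
K = range(3)
m, nu, gamma, p = 3, 2, 2, 0.3                 # ranks, per-cell limit, budget, price/min
I = range(m)
s = [[1, 0, 0], [0, 0, 0], [1, 0, 0]]          # s[k][i]: car present at (k, i) tonight
U = [[30, 10, 5], [200, 150, 40], [20, 5, 1]]  # predicted utility per (cell, rank)
q = [[0, 4, 8], [4, 0, 5], [8, 5, 0]]          # sweeper cost between two cells

prob = pulp.LpProblem("jockey_relocation", pulp.LpMaximize)
F = pulp.LpVariable.dicts("F", (K, I), cat="Binary")   # car removed from (k, i)
T = pulp.LpVariable.dicts("T", (K, I), cat="Binary")   # car added to (k, i)
E = pulp.LpVariable.dicts("E", (K, K), lowBound=0, upBound=m, cat="Integer")

# Revenue of the final placement minus the sweeper costs.
prob += (pulp.lpSum(p * U[k][i] * (s[k][i] - F[k][i] + T[k][i]) for k in K for i in I)
         - pulp.lpSum(q[k][l] * E[k][l] for k in K for l in K))

for k in K:
    for i in I:
        prob += s[k][i] - F[k][i] >= 0         # only remove existing cars
        prob += s[k][i] + T[k][i] <= 1         # do not add on top of an existing car
prob += (pulp.lpSum(T[k][i] for k in K for i in I)
         == pulp.lpSum(F[k][i] for k in K for i in I))   # as many drop-offs as pick-ups
prob += pulp.lpSum(T[k][i] for k in K for i in I) <= gamma
prob += pulp.lpSum(F[k][i] for k in K for i in I) <= gamma
for k in K:
    for i in list(I)[:-1]:                     # no gaps in the final placement
        prob += (s[k][i] - F[k][i] + T[k][i]) >= (s[k][i+1] - F[k][i+1] + T[k][i+1])
    prob += pulp.lpSum(s[k][i] - F[k][i] + T[k][i] for i in I) <= nu   # per-cell limit
    prob += pulp.lpSum(E[k][l] for l in K) == pulp.lpSum(T[k][j] for j in I)  # pick-ups
    prob += pulp.lpSum(E[l][k] for l in K) == pulp.lpSum(F[k][i] for i in I)  # drop-offs

prob.solve(pulp.PULP_CBC_CMD(msg=False))
removed = [(k, i) for k in K for i in I if F[k][i].varValue == 1]
added = [(k, i) for k in K for i in I if T[k][i].varValue == 1]
print(removed, added)  # e.g. [(0, 0), (2, 0)] [(1, 0), (1, 1)]
\end{verbatim}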
|
||||
|
||||
During the experiments of Section~\ref{sec:ch3_relocation_evaluation}, a variant of this formulation is explored; it consists in replacing the objective function with Equations~\eqref{eq:relocObjectiveVariant1} and~\eqref{eq:relocObjectiveVariant2}.
|
||||
In this variant, $F$ and $T$ are first searched to maximize Equation~\eqref{eq:relocObjectiveVariant1}, then without changing the values of $F$ and $T$ the values of $E$ are searched to minimize Equation~\eqref{eq:relocObjectiveVariant2}:
|
||||
|
||||
\begin{align}
|
||||
\underset{F,T}{argmax} &\sum_{k \in K} \sum_{i \in I} p \cdot \hat{U}_{ki} \cdot (s_{ki} - F_{ki} + T_{ki}) \label{eq:relocObjectiveVariant1}\\
|
||||
\text{then }\underset{E}{argmin} &\sum_{k \in K} \sum_{l \in K} q_{kl} \cdot E_{kl} \label{eq:relocObjectiveVariant2}
|
||||
\end{align}
|
||||
|
||||
|
||||
\subsection{Greedy Relocation}
|
||||
\label{sec:ch3_greed_relocation}
|
||||
|
||||
The \emph{Integer Linear Programming} approach proposed above is compared against a baseline to assess the gain it provides.
|
||||
For this baseline, a greedy algorithm has been adopted; its resolution guarantees that the constraints of the \emph{Integer Linear Programming} formulation presented previously are respected too.
|
||||
|
||||
The greedy approach takes the same input as the formulation described previously in Section~\ref{sec:ch3_jockey_relocation} and produces the same kind of output as expected from this formulation.
|
||||
For the inputs, the values of $s_{ki}$ for all cells $k$ and ranks $i$ are given in a matrix $s$.
|
||||
The utility $U^i_k$ of each combination of cell $k$ and rank $i$ is given in the matrix $U$.
|
||||
Two constants $\nu$ and $\gamma$ are similar to the ones described in the previous formulation, i.e.\ the first represents the maximum number of cars that should be present in each cell and the second the maximum number of relocations that can be made.
|
||||
The greedy algorithm outputs three matrices similar to the matrices of variables returned by the previous formulation.
|
||||
The matrix $F$ represents whether or not a car is taken from the cell $k$ at rank $i$ for all cells and ranks possible.
|
||||
The matrix $T$ represents whether or not a car is added to the cell $k$ at rank $i$.
|
||||
The matrix $E$ denotes the number of jockeys moved by a sweeper car between every pair of cells.
|
||||
A summary of the inputs required by the greedy algorithm and its output is described in Table~\ref{emphtab:ch3_summary_io_greedy}.
|
||||
|
||||
\begin{table}[bth]
|
||||
\centering
|
||||
\small
|
||||
\begin{tabular}{c p{300px}}
|
||||
\hline
|
||||
Input Name & Description \\ \hline
|
||||
$s$ & Matrix ($|K|$ rows, $|I|$ columns), placement of cars before relocation. \\
|
||||
$U$ & Matrix ($|K|$ rows, $|I|$ columns), predicted utility of all cells and rank combinations. \\
|
||||
$\nu$ & Maximum number of cars per cell after relocation. \\
|
||||
$\gamma$ & Number of authorized relocations. \\ \hline
|
||||
\hline
|
||||
Output Name & Description \\ \hline
|
||||
$F$ & Matrix ($|K|$ rows, $|I|$ columns), car removal indicator in cell $k$ rank $i$. \\
|
||||
$T$ & Matrix ($|K|$ rows, $|I|$ columns), car insertion indicator in cell $k$ rank $i$. \\
|
||||
$E$ & Matrix ($|K|$ rows, $|K|$ columns), number of jockey moved between two cells. \\ \hline
|
||||
\end{tabular}
|
||||
\caption{Summary of the inputs expected by the greedy algorithm described in Algorithm~\ref{alg:ch3_greedy_algorithm} (the utility matrix and the car placement matrix, plus two constants) and of its outputs (three matrices describing the actions the staff needs to perform).}
|
||||
\label{emphtab:ch3_summary_io_greedy}
|
||||
\end{table}
|
||||
|
||||
The greedy approach described in Algorithm~\ref{alg:ch3_greedy_algorithm} is based on two lists of ``car coordinates'' (couples $(\mathit{cell}, \mathit{rank})$).
|
||||
The first one, called $\mathit{MC}$ (Movable Cars), keeps track of the cars that could be moved, represented as a list of couples $(\mathit{cell}, \mathit{rank})$.
|
||||
The second one, called $\mathit{FP}$ (Free Places), keeps track of the free places where cars could be put, also represented as a list of couples $(\mathit{cell}, \mathit{rank})$.
|
||||
Those two lists are first initialized and filled during the initialization part and then are used to set the values of matrices $F$, $T$, $E$ during the resolution part.
|
||||
|
||||
% TOKEEP: Check that to know how to split algorithm block in multiple parts: https://tex.stackexchange.com/questions/18949/algorithm2e-split-over-several-pages
|
||||
\begin{algorithm}[!p]
|
||||
\footnotesize
|
||||
\LinesNumbered
|
||||
\DontPrintSemicolon
|
||||
% \setcounter{AlgoLine}{10}
|
||||
|
||||
Set $\mathit{MC}$ to EmptyList~~\tcp{List of Movable Cars.}
|
||||
Set $\mathit{FP}$ to EmptyList~~\tcp{List of Free Positions.}
|
||||
Set $\mathit{nr}$ to 0.~~\tcp{Number of Relocation done.}
|
||||
|
||||
\tcc{Initialize $\mathit{MC}$ and $\mathit{FP}$ with relevant info from $s$.}
|
||||
\ForEach{cell $k$ in $K$}{
|
||||
Set $\mathit{rankCar}$ to highest rank from cars in $k$\;
|
||||
Set $\mathit{rankFree}$ to $\mathit{rankCar} + 1$\;
|
||||
\If{$0 \leq \mathit{rankCar}$}{
|
||||
Add ($k$, $\mathit{rankCar}$) to $\mathit{MC}$
|
||||
}
|
||||
\If{$\mathit{rankFree} < \nu$}{
|
||||
Add ($k$, $\mathit{rankFree}$) to $\mathit{FP}$
|
||||
}
|
||||
}
|
||||
|
||||
\tcc{Try to move as many cars as possible, by first moving cars whose rank is too high and then moving the cars with the lowest utility.}
|
||||
\While{$nr < \gamma$ and $\mathit{MC}$ not empty}{
|
||||
Set ($\mathit{carCell}$, $\mathit{carRank}$) to pair (cell, rank) of highest car rank from cars in $\mathit{MC}$\;
|
||||
\If{$\mathit{carRank} < \nu$}{
|
||||
\tcp{If there is no car to remove in priority, then move the one with lowest utility.}
|
||||
Set ($\mathit{carCell}$, $\mathit{carRank}$) to pair (cell, rank) of lowest car predicted utility from cars in $\mathit{MC}$\;
|
||||
}
|
||||
Set $\mathit{carUtility}$ to $U[\mathit{carCell}, \mathit{carRank}]$\;
|
||||
Set ($\mathit{destCell}$, $\mathit{destRank}$, $\mathit{destProfit}$) to tuple (cell, rank, profit) from the best profitable move possible of car ($\mathit{carCell}$, $\mathit{carRank}$, $\mathit{carUtility}$) to any location in $\mathit{FP}$ while taking into account relocation costs\;
|
||||
Remove ($\mathit{carCell}$, $\mathit{carRank}$) from $\mathit{MC}$\;
|
||||
\If{$\nu \leq \mathit{carRank}$ or $\mathit{destProfit} > 0$}{
|
||||
Remove ($\mathit{destCell}$, $\mathit{destRank}$) from $\mathit{FP}$\;
|
||||
Remove ($\mathit{carCell}$, $\mathit{carRank} + 1$) from $\mathit{FP}$\;
|
||||
\If{$\mathit{destRank} < \nu - 1$}{
|
||||
Add ($\mathit{destCell}$, $\mathit{destRank} + 1$) to $\mathit{FP}$\;
|
||||
}
|
||||
\If{$\mathit{carRank} > 0$}{
|
||||
Add ($\mathit{carCell}$, $\mathit{carRank} - 1$) to $\mathit{MC}$\;
|
||||
}
|
||||
Add 1 to $\mathit{nr}$\;
|
||||
Set $F[\mathit{carCell}, \mathit{carRank}]$ to 1\;
|
||||
Set $T[\mathit{destCell}, \mathit{destRank}]$ to 1\;
|
||||
Add 1 to $E[\mathit{destCell}, \mathit{carCell}]$\;
|
||||
}
|
||||
|
||||
}
|
||||
Return F, T, E
|
||||
|
||||
\caption{Greedy Algorithm}
|
||||
\label{alg:ch3_greedy_algorithm}
|
||||
\end{algorithm}
|
||||
|
||||
The first part of the greedy algorithm initializes $\mathit{MC}$ by adding to it the ``coordinate'' of the car with the highest rank for each cell $k$ and at the same time initializes $\mathit{FP}$ by adding to it the ``coordinate'' of the lowest rank without a car for each cell $k$.
|
||||
Thus $\mathit{MC}$ and $\mathit{FP}$ are initialized in order to help keep the constraints Equation~\eqref{eq:relocGap} and Equation~\eqref{eq:maxCarPerCell} valid during the whole process.
|
||||
Then the second part of the greedy algorithm will try to move at most $\gamma$ cars.
|
||||
First, the car with the highest rank in $\mathit{MC}$ is taken from this list; if its rank is greater than or equal to the limit $\nu$, then it is a car to remove in priority from its cell to satisfy the constraint of Equation~\eqref{eq:maxCarPerCell}.
|
||||
If not, then the car with the lowest expected utility is selected from $\mathit{MC}$.
|
||||
Once a car to move has been chosen, the best destination, an empty place in $\mathit{FP}$, is selected for this car, i.e.\ the place that maximizes the gain of expected utility while taking into account the cost induced by this relocation.
|
||||
Since all the moves for this car are explored, it is removed from the list of movable cars $\mathit{MC}$.
|
||||
If this car is to be moved in priority, i.e.\ its rank is at least the maximum $\nu$, or if the relocation is profitable, then a relocation is done; otherwise nothing is done.
|
||||
In the case of a relocation, the free places list $\mathit{FP}$ is first updated by removing from it the ``coordinate'' of the destination where the car is placed and the empty place of the cell the car comes from.
|
||||
This is done to respect the constraint Equation~\eqref{eq:relocGap}.
|
||||
Then, if it is still possible to place a car in the destination cell without exceeding the $\nu$ limit, the new lowest rank without a car is added to the list of free positions $\mathit{FP}$.
|
||||
The last list update adds the car with the new highest rank from the source cell to $\mathit{MC}$ if there is still a car in the source cell once the previously selected car has been moved.
|
||||
Finally, the counter $\mathit{nr}$ of relocations made is incremented and the output matrices representing the actions done are updated.
|
||||
Once $\gamma$ moves have been made or if there are no cars to move anymore, the matrices $F$, $T$ and $E$ are returned.
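The pseudocode above can be transcribed into a few lines of Python; the sketch below is a condensed and simplified version (it assumes NumPy matrices, that cars fill the lowest ranks of each cell without gaps, and one particular interpretation of the per-move profit), not the thesis implementation.
\begin{verbatim}
import numpy as np

def greedy_relocation(s, U, q, nu, gamma):
    """s, U: |K| x |I| NumPy matrices (initial placement, predicted utility);
    q: |K| x |K| sweeper cost matrix; nu: max cars per cell; gamma: budget."""
    n_cells = s.shape[0]
    F, T = np.zeros_like(s), np.zeros_like(s)
    E = np.zeros((n_cells, n_cells), dtype=int)
    # MC: highest-ranked car of each cell; FP: lowest empty rank of each cell.
    MC = [(k, int(s[k].sum()) - 1) for k in range(n_cells) if s[k].sum() > 0]
    FP = [(k, int(s[k].sum())) for k in range(n_cells) if s[k].sum() < nu]
    nr = 0
    while nr < gamma and MC:
        # Cars above the per-cell limit are moved in priority,
        # otherwise the movable car with the lowest predicted utility.
        cell, rank = max(MC, key=lambda c: c[1])
        if rank < nu:
            cell, rank = min(MC, key=lambda c: U[c[0], c[1]])
        MC.remove((cell, rank))
        if not FP:
            continue
        # Best destination: largest utility gain net of the sweeper cost.
        dest, drank = max(FP, key=lambda f: U[f[0], f[1]] - q[f[0], cell])
        profit = U[dest, drank] - U[cell, rank] - q[dest, cell]
        if rank >= nu or profit > 0:
            FP.remove((dest, drank))
            if (cell, rank + 1) in FP:
                FP.remove((cell, rank + 1))
            if drank < nu - 1:
                FP.append((dest, drank + 1))
            if rank > 0:
                MC.append((cell, rank - 1))
            F[cell, rank], T[dest, drank] = 1, 1
            E[dest, cell] += 1
            nr += 1
    return F, T, E
\end{verbatim}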
|
||||
|
||||
|
||||
\subsection{Relocation Strategy Evaluation}
|
||||
\label{sec:ch3_relocation_evaluation}
|
||||
|
||||
In this section, the performance of four \emph{strategies} is evaluated.
|
||||
First, the methodology proposed in Section~\ref{sec:ch3_jockey_relocation} is studied.
|
||||
Then its performance is compared to two baselines: the historical revenue of the service and the greedy algorithm from Section~\ref{sec:ch3_greed_relocation}.
|
||||
A variant of the proposed methodology is also studied to evaluate the impact of optimizing the fleet distribution before optimizing the jockey relocations.
|
||||
The evaluation of those relocation methods is then detailed, since the comparison between the results returned by a relocation method and the historical revenue of the service is not trivial.
|
||||
Those evaluations are made on the three city services: \emph{Madrid}, \emph{Paris} and \emph{Washington}.
|
||||
|
||||
\subsubsection{Relocation Strategies}
|
||||
|
||||
In order to evaluate the \emph{Integer Linear Programming} approach to relocate the cars of the fleet to particular city cells in the morning, as described previously, the expected revenues of four different strategies are compared.
|
||||
The first strategy is a ``naïve'' baseline relying on historical data; it is denoted by \emph{Historical}.
|
||||
It is based on the historical fleet position every morning, and its true utilization during the day.
|
||||
The corresponding revenue is the number of driven minutes multiplied by the price paid by users for car renting.
|
||||
The other baseline, called \emph{Greedy}, is more advanced and uses the greedy algorithm described in Section~\ref{sec:ch3_greed_relocation} to maximize the car utilities while taking into account the costs to move the jockeys around the city.
|
||||
Then the third strategy, \emph{Optim DO}, is the variant of the method described in Section~\ref{sec:ch3_jockey_relocation} and denotes a two-step (Double) Optimization strategy: the first step maximizes the fleet utility without taking into account relocation costs, while the second one, with the car locations found fixed, minimizes the jockeys' repositioning costs.
|
||||
Finally the last strategy denoted by \emph{Optim SO} and described in Section \ref{sec:ch3_jockey_relocation} is evaluated.
|
||||
It uses a joint (Single) Objective.
|
||||
This objective maximizes the service utilization revenue minus the costs to reposition the jockeys.
|
||||
|
||||
\subsubsection{Utility Estimation}
|
||||
|
||||
The ground truth utility used to evaluate the revenue earned by the service with the non-historical approaches is a mix between the historical utility and the predicted one.
|
||||
When the \emph{Greedy} algorithm or the \emph{Integer Linear Programming} solver proposes a relocation solution different from the historical one, the predicted utility is used for the cars for which this value is unknown.
|
||||
For example, if the \emph{Greedy} algorithm proposes to place 5 cars in cell $k$ and only 4 cars have been placed there historically, then the evaluation takes into account the historical utility for cars of rank 1 to 4 and the predicted utility for rank 5.
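This mixing rule can be written as a small helper; the sketch below is only illustrative (the function name, the per-minute price and the utility values are assumptions, not values from the thesis).
\begin{verbatim}
def cell_revenue(n_placed, hist_utility, pred_utility, p=0.3):
    """Evaluation revenue for one cell: historical utilities are used for the
    ranks observed historically, predicted utilities for the extra ranks."""
    total = 0.0
    for rank in range(n_placed):
        if rank < len(hist_utility):
            total += hist_utility[rank]   # rank observed historically
        else:
            total += pred_utility[rank]   # rank only covered by the prediction
    return p * total

# Example from the text: 5 cars proposed, 4 observed historically.
print(cell_revenue(5, [120, 90, 60, 30], [110, 85, 70, 40, 25]))
\end{verbatim}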
|
||||
|
||||
For all the strategies, it is assumed that the price $p$ paid by the user per driven minute is constant.
|
||||
Thus, the fleet expected revenue is the total utility multiplied by $p$.
|
||||
The sweeper car cost $q_{kl}$, described in Section~\ref{sec:ch3_jockey_relocation}, is computed from the gross hourly salary of a jockey $c_j$, which has to be doubled (during a jockey repositioning from $k$ to $l$ there are two jockeys in the sweeper car: the one being repositioned and the driver of the sweeper car), the cost of running a sweeper car per km $c_s$, the average speed (in km/h) of a car inside the city $s$ and the distance $d_{kl}$ (in km) between the cell $k$ where a jockey is picked up by the sweeper car and the cell $l$ where he/she is dropped off:
|
||||
|
||||
$$
|
||||
q_{kl} = \frac{d_{kl} \cdot (2 \cdot c_j)}{s} + d_{kl} \cdot c_s
|
||||
$$
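With purely illustrative values for $c_j$, $c_s$ and $s$ (the actual values used in the experiments are not reproduced here), the cost of one sweeper trip can be computed as follows.
\begin{verbatim}
def sweeper_cost(d_kl, c_j=12.0, c_s=0.25, s=20.0):
    """q_kl for one sweeper trip (hypothetical values): d_kl distance in km,
    c_j gross hourly jockey salary, c_s running cost per km, s speed in km/h."""
    return d_kl * (2 * c_j) / s + d_kl * c_s

print(sweeper_cost(3.0))  # two jockeys paid during the trip + car cost: 4.35
\end{verbatim}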
|
||||
|
||||
Furthermore, an additional hypothesis is made that there are at most 70 relocations each night, so $\gamma = 70$, which corresponds to 7 jockeys that can do 10 relocations on average per night.
|
||||
|
||||
\subsubsection{Case Studies}
|
||||
|
||||
The evaluation of the proposed \emph{ILP} formulation is done for the three services that are presented in Chapter~\ref{ch:background}.
|
||||
The objective is to evaluate the revenue that the service can expect by applying the proposed relocations to its fleet.
|
||||
This is done for the \emph{Greedy} baseline, the proposed \emph{ILP} formulation \emph{Optim SO} and its variant \emph{Optim DO}.
|
||||
|
||||
From now on \emph{solver car assignment} denotes the optimal solution to the optimization problem found by the proprietary solver Gurobi.
|
||||
The utility (number of driven minutes) is estimated using the best regression model found in the previous experiment.
|
||||
Thus in the case of \emph{Madrid}, a \emph{K-Medoids} clustering of the city cells is followed by a \emph{GBR} regressor, using the feature set \emph{All}.
|
||||
In the case of \emph{Paris}, a \emph{K-Means} clustering is applied followed by a \emph{GBR} regressor, using the feature set \emph{All}.
|
||||
Finally, for \emph{Washington}, we chose a \emph{K-Means} approach with a \emph{GBR} regression model using the feature set \emph{All}.
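As a hedged sketch of such a configuration (random data stands in for the real features; the exact feature engineering and hyper-parameters of the thesis are not reproduced), the cells can be clustered with \emph{K-Means} and one \emph{GBR} fitted per group of cells as follows.
\begin{verbatim}
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
cell_profiles = rng.random((100, 8))         # one descriptor vector per cell
X = rng.random((2000, 12))                   # feature set "All", one row per (cell, day)
cell_of_sample = rng.integers(0, 100, 2000)  # cell associated with each row of X
y = rng.random(2000) * 300                   # daily utility in minutes (target)

# 1) Group similar cells together.
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(cell_profiles)

# 2) Fit one regressor per group of cells.
models = {}
for c in np.unique(clusters):
    mask = np.isin(cell_of_sample, np.where(clusters == c)[0])
    models[c] = GradientBoostingRegressor().fit(X[mask], y[mask])

# Predict the utility of a new (cell, day) sample through its cell's cluster.
sample_cell, sample_features = 42, rng.random((1, 12))
print(models[clusters[sample_cell]].predict(sample_features))
\end{verbatim}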
|
||||
|
||||
For all approaches, the total daily revenue, costs included, expected by each of the three services is computed.
|
||||
These are plotted in Figure~\ref{fig:ch3_optim_madrid} for \emph{Madrid}, in Figure~\ref{fig:ch3_optim_paris} for \emph{Paris} and in Figure~\ref{fig:ch3_optim_washington} for \emph{Washington}.
|
||||
In those figures, one plot is associated with one week of the test set, one point being associated with the revenue of a particular relocation strategy for one day.
|
||||
For each day, the \emph{Historical} baseline has a base value of 100 to help make a relative comparison to the other strategies.
|
||||
Besides, the averages per day are reported for the three datasets in Table \ref{tab:ch3_optim_table}.
|
||||
It is expected that \emph{Optim SO} should give a larger revenue than \emph{Optim DO}, since the one-step strategy can take the costs into account while relocating the cars, which should help avoid unnecessary relocation costs.
|
||||
Furthermore, it is expected that the \emph{Greedy} approach should yield a lower revenue than \emph{Optim SO} because of the sub-optimal solution found by the greedy algorithm versus an optimal solution found by an \emph{Integer Linear Programming} solver.
|
||||
|
||||
It should be noted that for the three services, all strategies converged to a solution in less than 5 minutes each, making the three strategies usable by an operational team for a real service.
|
||||
Furthermore with the \emph{ILP} formulation given in Section~\ref{sec:ch3_fleet_relocation} in the case of strategies \emph{Optim SO} and \emph{Optim DO}, the number of decision variables used is estimated to be $|K|^2 + 2 \cdot (|K| \cdot |I|)$ and the number of constraints is estimated to be $3 \cdot (|K| \cdot |I|) + 3 \cdot |K| + 3$.
|
||||
Thus in the case of \emph{Madrid}, for a day in the worst-case scenario with $|K| = 155$ cells and a maximum car rank $|I| = 30$ encountered in the data, the solver has to find the values of $33\,325$ decision variables with the help of $14\,418$ constraints.
|
||||
For \emph{Paris} with $209$ cells and a maximum observed rank of $51$, in the worst-case scenario the solver had to find the values of $64\,999$ decision variables with the help of $32\,607$ constraints.
|
||||
Finally, for \emph{Washington} with $411$ cells and a maximum observed rank of $26$, the worst-case scenario implies the use of $190\,293$ decision variables and $33\,294$ constraints.
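These estimates follow directly from the formulation and can be checked with a few lines.
\begin{verbatim}
def problem_size(n_cells, max_rank):
    """Decision variables and constraints of the jockey-based ILP,
    following the estimates given above."""
    variables = n_cells ** 2 + 2 * n_cells * max_rank
    constraints = 3 * n_cells * max_rank + 3 * n_cells + 3
    return variables, constraints

for city, k, i in [("Madrid", 155, 30), ("Paris", 209, 51), ("Washington", 411, 26)]:
    print(city, problem_size(k, i))
# Madrid (33325, 14418), Paris (64999, 32607), Washington (190293, 33294)
\end{verbatim}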
|
||||
|
||||
\begin{table}[!b]
|
||||
\centering
|
||||
\small
|
||||
\begin{tabular}{|l|l|l|l|}
|
||||
\hline
|
||||
\multirow{5}{*}{Madrid} & Strategy & Revenue & Nb Relocation \\ \cline{2-4}
|
||||
& Historical & 100 & N/A \\ \cline{2-4}
|
||||
& Greedy & 106 & 70 \\ \cline{2-4}
|
||||
& Optim DO & 105.9 & 70 \\ \cline{2-4}
|
||||
& Optim SO & \textbf{106.8} & 70 \\ \hline
|
||||
\hline
|
||||
\multirow{5}{*}{Paris} & Strategy & Revenue & Nb Relocation \\ \cline{2-4}
|
||||
& Historical & 100 & N/A \\ \cline{2-4}
|
||||
& Greedy & 101.8 & 62 \\ \cline{2-4}
|
||||
& Optim DO & 88.4 & 70 \\ \cline{2-4}
|
||||
& Optim SO & \textbf{105.1} & 62 \\ \hline
|
||||
\hline
|
||||
\multirow{5}{*}{Washington} & Strategy & Revenue & Nb Relocation \\ \cline{2-4}
|
||||
& Historical & 100 & N/A \\ \cline{2-4}
|
||||
& Greedy & 99.6 & 70 \\ \cline{2-4}
|
||||
& Optim DO & 95.1 & 70 \\ \cline{2-4}
|
||||
& Optim SO & \textbf{102.9} & 70 \\ \hline
|
||||
\end{tabular}
|
||||
\caption{Daily average revenue and daily average number of relocations for the four relocation strategies for the three services.}
|
||||
\label{tab:ch3_optim_table}
|
||||
\end{table}
|
||||
|
||||
\paragraph{Madrid.}
|
||||
|
||||
Table~\ref{tab:ch3_optim_table} shows that for this service, on average for each day all the optimization strategies increase the revenue of the company.
|
||||
On average for each day, the approaches \emph{Greedy} and \emph{Optim DO} show a potential increase of 6\% and 5.9\% respectively, while the proposed approach \emph{Optim SO} shows a potential increase of 6.8\%.
|
||||
Even if the difference between \emph{Greedy} and \emph{Optim SO} seems low, it should be noted that, as stated earlier, \emph{Madrid} is heavily used and a mere increase of 0.8\% makes a noticeable difference for the service operator.
|
||||
This heavy usage also explains why the three approaches have comparable performances: since the service is heavily used, the costs induced by employing jockeys to relocate the cars are negligible compared to the possible revenue increase.
|
||||
Besides, the three strategies use the jockeys to the maximum, meaning that it may even be possible to relocate more cars and still be profitable by employing more jockeys.
|
||||
|
||||
\begin{figure}[!t]
|
||||
\centering
|
||||
\includegraphics[width=1\textwidth]{figure/ch3_optim_madrid.jpeg}
|
||||
\caption{Daily revenue expected by \emph{Madrid} (one x-axis tick is a day, one plot per week of the test set), with the \emph{Historical} strategy (dashed blue line) as the base index. The red full line, the dashed yellow and dotted mauve lines are respectively the normalized revenues provided with \emph{Optim SO}, \emph{Optim DO} and \emph{Greedy} strategies.}
|
||||
\label{fig:ch3_optim_madrid}
|
||||
\end{figure}
|
||||
|
||||
For further details, Figure~\ref{fig:ch3_optim_madrid} shows four plots for the four weeks in the test set.
|
||||
As stated in Table~\ref{tab:ch3_optim_table}, the three approaches have almost similar performances.
|
||||
It can be noted that the approach \emph{Optim SO} manages to stay above an improvement of 5\% for more than 75\% of the days in the test set.
|
||||
Besides, that approach falls below the \emph{Historical} baseline on only one day (2018-09-22) of the four weeks in the test set.
|
||||
|
||||
\paragraph{Paris.}
|
||||
|
||||
The results for this service are shown in Table~\ref{tab:ch3_optim_table}.
|
||||
The average daily revenue should be increased by both the \emph{Greedy} and \emph{Optim SO} approaches, by 1.8\% and 5.1\% respectively, while the approach \emph{Optim DO} shows a potential daily average \emph{decrease} of 11.6\%.
|
||||
These results are consistent with the average daily number of relocations made: both \emph{Greedy} and \emph{Optim SO} are capable of choosing the right number of vehicles to relocate, whereas \emph{Optim DO} is not, by design.
|
||||
Indeed, it seeks to relocate as many cars as possible without checking whether each relocation is ``profitable'', i.e.\ whether the relocation costs less than the expected usage revenue for the next day.
|
||||
|
||||
\begin{figure}[!t]
|
||||
\centering
|
||||
\includegraphics[width=1\textwidth]{figure/ch3_optim_paris.jpeg}
|
||||
\caption{Daily revenue expected by \emph{Paris} (one x-axis tick is a day, one plot per week of the test set), with the \emph{Historical} strategy (dashed blue line) as the base index. The red full line, the dashed yellow and dotted mauve lines are respectively the normalized revenues provided with \emph{Optim SO}, \emph{Optim DO} and \emph{Greedy} strategies.}
|
||||
\label{fig:ch3_optim_paris}
|
||||
\end{figure}
|
||||
|
||||
Figure~\ref{fig:ch3_optim_paris} shows how the three approaches perform each day for the whole 5 weeks of the test set.
|
||||
One can notice that while the \emph{Greedy} strategy is consistently slightly better than the \emph{Historical} baseline, this is not the case for \emph{Optim DO}, which is heavily influenced by the car usage.
|
||||
Indeed, since \emph{Optim DO} seeks to relocate as many cars as possible before minimizing jockey costs, it is often relocating cars that are not profitable to move once the jockey repositioning is taken into account.
|
||||
Meanwhile, the approach \emph{Optim SO} is above the \emph{Historical} baseline most of the time (only three days have a lower revenue) and always above the \emph{Greedy} baseline.
|
||||
However, those results should be tempered by the low performance of the regression models in the case of \emph{Paris}: with more accurate predictions of the car utility, the three relocation strategies would be more accurately evaluated.
|
||||
|
||||
\begin{figure}[!t]
|
||||
\centering
|
||||
\includegraphics[width=1\textwidth]{figure/ch3_optim_washington.jpeg}
|
||||
\caption{Daily revenue expected by \emph{Washington} (one x-axis tick is a day, one plot per week of the test set), with the \emph{Historical} strategy (dashed blue line) as the base index. The red full line, the dashed yellow and dotted mauve lines are respectively the normalized revenues provided with \emph{Optim SO}, \emph{Optim DO} and \emph{Greedy} strategies.}
|
||||
\label{fig:ch3_optim_washington}
|
||||
\end{figure}
|
||||
|
||||
\paragraph{Washington.}
|
||||
|
||||
The last part of Table~\ref{tab:ch3_optim_table} shows the average daily revenue and number of relocations of the three approaches evaluated.
|
||||
Unlike the two previous services, \emph{Washington} is spread over a bigger area with its $411$ cells.
|
||||
With a utilization comparable to \emph{Paris} but a bigger surface to cover, the costs associated with the use of jockeys to relocate the cars are expected to be higher.
|
||||
This is confirmed by the average results in the table: with the \emph{Greedy} and \emph{Optim DO} approaches, it does not seem possible to beat the historical baseline.
|
||||
Since the strategy \emph{Greedy} does not optimize the costs induced by repositioning the jockeys, its average performance stays around the baseline.
|
||||
As for \emph{Optim DO}, since the cars are selected before any cost is taken into account, some cars with a very low predicted utility are moved to high-utility empty spots, which seems profitable.
But once the repositioning costs are taken into account, this apparent profitability vanishes.
|
||||
However our proposed strategy \emph{Optim SO} should give a daily average increase of 2.9\% of the revenue during the 4 weeks of the test set.
|
||||
While the three approaches relocate as many cars as permitted, \emph{Optim SO} is capable of selecting the best cars to relocate compared to \emph{Optim DO} and to better optimize the jockey repositioning compared to \emph{Greedy}.
|
||||
|
||||
Figure~\ref{fig:ch3_optim_washington} gives a precise view of the three strategies performance.
|
||||
While \emph{Optim DO} seems capable of beating the historical baseline only once (2020-01-09), our approach \emph{Optim SO} seems to yield a lower revenue than the baseline on seven days of the four weeks of the test set.
|
||||
As in the case of \emph{Paris}, those results should be tempered by the poor prediction performance of the regression models.
|
||||
|
||||
\FloatBarrier
|
||||
\section{Conclusion}
|
||||
In this chapter, a methodology has been proposed to optimize both the placement of the fleet during the night and the trips jockeys need to take to go from a relocated car to another one.
|
||||
This methodology is based on two steps: first, it predicts the future utilization of cars in every part of the city; second, it optimizes the fleet placement.
|
||||
Two experiments have been made to assess the performance of the methodology.
|
||||
|
||||
The first experiments have evaluated the error rate of regression models for the three case studies.
|
||||
In each case, the use of a \emph{Gradient Boosting Tree Regressor} led to a lower error rate than the use of the historical mean value as the utility prediction.
|
||||
For both \emph{Paris} and \emph{Washington}, this regression model has a high error rate ($\geq$ 70\%) when compared to the historical utility.
|
||||
This is explained by the high variability of the daily utility measure: it depends on the chain of trips made during the day by each car.
|
||||
Over a single day, the randomness associated with the car usage leads to different chains of trips even when the car is initially placed in the same location.
|
||||
|
||||
The second experiments have evaluated the daily utility expected with the placement optimized by the proposed methodology.
|
||||
In all cases, the best regression model has been used to predict the expected utility from placing cars in every location of the city.
|
||||
The experiments showed that for the three case studies, the proposed methodology increased the daily expected profit when compared to the historical one.
|
||||
The long-term effects of using the methodology have yet to be evaluated and are the subject of the next chapter.
|
||||
|
||||
|
||||
% The first approach both evaluated the performance of the daily utility prediction by the regression models and evaluated the gains that could be expected by the \emph{Integer Linear Programming} (\emph{ILP}) optimization model.
|
||||
% The evaluation shown that with all the available feature, for \emph{Madrid} in the best case the utility prediction error represented 33\% of the daily usage in each cell and for \emph{Paris} and \emph{Washington} this error rate was respectively 74\% and 87\%.
|
||||
% When compared to the baseline that predicts for each car the mean utility depending on the car rank, in the bast case the utilization of regression models made an improvements of 3\% for \emph{Madrid}, 12\% for \emph{Paris} and 34\% for \emph{Washington}.
|
||||
% For the three case studies, the utilization of such regression models to predict the daily utility of cars is justified.
|
||||
% Then experiments evaluated the solution proposed by the \emph{ILP} model against the historical utility performance of the service, such that the utility made by the historical placement was compared to the one that was expected from the placement found by the \emph{ILP} solver.
|
||||
% The results shown an improvement of 6.8\% in total daily utility for \emph{Madrid}, 5.1\% for \emph{Paris} and 2.9\% in the case of \emph{Washington}.
|
||||
|
|
\chapter{A/B Testing}
|
||||
\label{ch:ab_testing}
|
||||
|
||||
The opportunity to test in real life the theoretical models described in Chapters~\ref{ch:method} and~\ref{ch:simulation} has been granted by \emph{Free2Move}.
|
||||
The objective is to ensure that the developed methodology can be deployed and used in production by \emph{Free2Move}'s carsharing services and to assess its performance on a real free-floating carsharing system.
|
||||
This practical evaluation takes the form of an A/B Testing~\cite{trustworthy_2020_kohavi}.
|
||||
A baseline is applied for the period A and then the methodology is applied during the period B.
|
||||
For practical reasons linked to the \textit{Free2Move} service in Madrid, only a simplified version of the methodology presented in Chapter~\ref{ch:method} could be used.
|
||||
|
||||
First, the experimental setting is detailed, specifying which service is selected for the A/B Testing, the simplified version of the method, and the experimental conditions.
|
||||
Then, the statistical test used to assess whether period A or period B has a higher utilization of cars is recalled.
|
||||
Lastly, the results of the A/B Testing are presented.
|
||||
|
||||
\section{Experimental Setting}
|
||||
|
||||
\textit{Free2Move} agreed to run an A/B Testing in order to assess the performance of the proposed methodology on a real case in \emph{Madrid}.
|
||||
Figure~\ref{fig:ch5_madrid_area} shows the perimeter of the service (blue line) and all the hexagons of the grid representing the service's area.
|
||||
Note that it is not exactly the same perimeter as the one presented in Figure~\ref{fig:ch2_grid_madrid} and used during the experiments of Chapters~\ref{ch:method} and~\ref{ch:simulation}, which concern trips made four years earlier.
|
||||
|
||||
\begin{figure}[!ht]
|
||||
\centering
|
||||
\includegraphics[width=0.55\textwidth]{figure/ch5_madrid_area.jpeg}
|
||||
\caption[Free2Move Area Service 2022]{The area serviced by \textit{Free2Move}, the free-floating carsharing service in Madrid, at the time the A/B Testing was run. The grid represents all cells and the blue line represents the actual service perimeter.}
|
||||
\label{fig:ch5_madrid_area}
|
||||
\end{figure}
|
||||
|
||||
The method proposed in Chapter~\ref{ch:method} could not be used as is for two practical reasons.
|
||||
First, \textit{Free2Move} has no access to public electric charging stations in Madrid, so all the cars are recharged in a single central hub belonging to \textit{Free2Move}.
|
||||
Thus, for any optimal distribution of cars for the next morning, there is only one point from which the cars can be taken by the jockeys: the central charging hub.
|
||||
Second, the team in charge of moving and recharging cars in the city prioritizes the discharged electric cars of the service, which are not bookable and therefore not usable.
|
||||
Indeed, considering the number of discharged cars to be recharged every night, it is generally not possible to spare staff members to relocate cars usable by customers but placed in low demand areas.
|
||||
|
||||
Thus, no optimal relocation has been computed by the methodology to be tested.
|
||||
Only the ordered list of the \textquote{best positions} where to place the cars has been provided to \textit{Free2Move}, so that their jockeys know where to place the fully recharged electric cars leaving the central charging hub.
|
||||
The \textquote{best positions} are the positions that have the highest utility when a car is placed there; thus only the matrix of utility $U^i_k$ for all cells $k \in K$ and ranks $i \in I$ has been predicted, following the same process as described in Chapter~\ref{ch:method}.
|
||||
The ordered list returned by the simplified methodology is the list of couples $(k,i)$ sorted in decreasing order of their corresponding utility.
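A hedged sketch of how such an ordered list can be derived from a predicted utility matrix is given below (the utility values are made up for illustration).
\begin{verbatim}
import numpy as np

# Hypothetical predicted utility matrix: U[k, i] for cell k and rank i.
U = np.array([[90.0, 40.0, 10.0],
              [150.0, 120.0, 30.0],
              [60.0, 20.0, 5.0]])

# Ordered list of (cell, rank) couples, best position first.
order = sorted(((k, i) for k in range(U.shape[0]) for i in range(U.shape[1])),
               key=lambda ki: U[ki], reverse=True)
print(order[:4])  # [(1, 0), (1, 1), (0, 0), (2, 0)]
\end{verbatim}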
|
||||
|
||||
It should be noted that \emph{this approach is not equivalent to the Greedy baseline} detailed in Algorithm~\ref{alg:ch3_greedy_algorithm} (from Chapter~\ref{ch:method}).
|
||||
Indeed, in this \emph{Greedy} baseline, the algorithm seeks to relocate up to a fixed number of cars while taking into account the costs of moving the jockeys with sweeper cars.
|
||||
In the version used for the A/B Testing, the locations to be filled with charged cars are ordered, i.e. the methodology does not expect all returned locations to be filled with cars.
|
||||
The staff is not expected to fill the locations perfectly from the most lucrative to the least lucrative one, letting jockeys take practical necessities into consideration.
|
||||
|
||||
For the A/B Testing two \textquote{A} periods have been provided by \textit{Free2Move}.
|
||||
The first one covers the $28$ days of February 2022; this period will be called \textquote{A1} from now on.
|
||||
The second \textquote{A} period covers the $30$ days of April 2022; this period will be called \textquote{A2} from now on.
|
||||
Finally, the period during which the test will be conducted is called \textit{B} and represents the $31$ days of March 2022.
|
||||
|
||||
In order to evaluate the impact of this simplified method on the service, the number of trips made each day and the daily utility have been recorded for all periods of the A/B Testing.
|
||||
The aim is to compare the mean number of trips per day and the mean utility per day between the periods A and the period B.
|
||||
If the comparison of means between \textquote{A1} and \textquote{B} shows that both means are similar, then it means that the distribution of cars placed following the simplified methodology gave a similar daily customer usage of the fleet.
|
||||
The comparison between \textquote{A2} and \textquote{B} is done to confirm (or not) the previous results.
|
||||
In both cases, the comparison of the mean daily number of trips and the mean daily utility between an \textquote{A} period and the \textquote{B} period is done with the help of the \emph{Test of Homogeneity}.
|
||||
|
||||
|
||||
\section{Homogeneity Tests}
|
||||
|
||||
% Put commands for special numbers here
|
||||
\newcommandx{\meanA}{\bar{x}_A}
|
||||
\newcommandx{\meanB}{\bar{x}_B}
|
||||
\newcommandx{\varA}{\hat{\sigma}^2_A}
|
||||
\newcommandx{\varB}{\hat{\sigma}^2_B}
|
||||
\newcommandx{\stdD}{\hat{\sigma}_D}
|
||||
\newcommandx{\tboundary}{t_{\alpha}^v}
|
||||
|
||||
The differences between the mean scores from \textquote{A} and \textquote{B} periods could be due to a random occurrence, for example because not enough observations have been made on a phenomenon having a high variance.
|
||||
It could also be explained by a real difference in how the service performed, which led to different outcomes.
|
||||
For the next part, the sets of scores from an observed period are called samples.
|
||||
For example, Figure~\ref{fig:ch5_distribution_example} shows two examples of Gaussian distributions with their empirical mean and variance deduced from two mock-up samples.
|
||||
The aim is to have some evidence to either keep or reject a \emph{null hypothesis} statistically, e.g.\ to find whether the mean of a sample is significantly lower than the mean of the other sample.
|
||||
In both cases (left and right) of the figure, the difference between the means of the \textit{red} and \textit{blue} distribution is identical.
|
||||
However, even to a non-expert eye, the inferiority of the \textit{red} distribution mean compared to the \emph{blue} one seems \emph{more significant} in the left case than in the right case.
|
||||
To quantify the significance of the difference between the means of two samples, statistical tests such as \textit{Welch's t-Test}, which is similar to \textit{Student's t-Test}, are used.
|
||||
|
||||
\begin{figure}[!ht]
|
||||
\centering
|
||||
\includegraphics[width=1\textwidth]{figure/ch5_distribution_example.jpeg}
|
||||
\caption[Probability Density Function Example]{Two examples of empirical distributions to compare, with their \textit{Probability Density Function} (\textit{PDF}) deduced from mock-up observations. On the left, the difference between the \textit{PDF} of the \textit{red} distribution's observations and the \textit{blue} distribution's observations is noticeable. However, on the right, the difference between the two distributions is less clear, mainly due to the increase of the variance in the set of observations.}
|
||||
\label{fig:ch5_distribution_example}
|
||||
\end{figure}
|
||||
|
||||
\textit{Welch's t-Test} is used when the two samples follow normal distributions, possibly with unequal sizes and unequal (or unknown) variances.
|
||||
For this test, the null hypothesis is that the mean score of the sample from a period \textquote{A} is \emph{not significantly lower} than the mean score of the sample from a period \textquote{B}.
|
||||
Since the null hypothesis consists in considering that one mean score is not lower than the other mean score, the \textit{one-sided} version of \textit{Welch's t-Test} is used.
|
||||
|
||||
To either keep or reject this null hypothesis, a statistic called $t_D$ is computed with $\meanA$ the mean of sample A, $\meanB$ the mean of sample B and $\stdD$ the empirical standard deviation of the difference of the means (defined below):
|
||||
$$ t_D = \frac{\meanA - \meanB}{\stdD} $$
|
||||
|
||||
The value of the statistic $t_D$ depends on $\stdD$ defined by:
|
||||
$$ \stdD = \sqrt{\frac{\varA}{n_A} + \frac{\varB}{n_B}} $$
|
||||
with $\varA$ the empirical variance of sample A, $\varB$ the empirical variance of sample B, $n_A$ the size of the sample A and $n_B$ the size of the sample B.
|
||||
|
||||
According to the statistic $t_D$, it is possible to compute the confidence of rejecting the null hypothesis.
|
||||
This is done by computing the cumulative distribution function value on $t_D$ of the \emph{Student t-law} with $v$ degrees of freedom.
|
||||
This computed value is called the p-value (or $\alpha$).
|
||||
In this case, the lower the p-value is, the more safely the null hypothesis can be rejected.
|
||||
Since \textit{Welch's t-Test} is used, the degrees of freedom $v$ have to be found.
|
||||
It is defined as:
|
||||
$$v = \frac{\left[ \frac{\varA}{n_A} + \frac{\varB}{n_B} \right]^2}
|
||||
{\frac{\left( \varA / n_A \right)^2}{n_A - 1} + \frac{\left( \varB / n_B \right)^2}{n_B - 1}}$$
|
||||
with $\varA$ the empirical variance of sample A, $\varB$ the empirical variance of sample B, $n_A$ the size of the sample A and $n_B$ the size of the sample B.
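The statistic, the degrees of freedom and the one-sided p-value above translate directly into a few lines of Python; the sketch below uses SciPy's Student-$t$ distribution and mock-up samples (the numbers are illustrative, not the real service data).
\begin{verbatim}
import numpy as np
from scipy import stats

def welch_one_sided(sample_a, sample_b):
    """One-sided Welch's t-Test: p-value for H0 'mean(A) is not lower than mean(B)'."""
    a, b = np.asarray(sample_a, float), np.asarray(sample_b, float)
    na, nb = len(a), len(b)
    va, vb = a.var(ddof=1), b.var(ddof=1)          # empirical variances
    sd = np.sqrt(va / na + vb / nb)                # sigma_D
    t_d = (a.mean() - b.mean()) / sd               # t_D
    v = (va / na + vb / nb) ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return stats.t.cdf(t_d, df=v)                  # low p-value: reject H0

# Mock-up daily utility samples for two periods (illustrative only).
rng = np.random.default_rng(1)
period_a = rng.normal(55_000, 12_000, 28)
period_b = rng.normal(61_500, 14_000, 31)
print(welch_one_sided(period_a, period_b))
\end{verbatim}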
|
||||
|
||||
From the mock-up in Figure~\ref{fig:ch5_distribution_example}, for the two distributions in the two sub-figures the null hypothesis is that the mean of the \emph{red sample} distribution is not significantly lower than the mean of the \emph{blue sample} distribution.
|
||||
For the sub-figure on the left, the null hypothesis can be safely rejected with a p-value $\leq 1\%$, i.e.\ it is almost certain that the mean of the \emph{red sample} is lower than the mean of the \emph{blue sample}.
|
||||
However for the sub-figure on the right, the same null hypothesis cannot be safely rejected since the p-value of the \emph{Welch t-Test} is 12 \%.
|
||||
|
||||
\section{Results}
|
||||
|
||||
The objective of this section is to determine whether there are good reasons to validate the use of the simplified methodology in the \emph{Madrid} service.
|
||||
To do so, a \emph{Welch t-Test} has been performed between the samples from period \textquote{A1} and period \textquote{B}, as well as between the samples from periods \textquote{A2} and \textquote{B}.
|
||||
Two main indicators are studied: the daily total utilization of the cars (in minutes), also called \emph{utility} in the following experiments, and the daily number of customer trips, also called \emph{\#trips} during the experiments.
|
||||
The objective is to find out whether there is a difference between each \textquote{A} period and the \textquote{B} period and if this difference is due to the usage of the methodology.
|
||||
More precisely, the aim is first to determine whether the usage of the service increased, decreased or did not change at all.
|
||||
Then, if the usage changed when the simplified methodology was used, other indicators are computed to know whether the staff followed the propositions made by the methodology or not.
|
||||
If the usage increased or decreased and the staff followed the methodology's propositions as much as possible, then the methodology had respectively a positive or negative impact on the utilization of the service.
|
||||
|
||||
\paragraph{Welch t-Test on the \emph{utility}.}
|
||||
|
||||
\begin{figure}[!t]
|
||||
\centering
|
||||
\includegraphics[width=1\textwidth]{figure/ch5_abtest_utility.jpeg}
|
||||
\caption[Daily Utility Gaussian Distribution]{Comparison of the \textit{Probability Density Function} (\textit{PDF}) of the daily \emph{utility} distribution for three months: February, March and April 2022. On the left is the comparison between the \textit{PDF} of February (blue) and March (green), on the right the comparison is made between April (red) and March (green). The vertical colored lines visualize the mean daily utility for each month.}
|
||||
\label{fig:ch5_abtest_utility}
|
||||
\end{figure}
|
||||
|
||||
For each day of February, March and April 2022, the utilization of the cars is summed up to create one data point.
|
||||
For each month, a sample is created with the daily data points belonging to that month, and its empirical mean and standard deviation are deduced from the sample.
|
||||
Figure~\ref{fig:ch5_abtest_utility} shows the distribution of the samples from the three periods according to their empirical mean and standard deviation, with on the left sub-figure the comparison between February and March, and on the right sub-figure the comparison between April and March.
|
||||
In the first case, the difference between the mean utility of February and March is visible: the average (and standard deviation) utilization of cars is $55\,886 \pm 12\,253$ minutes/day in February when it is $61\,630 \pm 14\,158$ minutes/day in March.
|
||||
In this case, the null hypothesis that the mean utility \emph{is not lower} in February than in March can be rejected safely (p = 5\%).
|
||||
In the second case, the difference of mean utility between April and March is less clear: the average (and standard deviation) utility is $62\,241 \pm 12\,884$ minutes/day in April while it is $61\,630 \pm 14\,158$ minutes/day in March.
|
||||
For this case, the null hypothesis cannot be rejected safely (p = 57\%).
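The p-values above were obtained by applying the procedure described earlier to the daily utility samples; the Python sketch below only illustrates the kind of aggregation and one-sided Welch test used (\texttt{pandas} and \texttt{scipy} are assumed to be available, and the file and column names are hypothetical, not those of \emph{Free2Move}'s internal data).
\begin{verbatim}
import pandas as pd
from scipy import stats

# Hypothetical trip log: one row per customer trip with its duration in minutes.
trips = pd.read_csv("madrid_trips_2022.csv", parse_dates=["start_time"])

# One data point per day: the summed utilization of the cars.
daily_utility = (trips.assign(day=trips["start_time"].dt.date)
                      .groupby("day")["duration_min"].sum())

idx = pd.to_datetime(daily_utility.index)
february = daily_utility[idx.month == 2]
march = daily_utility[idx.month == 3]

# H0: the mean daily utility in February is not lower than in March.
result = stats.ttest_ind(february, march, equal_var=False, alternative="less")
print(february.mean(), march.mean(), result.pvalue)
\end{verbatim}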
|
||||
Overall, the simplified version of the methodology proposed in Chapter~\ref{ch:method} might have improved the daily utility by 10\% when March is compared to February, but without any noticeable difference between April and March.
|
||||
According to the operational team monitoring the service in \emph{Madrid}, seasonal effects on the utility could be observed in previous years (except for 2020 because of COVID-19).
|
||||
Notably each year, the months between January and June see a regular increase in usage, with June being the peak.
|
||||
Had the simplified methodology not been used, the usage of the service in March might have been lower than the one observed.
|
||||
|
||||
\paragraph{Welch t-Test on the \emph{\#trips}.}
|
||||
|
||||
\begin{figure}[!b]
|
||||
\centering
|
||||
\includegraphics[width=1\textwidth]{figure/ch5_abtest_trips.jpeg}
|
||||
\caption[Daily Number Trip Gaussian Distribution]{Comparison of the \textit{Probability Density Function} (\textit{PDF}) of the daily number of trips (\emph{\#trips}) distribution for three months: February, March and April 2022. On the left is the comparison between the \textit{PDF} of February (blue) and March (green), on the right the comparison is made between April (red) and March (green). The vertical colored lines visualize the mean number of daily trips for each month.}
|
||||
\label{fig:ch5_abtest_trips}
|
||||
\end{figure}
|
||||
|
||||
As for the utility, the daily numbers of trips for February, March and April are gathered into three samples, one per month.
|
||||
For each sample, the empirical mean and standard deviation are deduced and used to represent Gaussian distributions fitted on those parameters in Figure~\ref{fig:ch5_abtest_trips}.
|
||||
As for the previous study, the sub-figure on the left represents the fitted Gaussian distribution for February and March: the mean daily \emph{\#trips} (and its standard deviation) in February is $1\,124 \pm 131$ while the mean daily \emph{\#trips} in March is $1\,197 \pm 136$.
|
||||
In this case, the null hypothesis that the mean daily \emph{\#trips} in February \emph{is not less} than the mean daily \emph{\#trips} in March can be safely rejected (p = 2 \%).
|
||||
In the second case, i.e the comparison between April and March for the same indicator on the right sub-figure, the mean daily \emph{\#trips} in April is $1\,115 \pm 235$ while it is $1\,197 \pm 136$ in March.
|
||||
As for the comparison between February and March, the null hypothesis that the mean daily \emph{\#trips} in April \emph{is not less} than the mean daily \emph{\#trips} in March can be rejected, with an associated p-value of 6\%.
|
||||
One can notice that even if the empirical mean daily \emph{\#trips} in April is ``lower'' than the same indicator for February, the higher empirical standard deviation in April is the reason why the p-value of the \emph{Welch t-Test} is slightly higher.
|
||||
Overall, the simplified version of the methodology might have improved the daily number of customer trips in March by respectively 6\% and 7\% when compared to February and April.
|
||||
However, those first conclusions are tied to the effectiveness of the relocation team in \emph{Madrid}: if the list of locations where cars should be placed was not respected, the variations observed would be due to sheer luck.
|
||||
|
||||
\paragraph{Staff compliance with the methodology.}
|
||||
|
||||
The tested methodology proposes a list of car locations, ordered by decreasing utility value.
|
||||
If a location is already occupied by a car it is removed from this list.
|
||||
In theory, a relocation made by the staff should end in the highest-ranked unoccupied location in this list.
|
||||
However, for practical reasons, this might not always be true: when a staff member has to place a fully charged car, it is often placed near a discharged one so that the discharged car can be taken directly back to the central charging hub.
|
||||
Several cars can then be placed in ``suboptimal'' locations for the sake of keeping as many cars available for customers as possible.
|
||||
Hence additional measures have to be gathered on the fleet placement to assess the compliance of the staff with the suggestions of the methodology.
|
||||
This helps to explain if the increase in the mean daily number of trips and utility in March is due to the methodology or to sheer luck.
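The selection rule described above can be summarized by the following minimal Python sketch (the function and variable names are hypothetical, not those of the production system).
\begin{verbatim}
def next_relocation_target(ranked_locations, occupied_locations):
    """Return the highest-ranked location not already occupied by a car.

    ranked_locations: location ids ordered by decreasing predicted utility.
    occupied_locations: set of location ids currently holding a car.
    """
    for location in ranked_locations:
        if location not in occupied_locations:
            return location
    return None  # every proposed location is already occupied
\end{verbatim}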
|
||||
|
||||
\begin{figure}[!t]
|
||||
\centering
|
||||
\includegraphics[width=1\textwidth]{figure/ch5_abtest_optimcars.jpeg}
|
||||
\caption[Summary Car Placement Staff]{Summary of the relocation decisions made by the staff, either by following or disregarding the proposed car location. The left sub-figure shows both the daily number of cars relocated in a proposed location (in blue \emph{Optimal Cars}) and the daily number of cars not relocated in a proposed location (in red \emph{Suboptimal Cars}). On the right sub-figure, in green, is the daily ratio of cars placed in the proposed locations (\emph{Optimal Cars}) over the total number of relocations made by the staff during the day. For both figures, the x-axis represents the days, such that each tick is a Sunday.}
|
||||
\label{fig:ch5_abtest_optimcars}
|
||||
\end{figure}
|
||||
|
||||
Knowing that the methodology suggested an ``optimal'' location for each car of the fleet, the number of unoccupied locations suggested by the methodology has been analyzed for each day, i.e. from 10:00 p.m. to 9:59 p.m. the next day.
|
||||
Over a fleet of around 500 cars, this accounts for an average of 256 ($\pm$ 42) proposed locations where cars could be relocated.
|
||||
This implies that according to the ordered list of proposed locations, half of the fleet was either discharged or not in the best position.
|
||||
Out of those 256 average daily locations, a daily average of 82 ($\pm$ 15) relocations was made by the staff.
|
||||
Not all suggested relocations could be made, mainly because of the limited number of staff members available, but the relocation team focused on the highest-priority spots.
|
||||
It should also be noted that two kinds of relocations were made by the staff during this period: relocations to put fully charged cars back in service and relocations to better place other cars, which is what the proposed methodology originally aimed at.
|
||||
|
||||
Out of those 82 average daily relocations, a daily average of 53 ($\pm$ 14) relocations ended in suggested ``optimal'' places while only 29 ($\pm$ 9) relocations ended in spots not suggested by the algorithm.
|
||||
Hence, on average each day, 65\% ($\pm$ 10\%) of the relocations followed the suggestions of the methodology.
|
||||
Even if the compliance of the staff is not perfect, practical necessities and on-field decisions explain this score.
|
||||
Indeed, \emph{Free2Move} has only recently been able to know where the booking application was opened by customers.
|
||||
Hence, areas where the rental application was never opened were left out by the staff.
|
||||
Thus, the staff did not use the suggested locations when these were located in areas with no ``empirical demand''.
|
||||
Figure~\ref{fig:ch5_abtest_optimcars} gives daily details about the relocations being followed or disregarded by the staff.
|
||||
On the left sub-figure is the number of relocations depending on whether the end point is in a suggested spot or not.
|
||||
It shows that the number of relocations ending in suggested spots on Fridays and Saturdays is lower on average than during the rest of the week, while the relocations ending in other spots do not follow any particular pattern.
|
||||
One explanation might be that the staff has to focus more on discharged cars on Fridays and Saturdays, because of the increased usage on Thursdays and Fridays, and thus ``convenient'' relocations of recharged cars have been made near discharged ones in spots not suggested.
|
||||
On the right sub-figure is the ratio of relocations ending in suggested spots over the total number of relocations made.
|
||||
No clear pattern can be extracted, however one can observe that each day the ratio of cars being placed in suggested spots over all relocations is (almost) always between 60\% and 80\%.
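For reference, the daily counts and the compliance ratio plotted in Figure~\ref{fig:ch5_abtest_optimcars} can be produced with a simple aggregation, sketched below in Python with \texttt{pandas} (the relocation log, its file name and its column names are hypothetical).
\begin{verbatim}
import pandas as pd

# Hypothetical log: one row per relocation performed by the staff, with a flag
# telling whether the drop-off point was one of the proposed locations.
relocations = pd.read_csv("madrid_relocations.csv",
                          parse_dates=["dropoff_time"])

# The operational day runs from 10:00 p.m. to 9:59 p.m. the next day, so the
# timestamps are shifted by two hours before extracting the date.
relocations["op_day"] = (relocations["dropoff_time"]
                         + pd.Timedelta(hours=2)).dt.date

daily = relocations.groupby("op_day").agg(
    optimal=("in_proposed_location", "sum"),   # relocations in a proposed spot
    total=("in_proposed_location", "size"))
daily["suboptimal"] = daily["total"] - daily["optimal"]
daily["compliance_ratio"] = daily["optimal"] / daily["total"]
\end{verbatim}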
|
||||
|
||||
~\\
|
||||
|
||||
Overall, even if \emph{Free2Move}'s relocation team has not perfectly taken into account the methodology output to relocate the cars, the methodology impacted a significant part of the fleet positioning.
|
||||
It is not possible to formally prove that the increase in the daily average number of trips in March, compared to both February and April, is due to the methodology's output.
|
||||
However, the level of compliance from the staff and the increase in usage expected from the methodology are two arguments in favor of the hypothesis that this methodology slightly improved the service usage when used in the real service.
|
||||
|
|
@ -0,0 +1,145 @@
|
|||
\chapter*{Conclusion}
|
||||
\addcontentsline{toc}{chapter}{Conclusion}
|
||||
\chaptermark{Conclusion}
|
||||
|
||||
The objective of this thesis was to increase the utilization rate of a carsharing service by modifying its fleet distribution in a city during the night.
|
||||
The operator, \emph{Free2Move}, wanted a methodology that could be used in any of its own services, i.e. a methodology that should not be dependent on the specificity of a particular service.
|
||||
The operator can modify the placement of the vehicles thanks to specialized staff called \emph{jockeys} that are hired to relocate the cars during the night, to avoid the city's traffic during the day.
|
||||
We proposed a two-step methodology meeting this objective~\cite{martin_prediction_2021,martin_optimisation_2022}.
|
||||
|
||||
This methodology includes an \emph{Integer Linear Programming} model to be optimized such that decisions for the relocation of misplaced cars in the city could be automated.
|
||||
It decides the best placement of cars for the next morning and how jockeys should move between relocations so that they spend the least time not performing a relocation.
|
||||
This optimization model relies on the prediction of the car utilization the next day depending on its location and other exogenous parameters.
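To give an idea of what such a model looks like, the sketch below builds a deliberately toy assignment-style \emph{ILP} with the Python \texttt{PuLP} library (an assumption of this illustration; the actual formulation of Chapter~\ref{ch:method} also handles the jockey movements and many more constraints).
\begin{verbatim}
import pulp

# Toy instance: names and numbers are purely illustrative.
cars = ["car1", "car2", "car3"]
cells = ["cellA", "cellB", "cellC"]
predicted_utility = {"cellA": 120, "cellB": 90, "cellC": 60}  # minutes/day
relocation_cost = {(c, k): 10 for c in cars for k in cells}   # flat cost

model = pulp.LpProblem("toy_relocation", pulp.LpMaximize)
x = pulp.LpVariable.dicts("move", (cars, cells), cat="Binary")

# Each car goes to at most one cell, each cell receives at most one car.
for c in cars:
    model += pulp.lpSum(x[c][k] for k in cells) <= 1
for k in cells:
    model += pulp.lpSum(x[c][k] for c in cars) <= 1

# Maximize the predicted utility gained minus the relocation costs.
model += pulp.lpSum(x[c][k] * (predicted_utility[k] - relocation_cost[(c, k)])
                    for c in cars for k in cells)

model.solve(pulp.PULP_CBC_CMD(msg=False))
plan = [(c, k) for c in cars for k in cells if x[c][k].value() > 0.5]
\end{verbatim}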
|
||||
Three approaches were used to evaluate the methodology and show both its performance and its limitations.
|
||||
|
||||
\paragraph{Methodology Performance and Limitations.}
|
||||
The first approach evaluated both the performance of the daily utility prediction by the regression models and the gains that could be expected from the \emph{Integer Linear Programming} (\emph{ILP}) optimization model.
|
||||
The evaluation of the daily utility prediction model showed that in the case of \emph{Madrid} the models have an acceptable prediction error, while in the cases of \emph{Paris} and \emph{Washington} the lack of trip data hindered the models and led to a higher error.
|
||||
The best regression models found during these experiments were then used for the evaluation of the \emph{ILP} model.
|
||||
The evaluation of the solutions found by the \emph{ILP} solver against the historical service utilization showed that the relocation model actually improves the service utilization in the three case studies.
|
||||
The daily profit was higher for the three services when the solution of the \emph{ILP} model is used than in the historical case.
|
||||
|
||||
These case studies revealed a limitation of this methodology.
|
||||
The prediction of the utility is dependent on the overall activity in the service, i.e. the prediction of the daily car utility is easier when the activity in the service is higher.
|
||||
The difference of activity between \emph{Madrid} and \emph{Paris} is correlated with the difference between the error rate in the utility prediction.
|
||||
This limitation prevents \textquote{small} services from using the methodology.
|
||||
|
||||
The second approach evaluated how the methodology would modify the utilization rate of the service in the long term.
|
||||
It used a simulation model to emulate the behavior of a carsharing service's fleet.
|
||||
Depending on their locations, the vehicles would be more or less used by the customers, which allows evaluating the effect of a relocation strategy.
|
||||
Three experiments have been made with the simulation model.
|
||||
Overall, these experiments allowed several conclusions to be drawn on how the methodology would perform in the long term.
|
||||
First the methodology can be used for services with a usage high enough to support the cost of relocations.
|
||||
In the case of \emph{Paris} the relocations are often not profitable to make because of the low utilization of the service.
|
||||
This shows that in the method, limiting the optimization to the trips made by jockeys between relocations misleads the whole relocation strategy.
|
||||
In that case the \emph{ILP} solver returns a solution that includes cars that should be moved from the point of view of the solver.
|
||||
But in reality, the additional cost of relocating the car, which is not included in the \emph{ILP} formulation, would make the relocation unprofitable and unnecessary.
|
||||
This highlighted the need to take into account the whole path of each jockey when minimizing the costs associated with the relocations.
|
||||
Additionally, the experiments showed that customer-based relocations are always an effective alternative to operator-based relocations.
|
||||
However, this conclusion is due to the simulation being imperfect: the simulated cars need neither maintenance nor refueling/recharging, which makes the staff completely optional in the simulation when it is not the case in reality.
|
||||
|
||||
The third approach evaluated a simplified version of the methodology, but directly in the real carsharing service of \emph{Madrid} with the help of \emph{Free2Move}.
|
||||
The objective was to assess if a simplified methodology could be implemented and used in a production environment to automate the placement of fully recharged electric cars.
|
||||
This consisted in an \textquote{A/B Testing}, with February 2022 and April 2022 as periods A and March 2022 as period B.
|
||||
The service worked as usual during the A periods, and during period B \emph{Free2Move}'s team used the simplified methodology proposed in this manuscript.
|
||||
The results of this A/B Testing showed a limited effect of the simplified methodology on the service.
|
||||
A first limitation appeared even before the A/B Testing itself, as the methodology had to be simplified: it did not support the recharging of electric vehicles.
|
||||
The methodology did not include the recharge of electric vehicles because of the lack of information about the state of charge of electric vehicles in the trip dataset when the methodology was designed.
|
||||
The A/B Testing also highlighted that the solutions' performance is dependent on the execution of the solution by the operational team.
|
||||
Indeed, the staff might have preferences that should be taken into account by the \emph{ILP} model, or the model might return solutions that are feasible but not accepted by the staff; for instance, enough room should be left for jockeys to \textquote{improvise} if a car cannot be placed in an area because no parking spot is found.
|
||||
|
||||
% \paragraph{Approach Limitations.}
|
||||
% Overall, the proposed methodology shown promising results during the experiments, but several limitations have been also exposed.
|
||||
% First, the sparsity of the carsharing trips data, because of a low usage, highlighted the need to have sufficient data in order to train the utility prediction models.
|
||||
% Indeed, in both \emph{Paris} and \emph{Washington} the high error rates when the utility has to be predicted is mostly due to the sparsity in the trip data as well as the high variability is carsharing usage in those cities.
|
||||
% The variability in the demand is also due to \textquote{exceptional} events such as road closure, public transport strikes or large cultural events that influence the demand for carsharing.
|
||||
% However the records of such data were not available for the training of the utility prediction model.
|
||||
% Second, the \emph{Integer Linear Programming} optimization model of the two-step methodology only provides the places where cars should be removed or added as well as the trips jockeys needs to do between two relocations.
|
||||
% However, it does not give the complete and optimal path a jockey has to take to both relocate cars and move between each relocation.
|
||||
% In Chapter~\ref{ch:simulation}, the test of this methodology required an additional step to take the bits of jockey trips and assemble them, through a greedy algorithm, to create the path each jockey has to take in order to relocate the cars.
|
||||
% The use of a greedy algorithm does not provide an assurance that all the paths are minimizing the time spent by all jockeys doing the whole relocation process.
|
||||
|
||||
\paragraph{Perspectives.}
|
||||
The perspectives to improve the method fall into two categories: the utility prediction and the optimization model.
|
||||
In the first category, the precision of the utility prediction could be enhanced by using additional data on external factors like \textquote{exceptional} events, e.g. road closure, public transport strikes, or large cultural events that influence the demand for carsharing.
|
||||
Additionally, if the services were more used and the staff could relocate the cars during the day, the utility could be predicted on an hourly basis.
|
||||
During the thesis, an attempt to predict the car usage on an hourly basis has been made for a service named \emph{Multicity} located in Berlin.
|
||||
This approach modeled the city as a graph, where each cell is a node inside the graph and each trip is an arc between nodes.
|
||||
By creating a graph for each hour of the dataset, the resulting ensemble of graphs is ordered to create a dynamic graph, i.e. a graph with nodes and/or arcs changing as time passes.
|
||||
The objective was to model the usage as a dynamic graph to then predict the arcs of the future graph, i.e. the \emph{link prediction in dynamic graph} problem.
|
||||
By predicting the arcs, the number of trips between each cell could be predicted on an hourly basis.
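As an illustration of this construction, the Python sketch below builds one directed graph per hour from the trip records using \texttt{pandas} and \texttt{networkx} (both assumed available; the file and column names are hypothetical).
\begin{verbatim}
import pandas as pd
import networkx as nx

# Hypothetical trip records: origin cell, destination cell, start time.
trips = pd.read_csv("berlin_trips.csv", parse_dates=["start_time"])
trips["hour"] = trips["start_time"].dt.floor("h")

dynamic_graph = []  # ordered list of hourly snapshots
for hour, hourly_trips in trips.groupby("hour"):
    snapshot = nx.DiGraph(hour=hour)
    counts = hourly_trips.groupby(["origin_cell", "dest_cell"]).size()
    for (origin, destination), n_trips in counts.items():
        # Arc weight: number of trips between the two cells during that hour.
        snapshot.add_edge(origin, destination, weight=n_trips)
    dynamic_graph.append(snapshot)
\end{verbatim}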
|
||||
However the low usage of the service in \emph{Berlin} made sparse graphs whose arcs could not be predicted accurately.
|
||||
Since the usage in \emph{Madrid} is higher, this approach could be explored.
|
||||
Using the simulation with a higher count of vehicles could also artificially increase the number of trips and provide denser dynamic graphs to test this approach.
|
||||
|
||||
In the second category, multiple practical improvements for \emph{Free2Move} could be made if required by the operator.
|
||||
The optimization model is flexible enough to allow the integration of new information or new constraints based on future needs.
|
||||
For example, the model could introduce the charging of electric vehicles such that, on top of relocating misplaced cars, the jockeys could also recharge the vehicles and optimally put the recharged cars back in service.
|
||||
Additionally, the methodology could take into account heterogeneous fleets, i.e. fleets with different types of vehicles.
|
||||
Indeed, different types of vehicles can have a different usage pattern, e.g. sedans used for commute trips and utility vehicles that are used to move large furniture.
|
||||
The vehicle type could also be used by the optimization model so that types of cars can be separated and subjected to different constraints, e.g. not putting more than 10 sedans and no more than 5 utility vehicles in a single cell.
|
||||
Both the recharging and heterogeneous-fleet improvements made to the optimization model would also improve the prediction of the utility, since the usage of a vehicle is linked to its state of charge and its type.
|
||||
Moreover, the optimization should avoid deciding relocations by taking into account only half of the path traveled by the jockeys during the relocation of the fleet.
|
||||
As seen in the experiments, this leads to unprofitable relocation trips being made.
|
||||
Instead of optimizing the fleet placement only by taking into account half of the jockey path, the optimization should be done on both the locations where cars should be taken and dropped off and the paths each jockey travels to make the relocations.
|
||||
|
||||
% Under the hypothesis that corresponding data can be gathered for the training of the utility prediction model, two practical improvements to the methodology can be done in a short-term future.
|
||||
% The lack of information about the state of charge for electric vehicles in the provided trip datasets has hindered the utility prediction models: this information is a determining factor in the customer usage as low battery is deterrent for customers.
|
||||
% A first modification of the model would consist in putting the battery's state of charge of electric vehicles in the utility prediction models as a feature.
|
||||
% Then, the \emph{Integer Linear Programming} optimization model could introduce the recharge of electric vehicles such that on top of relocating misplaced cars, the jockey could also recharge and optimally put back in service the recharged cars.
|
||||
% Second, the methodology is not capable of taking into account several types of vehicles that can have a usage pattern different too, e.g. sedans used for commute trips and utility vehicles that are used to move large furniture.
|
||||
% The introduction of different types of vehicles in the utility prediction model would increase its precision.
|
||||
% It could also be used by the optimization model so that types of cars can be separated and be subjected to different constraints, e.g. not putting more than 10 sedans and not more than 5 utility vehicles in a single cell.
|
||||
|
||||
\begin{figure}[!b]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.70\textwidth]{figure/ch6_vrp.jpg}}
|
||||
\caption[VRP Example]{Instance of the \emph{Vehicle Routing Problem} for a logistic company: \emph{three trucks} have to deliver goods from \emph{a central depot} to \emph{all shops}. The sum of the distances traveled by all trucks has to be minimized.}
|
||||
\label{fig:ch6_vrp}
|
||||
\end{figure}
|
||||
|
||||
\paragraph{Vehicle Routing for the Car Relocations.}
|
||||
For the optimization of both the car placement and the jockey routes, techniques from the \emph{Vehicle Routing Problem} field can be used, at least to minimize the total route traveled by the jockeys.
|
||||
The \emph{Vehicle Routing Problem} (\emph{VRP}) is a generalization of the \emph{Traveling Salesman Problem} (\emph{TSP}).
|
||||
In the \emph{TSP} a \emph{single} agent has to minimize its trip while visiting each point, but in the \emph{VRP} this problem is generalized to any number of agents such that the sum of the distances traveled by each agent is minimized while all the points must be visited by at least one agent.
|
||||
An example of \emph{VRP} is shown in Figure~\ref{fig:ch6_vrp}: in this case, a logistics company has to deliver goods to several shops in a city and has three trucks to do so.
|
||||
Since all shops have to be served by the logistics company, it could seek to minimize the total distance traveled by the trucks in order to minimize the fuel required for the deliveries.
|
||||
|
||||
In the case of carsharing services, the application of the \emph{VRP} to the relocation problem is not straightforward.
|
||||
In particular, if a jockey is an agent and a place to visit is either a location holding a car to remove or a location expecting a car, then each jockey has to respect an order when visiting the locations.
|
||||
If the complete route of a jockey makes him visit one location where a car has to be picked up and then two locations where a car is expected, it means the jockey will be able to place a car in the first of these locations but not in the second.
|
||||
The order of places visited by each jockey matters.
|
||||
One solution would be to consider the \emph{Pickup \& Delivery Problem}, which is a particular case of \emph{VRP}: additional constraints are added such that when an agent visits a location, this agent is constrained to visit another given location directly after.
|
||||
Those constraints could be created directly from the optimal relocations computed in the second step of the proposed methodology.
|
||||
This would represent an optimized alternative to the greedy jockey route creation detailed in Chapter~\ref{ch:simulation}.
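As an illustration of this idea, the Python sketch below encodes a tiny pickup-and-delivery instance with Google's \texttt{ortools} routing library (an assumption of this illustration, not a tool used in this thesis): each pickup node (a car to remove) must be served before its paired delivery node (a location expecting a car), by the same jockey.
\begin{verbatim}
from ortools.constraint_solver import pywrapcp, routing_enums_pb2

# Toy symmetric travel-time matrix (minutes); node 0 is the depot.
dist = [[0, 9, 7, 8, 6],
        [9, 0, 4, 5, 7],
        [7, 4, 0, 3, 5],
        [8, 5, 3, 0, 4],
        [6, 7, 5, 4, 0]]
pairs = [(1, 2), (3, 4)]  # (car to remove, location expecting a car)
num_jockeys = 2

manager = pywrapcp.RoutingIndexManager(len(dist), num_jockeys, 0)
routing = pywrapcp.RoutingModel(manager)

def cost(from_index, to_index):
    return dist[manager.IndexToNode(from_index)][manager.IndexToNode(to_index)]

transit = routing.RegisterTransitCallback(cost)
routing.SetArcCostEvaluatorOfAllVehicles(transit)
routing.AddDimension(transit, 0, 1000, True, "Time")
time_dim = routing.GetDimensionOrDie("Time")

for pickup, delivery in pairs:
    p, d = manager.NodeToIndex(pickup), manager.NodeToIndex(delivery)
    routing.AddPickupAndDelivery(p, d)
    # Same jockey for the pair, and the pickup happens before the delivery.
    routing.solver().Add(routing.VehicleVar(p) == routing.VehicleVar(d))
    routing.solver().Add(time_dim.CumulVar(p) <= time_dim.CumulVar(d))

params = pywrapcp.DefaultRoutingSearchParameters()
params.first_solution_strategy = (
    routing_enums_pb2.FirstSolutionStrategy.PARALLEL_CHEAPEST_INSERTION)
solution = routing.SolveWithParameters(params)
\end{verbatim}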
|
||||
|
||||
However this solution would not leverage the fact that it is not mandatory to relocate a car in a carsharing service if it is not profitable to do so.
|
||||
One could seek to model a \emph{VRP} with the ability to optimize \emph{jointly} which cars should be removed from their locations, which unoccupied locations should receive a car and the complete route of all jockeys.
|
||||
|
||||
% or want to optimize the route of jockeys to relocate those cars. In the latter case, there also exist a more specialized field about the \textit{Vehicle Routng Problem}~\cite{goel_vehicle_2017}, which is about finding the shortest path for a vehicle that should pass through given locations.
|
||||
|
||||
|
||||
\toignore{
|
||||
Ccl:
|
||||
Reprendre chapitre par chapitre (Résumé des épisodes précédent)
|
||||
Il faut rester assez haut niveau pour la conclusion.
|
||||
Puis faire une conclusion générale sur l'ensemble du travail.
|
||||
|
||||
Expliquer les limitation des approches qui sont étudiées.
|
||||
Limitation net: system à modéliser est trop complexe (travaux sur route, eveM divers dans la ville à prendre en compte)
|
||||
=> Données limités par le nombre de voiture dans la flotte, si pas assez de voiture ou usage low alors pas assez de stats (données trop parcimonieuses)
|
||||
=> Devra pouvoir généraliser sur plusieurs villes, mais généraliser sur les ville sdemand d'avoir des donées très précises pour caractériser les villes.
|
||||
Quel similarité du point de vue du carsharing ?
|
||||
Eu la chance d'avoir 3x 1an de données d'autopartage, mais éventaille de situation fait que ces données ne sont pas suffisantes pour prédire les situations compliquées.
|
||||
On devrait faire une prédiction à grain un peu plus fin ? Prendre en compte des données plus riches (type google avec les déplacements journaliers).
|
||||
|
||||
- Parler de granularité
|
||||
- Modélisation
|
||||
|
||||
Parler de ville connectées pour avoir plus de données ? (Quel impact environmental ?)
|
||||
|
||||
Futur work:
|
||||
Faire un lien avec les trucs qui n'ont pas fonctionné ? (Dynamic graph ?)
|
||||
Etant donné un placement jockey, dans quel ordre les jockeys devraient faire les déplacements ?
|
||||
Puis faire un lien avec vehicle routing probleme.
|
||||
}
|
||||
|
|
@ -0,0 +1,3 @@
|
|||
\chapter*{Acknowledgement}
|
||||
|
||||
\gregory{Il faut marquer des trucs ici.}
|
||||
|
|
@ -0,0 +1,190 @@
|
|||
\appendix
|
||||
|
||||
\chapter{Appendix - Relocation Strategies Performances}
|
||||
|
||||
\section{Comparison of Operator-based Relocations}
|
||||
|
||||
\subsection{Madrid Case}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_madrid_utility_bfp.jpeg}}
|
||||
\caption[Madrid Simulation Utility Performance Expe1 BFP Placement]{Curves of the expected \emph{Utility} (in minutes) of the \emph{Madrid} service when the operator does nothing (\emph{NoAction}) or uses the \emph{OR-Greedy} or \emph{OR-Optim} relocation strategies (with their variants). The fleet distribution is initialized following the \emph{Balanced First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_madrid_utility_bfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_madrid_utility_wfp.jpeg}}
|
||||
\caption[Madrid Simulation Utility Performance Expe1 WFP Placement]{Curves of the expected \emph{Utility} (in minutes) of the \emph{Madrid} service when the operator does nothing (\emph{NoAction}) or uses the \emph{OR-Greedy} or \emph{OR-Optim} relocation strategies (with their variants). The fleet distribution is initialized following the \emph{Worst First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_madrid_utility_wfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_madrid_profit_bfp.jpeg}}
|
||||
\caption[Madrid Simulation Profit Performance Expe1 BFP Placement]{Curves of the expected \emph{Profit} (in euros) of the \emph{Madrid} service when the operator does nothing (\emph{NoAction}) or uses the \emph{OR-Greedy} or \emph{OR-Optim} relocation strategies (with their variants). The fleet distribution is initialized following the \emph{Balanced First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_madrid_profit_bfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_madrid_profit_wfp.jpeg}}
|
||||
\caption[Madrid Simulation Profit Performance Expe1 WFP Placement]{Curves of the expected \emph{Profit} (in euros) of the \emph{Madrid} service when the operator does nothing (\emph{NoAction}) or uses the \emph{OR-Greedy} or \emph{OR-Optim} relocation strategies (with their variants). The fleet distribution is initialized following the \emph{Worst First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_madrid_profit_wfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_madrid_cost_bfp.jpeg}}
|
||||
\caption[Madrid Simulation Cost Performance Expe1 BFP Placement]{Curves of the expected \emph{Cost} (in euros) incurred by the tested relocation strategies \emph{OR-Greedy} and \emph{OR-Optim} and their variants. The fleet distribution is initialized following the \emph{Balanced First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_madrid_cost_bfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_madrid_cost_wfp.jpeg}}
|
||||
\caption[Madrid Simulation Cost Performance Expe1 WFP Placement]{Curves of the expected \emph{Cost} (in euros) incurred by the tested relocation strategies \emph{OR-Greedy} and \emph{OR-Optim} and their variants. The fleet distribution is initialized following the \emph{Worst First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_madrid_cost_wfp}
|
||||
\end{figure}
|
||||
|
||||
\FloatBarrier
|
||||
\subsection{Paris Case}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_paris_utility_bfp.jpeg}}
|
||||
\caption[Paris Simulation Utility Performance Expe1 BFP Placement]{Curves of the expected \emph{Utility} (in minutes) of the \emph{Paris} service when the operator does nothing (\emph{NoAction}) or uses the \emph{OR-Greedy} or \emph{OR-Optim} relocation strategies (with their variants). The fleet distribution is initialized following the \emph{Balanced First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_paris_utility_bfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_paris_utility_wfp.jpeg}}
|
||||
\caption[Paris Simulation Utility Performance Expe1 WFP Placement]{Curves of the expected \emph{Utility} (in minutes) of the \emph{Paris} service when the operator does nothing (\emph{NoAction}) or uses the \emph{OR-Greedy} or \emph{OR-Optim} relocation strategies (with their variants). The fleet distribution is initialized following the \emph{Worst First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_paris_utility_wfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_paris_profit_bfp.jpeg}}
|
||||
\caption[Paris Simulation Profit Performance Expe1 BFP Placement]{Curves of the expected \emph{Profit} (in euros) of the \emph{Paris} service when the operator does nothing (\emph{NoAction}) or uses the \emph{OR-Greedy} or \emph{OR-Optim} relocation strategies (with their variants). The fleet distribution is initialized following the \emph{Balanced First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_paris_profit_bfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_paris_profit_wfp.jpeg}}
|
||||
\caption[Paris Simulation Profit Performance Expe1 WFP Placement]{Curves of the expected \emph{Profit} (in euros) of the \emph{Paris} service when the operator does nothing (\emph{NoAction}) or uses the \emph{OR-Greedy} or \emph{OR-Optim} relocation strategies (with their variants). The fleet distribution is initialized following the \emph{Worst First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_paris_profit_wfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_paris_cost_bfp.jpeg}}
|
||||
\caption[Paris Simulation Cost Performance Expe1 BFP Placement]{Curves of the expected \emph{Cost} (in euros) incurred by the tested relocation strategies \emph{OR-Greedy} and \emph{OR-Optim} and their variants. The fleet distribution is initialized following the \emph{Balanced First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_paris_cost_bfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_paris_cost_wfp.jpeg}}
|
||||
\caption[Paris Simulation Cost Performance Expe1 WFP Placement]{Curves of the expected \emph{Cost} (in euros) incurred by the tested relocation strategies \emph{OR-Greedy} and \emph{OR-Optim} and their variants. The fleet distribution is initialized following the \emph{Worst First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_paris_cost_wfp}
|
||||
\end{figure}
|
||||
|
||||
|
||||
\FloatBarrier
|
||||
\subsection{Washington Case}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_washington_utility_bfp.jpeg}}
|
||||
\caption[Washington Simulation Utility Performance Expe1 BFP Placement]{Curves of the expected \emph{Utility} (in minutes) of the \emph{Washington} service when the operator does nothing (\emph{NoAction}) or uses the \emph{OR-Greedy} or \emph{OR-Optim} relocation strategies (with their variants). The fleet distribution is initialized following the \emph{Balanced First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_washington_utility_bfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_washington_utility_wfp.jpeg}}
|
||||
\caption[Washington Simulation Utility Performance Expe1 WFP Placement]{Curves of the expected \emph{Utility} (in minutes) of the \emph{Washington} service when the operator does nothing (\emph{NoAction}) or uses the \emph{OR-Greedy} or \emph{OR-Optim} relocation strategies (with their variants). The fleet distribution is initialized following the \emph{Worst First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_washington_utility_wfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_washington_profit_bfp.jpeg}}
|
||||
\caption[Washington Simulation Profit Performance Expe1 BFP Placement]{Curves of the expected \emph{Profit} (in euros) of the \emph{Washington} service when the operator does nothing (\emph{NoAction}) or uses the \emph{OR-Greedy} or \emph{OR-Optim} relocation strategies (with their variants). The fleet distribution is initialized following the \emph{Balanced First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_washington_profit_bfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_washington_profit_wfp.jpeg}}
|
||||
\caption[Washington Simulation Profit Performance Expe1 WFP Placement]{Curves of the expected \emph{Profit} (in euros) of the \emph{Washington} service when the operator does nothing (\emph{NoAction}) or uses the \emph{OR-Greedy} or \emph{OR-Optim} relocation strategies (with their variants). The fleet distribution is initialized following the \emph{Worst First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_washington_profit_wfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_washington_cost_bfp.jpeg}}
|
||||
\caption[Washington Simulation Cost Performance Expe1 BFP Placement]{Curves of the expected \emph{Cost} (in euros) incurred by the tested relocation strategies \emph{OR-Greedy} and \emph{OR-Optim} and their variants. The fleet distribution is initialized following the \emph{Balanced First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_washington_cost_bfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu1_washington_cost_wfp.jpeg}}
|
||||
\caption[Washington Simulation Cost Performance Expe1 WFP Placement]{Curves of the expected \emph{Cost} (in euros) incurred by the tested relocation strategies \emph{OR-Greedy} and \emph{OR-Optim} and their variants. The fleet distribution is initialized following the \emph{Worst First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu1_washington_cost_wfp}
|
||||
\end{figure}
|
||||
|
||||
\FloatBarrier
|
||||
\section{Comparison of Customer-based Relocations}
|
||||
|
||||
\subsection{Paris Case}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=1\textwidth]{figure/ch4_expesimu3_paris_urmax.jpeg}}
|
||||
\caption[URMax Last Fleet Distribution WFP Paris]{Heatmap of the fleet distribution in \emph{Paris} after 30 days of simulation. Green cells denote a low number of cars and red cells a high number. The number in each cell is the number of cars in that cell.}
|
||||
\label{fig:ch4_expesimu3_paris_urmax}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu3_paris_utility_bfp.jpeg}}
|
||||
\caption[Paris Simulation Utility Performance Expe3 BFP Placement]{Curves of the expected \emph{Utility} (in minutes) of the \emph{Paris} service when the operator does nothing (\emph{NoAction}), uses the \emph{OR-Optim} relocation strategy or customer incentives (\emph{UR-Max}, \emph{UR-Heavy}, \emph{UR-Light}). The fleet distribution is initialized following the \emph{Balanced First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu3_paris_utility_bfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu3_paris_utility_wfp.jpeg}}
|
||||
\caption[Paris Simulation Utility Performance Expe3 WFP Placement]{Curves of the expected \emph{Utility} (in minutes) of the \emph{Paris} service when the operator does nothing (\emph{NoAction}), uses the \emph{OR-Optim} relocation strategy or customer incentives (\emph{UR-Max}, \emph{UR-Heavy}, \emph{UR-Light}). The fleet distribution is initialized following the \emph{Worst First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu3_paris_utility_wfp}
|
||||
\end{figure}
|
||||
|
||||
\FloatBarrier
|
||||
\subsection{Washington Case}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.95\textwidth]{figure/ch4_expesimu3_washington_urmax.jpeg}}
|
||||
\caption[URMax Last Fleet Distribution WFP Washington]{Heatmap of the fleet distribution in \emph{Washington} after 30 days of simulation. Green cells denote a low number of cars and red cells a high number. The number in each cell is the number of cars in that cell.}
|
||||
\label{fig:ch4_expesimu3_washington_urmax}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu3_washington_utility_bfp.jpeg}}
|
||||
\caption[Washington Simulation Utility Performance Expe3 BFP Placement]{Curves of the expected \emph{Utility} (in minutes) of the \emph{Washington} service when the operator does nothing (\emph{NoAction}), uses the \emph{OR-Optim} relocation strategy or customer incentives (\emph{UR-Max}, \emph{UR-Heavy}, \emph{UR-Light}). The fleet distribution is initialized following the \emph{Balanced First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu3_washington_utility_bfp}
|
||||
\end{figure}
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\makebox[\textwidth][c]{\includegraphics[width=0.8\textwidth]{figure/ch4_expesimu3_washington_utility_wfp.jpeg}}
|
||||
\caption[Washington Simulation Utility Performance Expe3 WFP Placement]{Curves of the expected \emph{Utility} (in minutes) of the \emph{Washington} service when the operator does nothing (\emph{NoAction}), uses the \emph{OR-Optim} relocation strategy or customer incentives (\emph{UR-Max}, \emph{UR-Heavy}, \emph{UR-Light}). The fleet distribution is initialized following the \emph{Worst First Placement}. One x-tick is a Sunday.}
|
||||
\label{fig:ch4_expesimu3_washington_utility_wfp}
|
||||
\end{figure}
|
||||
|
|
@ -0,0 +1,27 @@
|
|||
TARGET=main
|
||||
LATEX_FLAGS=--shell-escape
|
||||
|
||||
all: pdf
|
||||
rsync -av . /home/gmartin/Backup/Thesis/ --exclude .git
|
||||
|
||||
# If there is no bibliography, comment out the bibtex dependency
|
||||
pdf: bibtex
|
||||
pdflatex $(LATEX_FLAGS) $(TARGET).tex
|
||||
pdflatex $(LATEX_FLAGS) $(TARGET).tex
|
||||
|
||||
pdf-lazy:
|
||||
pdflatex $(LATEX_FLAGS) $(TARGET).tex
|
||||
|
||||
bibtex: pdf-lazy
|
||||
bibtex $(TARGET)
|
||||
|
||||
clean:
|
||||
rm -f *.dvi *.aux *.bbl *.blg $(TARGET).ps *.toc *.ind *.out *.brf *.ilg *.idx *.log *.bcf *.nav *.run.xml *.snm *.vrb *.backup tex/*.backup *~ *.tdo
|
||||
#find img \( -name "*.aux" -o -name "*.log" \) -exec rm '{}' +
|
||||
|
||||
clean_all: clean
|
||||
rm -f $(TARGET).pdf
|
||||
|
||||
clean_pdf_images: clean
|
||||
find img -name "*.pdf" -exec rm '{}' +
|
||||
|
||||
|
|
@ -0,0 +1,182 @@
|
|||
#  Modèle de thèse Doctorat Bretagne Loire
|
||||
|
||||
*Explications en français*
|
||||
|
||||
Ce dépôt contient un modèle latex pour le manuscrit de thèse supportant toutes les écoles doctorales de l'[École des docteurs Bretagne Loire](https://www.doctorat-bretagneloire.fr/).
|
||||
|
||||
Ce modèle a pour but principal de fournir une première et une quatrième de couverture du manuscrit de thèse intégralement écrites en latex.
|
||||
Ces couvertures ont été (manuellement) calquées sur le modèle original au format M$ Word fourni par MathSTIC en 2018, puis généralisé à toutes les écoles doctorales DBL.
|
||||
Tandis que la couverture du manuscrit se doit de respecter le format établi par DBL, la disposition du contenu interne du manuscrit est elle plus flexible.
|
||||
La disposition proposée dans ce modèle est donc donnée à titre d'exemple mais il n'est cependant pas obligatoire de la respecter.
|
||||
|
||||
|
||||
### Structure du dépôt
|
||||
|
||||
- `main.tex` contient le squelette du document, aucun texte du manuscrit n'est présent dans ce fichier
|
||||
- `these-dbl.cls` contient les dépendances, les paramètres de la bibliographie dont le style de citation et les paramètres de mise en page globale du manuscrit et plus particulièrement des deux couvertures
|
||||
- `Couverture-these/pagedegarde.tex` contient les variables à remplir par l'auteur pour compléter la page de garde, ces variables sont utilisées par `\maketitle` redéfini dans `these-dbl.cls`
|
||||
- `Couverture-these/resume.tex` contient les variables à remplir par l'auteur pour compléter la quatrième de couverture, ces variables sont utilisées par les macros définies dans `these-dbl.cls`
|
||||
- Le `Makefile` vous aide à compiler le latex et la bibliographie en un pdf (détails plus bas)
|
||||
- Les autres dossiers contiennent chacun un chapitre du document
|
||||
|
||||
|
||||
### Remplir la première et quatrième de couverture
|
||||
|
||||
Les informations de la page de garde doivent être renseignées dans les variables du fichier `Couverture-these/pagedegarde.tex`.
|
||||
Il suffit de modifier les lignes appelant les commandes `\ecoledoctorale{}` et `\etablissement{}` pour adapter les couvertures à l'école doctorale et à l'établissement, ou les établissements, délivrant le diplôme (par défaut MathSTIC et Université de Rennes 1, respectivement).
|
||||
Le fichier `Couverture-these/README.md` liste Les écoles doctorales et établissements supportés ainsi que des liens vers les listes des spécialités et unités de recherche de chaque école doctorale pour aider à compléter la page de garde (commandes `\spec{}` et `\uniterecherche{}`).
|
||||
Modifier la disposition des éléments de la page de garde présente dans `these-dbl.cls` ne devrait être nécessaire que pour rajouter quelques `\vspace` pour préserver la structure original après avoir renseigné les différents champs (e.g., spécialité, composition du jury).
|
||||
|
||||
Les variables relatives à la quatrième de couverture sont à renseigner dans `Couverture-these/resume.tex`.
|
||||
|
||||
|
||||
### Dépendances
|
||||
|
||||
Une distribution LaTeX comme texlive est nécessaire pour compiler le document. À noter que certains paquets nécessaires à ce document ne sont pas toujours directement inclus dans une installation de base de texlive.
|
||||
|
||||
Paquets additionnels nécessaires :
|
||||
|
||||
- Fedora (installer avec `dnf install`)
|
||||
- texlive-abstract
|
||||
- texlive-wallpaper
|
||||
- Autres distributions
|
||||
- TODO (contributions bienvenues)
|
||||
|
||||
|
||||
### Compiler le latex en pdf
|
||||
|
||||
Le `Makefile` fournis vous aide à compiler votre document.
|
||||
Il utilise `pdflatex` et `biber` pour générer le fichier pdf et peut l'afficher grâce à `evince` sur Linux et `open` sur MacOS.
|
||||
|
||||
Compiler votre document avec `pdflatex/biber` :
|
||||
|
||||
make
|
||||
|
||||
Afficher le pdf généré :
|
||||
|
||||
make viewpdf
|
||||
|
||||
Supprimer tous les fichiers générés, pdf inclus :
|
||||
|
||||
make clean
|
||||
|
||||
|
||||
### Spécificités d'un document multilingue
|
||||
|
||||
La liste des langues utilisées est définie dans `these-dbl.cls` où le paquet `babel` est importé.
|
||||
Etant donné que la quatrième de couverture contient du français et de l'anglais, il est nécessaire de conserver au moins ces deux langues dans la liste.
|
||||
Il faut utiliser `\selectlanguage{x}` (où x est `french` ou `english`) pour changer de langue après son insertion.
|
||||
|
||||
Si la langue principale du contenu du document est l'anglais, vous devez effectuer quelques modifications au modèle :
|
||||
|
||||
- remplacer `\selectlanguage{french}` par `\selectlanguage{english}` dans `main.tex`
|
||||
- modifier la ligne 488 de `these-dbl.cls` pour remplacer `Partie` par `Part` dans les entêtes
|
||||
- inclure un résumé en français d'au moins 4 pages
|
||||
|
||||
|
||||
### Changer la police du contenu
|
||||
|
||||
La police imposée pour les couvertures est Arial mais vous pouvez utiliser une autre police pour le contenu de la thèse en redéfinissant les commandes `\rmdefault` et `\sfdefault` comme commenté dans `these-dbl.cls`.
|
||||
Actuellement la police par défaut de latex est celle utilisée pour le contenu.
|
||||
|
||||
|
||||
-----
|
||||
|
||||
#  Thesis template for Doctorat Bretagne Loire
|
||||
|
||||
*English explanations*
|
||||
|
||||
This repository contains a template for the thesis manuscript supporting all doctoral schools of the [École des docteurs Bretagne Loire](https://www.doctorat-bretagneloire.fr/).
|
||||
|
||||
The main goal of this template is to provide both front and back covers of the thesis manuscript entirely written in latex.
|
||||
These covers have been (manually) reproduced from the original M$ Word model provided by MathSTIC in 2018, then generalized to all doctoral schools.
|
||||
While the manuscript covers must follow the rules of DBL, the internal layout of the content is more flexible.
|
||||
The content layout provided in this template is therefore given as an example rather than as a mandatory framework.
|
||||
|
||||
|
||||
#### Structure of the repository
|
||||
|
||||
- `main.tex` contains the backbone structure of the document, no content is present in this file
|
||||
- `these-dbl.cls` contains the package dependencies, bibliography parameters including citation style and overall layout specifications including both front and back cover layouts
|
||||
- `Couverture-these/pagedegarde.tex` contains the variables that must be filled by the author to complete the front cover, these variables are used in `\maketitle` redefined in `these-dbl.cls`
|
||||
- `Couverture-these/resume.tex` contains the variables that must be filled by the author to complete the back cover, these variables are used in the macros defined in `these-dbl.cls`
|
||||
- The `Makefile` helps you compile the latex and bibliography into a pdf (details below)
|
||||
- The rest of the directories each contain one chapter of the document
|
||||
|
||||
|
||||
#### Fill the front and back cover
|
||||
|
||||
The front cover details must be provided by filling the variables in `Couverture-these/pagedegarde.tex`.
|
||||
The lines calling the `\ecoledoctorale{}` and `\etablissement{}` (i.e., institution) commands must be modified to adapt the covers to the doctoral school and institution(s) delivering the diploma (by default set to MathSTIC and Université de Rennes 1, respectively).
|
||||
The file `Couverture-these/README.md` lists the supported doctoral schools and institutions, and contains links pointing to lists of domains and labs/faculties, for each doctoral school, that are needed to fill the front cover (commands `\spec{}` and `\uniterecherche{}`).
|
||||
Modifying the front cover layout defined in `these-dbl.cls` should only be necessary to preserve the original layout using a few `\vspace` after filling the front cover (e.g., domain, jury section).
|
||||
|
||||
The back cover variables that must be filled are in `Couverture-these/resume.tex`.
|
||||
|
||||
|
||||
#### Dependencies
|
||||
|
||||
A LaTeX distribution such as texlive is necessary in order to compile your document. Please note some necessary packages are not directly included in a base texlive installation.
|
||||
|
||||
Required additional packages:
|
||||
|
||||
- Fedora (install using `dnf install`)
|
||||
- texlive-abstract
|
||||
- texlive-wallpaper
|
||||
- Other distributions
|
||||
- TODO (contributions welcome)
|
||||
|
||||
|
||||
#### Compile latex into pdf
|
||||
|
||||
A `Makefile` is provided to help you compile your document. It uses `pdflatex` and `biber` to generate the pdf file and can display it by using `evince` on Linux or `open` on MacOS.
|
||||
|
||||
Compile your document with `pdflatex/biber`:
|
||||
|
||||
make
|
||||
|
||||
Display the generated pdf:
|
||||
|
||||
make viewpdf
|
||||
|
||||
Remove all generated files, pdf included:
|
||||
|
||||
make clean
|
||||
|
||||
|
||||
#### Particularities of a multilanguage document
|
||||
|
||||
The list of used languages is defined in `these-dbl.cls` where the package `babel` is imported.
|
||||
As the back cover contains both French and English, it is necessary to keep at least both these languages in the list.
|
||||
Use `\selectlanguage{x}` (where x is `french` or `english`) to switch language after its usage.
|
||||
|
||||
If the main language of your document is English, you must apply the following changes to the provided template:
|
||||
|
||||
- replace `\selectlanguage{french}` by `\selectlanguage{english}` in `main.tex`
|
||||
- edit line 488 of `these-dbl.cls` to replace `Partie` by `Part` in the headers
|
||||
- include a summary in French of at least 4 pages
|
||||
|
||||
|
||||
### Change the content font
|
||||
|
||||
The required font for the covers is Arial but you can use another font for the content of the thesis by redefining the commands `\rmdefault` and `\sfdefault` as commented out in `these-dbl.cls`.
|
||||
Currently the latex default font is the one used for the content.
|
||||
|
||||
|
||||
-----
|
||||
|
||||
#  Contribute
|
||||
|
||||
To propose changes, (1) first request access to this repository (top page link, [HOWTO](https://docs.gitlab.com/ee/user/project/members/#project-membership-and-requesting-access)), (2) push your branch to this repository once access is granted, and finally (3) create a merge request with your proposed changes. Or just create an issue.
|
||||
|
||||
Maintainer: Pierre-Louis Roman (romanp@usi.ch).
|
||||
|
||||
Contributors: Louiza Yala (original & main developer), Josquin Debaz, Pierre-Louis Roman, Lucas Bourneuf, Corentin Guezenoc, Clément Elbaz, Florian Arrestier, Alexandre Honorat.
|
||||
|
||||
Upstream git repository: https://gitlab.inria.fr/ed-mathstic/latex-template
|
||||
|
||||
Previous upstream git repository (2019-2020): https://gitlab.inria.fr/proman/mathstic-thesis-template
|
||||
|
||||
Alternative repositories with a squashed history:
|
||||
- https://github.com/remolaz/PhD_Thesis_Template_MathSTIC
|
||||
- https://github.com/ahonorat/mathSTICtemplatePhD
|
||||
|
|
|
@ -0,0 +1,212 @@
|
|||
|
||||
### 3M - École doctorale Matière, Molécules Matériaux
|
||||
|
||||
Spécialités & unités de recherche: https://ed-3m.doctorat-bretagneloire.fr/fr/7_presentation
|
||||
|
||||
```
|
||||
\ecoledoctorale{3M}
|
||||
|
||||
\etablissement{ENSCR} % École Nationale Supérieure de Chimie Rennes
|
||||
\etablissement{IMTA} % IMT Atlantique
|
||||
\etablissement{INSA} % Institut National des Sciences Appliquées Rennes
|
||||
\etablissement{LMU} % Le Mans Université
|
||||
\etablissement{UA} % Université d'Angers
|
||||
\etablissement{UBO} % Université de Bretagne Occidentale
|
||||
\etablissement{UN} % Université de Nantes
|
||||
\etablissement{UR1} % Université de Rennes 1
|
||||
```
|
||||
|
||||
|
||||
### ALL - École doctorale Arts Lettres Langues
|
||||
|
||||
|
||||
Spécialités & unités de recherche: https://ed-all.doctorat-bretagneloire.fr/fr/3_presentation
|
||||
|
||||
```
|
||||
\ecoledoctorale{ALL}
|
||||
|
||||
\etablissement{LMU} % Le Mans Université
|
||||
\etablissement{UA} % Université d'Angers
|
||||
\etablissement{UBO} % Université de Bretagne Occidentale
|
||||
\etablissement{UBS} % Université de Bretagne Sud
|
||||
\etablissement{UN} % Université de Nantes
|
||||
\etablissement{UR2} % Université de Rennes 2
|
||||
\etablissement{UR2-ENSAB} % Délivrance conjointe - Joint degrees
|
||||
```
|
||||
|
||||
|
||||
### BS - École doctorale Biologie Santé
|
||||
|
||||
Spécialités & unités de recherche: https://ed-bs.doctorat-bretagneloire.fr/fr/4_presentation
|
||||
|
||||
```
|
||||
\ecoledoctorale{BS}
|
||||
|
||||
\etablissement{EHESP} % École des Hautes Études en Santé Publique de Rennes
|
||||
\etablissement{ENS} % École Normale Supérieure de Rennes
|
||||
\etablissement{Oniris} % Oniris
|
||||
\etablissement{UA} % Université d'Angers
|
||||
\etablissement{UBO} % Université de Bretagne Occidentale
|
||||
\etablissement{UN} % Université de Nantes
|
||||
\etablissement{UR1} % Université de Rennes 1
|
||||
\etablissement{UR1-UR2} % Délivrance conjointe - Joint degrees
|
||||
```
|
||||
|
||||
|
||||
### DSP - École doctorale Droit et Science politique
|
||||
|
||||
Spécialités & unités de recherche: https://ed-dsp.doctorat-bretagneloire.fr/fr/6_presentation
|
||||
|
||||
```
|
||||
\ecoledoctorale{DSP}
|
||||
|
||||
\etablissement{LMU} % Le Mans Université
|
||||
\etablissement{UA} % Université d'Angers
|
||||
\etablissement{UBO} % Université de Bretagne Occidentale
|
||||
\etablissement{UBS} % Université de Bretagne Sud
|
||||
\etablissement{UN} % Université de Nantes
|
||||
\etablissement{UR1} % Université de Rennes 1
|
||||
\etablissement{UR1-EHESP} % Délivrance conjointe - Joint degrees
|
||||
\etablissement{UR1-UR2} % Délivrance conjointe - Joint degrees
|
||||
```
|
||||
|
||||
|
||||
### EDGE - École doctorale sciences Économiques et sciences De Gestion
|
||||
|
||||
Spécialités & unités de recherche: https://ed-edge.doctorat-bretagneloire.fr/fr/10_presentation
|
||||
|
||||
```
|
||||
\ecoledoctorale{EDGE}
|
||||
|
||||
\etablissement{IMTA} % IMT Atlantique
|
||||
\etablissement{InstitutAgro} % Institut Agro - AgroCampus Ouest
|
||||
\etablissement{LMU} % Le Mans Université
|
||||
\etablissement{UA} % Université d'Angers
|
||||
\etablissement{UBO} % Université de Bretagne Occidentale
|
||||
\etablissement{UBS} % Université de Bretagne Sud
|
||||
\etablissement{UN} % Université de Nantes
|
||||
\etablissement{UR1} % Université de Rennes 1
|
||||
\etablissement{UR1-EHESP} % Délivrance conjointe - Joint degrees
|
||||
```
|
||||
|
||||
|
||||
### EGAAL - École doctorale Écologie, Géosciences, Agronomie et Alimentation
|
||||
|
||||
Spécialités & unités de recherche: https://ed-egaal.doctorat-bretagneloire.fr/fr/5_presentation
|
||||
|
||||
```
|
||||
\ecoledoctorale{EGAAL}
|
||||
|
||||
\etablissement{InstitutAgro} % Institut Agro - AgroCampus Ouest
|
||||
\etablissement{Oniris} % Oniris
|
||||
\etablissement{UA} % Université d'Angers
|
||||
\etablissement{UA-LMU} % Délivrance conjointe - Joint degrees
|
||||
\etablissement{UBO} % Université de Bretagne Occidentale
|
||||
\etablissement{UN} % Université de Nantes
|
||||
\etablissement{UR1} % Université de Rennes 1
|
||||
```
|
||||
|
||||
|
||||
### ELICC - École doctorale Education, Langages, Interactions, Cognition, Clinique
|
||||
|
||||
Spécialités & unités de recherche: https://ed-elicc.doctorat-bretagneloire.fr/fr/2_presentation
|
||||
|
||||
```
|
||||
\ecoledoctorale{ELICC}
|
||||
|
||||
\etablissement{LMU} % Le Mans Université
|
||||
\etablissement{UA} % Université d'Angers
|
||||
\etablissement{UBO} % Université de Bretagne Occidentale
|
||||
\etablissement{UBS} % Université de Bretagne Sud
|
||||
\etablissement{UN} % Université de Nantes
|
||||
\etablissement{UR1} % Université de Rennes 1
|
||||
\etablissement{UR2} % Université de Rennes 2
|
||||
```
|
||||
|
||||
|
||||
### MathSTIC - École doctorale MathSTIC
|
||||
|
||||
Spécialités & unités de recherche: https://ed-mathstic.doctorat-bretagneloire.fr/fr/8_presentation
|
||||
|
||||
```
|
||||
\ecoledoctorale{MathSTIC}
|
||||
|
||||
\etablissement{CS} % CentraleSupélec
|
||||
\etablissement{ECN} % École Centrale de Nantes
|
||||
\etablissement{ENIB} % École Nationale d'Ingénieurs de Brest
|
||||
\etablissement{ENSAI} % École Nationale de la Statistique et de l'Analyse de l'Information
|
||||
\etablissement{ENS} % École Normale Supérieure de Rennes
|
||||
\etablissement{ENSTA} % École Nationale Supérieure de Techniques Avancées Bretagne
|
||||
\etablissement{IMTA} % IMT Atlantique
|
||||
\etablissement{INSA} % Institut National des Sciences Appliquées Rennes
|
||||
\etablissement{LMU} % Le Mans Université
|
||||
\etablissement{UA} % Université d'Angers
|
||||
\etablissement{UBO} % Université de Bretagne Occidentale
|
||||
\etablissement{UBS} % Université de Bretagne Sud
|
||||
\etablissement{UN} % Université de Nantes
|
||||
\etablissement{UR1} % Université de Rennes 1
|
||||
\etablissement{UR1-InstitutAgro} % Délivrance conjointe - Joint degrees
|
||||
\etablissement{UR1-UR2} % Délivrance conjointe - Joint degrees
|
||||
```
|
||||
|
||||
|
||||
### SML - École doctorale Sciences de la Mer et du Littoral
|
||||
|
||||
Spécialités & unités de recherche: https://ed-sml.doctorat-bretagneloire.fr/fr/9_presentation
|
||||
|
||||
```
|
||||
\ecoledoctorale{SML}
|
||||
|
||||
\etablissement{InstitutAgro} % Institut Agro - AgroCampus Ouest
|
||||
\etablissement{LMU} % Le Mans Université
|
||||
\etablissement{UA} % Université d'Angers
|
||||
\etablissement{UBO} % Université de Bretagne Occidentale
|
||||
\etablissement{UBO-IMTA} % Délivrance conjointe - Joint degrees
|
||||
\etablissement{UBS} % Université de Bretagne Sud
|
||||
\etablissement{UN} % Université de Nantes
|
||||
```
|
||||
|
||||
|
||||
### SPI - École doctorale Sciences pour l'Ingénieur
|
||||
|
||||
Spécialités & unités de recherche: https://ed-spi.doctorat-bretagneloire.fr/fr/11_presentation
|
||||
|
||||
```
|
||||
\ecoledoctorale{SPI}
|
||||
|
||||
\etablissement{ECN} % École Centrale de Nantes
|
||||
\etablissement{ECN-ENSA} % Délivrance conjointe - Joint degrees
|
||||
\etablissement{ENS} % École Normale Supérieure de Rennes
|
||||
\etablissement{ENSTA} % École Nationale Supérieure de Techniques Avancées Bretagne
|
||||
\etablissement{IMTA} % IMT Atlantique
|
||||
\etablissement{INSA} % Institut National des Sciences Appliquées Rennes
|
||||
\etablissement{LMU} % Le Mans Université
|
||||
\etablissement{UA} % Université d'Angers
|
||||
\etablissement{UBO} % Université de Bretagne Occidentale
|
||||
\etablissement{UBO-ENIB} % Délivrance conjointe - Joint degrees
|
||||
\etablissement{UBS} % Université de Bretagne Sud
|
||||
\etablissement{UN} % Université de Nantes
|
||||
\etablissement{UN-Oniris} % Délivrance conjointe - Joint degrees
|
||||
\etablissement{UR1} % Université de Rennes 1
|
||||
```
|
||||
|
||||
|
||||
### STT - École doctorale Sociétés, temps, territoires
|
||||
|
||||
Spécialités & unités de recherche: https://ed-stt.doctorat-bretagneloire.fr/fr/12_acteurs-du-doctorat
|
||||
|
||||
```
|
||||
\ecoledoctorale{STT}
|
||||
|
||||
\etablissement{EHESP} % École des Hautes Études en Santé Publique de Rennes
|
||||
\etablissement{ENSA-UN} % Délivrance conjointe - Joint degrees
|
||||
\etablissement{LMU} % Le Mans Université
|
||||
\etablissement{UA} % Université d'Angers
|
||||
\etablissement{UBO} % Université de Bretagne Occidentale
|
||||
\etablissement{UBS} % Université de Bretagne Sud
|
||||
\etablissement{UN} % Université de Nantes
|
||||
\etablissement{UR1} % Université de Rennes 1
|
||||
\etablissement{UR2} % Université de Rennes 2
|
||||
\etablissement{UR2-ENSAB} % Délivrance conjointe - Joint degrees
|
||||
```
|
||||
|
|
@ -0,0 +1,49 @@
|
|||
|
||||
%%% Switch case in latex
|
||||
%%% https://tex.stackexchange.com/a/343306
|
||||
\makeatletter
|
||||
\newcommand\addcase[3]{\expandafter\def\csname\string#1@case@#2\endcsname{#3}}
|
||||
\newcommand\makeswitch[2][]{%
|
||||
\newcommand#2[1]{%
|
||||
\ifcsname\string#2@case@##1\endcsname\csname\string#2@case@##1\endcsname\else#1\fi%
|
||||
}%
|
||||
}
|
||||
\makeatother
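
%%% Usage sketch (illustrative): \makeswitch[<default>]\macro defines \macro taking one key;
%%% \addcase\macro{<key>}{<body>} registers <body> for that key, and \macro{<key>} then expands
%%% to the registered body, or to <default> when the key is unknown.
%%% This is how \ecoledoctorale{...} and \etablissement{...} are built below.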
|
||||
|
||||
%%%% Il faut adapter la taille des logos dans certains cas (e.g., EGAAL, 2 établissements)
%%%% The logo heights must be adapted in some cases (e.g., EGAAL, 2 institutions)
|
||||
\newcommand\hauteurlogos[3]{
|
||||
\hauteurlogoecole{#1}
|
||||
\hauteurlogoetablissementA{#2}
|
||||
\hauteurlogoetablissementB{#3}
|
||||
}
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%% ECOLES DOCTORALES %%%%%%%%%%%%%%%%
|
||||
|
||||
%%%% #1: dossier des images, #2: numero ED, #3: couleur ED, #4-#5: nom complet sur plusieurs lignes
%%%% #1: image directory, #2: doctoral school number, #3: doctoral school color, #4-#5: full name over several lines
|
||||
\newcommand\addecoledoctorale[5]{\direcole{#1}\numeroecole{#2}\definecolor{color-ecole}{RGB}{#3}\nomecoleA{#4}\nomecoleB{#5}}
|
||||
|
||||
\makeswitch[default]\ecoledoctorale{}
|
||||
|
||||
\addcase\ecoledoctorale{MathSTIC}{\addecoledoctorale
|
||||
{MathSTIC}
|
||||
{601}
|
||||
{236,115,127}
|
||||
{Math\'{e}matiques et Sciences et Technologies}
|
||||
{de l'Information et de la Communication}
|
||||
}
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
%%%%%%%%%%%%%%%% ETABLISSEMENTS %%%%%%%%%%%%%%%%
|
||||
|
||||
%%%% #1: nom du logo, #2-#4: nom complet sur plusieurs lignes
%%%% #1: logo name, #2-#4: full name over several lines
|
||||
\newcommand\addetablissement[4]{\logoetablissementB{#1}\nometablissementC{#2}\nometablissementD{#3}\nometablissementE{#4}}
|
||||
|
||||
\makeswitch[default]\etablissement{}
|
||||
|
||||
\addcase\etablissement{UR1}{\addetablissement
|
||||
{UR1-noir}
|
||||
{}
|
||||
{}
|
||||
{L'UNIVERSIT\'{E} DE RENNES 1}
|
||||
}
|
||||
|
|
@ -0,0 +1,84 @@
|
|||
% La page de garde est en français
|
||||
% The front cover is in French
|
||||
\selectlanguage{french}
|
||||
|
||||
% Inclure les infos de chaque établissement
|
||||
% Include the data for each institution
|
||||
\input{./couverture/liste-ecoles-etablissements.tex}
|
||||
|
||||
% Inclure infos de l'école doctorale
|
||||
% Include doctoral school data
|
||||
% (3M ALL BS DSP EDGE EGAAL ELICC MathSTIC SML SPI STT)
|
||||
\ecoledoctorale{MathSTIC}
|
||||
|
||||
% Inclure infos de l'établissement
|
||||
% Include institution data
|
||||
\etablissement{UR1}
|
||||
|
||||
%Inscrivez ici votre sp\'{e}cialit\'{e} (voir liste des sp\'{e}cialit\'{e}s sur le site de votre \'{e}cole doctorale)
|
||||
%Indicate the domain (see the list of domains on your doctoral school's website)
|
||||
\spec{Informatique}
|
||||
|
||||
%Attention : le pr\'{e}nom doit être en minuscules (Jean) et le NOM en majuscules (BRITTEF)
|
||||
%Attention: the first name must be in lowercase (Jean) and the SURNAME in uppercase (BRITTEF)
|
||||
\author{Grégory MARTIN}
|
||||
|
||||
% Donner le titre complet de la th\`{e}se, \'{e}ventuellement le sous titre, si n\'{e}cessaire sur plusieurs lignes
|
||||
%Give the complete title of the thesis, if necessary on several lines
|
||||
\title{Data-driven Vehicle Relocation in\\Free Floating Carsharing Services}
|
||||
|
||||
%Indiquer la date et le lieu de soutenance de la th\`{e}se
|
||||
%Indicate the date and the place of the defense
|
||||
\date{15/12/2022}
|
||||
\lieu{Centre INRIA de l'Université de Rennes}
|
||||
|
||||
%Indiquer le nom du (ou des) laboratoire (s) dans le(s)quel(s) le travail de th\`{e}se a \'{e}t\'{e} effectu\'{e}, indiquer aussi si souhait\'{e} le nom de la (les) facult\'{e}(s) (UFR, \'{e}cole(s), Institut(s), Centre(s)...), son (leurs) adresse(s)...
|
||||
%Indicate the name(s) of the research laboratories where the thesis work was done, as well as (if desired) the names of the faculties (UFR, schools, institutes, centres...), their addresses...
|
||||
\uniterecherche{Institut National de Recherche en Informatique et en Automatique}
|
||||
|
||||
%Indiquer le Numero de th\`{e}se, si cela est opportun, ou laisser vide pour faire disparaitre cette ligne de la couverture
|
||||
%Indicate the thesis number if there is one, otherwise leave it empty so the line disappears from the cover
|
||||
\numthese{« si pertinent »} % \numthese{}
|
||||
|
||||
%Indiquer le Pr\'{e}nom en minuscules et le Nom en majuscules, le titre de la personne et l'\'{e}tablissement dans lequel il effectue sa recherche
|
||||
%Indicate the first name in lowercase and the SURNAME in uppercase, the person's title, and the institution they belong to.
|
||||
%Exemples : Examples :
|
||||
%%%- Professeur, Universit\'{e} d'Angers
|
||||
%%%- Chercheur, CNRS, \'{e}cole Centrale de Nantes
|
||||
%%%- Professeur d'universit\'{e} – Praticien Hospitalier, Universit\'{e} Paris V
|
||||
%%%- Maitre de conf\'{e}rences, Oniris
|
||||
%%%- Charg\'{e} de recherche, INSERM, HDR, Universit\'{e} de Tours
|
||||
%S'il n'y a pas de co-direction, faire disparaitre cet item de la couverture
|
||||
%If there is no co-director, remove this item from the cover
|
||||
\jury{
|
||||
{\normalTwelve \textbf{Rapporteurs avant soutenance :}}\\ \newline
|
||||
\footnotesizeTwelve
|
||||
\begin{tabular}{@{}ll}
|
||||
François JACQUENET & Professeur, Université Jean Monnet de Saint-Etienne \\
|
||||
Bruno CRÉMILLEUX & Professeur, Université de Caen-Normandie \\
|
||||
\end{tabular}
|
||||
|
||||
\vspace{\baselineskip}
|
||||
{\normalTwelve \textbf{Composition du Jury :}}\\
|
||||
{\fontsize{9.5}{11}\selectfont {\textcolor{red}{\textit{Attention, en cas d'absence d'un des membres du Jury le jour de la soutenance, la composition du jury doit être revue pour s'assurer qu'elle est conforme et devra être répercutée sur la couverture de thèse}}}}\\ \newline
|
||||
\footnotesizeTwelve
|
||||
\begin{tabular}{@{}lll}
|
||||
|
||||
Pr\'{e}sident : & Pr\'{e}nom NOM & Fonction et \'{e}tablissement d'exercice \textit{(à préciser après la soutenance)} \\
|
||||
Examinateurs : & Christine SOLNON & Professeure, INSA de Lyon \\
|
||||
& Marc PLANTEVIT & Professeur, EPITA\\
|
||||
Dir. de th\`{e}se : & Elisa FROMONT & Professeure, Université de Rennes 1\\
|
||||
Co-dir. de th\`{e}se : & Alexandre TERMIER & Professeur, Université de Rennes 1\\
|
||||
Co-encadrante : & Laurence ROZ\'E & Maîtresse de Conférences, INSA de Rennes\\
|
||||
\end{tabular}
|
||||
|
||||
\vspace{\baselineskip}
|
||||
{\normalTwelve \textbf{Invit\'{e}(s) :}}\\ \newline
|
||||
\footnotesizeTwelve
|
||||
\begin{tabular}{@{}ll}
|
||||
Matthieu DONAIN & Senior Researcher, Stellantis \\
|
||||
\end{tabular}
|
||||
}
|
||||
|
||||
|
||||
\maketitle
|
||||
|
|
@ -0,0 +1,39 @@
|
|||
\markboth{}{}
|
||||
% Plus petite marge du bas pour la quatrième de couverture
|
||||
% Shorter bottom margin for the back cover
|
||||
\newgeometry{inner=30mm,outer=20mm,top=40mm,bottom=20mm}
|
||||
|
||||
%insertion de l'image de fond du dos (resume)
|
||||
%background image for resume (back)
|
||||
\backcoverheader
|
||||
|
||||
% Switch font style to back cover style
|
||||
\selectfontbackcover{ % Font style change is limited to this page using braces, just in case
|
||||
|
||||
\titleFR{Repositionnement de véhicules pour des services d'autopartage en \textit{free-floating} basé sur les données}
|
||||
|
||||
\keywordsFR{Autopartage - Régression - Programmation linéaire en nombres entiers - Simulation}
|
||||
|
||||
\abstractFR{Eius populus ab incunabulis primis ad usque pueritiae tempus extremum, quod annis circumcluditur fere trecentis, circummurana pertulit bella, deinde aetatem ingressus adultam post multiplices bellorum aerumnas Alpes transcendit et fretum, in iuvenem erectus et virum ex omni plaga quam orbis ambit inmensus, reportavit laureas et triumphos, iamque vergens in senium et nomine solo aliquotiens vincens ad tranquilliora vitae discessit.
|
||||
Hoc inmaturo interitu ipse quoque sui pertaesus excessit e vita aetatis nono anno atque vicensimo cum quadriennio imperasset. natus apud Tuscos in Massa Veternensi, patre Constantio Constantini fratre imperatoris, matreque Galla.
|
||||
Thalassius vero ea tempestate praefectus praetorio praesens ipse quoque adrogantis ingenii, considerans incitationem eius ad multorum augeri discrimina, non maturitate vel consiliis mitigabat, ut aliquotiens celsae potestates iras principum molliverunt, sed adversando iurgandoque cum parum congrueret, eum ad rabiem potius evibrabat, Augustum actus eius exaggerando creberrime
|
||||
docens, idque, incertum qua mente, ne lateret adfectans. quibus mox Caesar acrius efferatus, velut contumaciae quoddam vexillum altius erigens, sine respectu salutis alienae vel suae ad vertenda opposita instar rapidi fluminis irrevocabili impetu ferebatur.
|
||||
Hae duae provinciae bello quondam piratico catervis mixtae praedonum.}
|
||||
|
||||
|
||||
|
||||
\titleEN{Data-driven Vehicle Relocation in Free Floating Carsharing Services}
|
||||
|
||||
\keywordsEN{Carsharing - Regression - Integer Linear Programming - Simulation}
|
||||
|
||||
\abstractEN{Eius populus ab incunabulis primis ad usque pueritiae tempus extremum, quod annis circumcluditur fere trecentis, circummurana pertulit bella, deinde aetatem ingressus adultam post multiplices bellorum aerumnas Alpes transcendit et fretum, in iuvenem erectus et virum ex omni plaga quam orbis ambit inmensus, reportavit laureas et triumphos, iamque vergens in senium et nomine solo aliquotiens vincens ad tranquilliora vitae discessit.
|
||||
Hoc inmaturo interitu ipse quoque sui pertaesus excessit e vita aetatis nono anno atque vicensimo cum quadriennio imperasset. natus apud Tuscos in Massa Veternensi, patre Constantio Constantini fratre imperatoris, matreque Galla.
|
||||
Thalassius vero ea tempestate praefectus praetorio praesens ipse quoque adrogantis ingenii, considerans incitationem eius ad multorum augeri discrimina, non maturitate vel consiliis mitigabat, ut aliquotiens celsae potestates iras principum molliverunt, sed adversando iurgandoque cum parum congrueret, eum ad rabiem potius evibrabat, Augustum actus eius exaggerando creberrime
|
||||
docens, idque, incertum qua mente, ne lateret adfectans. quibus mox Caesar acrius efferatus, velut contumaciae quoddam vexillum altius erigens, sine respectu salutis alienae vel suae ad vertenda opposita instar rapidi fluminis irrevocabili impetu ferebatur.
|
||||
Hae duae provinciae bello quondam piratico catervis mixtae praedonum.}
|
||||
|
||||
}
|
||||
|
||||
% Rétablit les marges d'origines
|
||||
% Restore original margin settings
|
||||
\restoregeometry
|
||||
|