Monitoring the driver's activity using 3D information

Author:
  1. Peláez Coronado, Gustavo Adolfo
Supervised by:
  1. Arturo de la Escalera Hueso (Director)
  2. José María Armingol Moreno (Director)

Defence university: Universidad Carlos III de Madrid

Date of defence: 24 July 2015

Committee:
  1. Andrés Iborra García (Chair)
  2. Francisco José Rodríguez Urbano (Secretary)
  3. José Manuel Pastor García (Committee member)

Type: Thesis


Driver supervision is crucial in driver safety systems. Monitoring the driver makes it possible to understand their needs, movement patterns and behaviour under given circumstances. An accurate tool to supervise the driver's behaviour enables multiple objectives, such as the detection of drowsiness (by analysing head movements and the blinking pattern) and of distraction (by estimating where the driver is looking from the head and eye positions). Once either misbehaviour is detected, an alarm of the appropriate type for the situation can be triggered to correct the driver's behaviour. This application distinguishes itself from other driving assistance systems in that it analyses the inside of the vehicle instead of the outside. Inside-supervising applications are as important as outside-supervising ones: if the driver falls asleep, a pedestrian detection algorithm can take only limited actions to prevent an accident, and even those only under the best, predetermined circumstances. The application also has the potential to estimate whether the driver is looking at an area where another application has detected an obstacle (inert object, animal or pedestrian). Although technologies able to provide automatic driver monitoring are already on the market, the cost of the sensors needed to accomplish this task is very high: it is not a popular product (compared to home or entertainment devices), and there is no market with high demand and supply for these sensors. Many of these technologies also require external, invasive devices (one or more sensors attached to the body) that may interfere with the natural driving movements the driver makes under unsupervised conditions.
Current applications based on computer vision take advantage of the latest developments in information technology and the increase in computational power to create applications that fit the criteria of a non-invasive driver monitoring method. Technologies such as stereo and time-of-flight cameras are able to overcome some of the difficulties of computer vision applications, such as extreme lighting conditions (too dark or too bright), saturation of the colour sensors and lack of depth information. Combining different sensors can also overcome these problems, by performing multiple scans of different areas or by fusing the information obtained from different devices, but this requires additional calibration and positioning steps, and it makes the application depend not on one sensor but on every sensor involved in the supervision task: if one of them fails, the results may be incorrect. Some recent gaming sensors on the market, such as the Kinect sensor bar from Microsoft, embed a set of previously expensive sensors in a low-cost device, thus providing 3D information together with additional features and without the need for a complex handcrafted system that can fail as previously mentioned. The solution proposed in this thesis monitors the driver using the different data streams of the Kinect sensor (depth information, infrared and colour images). Fusing the information from these sources allows 2D and 3D algorithms to be used together to provide reliable face detection, accurate pose estimation and trustworthy detection of facial features such as the eyes and nose. Running at an average speed above 10 Hz, the system compares the initial face capture with subsequent frames using an iterative algorithm configured beforehand to balance accuracy and speed.
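The frame-to-frame comparison described above can be sketched as a rigid alignment between a reference face point cloud and the current one. The minimal sketch below assumes point correspondences are already established (as an ICP-style matching step would provide) and uses the Kabsch algorithm to recover the rotation; the function names and angle decomposition are illustrative, not the thesis's exact implementation:

```python
import numpy as np

def head_rotation(ref_pts, cur_pts):
    """Estimate the rigid rotation aligning a reference face point cloud
    (N x 3, one point per row) to the current frame via the Kabsch
    algorithm; correspondences between rows are assumed given."""
    a = ref_pts - ref_pts.mean(axis=0)
    b = cur_pts - cur_pts.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against reflections
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T

def yaw_pitch_roll(r):
    """Decompose a rotation matrix into yaw, pitch and roll in degrees,
    the three degrees of freedom evaluated against the IMU ground truth."""
    yaw = np.degrees(np.arctan2(r[1, 0], r[0, 0]))
    pitch = np.degrees(np.arcsin(-r[2, 0]))
    roll = np.degrees(np.arctan2(r[2, 1], r[2, 2]))
    return yaw, pitch, roll
```

In a full ICP loop, the correspondence search and this alignment step would alternate until convergence, which is where the accuracy/speed compromise mentioned above comes in.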
In order to determine the reliability and accuracy of the proposed system, several tests of the head-pose orientation algorithm were performed with an Inertial Measurement Unit (IMU) attached to the back of the head of collaborating subjects. The inertial measurements provided by the IMU served as ground truth for three-degrees-of-freedom (3DoF) tests (yaw, pitch and roll). Finally, the test results were compared with those available in the current literature to assess the performance of the presented algorithm. Estimating the head orientation is the main function of this proposal, as it delivers the most information for estimating the driver's behaviour, whether as a first estimate of whether the driver is looking to the front or as a sign of fatigue when nodding. Supporting this tool is another that analyses the colour image to study the driver's eyes. From this study it is possible to estimate where the driver is looking by deriving the gaze orientation from the position of the pupil. The gaze orientation, together with the head orientation, yields a more accurate estimate of where the driver is looking; it is therefore a support tool that complements the head orientation. Another way to detect a hazardous situation is to analyse the opening of the eyes: whether the driver is tired can be estimated by studying the driver's blinking pattern over a given period of time. If so, the driver is more likely to cause an accident due to drowsiness. The part of the solution that addresses this problem analyses one of the driver's eyes to estimate whether it is closed or open, based on the analysis of dark regions in the image.
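The open/closed decision from dark regions can be illustrated with a simple sketch: measure the fraction of dark pixels in a grayscale eye patch, since an open eye exposes the dark pupil and iris while a closed lid shows mostly skin. The constants `DARK_LEVEL` and `OPEN_RATIO` are assumptions for illustration only; the text states only that dark regions are analysed:

```python
import numpy as np

# Hypothetical thresholds (not from the thesis): grey level below which a
# pixel counts as "dark", and the minimum dark-pixel fraction of an open eye.
DARK_LEVEL = 60
OPEN_RATIO = 0.05

def eye_is_open(eye_patch):
    """Classify a grayscale eye patch (2-D uint8 array) as open (True)
    or closed (False) from the fraction of dark pupil/iris pixels."""
    dark_fraction = np.mean(eye_patch < DARK_LEVEL)
    return bool(dark_fraction >= OPEN_RATIO)
```

In practice the thresholds would have to be tuned to the sensor and the cabin's illumination, which is one reason the infrared stream is valuable under extreme lighting.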
Once the state of the eye is determined, an analysis over a given period of time establishes whether the eye was closed or open most of the time, and thus estimates more accurately whether the driver is falling asleep. These two modules, the drowsiness detector and the gaze estimator, complement the head-orientation estimate with the goal of achieving more certainty about the driver's status and, when possible, preventing an accident due to misbehaviour. It is worth mentioning that the Kinect sensor is built specifically for indoor use connected to a video console, not for the outside. Some limitations therefore inevitably arise when monitoring under real driving conditions; they are discussed in this proposal. However, the presented algorithm can be used with any point-cloud-based sensor (stereo cameras, time-of-flight cameras, laser scanners, etc.), which are more expensive but less sensitive than the Kinect. Future works are described at the end in order to show the scalability of this proposal.
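The time-window analysis can be sketched as a PERCLOS-style measure (an assumed formulation; the abstract only says the eye is "most of the time closed or open"): keep a sliding window of per-frame eye states and flag drowsiness when the closed fraction exceeds a limit. The class name, window length and threshold below are all hypothetical:

```python
from collections import deque

class DrowsinessMonitor:
    """Sliding-window drowsiness estimate: at ~10 Hz, a 300-frame window
    covers roughly 30 s of driving (both values are assumptions)."""

    def __init__(self, window_frames=300, closed_limit=0.5):
        self.window = deque(maxlen=window_frames)  # old frames drop off
        self.closed_limit = closed_limit

    def update(self, eye_open):
        """Record one frame's eye state (True = open, False = closed)."""
        self.window.append(eye_open)

    def is_drowsy(self):
        """True when the closed-eye fraction over the window is too high."""
        if not self.window:
            return False
        closed = sum(1 for is_open in self.window if not is_open)
        return closed / len(self.window) >= self.closed_limit
```

Each frame, the eye-state classifier feeds `update()`, and `is_drowsy()` gates the alarm that complements the head-orientation estimate.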