Construction Site Hard Hat & Proximity Detection

How Artificial Intelligence and Machine Learning can be used to reduce injuries and protect employees.

Nowadays, Artificial Intelligence (AI) and Machine Learning are being used to help companies answer many critical questions:

  • How can we optimize process X?
  • What can we do to increase safety on construction sites?
  • What revenue can we expect tomorrow?

Construction-site accidents remain a major source of injuries and of avoidable costs for companies. The Occupational Safety and Health Administration (OSHA) reports that one in ten construction site workers is injured every year [1,2]. According to the Bureau of Labor Statistics, roughly 150,000 construction site injuries are reported each year [1]. While some of these events are pure accidents, a major percentage is caused by employee negligence and by a lack of adherence to safety precautions. Modern AI-based approaches make it possible to automate the monitoring of construction sites. This kind of monitoring can be performed in real time using video from security cameras.

Not wearing a hard hat or helmet on a construction site is a common safety hazard that increases the severity of sustained injuries. Object recognition is one of the main research areas in machine learning. A number of pre-trained neural network models are publicly available, allowing for faster and cheaper development of commercial solutions.

We propose an example solution for automatically detecting employees who are not wearing protective headgear in hazardous areas. The project is a prototype that can be further adjusted to meet the desired requirements. The code was prepared in Python using publicly available libraries and datasets. We used SSD-512, a publicly available pre-trained network that is commonly used for object detection. Depending on the available hardware and the number of cameras, another pre-trained network can be used to reach the desired balance between accuracy and computation speed.

We used a two-step approach: first we detect the heads visible in the scene, then we detect people wearing hard hats. A single-step approach, based on detecting the people visible in the scene together with their hard hats, can produce a significant number of false alerts. One potential problem comes from variation in the position of the person being analyzed: if the view of the hard hat is obscured while the rest of the body is clearly visible, a false alarm may be raised. A head detection module may be added to reduce the number of these false alerts.

In the first step, we used a pre-trained head detection model based on the SSD-512 network.

To run the model, first download the code and pre-trained weights (we used FloydHub).

Prepare a python3 virtualenv:
virtualenv --system-site-packages -p python3 <venv_path>
source <venv_path>/bin/activate
pip install -r <download_path>/ssd_head_keras/requirements.txt

Let’s consider a photo depicting a group of three people – one wearing a hard hat and two without protection.

The second model was built using publicly available YOLO-type networks. The YOLO (You Only Look Once) network offers improved computation speed (each image is passed through the network only once) while preserving acceptable accuracy, which allows it to be used for real-time detection even on weaker hardware platforms. Again, we’ll go through a brief overview of the key parts of the code.

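First, we set up the model and load the pre-trained weights:

import cv2

# Build the model from the project configuration
# (the YOLO class and the config dictionary come from the downloaded code)
yolo = YOLO(backend=config['model']['backend'],
            input_size=config['model']['input_size'],
            labels=config['model']['labels'],
            max_box_per_image=config['model']['max_box_per_image'],
            anchors=config['model']['anchors'])

# Load trained weights
# (assumption: the YOLO wrapper exposes load_weights and weights_path points at the downloaded weights file)
yolo.load_weights(weights_path)

# Read the image and run the detection
image = cv2.imread(image_path)
boxes = yolo.predict(image)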

Each box has a label (person with hard hat or hard hat) and position details. These labeled sets of coordinates can be combined with the head position data to detect people without hard hats. We combine the detection of hard hats with the detection of people in order to avoid detecting hard hats that are just lying around. If we also wanted to combine detection with the identification of entrance-to-restricted-area events, focusing on hard hats alone could increase the number of false alerts. A sample visualization of the output is shown below:
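To make the combination of head positions and hard hat positions more concrete, here is a minimal sketch of one way to do the matching. It is an illustration under stated assumptions, not the exact code of the prototype: we assume both models return boxes as (xmin, ymin, xmax, ymax) pixel coordinates, and the helper names `overlap_fraction` and `heads_without_hard_hat`, as well as the 0.3 overlap threshold, are hypothetical.

```python
from typing import List, Tuple

# Assumed box format: (xmin, ymin, xmax, ymax) in pixels
Box = Tuple[float, float, float, float]


def overlap_fraction(head: Box, hat: Box) -> float:
    """Return the fraction of the head box area covered by the hat box."""
    ix_min = max(head[0], hat[0])
    iy_min = max(head[1], hat[1])
    ix_max = min(head[2], hat[2])
    iy_max = min(head[3], hat[3])
    # Width and height of the intersection are clamped at zero (no overlap)
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    head_area = (head[2] - head[0]) * (head[3] - head[1])
    return inter / head_area if head_area > 0 else 0.0


def heads_without_hard_hat(heads: List[Box],
                           hats: List[Box],
                           min_overlap: float = 0.3) -> List[Box]:
    """Return the head boxes that no hard hat box covers by at least min_overlap."""
    return [
        head for head in heads
        if all(overlap_fraction(head, hat) < min_overlap for hat in hats)
    ]


# Hypothetical detections: three heads, one hard hat sitting on the first head
heads = [(100, 50, 160, 120), (300, 60, 360, 130), (500, 55, 560, 125)]
hats = [(95, 30, 165, 90)]
unprotected = heads_without_hard_hat(heads, hats)
```

With these made-up coordinates, only the second and third heads are returned. A hard hat lying on the ground overlaps no head and is simply ignored, which is the reason for matching against heads rather than counting hard hats.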

In this demonstration, we used the Keras library with the TensorFlow backend. While the model can be run on the CPU to detect safety hazards in photos or archived security camera footage, real-time analysis of video input requires GPU support.

The presented models offer generic solutions and thus sacrifice accuracy for generalization. If more information about the context of use is available (e.g. all hard hats have the same color, or only employees wearing a certain type of clothing are allowed to enter the restricted area), the models can be trained on custom, context-specific datasets to achieve higher detection accuracy.

The parameters used for the detection can also be calibrated to achieve the desired balance between the detection rate and the number of false alarms. 
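As an illustration of such a calibration, the sketch below sweeps a confidence threshold over a small hand-labeled validation set and reports the detection rate and the number of false alarms at each value. The function name `sweep_thresholds` and the input format (one confidence score plus a true/false label per candidate alert) are assumptions made for this example.

```python
from typing import List, Tuple


def sweep_thresholds(scored: List[Tuple[float, bool]],
                     thresholds: List[float]) -> List[Tuple[float, float, int]]:
    """For each threshold, return (threshold, detection_rate, false_alarms).

    scored holds one (confidence, is_true_violation) pair per candidate alert,
    labeled by hand on a validation set.
    """
    total_true = sum(1 for _, is_true in scored if is_true)
    results = []
    for threshold in thresholds:
        # Keep only the alerts whose confidence reaches the threshold
        kept = [is_true for score, is_true in scored if score >= threshold]
        hits = sum(kept)
        false_alarms = len(kept) - hits
        rate = hits / total_true if total_true else 0.0
        results.append((threshold, rate, false_alarms))
    return results


# Hypothetical validation data: (model confidence, was it a real violation?)
validation = [(0.9, True), (0.8, False), (0.6, True), (0.4, False), (0.3, True)]
for threshold, rate, false_alarms in sweep_thresholds(validation, [0.25, 0.5, 0.85]):
    print(f"threshold={threshold:.2f} detection_rate={rate:.2f} false_alarms={false_alarms}")
```

Raising the threshold removes false alarms but also misses more real violations, so the right value depends on how costly each kind of error is on a given site.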

The proposed solution can work with both photo and video data. Depending on the hardware used, high-accuracy detection of potential safety hazards can be performed in real time.

Depending on the placement of security cameras, detection can be applied to either the whole scene or just a restricted, predefined part of it.

Another potential source of accidents is workers entering restricted areas. Video from security cameras can be used to detect people in parts of the scene to which entrance should be restricted. Since the solution proposed for detecting people without hard hats gives us the positions of the detected people, the lists of their coordinates can be filtered to keep only the people in a given part of the scene, e.g. when entrance to the left third of the scene in the image below is prohibited.

Now, we filter detection output to only detect people in that part of the scene:
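A minimal sketch of such a filter is shown here. It assumes that each detected person is given as an (xmin, ymin, xmax, ymax) box in pixels and that a person counts as being inside the zone when the horizontal center of the box falls into the left third of the image. The function name `people_in_restricted_zone` and the center-based rule are choices made for this illustration.

```python
from typing import List, Tuple

# Assumed box format: (xmin, ymin, xmax, ymax) in pixels
Box = Tuple[float, float, float, float]


def people_in_restricted_zone(people: List[Box],
                              image_width: float,
                              zone_fraction: float = 1 / 3) -> List[Box]:
    """Return the boxes whose horizontal center lies in the left zone_fraction of the image."""
    zone_right_edge = image_width * zone_fraction
    return [
        box for box in people
        if (box[0] + box[2]) / 2.0 < zone_right_edge
    ]


# Hypothetical detections in a 900-pixel-wide frame
people = [(100, 200, 180, 400), (280, 210, 360, 420), (700, 190, 780, 410)]
intruders = people_in_restricted_zone(people, image_width=900)
```

In this made-up example only the first person is reported: the second box starts inside the zone, but its center (x = 320) lies to the right of the 300-pixel boundary. Using the bottom edge of the box, or a polygon instead of a vertical strip, would be a natural refinement for real camera angles.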



Special thanks to Oleksiy Yatsko for his contributions to this blog
