During spring the sheep on the family farm have their lambs. One of the trickiest aspects of this period is knowing which sheep are lambing at which moment. Traditionally we would frequently walk to the sheep pens and just look. However, this is quite time consuming and you can still easily miss a sheep about to give birth. The first technological improvement we've implemented is adding cameras above each pen, so that we have at least some indication of what is going on when we are not near the sheep pen.
However, as someone in the computer science field I saw an opportunity to experiment with AI and see if we could keep track of the sheep on the video feed using computer vision. The ultimate goal would be to train some sort of AI on the behaviour and identify anomalies (sick sheep, giving birth, etc.).
This post is the first outlining my experimentation with using computer vision on the video feeds I have available.
Disclaimer: I have very limited knowledge in the field of AI. Nothing in these blog posts should be taken as best practice; it is just my experimentation.
The data I'll be working with is the video feeds from the cameras above the sheep pens. The biggest advantage I have in this regard is that each of the cameras is situated fairly similarly, with a fisheye view of the pen, so I don't have to think about many different perspectives. It also means I could easily obtain plenty of video footage for training and testing. At the same time, this is the biggest disadvantage: the cameras all look down from above, so the normal 'portrait' shots of sheep in other data sets aren't very applicable. This means I figured I would have to manually annotate stills from the video feeds. A bit tedious, but with good annotation software this shouldn't be too much of a problem.
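To give an idea of how stills can be pulled from such recordings, here is a minimal sketch using ffmpeg. This is illustrative, not my actual setup: the `ffmpeg_still_cmd` helper, the paths and the sampling rate are all made up for the example.

```python
from pathlib import Path

def ffmpeg_still_cmd(video, out_dir, fps=0.2):
    """Build an ffmpeg command that saves one still every 1/fps seconds.

    With fps=0.2 this grabs a frame every 5 seconds, which keeps the
    number of near-identical images to annotate manageable.
    """
    return [
        "ffmpeg", "-i", str(video),
        "-vf", f"fps={fps}",
        str(out_dir / f"{video.stem}_%04d.jpg"),
    ]

cmd = ffmpeg_still_cmd(Path("videos/pen1.mp4"), Path("stills"))
# Run it with e.g.: subprocess.run(cmd, check=True)
```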
Techstack & Annotation
One of the main points of concern for me when choosing the techstack was ease of use. I wanted to have a PoC quickly with minimal effort. As I am most proficient in the Python programming language, I looked around and settled on ImageAI as the framework of choice. It has excellent documentation and plenty of tutorials, many very close to what I wanted to do. The annotation format that ImageAI requires is the Pascal VOC format, and looking around it seems LabelImg is the easiest tool for annotating the stills taken from the video feeds I have available.
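For reference, a Pascal VOC annotation as LabelImg writes it looks roughly like the snippet below (the filename and box coordinates are made up for illustration), and it is straightforward to parse with Python's standard library:

```python
import xml.etree.ElementTree as ET

# A minimal Pascal VOC annotation; one bounding box per <object> element.
voc_xml = """<annotation>
    <filename>pen1_0001.jpg</filename>
    <size><width>1280</width><height>960</height><depth>3</depth></size>
    <object>
        <name>sheep</name>
        <bndbox>
            <xmin>412</xmin><ymin>303</ymin>
            <xmax>611</xmax><ymax>520</ymax>
        </bndbox>
    </object>
</annotation>"""

root = ET.fromstring(voc_xml)
boxes = [
    (obj.findtext("name"),
     tuple(int(obj.findtext(f"bndbox/{t}"))
           for t in ("xmin", "ymin", "xmax", "ymax")))
    for obj in root.iter("object")
]
print(boxes)  # [('sheep', (412, 303, 611, 520))]
```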
Most of the code is pretty much a carbon copy of the excellent documentation of ImageAI and a great medium post I found.
The main change I made is that I didn't manually divide the data (as that is not fun to do and becomes tedious if I keep adding images). I keep all images and annotations in a single folder and just let the code divide it into a training and test set. Very basic file manipulation.
First matching the jpgs with the xml annotations:
```python
def find_pairs(folder):
    pairs = []
    for item in folder.iterdir():
        if item.suffix == ".xml":
            jpgitem = item.with_suffix(".jpg")
            if jpgitem.exists():
                pairs.append((item, jpgitem))
    return pairs
```
And then dividing it in the train/test set following some sort of ratio:
```python
def split_pairs(pairs, target_folder, ratio):
    ntrain = int(len(pairs) * (1 - ratio))
    indices = list(range(len(pairs)))
    np.random.shuffle(indices)
    trainindices = indices[:ntrain]
    testindices = indices[ntrain:]

    for subset in ("train", "test"):
        os.makedirs(str(target_folder / subset / "images"))
        os.makedirs(str(target_folder / subset / "annotations"))

    for ti in trainindices:
        xml, jpg = pairs[ti]
        shutil.copy(str(xml), str(target_folder / "train" / "annotations"))
        shutil.copy(str(jpg), str(target_folder / "train" / "images"))
    for ti in testindices:
        xml, jpg = pairs[ti]
        shutil.copy(str(xml), str(target_folder / "test" / "annotations"))
        shutil.copy(str(jpg), str(target_folder / "test" / "images"))
```
Then the carbon-copied training, starting from the pretrained YOLOv3 model.
```python
from imageai.Detection.Custom import DetectionModelTrainer

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory=datafolderstr)
trainer.setTrainConfig(object_names_array=['sheep'], batch_size=3,
                       num_experiments=160,
                       train_from_pretrained_model="pretrained-yolov3.h5")
trainer.trainModel()
```
The first results
After the first training session of 50 epochs (I am not very patient, it should have been 160), I decided to try out the model, just to get a feeling for its viability.
```
trainer.setTrainConfig(object_names_array=['sheep'], batch_size=3, num_experiments=160, train_from_pretrained_model="pretrained-yolov3.h5")
...
32/32 [==============================] - 111s 3s/step - loss: 46.8834 - yolo_layer_loss: 8.3166 - yolo_layer_1_loss: 11.0930 - yolo_layer_2_loss: 16.2352 - val_loss: 52.6401 - val_yolo_layer_loss: 10.5831 - val_yolo_layer_1_loss: 12.3149 - val_yolo_layer_2_loss: 18.5034
Epoch 50/160
10/32 [========>.....................] - ETA: 1:36 - loss: 49.7423 - yolo_layer_loss: 9.9249 - yolo_layer_1_loss: 13.0551 - yolo_layer_2_loss: 15.5236
```
Using the last resultant model in the video object detector:
```python
from imageai.Detection.Custom import CustomVideoObjectDetection

video_detector = CustomVideoObjectDetection()
video_detector.setModelTypeAsYOLOv3()
video_detector.setModelPath("sheep/models/detection_model-ex-045--loss-0046.173.h5")
video_detector.setJsonPath("sheep/json/detection_config.json")
video_detector.loadModel()
video_detector.detectObjectsFromVideo(input_file_path="videos/6rVTaOQ37T.mp4",
                                      output_file_path="videos/6rVTaOQ37T.detected.mp4",
                                      minimum_percentage_probability=30,
                                      log_progress=True)
```
A still of the resultant detection is shown below. Overall, most sheep are fairly well detected, but there are some clear artifacts. Some sheep are missed entirely, and there are double detections as well. The shown probabilities are also low, mostly hovering around the 50% mark. I imagine this would improve by actually letting the model train for the full set of epochs (and with more data).
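One post-hoc way to reduce the double detections would be a simple non-maximum suppression pass over the returned boxes: keep the highest-probability box and drop any other box that overlaps it too much. This is a generic sketch, not something from ImageAI; the `iou`/`nms` helpers and the example boxes are illustrative.

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(detections, iou_threshold=0.5):
    """Keep the highest-probability box from each overlapping cluster."""
    kept = []
    for box, prob in sorted(detections, key=lambda d: -d[1]):
        if all(iou(box, k[0]) < iou_threshold for k in kept):
            kept.append((box, prob))
    return kept

dets = [((100, 100, 200, 200), 0.55),
        ((105, 95, 205, 198), 0.48),   # near-duplicate of the first box
        ((300, 120, 380, 210), 0.51)]
survivors = nms(dets)  # the near-duplicate is suppressed
```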
There is also a video showing the initial model on actual moving sheep. It shows that the model is quite capable of following a sheep through space as well, though of course there is no actual 'tracking' happening as such yet.
For a first attempt I am extremely satisfied with the results. There are a few points to address next.
- I need to get some more annotated data, also from the night view. Perhaps even train a separate model for the night view.
- I need to let the model actually train for a full set of epochs. I'll either have to leverage some of the GPUs in my cluster or just use its raw CPU compute.
- Just for fun, I want to set up a real-time feed for each of the pens with the tracking overlay. The CCTV software I use, Shinobi, already has provisions for this.
- The most difficult points are still not addressed. There is only sheep detection in an image, no actual tracking. I am very much unsure how to address this.
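To give an idea of the simplest possible direction for the tracking point, one naive approach is to match each detection to the nearest detection of the previous frame by centroid distance, assigning a fresh id when nothing is close enough. Real trackers (SORT, DeepSORT) are far more robust; this is only a sketch with made-up boxes and thresholds.

```python
import math

def centroid(box):
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)

def assign_ids(prev, boxes, max_dist=80, next_id=0):
    """Greedily match each new box to the nearest previous centroid.

    prev: dict mapping track id -> centroid from the previous frame.
    Returns (id -> centroid dict for this frame, next unused id).
    """
    assigned = {}
    unmatched = dict(prev)
    for box in boxes:
        c = centroid(box)
        best = min(unmatched, key=lambda i: math.dist(unmatched[i], c), default=None)
        if best is not None and math.dist(unmatched[best], c) <= max_dist:
            assigned[best] = c       # same sheep, keep its id
            del unmatched[best]
        else:
            assigned[next_id] = c    # new sheep, new id
            next_id += 1
    return assigned, next_id

frame1 = [(100, 100, 200, 200), (400, 400, 500, 500)]
tracks, nid = assign_ids({}, frame1)                   # two fresh ids: 0 and 1
frame2 = [(110, 105, 210, 205), (405, 398, 505, 498)]  # both sheep moved slightly
tracks, nid = assign_ids(tracks, frame2, next_id=nid)  # ids 0 and 1 are kept
```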