Semantic image segmentation using a U-Net-like architecture¶

When applying/using photogrammetry, it is sometimes needed to segment the object to reconstruct it. When the number of photos is high, the task can be time-consuming. To help with this painstaking process, I tested semantic segmentation using deep learning methods, especially a U-net model.

Here I present some photos of a vertebra currently housed in the museum of Namibia under the care of Helke Mocke within the frame of a project developed by Amélie Beaudet (University of Cambridge). We took a number of pictures of this vertebra to build a 3D model with photogrammetry. The objective is to segment the vertebra in each photo in order to build an accurate 3D model.

The code is adapted from https://keras.io/examples/vision/oxford_pets_image_segmentation/

Data¶

First, we defined the folders containing the photos and the associated masks. I used 153 photos for the training and validation datasets. There are 3 important parameters in the code below: the size of the images, the number of classes and the batch size. The dimension of the images has to correspond to a square, and my original photos were 6000x4000. So, the images were deformed to fit into a square. Another solution could be to cut the images so that the dimensions could correspond to a square. I have two classes only: the bone (vertebra) or the background, so the number of classes is 2. The batch size defines the number of samples that will be propagated throughout the network. I need to investigate more but, in this case, I got better results with a batch size of 1.

In [1]:

import os

input_dir = "/media/aj/gemeaux/work_new/En_cours_new/photogrammetrie_Namibie/test_ml/input"
target_dir = "/media/aj/gemeaux/work_new/En_cours_new/photogrammetrie_Namibie/test_ml/mask"

img_size = (128, 128)
num_classes = 2
batch_size = 1

input_img_paths = sorted(
    [
        os.path.join(input_dir, fname)
        for fname in os.listdir(input_dir)
        if fname.endswith(".JPG")
    ]
)
target_img_paths = sorted(
    [
        os.path.join(target_dir, fname)
        for fname in os.listdir(target_dir)
        if fname.endswith(".png") and not fname.startswith(".")
    ]
)

print("Number of segmented images:", len(input_img_paths))

Number of segmented images: 153

Here are some classes for loading and having access to the photos and masks

In [2]:

from IPython.display import display
from tensorflow.keras.preprocessing.image import load_img
import PIL
from PIL import ImageOps
from PIL import Image
from tensorflow import keras
import numpy as np
from tensorflow.keras.preprocessing.image import load_img

class BonePhotosTrain(keras.utils.Sequence):
    """Helper to iterate over the data (as Numpy arrays)."""

    def __init__(self, batch_size, img_size, input_img_paths, target_img_paths):
        self.batch_size = batch_size
        self.img_size = img_size
        self.input_img_paths = input_img_paths
        self.target_img_paths = target_img_paths

    def __len__(self):
        return len(self.target_img_paths) // self.batch_size

    def __getitem__(self, idx):
        """Returns tuple (input, target) correspond to batch #idx."""
        i = idx * self.batch_size
        batch_input_img_paths = self.input_img_paths[i : i + self.batch_size]
        batch_target_img_paths = self.target_img_paths[i : i + self.batch_size]
        x = np.zeros((self.batch_size,) + self.img_size + (3,), dtype="float32")
        for j, path in enumerate(batch_input_img_paths):
            img = load_img(path, target_size=self.img_size)
            x[j] = img
        y = np.zeros((self.batch_size,) + self.img_size + (1,), dtype="uint8")
        for j, path in enumerate(batch_target_img_paths):
            img = load_img(path, target_size=self.img_size, color_mode="grayscale")
            y[j] = np.expand_dims(img, 2)
            y[j] = y[[j]]
            y[j] = y[j] / 255
        return x, y
    
    def display_image(self, idx):
        x, y = self.__getitem__(idx)
        x = x[0]
        x = x.astype(np.uint8)
        x = Image.fromarray(x, 'RGB')
        display(x)
        y = y[0] * 255
        y = y.astype(np.uint8)
        y = Image.fromarray(y.reshape(y.shape[0], y.shape[1]), 'L')
        display(y)


class BonePhotosNew(keras.utils.Sequence):
    """Helper to iterate over the data (as Numpy arrays)."""

    def __init__(self, batch_size, img_size, input_img_paths, target_img_paths):
        self.batch_size = batch_size
        self.img_size = img_size
        self.input_img_paths = input_img_paths
        self.target_img_paths = target_img_paths

    def __len__(self):
        return len(self.input_img_paths) // self.batch_size

    def __getitem__(self, idx):
        """Returns tuple (input, target) correspond to batch #idx."""
        i = idx * self.batch_size
        batch_input_img_paths = self.input_img_paths[i : i + self.batch_size]
        batch_target_img_paths = self.target_img_paths[i : i + self.batch_size]
        x = np.zeros((self.batch_size,) + self.img_size + (3,), dtype="float32")
        for j, path in enumerate(batch_input_img_paths):
            img = load_img(path, target_size=self.img_size)
            x[j] = img
        return x

    def display_image(self, idx):
        x = self.__getitem__(idx)
        x = x[0]
        x = x.astype(np.uint8)
        x = Image.fromarray(x, 'RGB')
        display(x)

Then we randomly define the training and validation datasets (123 images for the training and 30 images for the validation).

In [3]:

import random

# Split our img paths into a training and a validation set
val_samples = 30
random.Random(len(input_img_paths)).shuffle(input_img_paths)
random.Random(len(input_img_paths)).shuffle(target_img_paths)
train_input_img_paths = input_img_paths[:-val_samples]
train_target_img_paths = target_img_paths[:-val_samples]
val_input_img_paths = input_img_paths[-val_samples:]
val_target_img_paths = target_img_paths[-val_samples:]

train_gen = BonePhotosTrain(batch_size, img_size, train_input_img_paths, train_target_img_paths)
val_gen = BonePhotosTrain(batch_size, img_size, val_input_img_paths, val_target_img_paths)

In [4]:

print("Example of training images:")
train_gen.display_image(0)

print("Example of validation images:")
val_gen.display_image(15)

Example of training images:

Example of validation images:

The architecture model¶

Then we can define the Unet-like architecture model (the same that is used by fchollet)

In [5]:

from tensorflow.keras import layers


def get_model(img_size, num_classes):
    inputs = keras.Input(shape=img_size + (3,))

    ### [First half of the network: downsampling inputs] ###

    # Entry block
    x = layers.Conv2D(32, 3, strides=2, padding="same")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)

    previous_block_activation = x  # Set aside residual

    # Blocks 1, 2, 3 are identical apart from the feature depth.
    for filters in [64, 128, 256]:
        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.MaxPooling2D(3, strides=2, padding="same")(x)

        # Project residual
        residual = layers.Conv2D(filters, 1, strides=2, padding="same")(
            previous_block_activation
        )
        x = layers.add([x, residual])  # Add back residual
        previous_block_activation = x  # Set aside next residual

    ### [Second half of the network: upsampling inputs] ###

    for filters in [256, 128, 64, 32]:
        x = layers.Activation("relu")(x)
        x = layers.Conv2DTranspose(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.Activation("relu")(x)
        x = layers.Conv2DTranspose(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)

        x = layers.UpSampling2D(2)(x)

        # Project residual
        residual = layers.UpSampling2D(2)(previous_block_activation)
        residual = layers.Conv2D(filters, 1, padding="same")(residual)
        x = layers.add([x, residual])  # Add back residual
        previous_block_activation = x  # Set aside next residual

    # Add a per-pixel classification layer
    outputs = layers.Conv2D(num_classes, 3, activation="softmax", padding="same")(x)

    # Define the model
    model = keras.Model(inputs, outputs)
    return model


# Free up RAM in case the model definition cells were run multiple times
keras.backend.clear_session()

# Build model
model = get_model(img_size, num_classes)
#model.summary()

2021-10-27 16:28:21.744885: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

And then we can train the model

In [6]:

# Configure the model for training.
# We use the "sparse" version of categorical_crossentropy
# because our target data is integers.
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")

callbacks = [
    keras.callbacks.ModelCheckpoint("oxford_segmentation.h5", save_best_only=True)
]

# Train the model, doing validation at the end of each epoch.
epochs = 10
model.fit(train_gen, epochs=epochs, validation_data=val_gen, callbacks=callbacks)

2021-10-27 16:28:22.464964: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-10-27 16:28:22.481843: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2299965000 Hz

Epoch 1/10
123/123 [==============================] - 58s 454ms/step - loss: 0.8894 - val_loss: 0.2098
Epoch 2/10
123/123 [==============================] - 56s 457ms/step - loss: 0.1010 - val_loss: 0.3320
Epoch 3/10
123/123 [==============================] - 53s 430ms/step - loss: 0.0707 - val_loss: 0.1524
Epoch 4/10
123/123 [==============================] - 53s 432ms/step - loss: 0.0609 - val_loss: 0.0579
Epoch 5/10
123/123 [==============================] - 53s 432ms/step - loss: 0.0491 - val_loss: 0.1336
Epoch 6/10
123/123 [==============================] - 53s 431ms/step - loss: 0.0489 - val_loss: 0.2362
Epoch 7/10
123/123 [==============================] - 55s 451ms/step - loss: 0.0459 - val_loss: 0.2005
Epoch 8/10
123/123 [==============================] - 53s 430ms/step - loss: 0.0398 - val_loss: 0.0691
Epoch 9/10
123/123 [==============================] - 55s 451ms/step - loss: 0.0575 - val_loss: 0.0513
Epoch 10/10
123/123 [==============================] - 53s 431ms/step - loss: 0.0364 - val_loss: 0.0452

Out[6]:

<tensorflow.python.keras.callbacks.History at 0x7f2b10b52d30>

The results¶

And we apply the model to the validation dataset and we can displlay some results. Even with only 10 iterations and an image size of 128x128, the result is quite good. The computation took around 10 minutes. of course, at the end, the mask should be reshaped to fit the original size of the photos (code not shown here).

In [7]:

val_preds = model.predict(val_gen)

In [8]:

def display_mask(i):
    """Quick utility to display a model's prediction."""
    mask = np.argmax(val_preds[i], axis=-1)
    mask = np.expand_dims(mask, axis=-1)
    img = PIL.ImageOps.autocontrast(keras.preprocessing.image.array_to_img(mask))
    display(img)

for i in range(0, 10):
    val_gen.display_image(i)
    display_mask(i)

Finally, I applied the same code by using 153 photos in the training dataset and an image size of 1024x1024. The computation took around 5 hours and the model was applied to 468 photos. There was still some adjustments to be done but it helped a lot.