When using photogrammetry, it is sometimes necessary to segment the object to be reconstructed in each photo. When the number of photos is high, this task can be very time-consuming. To help with this painstaking process, I tested semantic segmentation with deep learning, in particular a U-Net model.
Here I present some photos of a vertebra currently housed in the museum of Namibia under the care of Helke Mocke, as part of a project developed by Amélie Beaudet (University of Cambridge). We took a number of pictures of this vertebra in order to build a 3D model with photogrammetry. The objective is to segment the vertebra in each photo so as to build an accurate 3D model.
The code is adapted from https://keras.io/examples/vision/oxford_pets_image_segmentation/
First, we define the folders containing the photos and the associated masks. I used 153 photos for the training and validation datasets. There are three important parameters in the code below: the size of the images, the number of classes and the batch size. The images have to be square, while my original photos were 6000x4000 pixels, so they are distorted to fit into a square. Another solution would be to crop the images to a square (see the sketch after the code below). There are only two classes, bone (vertebra) and background, so the number of classes is 2. The batch size defines the number of samples propagated through the network at each step. I need to investigate this further but, in this case, I got better results with a batch size of 1.
import os
input_dir = "/media/aj/gemeaux/work_new/En_cours_new/photogrammetrie_Namibie/test_ml/input"
target_dir = "/media/aj/gemeaux/work_new/En_cours_new/photogrammetrie_Namibie/test_ml/mask"
img_size = (128, 128)  # photos and masks are resized to this square size
num_classes = 2  # bone (vertebra) vs. background
batch_size = 1  # number of samples propagated through the network at each step
input_img_paths = sorted(
[
os.path.join(input_dir, fname)
for fname in os.listdir(input_dir)
if fname.endswith(".JPG")
]
)
target_img_paths = sorted(
[
os.path.join(target_dir, fname)
for fname in os.listdir(target_dir)
if fname.endswith(".png") and not fname.startswith(".")
]
)
print("Number of segmented images:", len(input_img_paths))
Number of segmented images: 153
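As mentioned above, an alternative to distorting the 6000x4000 photos would be to crop them to a square before resizing. Here is a minimal sketch of that idea using Pillow; the helper name and the centered crop are illustrative choices, not part of the pipeline above.
from PIL import Image
def center_crop_square(path):
    """Crop the largest centered square (here 4000x4000) out of a photo."""
    img = Image.open(path)
    side = min(img.size)  # length of the shortest side
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    return img.crop((left, top, left + side, top + side))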
Below are two helper classes for loading and accessing the photos (and, for training, the masks).
from IPython.display import display
from tensorflow import keras
from tensorflow.keras.preprocessing.image import load_img
import numpy as np
import PIL
from PIL import Image, ImageOps
class BonePhotosTrain(keras.utils.Sequence):
"""Helper to iterate over the data (as Numpy arrays)."""
def __init__(self, batch_size, img_size, input_img_paths, target_img_paths):
self.batch_size = batch_size
self.img_size = img_size
self.input_img_paths = input_img_paths
self.target_img_paths = target_img_paths
def __len__(self):
return len(self.target_img_paths) // self.batch_size
def __getitem__(self, idx):
"""Returns tuple (input, target) correspond to batch #idx."""
i = idx * self.batch_size
batch_input_img_paths = self.input_img_paths[i : i + self.batch_size]
batch_target_img_paths = self.target_img_paths[i : i + self.batch_size]
x = np.zeros((self.batch_size,) + self.img_size + (3,), dtype="float32")
for j, path in enumerate(batch_input_img_paths):
img = load_img(path, target_size=self.img_size)
x[j] = img
y = np.zeros((self.batch_size,) + self.img_size + (1,), dtype="uint8")
for j, path in enumerate(batch_target_img_paths):
            img = load_img(path, target_size=self.img_size, color_mode="grayscale")
            y[j] = np.expand_dims(img, 2)
            # Mask pixels are 0 or 255: map them to the class labels 0 and 1
            y[j] = y[j] // 255
return x, y
def display_image(self, idx):
x, y = self.__getitem__(idx)
x = x[0]
x = x.astype(np.uint8)
x = Image.fromarray(x, 'RGB')
display(x)
y = y[0] * 255
y = y.astype(np.uint8)
y = Image.fromarray(y.reshape(y.shape[0], y.shape[1]), 'L')
display(y)
class BonePhotosNew(keras.utils.Sequence):
    """Helper to iterate over new photos without masks (as Numpy arrays)."""
    def __init__(self, batch_size, img_size, input_img_paths):
        self.batch_size = batch_size
        self.img_size = img_size
        self.input_img_paths = input_img_paths
def __len__(self):
return len(self.input_img_paths) // self.batch_size
    def __getitem__(self, idx):
        """Returns the input batch #idx (there are no masks for new photos)."""
        i = idx * self.batch_size
        batch_input_img_paths = self.input_img_paths[i : i + self.batch_size]
x = np.zeros((self.batch_size,) + self.img_size + (3,), dtype="float32")
for j, path in enumerate(batch_input_img_paths):
img = load_img(path, target_size=self.img_size)
x[j] = img
return x
def display_image(self, idx):
x = self.__getitem__(idx)
x = x[0]
x = x.astype(np.uint8)
x = Image.fromarray(x, 'RGB')
display(x)
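As a quick sanity check (not in the original notebook), a batch drawn from BonePhotosTrain should contain float32 photos and uint8 masks whose values are 0 (background) and 1 (bone):
check_gen = BonePhotosTrain(batch_size, img_size, input_img_paths, target_img_paths)
x, y = check_gen[0]
print(x.shape, x.dtype)  # (1, 128, 128, 3) float32
print(y.shape, np.unique(y))  # (1, 128, 128, 1) [0 1]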
Then we randomly split the image paths into a training set and a validation set (123 images for training, 30 for validation).
import random
# Split our img paths into a training and a validation set
val_samples = 30
# Shuffle both lists with the same seed so that each photo stays aligned with its mask
random.Random(len(input_img_paths)).shuffle(input_img_paths)
random.Random(len(input_img_paths)).shuffle(target_img_paths)
train_input_img_paths = input_img_paths[:-val_samples]
train_target_img_paths = target_img_paths[:-val_samples]
val_input_img_paths = input_img_paths[-val_samples:]
val_target_img_paths = target_img_paths[-val_samples:]
train_gen = BonePhotosTrain(batch_size, img_size, train_input_img_paths, train_target_img_paths)
val_gen = BonePhotosTrain(batch_size, img_size, val_input_img_paths, val_target_img_paths)
print("Example of training images:")
train_gen.display_image(0)
print("Example of validation images:")
val_gen.display_image(15)
Example of training images:
Example of validation images:
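Note that both lists are shuffled above with the same seed (len(input_img_paths), i.e. 153), which is what keeps each photo aligned with its mask. An equivalent, perhaps more explicit, approach is to shuffle the pairs together; this is only a sketch, not the code used above:
pairs = list(zip(input_img_paths, target_img_paths))
random.Random(1337).shuffle(pairs)  # any fixed seed makes the split reproducible
input_img_paths, target_img_paths = map(list, zip(*pairs))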
from tensorflow.keras import layers
def get_model(img_size, num_classes):
inputs = keras.Input(shape=img_size + (3,))
### [First half of the network: downsampling inputs] ###
# Entry block
x = layers.Conv2D(32, 3, strides=2, padding="same")(inputs)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)
previous_block_activation = x # Set aside residual
# Blocks 1, 2, 3 are identical apart from the feature depth.
for filters in [64, 128, 256]:
x = layers.Activation("relu")(x)
x = layers.SeparableConv2D(filters, 3, padding="same")(x)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)
x = layers.SeparableConv2D(filters, 3, padding="same")(x)
x = layers.BatchNormalization()(x)
x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
# Project residual
residual = layers.Conv2D(filters, 1, strides=2, padding="same")(
previous_block_activation
)
x = layers.add([x, residual]) # Add back residual
previous_block_activation = x # Set aside next residual
### [Second half of the network: upsampling inputs] ###
for filters in [256, 128, 64, 32]:
x = layers.Activation("relu")(x)
x = layers.Conv2DTranspose(filters, 3, padding="same")(x)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)
x = layers.Conv2DTranspose(filters, 3, padding="same")(x)
x = layers.BatchNormalization()(x)
x = layers.UpSampling2D(2)(x)
# Project residual
residual = layers.UpSampling2D(2)(previous_block_activation)
residual = layers.Conv2D(filters, 1, padding="same")(residual)
x = layers.add([x, residual]) # Add back residual
previous_block_activation = x # Set aside next residual
# Add a per-pixel classification layer
outputs = layers.Conv2D(num_classes, 3, activation="softmax", padding="same")(x)
# Define the model
model = keras.Model(inputs, outputs)
return model
# Free up RAM in case the model definition cells were run multiple times
keras.backend.clear_session()
# Build model
model = get_model(img_size, num_classes)
#model.summary()
2021-10-27 16:28:21.744885: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
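As a quick sanity check (not in the original code), the model can be run on a dummy batch to confirm that it outputs one softmax probability per pixel and per class:
dummy = np.zeros((1,) + img_size + (3,), dtype="float32")
print(model.predict(dummy).shape)  # (1, 128, 128, 2): per-pixel class probabilities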
We can then train the model.
# Configure the model for training.
# We use the "sparse" version of categorical_crossentropy
# because our target data is integers.
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")
callbacks = [
keras.callbacks.ModelCheckpoint("oxford_segmentation.h5", save_best_only=True)
]
# Train the model, doing validation at the end of each epoch.
epochs = 10
model.fit(train_gen, epochs=epochs, validation_data=val_gen, callbacks=callbacks)
2021-10-27 16:28:22.464964: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-10-27 16:28:22.481843: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2299965000 Hz
Epoch 1/10
123/123 [==============================] - 58s 454ms/step - loss: 0.8894 - val_loss: 0.2098
Epoch 2/10
123/123 [==============================] - 56s 457ms/step - loss: 0.1010 - val_loss: 0.3320
Epoch 3/10
123/123 [==============================] - 53s 430ms/step - loss: 0.0707 - val_loss: 0.1524
Epoch 4/10
123/123 [==============================] - 53s 432ms/step - loss: 0.0609 - val_loss: 0.0579
Epoch 5/10
123/123 [==============================] - 53s 432ms/step - loss: 0.0491 - val_loss: 0.1336
Epoch 6/10
123/123 [==============================] - 53s 431ms/step - loss: 0.0489 - val_loss: 0.2362
Epoch 7/10
123/123 [==============================] - 55s 451ms/step - loss: 0.0459 - val_loss: 0.2005
Epoch 8/10
123/123 [==============================] - 53s 430ms/step - loss: 0.0398 - val_loss: 0.0691
Epoch 9/10
123/123 [==============================] - 55s 451ms/step - loss: 0.0575 - val_loss: 0.0513
Epoch 10/10
123/123 [==============================] - 53s 431ms/step - loss: 0.0364 - val_loss: 0.0452
<tensorflow.python.keras.callbacks.History at 0x7f2b10b52d30>
We then apply the model to the validation dataset and display some of the results. Even with only 10 epochs and an image size of 128x128, the result is quite good. The computation took around 10 minutes. Of course, at the end, the predicted mask should be resized to the original size of the photos (the code is not shown here, but a sketch is given after the predictions below).
val_preds = model.predict(val_gen)
def display_mask(i):
"""Quick utility to display a model's prediction."""
    mask = np.argmax(val_preds[i], axis=-1)  # most probable class for each pixel
mask = np.expand_dims(mask, axis=-1)
img = PIL.ImageOps.autocontrast(keras.preprocessing.image.array_to_img(mask))
display(img)
for i in range(0, 10):
val_gen.display_image(i)
display_mask(i)
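To give an idea of the reshaping step mentioned above, here is a minimal sketch that resizes a predicted mask back to the 6000x4000 pixels of the original photos; the helper is hypothetical, and nearest-neighbour interpolation is used so that the mask stays binary:
def mask_to_full_size(pred, full_size=(6000, 4000)):
    """Resize a (128, 128, num_classes) prediction to the original photo size."""
    mask = np.argmax(pred, axis=-1).astype(np.uint8) * 255  # 0 = background, 255 = bone
    return Image.fromarray(mask, "L").resize(full_size, resample=Image.NEAREST)
full_mask = mask_to_full_size(val_preds[0])
full_mask.save("mask_full_size.png")  # hypothetical output file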
Finally, I applied the same code using all 153 photos for training and an image size of 1024x1024. The computation took around 5 hours and the model was applied to 468 photos. There were still some adjustments to make, but it helped a lot.
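For reference, applying the trained model to the new, unsegmented photos can be done with the BonePhotosNew class defined above. A sketch, in which the folder path is a placeholder and the checkpoint is the one saved during training:
new_dir = "/path/to/new_photos"  # placeholder for the folder with the 468 photos
new_img_paths = sorted(
    os.path.join(new_dir, fname)
    for fname in os.listdir(new_dir)
    if fname.endswith(".JPG")
)
best_model = keras.models.load_model("vertebra_segmentation.h5")  # best checkpoint saved above
new_gen = BonePhotosNew(batch_size, img_size, new_img_paths)
new_preds = best_model.predict(new_gen)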