Overview

This project explores image morphing: corresponding reference points are marked on two images, and those correspondences drive changes in shape and color at the pixel level. Using triangulation and per-triangle affine transforms, it supports comparing, averaging, and morphing images. Averaging over a dataset of faces also makes it possible to compute a population's mean face and to build caricatures by extrapolating from it.

Defining Correspondences

The first step is defining correspondences by selecting key points on both images with a one-to-one mapping. These points are used to create a triangulation for each image, breaking them into corresponding triangles. This setup allows us to later apply transformations to each triangle, matching the shape and form of one image to the other during the morphing process.

A good triangulation is one that maximizes the minimum angle over all of its triangles, i.e. among the possible triangulations $T$ of a set of points, we seek

\max_T \left( \min_{\triangle_i \in T} \theta_i \right)

where $\theta_i$ refers to the smallest angle in the triangle $\triangle_i$. The choice of method used to compute the triangulation therefore affects the quality of the results. For instance, a Delaunay triangulation connects the key points into triangles such that no point lies inside the circumcircle of any triangle. One common construction adds points one at a time, checks for violations of this property, and flips edges where needed to restore the Delaunay condition. In the plane, the resulting triangulation maximizes the minimum angle, creating well-proportioned triangles that avoid extremely thin angles, which helps maintain a smooth and accurate transformation.
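To make the max-min angle criterion concrete, here is a minimal sketch that compares the two possible triangulations of a single quadrilateral, which is exactly the choice an edge flip makes. The points and the helper names `triangle_angles` and `min_angle` are hypothetical and chosen only for illustration; this is not the code used in the project.

```python
import math

def triangle_angles(a, b, c):
    """Return the three interior angles (in radians) of triangle abc."""
    def angle_at(p, q, r):
        # Angle at vertex p between the edges p->q and p->r.
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        return abs(math.atan2(cross, dot))
    return angle_at(a, b, c), angle_at(b, c, a), angle_at(c, a, b)

def min_angle(points, triangles):
    """Smallest interior angle over every triangle of a triangulation."""
    return min(
        angle
        for i, j, k in triangles
        for angle in triangle_angles(points[i], points[j], points[k])
    )

# Hypothetical key points: a rhombus with a long and a short diagonal.
points = [(0.0, 0.0), (2.0, -1.0), (4.0, 0.0), (2.0, 1.0)]

# The same four points can be split along either diagonal.
long_diagonal = [(0, 1, 2), (0, 2, 3)]   # split along points 0-2
short_diagonal = [(0, 1, 3), (1, 2, 3)]  # split along points 1-3

print(math.degrees(min_angle(points, long_diagonal)))
print(math.degrees(min_angle(points, short_diagonal)))
```

Splitting along the short diagonal gives the larger minimum angle, and it is also the split that satisfies the empty-circumcircle condition for these points. In practice a library routine such as `scipy.spatial.Delaunay` can compute the triangulation for a full point set.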

For this project, the key points were marked with a labeling tool available online.

Computing the "Mid-way Face"

To minimize triangle deformations, we compute a mid-shape by averaging the corresponding points from both images. This also produces a mid-triangulation, as the resulting average points will have a one-to-one mapping with the points in either image. Warping both images into this shared mid-shape, using the average of their vector coordinates, ensures a smoother and more balanced transformation between the two.

\mathbf{P}_{\text{mid}} = \frac{1}{2} \left( \mathbf{P}_1 + \mathbf{P}_2 \right)

An affine transform combines a linear map with a translation, and can be written as a single matrix in homogeneous coordinates. We compute one such matrix per triangle, for each image, from the corresponding triangle vertices in that image and in the mid-shape. For triangle vertices $p_1$, $p_2$, $p_3$ in the source shape (the original image) and $q_1$, $q_2$, $q_3$ in the target shape (the mid-way shape), we can express the transformation as

\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}

Substituting the three vertex correspondences $p_i \mapsto q_i$ gives six linear equations in the six unknown coefficients

\begin{bmatrix} p_{1,x} & p_{1,y} & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & p_{1,x} & p_{1,y} & 1 \\ p_{2,x} & p_{2,y} & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & p_{2,x} & p_{2,y} & 1 \\ p_{3,x} & p_{3,y} & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & p_{3,x} & p_{3,y} & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix} = \begin{bmatrix} q_{1,x} \\ q_{1,y} \\ q_{2,x} \\ q_{2,y} \\ q_{3,x} \\ q_{3,y} \end{bmatrix}

Solving these equations produces the affine transformation matrix, which can then be applied to all pixels within the triangle, ensuring an accurate warping to match the target shape and location.
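The system above can be solved directly with NumPy. The following is a minimal sketch, assuming each triangle is given as three `(x, y)` vertices; the function name `compute_affine` and the example triangles are hypothetical.

```python
import numpy as np

def compute_affine(src_tri, dst_tri):
    """Return the 3x3 affine matrix that maps src_tri vertices onto dst_tri.

    src_tri, dst_tri: array-likes of shape (3, 2) holding (x, y) vertices.
    """
    src = np.asarray(src_tri, dtype=float)
    dst = np.asarray(dst_tri, dtype=float)

    # Build the 6x6 system: two rows per vertex correspondence.
    M = np.zeros((6, 6))
    for i, (x, y) in enumerate(src):
        M[2 * i] = [x, y, 1, 0, 0, 0]
        M[2 * i + 1] = [0, 0, 0, x, y, 1]

    # Right-hand side: [q1x, q1y, q2x, q2y, q3x, q3y].
    rhs = dst.reshape(-1)

    a, b, c, d, e, f = np.linalg.solve(M, rhs)
    return np.array([[a, b, c], [d, e, f], [0.0, 0.0, 1.0]])

# Hypothetical example: a unit right triangle scaled and shifted.
src = [(0, 0), (1, 0), (0, 1)]
dst = [(2, 3), (4, 3), (2, 6)]
A = compute_affine(src, dst)
print(A)
```

Note that `np.linalg.solve` raises an error when the source triangle is degenerate (its three vertices are collinear), since the system then has no unique solution.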

Inverse warping maps pixels from the destination image back to the source image. This approach ensures that each destination pixel gets an accurate value from the source, preventing gaps and distortions, which are common in forward warping. Forward warping, by contrast, pushes pixels from the source to the destination, often missing pixels or leaving gaps because not all destination pixels are hit. In inverse warping, we use the inverse of the affine matrix $A^{-1}$ to compute source coordinates for each pixel in the final image

\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = A^{-1} \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}

Since the computed coordinates $(x,\ y)$ are typically non-integer, bilinear interpolation is used to calculate the final pixel value by taking a weighted average of the four surrounding pixel values, preventing artifacts and ensuring a smooth transformation.
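The inverse warp with bilinear interpolation can be sketched for a single triangle as follows. This is an illustrative version under a few assumptions: images are float NumPy arrays indexed as `image[y, x]`, the affine matrix `A` maps source to destination, and samples falling outside the source image are clamped to its border. All function names are hypothetical.

```python
import numpy as np

def bilinear_sample(image, xs, ys):
    """Sample an (H, W) or (H, W, C) float image at real-valued (xs, ys)."""
    h, w = image.shape[:2]
    xs = np.clip(xs, 0, w - 1)  # clamp to the image border (an assumption)
    ys = np.clip(ys, 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = xs - x0  # horizontal weight of the right-hand neighbors
    fy = ys - y0  # vertical weight of the lower neighbors
    if image.ndim == 3:
        fx, fy = fx[:, None], fy[:, None]
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x1]
    bottom = (1 - fx) * image[y1, x0] + fx * image[y1, x1]
    return (1 - fy) * top + fy * bottom

def points_in_triangle(tri, height, width):
    """Return integer (xs, ys) of all pixels that lie inside the triangle."""
    tri = np.asarray(tri, dtype=float)
    ys, xs = np.mgrid[0:height, 0:width]
    xs, ys = xs.ravel(), ys.ravel()
    # Columns of T are the homogeneous vertices; solving gives barycentrics.
    T = np.vstack([tri.T, np.ones(3)])
    bary = np.linalg.solve(T, np.vstack([xs, ys, np.ones_like(xs)]))
    inside = np.all(bary >= -1e-9, axis=0)
    return xs[inside], ys[inside]

def inverse_warp_triangle(src_img, dst_img, A, dst_tri):
    """Fill dst_tri in dst_img (in place) by sampling src_img through A^-1."""
    h, w = dst_img.shape[:2]
    xs, ys = points_in_triangle(dst_tri, h, w)
    dst_coords = np.vstack([xs, ys, np.ones_like(xs)])
    src_coords = np.linalg.inv(A) @ dst_coords
    dst_img[ys, xs] = bilinear_sample(src_img, src_coords[0], src_coords[1])

# Hypothetical example: a horizontal ramp image upscaled by a factor of 2.
src = np.tile(np.arange(4, dtype=float), (4, 1))  # src[y, x] == x
A = np.array([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
dst = np.zeros((8, 8))
inverse_warp_triangle(src, dst, A, [(0, 0), (6, 0), (0, 6)])
```

Because every destination pixel inside the triangle is visited exactly once, the result has no holes. Repeating this for every triangle of the triangulation warps the whole image.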

Finally, the two warped images are averaged to produce the mid-way face.

The Morph Sequence

We can control the contribution of each image's shape and color by introducing a warp factor and a dissolve factor. The warp factor $w$ controls how much each image's shape affects the final shape, while the dissolve factor $d$ blends the color and texture from the warped images.

\mathbf{P}_{\text{final}} = (1 - w) \cdot \mathbf{P}_{1} + w \cdot \mathbf{P}_{2}

\mathbf{I}_{\text{final}} = (1 - d) \cdot \mathbf{I}_{1,\ \text{warped}} + d \cdot \mathbf{I}_{2,\ \text{warped}}

Here, $\mathbf{I}_{1,\ \text{warped}}$ and $\mathbf{I}_{2,\ \text{warped}}$ are the images warped into the shared final shape $\mathbf{P}_{\text{final}}$. By adjusting $w$ and $d$, we can smoothly transition both shape and color. This can also be used to create animations by gradually changing $w$ and $d$ over multiple frames, producing a seamless morph from one image to the other.
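The two blending formulas translate almost directly into NumPy. The sketch below assumes the key points are stored as `(N, 2)` arrays and that both images have already been warped into the intermediate shape. The linear schedule with $w = d$ is only one possible choice, and all names are hypothetical.

```python
import numpy as np

def intermediate_shape(points_1, points_2, w):
    """Blend two sets of corresponding key points with warp factor w."""
    points_1 = np.asarray(points_1, dtype=float)
    points_2 = np.asarray(points_2, dtype=float)
    return (1.0 - w) * points_1 + w * points_2

def cross_dissolve(warped_1, warped_2, d):
    """Blend two already-warped images with dissolve factor d."""
    return (1.0 - d) * warped_1 + d * warped_2

def morph_schedule(num_frames):
    """Yield one (w, d) pair per frame, moving linearly from 0 to 1."""
    for t in np.linspace(0.0, 1.0, num_frames):
        yield float(t), float(t)

# Hypothetical key points for two images.
points_a = [[0, 0], [2, 2]]
points_b = [[2, 0], [4, 6]]

for w, d in morph_schedule(5):
    shape = intermediate_shape(points_a, points_b, w)
    # Here each image would be warped into `shape` and then blended
    # with cross_dissolve(warped_a, warped_b, d) to produce one frame.
    print(w, d, shape.tolist())
```

With $w = d = 0.5$ this reduces to the mid-way face described earlier.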

The "Mean Face" of a population

Averaging faces involves warping multiple images to a common shape and blending their pixel values. This process creates a composite image that represents the average appearance of the group, capturing subtle variations while providing a unified representation.

The results below were produced with the labeled images in the Danes dataset available online, consisting of 37 images of Danish individuals.

Then I warped an image of myself into the average shape of the Danes dataset and vice versa, although I should probably not draw any conclusions based on this dataset...

And here are a few examples of faces in the dataset warped into the average Danes shape.

Caricatures - Extrapolating from the mean

By selecting a warp factor $w$ outside the range of 0 to 1, we can create caricatures of faces through extrapolation. Values greater than 1 exaggerate facial features, while negative values compress them, allowing for playful distortions.
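As a small numeric illustration of extrapolation, the sketch below reuses the shape blending formula with $w = 1.5$. It assumes, purely for the example, that the first shape is the mean and the second is the individual face; the points and the name `extrapolate_shape` are hypothetical.

```python
import numpy as np

def extrapolate_shape(mean_points, face_points, w):
    """Apply (1 - w) * mean + w * face; w outside [0, 1] extrapolates."""
    mean_points = np.asarray(mean_points, dtype=float)
    face_points = np.asarray(face_points, dtype=float)
    return (1.0 - w) * mean_points + w * face_points

# Hypothetical key points: the face deviates slightly from the mean.
mean_shape = np.array([[10.0, 10.0], [20.0, 10.0]])
face_shape = np.array([[12.0, 10.0], [20.0, 14.0]])

caricature = extrapolate_shape(mean_shape, face_shape, 1.5)

# Offsets from the mean grow by the factor w.
print(face_shape - mean_shape)
print(caricature - mean_shape)
```

Under this assumption, each key point's offset from the mean is scaled by $w$, so $w = 1.5$ makes every deviation from the mean 50% larger.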

How Spanish am I?

Average faces are available for a wide range of countries around the globe. Looking for interesting results, I morphed my face with the average male and female faces from Spain to verify how Spanish I truly am.

These were the average Spanish male and female faces, along with a portrait image of myself.

Then we follow our morphing procedure to produce the following results.

Which are just as funny as they are traumatizing. I will blame it on the inconsistency of the camera's field of view between images.

by Jorge Diaz Chao