CS 180

by Jorge Diaz Chao

Colorizing the Prokudin-Gorskii photo collection

Overview

Sergei Mikhailovich Prokudin-Gorskii was a Russian photographer who pioneered color photography in the early 20th century. He captured thousands of color images, each created by taking three exposures of a scene on a glass plate through red, green, and blue filters. These plates were later digitized and made publicly available, with the blue, green, and red exposures stacked from top to bottom in each scan. This project explores aligning the color plates, along with other image processing techniques, to restore and enhance the quality of the images.

Color Alignment

The photos in the dataset consist of blue, green, and red plates stacked vertically, so before alignment, the original photos are divided into thirds. Each photo is imported as a two-dimensional matrix, where float values between 0 and 1 represent the intensity of the red, green, or blue channels. To separate the color channels, we divide the height of the matrix by 3 and assign each third of the matrix to b, g, and r from top to bottom, representing the blue, green, and red plates, respectively. Note that later, these are stacked in the more common order: red, green, and blue.
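The split described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function name split_plates and the synthetic input are assumptions, and any rows left over when the height is not divisible by 3 are simply dropped.

```python
import numpy as np

def split_plates(plate_stack):
    """Split a vertically stacked B, G, R scan into three equal-height plates."""
    height = plate_stack.shape[0] // 3  # integer third; leftover rows are dropped
    b = plate_stack[:height]
    g = plate_stack[height:2 * height]
    r = plate_stack[2 * height:3 * height]
    return b, g, r

# Synthetic 9x4 "scan" with float values in [0, 1], standing in for a real plate image
scan = np.linspace(0.0, 1.0, 36).reshape(9, 4)
b, g, r = split_plates(scan)

# Restack in the more common red, green, blue order
rgb = np.dstack([r, g, b])
```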

When stacking the plates, it becomes apparent that they do not align properly, resulting in a colored but blurry photo. However, aligning these plates is not too difficult. While color intensities may vary locally across different areas of the photo, they generally follow similar shapes. One approach is to calculate the difference between two color channels and score this difference to assess how well the plates are aligned. There are several methods to compare two photos as two-dimensional matrices. Two such methods are:

Sum of Squared Differences (SSD)

np.sum((image1 - image2) ** 2)

Normalized Cross-Correlation (NCC)

def ncc(a, b):
    # Flatten each image and scale it to unit length before taking the dot product
    na = a.ravel() / np.linalg.norm(a)
    nb = b.ravel() / np.linalg.norm(b)
    return np.dot(na, nb)

Because both images are normalized, NCC is less sensitive to differences in overall intensity between plates, which makes it particularly useful when dealing with varying illumination and texture. In contrast, SSD involves simpler operations and is more appropriate when the plates are uniformly exposed and speed is a priority. NCC was used to produce the results. The approach compares translated versions of a plate against a reference and keeps the translation with the best score (the highest NCC, or the lowest SSD). For simplicity, the red and green plates were each translated horizontally and vertically over an arbitrarily chosen window of [-15, 15] pixels and compared to the blue plate, which served as the reference.
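The exhaustive search described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the results: the function name align and its parameters are hypothetical, and np.roll is used for translation, which wraps pixels around the image border instead of discarding them.

```python
import numpy as np

def ncc(a, b):
    """Dot product of the two images after scaling each to unit length."""
    na = a.ravel() / np.linalg.norm(a)
    nb = b.ravel() / np.linalg.norm(b)
    return np.dot(na, nb)

def align(plate, reference, window=15):
    """Return the (dy, dx) shift of `plate` that maximizes NCC against `reference`."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            # np.roll wraps around the border (an assumption of this sketch)
            shifted = np.roll(plate, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

# Synthetic check: displace a random image, then recover the shift that undoes it
rng = np.random.default_rng(0)
reference = rng.random((40, 40))
plate = np.roll(reference, (-3, 5), axis=(0, 1))
shift = align(plate, reference, window=6)
print(shift)  # (3, -5)
```

The returned shift can then be applied to the red or green plate with the same np.roll call before stacking it with the blue reference.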

For higher-resolution images, a fixed window size may not be suitable: the misalignment can span many more pixels, so a 15-pixel window is too small relative to the image's height and width. A naive approach would be to increase the window size, but this is inefficient, since the number of candidate translations grows as O(n^2) in the window size n and each candidate requires scoring the full image, leading to long processing times. An alternative is to use an image pyramid, which involves downsizing the original photo (or set of photos) by a scaling factor and working with the lower-resolution images first before moving to the higher-resolution ones. Aligning the lower-resolution images gives an initial estimate of the translation. This estimate is then multiplied by the pyramid's scaling factor and used as the starting point for a much smaller search when the algorithm is re-run on the higher-resolution images.
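A coarse-to-fine search along these lines might look like the sketch below. The scaling factor of 2, the 2x2 block averaging used for downsizing, the function names, and the parameters min_size, coarse_radius, and refine_radius are all assumptions made for illustration; the source does not state which values or resizing method were used.

```python
import numpy as np

def ncc(a, b):
    na = a.ravel() / np.linalg.norm(a)
    nb = b.ravel() / np.linalg.norm(b)
    return np.dot(na, nb)

def search(plate, reference, center, radius):
    """Exhaustive NCC search over shifts within `radius` of `center`."""
    best_score, best_shift = -np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            score = ncc(np.roll(plate, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def downscale(image):
    """Halve the resolution by averaging 2x2 blocks (a simple stand-in for resizing)."""
    h, w = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
    return image[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid_align(plate, reference, min_size=32, coarse_radius=15, refine_radius=2):
    """Estimate the shift at low resolution, then refine it at each finer level."""
    if min(plate.shape) <= min_size:
        # Coarsest level: full window search
        return search(plate, reference, (0, 0), coarse_radius)
    dy, dx = pyramid_align(downscale(plate), downscale(reference),
                           min_size, coarse_radius, refine_radius)
    # Scale the coarse estimate by the pyramid factor (2) and search a small window around it
    return search(plate, reference, (2 * dy, 2 * dx), refine_radius)

rng = np.random.default_rng(1)
reference = rng.random((128, 128))
plate = np.roll(reference, (-8, 12), axis=(0, 1))
shift = pyramid_align(plate, reference)
```

With these settings, only the coarsest level runs the full 31x31 search; each finer level scores just 25 candidates around the scaled estimate.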

Edge Alignment

The alignment technique faces challenges due to differences in illuminance between the color plates, which can lead to poor scoring even if the photos are correctly aligned. To address this, edge detection is used to focus on the structural features of the images rather than their color intensity.

The Canny edge detection method is effective in this context. It begins by smoothing the image to remove noise, then detects areas with significant intensity changes to identify edges. These edges are refined by thinning them and applying threshold levels to classify them as strong or weak. Weak edges are connected to strong ones if they belong to the same boundary, resulting in a clear outline of the shapes in the image.

Using the edge-detected images, which have the same dimensions as the originals, allows for more accurate alignment. The transformations inferred from these edge-detected images are then applied to the original photos. This method ensures that alignment is based on consistent structural features, which helps improve accuracy despite variations in illuminance.
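The idea of scoring on edges and applying the result to the original plates can be sketched as below. To keep the example dependent only on NumPy, a plain gradient-magnitude edge map is swapped in for Canny here; it omits the smoothing, thinning, and thresholding steps described above, and a real Canny implementation from an image processing library could be substituted for edge_map. All function names are hypothetical.

```python
import numpy as np

def ncc(a, b):
    na = a.ravel() / np.linalg.norm(a)
    nb = b.ravel() / np.linalg.norm(b)
    return np.dot(na, nb)

def edge_map(image):
    """Gradient magnitude: a simplified stand-in for Canny, not Canny itself."""
    gy, gx = np.gradient(image)
    return np.hypot(gx, gy)

def align_by_edges(plate, reference, window=15):
    """Find the best (dy, dx) shift by scoring edge maps instead of raw intensities."""
    plate_edges, reference_edges = edge_map(plate), edge_map(reference)
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(plate_edges, (dy, dx), axis=(0, 1))
            score = ncc(shifted, reference_edges)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

# The synthetic plate is a displaced copy with different brightness and contrast
rng = np.random.default_rng(2)
reference = rng.random((64, 64))
plate = 0.3 * np.roll(reference, (4, -2), axis=(0, 1)) + 0.5
shift = align_by_edges(plate, reference, window=5)

# The shift inferred from the edge maps is applied to the original plate
aligned = np.roll(plate, shift, axis=(0, 1))
```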

Results

Cropping

The results may include unwanted borders due to the black edges of the plates and residual misalignment. To address this, a cropping algorithm can be developed to remove unnecessary empty space around the main content of an image. The process begins by detecting edges in the image, which highlights areas with significant changes in color or brightness. Once the edges are identified, the algorithm starts from a smaller region inset from the borders of the image. It then scans outward from this region to determine where the important content begins, based on the strength of the edges. After identifying the boundaries of the main content, the algorithm crops the image to that area, removing the excess borders. The result is a more tightly framed image that highlights the subject without the distraction of unwanted borders.
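One possible reading of this procedure is sketched below. The details are assumptions, since the text does not specify them: edge strength is taken as the gradient magnitude, each row and column is summarized by its mean edge strength, and each side is cropped just inside the first line whose mean exceeds a fraction of the strongest line. The names auto_crop, inset, and rel_threshold are hypothetical.

```python
import numpy as np

def edge_strength(image):
    """Gradient magnitude, a simple stand-in for a full edge detector."""
    gy, gx = np.gradient(image)
    return np.hypot(gx, gy)

def scan_outward(profile, start, step, threshold):
    """Walk from `start` toward a wall; return the first index above `threshold`, or None."""
    i = start
    while 0 <= i < len(profile):
        if profile[i] > threshold:
            return i
        i += step
    return None

def auto_crop(image, inset=0.25, rel_threshold=0.5):
    """Crop each side at the first strong edge line found scanning outward from an inset box."""
    edges = edge_strength(image)
    rows, cols = edges.mean(axis=1), edges.mean(axis=0)
    h, w = image.shape
    row_t, col_t = rel_threshold * rows.max(), rel_threshold * cols.max()
    top = scan_outward(rows, int(h * inset), -1, row_t)
    bottom = scan_outward(rows, int(h * (1 - inset)), 1, row_t)
    left = scan_outward(cols, int(w * inset), -1, col_t)
    right = scan_outward(cols, int(w * (1 - inset)), 1, col_t)
    # Keep the full extent on any side where no strong line was found
    top = 0 if top is None else top + 1
    bottom = h if bottom is None else bottom
    left = 0 if left is None else left + 1
    right = w if right is None else right
    return image[top:bottom, left:right]

# Synthetic single-channel image: a flat gray interior surrounded by a 5-pixel black border
framed = np.zeros((60, 60))
framed[5:55, 5:55] = 0.5
cropped = auto_crop(framed)
```

Because the crop lands just inside the detected line, this sketch also trims one row or column of content on each side; a color image would need the edge map computed on a single combined channel first.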

Note that these unwanted borders act as noise for the scoring methods, so the plate images were cropped by a factor of 2 before running the alignment.