15-463 Project 1: Images of the Russian Empire
For the first project of Computational Photography, students were asked to reconstruct color images from black and white photographs taken by Sergei Mikhailovich Prokudin-Gorskii. The original photographs were taken as three separate captures on a single glass plate. Overlaid with red, green and blue films, the three captures attempt to represent grayscale intensity of the corresponding color channels. Scans of the original plates are available from the Library of Congress' Prokudin-Gorskii archive. The results of my program running on the required images can be found here and the alignment offsets can be found here. The results of my program running on a few images I chose are here and their offsets are here. You can also view the original plate scans.
Given three aligned color channels, it is very easy to combine them into a color image using the
cat command in Matlab. The non-trivial part of this assignment lies in the ability to separate and align the channels from the scans. To separate the channels, I merely split the image into thirds vertically. As suggested on the project description page, I began by computing the L2 norm (otherwise known as sum of squared differences or SSD) for the red and green channels against the blue channel over a
The SSD technique worked very well for the small sample images, but the computational cost of this on the larger images is very high and inefficient. As per suggestion on the project description page, I recursively computed the alignment using the Gaussian pyramids for each image. The pyramid is essentially a logarithmic search over the image that first aligns on smaller versions of the two images and uses that best alignment to find a small window to search over with the larger images. With the gaussian pyramid and SSD heuristic, I was able to align all images quickly, but a little less than half of the images would align properly.
Solely running SSD over the channels is faulty since the intensity of one color channel may differ greatly from another. For example, a man wearing a highly saturated blue coat will have a intensity
1.0 on the blue channel and intensity
0.0 on the red and green channels. Such a high difference in intensity contributes negatively to the SSD matching, which caused many of the images to be heavily misaligned. While intensity of the pixels in the image may differ, the edges in each of the channels have a positive correlation in most situatinos. Subtracting a Gaussian blur of the original channel, I was able to perform simple edge detection and send the edges to the alignment algorithm. Running the program over a few images, it was evident that the sharp edge between the black and white striped borders were matched more often than the content of the actual photographs. Thus after the edge detection, I set the 10% border to zero.
After these improvements to the alignment algorithm, eight out of the twenty images were still misaligned. I found that when I was performing the pyramid search, the window for more fine searching was too constrained. Increasing the window size from
[-2,2], I then correctly aligned eighteen of the twenty images.
For the two images which had incorrect alignment, the algorithm chose to explore a local minima in one of the small images which did not lead to the global minimum SSD. Interestingly, in the
01043u.tif image the red channel found the correct offset but the green channel did not. One possible solution for this problem would be to apply a median filter on the edges to remove the excess edge noise. It appeared that this noise caused the incorrect window exploration.
I had some additional difficulties using Matlab for the first time. Richness of the Matlab libraries made it both easier and harder to program this project. With so many functions, a lot of the functionality that I need to code has already been implemented for me. Knowing what image and matrix manipulation functions I must write, however, is made difficult from Matlab's wealth of library functions since my knowledge of these libraries is so adolescent. Among other things, I was confused by the interchangeable use of matrix and pixel access, where
M(row, col) is the same as
M(y, x), and the use of column-major indices.