By isolating and manipulating different frequency components of an image, a wide range of visual effects can be achieved, from seamless blending to edge detection. This project explores techniques and tools such as the Fourier transform, derivative of Gaussian filters, and Laplacian stacks to filter features out of images, enhance them, or create striking visual effects.
Edge detection is essential in computer vision, as edges highlight rapid intensity changes, often marking object boundaries. A common method involves calculating the image gradient, which shows the direction and strength of these changes. Finite difference operators are an effective way to approximate derivatives and extract the gradient for edge detection.
Convolving an image with finite difference operators provides a good approximation of its gradient. Convolution with the operator D_x = [1, -1] returns the gradient in the x direction, highlighting vertical edges, while D_y = [1, -1]^T captures horizontal edges. This process shifts the kernel across the image and calculates the weighted sum of pixel intensities, estimating the local derivative. Formally, this computes the partial derivatives ∂f/∂x and ∂f/∂y, revealing the image's intensity changes in each direction.
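The finite difference convolution can be sketched as follows. This is a minimal illustration using `scipy.signal.convolve2d` with the standard D_x = [1, -1] operators; the boundary handling (`symm`) is an assumption, not necessarily the project's exact choice.

```python
import numpy as np
from scipy.signal import convolve2d

# Finite difference operators: D_x along columns, D_y along rows.
D_x = np.array([[1, -1]])
D_y = np.array([[1], [-1]])

def partial_derivatives(im):
    """Approximate the partial derivatives of a grayscale image."""
    dx = convolve2d(im, D_x, mode="same", boundary="symm")
    dy = convolve2d(im, D_y, mode="same", boundary="symm")
    return dx, dy

# A vertical step edge: the x-derivative responds, the y-derivative stays zero.
im = np.zeros((5, 5))
im[:, 2:] = 1.0
dx, dy = partial_derivatives(im)
```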
By combining the x and y gradients, we can compute the magnitude and direction of the overall gradient, providing crucial information about the edges and intensity transitions within the image. To find the edges of an image, one can first compute the gradient magnitude and then convert the gradient magnitude image to a binary image by applying a threshold.
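The magnitude-then-threshold step might look like the following sketch. The threshold value here is an illustrative placeholder; in practice it is tuned by eye.

```python
import numpy as np
from scipy.signal import convolve2d

def edge_image(im, threshold=0.25):
    """Binarize the gradient magnitude of a grayscale image.

    The threshold is an arbitrary illustrative value, not the
    project's exact parameter.
    """
    dx = convolve2d(im, np.array([[1, -1]]), mode="same", boundary="symm")
    dy = convolve2d(im, np.array([[1], [-1]]), mode="same", boundary="symm")
    magnitude = np.sqrt(dx**2 + dy**2)
    return (magnitude > threshold).astype(np.uint8)

# A vertical step edge at column 3 should be the only detected edge.
im = np.zeros((6, 6))
im[:, 3:] = 1.0
edges = edge_image(im)
```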
However, the edge image appears to have a lot of noise. To find the edges more accurately, one can apply a Gaussian filter to suppress the noise, which shows up as high-frequency patterns, effectively blurring the image, and then repeat the procedure above.
By applying the Gaussian filter, we effectively blur the original image and filter out the high frequencies that appear as noise when detecting edges. After blurring, the edges are clearer and sharper, while high-frequency areas like the grass, where the gradient is large but inconsistent, are no longer detected as edges. Furthermore, convolving the original image with the Gaussian filter and then with the derivative operators should be equivalent to convolving the original image directly with the derivative of Gaussian (DoG) filter. We can verify this by subtracting the two resulting images and checking that the result contains only zeros, which is the case. Note that to visualize the DoG-filtered image, I initially used the same convolution method, but for an exact match I applied the full convolution approach.
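The equivalence check can be demonstrated numerically. This sketch assumes a small separable Gaussian kernel and uses full convolutions, as the text notes is needed for an exact match; the kernel size and sigma are illustrative.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size=7, sigma=1.0):
    """2D Gaussian kernel as the outer product of a 1D Gaussian."""
    ax = np.arange(size) - size // 2
    g1 = np.exp(-ax**2 / (2 * sigma**2))
    g1 /= g1.sum()
    return np.outer(g1, g1)

rng = np.random.default_rng(0)
im = rng.random((16, 16))
G = gaussian_kernel()
D_x = np.array([[1, -1]])

# Route 1: blur first, then differentiate.
blurred_dx = convolve2d(convolve2d(im, G), D_x)

# Route 2: build the DoG filter first, then convolve once.
dog = convolve2d(G, D_x)
dog_dx = convolve2d(im, dog)

# With full convolutions, associativity makes the two routes match
# up to floating-point error.
```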
A fairly straightforward method to sharpen an image is to extract the high frequencies of the image and add them to the original. To produce a high-pass filter, one can use a low-pass filter with a Gaussian blur, as we did before, and subtract the low frequencies from the original image. Then, our sharpened image can be calculated as
f_sharpened = f + α(f − f ∗ g) = f ∗ ((1 + α)e − αg),
where e is the unit impulse and g is the Gaussian kernel. This equation suggests that we can apply our sharpening filter with a single convolution.
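The single-convolution claim can be checked directly. This is a minimal sketch of the unsharp mask kernel (1 + α)e − αg; the kernel size, sigma, and α = 1 are assumed values.

```python
import numpy as np
from scipy.signal import convolve2d

def sharpen_kernel(g, alpha=1.0):
    """Unsharp mask kernel (1 + alpha) * e - alpha * g, with e the unit impulse."""
    e = np.zeros_like(g)
    e[g.shape[0] // 2, g.shape[1] // 2] = 1.0
    return (1 + alpha) * e - alpha * g

# Illustrative 5x5 Gaussian kernel with sigma = 1.
ax = np.arange(5) - 2
g1 = np.exp(-ax**2 / 2)
g1 /= g1.sum()
g = np.outer(g1, g1)

rng = np.random.default_rng(1)
im = rng.random((12, 12))

# Two-step sharpening: add back the high frequencies.
blur = convolve2d(im, g, mode="same", boundary="symm")
two_step = im + 1.0 * (im - blur)

# One-step sharpening: a single convolution with the combined kernel.
one_step = convolve2d(im, sharpen_kernel(g, 1.0), mode="same", boundary="symm")
```

By linearity of convolution in the kernel, the two results agree up to floating-point error.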
Hover your mouse over the following examples to see the original version of the image.
The latter example takes a fairly clear image taken with a digital camera, blurs it, and then applies our sharpening operation. We notice that the end result gets close to the original image but is not as good. This is expected: the Gaussian filter is applied once to blur the original image and again later to extract its high frequencies, each time producing new pixel values as weighted means and irreversibly discarding detail.
Hybrid images blend the sharp details (high frequencies) of one image with the softer, broader features (low frequencies) of another. At close range, the detailed image is more noticeable, whereas from a distance, the smoother features take over. This results in a visual effect that changes depending on how far the viewer is from the image. To create a hybrid image, one can separate the high and low-frequency components of the images. The low-frequency components are obtained by applying a Gaussian filter, which smooths the image. The high-frequency components are extracted by subtracting the blurred image from the original, leaving the finer details. For example, by combining the low-frequency content of image 1 with the high-frequency details of image 2, a hybrid image is produced, where different features become visible at varying viewing distances.
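The hybrid construction described above can be sketched as follows. The cutoff sigmas are illustrative assumptions; in practice they are tuned per image pair.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size, sigma):
    ax = np.arange(size) - size // 2
    g1 = np.exp(-ax**2 / (2 * sigma**2))
    g1 /= g1.sum()
    return np.outer(g1, g1)

def hybrid(im1, im2, sigma_low=4.0, sigma_high=2.0):
    """Low frequencies of im1 plus high frequencies of im2.

    The sigmas are illustrative placeholders, not tuned values.
    """
    g_low = gaussian_kernel(int(6 * sigma_low) | 1, sigma_low)
    g_high = gaussian_kernel(int(6 * sigma_high) | 1, sigma_high)
    low = convolve2d(im1, g_low, mode="same", boundary="symm")
    high = im2 - convolve2d(im2, g_high, mode="same", boundary="symm")
    return low + high

rng = np.random.default_rng(2)
im1 = rng.random((32, 32))
im2 = np.full((32, 32), 0.5)  # flat image: contributes no high frequencies
out = hybrid(im1, im2)
```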
To visualize the different frequency content of the images, we utilize the Fourier transform, a mathematical tool that decomposes an image into its frequency components. It converts the spatial domain (the pixel values of the image) into the frequency domain, representing how fast pixel values change across the image. Low frequencies correspond to smooth, gradual variations (broad structures), while high frequencies capture sharp edges and fine details.
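A common way to inspect this is the log-magnitude of the centered 2D FFT; the sketch below assumes that convention (a small constant avoids log of zero).

```python
import numpy as np

def log_magnitude_spectrum(im):
    """Log-magnitude of the centered 2D FFT, used to inspect frequency content."""
    return np.log(np.abs(np.fft.fftshift(np.fft.fft2(im))) + 1e-8)

# A smooth horizontal ramp concentrates energy at low frequencies,
# so the spectrum peaks at its center (the DC term).
im = np.tile(np.linspace(0, 1, 32), (32, 1))
spec = log_magnitude_spectrum(im)
```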
Note that the effect in the last example is not very clear. The images are difficult to separate due to their very similar high-frequency components.
A Gaussian stack is a series of progressively blurred versions of an image, created by repeatedly applying a Gaussian filter. Each level of the stack retains the original image size but removes more high-frequency details, preserving only the broader structures. In contrast, a Laplacian stack is built by subtracting consecutive levels of the Gaussian stack. This reveals the high-frequency details lost between levels, emphasizing fine structures and edges.
To compute a Gaussian stack, one can repeatedly apply a Gaussian filter to the previous image in the stack, starting from the original image:
G_0 = original image, and G_i = G_{i−1} ∗ g for i = 1, …, N.
Then, to compute a Laplacian stack, we subtract consecutive levels of the Gaussian stack. At the final level, the Laplacian stack holds the final Gaussian level. So
L_i = G_i − G_{i+1} for i = 0, …, N−1, and L_N = G_N.
Summing all the levels of the Laplacian stack, along with the final Gaussian level, reconstructs the original image as we reintroduce the high-frequency details lost in the blurring process. Each Laplacian level captures the difference between successive Gaussian layers, so adding these differences back restores the image's full frequency range, resulting in an accurate reconstruction.
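The stack construction and the telescoping reconstruction can be sketched as follows; the kernel size, sigma, and number of levels are illustrative assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size=9, sigma=2.0):
    ax = np.arange(size) - size // 2
    g1 = np.exp(-ax**2 / (2 * sigma**2))
    g1 /= g1.sum()
    return np.outer(g1, g1)

def stacks(im, levels=4):
    """Gaussian stack G_0..G_N and Laplacian stack L_i = G_i - G_{i+1}, L_N = G_N."""
    g = gaussian_kernel()
    G = [im]
    for _ in range(levels):
        G.append(convolve2d(G[-1], g, mode="same", boundary="symm"))
    L = [G[i] - G[i + 1] for i in range(levels)] + [G[-1]]
    return G, L

rng = np.random.default_rng(3)
im = rng.random((20, 20))
G, L = stacks(im)

# The Laplacian levels telescope: summing them (the last level is the
# final Gaussian level) recovers the original image.
recon = np.sum(L, axis=0)
```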
A binary mask helps blend two images by determining which parts of each image are visible in the final result. For each level of the Laplacian stacks, the mask decides how much of each image's details contribute to the blended image. In areas where the mask is 1, we take details from the first image, and where it’s 0, we use the second image. Gradually combining the details from each image across multiple frequency levels ensures a smooth, natural blend, creating seamless transitions between the two images. For each layer in the stack we compute the blending as
blended_i = m_i · L^A_i + (1 − m_i) · L^B_i,
where m_i is the mask at level i (in practice, a Gaussian stack of the mask is used, so the transition softens at coarser levels).
Then we can reconstruct the blended image by summing the levels of our blended Laplacian stack.
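The whole blending pipeline can be sketched in one place. This is a minimal, self-contained version under assumed parameters (kernel size, sigma, levels); it also assumes the mask's transition is softened with its own Gaussian stack, a common choice.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size=9, sigma=2.0):
    ax = np.arange(size) - size // 2
    g1 = np.exp(-ax**2 / (2 * sigma**2))
    g1 /= g1.sum()
    return np.outer(g1, g1)

def blend(im1, im2, mask, levels=4):
    """Multiresolution blending of two Laplacian stacks under a blurred mask."""
    g = gaussian_kernel()

    def gstack(x):
        s = [x]
        for _ in range(levels):
            s.append(convolve2d(s[-1], g, mode="same", boundary="symm"))
        return s

    def lstack(x):
        s = gstack(x)
        return [s[i] - s[i + 1] for i in range(levels)] + [s[-1]]

    L1, L2, M = lstack(im1), lstack(im2), gstack(mask)
    # Per-level blend m_i * L1_i + (1 - m_i) * L2_i, summed to reconstruct.
    return sum(M[i] * L1[i] + (1 - M[i]) * L2[i] for i in range(levels + 1))

# Blending a white and a black image with a half-plane mask yields a
# smooth left-to-right transition.
mask = np.zeros((16, 16))
mask[:, :8] = 1.0
out = blend(np.ones((16, 16)), np.zeros((16, 16)), mask)
```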