Abstract—Deep Learning has opened up numerous possibilities for prediction and data analysis in fields such as Computer Science, Medical Systems, Geographical Systems and Stock Market Analysis. In the early age of photography, most images were captured in B&W, and today they are colored manually using software such as Photoshop and 3D Film Maker. Here, we address the problem of coloring black and white images. This is a difficult problem that normally requires manual adjustment to achieve artifact-free quality, as well as careful selection of reference images and hours of patience. We aim to develop a high-quality, fully automatic colorization method that uses modern deep learning techniques to colorize black and white photos. Inspired by the recent success of deep learning techniques, which provide powerful modeling of large-scale data, this paper reformulates the colorization problem so that deep learning techniques can be directly employed. In this work, we use a pre-trained neural network (VGG) as a base network to extract features, and hypercolumn vectors to store the information about the image as it passes through the CNN. The colored image is produced in the YUV colorspace.

Keywords—CNN, Neural Network, Image Colorization, YUV, Hypercolumns

1. Introduction

Automated colorization of black and white images has been the subject of much research within the computer vision and machine learning communities. Image colorization assigns a color to each pixel of a target grayscale image. One approach is to use computer software to color the images manually. This requires substantial effort from the user, who must provide considerable scribbles on the target grayscale image, making it time-consuming to colorize. Another way is to provide an example image from which the color information is transferred to the target image.
However, finding a suitable reference image is itself a major challenge.

Deep Learning techniques have already achieved huge success given the availability of large amounts of data, and can even outperform humans to some extent. They have been blossoming in the fields of Computer Vision and Image Processing, where they already perform complex tasks such as object detection, classification and segmentation. Here, image colorization is cast as a regression problem, and we aim to use deep neural networks to solve it, leveraging the large database of images readily available on the internet.

Generally, it is very difficult to achieve exact coloring of an image, as there is a huge loss of information, so we cannot expect accuracy as high as in a classification problem. For example, a person wearing a greyish shirt in the grayscale image could be wearing red, green, yellow, pink or any mixture of these colors. However, objects whose color is generally known can be expected to colorize well: we can expect the skies to be blue and the grass to be green. Partial colorization is also expected for portraits, recovering skin tone and black hair, as is often seen. The same cannot be said of objects whose color varies freely, such as cars.

Even in the digital era, B&W images form a large fraction of the total number of images, most of them taken back in the 19th century. These images are important and need to be preserved, as they may contain historical and other kinds of information. The early age of filmography was captured in black and white, and decaying photographs are usually colored manually to preserve them. Coloring them using deep learning is an appropriate solution that could save hours of manual labour.
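The approach described in this paper predicts chrominance in the YUV colorspace while keeping the input luminance. A minimal sketch of the RGB-to-YUV conversion involved is shown below; the BT.601 coefficients are an assumption, since the paper does not state which YUV variant is used:

```python
import numpy as np

# BT.601 RGB -> YUV conversion matrix (assumed variant).
RGB2YUV = np.array([
    [ 0.299,    0.587,    0.114],
    [-0.14713, -0.28886,  0.436],
    [ 0.615,   -0.51499, -0.10001],
])

def rgb_to_yuv(rgb):
    """Convert an (H, W, 3) RGB array in [0, 1] to YUV."""
    return rgb @ RGB2YUV.T

def yuv_to_rgb(yuv):
    """Invert the linear transform to recover RGB."""
    return yuv @ np.linalg.inv(RGB2YUV).T

gray = np.array([[[0.5, 0.5, 0.5]]])  # a single neutral gray pixel
yuv = rgb_to_yuv(gray)
# For a neutral gray pixel, U and V are (near) zero:
# only the luminance channel Y carries information.
```

This is exactly why a grayscale image can serve directly as the Y channel: colorization then only has to predict the two chrominance channels.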
The proposed system takes as input a B&W image, which is analysed and from which features are extracted. The extracted features are represented in the form of a hypercolumn vector, which contains all the necessary information accumulated as the image passes from the initial layer to the final layer of the CNN. Unwanted information is then left out, and the remaining information is passed through another CNN which outputs the chrominance (UV) values of the YUV colorspace. The luminance (Y) is the same as that of the input image. The ultimate objective of the proposed system is to analyse the input image and produce a pixel-perfect colored image; the overall objective of the project is subject to the success of the three phases of the system.

Our project was inspired by Intra-prediction for Color Image Coding Using YUV Correlation, published by Luís F. R. Lucas, Nuno M. M. Rodrigues, Sérgio M. M. de Faria, Eduardo A. B. da Silva, Murilo B. de Carvalho and Vitor M. M. da Silva [2], and other inspirational work in Deep Learning.

2. Intra-prediction for Color Image Coding Using YUV Correlation

The authors propose that, after RGB-to-YUV conversion, some correlation remains between the chrominance and luminance components, which can be captured by a linear chrominance prediction model [1]:

Ĉ(x, y) = α · L′(x, y) + β

where Ĉ(x, y) represents the predicted chrominance and L′(x, y) the reconstructed luminance. The parameters α and β are calculated by the least-mean-square method from the cross-covariance R_{L′,C}(x1, y1; x2, y2) between L′(x1, y1) and C(x2, y2) [2].

2.1. The MMP Algorithm with Chroma Prediction

The authors use a generic MMP algorithm, normally used for lossy data compression, and modify it with chroma prediction. The algorithm uses a set of prediction modes based on H.264/AVC and least-squares prediction. The chroma prediction is calculated from previously reconstructed luminance and chrominance components.
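The linear chrominance prediction model above amounts to an ordinary least-squares fit of chroma against luma. A minimal sketch, using synthetic data for illustration only:

```python
import numpy as np

def fit_chroma_from_luma(luma, chroma):
    """Fit the linear model C^ = alpha * L' + beta by ordinary
    least squares, as in the chrominance predictor of Section 2."""
    L = luma.ravel()
    C = chroma.ravel()
    # Design matrix with a constant column for the offset beta.
    A = np.stack([L, np.ones_like(L)], axis=1)
    (alpha, beta), *_ = np.linalg.lstsq(A, C, rcond=None)
    return alpha, beta

# Synthetic check: chroma generated exactly from luma should
# recover the coefficients used to generate it.
rng = np.random.default_rng(0)
L = rng.uniform(0, 1, size=(8, 8))
C = 0.4 * L + 0.1
alpha, beta = fit_chroma_from_luma(L, C)
```

In a codec, the fit would be computed over previously reconstructed neighbouring samples rather than the full block, but the regression itself is the same.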
The MMP-based color-coding method first encodes the Y component using a variation of the MMP algorithm; the chroma components are then encoded using 5 prediction modes.

3. Predicting Chroma from Luma with Frequency Domain Intra Prediction

According to the authors, the chrominance plane can be predicted from the reconstructed luminance plane in the frequency domain. The authors design a frequency-domain intra predictor for chroma that exploits the same local correlation with lower complexity than the spatial predictor, and which works with lapped transforms. They propose the linear chrominance predictor [3]

Ĉ = α · L′ + β

where α and β are calculated by linear least-squares regression.

Figure 1. Comparison of (a) the original uncompressed image with composite images of (b) reconstructed luma and predicted chroma using experimental Daala intra modes and (c) reconstructed luma and predicted chroma using FD-CfL.

3.1. Extension to Frequency Domain

With a lapped transform, the reconstructed pixel data is not available. The product of the linear pre-filter and the linear forward DCT gives the coefficients in the lapped frequency domain, where α_DC and β_DC are computed using linear regression.

3.2. PVQ with Frequency Bands

N×N blocks are considered together as a single vector input for PVQ prediction. Consider the frequency band structure currently used by Daala, shown in Figure 2. The PVQ-CfL technique is modified to work with any arbitrary partitioning of the block coefficients into bands.

Figure 2. The band structure of 4×4, 8×8 and 16×16 blocks in Daala.

4. ImageNet Classification with Deep Convolutional Neural Networks

The authors propose a deep convolutional neural network that classifies images into 1000 different classes. It consists of 60 million parameters, 650,000 neurons and 5 convolutional layers. The architecture of the model is given in Figure 3. The network is made up of 5 convolutional layers, max-pooling layers, dropout layers, and 3 fully connected layers.
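The spatial sizes flowing through such a stack follow the standard layer arithmetic out = (in + 2·pad − kernel) / stride + 1. A small sketch using AlexNet's first-layer values from the paper (the padding of 2 is an assumption taken from common implementations):

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer:
    out = (in + 2*pad - kernel) // stride + 1."""
    return (size + 2 * pad - kernel) // stride + 1

# AlexNet's opening layers: 224x224 input, 11x11 conv with
# stride 4 (pad 2 assumed), then 3x3 max-pooling with stride 2.
s = conv_out(224, 11, stride=4, pad=2)  # 55x55 feature map
s = conv_out(s, 3, stride=2)            # 27x27 after pooling
```

The same formula explains why later architectures preserve spatial size inside a stage: a 3×3 kernel with stride 1 and pad 1 leaves `size` unchanged.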
It used data augmentation techniques consisting of image translations, horizontal reflections, and patch extractions. A dropout layer was implemented to provide regularization and combat overfitting. To introduce non-linearity, the authors used the ReLU activation function [4]. The network achieved a top-5 test error rate of 15.4%.

Figure 3. Architecture of CNN

5. VGG-16

VGG16 was based on the principles of simplicity and depth. VGG Net reinforced the notion that convolutional neural networks need a deep network of layers in order for this hierarchical representation of visual data to work. The authors (Karen Simonyan and Andrew Zisserman of the University of Oxford) created a 19-layer CNN that used 3×3 filters with stride and padding of 1, along with 2×2 max-pooling layers with stride 2. The spatial size of the volume at each layer decreases because of the successive pooling, but the depth of the volumes increases due to the growing number of filters as you go down the network: after each max-pool layer the number of filters doubles. This reinforces the idea of shrinking spatial dimensions while growing depth.

Figure 4. Architecture of VGG16

6. Fast R-CNN

The paper proposes a Fast Region-based Convolutional Neural Network method (Fast R-CNN) for object detection. The authors proposed this novel technique in 2015; it uses Selective Search to generate region proposals and a network that uses these proposals to detect objects. The algorithm proceeds by generating 2000 different regions that have the highest probability of containing an object. These region proposals are then "warped" into an image size that can be fed into a trained CNN. Using a pretrained model avoids the need to train a CNN ourselves.

Figure 5. Fast R-CNN workflow

The trained model then extracts a feature vector for each region. This vector is used as the input to a set of linear SVMs that are trained for each class and output a classification.

7. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Faster R-CNN is an object detection and classification model developed by Shaoqing Ren, Kaiming He, Ross Girshick and Jian Sun at Microsoft, aimed at real-time object detection. The authors introduced the novel idea of Region Proposal Networks that share convolutional layers with state-of-the-art object detection networks, so that the marginal cost of computing proposals is very small.

Inputs: images. Outputs: classifications and bounding-box coordinates of the objects in the images.

Faster R-CNN has two networks: a region proposal network (RPN) for generating region proposals, and a network that uses these proposals to detect objects.

7.1. Algorithm

1. Use a pretrained CNN model and extract feature maps from the last convolutional layer.
2. Train a region proposal network that decides whether there is an object at a given location in the image; the same network also outputs box locations.
3. Pass these proposals to a Region of Interest (ROI) pooling layer, such as a Spatial Pyramid Pooling layer.
4. The outputs of the previous step are fixed boxes in the image that contain objects; they are then sent to a fully connected layer for classification.

8. Selective Search

Selective Search looks at the image through windows of different sizes and, for each size, tries to group adjacent pixels together according to different features.
These features could be texture, color, size, shape or intensity, and are used to identify objects. Selective Search starts by over-segmenting the image based on pixel intensity, using the graph-based segmentation method of Felzenszwalb and Huttenlocher.

Hierarchical Grouping Algorithm
Input: (colour) image
Output: set of object location hypotheses L
  Obtain initial regions R = {r1, ..., rn}
  Initialise similarity set S = ∅
  foreach neighbouring region pair (ri, rj) do
    Calculate similarity s(ri, rj)
    S = S ∪ {s(ri, rj)}
  while S ≠ ∅ do
    Get highest similarity s(ri, rj) = max(S)
    Merge corresponding regions: rt = ri ∪ rj
    Remove similarities regarding ri: S = S \ s(ri, r*)
    Remove similarities regarding rj: S = S \ s(r*, rj)
    Calculate similarity set St between rt and its neighbours
    S = S ∪ St
    R = R ∪ {rt}
  Extract object location boxes L from all regions in R

9. Region Proposal Network

The input image is passed through a convolutional network which generates a set of convolutional feature maps; each feature map is produced by units sharing the same weights and biases. A sliding window of size n×n is then run spatially over these feature maps. For each sliding-window position, a set of 9 anchors is generated, all sharing the same center (xa, ya) but having 3 different aspect ratios and 3 different scales. For each of these anchors, a value p* is computed, indicating how much the anchor overlaps with the ground-truth bounding boxes. Finally, the 3×3 spatial features extracted from the convolutional feature maps are fed to a smaller regression network, whose output determines a predicted bounding box (x, y, w, h).

10. Hypercolumns for Object Segmentation and Fine-grained Localization

Algorithms typically use features from the last fully-connected (FC) layer to extract information about an input. However, the last FC layer does not provide precise localization, whereas the initial layers lack semantic information.
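The anchor scheme of the region proposal network described in Section 9 (9 anchors per sliding-window position, from 3 scales and 3 aspect ratios) can be sketched as follows; the scale and ratio values are those reported in the Faster R-CNN paper:

```python
import numpy as np

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate the 9 anchors for one sliding-window position:
    3 scales x 3 aspect ratios, all sharing the centre (cx, cy)."""
    anchors = []
    for s in scales:
        for r in ratios:
            # Keep the anchor area close to s*s while varying h/w = r.
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            anchors.append((cx, cy, w, h))
    return anchors

anchors = make_anchors(0, 0)
# 9 anchors; each preserves its scale's area while changing shape.
```

In the full network, these anchors are replicated at every feature-map location, and the regression branch refines each one into a predicted box (x, y, w, h).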
In order to extract the maximum information, the authors define the idea of hypercolumns: the hypercolumn of a pixel is the vector of activations of all CNN units above that pixel. The input image is first passed through a CNN and a feature map is extracted for each location in the image. Bilinear upsampling is applied so that each activation feature map has the same size as the input image. The fully-connected layer activations are viewed as 1×1 feature maps, meaning that all locations share the same information for the fully-connected part of the hypercolumn. These activations are then stacked into a vector representation, which forms the hypercolumn. The hypercolumn contains information from the initial layers (spatial information) as well as the final layers (semantic information). This representation can then be used for segmentation and localization.

Figure 6. The hypercolumn representation

11. Colorful Image Colorization

The authors propose a fully automatic approach that produces vibrant and realistic colorizations. They train a CNN to map a grayscale input to a distribution over quantized color values. Given the luminance channel L, the system predicts the corresponding a and b channels of the image in the CIE Lab color space. The authors use a classification loss, with rebalancing of rare classes, and evaluate their algorithm with a "colorization Turing test", asking human participants to choose between a generated and a ground-truth image.

Figure 7. Network Architecture

11.1. Class Rebalancing

Due to the prevalence of backgrounds such as clouds, pavement, walls and dirt, the distribution of ab values is strongly biased: the number of pixels in natural images at desaturated values is orders of magnitude higher than at saturated values, so the loss function is dominated by desaturated ab values. The authors account for this class imbalance by reweighting the loss of each pixel at training time.
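The class-rebalancing scheme of Section 11.1 can be sketched as follows. The inverse-frequency form and the mixing parameter λ = 0.5 follow the Colorful Image Colorization paper; the 4-bin toy distribution is purely illustrative (the paper uses Q = 313 quantized ab bins):

```python
import numpy as np

def rebalancing_weights(p_tilde, lam=0.5):
    """Per-bin loss weights: mix the empirical colour distribution
    p~ with a uniform one, invert, and normalise so that the
    expected weight under p~ equals 1."""
    Q = p_tilde.size
    w = 1.0 / ((1 - lam) * p_tilde + lam / Q)
    w /= (p_tilde * w).sum()  # enforce E_{p~}[w] = 1
    return w

# A toy 4-bin distribution dominated by desaturated colours:
p = np.array([0.7, 0.2, 0.06, 0.04])
w = rebalancing_weights(p)
# Rare (saturated) bins receive larger weights than frequent ones,
# so they are not drowned out in the training loss.
```

Each pixel's loss is then multiplied by the weight of its nearest ab bin, counteracting the bias toward desaturated values.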
12. Image Colorization with Deep Convolutional Neural Networks

The authors propose a convolutional-neural-network-based system that faithfully colorizes black and white photographic images without direct human assistance. During training, the algorithm reads images in the RGB color space and converts them into the CIELUV color space. The black and white luminance channel L is fed to the model as input, and the U and V channels are predicted as output. The authors use the rectified linear unit, defined as f(x) = max(0, x), as the non-linearity following each of the convolutional and dense layers. There is a batch normalization layer before every non-linearity, apart from the last few layers before the output.

Figure 8. Regression network schematic

13. References

1. Lee, Sang Heon, and Nam Ik Cho. "Intra prediction method based on the linear relationship between the channels for YUV 4:2:0 intra coding." Image Processing (ICIP), 2009 16th IEEE International Conference on. IEEE, 2009.
2. Lucas, Luís F. R., et al. "Intra-prediction for color image coding using YUV correlation." Image Processing (ICIP), 2010 17th IEEE International Conference on. IEEE, 2010.
3. Egge, Nathan E., and Jean-Marc Valin. "Predicting chroma from luma with frequency domain intra prediction." Visual Information Processing and Communication VI. Vol. 9410. International Society for Optics and Photonics, 2015.
4. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
5. Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015.
6. Zhang, Richard, Phillip Isola, and Alexei A. Efros. "Colorful image colorization." European Conference on Computer Vision. Springer International Publishing, 2016.
7. Hwang, Jeff, and You Zhou. "Image Colorization with Deep Convolutional Neural Networks."
8. Hariharan, Bharath, et al. "Hypercolumns for object segmentation and fine-grained localization." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
9. Black, Green Magenta Red Blue. "YUV color space." Communications Engineering Desk Reference 469 (2009).
10. Uijlings, Jasper R. R., et al. "Selective search for object recognition." International Journal of Computer Vision 104.2 (2013): 154-171.
11. Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems. 2015.

