This paper investigates the colorization problem, which converts a grayscale image into a colorful version. This is a very difficult problem and normally requires manual adjustment to achieve artifact-free quality; for instance, it typically requires human-labelled color scribbles on the grayscale target image or a careful selection of colorful reference images (e.g., images capturing the same scene as the grayscale target). Unlike previous methods, this paper aims at a high-quality, fully automatic colorization method. Under the assumption of a perfect patch matching technique, the use of an extremely large-scale reference database (containing sufficient color images) is the most reliable solution to the colorization problem. In practice, however, patch matching noise increases with the size of the reference database. Inspired by the recent success of deep learning techniques in modeling large-scale data, this paper reformulates the colorization problem so that deep learning techniques can be directly employed. To ensure artifact-free quality, a post-processing step based on joint bilateral filtering is proposed. We further develop an adaptive image clustering technique to incorporate global image information. Numerous experiments demonstrate that our method outperforms the state-of-the-art algorithms in terms of both quality and speed.
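The joint bilateral filtering step described above can be illustrated with a minimal sketch: the predicted chrominance channel is smoothed with weights driven by the grayscale guide image, so filtering does not bleed color across luminance edges. The function name, window radius, and sigma parameters below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def joint_bilateral_filter(chroma, guide, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Smooth a predicted chrominance channel; range weights come from the
    grayscale guide image rather than the channel being filtered, so color
    does not cross luminance edges (toy sketch, not the paper's code)."""
    h, w = guide.shape
    out = np.zeros_like(chroma)
    # Precompute the spatial Gaussian kernel over the window.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    g = np.pad(guide, radius, mode='edge')
    c = np.pad(chroma, radius, mode='edge')
    for y in range(h):
        for x in range(w):
            gp = g[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            cp = c[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            # Range weights: similarity in the *guide* image.
            rng = np.exp(-(gp - guide[y, x])**2 / (2 * sigma_r**2))
            wgt = spatial * rng
            out[y, x] = (wgt * cp).sum() / wgt.sum()
    return out
```

Because the weights are normalized, a constant chrominance channel passes through unchanged, while noisy chrominance is smoothed only within regions of similar gray level.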
Numerous efforts have been made to design various low-level saliency cues for RGBD saliency detection, such as color and depth contrast features as well as background and color compactness priors. However, how these low-level saliency cues interact with each other and how they can be effectively incorporated to generate a master saliency map remain challenging problems. In this paper, we design a new convolutional neural network (CNN) to automatically learn the interaction mechanism for RGBD salient object detection. In contrast to existing works, in which raw image pixels are fed directly to the CNN, the proposed method takes advantage of the knowledge obtained in traditional saliency detection by adopting various flexible and interpretable saliency feature vectors as inputs. This guides the CNN to learn a combination of existing features to predict saliency more effectively, which presents a less complex problem than operating on the pixels directly. We then integrate a superpixel-based Laplacian propagation framework with the trained CNN to extract a spatially consistent saliency map by exploiting the intrinsic structure of the input image. Extensive quantitative and qualitative experimental evaluations on three data sets demonstrate that the proposed method consistently outperforms the state-of-the-art methods.
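The Laplacian propagation step mentioned above can be sketched in a few lines: given initial saliency scores per superpixel and a pairwise affinity matrix, a spatially consistent map is obtained by solving a linear system built from the graph Laplacian. The function name, the regularization form (I + λL)s = s₀, and the parameter λ are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def laplacian_propagation(initial_saliency, affinity, lam=0.5):
    """Refine per-superpixel saliency scores by propagating them over an
    affinity graph: solve (I + lam * L) s = s0, where L = D - W is the
    graph Laplacian of the pairwise affinity matrix W (illustrative sketch)."""
    W = 0.5 * (affinity + affinity.T)   # symmetrize the affinities
    D = np.diag(W.sum(axis=1))          # degree matrix
    L = D - W                           # combinatorial graph Laplacian
    n = len(initial_saliency)
    return np.linalg.solve(np.eye(n) + lam * L, initial_saliency)
```

Since the Laplacian's rows sum to zero, the total saliency mass is preserved while scores diffuse toward neighbors with high affinity, which is what yields the spatial consistency described above.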
Department of Computer Science, University of North Carolina, Chapel Hill, USA
Center for Visualization and Virtual Environments, University of Kentucky, Lexington, USA

Abstract. Recent research has focused on systems for obtaining automatic 3D reconstructions of urban environments from video acquired at street level. These systems record enormous amounts of video; therefore a key component is a stereo matcher which can process this data at speeds comparable to the recording frame rate. Furthermore, urban environments are unique in that they exhibit mostly planar surfaces. These surfaces, which are often imaged at oblique angles, pose a challenge for many window-based stereo matchers which suffer in the presence of slanted surfaces. We present a multi-view plane-sweep-based stereo algorithm which correctly handles slanted surfaces and runs in real-time using the graphics processing unit (GPU). Our algorithm consists of (1) identifying the scene's principal plane orientations, (2) estimating depth by performing a plane-sweep for each direction, (3) combining the results of each sweep. The latter can optionally be performed using graph cuts. Additionally, by incorporating priors on the locations of planes in the scene, we can increase the quality of the reconstruction and reduce computation time, especially for uniform textureless surfaces. We demonstrate our algorithm on a variety of scenes and show the improved accuracy obtained by accounting for slanted surfaces.

1. Introduction
Reconstruction of buildings in 3D from aerial or satellite imagery has long been a topic of research in computer vision and photogrammetry.
The success of such research can be seen in applications such as Google Earth and Microsoft Virtual Earth, which now offer 3D visualizations of several cities. However, such visualizations lack ground-level realism, due mostly to the point of view of the imagery. A different approach is to generate visualizations in the form of panoramas [16, 12], which require less data to be constructed but also limit the user's ability to freely navigate the environment. Recent research has focused on systems for obtaining automatic 3D reconstructions of urban environments from video acquired at street level [15, 13, 6]. Urban environments are unique in that they exhibit mostly planar surfaces. A typical image, for example, may contain a ground plane and multiple facade planes intersecting at right angles. Many systems aim to reconstruct such imagery using sparse techniques, which examine point or line correspondences …
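The core of a plane-sweep stereo matcher can be sketched as follows: each hypothesized plane induces a homography between the reference and target views, and the plane giving the lowest photo-consistency cost wins at each pixel. The function names, the nearest-neighbour warp helper, and the SAD cost are illustrative assumptions; the paper's GPU implementation differs in detail.

```python
import numpy as np

def plane_induced_homography(K, R, t, n, d):
    """Homography induced by the plane n^T X = d between two views with
    relative pose (R, t) and shared intrinsics K."""
    return K @ (R - np.outer(t, n) / d) @ np.linalg.inv(K)

def warp(img, H):
    """Nearest-neighbour warp: sample img at H-mapped pixel coordinates
    (toy helper; a real implementation uses bilinear interpolation)."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            p = H @ np.array([x, y, 1.0])
            u, v = int(round(p[0] / p[2])), int(round(p[1] / p[2]))
            if 0 <= u < w and 0 <= v < h:
                out[y, x] = img[v, u]
    return out

def plane_sweep_costs(ref, targets_with_poses, K, normal, depths):
    """For each hypothesized plane depth, warp every target view toward the
    reference and accumulate absolute intensity differences (SAD); the
    winning plane per pixel minimizes the accumulated cost."""
    h, w = ref.shape
    cost = np.empty((len(depths), h, w))
    for i, d in enumerate(depths):
        c = np.zeros((h, w))
        for img, (R, t) in targets_with_poses:
            H = plane_induced_homography(K, R, t, normal, d)
            c += np.abs(ref - warp(img, H))   # photo-consistency cost
        cost[i] = c
    best = cost.argmin(axis=0)   # per-pixel index of the best plane
    return cost, best
```

Running one such sweep per principal plane orientation and then fusing the per-sweep winners (optionally with graph cuts) is exactly the three-step pipeline the abstract describes.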
Matching cost aggregation is one of the oldest and still most popular methods for stereo correspondence. While effective and efficient, cost aggregation methods typically aggregate the matching cost by summing/averaging over a user-specified local support region. This is obviously only locally optimal, and the computational complexity of the full-kernel implementation usually depends on the region size. In this paper, the cost aggregation problem is re-examined and a non-local solution is proposed. The matching cost values are aggregated adaptively based on pixel similarity on a tree structure derived from the stereo image pair, in order to preserve depth edges. The nodes of this tree are all the image pixels, and the edges are all the edges between nearest neighboring pixels. The similarity between any two pixels is determined by their shortest distance on the tree. The proposed method is non-local, as every node receives support from all other nodes on the tree, and it can be naturally extended to the time domain to enforce temporal coherence. Unlike previous methods, the non-local property guarantees that depth edges are preserved when temporal coherence across all video frames is considered. A non-local weighted median filter is also proposed based on the non-local cost aggregation algorithm; it has been demonstrated to outperform all local weighted median filters on disparity/depth upsampling and refinement.
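The tree-based aggregation described above can be sketched as follows: a minimum spanning tree is built over the 4-connected pixel grid with intensity differences as edge weights, and each pixel's cost is replaced by a sum over all pixels weighted by exp(−D/σ), where D is the path length on the tree. The brute-force per-pixel traversal below is for clarity only; the actual algorithm uses a two-pass dynamic program that runs in linear time. Function names and σ are illustrative assumptions.

```python
import numpy as np
from heapq import heappush, heappop
from collections import defaultdict, deque

def mst_edges(img):
    """Minimum spanning tree (Prim's) over the 4-connected pixel grid;
    edge weight = absolute intensity difference between neighbours."""
    h, w = img.shape
    def nbrs(y, x):
        for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                yield ny, nx
    seen = {(0, 0)}
    heap = []
    for v in nbrs(0, 0):
        heappush(heap, (abs(img[0, 0] - img[v]), (0, 0), v))
    tree = defaultdict(list)
    while heap and len(seen) < h * w:
        wgt, u, v = heappop(heap)
        if v in seen:
            continue
        seen.add(v)
        tree[u].append((v, wgt))
        tree[v].append((u, wgt))
        for n2 in nbrs(*v):
            if n2 not in seen:
                heappush(heap, (abs(img[v] - img[n2]), v, n2))
    return tree

def nonlocal_aggregate(cost, tree, sigma=0.1):
    """Non-local aggregation: every pixel receives support from every other
    pixel, weighted by exp(-D(p, q) / sigma) with D the tree path length."""
    h, w = cost.shape
    out = np.zeros_like(cost)
    for p in ((y, x) for y in range(h) for x in range(w)):
        # BFS from p accumulating tree distances (brute force for clarity).
        dist = {p: 0.0}
        q = deque([p])
        while q:
            u = q.popleft()
            for v, wgt in tree[u]:
                if v not in dist:
                    dist[v] = dist[u] + wgt
                    q.append(v)
        out[p] = sum(np.exp(-d / sigma) * cost[v] for v, d in dist.items())
    return out
```

Because large intensity differences produce long tree distances, support decays sharply across depth edges while propagating freely within smooth regions, which is the edge-preserving behavior claimed above.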