Abstract: This survey reviews recent computer vision techniques used in the assessment of image aesthetic quality. Image aesthetic assessment aims at computationally distinguishing high-quality photos from low-quality ones based on photographic rules, typically in the form of binary classification or quality scoring. A variety of approaches have been proposed in the literature to solve this challenging problem. In this survey, we present a systematic listing of the reviewed approaches based on v…
“…$\mathrm{SPICE} = F_1\ \mathrm{score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ (9) As shown in Table 4, our model is superior to the method proposed by the PCCD [4] in various attributes. The PCCD method [4] uses the attribute fusion training method, which combines the three attributes of Composition, Color and Lighting, Subject of Photo.…”
Figure 1: Aesthetic Attributes Assessment of Images. We predict the caption and score of each aesthetic attribute of an image.
Abstract: Image aesthetic quality assessment has been a relatively hot topic during the last decade. Most recently, comment-type assessment (aesthetic captions) has been proposed to describe the general aesthetic impression of an image using text. In this paper, we propose Aesthetic Attributes Assessment of Images, that is, aesthetic attribute captioning. This is a new formulation of image aesthetic assessment, which predicts aesthetic attribute captions together with the aesthetic score of each attribute. We introduce a new dataset named DPC-Captions, which contains comments on up to 5 aesthetic attributes of one image, obtained through knowledge transfer from a fully annotated small-scale dataset. Then, we propose the Aesthetic Multi-Attribute Network (AMAN), which is trained on a mixture of the fully annotated small-scale PCCD dataset and the weakly annotated large-scale DPC-Captions dataset. Our AMAN makes full use of transfer learning and an attention model in a single framework. The experimental results on our DPC-Captions and PCCD datasets show that our method can predict captions of 5 aesthetic attributes together with a numerical score for each attribute. We use the evaluation criteria used in image captioning to show that our specially designed AMAN model outperforms the traditional CNN-LSTM model and the modern SCA-CNN model for image captioning.
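The F1 combination quoted in the citing passage above (Eq. 9) can be illustrated with a few lines of Python. This is only a minimal sketch of the harmonic-mean step: the function name `f1_score` is a hypothetical helper, and the precision and recall values are assumed to have been computed elsewhere (the sketch does not reproduce how SPICE derives them from captions).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall.

    Hypothetical helper for illustration; returns 0.0 when both
    inputs are zero to avoid dividing by zero.
    """
    denominator = precision + recall
    if denominator == 0:
        return 0.0
    return 2 * precision * recall / denominator


if __name__ == "__main__":
    # Illustrative inputs only, not values from any paper.
    print(f1_score(0.5, 0.5))
```

Because the harmonic mean is dominated by the smaller of the two values, a caption metric built this way penalizes outputs that score well on precision but poorly on recall, and vice versa.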
“…The release of two cropping databases [39,13] facilitates the training of discriminative cropping models. However, the handcrafted features are not strong enough to accurately predict image aesthetics [11].…”
Section: Related Work (mentioning)
confidence: 99%
“…Two objective metrics, namely intersection-over-union (IoU) and boundary displacement error (BDE) [14], were defined to evaluate the performance of image cropping models on these databases. These public benchmarks enable many researchers to develop and test their cropping models, significantly facilitating the research on automatic image cropping [39,11,34,5,6,10,15,22,36]. Though many efforts have been made, there exists sev- Table 1.…”
Image cropping aims to improve the composition as well as the aesthetic quality of an image by removing extraneous content from it. Existing image cropping databases provide only one or several human-annotated bounding boxes as the ground truth, which cannot reflect the non-uniqueness and flexibility of image cropping in practice. The employed evaluation metrics, such as intersection-over-union, cannot reliably reflect the real performance of cropping models either. This work revisits the problem of image cropping and presents a grid anchor based formulation that considers the special properties and requirements (e.g., local redundancy, content preservation, aspect ratio) of image cropping. Our formulation reduces the search space of candidate crops from millions to fewer than one hundred. Consequently, a grid anchor based cropping benchmark is constructed, where all crops of each image are annotated and more reliable evaluation metrics are defined. We also design an effective and lightweight network module, which simultaneously considers the region of interest and the region of discard for more accurate image cropping. Our model can stably output visually pleasing crops for images of different scenes and runs at a speed of 125 FPS. Code and dataset are available at: https://github.com/HuiZeng/Grid-Anchor-based-Image-Cropping.
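For readers unfamiliar with the intersection-over-union metric mentioned above, the following is a minimal sketch for two axis-aligned boxes. The `(x1, y1, x2, y2)` box layout and the function name `iou` are assumptions made for illustration; they are not taken from the cited benchmark code.

```python
from typing import Tuple

# Assumed box layout: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; clamp to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

The example also hints at the limitation raised in the abstract: a predicted crop is scored only by its overlap with a single annotated box, so a different but equally pleasing crop can receive a low IoU.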
“…They allow the creation of models that can analyze any picture and predict their aesthetic value, without the need for any annotated data about its contents; and without making use of hand-crafted features. Some examples of the use of CNNs for image aesthetics prediction and related topics can be found in [2,20,33,6,11,4,9,35,10,17]. Some of those papers make use of information about the contents of the pictures to improve the predictions of the models.…”
Section: Related Work, 2.1 Computational Aesthetic Assessment In Photo (mentioning)
The use of computational methods to evaluate aesthetics in photography has gained interest in recent years due to the popularization of convolutional neural networks and the availability of new annotated datasets. Most studies in this area have focused on designing models that do not take into account individual preferences when predicting the aesthetic value of pictures. We propose a model based on residual learning that is capable of learning subjective, user-specific preferences over aesthetics in photography, while surpassing the state-of-the-art methods and keeping a limited number of user-specific parameters in the model. Our model can also be used for picture enhancement, and it is suitable for content-based or hybrid recommender systems in which the amount of computational resources is limited.

The problem of taking into account subjective preferences in image aesthetics prediction is referred to as personalized image aesthetics [27]. Most recent approaches to image aesthetics evaluation have used different deep-learning models, which require a significant amount of annotated data for their training and evaluation. In real-world situations, it is unrealistic to assume that we will have thousands of annotated examples of rated images for any given user. This limits the use of deep learning models for personalized image aesthetics prediction.

In order to train a machine learning model capable of taking into account individual preferences over aesthetics in photography, an annotated dataset with the identities of the raters of each picture is needed. One example of this kind of dataset is the FLICKR-AES dataset, presented by Ren et al. [27], which contains over 40,000 images rated by more than 200 different human raters. Their study provides, along with this dataset (and another, smaller dataset), a residual-based learning model capable of taking into account user-specific preferences over aesthetics in photography.

We build on their work and propose an end-to-end convolutional neural network model capable of modelling user-specific preferences at different levels of abstraction, while keeping a reduced number of user-specific parameters within the model. Our method models user-specific preferences by using residual adapters, which were presented in [26,25] and have shown success in multi-domain learning. The main difference between our model and that of Ren et al. lies in where the user-specific preferences are modelled. Ren et al. first train a generic aesthetics network, which predicts a mean aesthetic score, and then compute a user-specific offset by training a Support Vector Regressor that takes the predicted content and some manually defined attributes of the picture as its input. Our model instead embeds the user-specific parameters in different layers of the neural network, allowing the model to find user-specific features at different levels of abstraction that do not necessarily depend on the contents and a fixed set of attributes of the pictures.

Our main contributions are as follows: First, we propo...