This article deals with a modern disease of academic science that consists of an enormous increase in the number of scientific publications without a corresponding advance of knowledge. Findings are sliced as thin as salami and submitted to different journals to produce more papers. If we consider academic papers as a kind of scientific 'currency' that is backed by gold bullion in the central bank of 'true' science, then we are witnessing an article-inflation phenomenon, a scientometric bubble that is most harmful for science and promotes an unethical and antiscientific culture among researchers. The main problem behind the scenes is that the impact factor is used as a proxy for quality. Therefore, not only for convenience, but also based on ethical principles of scientific research, we adhere to the San Francisco Declaration on Research Assessment when it emphasizes "the need to eliminate the use of journal-based metrics in funding, appointment and promotion considerations; and the need to assess research on its own merits rather on the journal in which the research is published". Our message is mainly addressed to the funding agencies and universities that award tenures or grants and manage research programmes, especially in developing countries. The message is also addressed to well-established scientists who have the power to change things when they participate in committees for grants and jobs.
Our purpose in this research is to develop a methodology to automatically and efficiently classify web images as UML static diagrams, and to produce a computer tool that implements this function. The tool receives as input a bitmap file (in different formats) and tells whether the image corresponds to a diagram. The tool does not require that the images are explicitly or implicitly tagged as UML diagrams. The tool extracts graphical characteristics from each image (such as grayscale histogram, color histogram and elementary geometric forms) and uses a combination of rules to classify it. The rules are obtained with machine learning techniques (rule induction) from a sample of 19000 web images manually classified by experts. In this work we do not consider the textual contents of the images.
Our purpose in this research is to develop a method to automatically and efficiently classify web images as Unified Modeling Language (UML) static diagrams, and to produce a computer tool that implements this function. The tool receives a bitmap file (in different formats) as an input and communicates whether the image corresponds to a diagram. For pragmatic reasons, we restricted ourselves to the simplest kinds of diagrams that are more useful for automated software reuse: computer-edited 2D representations of static diagrams. The tool does not require that the images are explicitly or implicitly tagged as UML diagrams. The tool extracts graphical characteristics from each image (such as grayscale histogram, color histogram and elementary geometric forms) and uses a combination of rules to classify it. The rules are obtained with machine learning techniques (rule induction) from a sample of 19,000 web images manually classified by experts. In this work, we do not consider the textual contents of the images. Our tool reaches nearly 95% of agreement with manually classified instances, improving the effectiveness of related research works. Moreover, using a training dataset 15 times bigger, the time required to process each image and extract its graphical features (0.680 s) is seven times lower.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.