In this thesis, we work on the task of Text Spotting within the field of Computer Vision. We propose new algorithms, methods, and datasets to detect, recognize, and enhance text character sequences found within images, motivated by the need for information retrieval in systems that cannot crawl or access such information by any means other than its graphical representation. Driven by our collaboration with the Spanish National Cybersecurity Institute (INCIBE), we focus our research on recovering character sequences found in visual media from both darknet and industrial sources. We intend to support INCIBE products and services related to cybersecurity that monitor potential illegal activities and critical infrastructures.

To improve scene text recognition performance, we analyze images in terms of their irregularity, since some methods claim robustness on datasets that contain a large proportion of irregular text. After building a classification model for these categories, we created a new dataset, the Fully Irregular Text (FIT-Text) dataset, composed primarily of irregular images, so that other methods oriented to this problem can use it to evaluate their performance.

We propose a new performance metric, the Contained-Levenshtein (C-Lev) accuracy. Scene text recognizers in the literature have traditionally reported both accuracy and normalized edit distance as performance metrics, but never combined the two into a single, effective metric that helps discern between severe and low-priority mistakes. C-Lev also serves as a label-checking tool, helping methods stay robust against minor human-generated labeling errors.

To increase scene text recognition accuracy, we propose integrating string-distance measurements as components of the loss functions of both CTC and Attention recognizers.
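The exact C-Lev formulation is given later in the thesis; the following is only an illustrative sketch of how accuracy and normalized edit distance can be combined into one score. The containment rule and the fallback to 1 minus the normalized edit distance are assumptions made for illustration, not the thesis's definitive metric.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def c_lev_score(pred: str, label: str) -> float:
    """Hypothetical C-Lev-style score: full credit for an exact match or
    when the label is contained in the prediction (e.g. a minor labeling
    error cropped part of the word); otherwise fall back to
    1 - normalized edit distance, clipped at 0, so small mistakes are
    penalized less severely than completely wrong predictions."""
    if pred == label or label in pred:
        return 1.0
    dist = levenshtein(pred, label)
    return max(0.0, 1.0 - dist / max(len(pred), len(label), 1))
```

Under these assumptions, a one-character typo keeps most of its score, while an unrelated prediction scores near zero, which is the kind of severity-aware behavior the combined metric is meant to capture.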
Testing various distances as the proposed weight, we find the Hamming distance the most beneficial, with a total improvement of over 6% accuracy on literature datasets.

For scene text detectors, we propose the Text Density Distribution (TDD), a new metric that assigns value to scene text images according to their documented regions, classifying visual media by the number and spatial distribution of their region clusters. We also propose using this metric to select reduced training subsets for scene text detectors, lowering their computational cost while preserving performance. We observe that the detection F1 score drops by only 4% when using less than...
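The TDD metric itself is defined later in the thesis; as a rough sketch of the idea, the snippet below classifies an image by how many annotated text regions it has and how spread out their centres are. The bin boundaries, the spread measure, and the box format are all illustrative assumptions.

```python
import statistics


def tdd_class(boxes, sparse_max=2, moderate_max=8):
    """Hypothetical TDD-style descriptor.

    `boxes` is a list of (x_min, y_min, x_max, y_max) tuples in
    normalized [0, 1] image coordinates. Returns a coarse density label
    plus a spread score (0 means all regions sit in one tight cluster).
    Thresholds are placeholders, not the thesis's actual bins.
    """
    if not boxes:
        return ("empty", 0.0)
    centres = [((x0 + x1) / 2, (y0 + y1) / 2) for x0, y0, x1, y1 in boxes]
    xs, ys = zip(*centres)
    # Spread: mean population std-dev of the centre coordinates.
    spread = (statistics.pstdev(xs) + statistics.pstdev(ys)) / 2
    if len(boxes) <= sparse_max:
        density = "sparse"
    elif len(boxes) <= moderate_max:
        density = "moderate"
    else:
        density = "dense"
    return (density, round(spread, 3))
```

A subset-selection scheme could then sample a fixed quota of images per (density, spread) class instead of training on the full dataset, which is one plausible way to trade training cost against detection performance.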