Tiago Roxo scite author profile

Predicting CVSS Metric via Description Interpretation

Cybercrime affects companies worldwide, costing millions of dollars annually. The constant increase of threats and vulnerabilities raises the need to handle vulnerabilities in a prioritized manner. This prioritization can be achieved through the Common Vulnerability Scoring System (CVSS), typically used to assign a score to a vulnerability. However, there is a temporal mismatch between the discovery of a vulnerability and the assignment of its score, which motivates the development of approaches to aid in this aspect. We explore the use of Natural Language Processing (NLP) models in CVSS score prediction given vulnerability descriptions. We start by creating a vulnerability dataset from the National Vulnerability Database (NVD). Then, we combine text pre-processing and vocabulary addition to improve the model accuracy and interpret its prediction reasoning by assessing word importance, via Shapley values. Experiments show that the combination of Lemmatization and 5,000-word addition is optimal for DistilBERT, the best-performing of the NLP models in our experiments, achieving state-of-the-art results. Furthermore, specific events (such as an attack on a known software) tend to influence model prediction, which may hinder CVSS prediction. Combining Lemmatization with vocabulary addition mitigates this effect, contributing to increased accuracy. Finally, binary classes benefit the most from pre-processing techniques, particularly when one class is much more prominent than the other. Our work demonstrates that DistilBERT is a state-of-the-art model for CVSS prediction, showing the applicability of deep learning approaches to aid in vulnerability handling. The code and data are available at https://github.com/Joana-Cabral/.

Is Gender “In-the-Wild” Inference Really a Solved Problem?

Roxo, Proença. 2021. IEEE Trans. Biom. Behav. Identity Sci.

YinYang-Net: Complementing Face and Body Information for Wild Gender Recognition

Roxo, Proença. 2022. IEEE Access.

Soft biometrics inference in surveillance scenarios is a topic of interest for various applications, particularly in security-related areas. However, soft biometric analysis is not extensively reported in wild conditions. In particular, previous works on gender recognition report their results in face datasets, with relatively good image quality and frontal poses. Given the uncertainty of the availability of the facial region in wild conditions, we consider that these methods are not adequate for surveillance settings. To overcome these limitations, we: 1) present frontal and wild face versions of three well-known surveillance datasets; and 2) propose YinYang-Net (YY-Net), a model that effectively and dynamically complements facial and body information, which makes it suitable for gender recognition in wild conditions. The frontal and wild face datasets derive from widely used Pedestrian Attribute Recognition (PAR) sets (PETA, PA-100K, and RAP), using a pose-based approach to filter the frontal samples and facial regions. This approach retrieves the facial region of images with varying image/subject conditions, where state-of-the-art face detectors often fail. YY-Net combines facial and body information through a learnable fusion matrix and a channel-attention sub-network, focusing on the most influential body parts according to the specific image/subject features. We compare it with five PAR methods, consistently obtaining state-of-the-art results on gender recognition, and reducing the prediction errors by up to 24% in frontal samples. The announced versions of the PAR datasets and YY-Net serve as the basis for wild soft biometrics classification and are available here.

YinYang-Net: Complementing Face and Body Information for Wild Gender Recognition

Roxo, Proença. 2021. Preprint.

Theoretical and practical assessments over SSH

Costa, Roxo, Lopes et al. 2023. RPTEL.

During the COVID-19 pandemic, universities worldwide were forced to close, causing a shift from in-person to remote classes. This situation motivated teachers to find suitable tools to evaluate students remotely, fairly, and accurately. However, currently available systems are based on either surveys or exercise evaluation and are not suitable for competency-based assessments. Given this context and the limitations of available evaluation systems, we developed TestsOverSSH, a system to devise, deliver, and automatically correct assessments performed in a Command Line Interface (CLI) environment. Unique assessments are generated per student when they access the proposed system via Secure SHell (SSH). TestsOverSSH is composed of shell scripts that orchestrate a series of tools and services that come pre-installed in Linux distributions. It can be used to construct multiple-choice or direct answer questions while also requiring students to perform tasks in the environment itself, namely computer programming or CLI manipulation-related assignments. We present examples of the question types in this system, explaining question formats and operating guidelines. Since the assessments are performed directly in the system, logs and command history can be easily retrieved, while information on student devices remains uncollected. We performed evaluations using this system in a real context and obtained student feedback through a custom survey and the System Usability Scale (SUS). Survey results and the SUS score suggest that TestsOverSSH is an intuitive evaluation tool that is easy to access and use, making it applicable for e-learning.
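
For readers unfamiliar with the System Usability Scale mentioned in this abstract, the standard scoring rule can be sketched as follows. This is a minimal illustration of the generic SUS formula, not code from TestsOverSSH; the function name `sus_score` and the sample responses are hypothetical.

```python
def sus_score(responses):
    """Score one respondent's SUS questionnaire on a 0-100 scale.

    `responses` holds the 10 answers in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribute answer - 1.
            total += answer - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribute 5 - answer.
            total += 5 - answer
    # The summed contributions (0-40) are scaled to 0-100.
    return total * 2.5


if __name__ == "__main__":
    # Hypothetical answers from a single respondent.
    print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # 85.0
```

A study-level SUS score is usually reported as the mean of the per-respondent scores.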

Is Gender "In-the-Wild" Inference Really a Solved Problem?

Roxo, Proença. 2021. Preprint.

Soft biometrics analysis is seen as an important research topic, given its relevance to various applications. However, even though it is frequently seen as a solved task, it can still be very hard to perform in wild conditions, under varying image conditions, uncooperative poses, and occlusions. Considering the gender trait as our topic of study, we report an extensive analysis of the feasibility of its inference regarding image-based features (resolution, luminosity, and blurriness) and subject-based features (face and body keypoints confidence). Using three state-of-the-art datasets (PETA, PA-100K, RAP) and five Person Attribute Recognition models, we correlate feature analysis with gender inference accuracy using the Shapley value, enabling us to perceive the importance of each image/subject-based feature. Furthermore, we analyze face-based gender inference and assess the pose effect on it. Our results suggest that: 1) image-based features are more influential for low-quality data; 2) an increase in image quality translates into higher subject-based feature importance; 3) face-based gender inference accuracy correlates with image quality increase; and 4) subjects' frontal pose promotes an implicit attention towards the face. The reported results are seen as a basis for subsequent developments of inference approaches in uncontrolled outdoor environments, which typically correspond to visual surveillance conditions.
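
To make the role of the Shapley value in this abstract more concrete, the sketch below computes exact Shapley values for a toy two-feature case by enumerating feature subsets. It is an illustration of the general definition only, not the authors' pipeline: the function `shapley_values`, the helper `toy_value`, and the accuracy numbers in `TOY_ACCURACY` are all invented for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature for the set function `value`.

    `value` maps a frozenset of feature names to a number
    (here, a hypothetical accuracy obtained with those features).
    """
    n = len(features)
    result = {}
    for feature in features:
        others = [f for f in features if f != feature]
        contribution = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal gain from adding `feature` to the coalition.
                gain = value(coalition | {feature}) - value(coalition)
                contribution += weight * gain
        result[feature] = contribution
    return result


# Made-up accuracies for each feature subset (NOT results from the paper).
TOY_ACCURACY = {
    frozenset(): 0.50,
    frozenset({"resolution"}): 0.60,
    frozenset({"blurriness"}): 0.55,
    frozenset({"resolution", "blurriness"}): 0.70,
}


def toy_value(subset):
    return TOY_ACCURACY[frozenset(subset)]


if __name__ == "__main__":
    print(shapley_values(["resolution", "blurriness"], toy_value))
```

The values always sum to the gap between the accuracy with all features and the accuracy with none, which is what makes them usable as a per-feature importance measure. Exact enumeration grows exponentially with the number of features, so practical tools rely on approximations.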

WASD: A Wilder Active Speaker Detection Dataset

Roxo, Costa, Inácio et al. 2023. Preprint.

Current Active Speaker Detection (ASD) models achieve great results on AVA-ActiveSpeaker (AVA), using only sound and facial features. Although this approach is applicable in movie setups (AVA), it is not suited for less constrained conditions. To demonstrate this limitation, we propose a Wilder Active Speaker Detection (WASD) dataset, with increased difficulty by targeting the two key components of current ASD: audio and face. Grouped into 5 categories, ranging from optimal conditions to surveillance settings, WASD contains incremental challenges for ASD with tactical impairment of audio and face data. We select state-of-the-art models and assess their performance in two groups of WASD: Easy (cooperative settings) and Hard (audio and/or face are specifically degraded). The results show that: 1) AVA-trained models maintain state-of-the-art performance in the WASD Easy group while underperforming in the Hard one; 2) this indicates the similarity between AVA and Easy data; and 3) training on WASD does not improve model performance to AVA levels, particularly for audio impairment and surveillance settings. This shows that AVA does not prepare models for wild ASD and that current approaches are ill-equipped to deal with such conditions. The proposed dataset also contains body data annotations to provide a new source for ASD, and is available at https://github.com/Tiago-Roxo/WASD.
