Fairness is an increasingly important concern as machine learning models are used to support decision making in high-stakes applications such as mortgage lending, hiring, and prison sentencing. This paper introduces a new open-source Python toolkit for algorithmic fairness, AI Fairness 360 (AIF360), released under an Apache v2.0 license (https://github.com/ibm/aif360). The main objectives of this toolkit are to facilitate the transition of fairness research algorithms into industrial settings and to provide a common framework for fairness researchers to share and evaluate algorithms. The package includes a comprehensive set of fairness metrics for datasets and models, explanations for these metrics, and algorithms to mitigate bias in datasets and models. It also includes an interactive Web experience (https://aif360.mybluemix.net) that provides a gentle introduction to the concepts and capabilities for line-of-business users, as well as extensive documentation, usage guidance, and industry-specific tutorials that enable data scientists and practitioners to incorporate the most appropriate tool for their problem into their work products. The architecture of the package has been engineered to conform to a standard paradigm used in data science, further improving usability for practitioners. This architectural design and its abstractions enable researchers and developers to extend the toolkit with new algorithms and improvements, and to use it for performance benchmarking. A built-in testing infrastructure maintains code quality.
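The workflow the abstract describes can be shown end to end: load data into a structured dataset, compute a group fairness metric, apply a bias-mitigation algorithm, and re-check the metric. The sketch below uses AIF360's documented `BinaryLabelDataset`, `BinaryLabelDatasetMetric`, and `Reweighing` classes; the toy DataFrame and column names are illustrative assumptions, not from the paper.

```python
# Minimal AIF360 workflow sketch: metric -> mitigation -> metric.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# Toy data: 'prot' is a binary protected attribute, 'label' the outcome.
df = pd.DataFrame({
    'feat':  [0.1, 0.9, 0.4, 0.8, 0.2, 0.7],
    'prot':  [0,   0,   0,   1,   1,   1],
    'label': [0,   1,   0,   1,   1,   1],
})
dataset = BinaryLabelDataset(df=df, label_names=['label'],
                             protected_attribute_names=['prot'])

priv, unpriv = [{'prot': 1}], [{'prot': 0}]

# Dataset-level fairness metric (disparate impact; 1.0 means parity).
metric = BinaryLabelDatasetMetric(dataset, unprivileged_groups=unpriv,
                                  privileged_groups=priv)
print('DI before:', metric.disparate_impact())

# Pre-processing mitigation: reweigh instances to remove group/label bias.
rw = Reweighing(unprivileged_groups=unpriv, privileged_groups=priv)
dataset_transf = rw.fit_transform(dataset)
metric_transf = BinaryLabelDatasetMetric(dataset_transf,
                                         unprivileged_groups=unpriv,
                                         privileged_groups=priv)
print('DI after reweighing:', metric_transf.disparate_impact())
```

Reweighing adjusts instance weights so each (group, label) combination is equally represented; the weighted disparate impact after transformation moves toward 1.0 without altering any feature values.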
Previous algorithms for the recovery of Bayesian belief network structures from data have been either highly dependent on conditional independence (CI) tests or have required an ordering on the nodes to be supplied by the user. We present an algorithm that integrates these two approaches: CI tests are used to generate an ordering on the nodes from the database, which is then used to recover the underlying Bayesian network structure using a non-CI-test-based method. Results of the evaluation of the algorithm on a number of databases (e.g., ALARM, LED, and SOYBEAN) are presented. We also discuss some algorithm performance issues and open problems.
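To make the CI-test primitive concrete, the sketch below implements a standard chi-square test of conditional independence on discrete data, the kind of test such structure-recovery algorithms apply repeatedly. It illustrates the primitive only; it is not the paper's specific ordering or recovery procedure, and the function name and significance threshold are assumptions.

```python
# Chi-square test of X ⊥ Y | Z on discrete data, stratified over Z.
import numpy as np
import pandas as pd
from scipy.stats import chi2

def ci_test(df, x, y, z, alpha=0.05):
    """Return True if x is judged independent of y given the columns in z."""
    stat, dof = 0.0, 0
    # Stratify by each configuration of the conditioning set Z.
    groups = df.groupby(list(z)) if z else [(None, df)]
    for _, sub in groups:
        table = pd.crosstab(sub[x], sub[y])
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue  # degenerate stratum contributes no evidence
        expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.values.sum()
        contrib = np.where(expected > 0,
                           (table.values - expected) ** 2 / expected, 0.0)
        stat += contrib.sum()
        dof += (table.shape[0] - 1) * (table.shape[1] - 1)
    if dof == 0:
        return True  # no evidence of dependence anywhere
    return stat <= chi2.ppf(1 - alpha, dof)

# Toy usage: X and Y are both noisy copies of Z, so they are dependent
# marginally but independent once Z is conditioned on.
rng = np.random.default_rng(0)
n = 2000
z0 = rng.integers(0, 2, n)
flip = lambda v: np.where(rng.random(n) < 0.1, 1 - v, v)
data = pd.DataFrame({'Z': z0, 'X': flip(z0), 'Y': flip(z0)})
print(ci_test(data, 'X', 'Y', []))     # expected: False (dependent)
print(ci_test(data, 'X', 'Y', ['Z']))  # expected: True (independent given Z)
```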
IMPORTANCE The lack of standards in methods to reduce bias for clinical algorithms presents various challenges in providing reliable predictions and in addressing health disparities. OBJECTIVE To evaluate approaches for reducing bias in machine learning models using a real-world clinical scenario. DESIGN, SETTING, AND PARTICIPANTS Health data for this cohort study were obtained from the IBM MarketScan Medicaid Database. Eligibility criteria were as follows: (1) female individuals aged 12 to 55 years with a live birth record identified by delivery-related codes from January 1, 2014, through December 31, 2018; (2) greater than 80% enrollment through pregnancy to 60 days post partum; and (3) evidence of coverage for depression screening and mental health services. Statistical analysis was performed in 2020. EXPOSURES Binarized race (Black individuals and White individuals). MAIN OUTCOMES AND MEASURES Machine learning models (logistic regression [LR], random forest, and extreme gradient boosting) were trained for 2 binary outcomes: postpartum depression (PPD) and postpartum mental health service utilization. Risk-adjusted generalized linear models were used for each outcome to assess potential disparity in the cohort associated with binarized race (Black or White). Methods for reducing bias, including reweighing, Prejudice Remover, and removing race from the models, were examined by analyzing changes in fairness metrics compared with the base models. Baseline characteristics of female individuals at the top-predicted risk decile were compared for systematic differences. Fairness was measured with disparate impact (DI; 1 indicates fairness) and equal opportunity difference (EOD; 0 indicates fairness). RESULTS Among 573 634 female individuals initially examined for this study, 314 903 were White (54.9%), 217 899 were Black (38.0%), and the mean (SD) age was 26.1 (5.5) years. The risk-adjusted odds ratio comparing White participants with Black participants was 2.06 (95% CI, 2.02-2.10) for clinically recognized PPD and 1.37 (95% CI, 1.33-1.40) for postpartum mental health service utilization. Taking the LR model for PPD prediction as an example, reweighing reduced bias as measured by improved DI and EOD metrics from 0.31 and −0.19 to 0.79 and 0.02, respectively.
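For reference, the two fairness metrics reported in the study have simple closed forms: DI is the ratio of favorable-prediction rates between the unprivileged and privileged groups, and EOD is the difference in true-positive rates. A minimal sketch follows, with illustrative variable names and toy arrays rather than the study's data.

```python
# Fairness metrics from binary predictions and a binary group indicator.
import numpy as np

def disparate_impact(y_pred, group):
    """DI = P(yhat=1 | unprivileged) / P(yhat=1 | privileged); 1 is fair.
    `group` is 1 for the privileged group, 0 for the unprivileged group."""
    return y_pred[group == 0].mean() / y_pred[group == 1].mean()

def equal_opportunity_difference(y_true, y_pred, group):
    """EOD = TPR(unprivileged) - TPR(privileged); 0 is fair."""
    tpr_unpriv = y_pred[(group == 0) & (y_true == 1)].mean()
    tpr_priv = y_pred[(group == 1) & (y_true == 1)].mean()
    return tpr_unpriv - tpr_priv

# Toy example (not study data):
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1])
group = np.array([0, 0, 0, 1, 1, 1])   # 1 = privileged, 0 = unprivileged
print(disparate_impact(y_pred, group))                   # 0.5
print(equal_opportunity_difference(y_true, y_pred, group))  # -0.5
```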
Automotive telematics may be defined as the information-intensive applications that are being enabled for vehicles by a combination of telecommunications and computing technology. Telematics by its nature requires the capture of sensor data and the storage and exchange of data to obtain remote services. In order for automotive telematics to grow to its full potential, telematics data must be protected. Data protection must include privacy and security for end users, service providers, and application providers. In this paper, we propose a new framework for data protection that is built on the foundation of privacy and security technologies. The privacy technology enables users and service providers to define flexible data models and policy models. The security technology provides traditional capabilities such as encryption, authentication, and non-repudiation. In addition, it provides secure environments for protected execution, which is essential to limiting data access to specific purposes.
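As an illustration of the purpose-limitation idea behind such policy models, the hypothetical sketch below releases a data record only when the requester's declared purpose is permitted by the owner's policy. All class and function names here are assumptions for illustration, not the paper's actual design.

```python
# Purpose-limited data release: a record is disclosed only for allowed purposes.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PrivacyPolicy:
    # Maps a data category to the set of purposes its owner allows.
    allowed: dict = field(default_factory=dict)

    def permits(self, category: str, purpose: str) -> bool:
        return purpose in self.allowed.get(category, set())

def release(record: dict, category: str, purpose: str,
            policy: PrivacyPolicy) -> Optional[dict]:
    """Release the record only if the owner's policy permits this purpose."""
    return record if policy.permits(category, purpose) else None

# Example: location data may be used for roadside assistance but not marketing.
policy = PrivacyPolicy({'location': {'roadside_assistance'}})
print(release({'lat': 40.7, 'lon': -74.0}, 'location',
              'roadside_assistance', policy))   # record is released
print(release({'lat': 40.7, 'lon': -74.0}, 'location',
              'marketing', policy))             # None: access denied
```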