Drug resistance (DR) remains a global healthcare concern. In contrast to other human bacterial pathogens, acquiring mutations in the genome is the main mechanism of drug resistance for Mycobacterium tuberculosis (MTB). For some antibiotics resistance of a particular isolate can be predicted with high confidence knowing whether specific mutations occurred, but for some antibiotics our knowledge of resistance mechanism is moderate. Statistical machine learning (ML) methods are used in attempts to infer new genes implicated in drug resistance. These methods use large collections of isolates with known whole-genome sequences and resistance status for different drugs. However, high correlations between the presence or absence of resistance to drugs that are used together in one treatment regimen complicate inference of causal mutations by traditional ML. Recently, several new methods were suggested to deal with the problem of correlations of response variables in training data. In this study, we applied the following methods to tackle the confounding effect of resistance co-occurrence in a dataset of approximately 13 000 complete genomes of MTB with characterized resistance status for 13 drugs: logistic regression with different regularization penalty functions, a polynomial-time algorithm for best-subset selection problem (ABESS), and "Hungry, Hungry SNPos" (HHS) method. We compared these methods by the ability to select known causal mutations for the resistance to each particular drug and not to select mutations in genes that are known to be associated with resistance to other drugs. ABESS significantly outperformed the others selecting more relevant sets of mutations. We also showed that aggregation of rare mutations into features indicating changes of PFAM domains increased the quality of prediction and these features were majorly selected by ABESS.