The advent of rapid whole-genome sequencing has created new opportunities for computational prediction of antimicrobial resistance (AMR) phenotypes directly from genomic data. Both rule-based and machine learning (ML) approaches have been explored for this task, but systematic benchmarking of their performance is still needed. Here, we evaluated four state-of-the-art ML methods (Kover, PhenotypeSeeker, Seq2Geno2Pheno, Aytan-Aktug), an ML baseline, and the rule-based ResFinder by training and testing each of them across 78 species-antibiotic datasets, using a rigorous benchmarking workflow that integrates three evaluation approaches, each paired with three distinct sample splitting methods. Our analysis revealed considerable performance variation across techniques and datasets. While ML methods generally excelled for closely related strains, ResFinder handled divergent genomes better. Overall, Kover most frequently ranked top among the ML approaches, followed by PhenotypeSeeker and Seq2Geno2Pheno. AMR phenotypes for antibiotic classes such as macrolides and sulfonamides were predicted with the highest accuracy. Prediction quality varied substantially across species-antibiotic combinations, particularly for beta-lactams: across species, resistance phenotyping of the beta-lactam compounds aztreonam, amoxicillin/clavulanic acid, cefoxitin, ceftazidime, and piperacillin/tazobactam, together with tetracyclines, showed more variable performance than that of the other benchmarked antibiotics. By organism, Campylobacter jejuni and Enterococcus faecium phenotypes were more robustly predicted than those of Escherichia coli, Staphylococcus aureus, Salmonella enterica, Neisseria gonorrhoeae, Klebsiella pneumoniae, Pseudomonas aeruginosa, Acinetobacter baumannii, Streptococcus pneumoniae, and Mycobacterium tuberculosis. In addition, our study provides software recommendations for each species-antibiotic combination.
It also highlights where optimization is needed for robust clinical application, particularly for strains that diverge substantially from those used for training.