Background: The advent of metagenomic sequencing provides microbial abundance patterns that can be leveraged for sample origin prediction. Supervised machine learning classification approaches have been reported to predict sample origin accurately when the origin has been previously sampled. Using metagenomic datasets provided by the 2019 CAMDA challenge, we evaluated the influence of technical, analytical, and machine learning approaches on result interpretation and on source prediction for new origins.

Results: Comparison between 16S rRNA amplicon and shotgun sequencing approaches, as well as between metagenomic analytical tools, showed differences in the measured microbial abundance of the same samples, especially for organisms present at low abundance. Shotgun sequence data analyzed with Kraken2 and Bracken taxonomic annotation had higher detection sensitivity than the other methods. Because classification models can only assign origins seen during training, we proposed, for comparison, an alternative approach that uses Lasso-regularized multivariate regression to predict geographic coordinates. In both models, prediction errors were much higher in leave-1-city-out than in 10-fold cross-validation; the former more realistically reflects how much harder it is to predict samples from new origins than from pre-trained origins. This challenge was further confirmed using mystery samples obtained from new origins. Overall, the prediction performance of the regression and classification models, as measured by mean squared error, was comparable on the mystery samples. Because prediction errors are higher for samples from new origins, we provided, for practical applications, an additional strategy based on prediction ambiguity to infer whether a sample comes from a new origin.
Lastly, we showed that prediction error increased when data from a different sequencing protocol were included in the training data.

Conclusions: We highlighted that sample origin can be predicted accurately for pre-trained origins and that predicting new origins remains challenging for both regression and classification models. Overall, this work provided a summary evaluation of sequencing techniques, protocols, taxonomic analysis approaches, and machine learning approaches to inform future study designs for metagenomic prediction of sample origin.
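
The contrast drawn in the Results between 10-fold and leave-1-city-out cross-validation can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the round-robin fold assignment, and the toy city labels are all assumptions made for illustration. It shows only why the two schemes differ: in k-fold splits every test city is also present in training, whereas in leave-1-city-out the test city is never seen during training.

```python
from collections import defaultdict


def k_fold_splits(n_samples, k):
    """Yield (train_idx, test_idx) pairs for plain k-fold cross-validation.

    Assumption: samples are assigned to folds round-robin (no shuffling),
    so every city can appear on both sides of a split.
    """
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test


def leave_one_city_out_splits(cities):
    """Yield (held_out_city, train_idx, test_idx), one split per city.

    All samples from the held-out city go to the test set, so a model
    is always evaluated on an origin it has not been trained on.
    """
    by_city = defaultdict(list)
    for i, city in enumerate(cities):
        by_city[city].append(i)
    for city, test in by_city.items():
        train = [i for i, c in enumerate(cities) if c != city]
        yield city, train, test


# Hypothetical toy labels: 3 cities, 4 samples each.
cities = ["A", "B", "C"] * 4

# For each k-fold split: are all test cities also present in training?
kfold_overlap = [
    {cities[i] for i in test} <= {cities[i] for i in train}
    for train, test in k_fold_splits(len(cities), k=4)
]

# For each leave-1-city-out split: is the held-out city present in training?
loco_overlap = [
    city in {cities[i] for i in train}
    for city, train, test in leave_one_city_out_splits(cities)
]

print(all(kfold_overlap), any(loco_overlap))
```

In practice the index lists from either splitter would be passed to whichever classifier or regression model is being evaluated; the splitters themselves are independent of the model.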