Identifying the cell of origin of a cancer is important for guiding treatment decisions. However, in patients with cancer of unknown primary (CUP), standard diagnostic tools often fail to identify the primary tumor. As an alternative, machine learning approaches have been proposed that classify the cell of origin from somatic mutation profiles in the genomes of solid tissue biopsies. Solid biopsies, however, can cause complications, and some tumors are not accessible. Liquid biopsies, which contain circulating tumor DNA (ctDNA) originating from the tumor, are a promising alternative. The difficulty is that somatic mutation profiles obtained from liquid biopsies are inherently extremely sparse, and current machine learning models perform poorly in this setting. Here we propose an improved machine learning method that deals with the sparse nature of liquid biopsy data. First, we downsample the somatic single nucleotide variants (SNVs) in each sample to mimic sparse data conditions. Second, we apply extensive data augmentation to artificially increase the number of training samples and thereby make the model more robust to sparse data. Finally, we use data integration to merge information from i) somatic SNV density across the genome, ii) somatic SNVs in driver genes and iii) trinucleotide motifs. Our adapted method achieves an average accuracy of 0.88 when only 70% of SNVs are retained, comparable to the average accuracy of 0.87 achieved by the original model on the full SNV data. Even when only 2% of SNVs are retained, the average accuracy is 0.65, compared with 0.41 for the original model. The method and results presented here open the way to applying machine learning to detect the cell of origin of cancer from sparse liquid biopsy data.
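
To make the downsampling and augmentation steps concrete, the following is a minimal sketch rather than the method itself. It rests on assumptions that the abstract does not state: that downsampling is uniform random subsampling without replacement, and that augmentation can be approximated by drawing several independent subsamples from each tumor. The function names (`downsample_snvs`, `augment`, `density_features`), the SNV tuple layout and the bin size are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical SNV representation: (chromosome, position, trinucleotide motif).


def downsample_snvs(snvs, fraction, rng):
    """Keep a uniformly random subset of SNVs (assumed sampling scheme)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not snvs:
        return []
    # Keep at least one SNV so that the sparse sample is never empty.
    k = max(1, round(len(snvs) * fraction))
    return rng.sample(snvs, k)


def augment(snvs, fraction, n_copies, seed=0):
    """Draw several independent sparse versions of one sample (assumed augmentation)."""
    rng = random.Random(seed)
    return [downsample_snvs(snvs, fraction, rng) for _ in range(n_copies)]


def density_features(snvs, bin_size=1_000_000):
    """Count SNVs per genomic bin, a simple stand-in for SNV density."""
    return Counter((chrom, pos // bin_size) for chrom, pos, _motif in snvs)


# Toy sample with 100 SNVs on one chromosome (made-up data for illustration).
sample = [("chr1", i * 10_000, "A[C>T]G") for i in range(100)]

# Five sparse training samples, each retaining 2% of the SNVs.
sparse_copies = augment(sample, fraction=0.02, n_copies=5)
sparse_features = [density_features(copy) for copy in sparse_copies]
```

In this sketch each augmented copy retains only two of the 100 SNVs, so a single tumor yields several distinct sparse training examples. The other two feature types named above (driver-gene SNVs and trinucleotide motifs) would be computed from the same subsample before integration.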