Sugarcane (Saccharum spp. hybrids) is a leading industrial crop in tropical and subtropical regions worldwide. More recently, sugarcane has been selected as a key feedstock for biofuels due to its rapid growth, high fiber content and favorable energy input/output ratio.

Breeding sugarcane varieties with biomass suited to efficient conversion to biofuels can be optimized by understanding the genetic control of biomass composition. However, the genetic analysis of these traits is hindered by the genomic complexity of sugarcane and the limited availability of a reference genome. The aims of this project were: the development of a high-throughput profiling method for rapid screening of the key biomass traits in a sugarcane population; the construction of a new full-length transcriptome reference database; and the identification of transcripts associated with sugar and fiber accumulation in sugarcane.

For the screening of genotypes, newly developed predictive models employing near-infrared (NIR) spectral analysis, coupled with high-performance liquid chromatography (HPLC), were shown to allow high-throughput profiling of the major components in the fiber and sugar fractions of sugarcane biomass. Contrasting low-fiber and high-fiber genotypes (minimum of ~29% and maximum of 61% of total dry biomass) were identified amongst 331 samples from 186 sugarcane genotypes. The population studied exhibited a wide range of fiber/sugar ratios, from 0.4 (as low as that of a typical commercial sugarcane variety) to 2.2 (similar to that of energy cane). In addition, the lignin content (the central factor in biomass recalcitrance) ranged from 6 to 14% of the total dry biomass. To aid genotyping, a new sugarcane transcriptome (termed the SUGIT database) was constructed using PacBio full-length isoform sequencing (Iso-Seq) and a cDNA library derived from 22 diverse sugarcane genotypes, sampled from the key tissues (leaf, internode and root) at different developmental stages (from immature to mature).
Comparative analysis showed that this new SUGIT database included more full-length transcripts, longer predicted transcripts, and a higher average length of the 1,000 largest proteins, compared to a de novo assembly of Illumina RNA-Seq short-read data from the same sample set. The annotation suggested that the majority (~94%) of the SUGIT database was from coding RNAs, while a very small proportion (~2%) could be long non-coding RNAs. About 70-82% of the RNA-Seq reads from different tissues mapped back to the SUGIT database, suggesting that it represented the targeted tissues well, while about 69% of this database aligned with the sorghum genome, confirming the high conservation of orthologs in the genic regions of the two genomes. Applying the SUGIT database to differential expression analysis (false discovery rate (FDR)-corrected p-value < 0.05), 1,649 transcript isoforms were identified as being differentially expressed between the young and mature tissues in the sugarcane plant, while 555 transcript isoforms were differentially expressed between th...