Context. The field of galaxy evolution will make a great leap forward in the next decade as a consequence of the huge effort by the scientific community in multi-object spectroscopic facilities. Various future surveys will enormously increase the number of available galaxy spectra, providing new insights into unexplored areas of research. To maximise the impact of these incoming data, the analysis methods must also step up, extracting reliable information from the available spectra. It is therefore urgent to refine and test reliable analysis tools that are able to infer the properties of a galaxy from medium- or high-resolution spectra. Aims. In this paper we aim to investigate the limits and the reliability of different spectral synthesis methods in the estimation of the mean stellar age and metallicity. These two quantities are fundamental for determining the assembly history of a galaxy, as they provide key insights into its star formation history. The main question this work aims to address is which signal-to-noise ratios (S/N) are needed to reliably determine the mean stellar age and metallicity from a galaxy spectrum and how this depends on the tool used to model the spectra. Methods. To address this question we built a set of realistic simulated spectra containing stellar and nebular emission, reproducing the evolution of a galaxy in two limiting cases: a constant star formation rate and an exponentially declining star formation with a single initial burst. We degraded the synthetic spectra built from these two star formation histories (SFHs) to different S/N and analysed them with three widely used spectral synthesis codes, namely Fado, Steckmap, and Starlight, assuming similar fitting set-ups and the same base of spectral templates. Results. For S/N ≤ 5, all three tools show a large diversity in the results. The Fado and Starlight tools find median differences in the light-weighted mean stellar age of ∼0.1 dex, while Steckmap shows a higher value of ∼0.2 dex.
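As a reading aid for the differences quoted in dex, the following minimal sketch converts a logarithmic offset into a fractional difference, assuming the usual convention that an offset of Δ dex corresponds to a multiplicative factor of 10^Δ. The function names `dex_to_percent` and `percent_to_dex` are hypothetical helpers introduced here for illustration and are not part of Fado, Steckmap, or Starlight.

```python
import math


def dex_to_percent(delta_dex: float) -> float:
    """Convert an offset in dex to a percentage difference.

    Assumes the standard convention: an offset of delta_dex in log10(x)
    means x changes by a factor of 10**delta_dex.
    """
    return (10.0 ** delta_dex - 1.0) * 100.0


def percent_to_dex(percent: float) -> float:
    """Inverse conversion: percentage difference to an offset in dex."""
    return math.log10(1.0 + percent / 100.0)


if __name__ == "__main__":
    # Offsets of the size discussed in the text (values in dex).
    for delta in (0.03, 0.08, 0.1, 0.11, 0.2):
        print(f"{delta:.2f} dex -> {dex_to_percent(delta):.1f} %")
```

Under this convention, 0.03 dex corresponds to roughly 7%, and 0.1 dex to roughly 26%, so small offsets in dex translate into percentage differences that grow faster than linearly.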
For S/N > 50 the median differences in Fado are ∼0.03 dex (∼7%), a factor of 3 and 4 lower than the 0.08 dex (∼20%) and 0.11 dex (∼30%) obtained from Starlight and Steckmap, respectively. Detailed investigations of the best-fit spectrum for galaxies with overestimated mass-weighted quantities point towards the inability of purely stellar models to fit the observed spectral energy distribution around the Balmer jump. Conclusions. Our results imply that when a galaxy enters a phase of high specific star formation rate (sSFR) the neglect of the nebular continuum emission in the fitting process has a strong impact on the estimation of its SFH when purely stellar fitting codes are used, even in the presence of high S/N spectra. The median values of these differences are of the order of 7% (Fado), 20% (Starlight), and 30% (Steckmap) for light-weighted quantities, and 20% (Fado), 60% (Starlight), and 20% (Steckmap) for mass-weighted quantities. More specifically, for a continuous SFH both Steckmap and Starlight overestimate the stellar age by >2 dex within the first ∼100 My...