A new speaker-adaptation technique for deep neural network (DNN)-based speech synthesis-which requires only speech data without orthographic transcriptions-is proposed. This technique is based on a DNN-based speech-synthesis model that takes speaker, gender, and age into consideration as additional inputs and outputs acoustic parameters of corresponding voices from text in order to construct a multi-speaker model and perform speaker adaptation. It uses a new input code that represents acoustic similarity to each of the training speakers in a probability. The new input code, called "speaker-similarity vector," is obtained by concatenating posterior probabilities calculated from each model of the training speakers. GMM-UBM or i-vector/PLDA, which are widely used in text-independent speaker verification, are used to represent the speaker models, since they can be used without text information. Text and the speaker-similarity vectors of the training speakers are used as input to first train a multi-speaker speech-synthesis model, which outputs acoustic parameters of the training speakers. A new speaker-similarity vector is then estimated by using a small amount of speech data uttered by an unknown target speaker on the basis of the separately trained speaker models. It is expected that inputting the estimated speaker-similarity vector into the multi-speaker speech-synthesis model can generate synthetic speech that resembles the target speaker's voice. In objective and subjective experiments, adaptation performance of the proposed technique was evaluated using not only studioquality adaptation data but also low-quality (i.e., noisy and reverberant) data. The results of the experiments indicate that the proposed technique makes it possible to rapidly construct a voice for the target speaker in DNN-based speech synthesis.