This paper describes a novel call-recognition system based on a machine learning approach. Emotion recognition and related intelligent capabilities remain important challenges for real-world human–computer interaction, and the proposed system offers robustness, high accuracy, and adequate response time. Intelligence and emotion recognition from speech in human–computer interfaces are realized via multiple classifier systems (MCSs). In the acoustic stream, a higher-level stage extracts acoustic features based on the pitch and energy of the signal; during the training phase, this feature space is labeled with the emotional categories to be recognized, and the emotional categories are trained in the acoustic feature space. The semantic stream converts the input speech signal into text, after which text-classification algorithms are applied. Clustering and classification are performed with a k-means algorithm, and the tone of voice in the call-recognition system is detected with an XGBoost model, which performs feature extraction and detects particular phrases during the client call. Speech expressions are thus used to understand human emotion. The algorithms were tested and demonstrate good performance in a simulation environment.
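The clustering step of the pipeline can be sketched with a plain Lloyd's-iteration k-means over a two-dimensional acoustic feature space. This is a minimal illustrative sketch, not the paper's implementation: the synthetic (pitch, energy) features, the `kmeans` function, and all parameter values are assumptions introduced here for demonstration.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Cluster the rows of X into k groups (plain Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialize centers by sampling k distinct data points
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assign each sample to its nearest center (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its assigned samples
        new_centers = np.array([
            X[labels == j].mean(axis=0) if (labels == j).any() else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged
        centers = new_centers
    return labels, centers

# Synthetic 2-D acoustic features (pitch, energy) for two emotion groups
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),   # cluster around (0, 0)
               rng.normal(3.0, 0.3, (50, 2))])  # cluster around (3, 3)
labels, centers = kmeans(X, k=2)
```

With well-separated synthetic groups such as these, the two clusters recovered by k-means correspond to the two generating distributions; in the described system the cluster labels would instead be associated with emotional categories learned in the training phase.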