This paper presents a multiply-accumulate (MAC) unit that enables a dual-mode truncation error compensation (TEC) scheme based on a fixed-width Booth multiplier (FWBM) for convolutional neural network (CNN) inference operations. The TEC schemes tailored to Modes 1 and 2 achieve high MAC accuracy for a general CNN model with general input patterns (Mode 1) and for a rectified linear unit (ReLU)-based CNN model with positive or zero input patterns (Mode 2). By precomputing terms from the CNN model coefficients, which are known in advance, the proposed dual-mode TEC scheme can be realized with minimal partial product operations and high hardware efficiency through a software-hardware codesign approach. Further, a reconfigurable architecture of the resultant MAC unit is presented to realize the proposed dual-mode TEC scheme. Accuracy evaluations of 9-N and 25-N MAC operations (N denotes the number of times the MAC operation is performed) show that MAC operations using the proposed TEC scheme achieve the highest accuracy in both Modes 1 and 2 relative to reference designs that directly employ the FWBM with a conventional TEC function. The hardware performance of the 9-N and 25-N MAC units is also evaluated using the TSMC 40-nm standard cell library. Compared with the reference TEC-enabled designs, the proposed MAC unit exhibits higher hardware efficiency in terms of area, delay, and power consumption and reduces both the area-delay-error and power-delay-error products by more than 40% at minimum. Moreover, the resultant 9-N and 25-N MAC units are verified on a system-on-chip field-programmable gate array platform by running a CNN model for handwritten digit classification.
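
To illustrate why truncation error matters in a MAC operation, the following minimal Python sketch accumulates fixed-width products and compares the result with and without a constant compensation bias. It is not the dual-mode TEC scheme proposed in this paper and does not model Booth partial products; the 8-bit operand width, the 0.5-LSB-per-product bias, and all function names are assumptions chosen only for illustration.

```python
import random

N_BITS = 8  # assumed operand width (hypothetical, not taken from the paper)


def truncated_product(x, w, n=N_BITS):
    """Full 2n-bit product with the n least significant bits dropped (floor)."""
    return (x * w) >> n


def mac_exact(xs, ws, n=N_BITS):
    """Reference MAC result, expressed in units of the retained LSB."""
    return sum(x * w for x, w in zip(xs, ws)) / (1 << n)


def mac_truncated(xs, ws, n=N_BITS, compensate=False):
    """MAC over truncated products, optionally adding a constant bias.

    The bias of 0.5 LSB per product is a generic constant compensation,
    assuming the dropped bits are roughly uniformly distributed.
    """
    acc = sum(truncated_product(x, w, n) for x, w in zip(xs, ws))
    if compensate:
        acc += 0.5 * len(xs)
    return acc


def mean_abs_error(taps=9, trials=2000, compensate=False, seed=0):
    """Mean absolute MAC error over random signed operands."""
    rng = random.Random(seed)
    lo, hi = -(1 << (N_BITS - 1)), (1 << (N_BITS - 1)) - 1
    total = 0.0
    for _ in range(trials):
        xs = [rng.randint(lo, hi) for _ in range(taps)]
        ws = [rng.randint(lo, hi) for _ in range(taps)]
        exact = mac_exact(xs, ws)
        approx = mac_truncated(xs, ws, compensate=compensate)
        total += abs(exact - approx)
    return total / trials


if __name__ == "__main__":
    for taps in (9, 25):
        plain = mean_abs_error(taps=taps, compensate=False)
        comp = mean_abs_error(taps=taps, compensate=True)
        print(f"{taps} taps: uncompensated={plain:.3f} LSB, compensated={comp:.3f} LSB")
```

Because each floor truncation discards a non-negative fraction of an LSB, the uncompensated error grows with the number of accumulated products, which is why the 9-tap and 25-tap cases are shown separately.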