Object detection, the task of locating and classifying objects of interest in images or videos, is a core research direction in computer vision. It promotes the development of intelligent monitoring, face recognition, and image segmentation. However, deep learning-based object detection models still achieve low detection accuracy on small-scale objects. In this paper, we propose a multi-scale feature fusion object detection method with a bidirectional feature pyramid network and a secondary candidate box refinement, named MS-Faster R-CNN. The fusion strategy builds on the Feature Pyramid Network (FPN) and uses two links to complete the feature fusion, making the semantics of the fused features richer and better adapted to objects of different scales. In the candidate box proposal stage, we use a cascaded Region Proposal Network (RPN) and an optimized Non-Maximum Suppression (NMS) so that the candidate boxes of small-scale objects are not over-suppressed, which improves the efficiency of candidate box proposal. Finally, Region of Interest (ROI) Align pooling based on bilinear interpolation is adopted to avoid the loss of accuracy caused by quantization. Extensive experiments show that our scheme achieves better detection performance than other methods on the Pascal VOC 2007, Pascal VOC 2012, and MS COCO datasets.
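To make the role of bilinear interpolation in ROI Align concrete, the following is a minimal pure-Python sketch, not the implementation used in this work. It samples a single-channel feature map at fractional coordinates instead of rounding them to integer cells, then averages the samples within each output bin. The names `bilinear_sample` and `roi_align`, the parameters `out_size` and `samples`, and the choice of average pooling over a regular sampling grid are illustrative assumptions.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2D feature map (list of rows) at a fractional (y, x) location."""
    height, width = len(feature), len(feature[0])
    # Clamp so that the four neighbouring cells always exist.
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the two rows, then along y between them.
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size=2, samples=2):
    """Pool an ROI into an out_size x out_size grid without quantization.

    roi is (y1, x1, y2, x2) in feature-map coordinates and may be fractional.
    Each bin averages samples x samples bilinearly interpolated points
    (a hypothetical sampling scheme chosen for illustration).
    """
    ry1, rx1, ry2, rx2 = roi
    bin_h = (ry2 - ry1) / out_size
    bin_w = (rx2 - rx1) / out_size
    output = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced sample points inside bin (i, j).
                    y = ry1 + (i + (sy + 0.5) / samples) * bin_h
                    x = rx1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear_sample(feature, y, x)
            row.append(total / (samples * samples))
        output.append(row)
    return output
```

Because the ROI boundaries and sample points are never rounded, a box whose corners fall between feature-map cells still contributes values from its exact location, which is the property that matters most for small objects covering only a few cells.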