Recovering the indoor frame structure from a monocular image is an important yet challenging problem in indoor scene understanding. It becomes more difficult under occlusion, varying illumination, and weak object boundaries. To overcome these difficulties, we propose a new approach based on line segment refinement with two constraints. First, the line segments are refined by four consecutive operations: reclassifying, connecting, fitting, and voting. Specifically, the reclassifying operation corrects misclassified line segments; the connecting operation joins some short line segments; the fitting operation recovers undetected key line segments with the help of the vanishing points; and the voting operation selects the line segments that converge on the frame. Second, we construct four frame models