The segregation of figures from the background is an important step in visual perception. In primary visual cortex (V1), figures evoke stronger activity than backgrounds during a delayed phase of the neuronal responses, but it is unknown how this figure-ground modulation (FGM) arises and whether it is necessary for perception. Here, using optogenetic silencing in mice, we show that the delayed V1 response phase is necessary for figure-ground segregation. Neurons in higher visual areas also exhibited FGM, and optogenetic silencing of these areas reduced FGM in V1. In V1, figures elicited higher activity of vasoactive intestinal peptide-expressing (VIP) interneurons than the background, whereas figures suppressed somatostatin-positive interneurons, resulting in increased activation of pyramidal cells. Optogenetic silencing of VIP neurons reduced FGM in V1, indicating that disinhibitory circuits contribute to FGM. Our results provide insight into how lower and higher areas of the visual cortex interact to shape visual perception.