Accurately timing sub-second sensory events is crucial when perceiving our dynamic world. This ability enables complex human behaviors that require timing-dependent multisensory integration and action planning, including the perception and performance of speech, music, driving, and many sports. How are responses to sensory event timing processed for multisensory integration and action planning? Using ultra-high-field fMRI, we measured responses to visual events whose timing changed systematically. We analyzed these responses with neural population response models selective for event duration and frequency, a choice that follows behavioral, computational, and macaque action planning results and comparisons to alternative models. We found systematic local changes in timing preferences (recently described in the supplementary motor area) in an extensive network of topographic timing maps, mirroring sensory cortices and other quantity processing networks. These timing maps were partially left lateralized and widely spread, from occipital visual areas through parietal multisensory areas to frontal action planning areas. Responses to event duration and frequency were closely linked. As in sensory cortical maps, response precision varied systematically with timing preferences, and timing selectivity varied systematically between maps. Progressing from posterior to anterior maps, responses to multiple events were increasingly integrated, response selectivity narrowed, and responses focused increasingly on the middle of the presented timing range. These timing maps largely overlap with numerosity and visual field map networks. In both visual timing map and visual field map networks, selective responses and topographic map organization may facilitate hierarchical transformations by allowing neural populations to interact over minimal distances.
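To illustrate what a response model selective for event duration and frequency can look like, the following is a minimal sketch, not the model fitted in this study. It assumes a separable Gaussian tuning function over duration (in seconds) and frequency (in Hz); the function name, parameter names, and the separable Gaussian form are all hypothetical choices made for illustration.

```python
import math


def timing_response(duration, frequency,
                    pref_duration, pref_frequency,
                    sigma_duration, sigma_frequency):
    """Response of a hypothetical population tuned to event timing.

    Assumption (illustrative only): tuning is a separable Gaussian over
    event duration (s) and event frequency (Hz), peaking at 1.0 when the
    stimulus matches the preferred duration and frequency.
    """
    # Distance from the preferred timing, in units of tuning width.
    dz = (duration - pref_duration) / sigma_duration
    fz = (frequency - pref_frequency) / sigma_frequency
    return math.exp(-0.5 * (dz * dz + fz * fz))


# A hypothetical population preferring 0.4 s events presented at 2 Hz.
tuning = dict(pref_duration=0.4, pref_frequency=2.0,
              sigma_duration=0.1, sigma_frequency=0.5)

at_preferred = timing_response(0.4, 2.0, **tuning)
off_preferred = timing_response(0.6, 2.0, **tuning)
```

In this sketch, the preferred duration and frequency set the peak of the response, while the two sigma parameters set its selectivity: smaller values give narrower tuning, so the response falls off more steeply as the stimulus moves away from the preferred timing.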