Animal pose estimation tools based on deep learning have greatly improved animal behaviour quantification. These tools perform pose estimation on individual video frames, but do not account for variability of animal body shape in their prediction and evaluation. Here, we introduce a novel multi-frame animal pose estimation framework, referred to as OptiFlex. This framework integrates a flexible base model (i.e., FlexibleBaseline), which accounts for variability in animal body shape, with an OpticalFlow model that incorporates temporal context from nearby video frames. Pose estimation can be optimised using multi-view information to leverage all four dimensions (3D space and time). We evaluate FlexibleBaseline using datasets of four different lab animal species (mouse, fruit fly, zebrafish, and monkey) and introduce an intuitive evaluation metric-adjusted percentage of correct key points (aPCK). Our analyses show that OptiFlex provides prediction accuracy that outperforms current deep learning based tools, highlighting its potential for studying a wide range of behaviours across different animal species.