This paper addresses hip-hop lyric generation with conditional neural language models. We develop a simple yet effective mechanism to extract and apply conditional templates from text snippets and show, on the basis of a large-scale crowd-sourced manual evaluation, that these templates significantly improve the quality and realism of the generated snippets. Importantly, the proposed approach enables end-to-end training that targets formal properties of text, such as rhythm and rhyme, which are central characteristics of rap texts. Additionally, we explore how generating text at different scales (e.g., character level or word level) affects the quality of the output. We find that a hybrid form, namely a hierarchical model that aims to integrate language modeling at both the word and the character level, yields significant improvements in text quality but, surprisingly, cannot exploit conditional templates to their fullest extent. Our findings highlight that text generation models based on recurrent neural networks (RNNs) are sensitive to the modeling scale, and they call for further research into why the conditioning mechanism is effective to different degrees at different scales.
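The abstract does not say how the conditional templates are extracted. Purely as a hypothetical illustration of what a template capturing rhythm and rhyme *might* contain, the Python sketch below maps each line of a snippet to an approximate syllable count and a crude rhyme key, then derives a rhyme scheme (e.g., "AA"). This is not the authors' method: counting vowel groups as syllables and treating the final vowel group plus trailing consonants as the rhyme are simplifying assumptions, and the function names (`extract_template`, `rhyme_scheme`, etc.) are invented for illustration.

```python
import re
from typing import List, Tuple

# ASSUMPTION (hypothetical heuristic, not from the paper):
# a run of vowel letters approximates one syllable.
VOWEL_GROUP = re.compile(r"[aeiouy]+")


def approx_syllables(line: str) -> int:
    """Crude syllable estimate: vowel groups per word, at least 1 per word."""
    words = re.findall(r"[a-z']+", line.lower())
    return sum(max(1, len(VOWEL_GROUP.findall(w))) for w in words)


def rhyme_key(line: str) -> str:
    """Crude rhyme key: last vowel group of the final word plus what follows it."""
    words = re.findall(r"[a-z]+", line.lower())
    if not words:
        return ""
    last = words[-1]
    matches = list(VOWEL_GROUP.finditer(last))
    if not matches:
        return last  # no vowel letters: fall back to the whole word
    return last[matches[-1].start():]


def extract_template(snippet: str) -> List[Tuple[int, str]]:
    """Map each non-empty line to (approximate syllable count, rhyme key)."""
    return [
        (approx_syllables(line), rhyme_key(line))
        for line in snippet.splitlines()
        if line.strip()
    ]


def rhyme_scheme(template: List[Tuple[int, str]]) -> str:
    """Label rhyme keys A, B, C, ... in order of first appearance."""
    letters: dict = {}
    scheme = []
    for _, key in template:
        if key not in letters:
            letters[key] = chr(ord("A") + len(letters))
        scheme.append(letters[key])
    return "".join(scheme)


if __name__ == "__main__":
    template = extract_template("I got the flow\nYou already know")
    print(template)                # [(4, 'ow'), (5, 'ow')]
    print(rhyme_scheme(template))  # AA
```

In a conditional language model, a template like this could be fed to the generator alongside the text so that it learns to match a target line length and rhyme pattern; how the paper actually encodes and applies its templates is not specified in this abstract.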
| Field | Value |
|---|---|
| Title of host publication | Proceedings of the 12th International Conference on Natural Language Generation (INLG) |
| Place of publication | Tokyo, Japan |
| Publication status | Published (October 2019) |