Generative Modelling for Controllable Piano Performance Audio Synthesis

Audio Examples

This page contains a set of audio samples in support of the paper. None of the pieces used here were seen during training.

In our work, we generate realistic piano performances in the audio domain that closely follow temporal conditioning on two essential style features of piano performance: articulation (legato or detached) and dynamics (loud or soft). Below we demonstrate the model's ability to perform fine-grained style morphing over the course of the synthesized audio, with conditions either sampled from the prior (Part 1) or inferred from other pieces (Part 2).

One envisioned use case is to inspire creative, brand-new interpretations of existing piano pieces.

Contents

Gradual Style Morphing
Performance Style Transfer

Part 1: Gradual Style Morphing
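
The examples in this part morph the conditioning signal from one style setting to its opposite over the duration of the clip, with the target styles sampled from the prior. As a rough, hypothetical sketch of how such a time-varying condition could be constructed (the two-dimensional style coordinates, the frame count, and the `synthesize` call below are illustrative assumptions, not the model's actual interface), one could linearly interpolate between the two style settings across conditioning frames:

```python
import numpy as np

def morphing_condition(start, end, n_frames):
    """Linearly interpolate a (dynamics, articulation) style vector
    from `start` to `end` over `n_frames` conditioning frames."""
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    t = np.linspace(0.0, 1.0, n_frames)[:, None]  # shape (n_frames, 1)
    return (1.0 - t) * start + t * end            # shape (n_frames, 2)

# Illustrative style coordinates in [0, 1]: (dynamics, articulation),
# with 0 = soft / staccato and 1 = loud / legato (assumed encoding).
staccato_soft = [0.1, 0.1]
legato_loud = [0.9, 0.9]

cond = morphing_condition(staccato_soft, legato_loud, n_frames=500)
# `cond` would then be supplied frame by frame to the synthesis model,
# e.g. audio = model.synthesize(notes, cond)  # hypothetical API
```

The reverse directions shown below (e.g. legato loud -> staccato soft) simply swap the start and end settings.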

Example 1 (original and generated mel-spectrograms with audio results)
staccato soft -> legato loud
staccato loud -> legato soft
legato soft -> staccato loud
legato loud -> staccato soft

Example 2 (original and generated mel-spectrograms with audio results)
staccato soft -> legato loud
staccato loud -> legato soft
legato soft -> staccato loud
legato loud -> staccato soft

Example 3 (original and generated mel-spectrograms with audio results)
staccato soft -> legato loud
staccato loud -> legato soft
legato soft -> staccato loud
legato loud -> staccato soft

Part 2: Performance Style Transfer
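
In this part the conditioning is inferred from a separate style piece and applied to the notes of a content piece; each example also includes an amplitude correlation plot comparing loudness contours. As an informal sketch of how such an amplitude comparison might be computed (which pair of signals is compared, the file names, the sampling rate, the frame settings, and the use of librosa are all assumptions here, not the paper's exact procedure):

```python
import librosa
import numpy as np

def rms_envelope(path, sr=16000, frame_length=2048, hop_length=512):
    """Load an audio file and return its frame-wise RMS amplitude envelope."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    return librosa.feature.rms(y=y, frame_length=frame_length,
                               hop_length=hop_length)[0]

# Hypothetical file names for one example.
style = rms_envelope("style_piece.wav")
result = rms_envelope("result.wav")

# Truncate to a common length, then take the Pearson correlation
# of the two amplitude envelopes.
n = min(len(style), len(result))
corr = np.corrcoef(style[:n], result[:n])[0, 1]
print(f"amplitude correlation: {corr:.3f}")
```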

Example 1 (mel-spectrograms / piano rolls and audio)
Content Piece
Style Piece
Result
Amplitude Correlation

Example 2 (mel-spectrograms / piano rolls and audio)
Content Piece
Style Piece
Result
Amplitude Correlation

Example 3 (mel-spectrograms / piano rolls and audio)
Content Piece
Style Piece
Result
Amplitude Correlation