Samples for "Fitting New Speakers Based on a Short Untranscribed Sample"

Introduction

We present supplementary audio samples generated with the proposed method. All samples capture new voices that were unseen during training, and the same model, trained on 85 VCTK speakers (VCTK85), was used to fit every voice. For each dataset used in the paper, we present below the ground truth (unseen during both training and fitting) together with a sample generated from the same text as the ground truth.

Note that fitting was performed on a different sample of the same speaker. After the fitting step, the obtained speaker embedding, coupled with the text of the new sample, was used to generate it.
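The two-step procedure described above can be sketched as follows. This is an illustrative stand-in only, not the paper's actual model: `fit_speaker_embedding` and `synthesize` are hypothetical placeholders that merely mimic the interface (fit a fixed-size embedding from an untranscribed sample of the new speaker, then condition synthesis of new text on that embedding).

```python
import numpy as np

def fit_speaker_embedding(audio_frames: np.ndarray) -> np.ndarray:
    """Toy stand-in for the fitting step: reduce a variable-length
    sequence of acoustic frames (T, D) to a single (D,) embedding.
    The real method optimizes the embedding; here we just average."""
    return audio_frames.mean(axis=0)

def synthesize(text: str, speaker_embedding: np.ndarray) -> np.ndarray:
    """Toy stand-in for the synthesis step: emit one output frame per
    character, biased by the speaker embedding."""
    rng = np.random.default_rng(0)
    content = rng.standard_normal((len(text), speaker_embedding.shape[0]))
    return content + speaker_embedding  # shape (num_frames, D)

# Fitting uses one untranscribed sample of the new speaker...
fitting_sample = np.random.default_rng(1).standard_normal((200, 64))
embedding = fit_speaker_embedding(fitting_sample)

# ...while generation uses the text of a *different* sample of that speaker.
text = "The rainbow is a division of white light."
generated = synthesize(text, embedding)
```

The key point the sketch preserves is the separation of concerns: the audio sample only ever influences the speaker embedding, while the text to be spoken is supplied independently at generation time.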

VCTK dataset

[Table: for each length of the sample(s) used for fitting — 1.5 sec, 1 sentence, 2 sentences, 1 min, 5 min, 10 min, 15 min, 20 min — paired ground-truth and generated audio samples.]

Note that, as reported in our paper, a longer fitting sample does not necessarily lead to better quality with our method.


Libri-rest dataset


[Audio samples: ground truth and generated.]

VoxCeleb dataset


[Audio samples: ground truth and generated.]

Priming based method - VCTK dataset


[Audio samples: ground truth, priming, and generated.]