[1] A. van den Oord et al., “WaveNet: A Generative Model for Raw Audio.” arXiv, Sep. 19, 2016.
[2] N. Kalchbrenner et al., “Efficient Neural Audio Synthesis.” arXiv, Jun. 25, 2018.
[3] J. Kong, J. Kim, and J. Bae, “HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis.” arXiv, Oct. 23, 2020.
[4] J. Engel, L. Hantrakul, C. Gu, and A. Roberts, “DDSP: Differentiable Digital Signal Processing.” arXiv, Jan. 14, 2020.
[5] J. Engel, K. K. Agrawal, S. Chen, I. Gulrajani, C. Donahue, and A. Roberts, “GANSynth: Adversarial Neural Audio Synthesis,” in International Conference on Learning Representations (ICLR), 2019.
[6] J. Engel et al., “Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders.” arXiv, Apr. 05, 2017.
[7] L. Hantrakul, J. Engel, A. Roberts, and C. Gu, “Fast and Flexible Neural Audio Synthesis,” in International Society for Music Information Retrieval Conference (ISMIR), 2019.
[8] A. van den Oord et al., “Parallel WaveNet: Fast High-Fidelity Speech Synthesis.” arXiv, Nov. 28, 2017.
[9] R. Prenger, R. Valle, and B. Catanzaro, “WaveGlow: A Flow-based Generative Network for Speech Synthesis.” arXiv, Oct. 30, 2018.
[10] J.-M. Valin and J. Skoglund, “LPCNet: Improving Neural Speech Synthesis Through Linear Prediction.” arXiv, Feb. 19, 2019.
[11] N. Mor, L. Wolf, A. Polyak, and Y. Taigman, “A Universal Music Translation Network.” arXiv, May 23, 2018.
[12] S. Kovela, R. Valle, A. Dantrey, and B. Catanzaro, “Any-to-Any Voice Conversion with F0 and Timbre Disentanglement and Novel Timbre Conditioning,” in International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece: IEEE, Jun. 2023, pp. 1–5.
[13] J. W. Kim, J. Salamon, P. Li, and J. P. Bello, “CREPE: A Convolutional Representation for Pitch Estimation.” arXiv, Feb. 16, 2018.
[14] B. Nguyen and F. Cardinaux, “NVC-Net: End-to-End Adversarial Voice Conversion.” arXiv, Jun. 02, 2021.
[15] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition.” arXiv, Dec. 10, 2015.
[16] A. Vaswani et al., “Attention Is All You Need.” arXiv, Aug. 01, 2023.
[17] R. Dey and F. M. Salem, “Gate-variants of Gated Recurrent Unit (GRU) neural networks,” in 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA: IEEE, Aug. 2017, pp. 1597–1600.
[18] K. Kumar et al., “MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis.” arXiv, Dec. 08, 2019.
[19] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley, “Least Squares Generative Adversarial Networks.” arXiv, Apr. 05, 2017.
[20] D. P. Kingma and M. Welling, “Auto-Encoding Variational Bayes,” in International Conference on Learning Representations (ICLR), 2014.
[21] I. J. Goodfellow et al., “Generative Adversarial Nets,” in Advances in Neural Information Processing Systems (NeurIPS), 2014.