[1] K. Kobayashi and T. Toda, "Sprocket: Open-source voice conversion software," Proc. Odyssey, pp. 203–210, June 2018.
[2] M. Morise, "Harvest: A high-performance fundamental frequency estimator from speech signals," Proc. INTERSPEECH, pp. 2321–2325, Aug. 2017.
[3] M. Morise, "D4C, a band-aperiodicity estimator for high-quality speech synthesis," Speech Communication, vol. 84, pp. 57–65, Sept. 2016.
[4] M. Morise, "CheapTrick, a spectral envelope estimator for high-quality speech synthesis," Speech Communication, vol. 67, pp. 1–7, Sept. 2015.
[5] J. Demmel, J. R. Gilbert, and X. S. Li, "SuperLU Users' Guide," http://crd.lbl.gov/~xiaoye/SuperLU/ug.pdf.
[6] M. Morise and Y. Watanabe, "Sound quality comparison among high-quality vocoders by using re-synthesized speech," Acoust. Sci. & Tech., vol. 39, no. 3, pp. 263–265, May 2018.
[7] J. Flanagan and R. Golden, "Phase vocoder," The Bell System Technical Journal, vol. 45, no. 9, pp. 1493–1509, Nov. 1966.
[8] Y. Stylianou, O. Cappé, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. SAP, vol. 6, no. 2, pp. 131–142, Mar. 1998.
[9] T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum likelihood estimation of spectral parameter trajectory," IEEE Trans. ASLP, vol. 15, no. 8, pp. 2222–2235, Nov. 2007.
[10] T. Toda, T. Muramatsu, and H. Banno, "Implementation of computationally efficient real-time voice conversion," Proc. INTERSPEECH, Sept. 2012.
[11] N. Pilkington, H. Zen, and M. Gales, "Gaussian process experts for voice conversion," Proc. INTERSPEECH, pp. 2761–2764, Aug. 2011.
[12] N. Xu, Y. Tang, J. Bao, A. Jiang, X. Liu, and Z. Yang, "Voice conversion based on Gaussian processes by coherent and asymmetric training with limited training data," Speech Communication, vol. 58, pp. 124–138, Mar. 2014.
[13] R. Takashima, T. Takiguchi, and Y. Ariki, "Exemplar-based voice conversion using sparse representation in noisy environments," IEICE Trans. Fundamentals, vol. E96-A, no. 10, pp. 1946–1953, Oct. 2013.
[14] Z. Wu, T. Virtanen, E. Chng, and H. Li, "Exemplar-based sparse representation with residual compensation for voice conversion," IEEE/ACM Trans. ASLP, vol. 22, no. 10, pp. 1506–1521, June 2014.
[15] T. Nakashika, R. Takashima, T. Takiguchi, and Y. Ariki, "Voice conversion in high-order eigen space using deep belief nets," Proc. INTERSPEECH, pp. 369–372, Aug. 2013.
[16] L.-H. Chen, Z.-H. Ling, L.-J. Liu, and L.-R. Dai, "Voice conversion using deep neural networks with layer-wise generative training," IEEE/ACM Trans. ASLP, vol. 22, no. 12, pp. 1859–1872, Dec. 2014.
[17] L. Sun, S. Kang, K. Li, and H. Meng, "Voice conversion using deep bidirectional long short-term memory based recurrent neural networks," Proc. ICASSP, pp. 4869–4873, Apr. 2015.
[18] D. Erro, A. Moreno, and A. Bonafonte, "INCA algorithm for training voice conversion systems from nonparallel corpora," IEEE Trans. ASLP, vol. 18, no. 5, pp. 944–953, 2010.
[19] T. Hashimoto, D. Saito, and N. Minematsu, "Arbitrary speaker conversion based on speaker space bases constructed by deep neural networks," Proc. APSIPA, pp. 1–4, Dec. 2016.
[20] Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Many-to-many eigenvoice conversion with reference voice," Proc. INTERSPEECH, pp. 1623–1626, Sept. 2009.
[21] Y. Chen, M. Chu, E. Chang, J. Liu, and R. Liu, "Voice conversion with smoothed GMM and MAP adaptation," Proc. INTERSPEECH, pp. 1–4, Sept. 2003.
[22] H. Kawahara, J. Estill, and O. Fujimura, "Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited," Proc. ICASSP, pp. 1303–1306, Apr. 1997.
[23] M. Morise, H. Kawahara, and T. Nishiura, "Rapid F0 estimation for high-SNR speech based on fundamental component extraction," IEICE Trans. Inf. & Syst. (Japanese Edition), vol. J93-D, no. 2, pp. 109–117, 2010.