[1] J. B. Allen and D. A. Berkley. Image method for efficiently simulating small-room acoustics. The Journal of the Acoustical Society of America, 65(4):943–950, 1979.
[2] S. Araki, H. Sawada, R. Mukai, and S. Makino. DOA estimation for multiple sparse sources with normalized observation vector clustering. In Proc. ICASSP, pages 33–36, 2006.
[3] M. R. Bai, J.-G. Ih, and J. Benesty. Acoustic array systems: theory, implementation, and application. John Wiley & Sons, 2013.
[4] J. Capon. High-resolution frequency-wavenumber spectrum analysis. Proceedings of the IEEE, 57(8):1408–1418, 1969.
[5] J. H. DiBiase. A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays. PhD thesis, Brown University, Providence, R.I., 2000.
[6] L. Drude, C. Boeddeker, J. Heymann, R. Haeb-Umbach, K. Kinoshita, M. Delcroix, and T. Nakatani. Integrating neural network based beamforming and weighted prediction error dereverberation. In Proc. INTERSPEECH, pages 3043–3047, 2018.
[7] O. L. Frost. An algorithm for linearly constrained adaptive array processing. Proceedings of the IEEE, 60(8):926–935, 1972.
[8] S.-W. Fu, Y. Tsao, H.-T. Hwang, and H.-M. Wang. Quality-Net: An end-to-end non-intrusive speech quality assessment model based on BLSTM. arXiv preprint arXiv:1808.05344, 2018.
[9] S.-W. Fu, C. Yu, T.-A. Hsieh, P. Plantinga, M. Ravanelli, X. Lu, and Y. Tsao. MetricGAN+: An improved version of MetricGAN for speech enhancement. arXiv preprint arXiv:2104.03538, 2021.
[10] E. A. P. Habets, J. Benesty, I. Cohen, S. Gannot, and J. Dmochowski. New insights into the MVDR beamformer in room acoustics. IEEE Transactions on Audio, Speech, and Language Processing, 18(1):158–170, 2009.
[11] J. Heymann, L. Drude, and R. Haeb-Umbach. Neural network based spectral mask estimation for acoustic beamforming. In Proc. ICASSP, pages 196–200, 2016.
[12] G. Huang, J. Benesty, I. Cohen, and J. Chen. A simple theory and new method of differential beamforming with uniform linear microphone arrays. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28:1079–1093, 2020.
[13] M.-W. Huang. Development of Taiwan Mandarin hearing in noise test. Master's thesis, Department of Speech Language Pathology and Audiology, National Taipei University of Nursing and Health Sciences, 2005.
[14] W. Huang and J. Feng. Differential beamforming for uniform circular array with directional microphones. In Proc. INTERSPEECH, pages 71–75, 2020.
[15] American National Standards Institute. ANSI S3.5-1997: Methods for calculation of the speech intelligibility index, 1997.
[16] U. Kjems and J. Jensen. Maximum likelihood based noise covariance matrix estimation for multi-microphone speech enhancement. In Proc. EUSIPCO, pages 295–299, 2012.
[17] C. Knapp and G. C. Carter. The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24(4):320–327, 1976.
[18] H. Kuttruff and E. Mommertz. Room acoustics. In Handbook of engineering acoustics, pages 239–267. Springer, 2012.
[19] B. Kwon, Y. Park, and Y.-S. Park. Analysis of the GCC-PHAT technique for multiple sources. In Proc. ICCAS, pages 2070–2073, 2010.
[20] X. Le, H. Chen, K. Chen, and J. Lu. DPCRN: Dual-path convolution recurrent network for single channel speech enhancement. arXiv preprint arXiv:2107.05429, 2021.
[21] N. Le Goff, J. Jensen, M. S. Pedersen, and S. L. Callaway. An introduction to OpenSound Navigator™. Oticon A/S, 2016.
[22] C. Li, J. Benesty, and J. Chen. Beamforming based on null-steering with small spacing linear microphone arrays. The Journal of the Acoustical Society of America, 143(5):2651–2665, 2018.
[23] Y. Liu and D. Wang. Divide and conquer: A deep CASA approach to talker-independent monaural speaker separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(12):2092–2102, 2019.
[24] P. C. Loizou. Speech enhancement: theory and practice. CRC Press, 2007.
[25] Y.-J. Lu, X. Chang, C. Li, W. Zhang, S. Cornell, Z. Ni, Y. Masuyama, B. Yan, R. Scheibler, Z.-Q. Wang, et al. ESPnet-SE++: Speech enhancement for robust speech recognition, translation, and understanding. arXiv preprint arXiv:2207.09514, 2022.
[26] Y. Luo and N. Mesgarani. Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(8):1256–1266, 2019.
[27] S. Markovich-Golan and S. Gannot. Performance analysis of the covariance subtraction method for relative transfer function estimation and comparison to the covariance whitening method. In Proc. ICASSP, pages 544–548, 2015.
[28] U. Michel. History of acoustic beamforming. In Proc. 1st BeBeC, pages 1–17, 2006.
[29] S. Mohan, M. E. Lockwood, M. L. Kramer, and D. L. Jones. Localization of multiple acoustic sources with small arrays using a coherence test. The Journal of the Acoustical Society of America, 123(4):2136–2147, 2008.
[30] R. P. Mueller, R. S. Brown, H. Hop, and L. Moulton. Video and acoustic camera techniques for studying fish under ice: a review and comparison. Reviews in Fish Biology and Fisheries, 16:213–226, 2006.
[31] T. Ochiai, S. Watanabe, T. Hori, J. R. Hershey, and X. Xiao. Unified architecture for multichannel end-to-end speech recognition with neural beamforming. IEEE Journal of Selected Topics in Signal Processing, 11(8):1274–1288, 2017.
[32] D. B. Paul and J. Baker. The design for the Wall Street Journal-based CSR corpus. In Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23–26, 1992, pages 357–362, 1992.
[33] L. Pfeifenberger, M. Zöhrer, and F. Pernkopf. Deep complex-valued neural beamformers. In Proc. ICASSP, pages 2902–2906, 2019.
[34] A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra. Perceptual evaluation of speech quality (PESQ), a new method for speech quality assessment of telephone networks and codecs. In Proc. ICASSP, pages 749–752, 2001.
[35] R. Roy and T. Kailath. ESPRIT: Estimation of signal parameters via rotational invariance techniques. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(7):984–995, 1989.
[36] D. Salvati, C. Drioli, and G. L. Foresti. Incoherent frequency fusion for broadband steered response power algorithms in noisy environments. IEEE Signal Processing Letters, 21(5):581–585, 2014.
[37] R. Scheibler, E. Bezzam, and I. Dokmanić. Pyroomacoustics: A Python package for audio room simulation and array processing algorithms. In Proc. ICASSP, pages 351–355, 2018.
[38] H. Schepker, S. E. Nordholm, L. T. T. Tran, and S. Doclo. Null-steering beamformer-based feedback cancellation for multi-microphone hearing aids with incoming signal preservation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(4):679–691, 2019.
[39] H. Schepker, L. T. T. Tran, S. Nordholm, and S. Doclo. Acoustic feedback cancellation for a multi-microphone earpiece based on a null-steering beamformer. In Proc. IWAENC, pages 1–5, 2016.
[40] H. Schepker, L. T. T. Tran, S. Nordholm, and S. Doclo. Null-steering beamformer for acoustic feedback cancellation in a multi-microphone earpiece optimizing the maximum stable gain. In Proc. ICASSP, pages 341–345, 2017.
[41] R. Schmidt. Multiple emitter location and signal parameter estimation. IEEE Transactions on Antennas and Propagation, 34(3):276–280, 1986.
[42] M. Souden, J. Benesty, and S. Affes. On optimal frequency-domain multichannel linear filtering for noise reduction. IEEE Transactions on Audio, Speech, and Language Processing, 18(2):260–276, 2009.
[43] M. Souden, J. Chen, J. Benesty, and S. Affes. An integrated solution for online multichannel noise tracking and reduction. IEEE Transactions on Audio, Speech, and Language Processing, 19(7):2159–2169, 2011.
[44] C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen. An algorithm for intelligibility prediction of time–frequency weighted noisy speech. IEEE Transactions on Audio, Speech, and Language Processing, 19(7):2125–2136, 2011.
[45] J. Thiemann, N. Ito, and E. Vincent. The diverse environments multi-channel acoustic noise database (DEMAND): A database of multichannel environmental noise recordings. Proceedings of Meetings on Acoustics, 19(1):035081, 2013.
[46] N. T. N. Tho, S. Zhao, and D. L. Jones. Robust DOA estimation of multiple speech sources. In Proc. ICASSP, pages 2287–2291, 2014.
[47] W.-Y. Ting, S.-S. Wang, Y. Tsao, and B. Su. IANS: Intelligibility-aware null-steering beamforming for dual-microphone arrays. arXiv preprint arXiv:2307.04179, 2023.
[48] H. L. Van Trees. Optimum array processing: Part IV of detection, estimation, and modulation theory. John Wiley & Sons, 2004.
[49] A. Varga and H. J. M. Steeneken. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication, 12(3):247–251, 1993.
[50] M. Wang, C. Boeddeker, R. G. Dantas, and A. Seelan. PESQ (perceptual evaluation of speech quality) wrapper for Python users, May 2022.
[51] D. B. Ward and R. C. Williamson. Beamforming for a source located in the interior of a sensor array. In Proc. ISSPA, volume 2, pages 873–876, 1999.
[52] R. E. Zezario, S.-W. Fu, F. Chen, C.-S. Fuh, H.-M. Wang, and Y. Tsao. Deep learning-based non-intrusive multi-objective speech assessment model with cross-domain features. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31:54–70, 2022.
[53] R. E. Zezario, S.-W. Fu, C.-S. Fuh, Y. Tsao, and H.-M. Wang. STOI-Net: A deep learning based non-intrusive speech intelligibility assessment model. In Proc. APSIPA, pages 482–486, 2020.