
A statistical method of evaluating the pronunciation proficiency/intelligibility of English presentations by Japanese speakers

Published online by Cambridge University Press: 23 May 2014

Hiroshi Kibishi
Affiliation:
Toyohashi University of Technology, Computer Science and Engineering, Japan (kibishi@slp.cs.tut.ac.jp)
Kuniaki Hirabayashi
Affiliation:
Toyohashi University of Technology, Computer Science and Engineering, Japan (kuniaki@slp.cs.tut.ac.jp)
Seiichi Nakagawa
Affiliation:
Toyohashi University of Technology, Computer Science and Engineering, Japan (nakagawa@slp.cs.tut.ac.jp)

Abstract

In this paper, we propose a statistical method for evaluating the pronunciation proficiency and intelligibility of presentations made in English by native Japanese speakers. We statistically analyzed actual utterances to find combinations of acoustic and linguistic features that yield a high correlation between the scores estimated by the system and those assigned by native English teachers. The best combination of acoustic features produced correlation coefficients of 0.929 and 0.753 for pronunciation and intelligibility scores, respectively, using open data for speakers at the 10-sentence level. In an offline test, we evaluated possibly confusing pairs of phonemes that are often mispronounced by Japanese speakers of English. In addition, we used these offline techniques to develop an online system for Japanese learners of English that estimates pronunciation and intelligibility scores in real time with almost the same ability as English teachers. Finally, we show that both the objective and subjective evaluations improved after learning with our system.
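
The correlation coefficients quoted in the abstract measure how closely system-estimated scores track teacher-assigned scores. The sketch below shows one common way to compute such a value, assuming a Pearson correlation (the abstract does not say which coefficient was used). The function name `pearson` and the score lists are hypothetical and serve only as an illustration; they are not data from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-speaker scores (illustrative only, not from the paper).
teacher_scores = [2.0, 3.5, 4.0, 2.5, 4.5]  # assigned by a human rater
system_scores = [2.2, 3.1, 4.3, 2.4, 4.4]   # estimated automatically

r = pearson(system_scores, teacher_scores)
print(f"correlation between system and teacher scores: {r:.3f}")
```

A value near 1 indicates that the automatic scores rank and space speakers much as the human rater does; a value near 0 indicates no linear relationship.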

Type
Research Article
Copyright
Copyright © European Association for Computer Assisted Language Learning 2015 

