Language Training System Using Speech Processing Techniques

Master's Thesis (1996):

This work describes a Computer-Assisted Language Learning (CALL) system that uses speech as the main medium of interaction with students, enabling them to improve their speaking ability in a foreign language.

Studies have shown that for foreign language students with intermediate or high proficiency, correcting prosodic errors is more important than correcting segmental errors. The system therefore concentrates on teaching intonation.

Rather than presenting pre-recorded audio feedback, the system produces audio feedback from the students' own speech, to help them hear the difference between their speech and that of a native speaker. To do this, it uses several speech processing techniques: speech recognition, speech modification and speech synthesis. The system also analyzes the input speech to detect significant errors and to decide how best to demonstrate them to the students.

In particular, a novel method using intonation modeling, based on the Fujisaki superpositional intonation model, was introduced.
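For readers unfamiliar with the Fujisaki model, the sketch below evaluates its standard formulation: the logarithm of F0 is a baseline plus the sum of phrase components (impulse responses) and accent components (step responses). It is only an illustration of the published model, not the thesis's implementation. The function names, the command tuples and the default values of alpha, beta and gamma are assumptions chosen for the example.

```python
import math

def phrase_response(t, alpha=2.0):
    """Phrase control response Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control response Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb      : baseline frequency in Hz
    phrases : list of (T0, Ap) phrase commands (onset time, magnitude)
    accents : list of (T1, T2, Aa) accent commands (onset, offset, amplitude)
    The default alpha, beta, gamma are illustrative values, not taken from the thesis.
    """
    log_f0 = math.log(fb)
    for t0, ap in phrases:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return math.exp(log_f0)

# Example: one phrase command at 0 s and one accent command from 0.2 s to 0.5 s.
contour = [fujisaki_f0(k * 0.01, 120.0, [(0.0, 0.5)], [(0.2, 0.5, 0.4)])
           for k in range(100)]
```

Because the model describes a contour with a handful of command timings and amplitudes, two utterances can in principle be compared on those parameters rather than frame by frame.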

The figure outlines the system, which consists of a speech processing block, an error detection block and a feedback block.

The speech processing techniques used in the first block, especially those related to recognition, have been studied and tested extensively. However, further gains in their performance currently seem limited. Rather than trying to improve this aspect, this work concentrates on making good use of the current performance level. Although these methods are fairly standard, work was needed to make them work with each other (e.g. a method that combines dynamic time warping (DTW) and hidden Markov model (HMM) results to increase segmentation reliability) and to adapt them to the special working conditions of a foreign language speech training system. For example, the system has to cope with irregular pronunciations, which cannot be completely predicted in advance. This requires a linguistic processing unit to run before standard recognition can take place.
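The text does not say how the DTW and HMM results are combined, so the sketch below shows only the DTW half: a textbook dynamic time warping alignment between a reference and a student feature sequence, plus a helper that carries a segment boundary from the reference over to the student's utterance. The one-dimensional features, the absolute-difference distance and the names `dtw_path` and `map_boundary` are all assumptions made for this illustration.

```python
def dtw_path(ref, stu):
    """Align two 1-D feature sequences with basic DTW.

    Returns (total_cost, path), where path is a list of (ref_index, stu_index)
    pairs from the first frames to the last frames.
    """
    n, m = len(ref), len(stu)
    inf = float("inf")
    # cost[i][j] is the best cumulative cost of aligning ref[:i] with stu[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - stu[j - 1])  # local distance (assumed metric)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path

def map_boundary(path, ref_frame):
    """Return the first student frame aligned with the given reference frame."""
    for r, s in path:
        if r >= ref_frame:
            return s
    return path[-1][1]

# Example: the student holds the first sound one frame longer than the reference.
total, path = dtw_path([1.0, 2.0, 3.0], [1.0, 1.0, 2.0, 3.0])
boundary = map_boundary(path, 1)  # student frame where reference frame 1 begins
```

In a real system the features would be multi-dimensional spectral vectors, and boundaries obtained this way could be cross-checked against those from an HMM forced alignment.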

Most of this work focused on the error detection and feedback blocks. In the error detection block, several methods of detecting significant pronunciation errors were tried. These include a new method based on a superpositional intonation model, as well as more standard methods, such as looking for significant differences in the raw or stylized pitch contours of the student's speech.
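As a rough illustration of the "more standard" approach, the sketch below compares two raw pitch contours that are assumed to be already time-aligned frame by frame (for example by DTW). Each contour is converted to semitones relative to the speaker's own mean, so that differences in overall pitch range do not count as errors, and runs of frames that differ by more than a threshold are reported. The threshold, the minimum run length and the function names are hypothetical choices for the example, not values from the thesis.

```python
import math

def to_semitones(f0):
    """Convert F0 values in Hz to semitones relative to the speaker's mean log-F0.

    Unvoiced frames (F0 <= 0) become None.
    """
    voiced = [f for f in f0 if f > 0]
    if not voiced:
        return [None] * len(f0)
    mean_log = sum(math.log2(f) for f in voiced) / len(voiced)
    return [12.0 * (math.log2(f) - mean_log) if f > 0 else None for f in f0]

def flag_deviations(ref_f0, stu_f0, threshold=3.0, min_frames=3):
    """Return (start, end) frame ranges where the contours differ noticeably.

    threshold  : difference in semitones treated as significant (assumed value)
    min_frames : shortest run of frames worth reporting (assumed value)
    Frames that are unvoiced in either contour are never flagged.
    """
    if len(ref_f0) != len(stu_f0):
        raise ValueError("contours must be time-aligned to the same length")
    ref_st, stu_st = to_semitones(ref_f0), to_semitones(stu_f0)
    regions, start = [], None
    for k, (r, s) in enumerate(zip(ref_st, stu_st)):
        bad = r is not None and s is not None and abs(r - s) > threshold
        if bad and start is None:
            start = k
        elif not bad and start is not None:
            if k - start >= min_frames:
                regions.append((start, k))
            start = None
    # Close a run that extends to the end of the utterance.
    if start is not None and len(ref_st) - start >= min_frames:
        regions.append((start, len(ref_st)))
    return regions
```

A stylized contour could be passed through the same comparison after interpolating it back to one value per frame.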

In the feedback block, high-quality speech production was attempted using time-domain and frequency-domain PSOLA-based methods, as well as speech synthesis. Effective feedback strategies were studied and developed to help students hear and understand where they made mistakes, what these mistakes were, and how to correct them.

Master's Thesis - "Language training system using speech processing techniques" (660kB)
ICSLP96 paper - Outlines the content of the thesis (32kB)

A demo of the system:

General explanation of the demo - Installing and running (43kB)
The SMALT package (1100kB) Runs on Linux (ELF), X windows, with a SoundBlaster 16 board.
Solaris version - Runs on Sun (Solaris 2.3), X windows, sound capability. (Incomplete) (1300kB)