Speech is the most natural means of communication among humans. It also plays a critical role in enhancing human-machine communication. This course covers the fundamental aspects of digital speech processing, both theoretical and practical, starting with the acoustics of speech sounds, followed by speech analysis and parameter extraction, speech modeling, the theory of linear prediction, and hidden Markov models. Finally, speech applications, including speech coding, synthesis, recognition, and verification, will be introduced. The linkage to acoustics and language processing will also be discussed, including topics on language modeling and microphone arrays. MATLAB demos will be used in class for illustration. Some homework exercises will also be provided for after-class learning.

Course Outline:
Speech Communications and Acoustics of Speech Sounds (2.5 hours)
Digital Speech Processing: Time and Frequency Domains (2.5 hours)
Modeling of Speech: Linear Prediction and Speech Parametrization (2.5 hours)
Speech Applications: Coding, Synthesis, Recognition and Verification (2.5 hours)
This short course is intended for researchers, engineers, and professionals who are starting speech-related work and want a grounding in digital speech processing, or who are already involved in speech technology development and would like to learn more of the fundamentals. The course is designed to give broad coverage of the areas related to digital speech processing, with linkages to language and acoustics.

Textbook:
(1) L. R. Rabiner and R. W. Schafer, Theory and Applications of Digital Speech Processing, Prentice Hall, 2010.

Supplement:
(2) C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.

Readings:
(3) C. Cherry, On Human Communication, MIT Press, 1968.
(4) D. G. Stork (ed.), HAL's Legacy, MIT Press, 1997.
Download WaveSurfer from: http://www.speech.kth.se/wavesurfer/download.html