voice.kemt.fei.tuke.sk/eng

Funding: COST 277

Duration: 2001 - 2005

Co-ordinator:

Doc. Ing. Jozef Juhár, CSc.

The main objective of the Action is to improve voice services in telecommunication systems through the development of new nonlinear speech processing techniques. The new technologies developed within the Action are to provide higher-quality speech synthesis, more efficient speech coding, improved speech recognition, and improved speaker identification and verification. The methods are expected to contribute significantly to the acceptance of voice interfaces for information systems such as the mobile Internet (through improved synthesis and recognition) and to improve the efficiency of future generations of speech coders used in wireless networks, including packet-based wireless networks.

The Action will pursue the stated advances in the following research areas:

1. Speech Coding.
2. Speech Synthesis.
3. Speaker Identification and Verification.
4. Speech Recognition.

The main objectives for KEMT group members:

- In speech coding, it is possible to obtain good results using models based on linear predictive coding, since the residual can be coded with sufficient accuracy, given a high enough bit rate. However, it is also evident that some of the best results in terms of optimising both quality and bit rate are obtained from codec structures that contain some form of nonlinearity.
- In speech synthesis, the Action will focus on new techniques for the speech signal generation stage of a speech synthesizer, based on concepts from nonlinear dynamical theory.
- The aim of the next part of the project is to perform a set of experiments, designed with the above in view, on a variety of speech sounds. Acoustic and perceptual analyses will be carried out on both consonants and vowels. The output of these analyses is the set of acoustic and perceptual attributes of classes of sounds.
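
To make the speech-coding item above more concrete, the following is a minimal sketch of what "the residual" means in linear predictive coding: each sample is predicted as a weighted sum of previous samples, and the residual is the prediction error that a coder would still have to transmit. It is an illustration only, not the method used by the Action or the KEMT group. The function names, the synthetic second-order test signal, and the model order are all assumptions made for this example, and the predictor is estimated with the standard autocorrelation method and Levinson-Durbin recursion.

```python
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for the given autocorrelation values.

    Returns (coeffs, err), where coeffs[k] weights x[n-1-k] in the
    prediction of x[n], and err is the final prediction error energy.
    Assumes r[0] > 0 (a non-silent frame).
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[1..order] are the predictor taps
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(x, coeffs):
    """Prediction error e[n] = x[n] - sum_k coeffs[k] * x[n-1-k]."""
    residual = []
    for n in range(len(x)):
        pred = sum(c * x[n - 1 - k] for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        residual.append(x[n] - pred)
    return residual


# Hypothetical test signal: a synthetic second-order autoregressive process
# standing in for a voiced speech frame (not real speech data).
random.seed(0)
signal = []
for n in range(2000):
    prev1 = signal[n - 1] if n >= 1 else 0.0
    prev2 = signal[n - 2] if n >= 2 else 0.0
    signal.append(1.3 * prev1 - 0.6 * prev2 + random.gauss(0.0, 1.0))

coeffs, _ = levinson_durbin(autocorrelation(signal, 2), 2)
residual = lpc_residual(signal, coeffs)

# Prediction gain: how much less energy the residual carries than the signal.
gain = sum(s * s for s in signal) / sum(e * e for e in residual)
print("estimated coefficients:", coeffs)
print("prediction gain:", gain)
```

In this sketch the residual has noticeably less energy than the signal, which is why a linear predictive coder can spend its bits on the residual. Whatever structure the linear predictor cannot capture stays in the residual, and that remaining structure is what nonlinear codec designs aim to exploit.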