
Exploiting Wavelet and Prosody-related Features for the Detection of Voice Disorders

Upal Mahbub, Celia Shahnaz


This paper presents an approach for detecting voice disorders that exploits the wavelet and prosody-related properties of speech. First, several statistical measures are computed from the normalized energy contents of the Discrete Wavelet Transform (DWT) coefficients over all voice frames. Next, prosody-related voice properties, such as mean pitch, jitter, and shimmer, are used to compute similar statistical measures over all frames. The statistical measures of the normalized DWT energy contents are combined with those of the extracted prosody-related properties to form a feature vector, which is used in both the training and testing phases. Two categories of voice samples, healthy and disordered, are considered, so the detection task is formulated as a two-class problem. Finally, a Euclidean distance-based classifier operates on the feature vector to detect disordered voices. Simulation results show that the statistical analysis based on wavelet and prosody-related properties can effectively detect a variety of voice disorders in a mixture of healthy and disordered voices.
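The pipeline in the abstract (per-frame DWT subband energies, prosodic measures, statistics across frames, then a Euclidean-distance decision) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the mother wavelet, decomposition depth, statistics, or classifier rule, so this sketch assumes an orthonormal Haar DWT with 3 levels, mean/standard deviation/min/max as the statistical measures, the common "local" definitions of jitter and shimmer, and a nearest-class-centroid rule. Pitch estimation itself is not reproduced; per-frame pitch periods (in seconds) and cycle peak amplitudes are taken as given inputs. All function names are hypothetical.

```python
import math
import statistics


def haar_subband_energies(frame, levels=3):
    """Normalized energy of each Haar DWT subband: details d1..dL, then approximation aL.

    Assumption: Haar wavelet; len(frame) must be divisible by 2**levels.
    Because the transform is orthonormal, the returned energies sum to 1.
    """
    approx = list(frame)
    energies = []
    for _ in range(levels):
        # One Haar analysis step: scaled sums (approximation) and differences (detail).
        pairs = range(0, len(approx), 2)
        a = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        d = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        energies.append(sum(x * x for x in d))
        approx = a
    energies.append(sum(x * x for x in approx))
    total = sum(energies)
    return [e / total for e in energies] if total > 0 else energies


def local_jitter(periods):
    """Mean absolute difference of consecutive pitch periods, relative to the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return statistics.mean(diffs) / statistics.mean(periods)


def local_shimmer(amplitudes):
    """Mean absolute difference of consecutive cycle peak amplitudes, relative to the mean amplitude."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return statistics.mean(diffs) / statistics.mean(amplitudes)


def summary_stats(values):
    """Assumed statistical measures taken across frames."""
    return [statistics.mean(values), statistics.pstdev(values), min(values), max(values)]


def feature_vector(frames, periods_per_frame, amps_per_frame, levels=3):
    """Concatenate statistics of DWT subband energies with statistics of prosodic measures."""
    per_frame = [haar_subband_energies(f, levels) for f in frames]
    features = []
    for band in zip(*per_frame):  # one subband's normalized energy across all frames
        features += summary_stats(band)
    mean_pitch = [1.0 / statistics.mean(p) for p in periods_per_frame]  # Hz
    jitter = [local_jitter(p) for p in periods_per_frame]
    shimmer = [local_shimmer(a) for a in amps_per_frame]
    for prosodic in (mean_pitch, jitter, shimmer):
        features += summary_stats(prosodic)
    return features


def train_centroids(labelled):
    """labelled: dict mapping class name -> list of training feature vectors."""
    return {label: [statistics.mean(col) for col in zip(*vecs)]
            for label, vecs in labelled.items()}


def classify(x, centroids):
    """Assign x to the class whose mean training vector is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))
```

One design caveat: Euclidean distance is scale-sensitive, so a mean pitch in hertz would dominate normalized energies and jitter ratios unless the features are standardized (for example, z-scored with training-set statistics). The abstract does not say whether the authors apply such normalization.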


Voice Disorder; Wavelet Transform; Pitch; Jitter; Shimmer; Statistical Measures




AJBET Copyright © 2012-2017. All rights reserved. Published by Ivy Union Publishing, 3204 Valley Rush Dr, Apex, North Carolina 27502, United States