Abstract
Objectives.
Both acoustic and aerodynamic analyses are essential to evaluate the phonetic characteristics of voice pathology. The purpose of the study is to determine the magnitude of their correlation with the different types of bilabial plosive consonants.
Methods.
A controlled prospective study of 35 patients diagnosed with unilateral vocal fold paralysis was performed. The sustained vowel /a/ and bilabial voiceless consonants were used. Three common acoustic parameters were measured from a sustained vowel /a/ and aerodynamic parameters from a set of syllables /pi/, /phi/, and /p’i/. We determined the correlation coefficients between acoustic and aerodynamic measurements for the bilabial plosive consonants /pi/, /phi/, and /p’i/.
Results.
The mean values of acoustic parameters were higher than the thresholds of pathology. The mean values of aerodynamic parameters varied according to the types of consonants. The correlation between acoustic and aerodynamic parameters was significantly larger, with higher magnitudes of correlation, for the consonant /phi/ than for the consonants /p’i/ and /pi/.
Voice assessment is multidimensional and includes laryngeal examination, acoustic evaluation, aerodynamic measurements, perceptual analysis and self-evaluation by the patient of the frequency of symptoms and the influence of the disturbance on daily life [1]. The objective measures of voice quality are probably affected by confounding variables such as signal recording conditions, hardware and software specifications, data capture and analysis, individual variability (acoustic and aerodynamic), as well as the severity and type of vocal disturbances [2]. Recent studies suggest that acoustic analysis of pathological voice is a viable technique for early detection of laryngeal pathology or for clinical assessment of vocal improvement following voice therapy [3,4]. Acoustic parameters may be extracted directly from speech or throat contact signals or indirectly from glottal or residual inverse filtered signals. The four common acoustic parameters are fundamental frequency, jitter, shimmer, and harmonics-to-noise ratio (HNR). The fundamental frequency is a function of the mass, elasticity, compliance and length of the vocal cords. It also depends partly on the subglottal pressure and the configuration of the vocal tract [5]. Jitter is computed as the mean difference between the periods of adjacent cycles divided by the mean period, and is thus a fundamental frequency-related measurement. Shimmer involves a similar computation of peak-to-peak amplitudes. There is currently insufficient standardization of the optimal algorithm for HNR and insufficient knowledge about normative values for widespread clinical use [6].
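The jitter and shimmer definitions above can be sketched as follows. This is a minimal illustration only, assuming the cycle periods and peak-to-peak amplitudes have already been extracted from the signal; the function names and sample values are hypothetical, and the smoothed perturbation measures reported by commercial software are not reproduced here.

```python
from statistics import mean

def relative_perturbation(values):
    """Mean absolute difference between adjacent cycles, divided by the mean, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)

def jitter_percent(periods_s):
    # Cycle-to-cycle variation of the period (a frequency-related measure).
    return relative_perturbation(periods_s)

def shimmer_percent(amplitudes):
    # Same computation applied to peak-to-peak amplitudes.
    return relative_perturbation(amplitudes)

# Hypothetical cycle data, not taken from the study.
periods = [0.0050, 0.0052, 0.0049, 0.0051]
amplitudes = [1.00, 0.95, 1.05, 0.98]
print(round(jitter_percent(periods), 2), round(shimmer_percent(amplitudes), 2))
```

Sharing one helper for both measures mirrors the text: shimmer is the same calculation applied to amplitudes instead of periods.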
Laryngeal aerodynamics deals with the airflow and pressure changes that are produced within the larynx, and is commonly assessed with repeated phonation of the bilabial consonant syllables /pipi/. Using appropriate instrumentation, the measurement of aerodynamic parameters can provide information regarding vocal efficiency, although only a stopwatch is needed for specific measurements [7,8]. Subglottal pressure was estimated for the vowel /i/ by measuring and interpolating the peak intraoral pressure during repeated production of the bilabial stop consonant /pi/. The technique is based on the assumption that the vocal folds are fully abducted before the release of the burst during the production of /p/. At that instant, the intraoral pressure is equivalent to the subglottal pressure. The peak airflow and air pressure were recorded for the middle portions of repeated productions of /pi/. They were obtained by using a linear regression function calculated from the calibration signals.
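The interpolation step can be illustrated with a short sketch. It assumes simple linear interpolation between the two intraoral pressure peaks that flank the vowel; the text does not specify the interpolation scheme, and the function name, units, and values below are hypothetical.

```python
def estimate_subglottal_pressure(t_peak1, p_peak1, t_peak2, p_peak2, t_vowel):
    """Linearly interpolate between two adjacent intraoral pressure peaks.

    t_* are times in seconds, p_* are peak pressures (e.g. in cmH2O),
    and t_vowel is the time point within the intervening vowel.
    """
    if not t_peak1 < t_vowel < t_peak2:
        raise ValueError("vowel time must lie between the two pressure peaks")
    fraction = (t_vowel - t_peak1) / (t_peak2 - t_peak1)
    return p_peak1 + fraction * (p_peak2 - p_peak1)

# Hypothetical example: peaks at 0.20 s and 0.60 s, vowel midpoint at 0.40 s.
print(estimate_subglottal_pressure(0.20, 8.0, 0.60, 7.0, 0.40))
```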
The plosive consonants in English include three groups depending on the place of articulation: bilabial (/p/), alveolar (/t/) and velar (/k/), and are based on a voiced/voiceless distinction. Korean plosive consonants also have been classified into three groups according to the place of articulation. However, each group is divided into three different types according to the manner of articulation. It can thus be concluded that both languages strongly differ in articulatory and phonatory physiology. Therefore, it would be very interesting to study how Korean students of English produce Korean and English plosives cross-linguistically. In Korean, a three-way distinction in articulation serves to differentiate the three plosive consonant phonemes: /p/, the weak bilabial plosive; /ph/, the strong aspirated bilabial (Fig. 1); and /p’/, the unaspirated bilabial. All Korean plosives occurring in the initial position are voiceless [9,10]. The vocal folds during the /p/ consonant are moderately abducted at explosion and the /p’/ consonant is characterized by the completely adducted state and stiffening of the vocal folds. However, the /ph/ type is characterized by extensive abduction of the vocal folds (Fig. 2).
Voice production is essentially an aerodynamic phenomenon, whereby the glottis transforms aerodynamic power into acoustic power. The aerodynamic power in the glottis results in sustained vibration of the vocal folds. Laryngeal aerodynamic analysis involves the interaction of both respiratory and laryngeal functions, indicating valve efficiency of the glottis during phonation [8]. The purpose of the study is to present and discuss the magnitude of the correlations between acoustic measures and aerodynamic measures according to the types of plosive consonants. The study also aims at comprehensively investigating the disorders of phonation resulting from vocal cord paralysis.
A total of 35 subjects were diagnosed with unilateral vocal fold paralysis (UVFP) at our ENT Department. All patients had laryngeal imaging results following an otolaryngological study. We selected all patients with peripheral types of UVFP, not high vagal or central types of UVFP. The causes and duration of vocal fold paralysis varied. The causes were lung cancer in 13 cases, idiopathic in 12 cases, thyroid cancer in seven cases and esophageal cancer in three cases. The paralyzed vocal fold was in the paramedian position in 21 cases, the median position in 10 cases and the intermediate position in four cases. The subjects comprised 40% males (n=14) and 60% females (n=21). The average age of the patients was 47.6 years, with a range of 17 to 76 years. They underwent voice evaluation, including stroboscopic examination, acoustic analysis, aerodynamic measurement and perceptual analysis. All participants received the same voice evaluation at the same voice laboratory and all recordings were made by the same experienced examiner. The recordings lasted about 10 to 15 minutes for each subject and were conducted in a sound-proof booth at the Speech and Language Laboratory in our speech center. The recording session consisted of two blocks of tasks. The first block entailed acoustic recording, and the second block involved aerodynamic recording. Following the two tasks, the acoustic recording data were transmitted to a professional speech-language pathologist to rate all voice samples in a blinded manner.
During the acoustic analysis, the subjects were fitted with a head-mounted microphone. The microphone was placed 5 cm from the mouth. Subjects were asked to phonate and sustain the modeled steady vowel /a/ for at least 3 seconds into the microphone. They were asked to phonate at their most comfortable pitch and loudness, and were allowed one to two trials before actual recording. Digitally recorded data were transferred to a computer at the sampling frequency of 44,100 Hz to facilitate the analysis using the Multidimensional Voice Program (MDVP) from Kay Elemetrics (Lincoln Park, NJ, USA). The sound was digitized and its duration was determined by placing cursors at the beginning and end of a 2-second segment at the most stable portion of the signal graph. Four common acoustic parameters were analyzed and calculated. They consisted of fundamental frequency, jitter (relative average perturbation), shimmer (amplitude perturbation quotient), and HNR. The voice samples of the 35 segments were evaluated and the results confirmed on the survey questionnaire for each subject.
During the aerodynamic analysis, the subjects were seated comfortably. They were instructed to hold a mask firmly against the face, with the nose and mouth covered and the intraoral tube placed between the lips and above the tongue. The examiner confirmed the correct placement of the transducer and ensured that the mask fitted firmly. The examiner explained the procedure clearly and demonstrated a continuous and repeated pronunciation of the Korean consonants /p/, /ph/, and /p’/. The subjects were asked to continuously repeat the consonant-vowel syllables /pipi/, /phiphi/, and /p’ip’i/ three to four times into the circumvented mask. They were asked to use a comfortable pitch and loudness to obtain three to four stable peaks of intraoral pressure, and to pause for a second after each segment. The rate and style of production were validated by the examiner. The actual recording was preceded by one or two practice runs on the monitor screen. Digitally recorded data were transferred to a computer for analysis using the Aerophone II software (Model 6800; Kay Elemetrics). The most stable segment was used following the completion of three to four segments. The recorded waveform of each stop consonant /p/, /ph/, and /p’/ was “zoomed in” to maximum precision and rounded to the nearest 0.5 ms. The recorded waveform was analyzed by placing the cursors on two points. The right cursor was set at the first upward-going zero crossing, which signaled voicing onset. The left cursor was set at the sharp increase in waveform energy, which signaled the release of each of the stop consonants /p/, /ph/, and /p’/. The software analyzed the waveform and provided the values of maximum flow rate (MFR), mean airflow rate (MAR), mean sound pressure level (MS), mean air pressure (MAP), mean power (MP) and mean efficiency. The typical procedure is shown in Fig. 1.
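The cursor placement described above was performed in the analysis software. As a rough illustration of the idea only, the sketch below locates the first upward-going zero crossing after a given release point and converts the cursor interval to milliseconds. The function names, the sampling rate, and the sample values are hypothetical and are not part of the Aerophone II workflow.

```python
def first_upward_zero_crossing(samples, start=0):
    """Return the index of the first sample >= 0 whose predecessor is < 0.

    The search begins at `start` (e.g. the release of the stop); None is
    returned if no such crossing exists.
    """
    for i in range(max(start, 1), len(samples)):
        if samples[i - 1] < 0 <= samples[i]:
            return i
    return None

def interval_ms(left_index, right_index, sample_rate_hz):
    # Time between the left (release) and right (voicing onset) cursors.
    return 1000.0 * (right_index - left_index) / sample_rate_hz

def round_to_half_ms(value_ms):
    # Round to the nearest 0.5 ms (exact ties follow Python's round()).
    return round(value_ms * 2) / 2

# Hypothetical waveform fragment and cursor positions.
wave = [0.0, -0.2, -0.1, 0.3, 0.5, -0.4, 0.2]
print(first_upward_zero_crossing(wave, start=1))
print(round_to_half_ms(interval_ms(100, 2305, 44100)))
```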
The means and standard deviations of acoustic parameters for the vowel /a/ and aerodynamic parameters for the stops were evaluated. Results of the Pearson’s correlation coefficient between acoustic parameters and aerodynamic measurements of bilabial stop consonants, /p/, /ph/, and /p’/, were also evaluated.
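For readers unfamiliar with the statistic, Pearson’s correlation coefficient between one acoustic and one aerodynamic parameter can be computed as in the sketch below. The function name and the sample values are hypothetical illustrations, not study data, and the statistical software actually used is not stated in the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)

# Hypothetical paired measurements (e.g. jitter in % vs. a flow measure).
jitter = [1.0, 2.0, 3.0, 4.0]
flow = [1.0, 3.0, 2.0, 4.0]
print(round(pearson_r(jitter, flow), 3))
```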
The mean values of jitter, shimmer and HNR were 3.736%, 5.198%, and 0.157 dB, respectively, which were higher than the thresholds of pathology (Table 1). According to MDVP, the thresholds of pathology are 1.04% for jitter and 3.81% for shimmer; values above these thresholds are signs of potential pathology. The mean values of the aerodynamic parameters of the 35 subjects for the consonants /p/, /ph/, and /p’/ are shown in Table 2. We found that the mean values for MFR, MAR, MS, and MAP of the consonant /ph/ were higher than those of the consonants /p/ and /p’/. However, the MP and mean efficiency were higher for the consonants /p/ and /p’/ than for the consonant /ph/.
The Pearson correlation coefficients between acoustic parameters and aerodynamic measurements of each consonant are shown in Table 3. For the acoustic and aerodynamic parameters of consonant /pi/, we found significant correlations of jitter with MAP (r=0.340), of shimmer with MFR (r=0.368) and MS (r=0.439), and of HNR with MFR (r=0.512), MS (r=0.415), and MAP (r=0.414). In the stop consonant /phi/, we found mostly significant correlations (P<0.01) among acoustic and aerodynamic parameters. Jitter was strongly correlated with MFR (r=0.512), MAR (r=0.592), MS (r=0.453), and MP (r=0.716). Shimmer was strongly correlated with MFR (r=0.452), MAR (r=0.578), MS (r=0.519), and MP (r=0.669). HNR also showed strong correlations with MFR (r=0.501), MAR (r=0.573), MS (r=0.466) and MP (r=0.711). In the consonant /p’i/, jitter was correlated with MFR (r=0.451), MAP (r=0.429) and MP (r=0.516). Shimmer was correlated with MFR (r=0.490), MAR (r=0.344), MS (r=0.505), and MP (r=0.503). HNR was also correlated with MFR (r=0.477), MAP (r=0.466), and MP (r=0.520).
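The significance of a correlation coefficient depends on the sample size. As a hedged illustration, the sketch below converts a reported r into the usual t statistic with n−2 degrees of freedom, taking n=35 from the study. The function name is hypothetical, and the use of a two-tailed t test is an assumption, since the text does not describe the test. With 33 degrees of freedom the two-tailed 5% critical value of t is about 2.03, so the smallest reported coefficient (r=0.340) lies just above that threshold.

```python
from math import sqrt

def t_statistic_for_r(r, n):
    """t statistic for testing a Pearson r against zero, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least three paired observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt((n - 2) / (1.0 - r * r))

# Coefficients reported in the text, with n = 35 subjects.
for r in (0.340, 0.716):
    print(r, round(t_statistic_for_r(r, 35), 3))
```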
As shown in the schematic diagram illustrating the correlation between acoustic and aerodynamic parameters (Fig. 3), we found more significant correlations, and higher magnitudes of the correlation coefficients, with the consonant /phi/ than with the consonants /p’/ and /p/.
Vocal fold paralysis limits the movements of vocal fold necessary for vocal function and alters the physiology underlying the generation of plosive consonants. Vocal fold paralysis produces a glottal gap during phonation both at the membranous and cartilaginous glottis. Paralytic dysphonia is usually characterized by a weak breath and hoarseness, with an increase in the fundamental frequency, and decrease in maximum phonation duration [11-13]. An abnormally high level of physical energy may be required to compensate for phonatory ineffectiveness. During a paralytic falsetto, the voice is characterized by an abnormally high fundamental speaking frequency as the mobile vocal cord over-compensates while closing the chink, to eliminate breathiness, but it is stretched tightly resulting in a vibrational frequency higher than normal [12].
Acoustic data allow objective and noninvasive measures of vocal cord behavior [14]. Jitter, which is related to the absolute difference between durations of consecutive cycles, is higher in vocal cord paralysis. These higher values may be attributed to the vocal cord asymmetry following vocal cord paralysis and the resulting vibrational irregularities in frequency, which alter the jitter values. Similarly, shimmer, which is related to the absolute difference between the amplitudes of consecutive cycles, is also higher in subjects with vocal cord paralysis. The higher value of shimmer may be due to the asymmetry caused by vocal cord paralysis, which leads to vibrational irregularities in amplitude, and to the poor and inconsistent contact between the vocal cords that is frequently observed in vocal cord paralysis. The HNR is obtained from the ratio between the harmonic and noise components of the signal. Patients with vocal cord paralysis have a higher relative noise amplitude during phonation than normal subjects, and therefore show diminished HNR values.
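The ratio underlying HNR can be written out in a few lines. This is a sketch under stated assumptions: the harmonic and noise energies are taken as already separated (the separation algorithm, which the Introduction notes is not standardized, is not shown), the result is expressed in decibels with the conventional 10·log10 conversion, and the function name and values are hypothetical.

```python
from math import log10

def hnr_db(harmonic_energy, noise_energy):
    """Harmonics-to-noise ratio in dB from harmonic and noise energies."""
    if harmonic_energy <= 0 or noise_energy <= 0:
        raise ValueError("energies must be positive")
    return 10.0 * log10(harmonic_energy / noise_energy)

# Hypothetical energies: more noise relative to harmonics lowers the HNR.
print(hnr_db(100.0, 1.0))
print(hnr_db(100.0, 10.0))
```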
The study limitations relate to the lack of gender-based classification of the subjects compared with normal subjects. However, our study focused on the correlations between acoustic and aerodynamic measures in patients with unilateral vocal cord paralysis.
A multi-factorial analysis plays a critical role in understanding laryngeal function and voice quality [15]. Changes in the acoustic features of the speech waveform are also associated with physiological changes in the vibratory behavior of the vocal cords, and are often related to aerodynamic changes. Aerodynamic studies facilitate phonetic characterization of voice disorders. Air flow rate and air pressure are the laryngeal aerodynamic parameters measured in voice evaluation. Air flow is the volume of air passing across the vocal folds during 1 second of phonation, and air pressure is the amount of pressure exerted on the vocal folds during adduction [16]. Subglottal pressure is estimated by measuring the intraoral air pressure during the repeated pronunciation of /pipi/ syllables. A thin catheter is introduced into the mouth via the labial commissure. In the absence of closure of the vocal folds during the production of a voiceless consonant, the intraoral air pressure is similar to the pressure elsewhere in the respiratory tract. Thus, the pressure behind the lips reflects the pressure available to drive the vocal folds during vibration [15,16]. Vocal efficiency refers to the ratio of acoustic power to aerodynamic power and is calculated by dividing the acoustic intensity by the air pressure. Laryngeal airway resistance is the quotient of peak intraoral pressure divided by the peak flow rate, and reflects the overall resistance (hyperfunctional, hypofunctional or normal) of the glottis. Laryngeal efficiency is the inverse of laryngeal aerodynamic resistance and reflects the conductance for airflow at the level of the glottis.
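The quotients defined above can be made concrete with a small sketch. The function names, units, and values are hypothetical. Aerodynamic power is taken here as pressure multiplied by flow, which is an assumption added for illustration and not the formula used by the analysis software.

```python
def laryngeal_resistance(peak_pressure, peak_flow):
    """Peak intraoral pressure divided by peak flow rate."""
    if peak_flow <= 0:
        raise ValueError("flow must be positive")
    return peak_pressure / peak_flow

def laryngeal_conductance(peak_pressure, peak_flow):
    """Inverse of laryngeal resistance (flow per unit pressure)."""
    if peak_pressure <= 0:
        raise ValueError("pressure must be positive")
    return peak_flow / peak_pressure

def aerodynamic_power_w(pressure_pa, flow_m3_per_s):
    # Assumption: power in watts as pressure (Pa) times volume flow (m^3/s).
    return pressure_pa * flow_m3_per_s

def vocal_efficiency(acoustic_power_w, aero_power_w):
    """Ratio of acoustic power to aerodynamic power."""
    if aero_power_w <= 0:
        raise ValueError("aerodynamic power must be positive")
    return acoustic_power_w / aero_power_w

# Hypothetical values: 8 cmH2O and 0.2 L/s for resistance; SI units for power.
print(laryngeal_resistance(8.0, 0.2))
print(vocal_efficiency(0.00016, aerodynamic_power_w(800.0, 0.0002)))
```

A glottal gap that raises flow at a given pressure lowers the resistance and raises the conductance in this formulation.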
Both acoustic and aerodynamic analyses are essential to evaluate the phonetic characteristics of voice pathology and their correlation reflects underlying voice pathology. Yu et al. [14] investigated a multiparametric protocol including acoustic and aerodynamic measurements for objective voice analysis in dysphonic patients. They reported that oral air flow and jitter might have been redundant in their study. The oral air flow and jitter have been widely used for the evaluation of pathological voice and correlated with the intensity of dysphonia [17]. In this study, we compared the relationship between acoustic parameters and aerodynamic measurements for the patients with UVFP, because the aerodynamic characteristics of UVFP are more typical than those seen with other vocal pathologies. We found significant correlations between aerodynamic variables involving the three consonant types and common acoustic parameters, particularly shimmer, HNR, and jitter. The correlation coefficients between acoustic and aerodynamic parameters of the bilabial plosive consonant /pi/ showed a significant correlation of jitter with MAP, shimmer with MFR and MS, and HNR with MFR, MS, and MAP. Significant correlations existed between common acoustic parameters and MFR, volume, MAR, MS, and MP for the bilabial plosive consonant /phi/.
In English, the bilabial plosive consonants show a voiced/voiceless distinction. However, a three-way distinction exists in the laryngeal articulation of plosive consonants in Korean: the unaspirated /p’/, slightly aspirated /p/, and strongly aspirated /ph/ consonants. The /p’/ consonant is characterized by the completely adducted state and stiffening of the vocal folds, and the abrupt decrease in stiffness near voice onset at explosion. The /ph/ type is characterized by extensive abduction of the vocal folds and the heightened subglottal pressure at explosion (Fig. 2). The vocal folds during the /p/ consonant are moderately abducted at explosion [9,10,18]. However, subjects with vocal fold paralysis cannot adduct the vocal folds because of the immobility of both vocal folds. Hong et al. [10] investigated the phonetic characteristics of patients with bilateral vocal fold paralysis without tracheotomy. They found that the mean SPL, peak air flow and mean air flow of the /ph/ consonant were higher than those of /p/ and /p’/. However, the mean value of maximum SPL for /p’/ was higher than that of the /ph/ and /p/ consonants. In this study, we observed that the mean values of MFR, MAR, MS, and MAP involving the stop consonant /ph/ were higher than those of the plosive consonants /p/ and /p’/. The pathophysiology underlying the higher aerodynamic values of /phi/ could not be exactly explained. We suggest that the laryngeal dynamics for the production of Korean /p/ are more similar to those of the English voiced /b/, even though the Korean /p/ is voiceless in theory. Notably, the Korean /p/ is lax and characterized as a slightly aspirated consonant. Accordingly, some normal speakers produce the Korean /p/ sound as voiced /b/ or voiceless /p/, or sometimes as a sound of mixed nature. However, in general the Korean /ph/ is a definitely aspirated consonant, like the English voiceless /p/.
The aerodynamic features of the Korean /ph/ are distinctly different from those of the slightly aspirated Korean /p/. Significantly, the Korean /p’/ is produced with a completely adducted state at explosion, as with the voiced /b/.
In summary, the Korean /p/ consonant is generally used as the setting for aerodynamic analysis. We analyzed the values with a significant correlation between the acoustic and aerodynamic parameters, according to the types of plosive consonant. The magnitudes of correlation between acoustic and aerodynamic parameters were higher with the stop consonant /phi/ compared with the stop consonants /p’/ and /p/. The stop consonant /phi/ was more reliable for aerodynamic evaluation during the production of bilabial stop consonants than the consonants /p/ or /p’/, especially in UVFP.
Our study elucidated various parameters of acoustic and aerodynamic analysis in patients with vocal fold paralysis because the aerodynamic characteristics of UVFP are more typical than those seen with other vocal pathologies. The correlation coefficients between acoustic and aerodynamic measurements were evaluated for the bilabial consonants /pi/, /phi/, and /p’i/. The plosive consonant /phi/ might represent a more desirable and essential investigative consonant than /p/ or /p’/ in the aerodynamic analysis of voice pathology. These results provide a highly specific, comprehensive and objective voice evaluation in patients with vocal fold paralysis in the clinical setting. However, we failed to compare all the acoustic and aerodynamic parameters, and merely focused on common acoustic and aerodynamic parameters.
▪ The purpose of the study is to present and to discuss the magnitude of the correlations between acoustic measures and aerodynamic measures according to the types of plosive consonants.
▪ The correlation between acoustic and aerodynamic parameters was significantly larger with the plosive consonant /phi/ compared with the stop consonants /p’/ and /p/. The magnitudes of correlation were higher with the plosive consonant /phi/ compared with the consonants /p’/ and /p/.
▪ The consonant /phi/ may represent a more valuable investigative consonant than the consonants /p/ or /p’/ for aerodynamic analysis of voice pathology, especially in unilateral vocal fold paralysis.
REFERENCES
1. Lopes LW, Cabral GF, Figueiredo de Almeida AA. Vocal tract discomfort symptoms in patients with different voice disorders. J Voice. 2015; May. 29(3):317–23.
2. Vaz Freitas S, Melo Pestana P, Almeida V, Ferreira A. Integrating voice evaluation: correlation between acoustic and audio-perceptual measures. J Voice. 2015; May. 29(3):390.
3. Patel R, Parsram KS. Acoustic analysis of subjects with vocal cord paralysis. Indian J Otolaryngol Head Neck Surg. 2005; Jan. 57(1):48–51.
4. Piccirillo JF, Painter C, Fuller D, Fredrickson JM. Multivariate analysis of objective vocal function. Ann Otol Rhinol Laryngol. 1998; Feb. 107(2):107–12.
5. Davis SB. Acoustic characteristics of normal and pathological voices. Speech Lang. 1979; 1:271–335.
6. Dejonckere PH. Assessment of voice and respiratory function. In: Remacle M, Eckel H, editors. Surgery of larynx and trachea. Berlin, Heidelberg: Springer; 2009. p. 11–26.
7. Rosenthal AL, Lowell SY, Colton RH. Aerodynamic and acoustic features of vocal effort. J Voice. 2014; Mar. 28(2):144–53.
9. Hong KH, Kim HK, Niimi S. Laryngeal gestures during stop production using high-speed digital images. J Voice. 2002; Jun. 16(2):207–14.
10. Hong YT, Park MJ, Shin YJ, Minh PH, Hong KH. The phonetic characteristics in patients of bilateral vocal fold paralysis without tracheotomy. Clin Exp Otorhinolaryngol. 2017; Sep. 10(3):272–7.
11. Hartl DM, Hans S, Vaissiere J, Riquet M, Brasnu DF. Objective voice quality analysis before and after onset of unilateral vocal fold paralysis. J Voice. 2001; Sep. 15(3):351–61.
12. Dedo HH. Injection and removal of Teflon for unilateral vocal cord paralysis. Ann Otol Rhinol Laryngol. 1992; Jan. 101(1):81–6.
13. Oguz H, Demirci M, Safak MA, Arslan N, Islam A, Kargin S. Effects of unilateral vocal cord paralysis on objective voice measures obtained by Praat. Eur Arch Otorhinolaryngol. 2007; Mar. 264(3):257–61.
14. Yu P, Ouaknine M, Revis J, Giovanni A. Objective voice analysis for dysphonic patients: a multiparametric protocol including acoustic and aerodynamic measurements. J Voice. 2001; Dec. 15(4):529–42.
15. Hirano M. Objective evaluation of the human voice: clinical aspects. Folia Phoniatr (Basel). 1989; 41(2-3):89–144.
16. Schutte HK. Aerodynamics of phonation. Acta Otorhinolaryngol Belg. 1986; 40(2):344–57.
Table 1.
Variable | Mean±SD (range) |
---|---|
Jitter (%) | 3.736±5.346 (0.541–32.591) |
Shimmer (%) | 5.198±5.141 (1.645–24.599) |
HNR (dB) | 0.157±0.132 (0.067–0.825) |