Year: 2014 | Volume: 4 | Issue: 1 | Page: 6-11
Comparison of esophageal and tracheoesophageal speech modes in dual-mode alaryngeal speakers
Santosh Maruthy1, Marie Karla Mallet2, Rajashekhar Bellur2
1 Department of Speech Language Sciences, All India Institute of Speech and Hearing, Manasagangothri, Mysore, Karnataka, India
2 Department of Speech and Hearing, Manipal College of Allied Health Sciences, Manipal University, Karnataka, India
Date of Web Publication: 22-Sep-2014
All India Institute of Speech and Hearing, Naimisham Campus, Manasagangothri, Mysore - 570006, Karnataka
Source of Support: None, Conflict of Interest: None
Abstract
Objective: The main purpose of this study was to compare speech-related parameters between the esophageal and tracheoesophageal modes in dual-mode alaryngeal speakers. A second purpose was to compare the speech characteristics of these speakers with those of age- and gender-matched controls. Materials and Methods: Four male laryngectomees who were proficient esophageal and tracheoesophageal speakers provided audio recordings of sustained vowels and connected speech using both alaryngeal methods. The participants in the control group followed the same procedure. From the recorded samples, fundamental frequency (F0), maximum phonation duration (MPD), formant frequencies, and speech rate related parameters were extracted. Results: Although there was no statistically significant difference between the two alaryngeal modes for any of the measured parameters, the absolute F0 and MPD values were higher in the tracheoesophageal (TE) mode. However, when compared to controls, both alaryngeal modes showed significantly shorter MPD values, higher first formant frequency values, slower speech rate, and a higher frequency of pauses. Conclusion: The results suggest that most group differences found between esophageal and tracheoesophageal speech in the past may be due to large inter-subject variability, and that within speakers, esophageal and tracheoesophageal speech are more similar to each other than either is to laryngeal speech. These results have implications for understanding the pseudoglottic voice mechanism.
Keywords: Acoustic analysis, esophageal speech, formant frequencies, laryngectomy, maximum phonation duration, tracheoesophageal speech
How to cite this article:
Maruthy S, Mallet MK, Bellur R. Comparison of esophageal and tracheoesophageal speech modes in dual-mode alaryngeal speakers. J Laryngol Voice 2014;4:6-11.
Introduction
Total laryngectomy necessitates the use of alternate modes for voice production. Over the years, the three most commonly used alaryngeal methods have been Esophageal Speech (ES), Electrolarynx, and Tracheoesophageal Speech (TE). These three methods differ in their source for voice production and in the way voice is produced. The electrolarynx makes use of an artificial mechanical vibrator, which is placed against the neck, and the person is taught to articulate speech with the sound of the vibrator. Because of the mechanical-sounding voice and the high cost, this method is not preferred by many laryngectomy participants. Esophageal speech is produced by either injecting or insufflating air into the esophagus and making the pharyngo-esophageal (PE) segment vibrate. In TE speech, a puncture is made between the trachea and the esophagus, and a small silicone prosthesis is inserted into this opening. This prosthesis allows air to pass from the lungs to the upper esophagus through a one-way valve action. Hence, like normal laryngeal voicing, TE speech is pulmonary driven, i.e. air from the lungs is used to vibrate the PE segment. Because of the air supply from the lungs, TE speech is expected to result in longer phonation time and higher intelligibility and acceptability ratings compared to ES and electrolarynx speech.
Various studies have compared TE and ES speakers, and report that TE speakers differ significantly from ES speakers. TE speakers' speech had comparatively higher F0, more stable F0 control, lower perturbation values, longer maximum phonation duration (MPD), and a faster rate of speech than that of ES speakers. A few other studies have also reported that TE speakers had higher formant frequencies and longer vowel duration, but no difference in long-term average spectrum (LTAS) characteristics, compared to ES speakers. Further, studies investigating intelligibility and acceptability characteristics of TE and ES speakers have also reported TE speakers' speech to be more intelligible and acceptable than that of ES speakers.
Most of the above-mentioned studies comparing TE and ES modes of alaryngeal speech have used different speakers for each speaking mode, and generally report large inter-participant variability within each mode. Although such an approach is useful in investigating the obvious differences between TE and ES modes, the differences noticed in those studies between the two alaryngeal modes can also be influenced by other factors such as differences in the anatomy of the voice source (placement of the PE segment), differences in the length of the vocal tract, articulation ability, and speaking proficiency. This limitation can be addressed by comparing both modes within the same speaker who is proficient in both alaryngeal modes. The differences noticed can then be attributed solely to differences in the mode of speaking. In other words, if one were to assume that articulatory behavior remains constant in the two modes of speech, it could potentially help us to isolate the effects of "source" characteristics from the "filter" or vocal tract changes in these speakers. To the best of our knowledge, no study to date has compared ES and TE modes within the same alaryngeal speaker. Such a study will fill the gap in the literature by providing more valid and reliable information from a comparison of the two modes of speaking within the same speakers. Hence, the aim of this study is to compare different speech-related parameters in two different modes (ES and TE) of alaryngeal speech production in dual-mode alaryngeal speakers.
Materials and methods
Four male laryngectomees, proficient in both ES and TE modes, participated in the present study. Among them, three had undergone total laryngectomy and one wide-field laryngectomy. After surgery, all of them underwent training for the esophageal mode of speaking. The duration between surgery and esophageal speech acquisition varied from 4 to 8 months. The number of sessions required for acquiring a functionally serviceable ES mode varied from 22 to 44 sessions (each session was of 1 h duration). All four participants used esophageal speech successfully for their day-to-day communication. Although they were proficient in the esophageal mode of communication, they were not very happy with it, as they needed to pause frequently to inject air into their esophagus and bring it out as a modified burp. Hence, they opted for the TE mode of speaking. The duration between laryngectomy and the acquisition of TE mode varied from 32 to 68 months. The number of sessions required to acquire the TE mode of speech ranged from 3 to 5 sessions (each session was of 1 h duration). Three participants used Blom-Singer's duckbill voice prosthesis, while one used a low-pressure voice prosthesis. All of them used digital occlusion for TE speech production and not the hands-free tracheostoma valve. Apart from laryngectomy, they were otherwise healthy, with normal hearing and no other speech or neurological problems. [Table 1] shows the demographic details of the four participants.
At the time of inclusion in this study, the proficiency of each laryngectomy participant in each mode of speaking was assessed by three experienced listeners. These listeners were trained in the rating of alaryngeal speech. The laryngectomy participants' reading samples were played to the listeners for proficiency rating. Proficiency was rated on a 4-point scale (fair, good, very good, and excellent), based on pitch, voice quality, rate, stomal noise, and intelligibility. The first laryngectomy participant had an excellent proficiency rating in both modes, and the other three had very good proficiency in both modes. For comparison purposes, five age- and gender-matched normal laryngeal (NL) speakers were included in the study.
Material and procedure
Speech samples were recorded directly onto the computer with a condenser microphone connected to the external hardware unit of the Vaghmi software (Voice and Speech Systems, Bangalore). The microphone-to-mouth distance was approximately 10 centimeters. The recorded samples were digitized at a sampling frequency of 16 kHz. The digitized samples were subjected to further analysis. The following tasks were recorded from each participant:
First, phonations of the vowels /a/, /i/, and /u/ were recorded. The laryngectomy participants were instructed to phonate in their esophageal and TE modes separately. The order of phonation was counterbalanced across the four participants. Three trials of each vowel in each mode of phonation were recorded from each participant. The recorded vowels were used for measuring fundamental frequency (F0) and maximum phonation duration (MPD). Fundamental frequency was analyzed using the "Acoustic Analysis" program of the Vaghmi diagnostics software. Before estimating the F0, the signal was low-pass filtered at 750 Hz in order to mark the periodic segments accurately. The "Acoustic Analysis" program uses the LPC-autocorrelation method to compute fundamental frequency, and it estimates the F0 only for the quasi-periodic segments of voice. The fundamental frequency values for the three trials of /a/ were averaged, and the result was taken as the average fundamental frequency for phonation of /a/. Maximum phonation duration (MPD) is the maximum duration for which an individual can sustain phonation. The MPD was measured using the waveform window of the Vaghmi diagnostics software. The vowel duration was noted by placing cursors at the beginning and end of the vowel. The MPD was likewise determined for three trials of /a/, /i/, and /u/, and the averaged value was calculated for each vowel.
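The F0 procedure described above (low-pass filtering at 750 Hz, autocorrelation-based estimation, averaging over three trials) can be illustrated with a minimal sketch. This is not the Vaghmi "Acoustic Analysis" implementation, whose internals are not described here: the one-pole filter, the plain (non-LPC) autocorrelation peak picking, the 50 to 400 Hz search range, and the synthetic test vowels are all assumptions made for illustration. The function names `low_pass`, `estimate_f0`, and `synth_vowel` are hypothetical.

```python
import math
import statistics

def low_pass(signal, cutoff_hz, sample_rate):
    """First-order (one-pole) low-pass filter; an assumed stand-in for the 750 Hz pre-filter."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    filtered, previous = [], 0.0
    for sample in signal:
        previous = previous + alpha * (sample - previous)
        filtered.append(previous)
    return filtered

def estimate_f0(signal, sample_rate, f0_min=50.0, f0_max=400.0):
    """Pick the lag with the largest autocorrelation inside an assumed F0 search range."""
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

def synth_vowel(f0_hz, duration_s, sample_rate):
    """Synthetic quasi-periodic test signal: a fundamental plus two weaker harmonics."""
    n_samples = int(duration_s * sample_rate)
    return [
        sum(0.5 ** (k - 1) * math.sin(2.0 * math.pi * k * f0_hz * n / sample_rate)
            for k in (1, 2, 3))
        for n in range(n_samples)
    ]

SAMPLE_RATE = 16000                  # 16 kHz, as in the recordings
true_f0s = [98.0, 100.0, 102.0]      # made-up values standing in for three trials of /a/

trial_f0s = []
for f0 in true_f0s:
    trial = low_pass(synth_vowel(f0, 0.1, SAMPLE_RATE), 750.0, SAMPLE_RATE)
    trial_f0s.append(estimate_f0(trial, SAMPLE_RATE))

mean_f0 = statistics.mean(trial_f0s)  # average F0 across the three trials
print([round(f, 1) for f in trial_f0s], round(mean_f0, 1))
```

Because the lag is an integer number of samples, each estimate is quantized (about 0.6 Hz per sample of lag near 100 Hz at 16 kHz), so the per-trial values are approximate. Real alaryngeal voice is far less periodic than these synthetic signals, which is why the analysis was restricted to quasi-periodic segments.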
Second, participants were asked to read three meaningful but non-emotional Kannada (a Dravidian language spoken in South India) sentences: (1) /ıðʊ pa: pʊ/ (This is a child), (2) /ıðʊko: θı/ (This is a monkey), and (3) /ıðʊkɛmpʊbɑŋa/ (This is red color). Each sentence was recorded three times in each mode of speaking. These recordings were used to obtain the first two formant frequencies (F1 and F2) for the vowels /a/ and /u/ (as in /pa: pʊ/) and /i/ (as in /ıðʊ/). Both formants were measured directly using the LPC and FFT spectra of the Speech Science Lab (SSL) software (Voice and Speech Systems, Bangalore). The formants were estimated over several frames of each manually marked vowel, and the averaged value for each vowel was computed. Both formants could be accurately estimated in all the speakers.
Third, participants read a standardized passage in Kannada, developed and routinely used for speech and voice evaluation in the speech clinic. They were instructed to read the passage at their comfortable loudness and rate. Laryngectomy participants read the passage in both ES and TE modes. Recording of the reading passage enabled the measurement of the following parameters: (1) speech rate, (2) frequency of pauses, (3) duration of pauses, and (4) articulation rate. Speech rate was defined as the number of syllables spoken in 1 min. The time needed to read the passage was measured with the SSL software by marking the beginning and end of the reading and noting the difference in time. The speech rate was extrapolated as syllables per minute (SPM) for each participant. The articulation rate is the rate of syllable production computed over the time during which speech segments were produced, i.e. the total reading time minus pauses. The pauses were removed from the reading samples (using the SSL software), and the length of the remaining utterance was measured directly from the waveform display on the computer screen. The articulation rate was likewise extrapolated as SPM for each participant. Frequency of pauses was obtained by counting the number of pauses, and the duration of each pause was also noted. The averaged value for both frequency and duration of pauses was calculated.
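The relation between the four rate measures can be sketched as a short calculation. This is only an illustration of the definitions given above: the function name `rate_measures` is hypothetical, and the syllable count, reading time, and pause durations are made-up numbers, not data from this study.

```python
def rate_measures(syllable_count, total_duration_s, pause_durations_s):
    """Compute speech rate, articulation rate (both in syllables per minute),
    pause count, and mean pause duration from one passage reading."""
    total_pause_s = sum(pause_durations_s)
    speaking_time_s = total_duration_s - total_pause_s  # reading time with pauses removed
    pause_count = len(pause_durations_s)
    return {
        # Speech rate: syllables per minute over the whole reading, pauses included.
        "speech_rate_spm": syllable_count * 60.0 / total_duration_s,
        # Articulation rate: syllables per minute over speaking time only.
        "articulation_rate_spm": syllable_count * 60.0 / speaking_time_s,
        "pause_count": pause_count,
        "mean_pause_s": total_pause_s / pause_count if pause_count else 0.0,
    }

# Hypothetical reading: 120 syllables in 60 s, with six pauses totalling 10 s.
measures = rate_measures(120, 60.0, [1.0, 2.0, 1.5, 1.5, 2.0, 2.0])
print(measures)
```

With these illustrative numbers the speech rate is 120 SPM and the articulation rate is 144 SPM. The articulation rate can never be lower than the speech rate, and the gap between the two grows with total pause time.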
Results
The results for fundamental frequency (F0), maximum phonation duration (MPD), and formant frequencies are summarized in [Table 2]. Owing to the small sample size, a descriptive comparison between ES and TE modes was done for all the parameters. Visual inspection of the individual data suggested that the F0 values were higher in TE mode than in ES mode in all four participants with laryngectomy. Further, F0 in TE mode was comparable to that of NL speakers. Among the four participants, participant Lx 2 showed a marked difference in mean F0 between ES and TE modes (40 Hz difference). The MPD values were longer in TE mode than in ES mode in all four participants. However, the MPD values in TE mode were still significantly shorter than those of NL speakers. The F1 and F2 values did not differ much between the ES and TE modes in any of the four participants. Compared to NL speakers, F1 values were higher in both ES and TE modes for all four participants.
Table 2: Comparison of fundamental frequency (F0 in Hz), maximum phonation duration (MPD in seconds), and formant frequencies (F1 and F2 in Hz) between esophageal, tracheoesophageal, and normal laryngeal modes. Along with the individual data, the median and inter-quartile range values are given for each mode of speaking. The MPD and formant frequency values are provided for each vowel separately.
Speech rate was comparatively faster in TE mode than in ES mode [Table 3]. Further, speech rate in TE mode was comparable to that of NL speakers in two (Lx 1 and Lx 3) of the four participants with laryngectomy. Except in participant Lx 4, there was little difference in articulation rate between the two alaryngeal modes. Only participant Lx 1 had an articulation rate comparable to that of NL speakers. Pauses were much more frequent in ES mode than in TE mode. Except in participant Lx 4, the frequency of pauses in TE mode was comparable to that of NL speakers. For the duration of pauses, except in participant Lx 1, there was little difference between ES and TE modes. Further, three of the four participants had longer pauses than NL speakers in both alaryngeal modes.
Table 3: Comparison of rate of speech (syllables per minute), articulation rate (syllables per minute), and frequency and duration of pauses between esophageal, tracheoesophageal, and normal laryngeal modes. Along with the individual data, the median and inter-quartile range values are given for each mode of speaking.
Discussion
The main purpose of this study was to compare esophageal and tracheoesophageal speech modes in dual-mode alaryngeal speakers. When the two alaryngeal modes (ES and TE) are compared within each speaker, the voicing source remains the same while the air source differs. In ES mode, it is the air from the insufflated esophagus, while in TE mode, it is the pulmonary air. Hence, the differences noted can be specifically attributed to differences in mode of phonation. The results revealed several points of interest. First, the fundamental frequency values were higher in TE mode than in ES mode. Previously, some authors using two different groups of alaryngeal speakers have also reported no significant differences in fundamental frequency values between the two alaryngeal modes. Higher F0 in TE mode suggests that, to a certain extent, laryngectomy speakers are capable of adjusting their voicing source on a myoelastic basis to influence changes in F0 (as in the normal mechanism). According to Moon and Weinberg, this could be because of three mechanisms. First, in TE mode, the mass and stiffness of the upper esophageal sphincter may vary in accordance with changes in F0. Second, the upper esophageal sphincter can be indirectly adjusted through head and neck movements. Third, the voicing source may be systematically modified through unspecified air pressure and/or airflow-dependent reflex responses. Based on the present results, it is logical to suggest that, at least for sustained vowels in these four dual-mode speakers, regulation of the PE segment appears to be a stronger contributor to the control of F0 than airflow or pressure.
The MPD values in the esophageal mode were shorter than those in TE mode. Reduced MPD values in the esophageal mode can be attributed to the limited insufflated esophageal air volume (80 cc). In TE mode, the same laryngectomy speakers had access to a larger volume of pulmonary air (approximately 5000 cc). Hence, they were able to sustain phonation longer than in their esophageal mode, although the difference was not statistically significant. However, lower MPD values were observed in TE mode compared to the NL speakers. Similar findings are mentioned in the literature. The reduced MPD in TE mode has been attributed to higher trans-source flow rates. It can also be because of poor occlusion of the stoma. If the stoma is not occluded properly, air leaks before it is directed into the esophagus via the prosthesis. Further, it has also been suggested that the resistance offered by the prosthesis may partially explain the reduced MPD in the TE mode. The MPD values of the NL speakers in this study were lower than the normative data reported for adult males in the Indian population. However, they were in agreement with the results of another study. The mean MPD for /a/ in the TE mode of speaking was less than the values reported in two previous reports. However, it matched the values reported in another study.
The comparison of F1 and F2 values between the two alaryngeal modes did not reveal much of a difference. This highlights that if the vocal tract length remains the same, the type of alaryngeal mode has little effect on the formant frequencies. In other words, a change in the mode of alaryngeal phonation does not bring about changes in the vocal tract resonance characteristics. In the present study, however, both alaryngeal modes had higher F1 values compared to NL speakers. Earlier studies have also reported higher formant frequencies (F1 and F2) for alaryngeal speakers than for normal laryngeal speakers. In order to explain the differences in formant frequencies between laryngectomy speakers and normal laryngeal speakers, two speculations have been posited. First, it is suggested that alaryngeal speakers may articulate vowels with more fronted and higher tongue positions relative to those of normal laryngeal speakers. Second, laryngectomy participants (both ES and TE speakers) have a reduced vocal tract length, which alters the vocal cavity transmission characteristics. This reduced vocal tract length can be because of either a shortened vocal tract cavity after surgery or the location of the PE segment. Physiological evidence (cineradiographic images) suggests that the PE segment is located higher (C4-C6) than the vocal folds (C5-C6) in normal laryngeal speakers.
In ES mode, speech rate was slower compared to TE mode. However, both alaryngeal modes were significantly slower in rate of speech and had a higher frequency of pauses compared to normal laryngeal speakers. The present results support similar findings by other studies. The reduced rate of speech in ES speech has been attributed to the increased pause time needed to insufflate the pseudoglottis. Although in TE mode the same laryngectomy speakers had access to air from the lungs, their rate of speech was still slower than that of normal laryngeal speakers. Hence, access to pulmonary air supply is not sufficient for normal speech rate characteristics. The differences in speech rate related parameters have been attributed to differences in the efficiency of airflow control between normal and alaryngeal modes.
The articulation rate was calculated in order to obtain the actual rate of speech of the person when all the pauses have been removed from the speech sample. The results of this study show that there is no significant difference in articulation rate between the two alaryngeal modes in three out of four alaryngeal speakers. This could be because the same vocal tract and articulators are used for both the ES and TE modes. However, a significant difference in articulation rate was noticed between the two alaryngeal modes and normal laryngeal speakers. The results suggest that the articulatory system of alaryngeal speakers is not biologically tuned to execute speech movements faster. Hence, their speech movements are slower than those of normal laryngeal speakers.
To conclude, although the number of laryngectomy participants in the present study was relatively small, the results indicated some interesting findings with respect to the comparison of TE and ES modes in bimodal speakers. First, there was no significant difference between the two alaryngeal modes for any of the measured speech characteristics. This suggests that when speakers are equally proficient in both TE and ES modes, there may not be much difference between the two modes. Second, compared to normal laryngeal speakers, both alaryngeal modes had reduced MPD values, higher first formant frequency values, slower speech rate, and a higher frequency of pauses. Articulation rate appeared to be a comparatively better parameter for assessing changes in speech rate in this population. Further studies with a larger number of participants are needed to validate the present findings.
References
1. Blood GW. Fundamental frequency and intensity measurements in laryngeal and alaryngeal speakers. J Commun Disord 1984;17:319-24.
2. Debruyne F, Delaere P, Wouters J, Uwents P. Acoustic analysis of tracheo-esophageal versus oesophageal speech. J Laryngol Otol 1994;108:325-8.
3. Globlek D, Stajner-Katusic S, Musura M, Horga D, Liker M. Comparison of alaryngeal voice and speech. Logoped Phoniatr Vocol 2004;29:87-91.
4. Robbins J, Fisher HB, Blom EC, Singer MI. A comparative acoustic study of normal, esophageal and tracheoesophageal speech production. J Speech Hear Disord 1984;49:202-10.
5. Sedory SE, Hamlet SL, Connor NP. Comparisons of perceptual and acoustic characteristics of tracheoesophageal and excellent esophageal speech. J Speech Hear Disord 1989;54:209-14.
6. Cervera T, Miralles JL, González-Alvarez J. Acoustical analysis of Spanish vowels produced by laryngectomized subjects. J Speech Lang Hear Res 2001;44:988-96.
7. Ng ML, Liu H, Zhao Q, Lam PK. Long-term average spectral characteristics of Cantonese alaryngeal speech. Auris Nasus Larynx 2009;36:571-7.
8. Blom ED, Singer MI, Hamaker RC. A prospective study of tracheoesophageal speech. Arch Otolaryngol Head Neck Surg 1986;112:440-7.
9. Bridges A. Acceptability ratings and intelligibility scores of alaryngeal speakers by three listener groups. Br J Disord Commun 1991;26:325-35.
10. Cullinan WL, Brown CS, Blalock PD. Ratings of intelligibility of esophageal and tracheoesophageal speech. J Commun Disord 1986;19:185-95.
11. Doyle PC, Danhauer JL, Reed CG. Listener's perception of consonants produced by esophageal and tracheoesophageal talkers. J Speech Hear Disord 1988;53:400-7.
12. Law IK, Ma EP, Yiu EM. Speech intelligibility, acceptability, and communication-related quality of life in Chinese alaryngeal speakers. Arch Otolaryngol Head Neck Surg 2009;135:704-11.
13. Tardy-Mitzell S, Andrews ML, Bowman SA. Acceptability and intelligibility of tracheoesophageal speech. Arch Otolaryngol 1985;111:213-5.
14. Williams SE, Watson JB. Speaking proficiency variations according to method of alaryngeal voicing. Laryngoscope 1987;97:737-9.
15. Bellandese MH, Lerman JW, Gilbert HR. An acoustic analysis of excellent female esophageal, tracheoesophageal and laryngeal speakers. J Speech Lang Hear Res 2001;44:1315-20.
16. Most T, Tobin Y, Mimran RC. Acoustic and perceptual characteristics of esophageal and tracheoesophageal speech production. J Commun Disord 2000;33:165-81.
17. Lundström E, Hammarberg B, Munck-Wikland E, Edsborg N. The pharyngoesophageal segment in laryngectomees: videoradiographic, acoustic, and voice quality perceptual data. Logoped Phoniatr Vocol 2008;33:115-25.
18. Moon JB, Weinberg B. Aerodynamic and myoelastic contributions to tracheoesophageal voice production. J Speech Hear Res 1987;30:317-95.
19. Baggs JW, Pine SJ. Acoustic characteristics: Tracheoesophageal speech. J Commun Disord 1983;16:299-307.
20. Pindzola RH, Cain BH. Duration and frequency characteristics of tracheoesophageal speech. Ann Otol Rhinol Laryngol 1989;98:960-4.
21. Krishnamurthy BN. The measurement of mean Airflow Rate in normals [Masters Thesis]. Mysore: University of Mysore; 1986.
22. Rajashekhar B. Acoustic Analysis of Alaryngeal Speech (TEP with B.S. Prosthesis and Esophageal Modes) [Doctoral Thesis]. Mysore: University of Mysore; 1991.
23. Liu H, Ng ML. Formant characteristics of vowels produced by Mandarin esophageal speakers. J Voice 2009;23:255-60.
24. Kazi RA, Prasad VM, Kanagalingam J, Nutting CM, Clarke P, Rhys-Evans P, et al. Assessment of the formant frequencies in normal and laryngectomized individuals using linear predictive coding. J Voice 2007;21:661-8.
25. Sisty N, Weinberg B. Vowel formant frequency characteristics of oesophageal speech. J Speech Hear Res 1972;15:439-48.
26. Damsté PH, Lerman JW. Configuration of the neoglottis: An X-ray study. Folia Phoniatr (Basel) 1969;21:347-58.
27. van As CJ, Op de Coul BM, Van den Hoogen FJ, Koopmans-van Beinum FJ, Hilgers FJ. Quantitative videofluoroscopy: A new evaluation tool for tracheoesophageal voice production. Arch Otolaryngol Head Neck Surg 2001;127:161-9.
28. Christensen JM, Weinberg B. Vowel duration characteristics of esophageal speech. J Speech Hear Res 1976;19:678-89.
29. Gandour J, Weinberg B, Rutkowski D. Influence of postvocalic consonants on vowel duration in esophageal speech. Lang Speech 1980;23:149-58.
30. Ng ML, Gilbert HR, Lerman JW. Fundamental frequency, intensity and vowel duration characteristics related to perception of Cantonese alaryngeal speech. Folia Phoniatr Logop 2001;53:36-47.
[Table 1], [Table 2], [Table 3]