The application of magnetic resonance imaging to the field of diagnostic medicine has been heralded as potentially the most important achievement since the discovery of x-rays. Before the advent of MRI, the only methods of acquiring images from inside the human body involved the use of ionizing radiation, radioactive isotopes, or ultrasound waves. MRI uses magnetic fields, radio waves, and computers to create highly detailed anatomical images. Hydrogen nuclei, abundant in living tissue, behave like small, randomly oriented bar magnets. When tissue is exposed to a strong magnetic field from an external magnet, the nuclei become aligned with the field. When radio waves are then passed through the body, some of the protons absorb some of the energy and change direction. When the radio waves are turned off, the protons return to their original orientation, releasing the absorbed energy and generating a signal that can be picked up by an antenna or a specific body coil and processed by a computer into an image. Because different tissue types have different proton densities, the resulting image is a precise representation of soft tissue anatomy. MRI data can be acquired in single slices or volumetrically. The technology permits direct imaging of anatomy in the axial, sagittal, or coronal planes without the need for computer reformatting of data acquired in one set of parallel planes to produce images in an orthogonal plane.

The General Electric Signa system is a superconducting magnet which can be operated at either high or low field strengths, although it was designed for imaging at 1.5 Tesla. Specific antennas, or coils, have been designed to image the head and neck. For experiments imaging the entire vocal tract, either the head coil or the neck coil is used, depending on individual variation in the size and shape of the subject's head and neck. A magnet-compatible microphone for recording speech signals is attached to the head coil prior to data acquisition. The subject is positioned with the head secured and oriented in the sagittal plane. In the original speech experiments, acoustic recordings were made just preceding and immediately following the noise of the acquisition sequence. In subsequent studies, shorter acquisition times and the development of gating technology allowed acquisition of acoustic recordings between segments of acquisition noise.

These images represent data from the first MRI experiments on the vocal tract, conducted by Haskins Laboratories in collaboration with Yale beginning in 1988. The purpose of these experiments was to gather basic data to apply in computational models of speech articulation. Previously, data on vocal tract shape and dimensions were obtained using x-ray. MRI provided a relatively non-invasive method for obtaining precise measurements of the entire vocal tract without the use of x-ray. Portions of the pharynx which were difficult to view by other means were well defined using MRI. Images were obtained in axial, coronal, and sagittal planes of reference during production of the four point vowels. Area functions describing the individual tract shapes were obtained by measurements performed on the MRI images. Digital filters derived from these functions were then used to resynthesize the vowel sounds. Two views of the vocal tract are shown during static production of the vowel E and the vowel A. Images illustrate the superior-to-inferior progression of an axial series of slices through the vocal tract.
Coronal sections illustrate the soft tissue outline of the airspace formed during the production of the two vowels. Note the tongue groove present in anterior sections, as well as the definition of the piriform sinuses in the posterior pharynx. These computer-generated views of the vocal tract were constructed from measurements of axial, sagittal, and coronal sections and represent the inside dimensions of the vocal tract during production of the vowel A.

This technology for defining the configuration of the vocal tract has subsequently been applied to research involving populations with speech impairments, to establish the contribution of specific changes in the vocal tract to the acoustic product. These sagittal views represent vocal tract configuration during static production of the vowels A and E in an individual with Parkinson's disease. The excursion of the tongue for the production of E is reduced, while for A both the constriction of the vocal tract and the lip aperture are reduced. This reduced range of placement is consistent with data from the limbs in such populations, and illustrates a reduction in the range of motion of the articulators throughout the vocal tract.

Data from pre- and post-operative scans of the vocal tract in an individual with supraglottic carcinoma illustrate the change in the shape of the pharynx and the change in base-of-tongue posturing during the production of the vowels A and E. The purpose of this investigation was to further define the nature of voicing deficits in individuals who undergo supraglottic laryngectomy, in which the larynx is spared. Though the vocal folds are preserved, the acoustic product is often quite different from the preoperative voice. These studies show that the acoustic quality of the speech of the post-surgical subject is affected both by changes in the dimensions of the pharynx and by the surgical adjustment of laryngeal tension. These changes in tension are the result of surgical removal of the epiglottis and suturing of the thyroid cartilage to the hyoid bone and base of the tongue.

For images acquired during steady-state phonation, there is a trade-off between image quality and the time required to acquire the images. For example, a method of imaging called TurboFLASH allows acquisition of images at a rate of one per second but produces slices of lesser resolution. The following series illustrates, in real time, transitions in the production of E, O, and OO.

In subsequent work, the study of tongue posture during articulatory gestures employed pseudo-pellets secured to the subject's tongue. Images acquired with the pellets in place allowed investigation of the correspondence between movements measured using other technologies and those obtained using MRI. This information is valuable because it defines the placement of markers used in dynamic technologies such as magnetometer and x-ray microbeam experiments. It is also vital to testing computational models in which articulatory movements in the vocal tract are used to generate changing area functions. These area functions are then used to generate the acoustic signal for a given utterance. Movement data and MRI data have been collected for the same utterances from the same speakers. This allows comparison of the articulatory movements and acoustic signal collected for these utterances with modeled utterances in which measurements from the subject's own vocal tract are used for the simulation. The pseudo-pellets are crucial for relating the two types of data: movement and vocal tract image.
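The pipeline just described, from a measured area function to a digital filter to a resynthesized vowel, can be sketched in a few lines. The following is a minimal illustration using a standard lossless-tube formulation, not the actual Haskins implementation; the sampling rate and the area values are invented placeholders.

```python
# A minimal sketch (assumed formulation, not the Haskins implementation) of
# turning an MRI-derived area function into a digital filter and exciting
# that filter to resynthesize a vowel.
import numpy as np
from scipy.signal import lfilter

fs = 10_000                 # Hz; in a lossless-tube model each section is
                            # c/(2*fs) ~ 1.75 cm long, so ten sections span
                            # roughly a 17.5 cm vocal tract
# Invented placeholder areas (cm^2), ordered glottis -> lips:
areas = np.array([2.6, 1.5, 1.0, 1.2, 2.0, 3.5, 5.0, 7.0, 8.0, 6.0])

# One reflection coefficient per junction (sign conventions vary by text;
# boundary losses at the glottis and lips are ignored in this sketch).
k = (areas[1:] - areas[:-1]) / (areas[1:] + areas[:-1])

# Step-up recursion: reflection coefficients -> all-pole filter A(z).
a = np.array([1.0])
for ki in k:
    a = np.concatenate([a, [0.0]]) + ki * np.concatenate([[0.0], a[::-1]])

# Resynthesis: drive 1/A(z) with a crude glottal impulse train at 120 Hz.
f0, dur = 120, 0.5
excitation = np.zeros(int(fs * dur))
excitation[::fs // f0] = 1.0
vowel = lfilter([1.0], a, excitation)

# Formant estimates are the pole angles of A(z), converted to Hz.
poles = np.roots(a)
formants = sorted(np.angle(p) * fs / (2 * np.pi)
                  for p in poles if p.imag > 0 and abs(p) > 0.8)
print([f"{f:.0f} Hz" for f in formants])
```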
MRI is a rapidly evolving technology. There is currently no standard protocol for using this methodology in speech research, and acoustic limitations remain. But the reduction of acquisition time and the advances made in gating images to specific speech events make this the most promising technology for obtaining quantifiable data on the structural composition of the vocal tract. In addition, advances in functional imaging technology provide an avenue for exploring the relationships between central control of speech movements and the actual speech events.

This is an ultrasound image of my tongue moving as I speak. Ultrasound provides images of muscles and soft tissues of the oral cavity as they move in real time. Ultrasound imaging provides new and important information for the study of speech and swallowing. It can be applied to clinical and research uses in linguistics, in patient care, and even in speech recognition. Ultrasound works by measuring the reflected echoes of sound waves as they bounce off the surfaces of objects. An image is produced by holding an ultrasound transducer under the chin.

I'm going to put this gel on the transducer. It will couple the transducer to your skin so that the air in the pores of your skin won't affect the sound wave. Let me hold this, just look straight ahead. Excellent, good. Now I want you to just say a few things. Start by saying "ah-lah" three times.

The transducer is composed of a series of piezoelectric crystals, each of which fires, or emits a sound wave, in turn. We hold the transducer in one direction to get a lengthwise, or sagittal, image, and we rotate the transducer 90 degrees to get a crosswise, or coronal, image. Ultrasound creates a slice of tissue (it is a tomograph) about two millimeters thick, and the sound waves, when they reflect back, are picked up by a computer, which reconstructs them into a visual image.

Looking again at the image of my tongue, the surface of my tongue is a bright white line in the middle of the screen. This is the lengthwise surface. The right-hand side of the image is the front of the mouth; the left-hand side is the back of the mouth. The cone-shaped shadow on the right side is cast by the jaw, and the cone-shaped shadow on the left side is cast by the hyoid bone. As I speak, you can see how complex tongue movement actually is. The normal tongue during speech undulates in this manner. Because the tongue is composed entirely of soft tissue, whenever it moves it also deforms. Looking now at the cross-sectional surface of the tongue, the bright white line is again the surface; the left-hand side of the tongue is on the left and the right-hand side on the right. In this plane we can also see tremendous complexity, showing the compression and extension of different portions of the tongue.

Actually, we know very little about the inner workings of the vocal tract. Ultrasound allows us to see patterns and behaviors, even such simple behaviors as vowels and diphthongs. Bo, bo, boop, boop, boop, boop, boop, bye, bye, bow, bow, boy, boy. Pete's job was to keep the baby happy. In addition to seeing simple behaviors like vowels and diphthongs, we can look at complex movements such as sentences. George is at the church watching a magic show. We rode with Lucy around the tall tower in her new yellow car. While the sagittal images provide us with interesting information, the really unique contribution of ultrasound is in the coronal, or cross-sectional, plane.
These images show us movements that we simply cannot see in any other way. Now: eh-leh, eh-leh, eh-leh, eh-leh, eh-leh; ah-rah, ah-rah, ah-rah, ah-rah, ah-rah; oh-loh, oh-loh, oh-loh, oh-loh.

Now that we can look at normal tongue movements, we are interested in seeing how they differ in pathologies, particularly in deaf speakers and in other kinds of disorders that affect tongue movement. Pete's job was to keep the baby happy. Today, Dick told Patty about it. The girls were baking the biggest cake from the tag. Their brother would have baked the car. In a speaker who is hearing impaired, we expect to see restricted tongue movements. We see a loss of flexibility in the tongue, and particularly in the posterior tongue we don't see the kind of undulations that we see in a normally hearing speaker. George is at the church watching a magic show. We rode with Lucy around the tall tower in her new yellow car.

One of the obvious applications for a system like this, where you get visual information about previously hidden behaviors, is biofeedback. A person who has a hearing impairment or another speech disorder can now see the movement of their tongue, whereas before they had to rely on hearing the output. Arr, arr, arr. Now, rrr. Rrr, rrr. Good, okay. Now I'll do it. Arr, arr, arr, rrr. Rrr. There you go. That's much better.

In ultrasound images, we can of course see the surface of the tongue, and then, over time, we can see where the tongue has moved. It's very easy for us to track upward and downward movement of the tongue on ultrasound, but it's not so easy to extract the anterior-posterior components of that movement; in other words, whether the tongue compresses and extends. If we attach a pellet to the surface of the tongue, however, and track the movement of the pellet, then we can extract all of the directional components of the movement. A pellet is attached to the tongue using dental impression material, which hardens and holds it in place. When we look at the pellet on the ultrasound image, we cannot actually see the pellet itself; instead we see a reverberation artifact, which looks like a beam of light emanating from the point where the pellet sits on the surface of the tongue. It's often called a comet tail.

In order to produce a truly reliable image, we need to know exactly where the transducer is relative to the head at all times. Therefore, we use this holder. The holder allows us to position the transducer in the sagittal or the coronal plane at any angle we like, and to measure that position relative to various structures of the head.

We're going to look at your swallowing patterns now on ultrasound. I'll hold the transducer under your chin again, and this time Rob will use the syringe to inject the water. That way we can control the amount of water that goes into your mouth and that you swallow. Another major application of ultrasound is to swallowing. We first see the water enter the image, and we see the tongue deform around the water bolus. Typically, if it's a command swallow, the subject will hold the water in their mouth for a moment and then begin to swallow. While the water is held steady in the mouth, we can see the palate, because the ultrasound passes through the water all the way to the palate. As the subject swallows, we see the hyoid, or the shadow of the hyoid, usually on the left side of the image, swing up into the picture and then back out. When the hyoid is at its maximum position, we know that the airway is protected.
Now we want to look at continuous swallowing, so just put the straw in your mouth; I'll tell you when to start, and then just keep swallowing. Ultrasound imaging can also be used to image children swallowing, or even babies swallowing from a bottle, in which case we see the tongue deforming around the nipple of the bottle. This is an advantage because many procedures are unpleasant for children to experience, while ultrasound is a pleasant, comfortable procedure that children don't mind.

Until now we have focused primarily on ultrasound images of the vocal tract, but ultrasound can also be used to image the vocal folds. We're going to look at some ultrasound images of your vocal folds now. These are the source of your sound when you speak. To do that, I'm going to place the ultrasound transducer under your chin against your neck in the transverse plane, that is, pointed directly backwards, and we can then look at your vocal folds. I'd like to start by having you just say ah, ah, ah. Go ahead. Good. In this image we can see the vibration of the vocal folds a little, and the opening and closing as the subject goes into the ah, but in fact the vibration of the vocal folds is too fast for the video frame rate. We also see the vocal folds opening and shutting abruptly for a cough.

Ultrasound has a number of advantages, or strengths, as a measurement system in the vocal tract. The first is that it's real time. This allows us to measure speech as it's actually being spoken; we don't have to sustain sounds. The second is that it's not an invasive technique. It's a very comfortable procedure, and it doesn't involve any radiation, so we can have long recording sessions and use multiple repetitions without worrying about patient comfort or biohazards. Another advantage is that because ultrasound creates a very thin slice of tissue, we don't have to worry about the tongue being obscured by the teeth and the jaw. We also get good soft tissue definition: ultrasound was designed to image soft tissue.

As with any instrument, ultrasound has some limitations as well. The first is that if one were to actually buy an ultrasound machine, rather than affiliate with a local hospital, it can be a rather expensive purchase, as much as $200,000 for a sophisticated machine. A second and more important disadvantage is that ultrasound will not jump an air gap, so while we can get images of the surface of the tongue, we do not, in the normal course of events, get images of the palate or the posterior pharyngeal wall. Sometimes we don't see the tongue tip well either, because there is air beneath it. Finally, and in the same vein, while ultrasound images soft tissue quite well, it doesn't image hard tissue or bone, and therefore we see the shadows of the hyoid bone and the jaw but not the structures themselves.

Once we've collected the ultrasound images, we need to take the next step and analyze them. We don't want to just look at them; we want to be able to distinguish between normals and patients, and between different languages, and to do that we have to extract features that are meaningful within the images and quantify them. To analyze the data, we first have to transfer the images from the videotape onto the computer. Here we're watching the videotape in motion on the screen of the computer.
In order to collect these frames, I tell it to begin collecting, and then we can watch the movement over time as we scroll frame by frame through the images. In this case I selected 25 frames, so it took the first 25 frames it came upon. To analyze these data, I need to detect the edge of the tongue, that is, its upper surface, and when I'm finished, I smooth the contour. We can also compare images by overlaying one contour upon another. The data themselves are stored in a table which can then be saved and used in other files for further analysis.

The next thing we want to do is graph the data. This is now an image of the upper surface of the tongue in cross-section during the vowel E. The sides of the tongue deform around the sides of the palate, and of course the tongue is quite elevated at midline. To fit it with a curve, we will use a quadratic, or second-order, fit; a sketch of this computation appears below. This number gives us very important information about the shape of the tongue at this moment in time. The negative sign indicates that it's an arched tongue rather than a grooved tongue, and the absolute value of the number tells me how steeply arched the tongue is, whether it's flat and gradual or very steeply arched, as it is in this case.

In order to watch the tongue moving over time, it's often very useful to display the edges one after another in sequence. This is a sequence of "ee-lee," with the first E at the bottom, the L in the middle, and the second E at the top, and over time we can see the changes in the shape of the tongue as it moves from the E to the L and back into the E. These extracted measures, along with a number of other techniques, are very important in allowing us to get really good quantitative information from our data.

Ultrasound is a unique instrument at this point in time. It's the only way we can really see cross-sectional movements of the tongue and pharyngeal movements of the tongue. It creates an entirely new perspective on the way we look at the vocal tract and the way we think of how the structures move; namely, it causes us to think three-dimensionally instead of two-dimensionally. Great. That was just great.
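The quadratic-fit measure described above is straightforward to reproduce. Here is a minimal sketch with an invented coronal contour; only the sign and magnitude of the second-order coefficient are used, as in the narration.

```python
# Sketch (hypothetical data, not the system shown) of the quadratic-fit
# measure: fit a second-order polynomial to a coronal tongue-edge contour.
import numpy as np

# x = position across the tongue (mm), y = extracted edge height (mm).
x = np.array([-15, -10, -5, 0, 5, 10, 15], dtype=float)
y = np.array([4.0, 7.5, 9.8, 10.6, 9.9, 7.4, 4.2])   # invented contour

c2, c1, c0 = np.polyfit(x, y, 2)      # y ~ c2*x^2 + c1*x + c0

# Interpretation used in the narration:
#   c2 < 0  -> arched (midline higher than the sides)
#   c2 > 0  -> grooved
#   |c2|    -> how steeply arched or grooved the tongue is
shape = "arched" if c2 < 0 else "grooved"
print(f"{shape}, curvature magnitude {abs(c2):.3f} (1/mm)")
```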
As we begin to understand more about the relation between vocal tract shapes and the properties of the sound that results from vocal tract movements, it becomes possible to infer something about vocal tract movements from measurements of the sound. Such measurements have the obvious advantage that they are non-invasive. An additional positive aspect of acoustic measurements is that acoustic data can frequently provide a more sensitive measure of an articulatory movement than direct measurement of the movement itself. For example, when the size of an opening or a constriction is relatively small, a critical change of a millimeter or so in the position of an articulator may be difficult to measure accurately but may have an important influence on the properties of the sound output. We shall give here three examples of situations in which acoustic measurements can be used to infer articulatory movements. These examples are selected to illustrate three different ways in which the acoustics can be interpreted.

The first example has to do with the velopharyngeal opening, in which we are looking at nasalization in a vowel. This first example is a representation of the word "bender." We have in the upper display the waveform of this entire utterance. We have in the lower display a segment of that waveform with an expanded time scale, and in the lower left corner we have a spectrogram of the utterance. You see the first vowel, the nasal consonant, and the second vowel. And then in this picture you see a spectrum of the utterance, sampled at this moment at the beginning of the first vowel; you can see the window here. What we're going to do is step through the vowel and watch the change in the spectrum, and from this change in the spectrum we'll be able to infer what's happening to the velopharyngeal opening as we move into the nasal consonant. So here's the spectrum just after release of the B, and there's very little evidence of nasalization here, perhaps slight evidence of a zero at about 800 Hertz. Now I'll step through this window and examine how the spectrum changes as we do that. We're going to step through 10 milliseconds at a time, and as we move over 10 milliseconds, say 20 milliseconds, we begin to see a deeper zero in the spectrum and a little evidence of a nasal pole here. We'll step through a little more, 10 milliseconds at a time, as we move into this first vowel, and again we see a zero at around a thousand Hertz and evidence of a nasal pole, and that zero is gradually shifting up in frequency. The nasal pole is becoming more prominent, and as we approach the nasal consonant over here, we're seeing more and more nasalization in the preceding vowel. And so now we are up close to the edge of the nasal consonant, and we see again the evidence of the nasal resonance and the zero here.

The results of the analysis we've just shown for the word "bender" can be plotted as in this next figure. The vertical line at zero milliseconds is the place where closure for the consonant is made, as estimated from the discontinuity in the acoustic signal. The measured frequencies of the pole and zero are displayed as a function of time. At the time of release, at about minus 130 milliseconds, there is no velopharyngeal opening, and the pole and zero cancel each other. Early in the vowel, the pole and zero separate and both increase in frequency, and the spacing between the pole and the zero also increases. At the time of closure there is an abrupt increase in the frequency of the zero, as expected from theory.

Interpretation of this plot in terms of the actual velopharyngeal opening follows a process something like the one shown in this figure. A schematization of the vocal tract is shown, with the nose, the mouth, and the glottis. The poles are the frequencies where the sum of the susceptances at the branch point is zero, that is, where b sub n is equal to minus the quantity b sub p plus b sub m. The poles, then, are the intersections of these two graphs: the minus (b sub p plus b sub m) graph and the b sub n graph. The susceptance b sub n looking into the nose changes as the velopharyngeal port opens. The nasal pole in this case increases in frequency as the area of the opening increases. The lowest zero of the combined output, u sub n plus u sub m from the nose and the mouth, lies between the zeros of each output taken separately. The frequency of the zero also increases with the area of the velopharyngeal port. Preliminary calculations show that the area increases up to about 0.3 to 0.4 square centimeters by the time closure occurs.

A similar plot for another utterance of the same kind is shown in this figure for the word "dander." The nasal pole starts out a bit higher in frequency here, but the pattern is similar.
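In symbols, the graphical construction described above can be summarized as follows. This is the standard formulation; the subscripts n, p, and m for the nasal, pharyngeal, and mouth branches follow the narration.

```latex
% Pole condition at the velopharyngeal branch point: the susceptances
% looking into the nasal (n), pharyngeal (p), and mouth (m) branches
% must sum to zero.
\[
  B_n(f) + B_p(f) + B_m(f) = 0
  \quad\Longleftrightarrow\quad
  B_n(f) = -\bigl(B_p(f) + B_m(f)\bigr)
\]
```

Graphically, the poles are the frequencies where the curve for B sub n crosses the curve for minus (B sub p plus B sub m); as the velopharyngeal area grows, both the nasal pole and the zero of the combined output move up in frequency, which is what the measured trajectories show.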
The main point of these examples is that some estimate of the velopharyngeal opening can be made based on acoustic measurements.

The second example illustrates how we can use acoustic data to study in detail what happens at the release of an affricate consonant. In this case the affricate is a "cha." Here's the utterance: a-cha, a-cha. Here's the waveform again, with the affricate consonant here and the release of the affricate at that point. Here's the expanded waveform. Let's just center it here; we're interested in this noise event at the release of the consonant. Over here we have the spectrogram. What we want to do, through acoustic analysis, is make some inference about what's happening in the mouth with the tongue tip and the tongue blade as this consonant "cha" is released and the following vowel begins. Here's the spectrum at the release of the "cha." There's a little click, a little burst, at the release, and the spectrum with this narrow time window has a peak at about 3300 Hertz; later on, when we show the graphs, we'll make some inferences about what's happening with the tongue tip at this point. Now we can move this window through the affricate consonant. I'm going to look at the spectrum at different places throughout this initial consonant, and because we're looking at noisy signals, the method I'll use is averaging over about a 15 millisecond time interval. So this first picture gives an average spectrum over a time interval of 15 milliseconds, and there's the spectrum right there; we see a peak again at about 3500 or 3300 Hertz. As we move through this affricate, stepping 10 milliseconds at a time, the spectrum changes gradually, and we see that at some point energy starts building up at lower frequencies, in this case around 2900 Hertz, which can be interpreted as reflecting the changing position of the tongue blade, as we will show in the pictures displayed later. So we're moving through and we observe this change in spectrum; we're now coming towards the end of the affricate, and we now see an additional peak down here at about 1500 Hertz; and then finally we move into the vowel.

Let's now try to interpret these observations in terms of articulatory shapes and movements. Some of the observations we've made are summarized in this figure, which is actually from a different utterance. The spectrogram is shown here, with the release at this point and the onset of voicing at this point. We show a number of spectra at different points throughout the release. At the initial release there is a transient, presumably due to the abrupt discharge of the compressed air in the vocal tract, and we see a peak in the spectrum around three to three and a half kilohertz. As shown in the schematized sagittal section here, this transient excites the front cavity, including some sublingual space, a total length of about 2.5 to 3 centimeters, yielding a resonance of about 3 to 3.5 kilohertz. As we move into the affricate, the noise decreases; then the airflow from the inward-moving walls and from the lungs gets going and generates turbulence noise at this same narrow constriction, or at the lower incisors downstream from the constriction. The frequency of the spectral peak about 17 milliseconds after the release is about the same as at the release, indicating about the same front cavity resonance.
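That front-cavity estimate is the standard quarter-wavelength calculation for a tube effectively closed at the constriction and open at the lips; a quick check of the arithmetic, with a conventional value for the speed of sound in warm, moist air:

```python
# Back-of-the-envelope check (standard quarter-wavelength formula, not a
# claim about the speaker's anatomy): f = c / (4 * L).
c = 35_400.0                      # speed of sound in warm, moist air, cm/s
for L in (2.5, 3.0):              # front-cavity lengths cited above, cm
    print(f"L = {L} cm  ->  f = {c / (4 * L):.0f} Hz")
# L = 2.5 cm -> 3540 Hz;  L = 3.0 cm -> 2950 Hz,
# consistent with the 3 to 3.5 kHz peak seen in the spectra.
```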
About 30 milliseconds later the situation has changed, so that a second major peak appears below the first one, at about 2,600 Hertz. By this time the constriction formed by the tongue blade has opened up, and the resonance of the long tongue-blade constriction is excited by the noise. This is actually the third formant. Some second-formant excitation is also evident, probably from a noise source at the glottis. As we move on another 40-odd milliseconds, the source shifts almost entirely to the laryngeal area, and all formants are excited by the noise. Finally, glottal vibration begins. Once again, then, we can trace instant by instant the movements of the tongue blade by examining the details of the acoustic signal, in this case when turbulence noise or a transient is the source of excitation for the system.

In the third example, a male speaker is producing a stop consonant, D. Here's what it sounds like. Now what I want to do is make some inference as to how, in the release of this consonant, the tongue blade is moving and how the jaw might be falling as the consonant is released. We're going to make those inferences on the basis of studying the movement of the first formant. The first place in this release where you can see the first formant is in the first glottal pulse, which is shown by the cursor here; if one looks at the spectrogram, you see the burst and then, shortly after, the first glottal pulse. So let's go back and center the cursor on this first pulse. When we do that, we can see a spectrum showing the first, second, and third formant frequencies; for the moment we're just going to concentrate on measuring the frequency of the first formant, which is 473 Hertz. We move along pitch period by pitch period and observe how the formant frequency changes. Here it's moved up quite rapidly, from 473 to 567. We move along a step further, to the third glottal pulse, and the first formant frequency has gone up a little farther, and we can continue this process, each time measuring the frequency of the first formant.

We show here several examples of f1 and f2 contours for the VCV utterances ebe, ede, egge, and ada. All of the f1 contours we have measured have in common the attribute that the time taken for closure to occur is considerably less than the time taken for the release. During the few milliseconds immediately preceding closure or following release, the Helmholtz formula, corrected for the effects of the vocal tract walls, is a reasonable approximation to the relation between f1 and the area of the constriction, and permits calculation of the area from f1. At the closure we observe that the rate of movement of f1 is greatest for the labial, slower for the alveolar, and slowest for the velar consonant. This different rate of movement of f1 close to the implosion can be accounted for in part by differences in the length of the constriction for the different types of stops, even without assuming differences in the rate of change of cross-sectional area. However, it is probable that there are also differences in the rate of change of cross-sectional area for the three different articulators. Rates of change and durations of f2 movements at closure indicate slower and longer movements of f2 in relation to f1 for labials and alveolars, though this is not evident in the ede example. These different rates of movement are predictable from theoretical considerations involving calculation of the natural frequencies as the constriction size changes.
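As a sketch of the f1-to-area calculation described above, the following uses the Helmholtz formula with a simple wall correction. The back-cavity volume, constriction length, and wall-resonance frequency are illustrative assumptions, not values from the study.

```python
# Sketch: estimate constriction area from a measured F1 using the Helmholtz
# formula f1 = (c / 2*pi) * sqrt(A / (l_c * V)), with the wall correction
# f1_measured^2 = f1_helmholtz^2 + F_w^2. Constants are assumptions.
import numpy as np

c   = 35_400.0    # speed of sound, cm/s
V   = 60.0        # back-cavity volume, cm^3 (assumed)
l_c = 2.0         # constriction length, cm (assumed)
F_w = 190.0       # closed-tract F1 set by wall compliance, Hz (typical)

def constriction_area(f1_hz):
    """Invert the wall-corrected Helmholtz relation to get area in cm^2."""
    f_h_sq = f1_hz**2 - F_w**2
    if f_h_sq <= 0:               # at or below the wall resonance: closure
        return 0.0
    return l_c * V * (2 * np.pi * np.sqrt(f_h_sq) / c) ** 2

for f1 in (250.0, 400.0, 567.0):
    print(f"F1 = {f1:.0f} Hz -> area {constriction_area(f1):.3f} cm^2")
```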
At the consonant release a somewhat different story emerges, based on the limited acoustic data we have looked at from three speakers. Again there are even clearer differences in the rate of movement of f1 for consonants produced with the three different articulators, with labials being the fastest and velars the slowest. Second, there seems to be a slower component to the f1 rise after the initial rapid rise, probably attributable to the lowering of the mandible, which moves more slowly than the initial movement of the lips or tongue blade. Third, in the releases one often sees evidence of a component attributable to forces on the articulator due to pressure behind the constriction: here, here, and possibly here. These forces cause, in effect, an early release of the articulator, with a subsequent temporary slowing down of the movement after the release. One can often see perturbations in the f1 movement following the release, particularly for the velars and the alveolars.

These few examples illustrate the potential of acoustic data for inferring articulatory movements. In these examples we have been particularly interested in interpreting acoustic data in the vicinity of consonantal landmarks. Other examples of the use of acoustic measurements include derivation of laryngeal configurations, interpretation of bursts at stop consonant releases, and interpretation of acoustic spectra in the vicinity of liquid consonants. As models for the production of these various classes of sounds become more refined, it is expected that acoustic data can provide even greater insight into articulatory movements.

Electromyography is the recording of the action potentials of muscles during contraction. The purpose of this kind of examination is the investigation of the activities and pathologies of the muscles and their nervous system control. The particular merit of EMG studies for speech research is that they can provide information about speech gestures in their natural units, and that they directly reflect the motor commands from the central nervous system carried by neural impulses. Say sat again. Say sat again.

One motor neuron cell and the multiple muscle fibers it controls are called a neuromuscular unit. The number of muscle fibers in one neuromuscular unit, which is called the innervation ratio, varies widely for different muscles. It is known that muscles controlling fine movements and adjustments have the smallest number of muscle fibers per motor unit. Laryngeal muscles are said to be among the muscles with the smallest units, with from 30 to 250 fibers per unit. When the impulse reaches the level of the neuromuscular unit, the excitation is conducted from the neuromuscular junction along the muscle fiber, and it induces a depolarization of the cell membrane, an action potential. Electromyography is the recording of the action potential using extracellular electrodes. The muscle fibers of one neuromuscular unit are activated synchronously and show spike signals with from one to three phases. As shown in this figure, 25 neuromuscular unit action potentials are summed into the signal at the bottom. This is called an interference pattern.
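The summation just described can be mimicked with a toy simulation. Everything below (spike shape, firing rates, amplitudes) is invented for illustration; the point is only that 25 independently firing units sum into a dense interference pattern.

```python
# Toy simulation (invented parameters) of unit action potentials summing
# into an interference pattern, as in the figure described above.
import numpy as np

rng = np.random.default_rng(0)
fs, dur = 10_000, 1.0                      # sampling rate (Hz), duration (s)
emg = np.zeros(int(fs * dur))

# A crude biphasic action-potential shape, about 3 ms long.
n = int(0.003 * fs)
x = np.linspace(-1.0, 1.0, n)
spike = x * np.exp(-4.0 * x**2)

for _ in range(25):                        # 25 units, as in the figure
    rate = rng.uniform(8.0, 20.0)          # firings per second
    amp = rng.uniform(0.5, 1.5)
    times = np.arange(0.0, dur, 1.0 / rate)
    times = times + rng.normal(0.0, 0.005, times.size)   # firing jitter
    for t0 in times:
        i = int(t0 * fs)
        if 0 <= i <= emg.size - n:
            emg[i:i + n] += amp * spike
# `emg` now contains the dense interference pattern.
```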
Two basic types of electrodes are commonly used for EMG studies: surface electrodes and inserted electrodes. The surface electrode is shown at the top. Inserted electrodes can consist either of two monopolar electrodes or one bipolar electrode. A monopolar electrode is shown in the middle, a bipolar electrode at the bottom.

These are silver disc surface electrodes. They consist of a detection surface that senses the current on the skin through the skin-electrode interface. These are silver ball surface electrodes, another kind of surface electrode. These do not require an electrically conductive gel. A monopolar hooked-wire electrode consists of a thin wire with a thin insulation coating. The wire is inserted into a 30-gauge needle, and the end of the wire is bent back to make a hook. Bipolar hooked-wire electrodes contain a pair of thin insulated wires inserted into a 26- or 27-gauge needle. They are conveniently used for most of the articulatory muscles. This is a schematic illustration of a bipolar electrode. The needle contains two thin wires that are bent close to the edge to make two hooks. After insertion into the muscle, the needle is removed.

This is a diagram of our EMG system. It has three instrumental levels: interfaces, electronics for recording, and analysis. The first level is the electrode, which is the interface between the muscles and the recording system. The second level is the recording system: amplifier, filter, and tape recorder, or A-to-D converter and computer. The third level is the analysis system, the computer.

In placing surface electrodes, the dead surface layer of the skin, along with its protective oils, must be removed to lower the electrical impedance. An alcohol swab is often used for the light abrasion needed to do this. Here, silver disc surface electrodes are being applied to the subject's lower lip. In this scene, silver ball surface electrodes are being applied. In both cases, the wires are connected to the recording system. The ground electrode is treated in the same way as the surface electrodes. After the light abrasion, the ground is attached to the subject, and then to the recording system. In this scene, hooked-wire electrode insertion into the anterior belly of the digastric muscle is shown. After sterilization, a monopolar electrode is inserted through the neck surface. A second electrode is inserted into the same muscle, about one centimeter away from the first insertion. After application of a topical anesthetic with a Panjet, a bipolar hooked-wire electrode is inserted into the medial pterygoid muscle of this subject. Then a ground electrode is applied to his ear.

Now we'll see some EMG data from this subject. The speech audio signal is shown at the top, the anterior belly of the digastric is next, and medial pterygoid muscle activity is at the bottom. Here are some data, along with some movement data, after they were digitized: again, audio data at the top, jaw raising and lowering movement next, and EMG data from the anterior belly of the digastric (ABD) in red and the medial pterygoid (MPT) at the bottom in blue. You can see the discrete activity of ABD for jaw lowering and MPT for jaw raising movements.

This figure shows one way that EMG signals are often processed. The raw signal is at the top, the rectified signal in the middle in red, and the smoothed signal at the bottom in blue. Rectification involves keeping only positive deflections of the signal; here, negative values are inverted. This is called full-wave rectification. Since the rectified signal still reflects the random nature of the signal amplitude, smoothing is useful to extract the amplitude-related information. Smoothing involves suppression of high-frequency fluctuation in a signal. Shown here is smoothing with a window size of 43 ms. This figure shows the data after rectification and smoothing of "Say sat again."
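The rectification and smoothing chain just described amounts to two lines of signal processing. Here is a sketch with a synthetic stand-in for the recorded EMG; the 43 ms window matches the figure.

```python
# Sketch of the processing chain: full-wave rectification followed by
# moving-average smoothing. `raw` is a placeholder for a real EMG trace.
import numpy as np

fs = 2_000                                  # assumed EMG sampling rate, Hz
raw = np.random.default_rng(1).normal(size=2 * fs)   # synthetic signal

rectified = np.abs(raw)                     # negative deflections inverted
win = max(1, int(0.043 * fs))               # 43 ms smoothing window
smoothed = np.convolve(rectified, np.ones(win) / win, mode="same")
```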
The discrete activity of ABD for jaw lowering and MPT for jaw raising becomes easier to observe. Here we see several smoothed data traces and some averaged data. Averaging signals from several repetitions, or tokens, of an utterance helps us get general information about muscle activity. First the signals must be aligned, and then the average calculated. Here, each token is aligned at the peak of the second burst of activity, which is marked by red vertical lines, and the resulting ensemble-average signal is shown at the bottom in blue. This figure shows average signals for "Say sat again." Ten tokens are aligned at the peak jaw displacement of "sat," which is marked by the vertical lines. It displays the general pattern of ABD and MPT activity for this subject.

EMG recordings capture natural speech performance. EMG is one way of approaching the study of muscle activity, which is itself a reflection of the underlying motor control systems. As such, EMG is a very useful tool for revealing the neural motor network to speech researchers.
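As a closing illustration, the ensemble-averaging step described above can be sketched as follows. The alignment event here is taken to be each token's amplitude peak, a simplification of the burst-peak and jaw-displacement alignments used in the figures.

```python
# Sketch of ensemble averaging: align each token at a chosen event (here,
# its amplitude peak) and average across tokens.
import numpy as np

def ensemble_average(tokens, half_window):
    """tokens: list of 1-D smoothed EMG arrays. Align each at its peak and
    average over a window of +/- half_window samples around that peak."""
    segments = []
    for tok in tokens:
        p = int(np.argmax(tok))              # alignment point (the peak)
        lo, hi = p - half_window, p + half_window
        if lo >= 0 and hi <= len(tok):       # keep fully covered tokens
            segments.append(tok[lo:hi])
    return np.mean(segments, axis=0)         # the ensemble-average signal

# e.g., ten tokens aligned as in the figure:
# avg = ensemble_average(list_of_ten_tokens, half_window=400)
```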