Physiol. Rev. 84: 541-577, 2004; doi:10.1152/physrev.00029.2003
0031-9333/04 $15.00

Neural Processing of Amplitude-Modulated Sounds

P. X. JORIS, C. E. SCHREINER and A. REES

Laboratory of Auditory Neurophysiology, Division of Neurophysiology, K.U. Leuven, Leuven, Belgium; Coleman Laboratory, Department of Otolaryngology, Keck Center for Integrative Neuroscience, University of California at San Francisco, San Francisco, California; and School of Neurology, Neurobiology, and Psychiatry, The Medical School, University of Newcastle upon Tyne, Newcastle upon Tyne, United Kingdom

ABSTRACT
I. TEMPORAL DIMENSIONS OF SOUND
II. HUMAN SENSITIVITY TO AMPLITUDE MODULATION
III. NEURAL RESPONSE MEASURES
IV. AUDITORY NERVE: BOTTLENECK TO THE CENTRAL NERVOUS SYSTEM
    A. Basic Auditory Nerve Properties
    B. Average Response Rate and Magnitude of Synchronization
    C. Phase of Synchronization
V. COCHLEAR NUCLEUS: PARALLEL CHANNELS
    A. Basic Organization of the CN
    B. AM Responses of Neuronal Types in the CN
VI. SUPERIOR OLIVARY COMPLEX: AN EXAMPLE OF TIME-TO-RATE CONVERSION
VII. THE NUCLEI OF THE LATERAL LEMNISCUS
VIII. AMPLITUDE MODULATION ENCODING IN THE INFERIOR COLLICULUS: A CENTER FOR CONVERGENCE
    A. Basic Organization of the IC
    B. Modulation Transfer Functions for IC Units: Synchronization
    C. Modulation Transfer Functions for IC Units: Average Rate
    D. What Determines the MTF Upper Limit in the IC?
    E. Is AM Encoded in the IC by Rate or Synchronization?
    F. Relationship Between AM Responses and Other Neuronal Properties
    G. Is Modulation Frequency Represented Topographically in the IC?
    H. Responses to Interaural Time Disparities in Modulation Envelopes
    I. Contribution of Nonlinearities
IX. AMPLITUDE MODULATION ENCODING IN AUDITORY THALAMUS AND CEREBRAL CORTEX
    A. Basic Layout of the Thalamocortical System
    B. Temporal Responses in the MGB
    C. Responses to AM in Primary Auditory Cortex: Synchronization
    D. Responses to AM in Primary Auditory Cortex: Average Rate
    E. Responses to AM in Primary Auditory Cortex: Influence of Modulation Parameters
    F. Differences of Temporal Coding Between Cortical Fields
    G. Cortical Mechanisms
    H. Temporal Coding of Complex Sounds
    I. Plasticity of Temporal Coding Properties in Auditory Cortex
X. NEUROPHYSIOLOGICAL AND PSYCHOLOGICAL STUDIES IN HUMANS
XI. CONCLUSION

    ABSTRACT
Joris, P. X., C. E. Schreiner, and A. Rees. Neural Processing of Amplitude-Modulated Sounds. Physiol Rev 84: 541–577, 2004; 10.1152/physrev.00029.2003.—Amplitude modulation (AM) is a temporal feature of most natural acoustic signals. A long psychophysical tradition has shown that AM is important in a variety of perceptual tasks, over a range of time scales. Technical possibilities in stimulus synthesis have reinvigorated this field and brought the modulation dimension back into focus. We address the question of whether specialized neural mechanisms exist to extract AM information, and thus whether consideration of the modulation domain is essential in understanding the neural architecture of the auditory system. The available evidence suggests that this is the case. Peripheral neural structures not only transmit envelope information in the form of neural activity synchronized to the modulation waveform but are often tuned so that they only respond over a limited range of modulation frequencies. Ascending the auditory neuraxis, AM tuning persists but increasingly takes the form of tuning in average firing rate, rather than synchronization, to modulation frequency. There is a decrease in the highest modulation frequencies that influence the neural response, either in average rate or synchronization, as one records at higher and higher levels along the neuraxis. In parallel, there is an increasing tolerance of modulation tuning for other stimulus parameters such as sound pressure level, modulation depth, and type of carrier. At several anatomical levels, consideration of modulation response properties assists the prediction of neural responses to complex natural stimuli. Finally, some evidence exists for a topographic ordering of neurons according to modulation tuning. 
The picture that emerges is that temporal modulations are a critical stimulus attribute that assists us in the detection, discrimination, identification, parsing, and localization of acoustic sources and that this wide-ranging role is reflected in dedicated physiological properties at different anatomical levels.


    I. TEMPORAL DIMENSIONS OF SOUND
Among the sensory systems, audition excels in its speed of operation. This is perhaps not too surprising, since our entire sense of hearing depends on the analysis of rapid changes in acoustic pressure at the two ears. The importance of the temporal dimension is manifest in many structural and functional specializations, starting at the peripheral sense organ and carried through the subsequent stages in the central nervous system. The striking sensitivity of auditory structures to temporal features of the acoustic stimulus has been observed since the earliest electrophysiological recordings, and this sensitivity is equally prominent in behavioral observations of humans and experimental animals.

Importantly, there are multiple temporal dimensions in acoustic stimuli (238). It is useful to distinguish "fine-structure" and "envelope" as two components of a time waveform. The fast pressure variations that determine the spectral content constitute the fine-structure. This fine-structure waxes and wanes in amplitude, and the contour of this amplitude modulation (AM) is the envelope. For example, the waveform of a speech utterance shows bursts of energy that correspond to phonemes. The temporal characteristics of these bursts carry much information (44, 108, 214, 265, 272, 281), but their dominant modulation frequency is rather slow (typically 3–4 Hz, extending up to ~20 Hz) vis-à-vis the temporal capabilities of the peripheral auditory system. Faster modulations of several hundred Hertz are also very common, e.g., in segments of voiced speech where they are perceptually associated with voice pitch. These envelope components arise from interactions between fine-structure components and are not present as such, i.e., as acoustic energy, in the waveform. This is illustrated by the superposition of two sine waves, equal in amplitude but separated by a small difference frequency (fd): constructive and destructive interference of the two components generate AM in the form of "beating" at frequency fd. The same principle extends to environmental sound sources, which commonly produce quasi-periodic signals consisting of a range of frequency components (harmonics) that are multiples of a fundamental frequency: the combination of even a limited number of components, e.g., within a cochlear filter, reconstitutes the fundamental frequency in the form of a temporal envelope modulation. (For examples of spectrograms, waveforms, and treatment of AM, see Refs. 99, 100, 177, 180, 302.)
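The beating example can be checked numerically. The Python sketch below (standard library only; the 1,000- and 1,010-Hz tones and the 50-kHz sampling rate are illustrative choices, not values from the text) confirms that the sum of two equal-amplitude sine waves equals a carrier at the mean frequency multiplied by the slowly varying factor 2 cos(π fd t), whose magnitude peaks once every 1/fd seconds even though no component at fd is present.

```python
import math

def two_tone(t, f1, f2):
    """Sum of two equal-amplitude sine waves at f1 and f2 (Hz), evaluated at time t (s)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def envelope_times_carrier(t, f1, f2):
    """The same signal via sin a + sin b = 2 cos((a - b)/2) sin((a + b)/2):
    a slow factor at half the difference frequency times a carrier at the mean frequency."""
    fd = f2 - f1
    fmean = (f1 + f2) / 2
    return 2 * math.cos(math.pi * fd * t) * math.sin(2 * math.pi * fmean * t)

def beat_envelope(t, f1, f2):
    """Envelope |2 cos(pi * fd * t)|, which is maximal whenever t is a multiple of 1/fd."""
    return abs(2 * math.cos(math.pi * (f2 - f1) * t))

f1, f2 = 1000.0, 1010.0            # illustrative tones, fd = 10 Hz
fs = 50_000                        # sampling rate (Hz), an arbitrary choice
ts = [n / fs for n in range(fs)]   # 1 s of sample times

# The two forms agree to rounding error.
max_err = max(abs(two_tone(t, f1, f2) - envelope_times_carrier(t, f1, f2)) for t in ts)

# Locate envelope maxima and measure their spacing (the beat period).
env = [beat_envelope(t, f1, f2) for t in ts]
peak_times = [ts[i] for i in range(1, len(env) - 1)
              if env[i] > env[i - 1] and env[i] >= env[i + 1]]
beat_period = (peak_times[-1] - peak_times[0]) / (len(peak_times) - 1)
print(f"max identity error: {max_err:.1e}; beat period: {1000 * beat_period:.1f} ms")
```

The envelope factor itself oscillates at fd/2, but because its sign flips are absorbed by the carrier, the perceived fluctuation (its magnitude) repeats at fd.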

The laboratory stimulus most often used in physiological studies of modulation is a pure tone (sinusoid) modulated by another tone. Figure 1A and Equation 1 represent the waveform s(t) of a tone with frequency fc (the carrier), whose amplitude is modulated by a lower frequency fm (the modulator) at a modulation depth m (0 ≤ m ≤ 1):

$$s(t) = \left[1 + m\sin(2\pi f_m t)\right]\sin(2\pi f_c t) \qquad (1)$$
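For fc ≫ fm, the first term, [1 + m sin(2πfmt)], is the time-varying amplitude or envelope.¹ Using trigonometric identities, s(t) can be rewritten as the sum of three components, at fc and at fc ± fm (the upper and lower sidebands):

$$s(t) = \sin(2\pi f_c t) + \frac{m}{2}\cos\left[2\pi (f_c - f_m) t\right] - \frac{m}{2}\cos\left[2\pi (f_c + f_m) t\right] \qquad (2)$$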
This signal does not contain energy at fm (Fig. 1, A and B); the modulation in the time waveform is due to the interaction of the components in the signal which are separated by a difference frequency fm.
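A short numerical check of Equations 1 and 2, assuming the Fig. 1 parameters (fc = 1,000 Hz, fm = 100 Hz, m = 1) and an arbitrary 16-kHz sampling rate. The helper `amplitude_at` is a hypothetical name for a single-bin discrete Fourier transform; because the 100-ms record holds a whole number of cycles of every component, it should recover an amplitude of 1 at fc, m/2 at each sideband (about 6 dB below the carrier at m = 1), and essentially zero at fm.

```python
import cmath
import math

def am_tone(t, fc, fm, m):
    """Equation 1: s(t) = [1 + m sin(2 pi fm t)] sin(2 pi fc t)."""
    return (1 + m * math.sin(2 * math.pi * fm * t)) * math.sin(2 * math.pi * fc * t)

def three_components(t, fc, fm, m):
    """Equation 2: carrier plus lower and upper sidebands of amplitude m/2."""
    return (math.sin(2 * math.pi * fc * t)
            + (m / 2) * math.cos(2 * math.pi * (fc - fm) * t)
            - (m / 2) * math.cos(2 * math.pi * (fc + fm) * t))

def amplitude_at(x, fs, f):
    """Amplitude of the sinusoid at f (Hz) in samples x, from one DFT bin.
    Exact only if x spans a whole number of cycles of every component (f > 0)."""
    acc = sum(xk * cmath.exp(-2j * math.pi * f * k / fs) for k, xk in enumerate(x))
    return 2 * abs(acc) / len(x)

fc, fm, m = 1000.0, 100.0, 1.0              # parameters of Fig. 1
fs = 16_000                                 # sampling rate (Hz), an arbitrary choice
times = [k / fs for k in range(fs // 10)]   # 100 ms = whole cycles of all components
x = [am_tone(t, fc, fm, m) for t in times]

max_err = max(abs(s - three_components(t, fc, fm, m)) for s, t in zip(x, times))
spectrum = {f: amplitude_at(x, fs, f) for f in (fm, fc - fm, fc, fc + fm)}
sideband_db = 20 * math.log10(spectrum[fc + fm] / spectrum[fc])
for f, a in sorted(spectrum.items()):
    print(f"{f:6.0f} Hz: amplitude {a:.4f}")
print(f"sideband re carrier: {sideband_db:.2f} dB")
```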



FIG. 1. A: superimposed waveforms of an unmodulated 1,000-Hz tone (thin line) and the same tone sinusoidally amplitude modulated (AM) (thick line) at 100% with a modulation frequency of 100 Hz, according to Equation 1. Dashed lines indicate the envelope. The amplitude is referenced to the peak amplitude of the unmodulated tone. B: idealized spectrum of the AM tone in A. At 100% modulation, the amplitude of the sidebands is half that of the carrier, i.e., a difference of 6 dB. C: average response in the form of a poststimulus time (PST) histogram of a nerve fiber to the signal shown in A (stimulus duration, 50 ms). D: spectrum of the PST histogram in C. The components at carrier frequency (fc) and fc ± modulation frequency (fm) indicate that there is phase-locking to the fine-structure of the stimulus waveform. The component at fm is prominently present in the response but is absent in the stimulus (B). The small circle on the ordinate indicates the average firing rate.

 

The sinusoidal AM stimulus is special because its envelope consists of a single sinusoidal component. In real-world stimuli, a range of modulations is usually present, which can be summarized by the modulation spectrum: the distribution of modulation energy for the whole waveform or for a selected band of carrier frequencies in the waveform. The subjectively experienced quality of a modulated signal depends on modulation frequency so that the modulation spectrum also defines different perceptual ranges (see sect. II).

The impetus in early physiological studies to use modulated stimuli (57, 62, 78, 183, 196) was a desire to go beyond the arsenal of simple stimuli (pure tones, clicks, noise) that dominated much of the research at that time. Somewhat similar to gratings in the visual domain, AM and frequency modulation (FM) were regarded as elementary features of natural stimuli, which could reveal dynamic properties of the auditory system not addressed with simpler stimuli. Interest in responses to AM was rekindled in the 1980s and 1990s through a convergence of different lines of research concerned with the "dynamic range problem," speech coding, pitch, and spatial localization of high-frequency sounds, among others. However, AM signals are more than just a convenient laboratory tool to study a diversity of psychophysical and physiological phenomena. The question that we are concerned with here is whether envelope processing is embedded in the auditory system, as may be expected from the ecological prominence of envelopes.

Given the theory of natural selection, one can assume that animals are well adapted to their specific acoustic environment and that the statistical structure of the natural auditory environment or the "acoustic ecology" (5) is reflected in the structure and function of the auditory system. Acoustic ecology can be defined as the total ensemble of sounds present in an animal's environment, from both inanimate as well as biological sources. Indeed, the auditory systems of acoustically specialized animals have revealed the existence of highly developed adaptations. Prominent examples include the echolocation system of bats (e.g., Ref. 61), the mating call detection system in frogs (245), and the alarm call differentiation in vervet monkeys (275). Common to these examples is that particular behaviors are elicited by a small set of signals with specific, fairly invariant acoustic properties. Characterization of these lower order physical sound attributes led to the discovery of special neuronal mechanisms.

Relatively little work has been done on the quantitative analysis of amplitude modulation statistics in acoustic ecologies and their consequences for neuronal processing. Not only acoustically specialized animals but all animals are likely to exploit consistencies in the statistical properties of the acoustic environment. Nelken et al. (194) found that low-frequency amplitude modulations are prominent in natural environments, are often coherent over different frequency regions, and may be exploited by the auditory system in signal detection. Voss and Clarke (288) computed temporal correlations of music passages and discovered a 1/f scaling relation over a few decades. More recently, Attias and Schreiner (6) decomposed music, speech, and animal vocalizations into narrow-band frequency channels and studied the statistics of the amplitude and phase distributions for each channel. They also found a distribution of modulation frequencies following a power law, indicating that the amplitude modulation statistics of natural sound are non-Gaussian, cover a wide range of modulation frequencies, and scale universally, i.e., the frequency dependence is similar over different frequency ranges. Using a mutual information metric between stimulus and spike trains, it was also found (7) that neurons in the cat inferior colliculus are more efficient at coding naturalistic stimuli than nonnaturalistic stimuli: the information rate per spike for naturalistic stimuli was more than 60% higher than for nonnaturalistic signals. Similar results have been seen in the frog (232). This implies that neural processing is adapted and perhaps optimized for the encoding of naturally occurring modulation information.

Our purpose is to review physiological mechanisms that may be important for the processing of temporal envelope information. We first briefly highlight findings from human psychophysics to illustrate some of the perceptual consequences of AM, but we refrain from a more substantial discussion of the relationship between physiological mechanisms and perception. Rather, our focus is on a simpler and more basic question: within what limits is AM encoded by single auditory neurons, and does the form of encoding suggest that the temporal envelope dimension is a fundamental organizing principle in the auditory system, in the manner that tuning to orientation, direction, or spatial frequency is considered fundamental in vision?

For reasons of space, only occasional reference will be made to the extensive research in bats or nonmammalian vertebrates, even though AM is often an important feature in echolocation signals (156, 198, 258) and their study often preceded the research reviewed here.


    II. HUMAN SENSITIVITY TO AMPLITUDE MODULATION
The ability of human listeners to detect and discriminate AM has been a topic of study since the 18th century. The earliest means of producing a sound with a fluctuating amplitude envelope was to mix two pure tones differing slightly in frequency to generate beats. Thomas Young and Helmholtz (287) both described the sensation of fluctuating amplitude experienced when listening to beats, and Helmholtz described the changing quality of the sound as the beat frequency was increased. He noted that "the ear easily follows slow beats of not more than 4 to 6 in a second" while at 30 beats/s it is still possible to hear the pulses of the tone, but it is no longer possible to hear them as distinct events and they have a "jarring and rough" quality.

With improvements in technology, subsequent studies (see Ref. 131 for historical review) extended and quantified these findings. Zwicker (324) showed that the threshold for detecting AM is very small at low modulation frequencies (threshold m ~2% for fm of 1–4 Hz and fc of 1 kHz) and increases to a maximum with increasing fm (m ~5% for fm of 32 Hz and fc of 250 Hz; and for fm of 125 Hz and fc of 4 kHz). Above this maximum, threshold decreases and falls below the values obtained at low modulation frequencies, but in this range subjects perceive the carrier and the modulation frequency as distinct tones. Zwicker (324) also determined that, for a given carrier, thresholds for the detection of AM and FM measured in terms of their modulation depths coincide on the upper side of the maximum at a modulation frequency he termed the Phasengrenzfrequenz. This led Zwicker to postulate that above the Phasengrenzfrequenz [now termed the critical modulation frequency (CMF) (250, 263)] the carrier and sideband components are analyzed in different critical bands (auditory filters), and thus subjects are not sensitive to differences in the relative phase of the modulation components that enable them to distinguish AM from FM below the CMF. More recent evidence suggests that the situation is more complex than this (180, 263), but nevertheless, it appears that when listening to AM imposed on pure-tone carriers, detection may rely on spectral rather than temporal cues over some ranges of modulation frequency.

One means of eliminating spectral cues, and therefore estimating the temporal resolving power of the auditory system, is to measure the detection of sinusoidal modulation imposed on noise rather than a tonal carrier. The broadband spectrum of the noise precludes the listener detecting the individual spectral components of the stimulus spectrum. The use of such stimuli (9, 285) demonstrated that the relationship between threshold and modulation frequency (the psychophysical temporal modulation transfer function) is essentially a low-pass function with a 3-dB cut-off around 50 Hz and a slope of –4 dB/octave. The minimum threshold modulation depth is ~5% at low modulation frequencies (<10 Hz) where subjects detect the individual amplitude changes in the stimulus. The upper limit of modulation detection extends to ~2.2 kHz (68, 285, 286). As will become apparent later, this coincides with the very highest limits of neural phase-locking to envelopes obtained for some neurons in the auditory periphery in cats (Fig. 2, Refs. 127, 229) and exceeds the limit for phase-locking to envelopes in more central neurons. This raises questions as to the nature of modulation encoding in the central auditory system, even when one takes into account the encoding of modulations by changes in average rate that become apparent at more central sites.



FIG. 2. Amplitude modulation (AM) stimuli generate different percepts that encompass several regions of modulation and carrier frequencies. At very low fm, most strongly near 4 Hz and disappearing around 20 Hz, a sensation of fluctuation or rhythm is produced (hatched). The rate at which the temporal envelope of fluent speech varies is also typically 4 Hz (syllables/s). Fluctuation makes a smooth transition to a percept of roughness, which starts at ~15 Hz (bottom curved line), is strongest near 70 Hz, and disappears below 300 Hz (top curved line). Harmonic complex tones produce a pitch that corresponds to a frequency close to the fundamental frequency. However, the lower harmonics can be removed without affecting the pitch, resulting in "residue pitch" if fc and fm are chosen within the shaded region. Finally, small interaural time differences (ITD) can be detected between modulated stimuli to the two ears for a region of combinations of fm and fc that overlaps with the region for residue pitch (thick line). Note that these are regions in stimulus space where modulation is perceptually relevant, but the precise relationship of these percepts to physiological response modulation is usually unclear. For reference, the small dots indicate –10 dB cutoff values for modulation transfer functions (MTFs) of auditory nerve fibers (cf. Fig. 3C) [based on further analysis of data reported by Joris and Yin (127)]. Delineation of psychophysical regions is based on References 16, 104, 233, 278, 325. The ordinate is truncated at 4 Hz.

 



FIG. 3. Basic dimensions and manipulations in an AM signal and their effect on auditory nerve activity. The relationship of an auditory filter (curve) and AM spectrum are shown schematically for variations in modulation depth m (A), sound pressure level (SPL) (B), modulation frequency (fm) (C), and carrier frequency (fc) (D). For each manipulation, three measures of the responses of an auditory nerve fiber are shown: average rate (rate, dashed line), synchronization magnitude (R, solid line), and synchronization phase ({phi}, thin line).

 
Although, as Zwicker noted, a distinct pitch at the frequency of modulation is perceived when components of the stimulus spectrum can be resolved, weaker but nevertheless clear pitches are also perceived with modulations containing no resolved components (179, 233). Even modulations imposed on noise carriers can generate pitches that, though weaker than those generated with tonal stimuli, are able to support melody recognition (21, 22). Taken together, these findings demonstrate that the periodicity or residue pitches of some modulations must result solely from temporal analysis, but when resolved components are present, pitch salience is increased. Figure 2 schematically indicates the combinations of carrier and modulation frequencies resulting in the percepts of fluctuation, roughness, and residue pitch. (Sensitivity to binaural envelope disparities is discussed in section VI.)

Two competing models have been proposed to explain the detection of AM. The first consists of a bandpass filter and half-wave rectifier representing processing by the cochlea, followed by a low-pass filter (285). Some measure of the output of this filter provides the basis for the subject's response (see Ref. 181 for discussion). In essence, therefore, this model is an envelope detector. The second scheme models the detection of modulation by a bank of bandpass filters that are sensitive to different ranges of modulation frequency. A channel or filterbank model of modulation analysis was first proposed by Kay and colleagues (84, 132) on the basis of adaptation studies with FM and AM. Subsequently, the adaptation paradigm was questioned (178, 289), but the concept of a modulation filterbank persists because studies using different psychophysical paradigms have since reported findings which support the concept of modulation frequency tuning. Evidence for such selectivity comes from modulation masking experiments (8, 107), and modulation detection interference (MDI), a phenomenon in which the detection of AM is influenced by modulation at the same frequency but on a very different carrier (318). Dau et al. (36) invoked a model consisting of a modulation filterbank associated with each auditory filter to account for the detection and masking of sinusoidally amplitude-modulated narrowband noise. The latter model was extended (283) to account for comodulation masking release, another phenomenon, like MDI, that indicates some element of modulation waveform analysis across different carrier frequencies (96) (see Ref. 180 for review). Such across-frequency interactions between similar modulation envelopes are likely to contribute to grouping and the construction of auditory images (90). 
Despite different lines of evidence favoring some form of modulation filterbank, the concept remains controversial, and the experimental findings discussed above do not concur in their estimates of the bandwidths for these putative channels.
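The first (envelope-detector) scheme can be sketched in a few lines. Here the bandpass stage is omitted by assuming the carrier already lies within the auditory filter, and the low-pass stage is a one-pole filter with a 50-Hz cutoff; the 4-kHz carrier, the depth m = 0.5, and the filter are illustrative assumptions, not the parameters of the published model (285). The output modulation depth should fall as fm increases, giving a low-pass temporal modulation transfer function.

```python
import cmath
import math

def envelope_detector(x, fs, cutoff_hz):
    """Half-wave rectifier followed by a one-pole (exponential) low-pass filter."""
    a = math.exp(-2 * math.pi * cutoff_hz / fs)   # filter pole
    out, state = [], 0.0
    for xk in x:
        state = a * state + (1 - a) * max(0.0, xk)
        out.append(state)
    return out

def output_modulation(fm, fc=4000.0, m=0.5, fs=32_000, cutoff_hz=50.0):
    """Modulation depth of the detector output at fm (component at fm divided by DC)
    for a 1-s AM tone; the first 0.5 s is discarded so the filter has settled."""
    x = [(1 + m * math.sin(2 * math.pi * fm * k / fs)) * math.sin(2 * math.pi * fc * k / fs)
         for k in range(fs)]
    y = envelope_detector(x, fs, cutoff_hz)[fs // 2:]
    dc = sum(y) / len(y)
    comp = sum(yk * cmath.exp(-2j * math.pi * fm * k / fs) for k, yk in enumerate(y))
    return 2 * abs(comp) / len(y) / dc

# Modulation frequencies two octaves apart (each gives whole cycles in 0.5 s).
depths = {fm: output_modulation(fm) for fm in (4.0, 16.0, 64.0, 256.0)}
for fm, d in depths.items():
    print(f"fm = {fm:5.0f} Hz: output depth {d:.3f} (input depth 0.5)")
```

A filterbank model would replace the single low-pass stage with several bandpass filters tuned to different modulation frequencies; the sketch covers only the first scheme.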


    III. NEURAL RESPONSE MEASURES
In neurophysiology, one can generally think of a variety of ways in which stimulus features may be "encoded" and processed (208), and it is not immediately obvious which aspects of neuronal behavior are the most relevant for the perceptual task at hand. With few exceptions, the response measures used in studies of AM are average discharge rate (i.e., the number of spikes evoked over several modulation cycles), or some measure of synchronization of the timing of action potentials to the envelope waveform.

The earliest single-unit studies of peripheral auditory neurons already reported synchronization to the fine-structure of tones, in the sense that discharges occur at a particular phase of the cyclical waveform. For example, auditory nerve fibers have the striking capability to "phase-lock" to low-frequency tones up to several kilohertz [4–5 kHz in the cat (121), but the upper limit is species dependent (298)]. Phase-locking also occurs to the stimulus envelope; both forms of phase-locking are immediately apparent in the poststimulus time (PST) histogram (Fig. 1C) to the AM stimulus of Figure 1A. The fine spacing of peaks at intervals of 1 ms indicates phase-locking to the 1-kHz fine-structure; the grouping into broader peaks spaced by 10 ms indicates phase-locking to the 100-Hz envelope. In contrast to the stimulus spectrum (Fig. 1B), the response spectrum (Fig. 1D) shows energy at fm, i.e., the AM signal is demodulated. Several cochlear nonlinearities with asymmetry between the positive and negative parts of the transfer function can contribute to this demodulation, the most important being half-wave rectification, both in the relationship between displacement of hair cell stereocilia and receptor potential and in the fact that firing rates cannot be negative (135). The response spectrum also shows a value at 0 Hz (Fig. 1D: small circle on ordinate) which equals the average firing rate. In this review, we will use the terms envelope synchronization and envelope phase-locking synonymously to refer to synchronization of the response to the stimulus envelope waveform, and use the term rate coding for changes in average firing rate during manipulation of the stimulus modulation parameters.
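The demodulation described above can be illustrated with the crudest possible stand-in for transduction: plain half-wave rectification of the AM tone (an assumption of this sketch, not a cochlear model), using the Fig. 1 stimulus parameters and an arbitrary 16-kHz sampling rate. The stimulus itself carries no component at fm, but the rectified waveform does; for fc ≫ fm its low-frequency part is approximately (1 + m sin 2πfmt)/π, so the component at fm should have an amplitude close to m/π.

```python
import cmath
import math

def am_tone(t, fc, fm, m):
    """Sinusoidally amplitude-modulated tone (Equation 1)."""
    return (1 + m * math.sin(2 * math.pi * fm * t)) * math.sin(2 * math.pi * fc * t)

def amplitude_at(x, fs, f):
    """One-bin DFT amplitude of the component at f (Hz); assumes whole cycles in x."""
    acc = sum(xk * cmath.exp(-2j * math.pi * f * k / fs) for k, xk in enumerate(x))
    return 2 * abs(acc) / len(x)

fc, fm, m = 1000.0, 100.0, 1.0
fs = 16_000                                   # sampling rate (Hz), an arbitrary choice
stimulus = [am_tone(k / fs, fc, fm, m) for k in range(fs // 10)]   # 100 ms
rectified = [max(0.0, s) for s in stimulus]   # crude stand-in for transduction

a_stim = amplitude_at(stimulus, fs, fm)       # ~0: no energy at fm in the stimulus
a_rect = amplitude_at(rectified, fs, fm)      # close to m/pi after rectification
print(f"component at fm: stimulus {a_stim:.4f}, rectified {a_rect:.4f}, m/pi = {m / math.pi:.4f}")
```

The small shortfall from m/π in the sampled result comes from the coarse sampling of the rectified carrier, not from the principle.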

Different synchronization measures have been used, sometimes leading to seemingly contradictory statements. The most popular metric is "vector strength" R, also called synchronization index (81). Each spike is treated as a vector of unit length and with phase θi between 0 and 2π, measured as the spike time modulo the stimulus period of interest. The x- and y-components of the vector are xi = cos θi and yi = sin θi. The n spikes in a response are combined by vector addition, and the resultant vector is normalized to n:

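$$R = \frac{1}{n}\sqrt{\left(\sum_{i=1}^{n}\cos\theta_i\right)^2 + \left(\sum_{i=1}^{n}\sin\theta_i\right)^2} \qquad (3)$$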
which takes values between 0 and 1. R can also be obtained from the Fourier spectrum of the PST or period histogram, in which case it equals the magnitude of the first harmonic, normalized by the DC component (average firing rate). Phase φ is also retrieved with either technique. Statistical significance of synchronization is usually quantified with the Rayleigh test (23, 168).
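A minimal Python sketch of Equation 3 and of the mean phase φ, with a large-sample approximation to the Rayleigh test, p ≈ exp(−nR²), which is equivalent to referring 2nR² to a χ² distribution with 2 degrees of freedom. Function names are illustrative; spike times and period are in seconds.

```python
import math

def vector_strength(spike_times, period):
    """Return (R, phi): vector strength (Equation 3) and mean phase (radians, 0..2*pi)
    of spike times (s) relative to a cycle of length `period` (s)."""
    n = len(spike_times)
    if n == 0:
        raise ValueError("need at least one spike")
    # Phase of each spike: spike time modulo the period, scaled to 0..2*pi.
    thetas = [2 * math.pi * (t % period) / period for t in spike_times]
    x = sum(math.cos(th) for th in thetas)
    y = sum(math.sin(th) for th in thetas)
    r = math.hypot(x, y) / n
    phi = math.atan2(y, x) % (2 * math.pi)
    return r, phi

def rayleigh_p(r, n):
    """Large-sample Rayleigh-test p-value, p ~ exp(-n R^2)
    (equivalently, 2nR^2 referred to chi-square with 2 degrees of freedom)."""
    return math.exp(-n * r * r)

# Usage: 50 spikes, each 2.5 ms into a 10-ms (100-Hz) modulation cycle.
locked = [k * 0.010 + 0.0025 for k in range(50)]
r_locked, phi_locked = vector_strength(locked, 0.010)
# 8 spikes spread evenly over one cycle: no net synchronization.
spread = [k * 0.010 / 8 for k in range(8)]
r_spread, _ = vector_strength(spread, 0.010)
print(r_locked, phi_locked, rayleigh_p(r_locked, 50), r_spread)
```

Perfectly locked spikes give R = 1 at a phase of π/2 (a quarter cycle), while evenly spread spikes give R = 0.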

As will become clear in this review, envelope coding at peripheral stages is predominantly temporal rather than rate-based, but these two aspects of the response progressively reverse in prominence at successive stages along the neuraxis. Because both average firing rate and synchronization may contribute to the impact that a neuron has on its postsynaptic targets, many experimenters have combined the two metrics by multiplication (nR, with n = total number of spikes, variously called "modulated rate," "phase-locked rate," or "synchronized rate"), or, equivalently, by reporting the unnormalized Fourier component, expressed in spikes per second (33, 141, 224, 314). Recently, some authors have used 2nR², which is also the statistic used in the Rayleigh test of significance (157, 266). Finally, envelope synchronization is often reported as a gain value (in dB), defined as 20 log10(2R/m), which relates output directly to input and facilitates comparison across studies that use different modulation depths m.
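The derived measures in the preceding paragraph follow directly from R. In the sketch below (function and key names are hypothetical), a period histogram that follows the stimulus envelope 1 + m sin(2πfmt) exactly has R = m/2, which corresponds to a gain of 0 dB; a positive gain means the response is more deeply modulated than the stimulus.

```python
import math

def synchrony_measures(r, n_spikes, duration_s, m):
    """Combine vector strength r with spike count and modulation depth m (0 < m <= 1).
    Key names are illustrative, not standard terminology."""
    if r <= 0:
        raise ValueError("gain in dB is undefined for r = 0")
    mean_rate = n_spikes / duration_s                 # average firing rate (spikes/s)
    return {
        "nR": n_spikes * r,                           # 'synchronized' spike count
        "sync_rate": mean_rate * r,                   # the same quantity per second
        "rayleigh_2nR2": 2 * n_spikes * r ** 2,       # Rayleigh statistic
        "gain_dB": 20 * math.log10(2 * r / m),        # envelope gain re: stimulus depth
    }

# A response whose period histogram follows 1 + m*sin(...) exactly has R = m/2 -> 0 dB.
faithful = synchrony_measures(r=0.25, n_spikes=200, duration_s=1.0, m=0.5)
# A more sharply synchronized response to the same stimulus has positive gain.
sharper = synchrony_measures(r=0.6, n_spikes=200, duration_s=1.0, m=0.5)
print(faithful)
print(sharper)
```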

The vector strength metric, often under different names (e.g., selectivity index), has found general use in the quantification of periodic neural signals in sensory and even motor physiology (43). Despite its pervasive use, it is important to be aware of its limitations. First, the metric gives only the degree to which the response is modulated at the frequency at which R is calculated (we use the subscripts m and c to indicate modulation frequency and carrier frequency, respectively). It does not capture the full harmonic content of the cycle histogram at fm, so that histograms with rather different shapes can result in the same Rm value (see Ref. 127 for an example). An Rm value of one results only from perfect alignment of all spikes at one phase, but a value of zero does not necessarily indicate a random distribution of spike times. For example, if spike times are equally divided between phase φ and φ + π, the average vector has zero magnitude. Thus a low vector strength should not necessarily be equated to absence of temporal structure in the spike train, but rather is an indication of lack of energy at the frequency for which R was calculated. Second, high R values indicate that spikes are distributed over a narrow time window relative to the period of interest, but such values do not imply a faithful replica of the stimulus modulation waveform in the probability of discharge. As a reference, a PST histogram that closely resembles a half-wave rectified sinusoidal AM signal with m = 1 gives R = 0.5. Higher R values are obtained when the period histograms are more "peaked" than the original sinusoidal modulation signal. Third, R is a compressive metric and is therefore sometimes graphed on an expansive scale (120). Finally, a problem at a more general level is that calculation of Rm requires knowledge of fm, a strategy that the brain cannot use. 
It may be argued that a "clock" signal is available in the form of the highly synchronized discharge of some types of cochlear nucleus neurons, which could be used to perform a vector-strength-type calculation in which the degree of synchronization is translated into average firing rate, e.g., as suggested in the periodicity extraction scheme by Langner (150). Some authors have used interspike interval or autocorrelation analysis to bring out the time structure of responses that may be more relevant to the operations performed by the central processor (27, 85, 123, 141, 226, 301). In this context it is important to remember that the envelope of most natural sounds is not strictly periodic in the first place and that the raw acoustic waveform is not available as such to the auditory nervous system. Rather, this waveform is decomposed into a multitude of waveforms by virtue of cochlear narrowband filtering (reviewed in Refs. 206, 234). This process profoundly affects the modulation spectrum present in each frequency channel, which is thus determined jointly by the spectrotemporal properties of the acoustic stimulus and by those of the peripheral filtering process (for illustrations, see Ref. 286). In summary, while most studies discussed here have used deterministic stimuli with periodic envelopes and have applied the R metric, it is important to keep in mind that, for natural stimuli, the relationship between neural response modulation and stimulus modulation is more complex and that the neural operations by which the central processor extracts envelope information likely differ fundamentally from the analytical methods of the experimenter.
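The first two limitations of R listed above can be reproduced with period histograms built by hand. In this sketch (standard library only; the 64-bin resolution and the thresholded-sinusoid "peaked" histogram are illustrative assumptions), spikes split equally between two phases half a cycle apart yield R ≈ 0 despite strong temporal structure, a histogram shaped like the m = 1 envelope 1 + sin θ yields R = 0.5, and a more peaked histogram yields a higher value.

```python
import math

def vector_strength_from_histogram(counts):
    """Vector strength of a period histogram: bin k of N is taken to lie at phase
    2*pi*(k + 0.5)/N and is weighted by its count."""
    nbins = len(counts)
    total = sum(counts)
    x = sum(c * math.cos(2 * math.pi * (k + 0.5) / nbins) for k, c in enumerate(counts))
    y = sum(c * math.sin(2 * math.pi * (k + 0.5) / nbins) for k, c in enumerate(counts))
    return math.hypot(x, y) / total

NBINS = 64
phases = [2 * math.pi * (k + 0.5) / NBINS for k in range(NBINS)]

# Spikes split equally between two phases half a cycle apart: clear structure, R ~ 0.
bimodal = [0] * NBINS
bimodal[8] = bimodal[8 + NBINS // 2] = 100

# Histogram shaped like the m = 1 envelope, 1 + sin(phase): R = 0.5.
raised_sine = [1 + math.sin(p) for p in phases]

# Illustrative 'peaked' histogram: only the part of the sinusoid above a threshold.
peaked = [max(0.0, math.sin(p) - 0.5) for p in phases]

for name, h in (("bimodal", bimodal), ("raised sine", raised_sine), ("peaked", peaked)):
    print(f"{name:12s} R = {vector_strength_from_histogram(h):.3f}")
```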

The bulk of studies on AM coding have used the same stimulus strategy, which is to tailor the stimulus to the cell under study. Early work (78, 183) established that peripheral neurons display envelope phase-locking only if the stimulus energy falls within a cell's tuning curve. For example, Javel (114) shows the lack of response of an auditory-nerve fiber tuned to 800 Hz to a high-frequency AM complex (fc = 5 kHz) modulated at 800 Hz. Most studies using AM stimuli with tonal carriers match fc to the neuron's characteristic frequency (CF, frequency of lowest rate threshold), and usually also optimize other stimulus parameters for the cell under study. The complementary approach, in which the responses of a population of cells with many different CFs to a limited set of stimuli are studied, has been little used (27, 293).

A description employed in acoustics, psychophysics, and physiology alike is the modulation transfer function (MTF), which expresses response modulation relative to input modulation as a function of modulation frequency. Schroeder (257) predicted more than 20 years ago that the MTF concept would increase in importance, because the modulation rather than the carrier usually contains the important information and because highly nonlinear transmission systems often exhibit a quasi-linear response to modulation. Physiologically, MTFs are usually measured as the phase-locking to AM tones of fixed m and fc presented at successive modulation frequencies, but other methods have been employed (see sect. IXB). Because AM also has marked effects on average rate, a distinction is usually drawn between the temporal MTF (tMTF) and the rate MTF (rMTF).


    IV. AUDITORY NERVE: BOTTLENECK TO THE CENTRAL NERVOUS SYSTEM
A. Basic Auditory Nerve Properties

Activity in the auditory nerve represents both the output of the cochlea and the input to the central nervous system, and studies of envelope phase-locking have been conducted both to gain more insight into cochlear processing and to define the limits within which the central processor has to operate. Compared with optic and peripheral somatic nerves, the auditory nerve is highly uniform both morphologically (in caliber and branching pattern) and physiologically. We discuss only type I auditory nerve fibers, which form the bulk of the nerve, since almost nothing is known about the physiology of the unmyelinated type II fibers. Because each type I nerve fiber contacts only a single inner hair cell, its activity can, to a first approximation, be understood from basilar membrane motion at a single point in the cochlea followed by further signal modifications by the inner hair cell and the hair cell/nerve synapse (76, 136, 137, 243). The most salient properties are 1) sharp V-shaped tuning to a narrow range of frequencies; 2) a limited dynamic range of ~20–30 dB, reflected in a sigmoidal rate-level function; 3) adaptation of firing rate to sustained stimuli, which is rather modest compared with adaptation of peripheral nerve fibers in other systems; and 4) phase-locking to low-frequency pure tones (<4–5 kHz in the cat).

Auditory nerve fibers show a bimodal distribution of spontaneous rate (SR), on the basis of which several classes of fibers are defined that differ in a number of properties (158, 246, 305). Fibers with high SR (>18 spikes/s), which in cat form ~60% of the total population, have low thresholds and limited dynamic range. Fibers with medium and low SR have higher thresholds and tend to have "sloping" saturation, i.e., their rate-level functions show a decrease in slope at ~30 dB above threshold but do not fully saturate. Also, low-SR fibers show less adaptation than high-SR fibers (230). Differences between the SR classes have been documented mostly with pure tone and spectrally complex stimuli, but AM stimuli have revealed response differences in the time domain as well. We first discuss how the basic AM parameters m, sound pressure level (SPL), fm, and fc (Fig. 3) influence synchronization and average rate, then describe the response phase.

B. Average Response Rate and Magnitude of Synchronization

When a tone is presented at a fiber's CF at a fixed suprathreshold level and is modulated with increasing depth, the nerve fiber shows a monotonic, saturating increase in synchronization Rm (Fig. 3A). Although Rm increases with m in absolute terms, synchronization magnitude decreases in relative terms, i.e., the gain (response modulation relative to stimulus modulation) decreases (127). The gain can be as large as 10 dB for m of 10% and decreases to values near 0 dB for m of 100%.
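The gain values above are easiest to read with an explicit formula. A common convention, assumed here because this section does not spell it out, expresses gain as 20 log10(2R/m): for a firing rate modulated sinusoidally with depth m_r, the vector strength is R = m_r/2, so 2R is the response modulation depth and 0 dB means the response is exactly as deeply modulated as the stimulus. A minimal Python/NumPy sketch, using illustrative R values rather than data:

```python
import numpy as np

def modulation_gain_db(R, m):
    """Synchronization gain in dB under the assumed convention 20*log10(2R/m).

    R: vector strength at fm (0..1); m: stimulus modulation depth as a fraction (0..1].
    A sinusoidally modulated rate with depth mr has R = mr / 2, hence the factor 2.
    """
    if not 0 < m <= 1:
        raise ValueError("m must be in (0, 1]")
    return 20.0 * np.log10(2.0 * np.asarray(R, dtype=float) / m)

full_depth = modulation_gain_db(0.5, 1.0)    # response as deep as the stimulus
shallow = modulation_gain_db(0.16, 0.1)      # illustrative R at m = 10%
```

Under this convention, R = 0.5 at m = 100% corresponds to 0 dB, and R ≈ 0.16 at m = 10% to about 10 dB, which is the range of gains quoted above.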

Responses to AM as a function of stimulus intensity have been studied extensively in a variety of animals (guinea pig, Ref. 33; chinchilla, Ref. 114; cat, Refs. 127, 135, 294; gerbil, Ref. 270). The rate-level function with AM shows only small differences relative to the function obtained with an unmodulated carrier (127, 270). The synchronization-level (Rm vs. SPL) function shows a stereotypic nonmonotonic shape: a maximum is reached at low suprathreshold levels, with a decrease in Rm for further increases in SPL (Fig. 3B). This shape is expected from the compressive relationship between firing rate and SPL, especially when the modulation depth m is small: maximal modulation of firing rate should occur for amplitude changes centered on the steepest part of the rate-level function, between firing threshold and saturation, whereas at high SPLs amplitude fluctuations should not translate into fluctuations in firing rate because firing rate is saturated. Qualitatively, the synchronization-level function does indeed show the expected nonmonotonic shape. However, compared with quantitative predictions based on the rate-level function, the observed synchronization shows 1) larger maximal R values, 2) a maximum that is displaced toward a higher SPL, and 3) higher synchronization values at high SPLs, with a shallow downward slope. These deviations are predicted when adaptation over a short time scale is taken into account (33, 270, 311). In essence, adaptation boosts the coding of stimulus changes, so that the operating range over which changes in SPL result in changes in firing rate is larger for responses to AM than for steady-state responses to pure tones.
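The "quantitative prediction based on the rate-level function" can be sketched with a memoryless model: pass the instantaneous envelope level of a SAM tone through a sigmoidal rate-level function and compute R of the resulting rate over one envelope cycle. Everything in the sketch is a hypothetical illustration in Python/NumPy (a logistic rate-level function with made-up spontaneous rate, maximum rate, midpoint, and slope, chosen so that 10–90% of the driven range spans ~22 dB). Because it omits adaptation, it reproduces only the qualitative nonmonotonic shape, not the deviations listed above.

```python
import numpy as np

def rate_level(level_db, spont=5.0, rmax=200.0, l50=20.0, slope_db=5.0):
    """Hypothetical logistic rate-level function in spikes/s (parameters are illustrative)."""
    return spont + (rmax - spont) / (1.0 + np.exp(-(level_db - l50) / slope_db))

def static_sync_level(levels_db, m=0.5, n_phase=256, **rl_kwargs):
    """R versus mean SPL for a memoryless (no adaptation) model driven by a SAM tone."""
    theta = 2 * np.pi * np.arange(n_phase) / n_phase     # one envelope cycle
    env = np.maximum(1.0 + m * np.cos(theta), 1e-6)      # avoid log10(0) when m = 1
    out = []
    for L0 in levels_db:
        r = rate_level(L0 + 20.0 * np.log10(env), **rl_kwargs)   # instantaneous rate
        out.append(np.abs(np.sum(r * np.exp(1j * theta))) / np.sum(r))
    return np.array(out)

levels = np.arange(-10.0, 82.0, 2.0)        # mean SPL in dB (arbitrary reference)
R = static_sync_level(levels)
k = int(np.argmax(R))
print(f"max R = {R[k]:.2f} at {levels[k]:.0f} dB")
```

Synchronization is weak near threshold (rate dominated by spontaneous activity), peaks where the envelope excursions straddle the steep part of the function, and falls toward zero once the rate saturates.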

There are systematic differences in AM responses of the different SR classes of auditory nerve fibers. One descriptor commonly used to compare envelope phase-locking across cell populations is the maximal R value of the synchronization-level function (Rmax). Cells with low and medium SR tend to have higher Rmax values than cells with high SR, and this difference is particularly marked at low CFs (<5 kHz) (127, 294). However, the difference in synchronization between these different auditory nerve classes strongly depends on the synchronization metric used (33, 127, 183, 295). In contrast to earlier reports, Cooper et al. (33) concluded that fibers with high SR showed larger envelope synchronization values than low SR fibers. Their result is less of a conflict than it appears if it is taken into account that the metric used by these authors was (unnormalized) modulated rate rather than Rm, that the average discharge rate of fibers with low SR is generally lower than that of fibers with high SR (158), and that the sample of Cooper et al. is biased to high CFs (>8 kHz).

Synchronization is robust in high SR cells at low SPLs and in low and medium SR cells at mid and high SPLs (294). However, the different fiber populations reach maximal synchronization at the same level relative to rate threshold (33, 294). Low SR fibers have a larger dynamic range over which significant modulation is present (33), lending further support to the general hypothesis that these fibers are particularly important for hearing at high SPLs.

The narrow bandpass filtering by the cochlea limits the range of modulation frequencies transmitted by nerve fibers. As schematized in Figure 3C, an increase in fm causes the sidebands in the stimulus spectrum to move away from fc. If fc is centered at the CF of the fiber studied, the energy in the sidebands is increasingly attenuated, resulting in a loss of modulation at the output of the peripheral filter. The response as a function of fm is usually referred to as the modulation transfer function (MTF), and again one should clearly distinguish effects on average rate (rMTF) from effects on synchronization to fm (tMTF). The rMTF is usually flat but may show some decrease in rate with increasing fm, particularly in low-SR fibers (127). In contrast, tMTFs all have a low-pass shape (guinea pig, Ref. 203; cat, Ref. 127; rat, Ref. 186; Fig. 3C). These functions are smooth and do not show any structure related to harmonic ratios; i.e., whether or not the AM components (fc and the two sidebands) are integer multiples of fm is inconsequential. The absolute bandwidth of frequency tuning curves, e.g., at 10 dB above threshold, increases with CF (59, 86, 230), and the cut-off frequency of tMTFs shows a concomitant increase with CF (Fig. 2). At very low CFs (a few hundred Hz), a tMTF cut-off frequency often cannot be determined because of the broad frequency tuning. Interestingly, for CFs above ~10 kHz, the increase in cut-off frequency is not commensurate with the increase in bandwidth of frequency tuning at these high CFs. This presumably reflects temporal filtering at the hair cell/synaptic level rather than spatial filtering at the mechanical level (86, 127). The highest modulation frequency at which significant envelope phase-locking is observed, in high-CF nerve fibers, is ~2 kHz (127, 229). A less marked feature of many tMTFs is a shallow positive slope in the low-frequency skirt (94, 127). According to Cooper et al. (33), this slope tends to become steeper at high SPLs, consistent with models that include effects of response adaptation (311).
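The sideband argument can be made concrete with a deliberately idealized filter. The Python/NumPy sketch below assumes a hypothetical Gaussian magnitude response centered on CF with a chosen -3 dB bandwidth and linear phase (a pure delay). Under those assumptions a SAM tone with fc = CF remains a SAM tone at the filter output, and only its depth changes: each sideband is scaled by |H(CF ± fm)|/|H(CF)|. Real cochlear filters are asymmetric, and the hair cell/synaptic temporal filtering that limits tMTFs at high CFs is not modeled here.

```python
import numpy as np

def filtered_mod_depth(fm, m, cf, bw3db):
    """Output modulation depth of a SAM tone (fc = cf) after an idealized filter.

    Assumptions (hypothetical): Gaussian |H| centered on cf with -3 dB bandwidth
    bw3db, and linear phase, so both sidebands are scaled equally and the output
    is still a SAM tone.
    """
    def mag(f):
        # |H| = 2**-0.5 (-3 dB) at f = cf +/- bw3db / 2
        return np.exp(-0.5 * np.log(2) * ((f - cf) / (bw3db / 2)) ** 2)
    fm = np.asarray(fm, dtype=float)
    return m * mag(cf + fm) / mag(cf)

fm = np.array([0.0, 50.0, 100.0, 200.0, 400.0])
depth = filtered_mod_depth(fm, 1.0, cf=1000.0, bw3db=200.0)   # illustrative numbers
```

In this idealization the resulting tMTF is low-pass with its -3 dB cut-off at fm = bw3db/2, so wider filters at higher CFs give higher cut-offs; the finding that the cut-off stops keeping pace with bandwidth above ~10 kHz is exactly what a purely mechanical account like this cannot capture.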

Clearly, the extent of envelope phase-locking in the auditory nerve is sufficiently wide to encompass psychophysical existence regions (Fig. 2). Javel and Mott (115) attributed the disappearance of residue pitch at fc >5 kHz to increased sharpness of tuning of high-CF fibers (59, 230). However, while bandwidth limitations may contribute to the upper fm limit of ~800 Hz, they do not explain the disappearance of residue pitch altogether.

The dependence of envelope phase-locking on carrier frequency, relative to CF, has not been explored in great detail (114, 127, 295). It merits further study because the available data suggest an important effect. If fc is moved away from CF, the synchronization-level function shifts to higher SPLs. Consequently, for moderate to loud stimuli, strongest phase-locking is present in fibers with CFs that differ from fc, provided that the stimulus is able to excite these fibers (Fig. 3D). Thus, for all but the weakest signals, the representation of stimulus envelope may be carried mainly by fibers tuned to frequencies that differ from fc.

C. Phase of Synchronization

Few studies have reported phase or latency data for AM stimuli. For a given fiber, the phase of response to the envelope shows a slight lead with increasing SPL (127) and, at fixed suprathreshold levels, varies little with changes in carrier frequency (122). In contrast, response envelope phase increases nearly linearly with fm. The slope of this relationship has been used as an estimate of the total delay accrued between the acoustic stimulus and the site of recording, similar to earlier such measurements on responses to pure tones in low-CF fibers (4). The linearity of the phase-fm relationship indicates that it is mostly determined by fixed mechanical and neural transmission delays. Consistent with other delay or onset latency measures, the values obtained vary systematically and inversely with CF (127, 294), as expected from the traveling wave on the basilar membrane, which starts at the base of the cochlea and reaches its more apically located maximum after some delay. However, many processes contribute to the total delay (242, 244). Gummer and Johnstone (93) scanned envelope delay of nerve fibers near their tuning curve threshold, using AM complexes of fixed fm and low modulation depth over a large range of carrier frequencies. They found a delay component that was large for carrier frequencies near CF and smaller in the tuning curve tail, and they provide several arguments suggesting that this component reflects a delay associated with cochlear bandpass filtering.
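Estimating delay from the phase-fm relationship amounts to a straight-line fit: if response phase lag (in cycles) equals delay × fm plus a constant, the slope is the delay. Measured phases are usually wrapped to one cycle, so they must be unwrapped along fm first, which is only safe when successive fm steps change the phase by less than half a cycle. A minimal Python/NumPy sketch; the function name and the synthetic 4-ms example are illustrative assumptions, not values from the studies cited.

```python
import numpy as np

def delay_from_phase(fm_hz, phase_cycles):
    """Fit envelope phase lag (cycles) vs fm; return (delay_s, intercept_cycles).

    Phases may be wrapped to [0, 1). Unwrapping assumes successive fm values
    change the phase by less than half a cycle; the intercept is defined only
    modulo whole cycles.
    """
    fm = np.asarray(fm_hz, dtype=float)
    order = np.argsort(fm)
    fm = fm[order]
    ph = np.asarray(phase_cycles, dtype=float)[order]
    ph = np.unwrap(2 * np.pi * ph) / (2 * np.pi)   # remove 1-cycle jumps along fm
    slope, intercept = np.polyfit(fm, ph, 1)       # lag = delay * fm + constant
    return slope, intercept

# Synthetic check (not data): a 4-ms delay plus a 0.1-cycle constant, wrapped to [0, 1).
fm = np.arange(50.0, 1001.0, 50.0)
wrapped = (0.1 + 0.004 * fm) % 1.0
delay, offset = delay_from_phase(fm, wrapped)
```

Here the 50-Hz steps advance the phase by 0.2 cycle, so unwrapping recovers the 4-ms slope; with coarser fm sampling or longer delays the same fit would silently return a wrong delay.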

The preceding descriptions are based on synchronization of the response to the envelope frequency. Again, it is important to bear in mind that such descriptions are incomplete. The shape of cycle histograms can depart severely from the shape (usually sinusoidal) of the stimulus envelope, particularly at high SPLs and at large modulation depths. Therefore, the spectrum of the cycle histogram typically consists of a number of spectral peaks, of which the peak at fm is only one, and not necessarily the largest, component (135, 294). Also, the most salient temporal information present in the discharge patterns is not necessarily revealed by calculation of synchronization to stimulus components. For example, robust phase-locking to fm does not imply that the most common interspike intervals are at the period of fm: for envelope periods of several tens of milliseconds multiple spikes occur per envelope cycle, while periods shorter than a few milliseconds succeed each other too fast to allow a spike in every envelope cycle. An interesting discrepancy between envelope phase-locking and dominant interspike intervals is in "pitch-shift" effects of changes in fc (27, 114): phase-locking to fm stays roughly constant, while the most dominant interspike interval shifts in a direction which parallels the subjective pitch of the AM stimulus.
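The dissociation between phase-locking and dominant interspike intervals is easy to demonstrate with two synthetic, deterministic spike trains (illustrative constructions, not recordings). At fm = 500 Hz, a fiber that always fires at the same envelope phase but skips cycles has R = 1, yet no interval equals the 2-ms envelope period. At fm = 20 Hz, a burst of spikes early in each cycle still gives high R, but the most common interval is the 2-ms intraburst interval rather than the 50-ms period. The histogram bin width in this Python/NumPy sketch is an arbitrary choice.

```python
import numpy as np

def sync_and_modal_isi(spike_times, fm, bin_s=1e-4):
    """Vector strength at fm and the most common interspike interval (s)."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    R = float(np.abs(np.mean(np.exp(2j * np.pi * fm * t))))
    isi = np.diff(t)
    n_bins = int(np.ceil(isi.max() / bin_s)) + 1
    edges = (np.arange(n_bins + 1) - 0.5) * bin_s     # bins centered on multiples of bin_s
    counts, _ = np.histogram(isi, bins=edges)
    return R, float(np.argmax(counts) * bin_s)

# fm = 500 Hz: one spike at the same phase on cycles 0, 2, 5, 7, 10, ... (cycles skipped)
cycles = np.sort(np.concatenate([np.arange(0, 100, 5), np.arange(2, 100, 5)]))
fast = sync_and_modal_isi((cycles + 0.3) / 500.0, 500.0)

# fm = 20 Hz: a burst of four spikes, 2 ms apart, early in every 50-ms cycle
slow_t = (np.arange(10)[:, None] * 0.05 + np.array([0.005, 0.007, 0.009, 0.011])).ravel()
slow = sync_and_modal_isi(slow_t, 20.0)
```

In the first train the modal interval is two envelope periods (4 ms); in the second it is the 2-ms burst interval, even though both trains are strongly phase-locked to fm.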

In summary, envelope information is abundantly available in auditory nerve discharges in temporal form. Each nerve fiber transmits envelope information over a stereotypical range of modulation frequencies, carrier frequencies, and intensities. These ranges are consistent, at least at a qualitative level, with known auditory nerve properties of frequency tuning, compression, adaptation, and spontaneous activity, and computer models incorporating these properties reproduce the main features of AM responses (105, 117, 271). The main way in which the auditory nerve is a bottleneck to the central nervous system for AM signals is in the extent of modulation frequencies over which synchronization occurs. This range cannot be enlarged centrally, except possibly for frequencies at which fine-structure information is available (<4–5 kHz), because AM arises from a time-domain interaction of stimulus components.


    V. COCHLEAR NUCLEUS: PARALLEL CHANNELS
The key dynamic properties of cells in the cochlear nucleus (CN) and the differences with the auditory nerve were described in the pioneering studies of Møller (183, 184, 187): enhanced gain over a large dynamic range, low levels of distortion to sinusoidal modulation, i.e., a rather faithful tracking of the sinusoidal envelope, presence of bandpass tMTFs particularly at high SPLs, and similar tMTF shape for different forms of modulation (sinusoidal AM of pure tone or noise carriers, noise-modulated tones, noise-modulated noise). However, the marked diversity of CN cells supports a variety of AM response patterns, evident in the earliest CN studies (78), and necessitates a discussion of AM responses per cell type rather than global statements about the CN or its subdivisions. Limited attempts have been made (not reviewed here) to uncover the mechanisms underlying the auditory nerve to CN transformations, for gain enhancement in particular (72, 228, 296, 323).

A. Basic Organization of the CN

An important insight that emerged from study of the CN with simple stimuli was that a limited number of response patterns or "classes" could be discerned and that these patterns are related to morphological cell classes (18, 202). Especially through the technique of intracellular labeling, many of the structure-function relationships that were surmised earlier on the basis of indirect evidence were solidified. The physiological diversity of these different cell types, combined with the diversity of their central projections (297), led to the concept of functionally specialized, parallel pathways (for review, see Refs. 26, 69, 112, 227, 319).

Briefly, three subnuclei are defined on the basis of the bifurcation pattern of the auditory nerve. The anteroventral cochlear nucleus (AVCN) has three principal cell types. Stellate cells project to the inferior colliculus (IC) and respond to tones with a burst of regularly spaced action potentials called a "chopper" pattern. Bushy cells, which derive their name from their small and confined dendritic tree and which are remarkable for their strong inputs from the auditory nerve, occur in two types. Spherical bushy cells receive large calyceal auditory nerve terminals (end bulbs of Held) and show responses similar to auditory nerve fibers and are therefore called "primary-like" (PL). Their main projection is to binaural nuclei in the superior olivary complex. Globular bushy cells also receive large nerve terminals in the form of modified end bulbs of Held, and show a characteristic "primary-like-with-notch" (PLN) pattern in response to tones. Their main projection is contralaterally in the superior olivary complex where they give rise to giant calyceal endings on cells in the medial nucleus of the trapezoid body, which are inhibitory on binaural cells in the lateral superior olive (LSO). The posteroventral cochlear nucleus (PVCN) contains octopus cells that project to the ventral nucleus of the lateral lemniscus (VNLL) and show pure onset (Oi) responses to tones. It also contains inhibitory multipolar cells that project to the dorsal cochlear nucleus (DCN) and the contralateral CN and which show onset-chopper (Oc) responses. The principal neurons of the DCN are the fusiform cells, which project to the IC and display remarkably nonlinear spectral properties. These properties arise through local inhibitory interactions with interneurons in DCN (type II cells) and presumably with the Oc cells (195).

The classification of CN cells is mostly based on subjective criteria, which contributes to discrepancies in conclusions of different studies. Although there is by no means an agreed upon "task" for each of these circuits, it is clear that each cell type performs a different analysis of the auditory nerve input and conveys its output to a different part of the auditory brain stem. The bushy cells are clearly involved in binaural analysis important for spatial localization of sounds. Stellate cells are able to represent vowel spectrum over a wide range of intensities. Fusiform cells integrate somatosensory and spectral information and may signal important auditory events. Responses to AM offer another illustration of how CN cell types differ in their processing of auditory nerve input.

B. AM Responses of Neuronal Types in the CN

The relationship between AM coding and physiological cell class, as defined by the response to pure tones, was first examined by Frisina and co-workers in the gerbil (70, 71). These authors found that envelope phase-locking in ventral cochlear nucleus (VCN) was generally enhanced relative to the auditory nerve, and they described a hierarchy of enhancement that correlated with the precision of timing of response onset to pure tones. Of the four physiological VCN cell types studied, cells with well-timed onset responses showed the highest gains, followed by choppers, PLN, and PL. The decrease in synchronization with increasing intensity is less than in the auditory nerve and in some cell types depends on fm, resulting in a peaked or tuned tMTF at high SPLs. Particularly these latter two response features, extended dynamic range and selectivity to fm, received much attention in later studies (Fig. 4). The general behavior of synchronization as a function of SPL and fm described by Frisina et al. (71) was confirmed and extended to other cell types in many subsequent studies, even though not all studies agree on the exact hierarchical ordering and the discreteness of the ordering.



FIG. 4. Two important transformations between the auditory nerve (dashed lines) and cochlear nucleus (solid lines). A: enhancement of envelope synchronization and extended dynamic range is present in many cell types. B: some cell types show bandpass tMTFs.

 

Some of the most interesting responses were observed in cells with chopper responses. Choppers are temporally tuned for fm, as reflected in bandpass tMTFs particularly at higher SPLs (gerbil, Ref. 71; cat, Ref. 229). A small percentage of choppers also shows bandpass tuning in their rMTFs (228). The fm causing the strongest synchronization is called the temporal best modulation frequency (tBMF). The occurrence of bandpass tuning is of obvious importance to the concept of a "modulation frequency filter bank" or "modulation channels" (131). This concept has some popularity, particularly in the psychophysical literature (see sect. II), and will be taken up again in our discussion of IC and auditory cortex.
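Reading a tBMF off a measured tMTF is simply locating the fm with the largest synchronization. Deciding whether a tMTF counts as "bandpass" requires a criterion, and none is given here, so the Python/NumPy sketch below uses a hypothetical one: the peak must lie inside the tested range, and synchronization at both the lowest and highest fm must be at least 3 dB below the peak. Because m is fixed across fm, relative gain reduces to 20 log10(R/Rmax). The example R values are invented for illustration.

```python
import numpy as np

def tbmf_and_shape(fm_hz, R, drop_db=3.0):
    """Return (tBMF in Hz, is_bandpass) for a tMTF given as vector strength vs fm.

    Hypothetical bandpass criterion: interior peak, and R at the lowest and the
    highest tested fm at least drop_db below the peak (relative gain 20*log10(R/Rmax)).
    """
    fm = np.asarray(fm_hz, dtype=float)
    R = np.asarray(R, dtype=float)
    k = int(np.argmax(R))
    rel_db = 20 * np.log10(R / R[k])
    is_bandpass = 0 < k < len(R) - 1 and rel_db[0] <= -drop_db and rel_db[-1] <= -drop_db
    return float(fm[k]), bool(is_bandpass)

fm = [50, 100, 200, 400, 800]
chopper_like = tbmf_and_shape(fm, [0.2, 0.4, 0.6, 0.4, 0.1])   # invented values
nerve_like = tbmf_and_shape(fm, [0.6, 0.58, 0.5, 0.3, 0.1])    # invented values
```

A stricter or looser drop_db changes which cells are labeled bandpass, which is one reason the proportion of "tuned" cells can differ between studies.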

As mentioned, "chopping" reflects the intrinsic tendency to fire a regular burst of spikes at the beginning, or sometimes for the entire duration, of the stimulus, and these cells have therefore been viewed as resonators or intrinsic oscillators (150). SPL-dependent bandpass tuning and oscillatory responses were also described earlier by Møller (187) in the rat. In a subclass of cells in the guinea pig, the intrinsic behavior is invariant with SPL and affects the temporal characteristics of the response to nondeterministic stimuli (301). The intrinsic properties may make these cells function as envelope filters that decompose the envelope spectrum, much in the way that inner hair cells in the turtle cochlea decompose stimulus frequency by virtue of an intrinsic electrical resonance mechanism (63). Several authors have therefore looked for correlations between AM responses and intrinsic oscillation behavior. Frisina et al. (71) compared the frequency of chopping with the tBMF for a sample of sustained choppers in VCN. The tBMFs spanned a range (170–700 Hz) roughly similar to the range of chopping frequencies (80–520 Hz), but the correlation between the two response properties was poor. There was a suggestion of interaction between chopping frequency and fm in that the tBMF only rarely exceeded the chopping frequency, which therefore seemed to set an upper bound. In a subpopulation of choppers (sustained choppers with a well-defined tBMF between 150 and 450 Hz), Rhode and Greenberg (229) noted a tendency for maximal envelope synchronization when fm matched the discharge rate to a tone at the same intensity.

A strong and more general relationship, not restricted to choppers, was found by Kim et al. (141) in DCN/PVCN neurons of the unanesthetized decerebrate cat. In this study, the "intrinsic oscillation" frequency of a neuron was measured from the autocorrelation of its responses to pure or AM tones. Frequency of intrinsic oscillation and BMF were well correlated (r = 0.86) with regression close to the diagonal of equality, and the frequency ranges were roughly similar (50–500 Hz) to those reported for VCN choppers (71, 229). Importantly, the remarkably good correlation arose from the pooling of different cell groups, rather than from a within-population trend, complicating any AM-coding scheme based on intrinsic oscillators. At least five cell types contributed to the data, surprisingly also including auditory nerve fibers.

Besides choppers, the other main constituent cell types of the AVCN are the two types of bushy cells with PL and PLN responses. As expected from their powerful auditory nerve inputs, PL and PLN cells resemble auditory nerve fibers in many regards, and indeed, their Rmax and tMTF cut-off frequency distributions at different CFs largely overlap those of the auditory nerve (129, 229). For PL cells this overlap is virtually complete, but for CFs below ~7 kHz, PLN cells synchronize much better to envelopes than auditory nerve fibers. At very low CFs some bushy cells have enhanced synchronization to both fine-structure and envelopes (124).

Comparisons of cell types across studies illustrate that one has to be careful with simple characterizations of responses to multidimensional stimuli like AM. As remarked by Rhode and Greenberg (229), a single response parameter is not sufficient to characterize envelope synchronization. The highest gains found in choppers exceed those of PL cells but are mostly at fm values below 500 Hz (129, 229), so that at higher modulation frequencies PL cells are superior to choppers in transmitting envelope information. Consequently, the hierarchy of modulation enhancement strongly depends on the range of modulation frequencies of interest and also, as pointed out earlier (see sect. IVB), on the chosen metric (266). Rather than providing an exhaustive listing of response parameters for all cell types, we emphasize here the properties by which different CN cells stand out most from the auditory nerve and from each other. For chopper cells this is the bandpass tuning of tMTFs; for bushy cells it is the extent of the tMTF (high cut-off frequencies).

The two main response types found in PVCN are onset responses (Oi and Oc), associated with the octopus and multipolar morphology, respectively. Both cell types show remarkable envelope phase-locking, in line with the precision of their onset response to pure tones. Oc cells have been particularly well studied (cat, Refs. 125, 140, 228, 229). These cells show some of the highest gains, over the widest fm and SPL range, which is why Kim et al. (140) proposed that these cells have a special role in the extraction of the fundamental frequency of voiced speech sounds. Moreover, large changes in fc and even use of a wideband carrier have little effect on magnitude of synchronization (228). Oi cells have been studied very little, but the few existing data reveal interesting properties, in line with their biophysical specializations (199). These cells show the highest gains of all CN cells, reaching Rm values near 1 (228). Moreover, their tMTFs are high in gain and invariant with SPL, but all-pass. The rMTFs of these two classes of onset cells also appear unique among CN cell classes because they can be sharply bandpass. It is unclear whether these bandpass rMTFs can sustain a rate code for modulation frequency: among the handful of Oi cells reported, the range of rBMFs was only 350–450 Hz.

Onset units have wider frequency tuning than auditory nerve fibers (80, 118, 231). They therefore provide a test case of the suggestion that is sometimes made that tMTF bandwidths may broaden centrally by virtue of convergence of cells tuned to different CFs (180, 286). However, this would require phase information on the individual spectral components of the AM stimulus, and for frequencies above the pure-tone phase-locking range (>4–5 kHz in cat), such information is not available to the central processor. Indeed, despite their wider frequency tuning, tMTF cut-off frequencies of onset cells do not exceed the limits imposed by the auditory nerve (125, 228, 229).

The DCN has traditionally been regarded as a part of the CN with poor timing properties (79, 82, 154), and initial studies with AM seemed consistent with that view (horseshoe bat, Ref. 282; kangaroo rat, Ref. 29). However, more recent studies emphasized good AM coding in DCN (cat, Refs. 125, 229, 254; guinea pig, Refs. 322, 323), and specific roles for DCN in temporal processing have been proposed [pitch (150); extraction of envelopes in background noise (73) or at high SPLs (229)]. The tMTFs are typically low-pass or bandpass and differ from those of other CN cell types in their upper fm limit of phase-locking, which never exceeds 800 Hz. To some extent, differences between studies reflect the complexity of this nucleus, both in diversity of response types and in nonlinearity of behavior (319). First, Oc cells can be found in deep DCN and may explain some of the high-gain responses to AM reported for DCN. Second, simple measures like maximum synchronization or cut-off frequency do not reveal the full complexity of DCN responses and give DCN a misleading "AVCN-like" appearance. Even though DCN interneurons and principal neurons can display high-gain responses to AM stimuli, their responses often show strong nonmonotonicities, not only in average rate but also in magnitude and phase of envelope synchronization (125, 254, 322). These nonmonotonicities are likely a manifestation in the temporal domain of the intricate inhibitory and excitatory interactions that have been invoked to explain similar complexities in the frequency domain.

A preliminary study by Frisina et al. (73) in the chinchilla suggests that envelope synchronization of DCN neurons can be enhanced by background noise, but more systematic data and comparisons with auditory nerve and VCN are needed to evaluate whether DCN neurons are special in this regard. Rhode and Greenberg (229) studied envelope synchronization in the presence of wide-band noise in different CN cell types of the cat and found that in general there is remarkable preservation of envelope synchronization even at high noise levels.

As in the auditory nerve, few authors have systematically reported envelope phase data. Cells in the CN also show a linear increase in envelope phase with increasing fm, but the slopes are systematically steeper than in the auditory nerve, consistent with additional time delays required for conduction and synaptic transmission (125, 129). Delays calculated from response envelope phase are more tightly distributed and shorter than traditional measures of latency based on response onset (94, 185), as is the case for delay estimates based on fine-structure (65). Most CN studies of AM coding considered only tMTF magnitude and not phase when trying to infer functional consequences of AM tuning for the perception of natural stimuli. Delgutte et al. (40) used both tMTF magnitude and phase of responses in auditory nerve, CN, and IC to predict responses of the same neurons to speech utterances (see below) and stressed the importance of incorporating phase, particularly at very low modulation frequencies, to make successful predictions.

To summarize, the CN shows marked differences in AM coding relative to its auditory nerve input: wider dynamic ranges, higher gains, appearance of bandpass tMTFs, and less sensitivity to the presence of background noise. Furthermore, different cell types show marked diversity in their synchronization and average rate behavior to AM signals. A simple hierarchical ranking does not do justice to the differences among cell types and depends on whether one emphasizes Rmax values (71, 295), breadth of the tMTF (129), or statistical reliability of phase-locking (266). As in the nerve, AM coding is almost entirely temporal: bandpass rMTFs occur rarely, in a few cell classes.

Our knowledge of CN responses to AM is still lacking in many ways and basically does not go far beyond phenomenology. Perhaps the most pressing question is the robustness and relevance of bandpass tMTFs, which many investigators regard as genuine envelope filters. More studies are needed to determine how invariant tMTF tuning is with stimulus parameters, what range of tBMFs is spanned at different CFs, and whether tMTF tuning indeed supports filtering of envelope energy in natural stimuli. Such information would be particularly valuable for carrier frequencies in the range of phase-locking to fine-structure (<4–5 kHz), which is poorly sampled in most studies in small animal species with higher-frequency hearing than humans. There are other lacunae. Data are sparse for certain cell types, most notably pure onset units in PVCN. In most studies, the stimulus is optimized for the cell under study; there is a need for population studies in which the response to a limited set of stimuli is examined for an entire population. Finally, there is currently no evidence for any kind of within-class topographic organization (e.g., within an isofrequency strip) of AM response properties in the CN.


    VI. SUPERIOR OLIVARY COMPLEX: AN EXAMPLE OF TIME-TO-RATE CONVERSION
Part of the CN output is directed toward nuclei in the superior olivary complex (SOC). This is an amalgam of large and small nuclei some of which take part in well-studied circuits whose function is in feedback to the periphery (middle ear reflex and the olivocochlear efferent systems) or in the extraction of binaural differences important in spatial hearing. The preceding and following sections illustrate that, with some notable exceptions, envelope coding in the CN is largely temporally based while at the level of the IC partial conversion to a rate code is apparent. In our discussion of SOC physiology we highlight one aspect of these circuits: the conversion of an envelope time code to an average rate code.

The duplex theory of sound localization holds that the azimuthal position of low-frequency signals is determined primarily by the minute differences in the time at which the acoustic waveform reaches the two ears (interaural time differences, ITDs), whereas high-frequency signals are localized on the basis of interaural SPL or level differences (ILDs). This classical psychophysical theory appears to be embodied anatomically and physiologically in two binaural circuits in the SOC of most mammals. The circuit centered on the medial superior olive (MSO) detects ITDs and contains primarily low-frequency cells. Another circuit, centered on the lateral superior olive (LSO), detects ILDs and is biased towards high CFs. The detailed physiology of these circuits and their afferents is beyond the scope of this review (see Refs. 279, 312, 316).

Starting in the mid-1970s, a number of investigators reported that humans can reliably discriminate ITDs of high-frequency signals with thresholds approaching those for low-frequency signals, i.e., <20 µs, provided that the signals are not pure tones but have a time-varying envelope, as in AM sounds with the parameters illustrated in Figure 2. Clearly, subjects can detect with high precision the ongoing envelope differences that occur when complex stimuli are delayed at one ear relative to the other. Physiological studies in the IC of cat (317) and rabbit (12) provided evidence for ITD sensitivity to AM signals but indicated that this sensitivity was probably generated at a lower level. Subsequent recordings in the SOC indeed revealed cells that were sensitive to interaural delays of AM signals; this ITD sensitivity could be understood from the binaural interactions known to occur in these nuclei and from the AM coding properties of their afferents.
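A short worked example shows why envelope ITDs are demanding. Writing a sinusoidally amplitude-modulated tone in the usual form, with modulation depth m, modulation frequency f_m, and carrier frequency f_c, an interaural delay τ shifts the envelope and the carrier by different phase angles. The numbers that follow are illustrative and are not the parameters of Figure 2.

```latex
\begin{align*}
s(t) &= \left[1 + m\sin(2\pi f_m t)\right]\sin(2\pi f_c t),\\
s(t-\tau) &= \left[1 + m\sin(2\pi f_m t - 2\pi f_m\tau)\right]\sin(2\pi f_c t - 2\pi f_c\tau),\\
\Delta\phi_{\mathrm{env}} &= 2\pi f_m\tau, \qquad \Delta\phi_{\mathrm{fine}} = 2\pi f_c\tau .
\end{align*}
```

For f_m = 300 Hz and τ = 20 µs, f_mτ = 0.006 cycle (about 2.2°), whereas for an 8-kHz carrier f_cτ = 0.16 cycle (57.6°). Because 8 kHz lies above the range of phase-locking to fine structure, only the much smaller envelope phase shift is available as a timing cue.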

In the MSO, ITD sensitivity to AM signals is generated by a multiplicative, cross-correlation-type operation. These cells behave as coincidence detectors; this has been particularly well documented for low-frequency signals (81, 126, 313) but holds for modulated signals as well. The average firing rate of high-CF MSO cells to AM signals varies with ITD (Fig. 5A). Moreover, the optimal ITD can be predicted from the response phases measured when the AM signal is presented monaurally to the ipsilateral or the contralateral ear: the firing rate is high when the envelope signals from the two ears arrive in phase at the site of convergence (10, 122, 313).
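The prediction of the optimal ITD from monaural phases can be illustrated with a minimal cross-correlation sketch. It assumes, hypothetically, that each monaural input fires at a rate following a raised cosine at the modulation frequency and lagging the stimulus envelope by a monaural response phase, and that the MSO cell's average rate is proportional to the cycle-averaged product of its two inputs. The function names, phases, and modulation frequency are illustrative and are not fitted to any data.

```python
import numpy as np

def input_rate(t, fm, phase, depth=1.0):
    """Hypothetical envelope-locked input: a raised cosine at the modulation
    frequency fm (Hz) that lags the stimulus envelope by `phase` radians."""
    return 1.0 + depth * np.cos(2 * np.pi * fm * t - phase)

def coincidence_rate(itds, fm, phase_ipsi, phase_contra, n=256):
    """Cycle-averaged product of ipsi and contra inputs for each ITD (s),
    i.e. a cross-correlation of the two inputs evaluated at lag = ITD.
    Assumed convention: a positive ITD delays the contralateral envelope."""
    t = np.arange(n) / (n * fm)                      # one modulation period
    itds = np.asarray(itds, dtype=float)[:, None]    # shape (n_itd, 1)
    r_i = input_rate(t, fm, phase_ipsi)
    r_c = input_rate(t - itds, fm, phase_contra)
    return np.mean(r_i * r_c, axis=1)

def predicted_best_itd(fm, phase_ipsi, phase_contra):
    """ITD at which both inputs arrive in phase, wrapped to [-T/2, T/2)."""
    period = 1.0 / fm
    tau = (phase_ipsi - phase_contra) / (2 * np.pi * fm)
    return (tau + period / 2) % period - period / 2

# Illustrative values only: 200-Hz modulation and hypothetical monaural phases.
fm = 200.0
phase_ipsi, phase_contra = 1.2, 0.4                  # radians
itds = np.linspace(-2.5e-3, 2.5e-3, 5001)            # one period, 1-us steps
rates = coincidence_rate(itds, fm, phase_ipsi, phase_contra)
best_itd = itds[np.argmax(rates)]
print(f"best ITD {best_itd * 1e3:.3f} ms, "
      f"predicted {predicted_best_itd(fm, phase_ipsi, phase_contra) * 1e3:.3f} ms")
```

With unit modulation depth, the averaged product reduces to 1 + ½cos(2πf_mτ + φ_contra − φ_ipsi), so the rate peaks where the two envelope-locked inputs coincide. An interaural envelope delay is thereby converted into a change in average firing rate, the time-to-rate conversion discussed in this section.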


