2012 IEEE International Symposium on Circuits and Systems
DOI: 10.1109/iscas.2012.6271438
Real-time speaker identification using the AEREAR2 event-based silicon cochlea

Abstract: This paper reports a study on methods for real-time speaker identification using the output of an event-based silicon cochlea. The methods are evaluated on the amount of computation required and on classification performance in a speaker identification task. The study uses the binaural AEREAR2 silicon cochlea, with 64 frequency channels and 512 output neurons. Auditory features representing fading histograms of inter-spike intervals and channel activity distributions are extracted from th…
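The fading inter-spike-interval histograms mentioned in the abstract can be illustrated with a short sketch. This is not the paper's implementation; the function name, the exponential fading rule, and the time constant `tau` are assumptions chosen to show the general idea: each ISI contributes to a histogram with a weight that decays with its age, so recent spiking dominates the feature.

```python
import numpy as np

def fading_isi_histogram(spike_times, bins, tau=0.5):
    """Exponentially fading histogram of inter-spike intervals (ISIs).

    spike_times : sorted spike timestamps (seconds) for one channel
    bins        : ISI bin edges (seconds)
    tau         : fading time constant; an ISI ending `age` seconds
                  before the most recent spike gets weight exp(-age / tau)
    """
    spike_times = np.asarray(spike_times, dtype=float)
    isis = np.diff(spike_times)                 # inter-spike intervals
    ages = spike_times[-1] - spike_times[1:]    # age of each ISI's endpoint
    weights = np.exp(-ages / tau)               # fade older intervals
    hist, _ = np.histogram(isis, bins=bins, weights=weights)
    total = hist.sum()
    return hist / total if total > 0 else hist  # normalized feature vector

spikes = [0.00, 0.01, 0.02, 0.04, 0.08, 0.16]
feat = fading_isi_histogram(spikes, bins=np.linspace(0.0, 0.1, 6))
```

In a full pipeline one such vector would be computed per cochlea channel and the concatenation fed to the classifier.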

Cited by 31 publications (27 citation statements)
References 10 publications
“…Understanding the fidelity of the audio reconstructed from the spike trains of the AEREAR2 cochlea can offer insights into the results from various auditory tasks that use the hardware spiking cochlea, such as speech recognition (Verstraeten et al, 2005 ; Uysal et al, 2006 , 2008 ; Chakrabartty and Liu, 2010 ), speaker identification (Chakrabartty and Liu, 2010 ; Liu et al, 2010 ; Li et al, 2012 ), localization (Finger and Liu, 2011 ; Liu et al, 2014 ), and sensory fusion (Chan et al, 2012 ; O'Connor et al, 2013 ).…”
Section: Discussion
confidence: 99%
“…The binaural 64-channel AEREAR2 cochlea model also uses the APFC circuit of Lyon and Mead ( 1988 ) as the base for the front-end, with subsequent circuits modeling the inner hair cell and the spiral ganglion cells. The spike outputs of this sensor system have been applied in various auditory tasks such as digit recognition (Abdollahi and Liu, 2011 ), speaker identification (Chakrabartty and Liu, 2010 ; Liu et al, 2010 ; Li et al, 2012 ), source localization (Finger and Liu, 2011 ; Liu et al, 2014 ), and sensory fusion (Chan et al, 2006 , 2012 ; O'Connor et al, 2013 ). The analog APFC model corresponds to the Lyon cochlea model (Lyon, 1982 ), implemented in Matlab within a widely used toolbox by Slaney ( 1998 ).…”
Section: Introduction
confidence: 99%
“…With improved dynamic range, a binaural structure, integrated microphone preamplifiers, and biasing circuits for stability against voltage and temperature variations, this sensor provides precise spike timing over a USB interface. This approach has been used in complex applications such as speaker identification (Li et al, 2012 ). A thorough comparison between conventional cross-correlation approaches and spike-based sound localization algorithms shows that event-driven methods are about 40 times less computationally demanding (Liu et al, 2014 ).…”
Section: Neuromorphic Auditory Sensors
confidence: 99%
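The roughly 40-fold saving reported by Liu et al (2014) comes from operating on spike timestamps instead of dense sampled waveforms. The sketch below is an illustration of that general idea, not their algorithm; the function name, the nearest-spike matching rule, and the parameter values are assumptions. It estimates an interaural time difference (ITD) by histogramming left-right spike time differences, which costs O(number of spikes) rather than O(samples × lags) for a full cross-correlation.

```python
import numpy as np

def event_itd_estimate(left_spikes, right_spikes, max_itd=1e-3, bins=21):
    """Estimate ITD as the peak of a left-right spike-time-difference histogram.

    left_spikes, right_spikes : sorted spike timestamps (seconds), one per ear
    max_itd                   : physiologically plausible ITD window (seconds)
    """
    left = np.asarray(left_spikes, dtype=float)
    right = np.asarray(right_spikes, dtype=float)
    diffs = []
    j = 0
    for t in left:
        # advance j to the right-ear spike nearest in time to t
        while j + 1 < len(right) and abs(right[j + 1] - t) <= abs(right[j] - t):
            j += 1
        d = t - right[j]
        if abs(d) <= max_itd:
            diffs.append(d)
    hist, edges = np.histogram(diffs, bins=bins, range=(-max_itd, max_itd))
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers[np.argmax(hist)]  # histogram peak = ITD estimate

# toy input: the right ear lags the left by ~200 microseconds
left = np.arange(0.0, 0.1, 0.005)
right = left + 200e-6
itd = event_itd_estimate(left, right)
```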
“…In the literature, neuronal models of varying degrees of complexity have been used, ranging from the classic Hodgkin-Huxley model ( [8]), the FitzHugh-Nagumo model ( [9]), and Izhikevich models ( [10]) to the simple integrate-and-fire model ( [11]). However, practical implementations of the cochlea usually resort to a simpler form of these models ( [12]), where only limit-cycle statistics such as average rates or inter-spike intervals of the spike trains produced are used to generate auditory features for recognition or classification ( [13], [14]). At this level of abstraction, it is not evident how the shape, the nature, and the dynamics of each individual spike relate to the overall system objective, or how a population of coupled neurons can self-optimize to produce an emergent spiking or population response ( [15]).…”
Section: Introduction
confidence: 99%
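The integrate-and-fire model referenced as [11] above is the variant most practical cochlea implementations fall back to. A minimal leaky integrate-and-fire sketch follows; the parameter values and the forward-Euler discretization are illustrative assumptions, not taken from any cited work.

```python
import numpy as np

def lif_spikes(current, dt=1e-4, tau=0.02, v_th=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron driven by an input current array.

    Euler discretization of the membrane equation dv/dt = (-v + I) / tau.
    Returns the indices of the time steps at which the neuron fires.
    """
    v = v_reset
    spikes = []
    for i, I in enumerate(current):
        v += dt * (-v + I) / tau  # leaky integration toward I
        if v >= v_th:             # threshold crossing -> emit a spike
            spikes.append(i)
            v = v_reset           # hard reset after each spike
    return spikes

# constant suprathreshold drive yields a perfectly regular spike train
drive = np.full(2000, 1.5)
spk = lif_spikes(drive)
```

With constant suprathreshold input the inter-spike intervals are identical, which is exactly why rate and ISI statistics make such compact features at this level of abstraction.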