Microphone Array Presentations | NIST

Source: Microphone Array Presentations | NIST

To reduce the complexity of the design and make it modular, it was decided to split the functions across two types of boards. The first, the Microboard, is a sound capture device that digitizes eight channels and provides a serial data stream. The second, the Motherboard, captures and formats the data from eight Microboards and sends the resulting sixty-four channels as a UDP packet stream over Ethernet to a Data Flow Client for processing. This architecture is shown at a high level below:

The Microboard performs three stages of processing:

  • Microphone amplification to line level
  • Analog-to-digital conversion
  • Serial connection to the motherboard

The Motherboard is connected to 8 of these Microboards via cables, and has an FPGA as its main processor. It also has support logic to provide:

  • 4 MBytes of SRAM for buffering and retransmitting of data
  • Fast Ethernet physical layer device (PHY)
  • DIP switch to configure the MAC address
  • A clock synchronization signal connection to other possible microphone arrays
  • PROM to contain firmware that is loaded at power up
  • Condition indicator LEDs.
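The data flow above (8 Microboards × 8 channels = 64 channels, streamed as UDP over Fast Ethernet) can be sketched as follows. The text gives neither the sample rate, the sample width, nor the channel ordering inside a packet. This hypothetical sketch therefore assumes a board-major ordering and 44.1 kHz, 24-bit samples, only to show how such a frame could be assembled and why the raw payload must fit within the 100 Mbit/s link. The names `assemble_frame` and `payload_bit_rate` are illustrative and not part of NIST's firmware.

```python
# Hypothetical sketch: the channel ordering, sample rate, and sample width below
# are assumptions for illustration, not the Motherboard's documented packet format.
from typing import List

BOARDS = 8               # Microboards per Motherboard (from the text)
CHANNELS_PER_BOARD = 8   # channels digitized per Microboard (from the text)
TOTAL_CHANNELS = BOARDS * CHANNELS_PER_BOARD  # 64 channels

FAST_ETHERNET_BPS = 100_000_000  # nominal Fast Ethernet line rate


def assemble_frame(board_samples: List[List[int]]) -> List[int]:
    """Merge one sample period from all Microboards into a 64-channel frame.

    Assumes a board-major layout: global channel = board * 8 + local channel.
    """
    if len(board_samples) != BOARDS or any(
        len(b) != CHANNELS_PER_BOARD for b in board_samples
    ):
        raise ValueError("expected 8 boards x 8 channels")
    return [sample for board in board_samples for sample in board]


def payload_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Raw audio payload in bit/s, ignoring UDP/IP/Ethernet header overhead."""
    return TOTAL_CHANNELS * sample_rate_hz * bits_per_sample


# Fake samples: board b, channel c carries the value 10*b + c.
frame = assemble_frame([[10 * b + c for c in range(8)] for b in range(8)])
rate = payload_bit_rate(44_100, 24)  # assumed 44.1 kHz, 24-bit
print(len(frame), frame[8], rate)    # 64 10 67737600
print(rate < FAST_ETHERNET_BPS)      # True
```

Under these assumptions the payload is about 68 Mbit/s before protocol overhead. That is why the real sample rate and sample width matter when checking whether a 64-channel stream fits on Fast Ethernet.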

More information about the microphone array is available from the download section.

Installation Steps for Version 2

Step 01 (8 times)


Step 02 (8 times)


Step 03


Step 04


Step 05


Step 06

Steps 01 and 02 must be repeated 8 times, once for each board. BE CAREFUL: the cards must be inserted in a specific order (see the user manual). The whole system should then be tested with the digital oscilloscope provided below.


NIST Speech Signal to Noise Ratio


The NIST Speech SNR Measurement

In the service of the NIST mission to facilitate industrial advanced technology development, we focus on measurement science and standards development. Since the Smart Spaces of the future will require sensor-based interfaces, particularly audio-based ones for speech and speaker recognition, we have developed a signal-to-noise measurement method that allows more precise measurement of speech signal strength at relatively high background noise levels. It is designed to facilitate the development of noise reduction algorithms applied to speech acquired from a variety of sources, including microphone arrays.

Broadly, speech is composed of voiced and unvoiced parts. For example, the word "six" is spoken phonetically as the four phonemes /s/ /ih/ /k/ /s/, where the two /s/ phones are unvoiced and have a much lower volume than the /ih/ phone.

Since we can never observe speech without some degree of background noise, we have developed a method based on sequential Gaussian mixture estimation. Experimental measurements of background noise amplitudes received at our microphone array are well represented by a single Gaussian component, as tested with a Kolmogorov-Smirnov goodness-of-fit statistic. A good fit to a single component indicates that no speech is present in a given sample. If the single-component hypothesis can be rejected, we proceed to fit a two-component model to the sample time series. A good fit to a two-component model might indicate a non-speech signal, or speech in a very high level of background noise that masks the unvoiced portion of the speech. If a two-component model does not provide a reasonably good fit, we proceed to a three-component model, which indicates a fairly good signal-to-noise ratio.
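The first step of this sequence, testing whether a sample is consistent with a single zero-mean Gaussian, can be sketched as follows. This is a minimal illustration, not NIST's toolkit code; the helper names are hypothetical. Because sigma is estimated from the same data, the classical 1.36/sqrt(n) 5% critical value is conservative (a Lilliefors-type correction would be tighter), so treat the threshold as illustrative.

```python
import math
import random


def zero_mean_gaussian_cdf(x: float, sigma: float) -> float:
    """CDF of N(0, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def ks_statistic_single_gaussian(samples):
    """Kolmogorov-Smirnov D between the samples and a zero-mean Gaussian.

    sigma is estimated from the same samples (sigma^2 = mean of x^2), which
    matches the zero-mean constraint described in the text.
    """
    n = len(samples)
    sigma = math.sqrt(sum(x * x for x in samples) / n)
    d = 0.0
    for i, x in enumerate(sorted(samples)):
        f = zero_mean_gaussian_cdf(x, sigma)
        # The empirical CDF steps from i/n to (i+1)/n at each sorted sample,
        # so compare the model CDF against both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


random.seed(0)
n = 4000
noise = [random.gauss(0.0, 1.0) for _ in range(n)]
# Scale mixture: half low-level noise, half a louder zero-mean component.
mixed = [random.gauss(0.0, 1.0) if random.random() < 0.5 else random.gauss(0.0, 8.0)
         for _ in range(n)]

critical = 1.36 / math.sqrt(n)  # classical 5% KS value; conservative here
for name, data in (("noise", noise), ("mixed", mixed)):
    d = ks_statistic_single_gaussian(data)
    verdict = "reject single Gaussian" if d > critical else "single Gaussian OK"
    print(name, round(d, 4), verdict)
```

Pure Gaussian noise should pass. The scale mixture, which is sharply peaked with heavy tails as voiced speech over noise tends to be, gives a much larger D and is rejected. That rejection is the cue to move on to a two-component fit.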

These mixtures are estimated using the classic Expectation Maximization (EM) technique, modified to enforce the constraint that all of the means are equal to zero. We provide a highly optimized C-language implementation of this estimation algorithm as part of our open source toolkit. As the SNR estimate, we take the ratio of the largest standard deviation (s) to the smallest (n), expressed in decibels as 20*log10(s/n).
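A minimal pure-Python sketch of the constrained EM step and the resulting SNR estimate is shown below. It is not the optimized C implementation from the NIST toolkit. The function names, the geometric initialization, and the fixed iteration count are assumptions made for illustration. With every mean pinned at zero, the M-step reduces to responsibility-weighted averages of x².

```python
import math
import random


def em_zero_mean_mixture(x, k, iters=200):
    """EM for a k-component Gaussian mixture whose means are all fixed at 0.

    Only the mixing weights and variances are estimated. Returns
    (weights, sigmas), sorted by increasing sigma.
    """
    n = len(x)
    x2 = [v * v for v in x]
    total_var = sum(x2) / n
    # Initial variances spread geometrically around the overall variance.
    variances = [total_var * 4.0 ** (j - (k - 1) / 2) for j in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        sum_r = [0.0] * k    # sum of responsibilities per component
        sum_rx2 = [0.0] * k  # responsibility-weighted sum of x^2
        for v2 in x2:
            # E-step: posterior probability of each component for this sample.
            dens = [w * math.exp(-v2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
                    for w, var in zip(weights, variances)]
            total = sum(dens) or 1e-300
            for j in range(k):
                r = dens[j] / total
                sum_r[j] += r
                sum_rx2[j] += r * v2
        # M-step: with mu = 0 fixed, var_j is the weighted mean of x^2.
        weights = [s / n for s in sum_r]
        variances = [max(sx / s, 1e-12) if s > 0 else total_var
                     for s, sx in zip(sum_r, sum_rx2)]
    pairs = sorted(zip(weights, (math.sqrt(v) for v in variances)), key=lambda p: p[1])
    return [w for w, _ in pairs], [s for _, s in pairs]


def snr_db(sigmas):
    """20*log10(s/n) with s = largest and n = smallest component sigma."""
    return 20.0 * math.log10(max(sigmas) / min(sigmas))


random.seed(0)
# Synthetic test signal: 60% noise (sigma 1), 40% louder "speech" (sigma 10).
data = [random.gauss(0.0, 1.0) if random.random() < 0.6 else random.gauss(0.0, 10.0)
        for _ in range(2000)]
weights, sigmas = em_zero_mean_mixture(data, k=2)
print([round(w, 2) for w in weights], [round(s, 2) for s in sigmas])
print(round(snr_db(sigmas), 1), "dB")  # the true sigma ratio 10/1 corresponds to 20 dB
```

The same function with `k=3` covers the three-component case. Deciding between k = 1, 2, and 3 would reuse a goodness-of-fit test such as the Kolmogorov-Smirnov statistic described above.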

The pictures show the SNR algorithm's estimates of the component standard deviations for a single microphone and for our microphone array. In the same setting, the SNR rises from nine to twenty-one dB when using a delay-and-sum beamformer and a codec filter that limits the band to roughly 100 Hz to 8,000 Hz.
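For readers unfamiliar with delay-and-sum beamforming, here is a minimal integer-delay sketch. It is not the NIST array's implementation: real arrays derive fractional delays from the microphone geometry and source position, and the delays below are hypothetical. It shows why averaging aligned channels helps. The source adds coherently, while independent noise averages down, giving roughly 10*log10(M) dB of gain for M microphones with uncorrelated noise. The codec filtering mentioned above is not modeled here.

```python
import math
import random


def delay_and_sum(channels, delays):
    """Integer-sample delay-and-sum beamformer.

    delays[m] (>= 0) is how many samples later the source reaches microphone m
    than the reference; each channel is advanced by that amount, then averaged.
    """
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    m = len(channels)
    return [sum(ch[t + d] for ch, d in zip(channels, delays)) / m for t in range(n)]


random.seed(1)
source = [random.gauss(0.0, 1.0) for _ in range(5000)]
delays = [0, 2, 5, 7, 3, 1, 6, 4]  # hypothetical 8-microphone geometry
# Microphone m hears the source delayed by delays[m] samples plus its own noise.
mics = [[0.0] * d + [s + random.gauss(0.0, 1.0) for s in source] for d in delays]

out = delay_and_sum(mics, delays)


def noise_power(y):
    """Mean squared difference from the clean source over the aligned samples."""
    return sum((a - b) ** 2 for a, b in zip(y, source)) / len(source)


gain_db = 10.0 * math.log10(noise_power(mics[0]) / noise_power(out))
print(round(gain_db, 2), "dB")  # theory for uncorrelated noise: 10*log10(8), about 9.03 dB
```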


One microphone signal.


One microphone signal distribution.


Microphone array signal.


Microphone array signal distribution


Adaptive Noise Cancellation

Noise is everywhere and in most applications that are related to audio and speech, such as human-machine interfaces, hands-free communications, voice over IP (VoIP), hearing aids, teleconferencing/telepresence/telecollaboration systems, and so many others, the signal of interest (usually speech) that is picked up by a microphone is generally contaminated by noise. As a result, the microphone signal has to be cleaned up with digital signal processing tools before it is stored, analyzed, transmitted, or played out. This cleaning process is often called noise reduction and this topic has attracted a considerable amount of research and engineering attention for several decades. One of the objectives of this book is to present in a common framework an overview of the state of the art of noise reduction algorithms in the single-channel (one microphone) case. The focus is on the most useful approaches, i.e., filtering techniques (in different domains) and spectral enhancement methods. The other objective of Noise Reduction in Speech Processing is to derive all these well-known techniques in a rigorous way and prove many fundamental and intuitive results often taken for granted. This book is especially written for graduate students and research engineers who work on noise reduction for speech and audio applications and want to understand the subtle mechanisms behind each approach. Many new and interesting concepts are presented in this text that we hope the readers will find useful and inspiring.

Source: Noise Reduction in Speech Processing – Jacob Benesty, Jingdong Chen, Yiteng Huang, Israel Cohen – Google Books


  • Gliffy
  • Daum Equation Editor
  • Free Vector Icons
  • Audacity



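  1. Under Edit->Preferences->Effects/Enable Effects, enable support for the different plug-in types;
  2. Audio processing in Audacity is done through plug-ins; a wide range of plug-ins of different types is available on the official website (Link).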


Understanding Microphones: Microphone Overview











ADMP411 MEMS microphone specification table


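The sensitivity of a microphone is the electrical response at its output to a given standard acoustic input. The standard reference input signal for microphone sensitivity measurements is a 1 kHz sine wave at 94 dB sound pressure level (SPL), or 1 pascal (Pa, a unit of pressure). For a fixed acoustic input, a microphone with a higher sensitivity value produces a larger electrical output than one with a lower sensitivity value. Microphone sensitivity expressed in dB is usually negative, so the higher the sensitivity, the smaller its absolute value.
Pay close attention to the units in which microphone sensitivity is specified. If two microphones' sensitivities are not specified in the same unit, comparing the values directly is not meaningful. The sensitivity of an analog microphone is usually specified in dBV, i.e., in dB relative to 1.0 Vrms. The sensitivity of a digital microphone is usually specified in dBFS, i.e., in dB relative to full-scale digital output (FS). For a digital microphone, full scale (all "1"s) is the largest value the microphone's digital output code can represent; for a more detailed description of this parameter, see the "Maximum Acoustic Input" section.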

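Sensitivity is the ratio of the electrical output (voltage) to the input pressure. For analog microphones, sensitivity is usually measured in mV/Pa, and the result can be converted to a dB value with the following formula:

$$\text{Sensitivity}_{\text{dBV}} = 20 \log_{10}\left(\frac{\text{Sensitivity}_{\text{mV/Pa}}}{\text{Output}_{\text{REF}}}\right)$$

where OutputREF is 1 V/Pa (1000 mV/Pa).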

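For digital microphones, sensitivity is expressed as the output produced by a 94 dB SPL input, as a percentage of full-scale output. The conversion formula for digital microphones is:

$$\text{Sensitivity}_{\text{dBFS}} = 20 \log_{10}\left(\frac{\text{Sensitivity}_{\%\text{FS}}}{\text{Output}_{\text{REF}}}\right)$$

where OutputREF is the full-scale digital output level (1.0).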

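Higher sensitivity does not always mean better microphone performance. The higher a microphone's sensitivity, the smaller the headroom usually is between its output level under typical conditions (such as conversation) and its maximum output level. In near-field (close-talk) applications, a high-sensitivity microphone may be more prone to distortion, and this distortion often reduces the microphone's overall dynamic range.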


The table above lists a sensitivity of −46 dBV. From this parameter, the relationship between output voltage and sound pressure is: 10^(−46/20) = 0.00501 V/Pa = 5.01 mV/Pa

For an input sound pressure of, for example, 120 dB SPL (20 Pa), the microphone output is 5.01 mV/Pa × 20 Pa = 100.2 mV (RMS).

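Using the formula above, the calculation can also be run in reverse. For a microphone with an output of, for example, 5.01 mV/Pa, the sensitivity is: 20 × log[(0.00501 V/Pa)/(1 V/Pa)] = −46 dBV (at 94 dB SPL)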
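The conversions above can be checked with a few lines of code. This is a sketch with hypothetical helper names. It assumes the standard 20 µPa reference for 0 dB SPL, which gives 94 dB SPL ≈ 1 Pa and 120 dB SPL = 20 Pa, as in the example.

```python
import math

P_REF_PA = 20e-6  # 0 dB SPL reference pressure (20 micropascals)


def dbv_to_mv_per_pa(sens_dbv: float) -> float:
    """Invert 20*log10(S / 1 V/Pa): returns sensitivity in mV/Pa."""
    return 1000.0 * 10.0 ** (sens_dbv / 20.0)


def mv_per_pa_to_dbv(sens_mv_per_pa: float, output_ref_mv_per_pa: float = 1000.0) -> float:
    """Sensitivity in dBV, with OutputREF = 1 V/Pa = 1000 mV/Pa."""
    return 20.0 * math.log10(sens_mv_per_pa / output_ref_mv_per_pa)


def spl_to_pa(spl_db: float) -> float:
    """Convert a sound pressure level in dB SPL to RMS pressure in pascals."""
    return P_REF_PA * 10.0 ** (spl_db / 20.0)


sens = dbv_to_mv_per_pa(-46.0)   # sensitivity listed in the table
p = spl_to_pa(120.0)             # 120 dB SPL input
print(round(sens, 2))                    # 5.01 (mV/Pa)
print(round(p, 2))                       # 20.0 (Pa)
print(round(sens * p, 1))                # 100.2 (mV RMS)
print(round(mv_per_pa_to_dbv(5.01), 1))  # -46.0 (dBV)
```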

The attachment is a small tool for converting microphone sensitivity values: Mic Sensitivity and dB Convertor


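Signal-to-noise ratio (SNR) is the ratio of a reference signal to the noise level at the microphone output. This measurement includes the noise contributed by both the microphone element and the ASIC integrated in the MEMS microphone package. SNR is the difference in dB between the noise level and a standard 1 kHz, 94 dB SPL reference signal.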

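To calculate SNR, the microphone's noise output must be measured in a quiet, anechoic environment. This parameter is usually given as an A-weighted value (dBA) over a 20 kHz bandwidth, meaning it includes a correction factor matching the human ear's sensitivity to different frequencies. When comparing the SNRs of different microphones, make sure they use the same weighting and bandwidth: an SNR measured over a narrower bandwidth will be better than one measured over the full 20 kHz bandwidth.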

Dynamic Range

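The dynamic range of a microphone is the difference between the maximum and minimum SPL to which the microphone can respond linearly. It differs from SNR (by contrast, for an audio ADC or DAC, dynamic range and SNR are usually equivalent).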

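A microphone's SNR measures the difference between its noise floor (EIN) and the 94 dB SPL reference level, but above this reference level the microphone still has a considerable range of useful signal response. The microphone can respond linearly to acoustic input signals from 94 dB SPL up to 120 dB SPL (the AOP). The dynamic range of a MEMS microphone therefore equals its SNR + 26 dB, where 26 dB = 120 dB (AOP) − 94 dB. For example, the ADMP404 has an SNR of 62 dB and a dynamic range of 88 dB.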
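The arithmetic in the paragraph above fits in two small functions. This is a sketch with hypothetical names. It assumes, as the text does, an AOP of 120 dB SPL and EIN expressed in dB SPL. The noise floor then sits SNR dB below the 94 dB SPL reference, and the dynamic range equals AOP − EIN.

```python
def ein_db_spl(snr_db: float, ref_spl: float = 94.0) -> float:
    """Equivalent input noise: the noise floor lies snr_db below the 94 dB SPL reference."""
    return ref_spl - snr_db


def dynamic_range_db(snr_db: float, aop_spl: float = 120.0, ref_spl: float = 94.0) -> float:
    """Dynamic range = SNR + (AOP - 94 dB SPL), as stated in the text."""
    return snr_db + (aop_spl - ref_spl)


# ADMP404 figures from the text: 62 dB SNR.
print(ein_db_spl(62.0))          # 32.0 (dB SPL)
print(dynamic_range_db(62.0))    # 88.0 (dB)
```

Both views agree: 120 dB SPL − 32 dB SPL = 88 dB.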

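The figure below shows the relationship between acoustic input (measured in dB SPL) and microphone voltage output (measured in dBV). The dynamic range and SNR are marked between the two axes for reference. Figure 11 uses the ADMP504, with −38 dBV sensitivity and 65 dB SNR, to illustrate these relationships.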


 Figure 11. dB SPL input vs. dBV output for an analog microphone

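Figure 12 shows the analogous relationship between dB SPL input and dBFS output for a digital microphone. Note that in this figure the acoustic overload point (AOP) of 120 dB SPL maps to a 0 dBFS output signal. As long as the AOP corresponds to 0 dBFS and is set at 120 dB SPL, a digital microphone always has a sensitivity of −26 dBFS. This follows from the definition of sensitivity (measured at 94 dB SPL); it is not a design parameter that can be adjusted by changing the gain of the microphone ASIC.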
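The mappings in Figures 11 and 12 reduce to simple offsets, sketched below. The function names are hypothetical. The sketch assumes a dB-for-dB linear response between the noise floor and the AOP, and an AOP of 120 dB SPL mapped to 0 dBFS for the digital case. The analog numbers use the ADMP504 values quoted in the text.

```python
def analog_output_dbv(spl_db: float, sensitivity_dbv: float, ref_spl: float = 94.0) -> float:
    """Analog output level, assuming a linear (dB-for-dB) response up to the AOP."""
    return sensitivity_dbv + (spl_db - ref_spl)


def analog_noise_floor_dbv(sensitivity_dbv: float, snr_db: float) -> float:
    """Output noise floor: snr_db below the output produced at 94 dB SPL."""
    return sensitivity_dbv - snr_db


def digital_output_dbfs(spl_db: float, aop_spl: float = 120.0) -> float:
    """Digital output level when the AOP maps to 0 dBFS."""
    return spl_db - aop_spl


# ADMP504 figures from the text: -38 dBV sensitivity, 65 dB SNR.
print(analog_output_dbv(94.0, -38.0))       # -38.0
print(analog_output_dbv(120.0, -38.0))      # -12.0
print(analog_noise_floor_dbv(-38.0, 65.0))  # -103.0
print(digital_output_dbfs(94.0))            # -26.0, the fixed digital sensitivity
```

The last line shows why the −26 dBFS figure cannot be tuned. Once the AOP is pinned to 0 dBFS at 120 dB SPL, the 94 dB SPL reference always lands 26 dB below full scale.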


Figure 12. dB SPL input vs. dBFS output for a digital microphone

Note: all of the examples above assume a microphone acoustic overload point (AOP) of 120 dB SPL.

Frequency Response

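The frequency response of a microphone describes its output level across the frequency spectrum. The upper and lower frequency limits are the frequencies at which the microphone's response is 3 dB below the reference output level at 1 kHz. The 1 kHz reference level is usually normalized to 0 dB.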

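The frequency response specification also includes limits on the deviation from a flat response within the passband. These values are expressed as ±x dB and give the maximum deviation of the output signal from the nominal 0 dB level between the −3 dB points.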
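Reading the −3 dB points and the in-band deviation off a measured response can be sketched as follows. The function names and the measurement data are hypothetical and are not taken from any data sheet. The sketch normalizes the response to 0 dB at 1 kHz, walks outward until the level first drops below −3 dB, and interpolates linearly on a log-frequency axis, which is how such plots are usually drawn.

```python
import math


def _log_interp(f1, l1, f2, l2, target):
    """Frequency where the level crosses target, interpolating linearly in log10(f)."""
    t = (target - l1) / (l2 - l1)
    return 10.0 ** (math.log10(f1) + t * (math.log10(f2) - math.log10(f1)))


def band_edges(freqs, levels, ref_freq=1000.0, drop_db=3.0):
    """Return (low, high, ripple) for a response normalized to 0 dB at ref_freq.

    low/high are the first -drop_db crossings on each side of ref_freq (None if
    not reached); ripple is the largest |deviation| from 0 dB between them.
    """
    i_ref = freqs.index(ref_freq)
    norm = [lv - levels[i_ref] for lv in levels]
    target = -drop_db
    low = high = None
    for i in range(i_ref, 0, -1):           # walk down in frequency
        if norm[i - 1] < target:
            low = _log_interp(freqs[i], norm[i], freqs[i - 1], norm[i - 1], target)
            break
    for i in range(i_ref, len(freqs) - 1):  # walk up in frequency
        if norm[i + 1] < target:
            high = _log_interp(freqs[i], norm[i], freqs[i + 1], norm[i + 1], target)
            break
    lo_f = low if low is not None else freqs[0]
    hi_f = high if high is not None else freqs[-1]
    ripple = max(abs(v) for f, v in zip(freqs, norm) if lo_f <= f <= hi_f)
    return low, high, ripple


# Hypothetical measurement in dBV, for illustration only.
freqs = [50, 100, 200, 1000, 5000, 10000, 15000, 20000]
levels = [-44.0, -42.0, -41.0, -40.0, -39.5, -38.0, -41.0, -45.0]
low, high, ripple = band_edges(freqs, levels)
print(round(low, 1), round(high), ripple)  # 70.7 17321 2.0
```

For this made-up curve, the band runs from about 71 Hz to about 17.3 kHz with a flatness of ±2 dB.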

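MEMS microphone data sheets show this frequency response with two plots: one shows the frequency response template (mask), and the other shows a typical measured frequency response. The template plot shows the upper and lower limits of the microphone output over the entire frequency range, and the microphone output is guaranteed to stay within this template. The typical frequency response plot shows the microphone's actual response across the band. Figures 13 and 14 are examples taken from the ADMP404 data sheet.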

Figure 13. Frequency response template


Figure 14. Typical frequency response (measured)



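Total harmonic distortion (THD) measures the level of distortion in the output signal for a given pure single-tone input signal, expressed as a percentage. This percentage is the ratio of the summed power of all harmonic frequencies above the fundamental to the power of the fundamental tone.
THD for ADI MEMS microphones is calculated using the first five harmonics of the fundamental. The formula is as follows: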


A higher THD value indicates a higher level of harmonics in the microphone output.

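The input signal for this test is usually 105 dB SPL, 11 dB above the 94 dB SPL reference. THD is measured at a higher input SPL than the other parameters because THD readings generally rise as the acoustic input level increases. As a rule of thumb, THD increases by a factor of 3 for every 10 dB increase in input level. Therefore, if THD is less than 3% at 105 dB SPL, it will be less than 1% at 95 dB SPL.
Be careful not to confuse this parameter with total harmonic distortion plus noise (THD + N), which measures not only the harmonic level but also all other noise contributions in the output.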
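The power-ratio definition and the 3×-per-10-dB rule of thumb can be sketched as below. The helper names and the harmonic powers are hypothetical. Which five harmonics to include (for example the 2nd through 6th) is an assumption the caller makes when building the list. The scaling function simply encodes the rule of thumb from the text and is not a physical model.

```python
def thd_percent(fundamental_power: float, harmonic_powers) -> float:
    """THD (%) as defined in the text: summed harmonic power / fundamental power."""
    return 100.0 * sum(harmonic_powers) / fundamental_power


def scale_thd(thd_at_ref: float, ref_spl: float, new_spl: float,
              factor_per_10db: float = 3.0) -> float:
    """Rule of thumb from the text: THD changes about 3x per 10 dB of input level."""
    return thd_at_ref * factor_per_10db ** ((new_spl - ref_spl) / 10.0)


# Hypothetical harmonic powers, relative to a fundamental power of 1.0.
harmonics = [0.01, 0.005, 0.002, 0.002, 0.001]
print(round(thd_percent(1.0, harmonics), 3))     # 2.0 (%)
print(round(scale_thd(3.0, 105.0, 95.0), 3))     # 1.0 (%): 3% at 105 dB SPL -> 1% at 95 dB SPL
```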



Figure 15. THD + N vs. input sound pressure


Figure 16. Microphone linearity


PSR: Power Supply Rejection
PSRR: Power Supply Rejection Ratio


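Power supply rejection is measured at 217 Hz because, in GSM phone applications, the 217 Hz switching frequency is often a major source of power supply noise.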




Figure 17. Typical PSRR vs. frequency (analog microphone)




Table 2. Comparison of PSR and PSRR


AOP: Acoustic Overload Point




Figure 18. Microphone clipping characteristics
