Audio Toolbox

Design and analyze speech, acoustic, and audio processing systems

Audio Toolbox provides tools for audio processing, speech analysis, and acoustic measurement. It includes algorithms for processing audio signals such as equalization and time stretching, estimating acoustic signal metrics such as loudness and sharpness, and extracting audio features such as MFCC and pitch. It also provides advanced machine learning models, including i-vectors, and pretrained deep learning networks, including VGGish and CREPE. Toolbox apps support live algorithm testing, impulse response measurement, and signal labeling. The toolbox provides streaming interfaces to sound cards through ASIO, CoreAudio, and other standard drivers; interfaces to MIDI devices; and tools for generating and hosting VST and Audio Units plugins.

With Audio Toolbox you can import, label, and augment audio data sets, as well as extract features to train machine learning and deep learning models. The pre-trained models provided can be applied to audio recordings for high-level semantic analysis.

You can prototype audio processing algorithms in real time or run custom acoustic measurements by streaming low-latency audio to and from sound cards. You can validate your algorithm by turning it into an audio plugin to run in external host applications such as Digital Audio Workstations. Plugin hosting lets you use external audio plugins as regular MATLAB objects.

Get Started:

Streaming Acquisition and Playback with Audio Interfaces
Machine Learning and Deep Learning
Audio Processing Algorithms and Effects
Real-Time Audio Prototyping
Acoustic Measurements and Spatial Audio
Generate and Host Audio Plugins
Target Embedded and Real-Time Audio Systems

Streaming Acquisition and Playback with Audio Interfaces

Connect to standard laptop and desktop sound cards for streaming low-latency multichannel audio between any combination of files and live inputs and outputs.

Connectivity to Standard Audio Drivers

Read and write audio samples from and to sound cards (such as USB or Thunderbolt™) using standard audio drivers (such as ASIO, WASAPI, CoreAudio, and ALSA) across Windows®, Mac®, and Linux® operating systems.

Audio Support from Audio Toolbox

Real-Time Audio in MATLAB

Multichannel sound card examples.

Low-Latency Multichannel Audio Streaming

Process live audio in MATLAB with milliseconds of round-trip latency.
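
As a rough illustration of where those milliseconds come from, each buffer in the signal path adds one frame of delay. The Python sketch below is generic arithmetic, not toolbox code. The frame size, sample rate, and the assumption of exactly one input and one output buffer are hypothetical, and real drivers and converters add overhead on top.

```python
def buffer_latency_ms(frame_size, sample_rate, num_buffers=2):
    """Delay contributed by buffering alone, in milliseconds.

    num_buffers=2 assumes one input and one output buffer (hypothetical).
    """
    return 1000.0 * num_buffers * frame_size / sample_rate


# 64-sample frames at 48 kHz, one buffer in each direction.
latency = buffer_latency_ms(frame_size=64, sample_rate=48000)
print(f"{latency:.2f} ms")
```

Smaller frames lower this figure but leave less time to process each frame, which is the trade-off between latency and throughput.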

Audio I/O: Buffering, Latency, and Throughput

Measure Audio Latency

Measure Performance of Streaming Real-Time Audio Algorithms

Live raw input from a four-channel microphone array.

Machine Learning and Deep Learning

Label, augment, create, and ingest audio and speech datasets, extract features, and compute time-frequency transformations. Develop audio and speech analytics with Statistics and Machine Learning Toolbox, Deep Learning Toolbox, or other machine learning tools.

Pre-Trained Deep Learning Models

Use deep learning to carry out complex signal processing tasks and extract audio embeddings with a single line of code. Access established pre-trained networks like YAMNet, VGGish, CREPE, and OpenL3 and apply them with the help of preconfigured feature extraction functions.

Pretrained Networks

Classify Sounds in an Audio Signal

YAMNet Neural Network

VGGish Neural Network

Word cloud displaying the sound types identified by classifySound in a particular audio segment.

Feature Extraction for Audio, Speech, and Acoustics

Transform signals into time-frequency representations like Mel, Bark, and ERB spectrograms. Compute cepstral coefficients such as MFCC and GTCC, and scalar features such as pitch, harmonicity, and spectral descriptors. Extract high-level features and signal embeddings using pre-trained deep learning models (VGGish, OpenL3) and the i-vector system. Accelerate feature extraction with compatible GPU cards.
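
To make the idea of a Mel representation concrete, the Python sketch below spaces filter-bank band edges evenly on a mel scale. It assumes one common mel formula, mel = 2595 log10(1 + f/700); other variants exist, and this is not a statement of which one the toolbox uses. The function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (2595*log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_low, f_high, num_bands):
    """Return num_bands + 2 edge frequencies in Hz, equally spaced in mels."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (num_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(num_bands + 2)]


edges = mel_band_edges(0.0, 8000.0, num_bands=10)
```

The resulting bands are narrow at low frequencies and wide at high frequencies, which is the property these perceptual scales are designed to capture.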

Audio Feature Extractor

Learn about vggishFeatures

Voice Activity Detection in Noise Using Deep Learning

Live Mel spectrogram of speech commands.

Machine Learning Models and Training Recipes

Train state-of-the-art machine learning models with your audio data sets. Use established systems of models, such as i-vectors, for applications like speaker identification and verification. Learn from working examples how to design and train advanced neural networks and layers for audio, speech, and acoustics applications.

Learn about ivectorSystem

Speaker Diarization Using X-Vectors

Speaker Identification Using Custom SincNet Layer and Deep Learning

Waveform of speech recording with interleaved segments spoken by different speakers, and color highlighting indicating which speaker is speaking in each detected speech region.

Diarization results obtained using x-vectors on speech signal including five different speakers.

Import, Annotate, and Preprocess Audio Datasets

Read, partition, and preprocess large collections of audio recordings. Annotate audio signals manually with apps. Identify and segment regions of interest automatically using pre-trained machine learning models.

Learn about audioDatastore

Label Audio Using Audio Labeler

Import Audio File Data into Signal Labeler

Speech-to-Text Transcription

Region-of-interest labels in Audio Labeler app.

Augment and Synthesize Audio and Speech Datasets

Set up randomized data augmentation pipelines using combinations of pitch shifting, time stretching, and other audio processing effects. Create synthetic speech recordings from text using text-to-speech cloud-based services.
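
The structure of a randomized augmentation pipeline can be sketched in a few lines of Python. This is an illustration of the idea only, not the toolbox interface. It uses two simple stand-in effects (random gain and additive noise) in place of pitch shifting and time stretching, and all names, ranges, and probabilities are hypothetical.

```python
import random


def apply_gain(x, rng):
    """Scale the signal by a random gain between -6 dB and +6 dB."""
    gain = 10.0 ** (rng.uniform(-6.0, 6.0) / 20.0)
    return [gain * s for s in x]


def add_noise(x, rng):
    """Add Gaussian noise with a randomly chosen standard deviation."""
    sigma = rng.uniform(0.001, 0.01)
    return [s + rng.gauss(0.0, sigma) for s in x]


def augment(x, effects, rng):
    """Apply each (effect, probability) pair in order, skipping at random."""
    y = list(x)
    for effect, prob in effects:
        if rng.random() < prob:
            y = effect(y, rng)
    return y


rng = random.Random(0)  # fixed seed so the run is repeatable
signal = [0.0, 0.5, -0.5, 0.25]
pipeline = [(apply_gain, 0.8), (add_noise, 0.5)]
variants = [augment(signal, pipeline, rng) for _ in range(3)]
```

Each call draws its own effect subset and parameters, so one recording yields several distinct training examples.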

Text-to-Speech Synthesis

Audio Data Augmenter

Pitch-Invariant Time Stretching

Pitch Shifting

Formant estimation for timbre-invariant pitch shifting.

Audio Processing Algorithms and Effects

Generate standard waveforms, apply common audio effects, and design audio processing systems with dynamic parameter tuning and live visualization.

Audio Filters and Equalizers

Model and apply parametric EQ, graphic EQ, shelving, and variable-slope filters. Design and simulate digital crossover, octave, and fractional-octave filters.
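
For readers unfamiliar with what one parametric EQ band computes, the Python sketch below derives second-order peaking filter coefficients using the widely circulated "Audio EQ Cookbook" formulas. This is an assumption about one standard design, not a description of the toolbox's implementation, and the function names are hypothetical.

```python
import cmath
import math


def peaking_biquad(f0, gain_db, q, fs):
    """Return normalized (b, a) coefficients for a peaking EQ band."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    return [c / a[0] for c in b], [c / a[0] for c in a]


def magnitude_db(b, a, f, fs):
    """Evaluate the filter's magnitude response at frequency f, in dB."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(h))


b, a = peaking_biquad(f0=1000.0, gain_db=6.0, q=1.0, fs=48000.0)
peak = magnitude_db(b, a, 1000.0, 48000.0)
```

The response reaches the requested gain at the center frequency and returns to 0 dB far from it; the Q value sets how quickly.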

Parametric Equalization

Graphic Equalization

Parametric Equalizer Design

Interactive tuning of a three-band crossover filter with live visualization.

Dynamic Range Control and Effects

Model and apply dynamic range processing algorithms such as compressor, limiter, expander, and noise gate. Add artificial reverberation with recursive parametric models.
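
The core of a compressor is its static characteristic: levels above a threshold are scaled down by a ratio. The Python sketch below shows a hard-knee version in the dB domain. It omits attack and release smoothing, knee width, and make-up gain, and the function names are hypothetical rather than toolbox APIs.

```python
def compress_db(level_db, threshold_db, ratio):
    """Hard-knee static curve: output level in dB for a given input level."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def gain_reduction_db(level_db, threshold_db, ratio):
    """Gain (zero or negative) the compressor applies at this input level."""
    return compress_db(level_db, threshold_db, ratio) - level_db


# An input 12 dB over a -20 dB threshold with a 4:1 ratio
# comes out 3 dB over the threshold.
out = compress_db(-8.0, threshold_db=-20.0, ratio=4.0)
```

A limiter corresponds to a very large ratio, while an expander or noise gate applies a similar rule below the threshold instead of above it.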

Dynamic Range Control

Multiband Dynamic Range Compression

Interactive tuning of the dynamic response of a compressor.

System Simulation with Block Diagrams

Design and simulate system models using libraries of audio processing blocks for Simulink. Tune parameters and visualize system behavior using interactive controls and dynamic plots.

Real-Time Audio in Simulink

Multiband Dynamic Range Compression

Composed visualization of a Simulink model, with blocks and subsystems at different levels of the model hierarchy, a plot of a filter response, and a user interface with interactive dials to tune parameter values.

Detail of a multiband dynamic range compressor model in Simulink.

Real-Time Audio Prototyping

Validate audio processing algorithms with interactive real-time listening tests in MATLAB.

Live Parameter Tuning via User Interfaces

Automatically create user interfaces for tunable parameters of audio processing algorithms. Test individual algorithms with the Audio Test Bench app and tune parameters in running programs with auto-generated interactive controls.

Audio Test Bench Walkthrough

Real-Time Parameter Tuning

Delay-Based Audio Effects

Interactive tuning of a custom three-band parametric EQ using Audio Test Bench.

MIDI Connectivity for Parameter Control and Message Exchange

Interactively change parameters of MATLAB algorithms by using MIDI control surfaces. Control external hardware or respond to events by sending and receiving any type of MIDI message.
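
As background on what a control surface actually sends, a MIDI Control Change message is three bytes: a status byte (0xB0 plus the channel), a controller number, and a 7-bit value. The Python sketch below decodes one message and maps the value onto a parameter range. The function names and the linear mapping are illustrative assumptions, not the toolbox interface.

```python
def parse_control_change(message):
    """Return (channel, controller, value) or None if not a Control Change.

    The channel is zero-based (0-15), as encoded in the status byte.
    """
    if len(message) != 3 or (message[0] & 0xF0) != 0xB0:
        return None
    return message[0] & 0x0F, message[1] & 0x7F, message[2] & 0x7F


def cc_to_param(value, low, high):
    """Map a 7-bit controller value (0-127) linearly onto [low, high]."""
    return low + (high - low) * value / 127


# Controller 7 at full scale on the first channel, mapped to a +/-12 dB gain.
channel, controller, value = parse_control_change(bytes([0xB0, 7, 127]))
gain_db = cc_to_param(value, -12.0, 12.0)
```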

What Are DAWs, Audio Plugins, and MIDI Controllers?

MIDI Device Interface

MIDI Control for Audio Plugins

Using a MIDI Control Surface to Interact with a Simulink Model

Block diagram showing a keyboard MIDI controller sending MIDI messages to a MATLAB session, which in turn processes the messages, synthesizes note waveforms, and plays back the generated samples through a loudspeaker.

MIDI message and audio signal flow written in MATLAB for a musical instrument synthesizer.

Acoustic Measurements and Spatial Audio

Measure system responses, analyze and meter signals, and design spatial audio processing systems.

Standard-Based Metering and Analysis

Apply sound pressure level (SPL) meters and loudness meters to recorded or live signals. Analyze signals with octave and fractional-octave filters. Apply standard-compliant A-, C-, or K-weighting filters to raw recordings. Measure acoustic sharpness, roughness, and fluctuation strength.
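
To show what a weighting filter does to a spectrum, the Python sketch below evaluates the analog A-weighting magnitude curve as commonly given for IEC 61672, normalized to approximately 0 dB at 1 kHz. It computes the curve only. It is not a time-domain filter and not the toolbox's implementation, and the function name is hypothetical.

```python
import math


def a_weighting_db(f_hz):
    """A-weighting gain in dB at frequency f_hz (f_hz must be positive)."""
    f2 = f_hz * f_hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(num / den) + 2.00


for f in (100.0, 1000.0, 10000.0):
    print(f"{f:7.0f} Hz: {a_weighting_db(f):6.2f} dB")
```

The strong attenuation at low frequencies reflects the ear's reduced sensitivity there at moderate levels.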

Loudness Normalization in Accordance with EBU R 128 Standard

Sound Pressure Measurement of Octave Frequency Bands

THD+N Measurement with Tone-Tracking

Binaural Audio Rendering Using Head-Tracking

Effect of Soundproofing on Perceived Noise Levels

Visualization of different SPL measurements across two third-octave bands.

Impulse Response Measurement

Measure impulse and frequency responses of acoustic and audio systems with maximum-length sequences (MLS) and exponential swept sinusoids (ESS). Get started with the Impulse Response Measurer app. Automate measurements by programmatically generating excitation signals and estimating system responses.
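
An exponential swept sinusoid can be generated directly from its closed-form phase. The Python sketch below uses the commonly cited formula x(t) = sin(2π f1 L (e^(t/L) − 1)) with L = T / ln(f2/f1). The parameter values and function name are hypothetical, and the deconvolution step that recovers the impulse response is not shown.

```python
import math


def exponential_sweep(f_start, f_end, duration, sample_rate):
    """Exponential sine sweep from f_start to f_end (Hz) over duration seconds."""
    rate = math.log(f_end / f_start)
    n = int(round(duration * sample_rate))
    scale = 2.0 * math.pi * f_start * duration / rate
    return [
        math.sin(scale * (math.exp(rate * i / (sample_rate * duration)) - 1.0))
        for i in range(n)
    ]


sweep = exponential_sweep(20.0, 20000.0, duration=1.0, sample_rate=48000)
```

Because the instantaneous frequency rises exponentially, the sweep spends equal time in each octave, which puts more energy at low frequencies than a linear sweep does.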

Impulse Response Measurer Walkthrough

Measure Frequency Response of an Audio Device

Impulse Response Measurer app.

Efficient Convolution with Room Impulse Responses

Convolve signals with long impulse responses efficiently using frequency domain overlap-and-add or overlap-and-save implementations. Trade off latency for computational speed using automatic impulse response partitioning.
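
The block structure behind overlap-and-add can be sketched as follows. To stay dependency-free, this Python example convolves each block in the time domain, where a frequency-domain implementation would multiply FFTs instead, so it shows only the bookkeeping and not the speed-up. The function names are hypothetical.

```python
def convolve(x, h):
    """Direct (time-domain) linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def overlap_add(x, h, block_size):
    """Convolve x with h block by block, summing the overlapping tails."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        part = convolve(block, h)  # an FFT-based version would go here
        for k, v in enumerate(part):
            y[start + k] += v
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
h = [1.0, -1.0, 0.5]
blockwise = overlap_add(x, h, block_size=3)
```

Each block's output is longer than the block by the impulse response length minus one, and those tails are what get added into the following blocks. The block size sets the minimum latency, which is why partitioning the impulse response lets latency be traded against computation.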

Measure Impulse Response of an Audio System

Learn about Partitioned Frequency-Domain FIR Filter

MATLAB figure showing the absolute value of a fairly long impulse response over time, using a log scale for the Y axis. After five seconds, the plot shows that the normalized absolute values have yet to become smaller than one thousandth of the initial amplitude.

Impulse response lasting five seconds, or over 220k samples at 44.1 kHz.

Spatial Audio

Encode and decode different ambisonic formats. Interpolate spatially sampled head-related transfer functions (HRTF).
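
As a minimal example of ambisonic encoding, the Python sketch below pans a mono sample into first-order B-format. It assumes the traditional convention in which W is scaled by 1/√2, azimuth is measured counterclockwise from the front, and elevation is measured upward. Other channel orderings and normalizations exist, and nothing here describes the toolbox's own functions.

```python
import math


def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Return (W, X, Y, Z) for a mono sample arriving from the given direction."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    return w, x, y, z


front = encode_first_order(1.0, azimuth_deg=0.0, elevation_deg=0.0)
left = encode_first_order(1.0, azimuth_deg=90.0, elevation_deg=0.0)
```

A source straight ahead lands entirely in W and X, while a source to the left lands in W and Y. Decoding applies the inverse idea for a given loudspeaker layout or pair of HRTFs.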

Ambisonic Binaural Decoding

Ambisonic Plugin Generation

Drawing showing a binaural mannequin, three loudspeakers at the vertices of a spherical sector representing three points at which the head-related transfer function is known, and a fourth point at a random position inside the sector, for which the head-related transfer function needs to be estimated.

Example of desired sound source position and nearest angles where HRTF measurements are available.

Generate and Host Audio Plugins

Prototype audio processing algorithms written in MATLAB as standard audio plugins; use external audio plugins as regular MATLAB objects.

Generating Audio Plugins

Generate VST plugins, AU plugins, and standalone executable plugins directly from MATLAB code without requiring manual design of user interfaces. For more advanced plugin prototyping, generate ready-to-build JUCE C++ projects (requires MATLAB Coder).

Automatically Generating VST Plugins from MATLAB Code

Audio Plugin Example Gallery

Design an Audio Plugin

UI of an audio plugin generated with MATLAB, as seen while it is used inside REAPER, a well-known digital audio workstation. The UI includes various sliders and knobs arranged over a 3-by-3 grid.

Multiband parametric EQ example: VST plugin generated from MATLAB code and running in REAPER.

Hosting External Audio Plugins

Use external VST and AU plugins as regular MATLAB objects. Change plugin parameters and programmatically process MATLAB arrays. Alternatively, automate associations of plugin parameters with user interfaces and MIDI controls. Host plugins generated from your MATLAB code for increased execution efficiency.

Host External Audio Plugins

To the left, the UI of a commercial audio plugin for audio denoising, featuring a large knob to set the level of noise suppression. To the right, a few lines of code show how the same plugin can be imported and used programmatically as a MATLAB object.

Example of external VST plugin for audio denoising (Accusonus ERA-N) and programmatic interface in MATLAB.

Target Embedded and Real-Time Audio Systems

Use code generation to implement audio processing designs on software devices and automate access to audio interfaces.

Code Generation for CPU and GPU Targets

With MathWorks coder products, generate C and C++ source code from signal processing and machine learning algorithms provided as toolbox functions, objects, and blocks. Generate CUDA source code from select feature extraction functions like mfcc and melSpectrogram.

Code Generation and GPU Support

Keyword Spotting in Noise Code Generation with Intel MKL-DNN

Speech Command Recognition Code Generation on Raspberry Pi

Plot reporting the time elapsed at each prediction cycle for a speech command recognition system, showing the time used is well below the available time budget of 50.

Dynamic profiling for the optimized implementation on an ARM Cortex-A processor of a speech command recognition system based on deep learning

Low-Cost and Mobile Devices

Prototype audio processing designs on Raspberry Pi™ by using on-board or external multichannel audio interfaces. Create interactive control panels as mobile apps for Android® or iOS devices.

Audio Effects for iOS Devices

Parametric Audio Equalizer on Raspberry Pi

Simulink Support Package for Raspberry Pi Hardware

Raspberry Pi 3 board for design prototyping.

Zero-Latency Systems

Prototype audio processing designs with single-sample inputs and outputs for adaptive noise control, hearing aid validation, or other applications requiring minimum round-trip DSP latency. Automatically target Speedgoat audio machines and ST Discovery boards directly from Simulink models.

Parametric Audio Equalizer for STM32 Discovery Boards

ST Discovery Board Support from Embedded Coder

Speedgoat Hardware Support for Real-Time Simulation and Testing from Simulink Real-Time

Cochlear Ltd. Streamlines Development of Cochlear Implant Sound Processing Algorithms

Active Noise Control – From Modeling to Real-Time Prototyping

Product Resources:

Documentation
Examples
Videos
Product Requirements
Release Notes
Functions
Technical Articles
User Stories
Hardware Support
System Objects
