Audio Toolbox

Design and analyze speech, acoustic, and audio processing systems

Streaming Acquisition and Playback with Audio Interfaces

Connect to standard laptop and desktop sound cards for streaming low-latency multichannel audio between any combination of files and live inputs and outputs.

Connectivity to Standard Audio Drivers

Read and write audio samples from and to sound cards (such as USB or Thunderbolt™) using standard audio drivers (such as ASIO, WASAPI, CoreAudio, and ALSA) across Windows®, Mac®, and Linux® operating systems.

Low-Latency Multichannel Audio Streaming

Process live audio in MATLAB with milliseconds of round-trip latency.

Live raw input from a four-channel microphone array.

Machine Learning and Deep Learning

Label, augment, create, and ingest audio and speech datasets, extract features, and compute time-frequency transformations. Develop audio and speech analytics with Statistics and Machine Learning Toolbox, Deep Learning Toolbox, or other machine learning tools.

Pre-Trained Deep Learning Models

Use deep learning to carry out complex signal processing tasks and extract audio embeddings with a single line of code. Access established pre-trained networks like YAMNet, VGGish, CREPE, and OpenL3 and apply them with the help of preconfigured feature extraction functions.

Word cloud displaying the sound types identified by classifySound in a particular audio segment.

Feature Extraction for Audio, Speech, and Acoustics

Transform signals into time-frequency representations like Mel, Bark, and ERB spectrograms. Compute cepstral coefficients such as MFCC and GTCC, and scalar features such as pitch, harmonicity, and spectral descriptors. Extract high-level features and signal embeddings using pre-trained deep learning models (VGGish, OpenL3) and the i-vector system. Accelerate feature extraction with compatible GPU cards.

Live Mel spectrogram of speech commands.
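
As a rough illustration of the Mel scale behind such spectrograms, the sketch below converts between hertz and mels and places band center frequencies evenly along the mel axis. It is written in Python rather than MATLAB, it assumes one common variant of the mel formula (2595 log10(1 + f/700)), and the function names are hypothetical. It is not the toolbox's implementation, whose defaults may differ.

```python
import math

def hz_to_mel(f_hz):
    """Convert Hz to mels using the 2595*log10(1 + f/700) variant (an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_centers(f_low, f_high, num_bands):
    """Center frequencies (Hz) of num_bands bands evenly spaced on the mel axis."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (num_bands + 1)
    return [mel_to_hz(m_low + step * (k + 1)) for k in range(num_bands)]

# Hypothetical configuration: 32 bands between 0 Hz and 8 kHz.
centers = mel_band_centers(0.0, 8000.0, 32)
```

Because the bands are uniform in mels, their spacing in hertz widens toward high frequencies, which is what gives a Mel spectrogram its finer resolution at low frequencies.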

Machine Learning Models and Training Recipes

Train state-of-the-art machine learning models with your audio datasets. Use established systems of models, such as i-vectors, for applications like speaker identification and verification. Learn from working examples how to design and train advanced neural networks and layers for audio, speech, and acoustics applications.

Waveform of speech recording with interleaved segments spoken by different speakers, and color highlighting indicating which speaker is speaking in each detected speech region.

Diarization results obtained using x-vectors on speech signal including five different speakers.

Import, Annotate, and Preprocess Audio Datasets

Read, partition, and preprocess large collections of audio recordings. Annotate audio signals manually with apps. Identify and segment regions of interest automatically using pre-trained machine learning models.

Region-of-interest labels in Audio Labeler app.

Augment and Synthesize Audio and Speech Datasets

Set up randomized data augmentation pipelines using combinations of pitch shifting, time stretching, and other audio processing effects. Create synthetic speech recordings from text using text-to-speech cloud-based services.

Formant estimation for timbre-invariant pitch shifting.

Audio Processing Algorithms and Effects

Generate standard waveforms, apply common audio effects, and design audio processing systems with dynamic parameter tuning and live visualization.

Audio Filters and Equalizers

Model and apply parametric EQ, graphic EQ, shelving, and variable-slope filters. Design and simulate digital crossover, octave, and fractional-octave filters.

Interactive tuning of a three-band crossover filter with live visualization.

Dynamic Range Control and Effects

Model and apply dynamic range processing algorithms such as compressor, limiter, expander, and noise gate. Add artificial reverberation with recursive parametric models.

Interactive tuning of the dynamic response of a compressor.
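
To make the idea of a compressor's static and dynamic response concrete, here is a minimal Python sketch of a hard-knee gain computer followed by one-pole attack/release smoothing. The function names, default parameter values, and the exact form of the smoothing coefficients are assumptions chosen for illustration. They are not taken from the toolbox, which may define its time constants differently.

```python
import math

def compressor_gain_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static hard-knee gain (dB) applied to a signal at level_db."""
    if level_db <= threshold_db:
        return 0.0  # below threshold: signal passes unchanged
    # Above threshold, the output level rises 1 dB per `ratio` dB of input.
    out_db = threshold_db + (level_db - threshold_db) / ratio
    return out_db - level_db

def smooth_gain(gains_db, sample_rate, attack_s=0.01, release_s=0.1):
    """One-pole smoothing: fast when gain must drop, slow when it recovers."""
    a_att = math.exp(-1.0 / (attack_s * sample_rate))
    a_rel = math.exp(-1.0 / (release_s * sample_rate))
    smoothed, prev = [], 0.0
    for g in gains_db:
        coeff = a_att if g < prev else a_rel
        prev = coeff * prev + (1.0 - coeff) * g
        smoothed.append(prev)
    return smoothed

# A 0 dB input with a -20 dB threshold and 4:1 ratio calls for 15 dB of reduction.
target = compressor_gain_db(0.0)
envelope = smooth_gain([target] * 48000, sample_rate=48000)
```

A limiter corresponds to a very high ratio in the same gain computer, while an expander or noise gate applies attenuation below the threshold instead of above it.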

System Simulation with Block Diagrams

Design and simulate system models using libraries of audio processing blocks for Simulink. Tune parameters and visualize system behavior using interactive controls and dynamic plots.

Composite visualization of a Simulink model, with blocks and subsystems at different levels of the model hierarchy, a plot of a filter response, and a user interface with interactive dials to tune parameter values.

Detail of a multiband dynamic range compressor model in Simulink.

Real-Time Audio Prototyping

Validate audio processing algorithms with interactive real-time listening tests in MATLAB.

Live Parameter Tuning via User Interfaces

Automatically create user interfaces for tunable parameters of audio processing algorithms. Test individual algorithms with the Audio Test Bench app and tune parameters in running programs with auto-generated interactive controls.

Interactive tuning of a custom three-band parametric EQ using Audio Test Bench.

MIDI Connectivity for Parameter Control and Message Exchange

Interactively change parameters of MATLAB algorithms by using MIDI control surfaces. Control external hardware or respond to events by sending and receiving any type of MIDI message.

Block diagram showing a keyboard MIDI controller sending MIDI messages to a MATLAB session, which in turn processes the messages, synthesizes note waveforms, and plays back the generated samples through a loudspeaker.

MIDI message and audio signal flow written in MATLAB for a musical instrument synthesizer.
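
The message-to-waveform flow described above rests on two small steps that can be sketched independently of any toolbox: decoding a note message and mapping its note number to a frequency. The Python sketch below assumes standard MIDI 1.0 byte layout and equal temperament with A4 = 440 Hz. The function names are hypothetical and do not correspond to toolbox functions.

```python
def midi_note_to_hz(note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (69 is A4)."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)

def parse_note_message(data):
    """Decode a 3-byte note-on or note-off channel message."""
    status, note, velocity = data
    kind = status & 0xF0           # upper nibble: message type
    channel = (status & 0x0F) + 1  # lower nibble: channel, reported as 1-16
    if kind == 0x90 and velocity > 0:
        name = "note_on"
    elif kind == 0x80 or (kind == 0x90 and velocity == 0):
        # By MIDI convention, a note-on with velocity 0 acts as a note-off.
        name = "note_off"
    else:
        raise ValueError("not a note-on/note-off message")
    return name, channel, note, velocity

# Middle C (note 60) pressed on channel 1 with velocity 100.
event = parse_note_message(bytes([0x90, 60, 100]))
frequency_hz = midi_note_to_hz(event[2])
```

A synthesizer would then start an oscillator at frequency_hz on "note_on" and release it on the matching "note_off".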

Acoustic Measurements and Spatial Audio

Measure system responses, analyze and meter signals, and design spatial audio processing systems.

Standard-Based Metering and Analysis

Apply sound pressure level (SPL) meters and loudness meters to recorded or live signals. Analyze signals with octave and fractional-octave filters. Apply standard-compliant A-, C-, or K-weighting filters to raw recordings. Measure acoustic sharpness, roughness, and fluctuation strength.

Visualization of different SPL measurements across two third-octave bands.
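
For reference, the A-weighting curve mentioned above can be evaluated from the analog magnitude response commonly quoted for IEC 61672. The Python sketch below uses the usual rounded pole frequencies (20.6, 107.7, 737.9, and 12194 Hz) and the +2.00 dB normalization that places 1 kHz at approximately 0 dB. The function name is hypothetical, and the code evaluates the formula only; it does not design the digital filter a meter would actually apply.

```python
import math

def a_weighting_db(f_hz):
    """A-weighting gain in dB at frequency f_hz (f_hz must be positive)."""
    f2 = f_hz * f_hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # The +2.00 dB offset normalizes the curve to about 0 dB at 1 kHz.
    return 20.0 * math.log10(num / den) + 2.0

# Evaluate the curve at a few nominal third-octave center frequencies.
gains = {f: a_weighting_db(f) for f in (100.0, 1000.0, 10000.0)}
```

The strong attenuation at low frequencies reflects the reduced sensitivity of human hearing there, which is why A-weighted levels are common in noise measurements.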

Impulse Response Measurement

Measure impulse and frequency responses of acoustic and audio systems with maximum-length sequences (MLS) and exponential swept sinusoids (ESS). Get started with the Impulse Response Measurer app. Automate measurements by programmatically generating excitation signals and estimating system responses.

Capture of the Impulse Response Measurer App, showing an estimated response in the time domain and in the frequency domain, a menu with a list of other estimated impulse responses available to plot, and other interactive controls available in the app.

Impulse Response Measurer app.

Efficient Convolution with Room Impulse Responses

Convolve signals with long impulse responses efficiently using frequency domain overlap-and-add or overlap-and-save implementations. Trade off latency for computational speed using automatic impulse response partitioning.

MATLAB figure showing the absolute value of a fairly long impulse response over time, using a log scale for the Y axis. After five seconds, the plot shows that the normalized absolute values have yet to become smaller than one thousandth of the initial amplitude.

Impulse response lasting five seconds, or over 220,000 samples at 44,100 Hz.
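
The overlap-and-add idea can be sketched in a few lines: split the input into blocks, multiply each block's spectrum by the impulse response's spectrum, and add the overlapping tails of consecutive output blocks. The Python sketch below is an illustration under simplifying assumptions, not the toolbox's algorithm. It uses a small recursive radix-2 FFT, transforms the whole impulse response at once rather than partitioning it, and all function names are hypothetical.

```python
import cmath

def fft(x, inverse=False):
    """Unnormalized recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def overlap_add(signal, impulse_response, block_size=64):
    """Convolve signal with impulse_response block by block in the frequency domain."""
    m = len(impulse_response)
    n_fft = 1
    while n_fft < block_size + m - 1:  # room for one full block convolution
        n_fft *= 2
    h_spec = fft(list(impulse_response) + [0.0] * (n_fft - m))
    out = [0.0] * (len(signal) + m - 1)
    for start in range(0, len(signal), block_size):
        block = list(signal[start:start + block_size])
        x_spec = fft(block + [0.0] * (n_fft - len(block)))
        y = fft([a * b for a, b in zip(x_spec, h_spec)], inverse=True)
        for i in range(len(block) + m - 1):
            out[start + i] += y[i].real / n_fft  # add the overlapping tail
    return out

def direct_convolution(signal, impulse_response):
    """Reference time-domain convolution, used only to check the result."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out

num_samples = 5 * 44100  # five seconds at 44,100 Hz is 220,500 samples
```

With a response of that length, the FFT size in this unpartitioned form is dictated by the impulse response rather than by the block size. Partitioning the impulse response into shorter segments is what allows small input blocks, and therefore lower latency, at the cost of more computation.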

Spatial Audio

Encode and decode different ambisonic formats. Interpolate spatially sampled head-related transfer functions (HRTF).

Drawing showing a binaural mannequin, three loudspeakers at the vertices of a spherical sector representing three points at which the head-related transfer function is known, and a fourth point at a random position inside the sector, for which the head-related transfer function needs to be estimated.

Example of desired sound source position and nearest angles where HRTF measurements are available.

Generate and Host Audio Plugins

Prototype audio processing algorithms written in MATLAB as standard audio plugins; use external audio plugins as regular MATLAB objects.

Generating Audio Plugins

Generate VST plugins, AU plugins, and standalone executable plugins directly from MATLAB code without requiring manual design of user interfaces. For more advanced plugin prototyping, generate ready-to-build JUCE C++ projects (requires MATLAB Coder).

UI of an audio plugin generated with MATLAB, as seen while it is used inside REAPER, a well-known digital audio workstation. The UI includes various sliders and knobs arranged over a 3-by-3 grid.

Multiband parametric EQ example: VST plugin generated from MATLAB code and running in REAPER.

Hosting External Audio Plugins

Use external VST and AU plugins as regular MATLAB objects. Change plugin parameters and programmatically process MATLAB arrays. Alternatively, automate associations of plugin parameters with user interfaces and MIDI controls. Host plugins generated from your MATLAB code for increased execution efficiency.

To the left, the UI of a commercial audio plugin for audio denoising, featuring a large knob to set the level of noise suppression. To the right, a few lines of code show how the same plugin can be imported and used programmatically as a MATLAB object.

Example of external VST plugin for audio denoising (Accusonus ERA-N) and programmatic interface in MATLAB.

Target Embedded and Real-Time Audio Systems

Use code generation to implement audio processing designs on software devices and automate access to audio interfaces.

Code Generation for CPU and GPU Targets

With MathWorks coder products, generate C and C++ source code from signal processing and machine learning algorithms provided as toolbox functions, objects, and blocks. Generate CUDA source code from select feature extraction functions like mfcc and melSpectrogram.

Plot reporting the time elapsed at each prediction cycle for a speech command recognition system, showing the time used is well below the available time budget of 50.

Dynamic profiling of a deep learning-based speech command recognition system optimized for an ARM Cortex-A processor.

Low-Cost and Mobile Devices

Prototype audio processing designs on Raspberry Pi™ by using on-board or external multichannel audio interfaces. Create interactive control panels as mobile apps for Android® or iOS devices.

Photo of a Raspberry Pi board.

Raspberry Pi 3 board for design prototyping.

Zero-Latency Systems

Prototype audio processing designs with single-sample inputs and outputs for adaptive noise control, hearing aid validation, or other applications requiring minimum round-trip DSP latency. Automatically target Speedgoat audio machines and ST Discovery boards directly from Simulink models.