MATLAB Programming/Phase Vocoder and Encoder
||A Wikibookian disputes the factual accuracy of this page because it does not conform to Flanagan's 1966 phase vocoder equation at high frequencies.
Please view the relevant discussion.
- Phase vocoder on the English Wikipedia;
- Laroche, J. and Dolson, M. (1999) "Improved Phase Vocoder Time-Scale Modification of Audio" IEEE Transactions on Speech and Audio Processing 7(3):323-32;
- Miller Puckette (1995) "Phase-locked Vocoder" [PostScript file] Proc. IEEE ASSP Conf. on Applications of Signal Processing to Audio and Acoustics (Mohonk, New York: 1995) — pioneered phase coherence propagation by phase-locking adjacent spectral bins and frames during resynthesis;
- Axel Röbel (2003) "A new Approach to Transient Processing in the Phase Vocoder" Proc. 6th Int'l Conf. on Digital Audio Effects (London: Sept. 8-11, 2003); and
- MATLAB Programming wikibook, where this code is given as an example.
The following code, comments, and related commentary are hereby released by the author into the public domain, or if that is not possible, permission is granted for everyone to use this material for any purpose.
% fs.m - synthesis from frequency spectra. % [output: waveform] = fs(freqresp: series of spectra) % % James Salsman, 1999-2000 % % Note: sometimes this produces data outside the (1,-1) range, even when % run with input from that smaller domain given to af.m. It is safe to % clamp outlying values to the extrema of the corresponding input domain. % If the resulting signal sounds clipped, it is NOT because of clamping % domain outliers of the range (e.g., 1.2, -1.3) to the domain edges (1,-1). % To improve the quality, try increasing the length of the fft; if that % doesn't work, then try increasing the sampling rate. To improve the speed, % rewrite it in C; fftPrep is the only complex vector, and the call to 'sort' % is only for the ordering of the dot-product magnitudes, not the sorting. function [output] = fs(freqresp) samplingRate = 16000; % changed from 8000 frameRate = 100; % should be even, maybe try 160 windowStep = samplingRate/frameRate; % 16000/100 = 160, shall be even % windowSize = samplingRate/40; % must be even; use 256 for 11k/s rate windowSize = windowStep*2; % this is better  [rows, cols] = size(freqresp); % input dimensions origFFtSize = rows*2; % should be power of two % Allocate (preextend) all space needed for output array and helpers. % output = zeros(cols*windowStep + (windowSize - windowStep), 1); FFtPrep = zeros(1, origFFtSize) + 0*i; % complex ifft input iFFtReal = zeros(1, origFFtSize); % real ifft output phases = zeros(1, origFFtSize/2); % resynthesis phase array newPhases = zeros(1, origFFtSize/2); % uses "-1" for unassigned phases transInd = zeros(1, origFFtSize/2); % index into bins by magnitude ignore = zeros(1, origFFtSize/2); % sort values unused but can't hurt for n = 1:origFFtSize/2, % Initialize synthesis phases. phases(n) = pi*mod(n, 2); % Each starting phase 180 deg's apart end; % from the next one. % pre-calculate phase transition from one frame to the next for base 2nd bin % phaseFactor = 2*pi*samplingRate/(frameRate*origFFtSize); % resynthesis COLA window should be Portnoff (sinc) because iFFT will % shape like the window it was taken from % wind = sinc((-windowSize/2:windowSize/2-1)/(windowSize/2)); % pure sinc for c = 1:cols, % iterate over columns first = (c-1)*windowStep + 1; % output index last = first + windowSize - 1; % boundaries for b = 2:origFFtSize/2, % iterate over bins % prepare resynthesis % [re, im] = pol2cart(phases(b), freqresp(b, c)); % polar to cartesian FFtPrep(b) = re + i*im; % rectangular to complex FFtPrep(origFFtSize - b+2) = conj(FFtPrep(b)); % the conjugates count end; iFFtReal = real(ifft(FFtPrep)); % resynthesize output(first:last) = output(first:last) ... % note: column vector (') + (iFFtReal( origFFtSize/2 - windowSize/2 : ... origFFtSize/2 + windowSize/2 - 1 ) .* wind )'; % COLA if c < cols, % figure out the next set of phases newPhases = ones(1, origFFtSize/2)*-1; % "-1" for unassigned phase % set transInd to index the least-to-greatest transitioning magnitudes % [ignore, transInd] = sort(freqresp(:,c)' .* freqresp(:,c+1)'); % iterate downwards such that indexed bin will be greatest mags first % for idx = origFFtSize/2:-1:1, % step by -1 from greatest to least b = transInd(idx); % get bin from index if b-1 > 0, % adjacent lower neighboring bin lower = newPhases(b-1); else lower = -1; % nonexistent = unassigned end; if b+1 < origFFtSize/2, % adjacent higher neighboring bin higher = newPhases(b+1); else higher = -1; % nonexistent = unassigned end; if (lower == -1) & (higher == -1), % both unassigned: propagate newPhases(b) = mod(phases(b) + (b-1)*phaseFactor, 2*pi); elseif lower == -1, % make 180 degs from higher newPhases(b) = mod(higher + pi, 2*pi); elseif higher == -1, % make 180 degs from lower newPhases(b) = mod(lower + pi, 2*pi); else % both neighbors assigned avgAng = (lower + higher)/2; % Mean will be 180 deg off if abs(lower - higher) > pi, % when the angles are that avgAng = avgAng + pi; % far apart; if so, adjust end; % 180 degs from mean angle. newPhases(b) = mod(avgAng + pi, 2*pi); end; end; phases = newPhases; % replace array for next iteration end; % if -- skips last iteration end;
Here is the corresponding encoder:
% af.m - frequency spectrum analysis. % [freqresp: sequence of spectra] = af(input: waveform) % % James Salsman, '99-2000; adapted from Malcolm Slaney's mfcc.m [from his % "Auditory Toolbox" at www.interval.com.] % % Also works when samplingRate is 8000 or fftSize is 256, but I haven't tried % both together. function [freqresp] = af(input) samplingRate = 16000; % samples/sec frameRate = 100; % started at 100/sec, maybe use 160 later windowStep = samplingRate/frameRate; % 16000/100 = 160; shall be even windowSize = 400; % large, but from ETSI standard ES 201108 windowSize = samplingRate/40; % must be even; use 256 for 11k/s rate fftSize = 512; % larger better but slower; usually 2^int. [rows, cols] = size(input); % sanity-check input dimensions if (rows > cols) % if there are more rows than columns input=input'; % then transpose end % hamming window shapes input signal % hamWindow = 0.54 - 0.46*cos(2*pi*(0:windowSize-1)/windowSize); % how many columns of data we will end up with % cols = ceil((length(input) - windowSize)/windowStep); % Allocate (preextend) all the space we need for the output arrays. % freqresp = zeros(fftSize/2, cols); fftData = zeros(1, fftSize); fftMag = zeros(1, fftSize); for c = 1:cols, % for each column of data first = (c-1)*windowStep + 1; % determine last = first + windowSize - 1; % endpoints % shape the data with a hamming window % fftData(1:windowSize) = input(first:last).*hamWindow; % find the magnitude of the fft % fftMag = abs(fft(fftData)); % keep only real magnitudes freqresp(:, c) = fftMag(1:fftSize/2)'; % emit end;