Kernelized Correlation Filters for Visual Tracking

yufei
December 1, 2015 5:37 PM

KCF: Exploiting the Circulant Structure of Tracking-by-detection with Kernels (ECCV 2012)

High-Speed Tracking with Kernelized Correlation Filters (PAMI 2015)

J. F. Henriques, R. Caseiro, P. Martins, J. Batista

Project

Problem

Observation:

The sets of translated samples used to train tracking-by-detection classifiers are riddled with redundancies:
any overlapping pixels are constrained to be the same.

Idea

The data matrix formed by all cyclic shifts of a sample is circulant, so we can diagonalize it with the Discrete Fourier Transform, reducing both storage and computation by several orders of magnitude.

Method:

KCF=ridge regression + Circulant data

ridge regression
<script type="math/tex; mode=display" id="MathJax-Element-133"> \min_\mathbb w \sum_i(f(\mathbb x_i) -y_i)^2+\lambda ||\mathbb w||^2 </script>
<script type="math/tex; mode=display" id="MathJax-Element-134"> \mathbb w =(X^TX + \lambda I)^{-1}X^Ty </script>
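
The closed-form solution above can be checked numerically. The following is a minimal sketch, assuming a linear model f(x) = w'x and synthetic data; the names `X`, `y`, `lambda` and `w_true` are illustrative and do not come from the tracker code.

```matlab
% Ridge regression in closed form: w = (X'X + lambda*I)^(-1) X'y
rng(0);                         % reproducible synthetic data
n = 50; d = 5;                  % n samples (rows of X), d features
X = randn(n, d);
w_true = (1:d)';                % hypothetical ground-truth weights
y = X * w_true + 0.01 * randn(n, 1);
lambda = 1e-4;                  % regularization weight

% Solve the regularized normal equations; backslash avoids an explicit inverse
w = (X' * X + lambda * eye(d)) \ (X' * y);

% At the minimizer the gradient X'(Xw - y) + lambda*w vanishes
grad = X' * (X * w - y) + lambda * w;
fprintf('max |gradient| = %.2e\n', max(abs(grad)));
```

Solving this system costs O(d^3) in general, which is the cost the circulant structure removes.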

Circulant data
<script type="math/tex; mode=display" id="MathJax-Element-135"> C(u)v= \mathcal F^{-1}(\mathcal F(u)\odot\mathcal F(v)) </script>

Convolution theorem: convolution in the spatial domain equals element-wise multiplication in the frequency domain.
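
As a small numerical illustration of the identity for C(u)v, the sketch below builds a circulant matrix explicitly and compares the matrix-vector product with the FFT route. It assumes the convention that the first column of C(u) is u and each further column is the previous one shifted down cyclically; if the rows are the shifts instead, the product becomes a correlation and a complex conjugate appears on F(u).

```matlab
% Circulant matrix-vector product vs. element-wise product in the Fourier domain
rng(1);
n = 8;
u = randn(n, 1);
v = randn(n, 1);

% Assumed convention: the first column of C is u, and every further column is
% the previous one cyclically shifted down by one element.
C = toeplitz(u, u([1 end:-1:2]));

direct  = C * v;                          % O(n^2) time, O(n^2) storage
via_fft = real(ifft(fft(u) .* fft(v)));   % O(n log n) time, O(n) storage

fprintf('max difference = %.2e\n', max(abs(direct - via_fft)));
```

Only the vector u has to be stored, never the full n-by-n matrix.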

1. Convolution theorem

Given: <script type="math/tex; mode=display" id="MathJax-Element-136">\mathbf x = [x_1,x_2,x_3]</script>, <script type="math/tex; mode=display" id="MathJax-Element-137">\mathbf x' = [x'_1,x'_2,x'_3]</script>

0	<script type="math/tex" id="MathJax-Element-139">x'_1</script>	<script type="math/tex" id="MathJax-Element-140">x'_2</script>	<script type="math/tex" id="MathJax-Element-141">x'_3</script>	0
0	<script type="math/tex" id="MathJax-Element-142">x_1</script>	<script type="math/tex" id="MathJax-Element-143">x_2</script>	<script type="math/tex" id="MathJax-Element-144">x_3</script>	0
<script type="math/tex" id="MathJax-Element-145">x_1</script>	<script type="math/tex" id="MathJax-Element-146">x_2</script>	<script type="math/tex" id="MathJax-Element-147">x_3</script>	0	0
<script type="math/tex" id="MathJax-Element-148">x_2</script>	<script type="math/tex" id="MathJax-Element-149">x_3</script>	0	0	<script type="math/tex" id="MathJax-Element-150">x_1</script>
<script type="math/tex" id="MathJax-Element-151">x_3</script>	0	0	<script type="math/tex" id="MathJax-Element-152">x_1</script>	<script type="math/tex" id="MathJax-Element-153">x_2</script>
0	0	<script type="math/tex" id="MathJax-Element-154">x_1</script>	<script type="math/tex" id="MathJax-Element-155">x_2</script>	<script type="math/tex" id="MathJax-Element-156">x_3</script>

<script type="math/tex; mode=display" id="MathJax-Element-157"> \mathcal F(corr(\mathbf x, \mathbf y)) = \mathcal F^*( \tilde {\mathbf x})\odot \mathcal F(\tilde{\mathbf y}) </script>

Here $\mathcal F^*$ denotes the complex conjugate of the DFT, and the division in the training formula below is element-wise.

2. Training
<script type="math/tex; mode=display" id="MathJax-Element-158"> \mathcal F({\mathbf w}) = \frac{\mathcal F^*(\mathbf x) \odot \mathcal F(\mathbf y)}{\mathcal F^*(\mathbf x) \odot \mathcal F(\mathbf x) + \lambda } </script>

3. Testing
<script type="math/tex; mode=display" id="MathJax-Element-159"> \mathbf y'=\mathcal F^{-1}(\mathcal F(\mathbf w) \odot \mathcal F(\mathbf x') ) </script>
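
The training and testing steps can be compared against explicit ridge regression on a small example. This is a sketch under stated assumptions: 1-D signals, a data matrix whose columns are the cyclic shifts of x, and the conjugated form of the training formula, F(w) = F*(x) ⊙ F(y) / (F*(x) ⊙ F(x) + λ). All variable names are illustrative.

```matlab
% Linear correlation filter: Fourier-domain training vs. explicit ridge regression
rng(2);
n = 16;
x = randn(n, 1);                            % base training sample (1-D signal)
y = exp(-0.5 * min(0:n-1, n:-1:1)'.^2);     % Gaussian target, peak at element 1
lambda = 1e-4;

% Explicit data matrix: column j is x cyclically shifted down by j-1 (assumed convention)
X = toeplitz(x, x([1 end:-1:2]));
w_direct = (X' * X + lambda * eye(n)) \ (X' * y);

% Same solution with element-wise operations on DFTs
xf = fft(x);  yf = fft(y);
wf = conj(xf) .* yf ./ (conj(xf) .* xf + lambda);
w_fft = real(ifft(wf));

% Testing: the response to a shifted sample peaks at the shift
shift = 5;
x_new = circshift(x, shift);
response = real(ifft(wf .* fft(x_new)));
[~, peak] = max(response);                  % expected: 1 + shift
fprintf('max |w_direct - w_fft| = %.2e, peak index = %d\n', ...
        max(abs(w_direct - w_fft)), peak);
```

The n-by-n solve is replaced by n scalar divisions, which is where the speed of correlation filter trackers comes from.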

4. Kernel
<script type="math/tex; mode=display" id="MathJax-Element-161"> \kappa(\mathbf x, \mathbf x') = h(||\mathbf x - \mathbf x'||^2) =h(||\mathbf x||^2 +||\mathbf x'||^2 - 2\mathbf x^T\mathbf x' ) </script>
<script type="math/tex; mode=display" id="MathJax-Element-162"> =h(||\mathbf x||^2 +||\mathbf x'||^2 - 2\mathcal F^{-1} (\mathcal F^*({\mathbf x}) \odot \mathcal F({\mathbf x'}) )) </script>
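
To make the "kernel at all cyclic shifts" idea concrete, the sketch below evaluates a Gaussian kernel between x and every cyclic shift of a second signal, once by brute force and once through the FFT. It assumes h(r) = exp(-r / sigma^2) without the normalization by the number of elements used in the implementation further down; `z` plays the role of x'.

```matlab
% Gaussian kernel between x and every cyclic shift of z, brute force vs. FFT
rng(3);
n = 16; sigma = 4;
x = randn(n, 1);
z = randn(n, 1);      % plays the role of x' in the formula

% Brute force: one kernel evaluation per shift, O(n^2) overall
k_direct = zeros(n, 1);
for i = 0:n-1
    d = x - circshift(z, -i);                  % z shifted by i elements
    k_direct(i + 1) = exp(-(d' * d) / sigma^2);
end

% FFT: all n inner products between x and the shifts of z at once, O(n log n)
xz = real(ifft(conj(fft(x)) .* fft(z)));       % circular cross-correlation
k_fft = exp(-(x' * x + z' * z - 2 * xz) / sigma^2);

fprintf('max difference = %.2e\n', max(abs(k_direct - k_fft)));
```

The norms of x and z do not change under cyclic shifts, so only the cross term needs the FFT.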

5. Kernel ridge regression
<script type="math/tex; mode=display" id="MathJax-Element-163"> \alpha = (K+ \lambda I)^{-1}\mathbf y </script>
where $K$ is the kernel matrix with entries $K_{ij} = \kappa(\mathbf x_i, \mathbf x_j)$.

Solutions:

Training:
<script type="math/tex; mode=display" id="MathJax-Element-164"> \alpha = \mathcal F^{-1}(\frac{\mathcal F(\mathbf y)}{\mathcal F(\kappa (\mathbf x, \mathbf x)) + \lambda}) </script>
Prediction:
<script type="math/tex; mode=display" id="MathJax-Element-165"> \hat {\mathbf y} = \mathcal F^{-1}( \mathcal F(\kappa(\mathbf x, \mathbf x')) \odot \mathcal F(\alpha)) </script>
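
The two formulas can be tried end to end on a 1-D toy signal. The sketch below is only an illustration: it assumes a Gaussian kernel, a cyclic shift as the "motion", and a hypothetical helper `gauss_corr` that is not part of the tracker code.

```matlab
% 1-D kernelized correlation filter: train on x, detect a cyclic shift of x
rng(4);
n = 32; sigma = 4; lambda = 1e-4;
x = randn(n, 1);                                   % training sample
y = exp(-0.5 * (min(0:n-1, n:-1:1)' / 2).^2);      % Gaussian labels, peak at element 1

% Gaussian kernel between a and all cyclic shifts of b (hypothetical helper)
gauss_corr = @(a, b) exp(-(a' * a + b' * b ...
    - 2 * real(ifft(conj(fft(a)) .* fft(b)))) / sigma^2);

% Training: alpha = F^-1( F(y) ./ (F(k(x,x)) + lambda) ), kept in the Fourier domain
alphaf = fft(y) ./ (fft(gauss_corr(x, x)) + lambda);

% Prediction on a shifted copy of x
shift = 7;
z = circshift(x, shift);
response = real(ifft(fft(gauss_corr(x, z)) .* alphaf));
[~, peak] = max(response);                         % expected: 1 + shift
fprintf('estimated shift = %d\n', peak - 1);
```

Because the labels peak at element 1, the index of the response maximum minus one is the estimated displacement.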

Implementation

Training:

$\mathcal F(\mathbf y) $

    % expected response: Gaussian-shaped labels with the peak moved to element (1,1)
    state.window = floor(state.size * 2.5);                 % padded search window
    sz = fliplr(floor(state.window / state.cell_size));     % label size in HOG cells
    [rs, cs] = ndgrid((1:sz(1)) - bitshift(sz(1)+1,-1), (1:sz(2)) - bitshift(sz(2)+1,-1));
    y = exp(-0.5 / output_sigma^2 * (rs.^2 + cs.^2));
    y = circshift(y, bitshift(sz+2,-1));                    % wrap the peak around to (1,1)
    state.yf = fft2(y);

<script type="math/tex" id="MathJax-Element-166">\mathcal F(\kappa(\mathbf x, \mathbf x))</script>

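    x = get_region(im, state);                              % crop the search window
    x = double(fhog(single(x) / 255, state.cell_size, 9));  % HOG features, 9 orientations
    x(:,:,end) = [];                                        % drop the last (all-zero) channel
    x = bsxfun(@times, x, state.cos_window);                % cosine window against boundary effects
    xf = fft2(x);
    kf = dense_gauss_kernel(xf, xf);                        % kernel auto-correlation, Fourier domain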

    function kf = dense_gauss_kernel(xf, yf)
        % Gaussian kernel between x and all cyclic shifts of y (inputs and output in the Fourier domain)
        sigma = 0.5;
        N = size(xf,1) * size(xf,2);
        xx = xf(:)' * xf(:) / N;  % squared norm of x (Parseval)
        yy = yf(:)' * yf(:) / N;  % squared norm of y
        % cross-correlation term in Fourier domain
        xyf = xf .* conj(yf);
        xy = sum(real(ifft2(xyf)), 3);  % to spatial domain, summed over channels
        kf = fft2(exp(-1 / sigma^2 * max(0, (xx + yy - 2 * xy) / numel(xf))));
    end

<script type="math/tex" id="MathJax-Element-167">\alpha</script>

    new_alphaf = state.yf ./ (kf + 0.0001);   % 0.0001 is the regularization lambda

<script type="math/tex" id="MathJax-Element-168">\mathcal F(\kappa(\mathbf x, \mathbf x'))</script>

    % next frame
    x = get_region(im, state);
    x = double(fhog(single(x) / 255, state.cell_size, 9)); x(:,:,end) = [];
    x = bsxfun(@times, x, state.cos_window);
    xf = fft2(x);
    kf = dense_gauss_kernel(xf, state.z);   % state.z is the template (Fourier domain)

    % update the template by linear interpolation
    state.z = (1 - interp_factor) * state.z + interp_factor * xf;

<script type="math/tex" id="MathJax-Element-169">[p_x,p_y] = \arg \max \hat{\mathbf y},</script>

    response = real(ifft2(state.alphaf .* kf));
    [yc, xc] = find(response == max(response(:)), 1);   % row and column of the peak
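
The peak location still has to be turned into a displacement. The sketch below shows one way to do this on a synthetic response, assuming the labels peak at element (1,1) so that zero displacement maps to (1,1); `sz`, `dy`, `dx` and `shift_px` are illustrative names, and `cell_size` stands in for `state.cell_size`.

```matlab
% Convert the response peak into a signed displacement (in cells, then pixels)
sz = [12 16];                       % hypothetical response size (rows, cols)
cell_size = 4;                      % pixels per HOG cell
response = zeros(sz);
response(11, 3) = 1;                % synthetic peak: 2 rows up, 2 columns right

[yc, xc] = find(response == max(response(:)), 1);

% The response is cyclic and zero displacement sits at element (1,1),
% so peaks in the second half of an axis are negative shifts.
dy = yc - 1;  if dy > sz(1) / 2, dy = dy - sz(1); end
dx = xc - 1;  if dx > sz(2) / 2, dx = dx - sz(2); end

shift_px = cell_size * [dy, dx];    % displacement in pixels
fprintf('shift = (%d, %d) cells\n', dy, dx);
```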

Experiments:

dataset: OTB (50 videos)
algorithms: MOSSE, TLD, CT, STRUCK

Conclusions

Contributions:

It establishes the connection between ridge regression with cyclically shifted samples and classical correlation filters.
It proposes closed-form solutions for computing kernels at all cyclic shifts.
It extends the original work to handle multiple channels.

Beyond:

Scale?
Loss functions: hinge loss? exponential loss?
… …

DSST: Accurate Scale Estimation for Robust Visual Tracking (BMVC 2014)

Martin Danelljan, Gustav Häger, Fahad Shahbaz Khan and Michael Felsberg.
Matlab

Problem

Accurate scale estimation

Idea

scale = pyramid + correlation filter

First estimate the target's spatial position, then estimate its scale.

Step1: the translation filter

Step2: the scale filter

Algorithms DSST

Scale Estimation

object size : <script type="math/tex" id="MathJax-Element-170">P\times R</script>
scales: <script type="math/tex" id="MathJax-Element-171">n \in \{ \lfloor -\frac{s-1}{2} \rfloor, \cdots,\lfloor \frac{s-1}{2} \rfloor \}</script>
patch <script type="math/tex" id="MathJax-Element-172">J_n</script> size: <script type="math/tex" id="MathJax-Element-173">a^n P \times a^nR</script>, where a is the scale increment factor and s is the number of scales

Step1: scales setting

    nScales = 33;           % number of scale levels (denoted "S" in the paper)
    scale_step = 1.02;      % scale increment factor (denoted "a" in the paper)
    % scale factors
    ss = 1:nScales;
    scaleFactors = scale_step.^(ceil(nScales/2) - ss);

**Step2: expected scale response <script type="math/tex" id="MathJax-Element-174">\mathbf y</script>**

    % desired scale filter output (gaussian shaped), bandwidth proportional to
    % number of scales
    scale_sigma = nScales/sqrt(33) * scale_sigma_factor;
    ss = (1:nScales) - ceil(nScales/2);
    ys = exp(-0.5 * (ss.^2) / scale_sigma^2);
    ysf = single(fft(ys));

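Step3: extracting scale patches

Scale the target size (width and height) and extract the image patch at the corresponding location;
Resize the patch to the template size so that it can be compared with the template;
Extract HOG features from the new patch;
Flatten the HOG features into a column vector, then multiply by a Hanning window.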

    function out = get_scale_sample(im, pos, base_target_sz, scaleFactors, scale_window, scale_model_sz)
    % out = get_scale_sample(im, pos, base_target_sz, scaleFactors, scale_window, scale_model_sz)
    %
    % Extracts a sample for the scale filter at the current
    % location and scale.
    nScales = length(scaleFactors);
    for s = 1:nScales
        patch_sz = floor(base_target_sz * scaleFactors(s));
        xs = floor(pos(2)) + (1:patch_sz(2)) - floor(patch_sz(2)/2);
        ys = floor(pos(1)) + (1:patch_sz(1)) - floor(patch_sz(1)/2);
        % check for out-of-bounds coordinates, and set them to the values at
        % the borders
        xs(xs < 1) = 1;
        ys(ys < 1) = 1;
        xs(xs > size(im,2)) = size(im,2);
        ys(ys > size(im,1)) = size(im,1);
        % extract image
        im_patch = im(ys, xs, :);
        % resize image to model size
        im_patch_resized = mexResize(im_patch, scale_model_sz, 'auto');
        % extract scale features
        temp_hog = fhog(single(im_patch_resized), 4);
        temp = temp_hog(:,:,1:31);
        if s == 1
            out = zeros(numel(temp), nScales, 'single');
        end
        % window
        out(:,s) = temp(:) * scale_window(s);
    end

**Step4: compute <script type="math/tex" id="MathJax-Element-175">\alpha</script>**

    xs = get_scale_sample(im, pos, base_target_sz, currentScaleFactor * scaleFactors, scale_window, scale_model_sz);
    % calculate the scale filter update (numerator and denominator)
    xsf = fft(xs,[],2);
    new_sf_num = bsxfun(@times, ysf, conj(xsf));
    new_sf_den = sum(xsf .* conj(xsf), 1);

**Step5: Prediction <script type="math/tex" id="MathJax-Element-176">\mathbf y'</script>**

    % extract the test sample feature map for the scale filter
    xs = get_scale_sample(im, pos, base_target_sz, currentScaleFactor * scaleFactors, scale_window, scale_model_sz);
    % calculate the correlation response of the scale filter
    xsf = fft(xs,[],2);
    scale_response = real(ifft(sum(sf_num .* xsf, 1) ./ (sf_den + lambda)));

Step6: estimate the target scale

    % find the maximum scale response
    recovered_scale = find(scale_response == max(scale_response(:)), 1);
    % update the scale
    currentScaleFactor = currentScaleFactor * scaleFactors(recovered_scale);
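
Steps 2 to 6 can be exercised on synthetic data. The following is a rough sketch, not the DSST code: random feature columns replace the HOG pyramid, a cyclic shift of the columns stands in for a change of scale, and the values of `d`, `scale_sigma`, `lambda` and `k` are assumptions chosen for the demonstration.

```matlab
% 1-D scale filter on synthetic features (a cyclic shift stands in for a scale change)
rng(5);
nScales = 33; d = 20; lambda = 1e-2;
scale_sigma = 1.5;                                  % hypothetical bandwidth
ss = (1:nScales) - ceil(nScales/2);
ysf = fft(exp(-0.5 * (ss.^2) / scale_sigma^2));     % desired output, peak at the middle level

xs = randn(d, nScales);                             % one d-dimensional feature column per scale
xsf = fft(xs, [], 2);
sf_num = bsxfun(@times, ysf, conj(xsf));            % filter numerator
sf_den = sum(xsf .* conj(xsf), 1);                  % filter denominator

% New frame: the best-matching level moved by k positions
k = 3;
xsf_new = fft(circshift(xs, [0 k]), [], 2);
scale_response = real(ifft(sum(sf_num .* xsf_new, 1) ./ (sf_den + lambda)));
[~, recovered_scale] = max(scale_response);         % expected: ceil(nScales/2) + k
fprintf('recovered level = %d\n', recovered_scale);
```

The recovered index selects an entry of `scaleFactors`, which then multiplies `currentScaleFactor` as in Step6.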

Experiments

dataset: **28 sequences**¹ annotated with the scale variation attribute
algorithms:

Conclusions:

Learn discriminative correlation filters on a scale pyramid representation.
translation = 2D correlation filter, scale = 1D correlation filter

Contribution:

pyramid + correlation filter

Beyond:

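If the spatial position is inaccurate, the scale estimate can also be far off;
Good results are obtained even without a kernel;
Compared with earlier methods (NCC), the only change is that the desired response goes from an impulse to a Gaussian or Laplacian distribution;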

Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration (ECCVW 2014)

Yang Li, Jianke Zhu

Matlab

Problem: scale estimation
Idea:
- multi-scale: sample the patch at several scales and compare each one with the template.
- features: HOG + CN (color names)
Code

    search_size = [1 0.985 0.99 0.995 1.005 1.01 1.015];   % candidate scale factors
    for i = 1:size(search_size,2)
        tmp_sz = floor((target_sz * (1 + padding)) * search_size(i));
        param0 = [pos(2), pos(1), tmp_sz(2)/window_sz(2), 0, ...
                  tmp_sz(1)/window_sz(2)/(window_sz(1)/window_sz(2)), 0];
        param0 = affparam2mat(param0);
        patch = uint8(warpimg(double(im), param0, window_sz));   % resample to the template size
        zf = fft2(get_features(patch, features, cell_size, cos_window, w2c));
        % kzf: kernel correlation of zf with the learned template (its computation is omitted in this excerpt)
        response(:,:,i) = real(ifft2(model_alphaf .* kzf));
    end
    % peak over all positions and scales; tmp indexes columns across the stacked scales
    [vert_delta, tmp, horiz_delta] = find(response == max(response(:)), 1);
    szid = floor((tmp-1) / (size(cos_window,2))) + 1;       % index of the best scale
    horiz_delta = tmp - ((szid-1) * size(cos_window,2));    % column within that scale
    if vert_delta > size(zf,1) / 2,   % wrap around to negative shifts on the vertical axis
        vert_delta = vert_delta - size(zf,1);
    end
    if horiz_delta > size(zf,2) / 2,  % same for horizontal axis
        horiz_delta = horiz_delta - size(zf,2);
    end
    tmp_sz = floor((target_sz * (1 + padding)) * search_size(szid));
    current_size = tmp_sz(2) / window_sz(2);
    pos = pos + current_size * cell_size * [vert_delta - 1, horiz_delta - 1];
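
The index arithmetic around `find` is easy to misread, so here is a small sketch on a synthetic response stack. It relies on the documented behavior of two-output `find` on an N-D array (the column output runs over all trailing dimensions) and compares the result with `ind2sub`; the sizes and the peak position are made up for the example.

```matlab
% Locate the maximum of a rows x cols x nScales response stack
rows = 6; cols = 8; nScales = 7;
response = zeros(rows, cols, nScales);
response(2, 5, 4) = 1;                        % synthetic peak

% find() treats the stack as a rows x (cols*nScales) matrix
[vert_delta, tmp] = find(response == max(response(:)), 1);
szid = floor((tmp - 1) / cols) + 1;           % scale index
horiz_delta = tmp - (szid - 1) * cols;        % column within that scale

% Equivalent decoding with ind2sub
[~, idx] = max(response(:));
[r, c, s] = ind2sub(size(response), idx);
fprintf('find: (%d,%d,%d)  ind2sub: (%d,%d,%d)\n', ...
        vert_delta, horiz_delta, szid, r, c, s);
```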

RPT: Reliable Patch Trackers: Robust Visual Tracking by Exploiting Reliable Patches (CVPR 2015)

Yang Li, Jianke Zhu, Steven C.H. Hoi

Matlab

Fast Tracking via Dense Spatio-Temporal Context Learning

Kaihua Zhang, Lei Zhang, Qingshan Liu, David Zhang, and Ming-Hsuan Yang
European Conference on Computer Vision (ECCV 2014), pp. 127-141, Zurich, Switzerland, September, 2014.
Matlab

SRDCF: Learning Spatially Regularized Correlation Filters for Visual Tracking (ICCV2015)

[Project](http://www.cvl.isy.liu.se/research/objrec/visualtracking/regvistrack/index.html)

ACF: Adaptive Color Attributes for Real-Time Visual Tracking

Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg and Joost van de Weijer.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014 (Oral).
Matlab

LCF: Long-term Correlation Tracking (CVPR 2015)

Chao Ma, Xiaokang Yang, Chongyang Zhang, and Ming-Hsuan Yang,
Project

Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. Online object tracking: A benchmark. In CVPR, 2013. ↩
