Matting Laplacian



1. Color Line Assumption



  • Matting is a technique that extracts the foreground from an image.
  • Matting relies on the assumption that, for a small window `w` of an image, its foreground pixels are collinear in color space, and so are its background pixels.
  • More specifically, for a pixel `i in w`, there exist `beta_i`, `gamma_i`, `F_1`, `F_2`, `B_1` and `B_2` such that `F_i=beta_i F_1+(1-beta_i)F_2` and `B_i=gamma_i B_1+(1-gamma_i)B_2`, where `F_i` is the foreground component of pixel `i` and `B_i` is its background component.
  • Moreover, the intensity `I_i` of the pixel `i` consists of `F_i` and `B_i` such that `I_i=alpha_i F_i+(1-alpha_i)B_i` for some `alpha_i`.
  • `I_i=alpha_i(beta_i F_1+(1-beta_i)F_2)+(1-alpha_i)(gamma_i B_1+(1-gamma_i) B_2)`
  • It can also be represented as 
\begin{equation}
\begin{pmatrix} F_2-B_2 & F_1-F_2 &  B_1-B_2 \end{pmatrix}
\begin{pmatrix} \alpha_i \\ \alpha_i\beta_i \\ (1-\alpha_i)\gamma_i  \end{pmatrix} = I_i-B_2 \\
\begin{pmatrix} \alpha_i \\ \alpha_i\beta_i \\ (1-\alpha_i)\gamma_i  \end{pmatrix} =
\begin{pmatrix} F_2-B_2 & F_1-F_2 &  B_1-B_2 \end{pmatrix}^{-1} (I_i-B_2)=
\begin{pmatrix} r_1^t \\ r_2^t \\ r_3^t  \end{pmatrix} (I_i-B_2)
\end{equation}
  • So `alpha_i=r_1^t I_i-r_1^t B_2`, which implies that `alpha_i` is an affine function of `I_i` within `w`: for every window, `I_i` and `alpha_i` are in an affine relationship.
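
This affine relationship is easy to verify numerically. Below is a minimal sketch (Python/NumPy; the colors and mixing coefficients are random placeholders) that composes pixels according to the color line model and recovers every `alpha_i` with the same row vector `r_1^t`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two foreground colors F1, F2 and two background colors B1, B2 (RGB vectors).
F1, F2, B1, B2 = rng.random((4, 3))

# Columns of M are (F2 - B2), (F1 - F2), (B1 - B2); generically invertible.
M = np.column_stack([F2 - B2, F1 - F2, B1 - B2])
r1 = np.linalg.inv(M)[0]                      # first row r_1^t of M^{-1}

# Random per-pixel coefficients inside one window.
n = 5
alpha, beta, gamma = rng.random((3, n))

# Compose pixels according to the color line model.
F = beta[:, None] * F1 + (1 - beta[:, None]) * F2
B = gamma[:, None] * B1 + (1 - gamma[:, None]) * B2
I = alpha[:, None] * F + (1 - alpha[:, None]) * B

# The same affine map recovers alpha_i for every pixel in the window.
print(np.allclose(I @ r1 - r1 @ B2, alpha))   # True
```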


2. Matting Laplacian


  • From the affine relationship between `I_i` and `alpha_i`, `alpha_i approx a I_i+b` for some `a` and `b`. Now consider the cost function `J(alpha, a, b)`:
\begin{equation} J(\alpha, a, b)= \sum_{j=1}^{N} \left( \sum_{i \in w_j} (\alpha_i^j - (a_j I_i^j+b_j))^2 + \epsilon\|a_j\|_2^2 \right)
\end{equation}
   where `N` is the number of windows in the image, which equals the number of pixels because one window is created for each pixel. As the regularization term `epsilon norm{a_j}_2^2` is minimized, `a_j` is driven toward zero, which forces `alpha_i approx b_j` within the window. Here `I_i` is assumed to be a scalar for simplicity, but it can also be extended to a three-dimensional color vector.
  • This cost function can be reformed as follows:
\begin{align} J(\alpha, a, b) &= \sum_{j=1}^{N} \left\|
\begin{pmatrix}
I_1^j & 1 \\
\vdots & \vdots  \\
I_W^j & 1 \\
\sqrt{\epsilon} & 0
\end{pmatrix}
\begin{pmatrix} a_j \\ b_j \end{pmatrix} -
\begin{pmatrix} \alpha_1^j \\ \vdots \\ \alpha_W^j \\ 0 \end{pmatrix} \right\| ^2 =
\sum_{j=1}^{N} \left\| G_j
\begin{pmatrix}  a_j \\ b_j \end{pmatrix} - \bar{\alpha_j} \right\|^2
\end{align}
   where `W` is the number of pixels in each window.
  • Let the optimal `a_j` and `b_j` for the window `w_j` be `a_j^ast` and `b_j^ast`; then, from the normal equation,
\begin{equation}
\begin{pmatrix} a_j^{\ast} \\ b_j^{\ast} \end{pmatrix} =
(G_j^tG_j)^{-1}G_j^{t}\bar{\alpha_j}.
\end{equation}
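
For a single window this least-squares fit can be carried out directly. A minimal sketch (Python/NumPy), where the intensities, alpha values, `W`, and `epsilon` are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1e-4                       # regularization strength epsilon
W = 9                            # number of pixels in the window (e.g. 3x3)

I = rng.random(W)                # grayscale intensities of window w_j
alpha = rng.random(W)            # stand-in alpha values for the same pixels

# G_j stacks the rows (I_i, 1) plus the regularization row (sqrt(eps), 0);
# alpha_bar is alpha padded with a trailing 0.
G = np.vstack([np.column_stack([I, np.ones(W)]), [np.sqrt(eps), 0.0]])
alpha_bar = np.append(alpha, 0.0)

# Optimal (a_j*, b_j*) from the normal equation (G^t G)^{-1} G^t alpha_bar.
a_star, b_star = np.linalg.solve(G.T @ G, G.T @ alpha_bar)

# A generic least-squares solver gives the same answer.
print(np.allclose([a_star, b_star], np.linalg.lstsq(G, alpha_bar, rcond=None)[0]))
```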
  • Now, the cost function `J(alpha)` for this optimal `a_j` and `b_j` is
\begin{align}  J(\alpha) &=\sum_{j=1}^{N} \left\| G_j
\begin{pmatrix} a_j^{\ast} \\ b_j^{\ast} \end{pmatrix} - \bar{\alpha_j} \right\|^2 = \sum_{j=1}^{N} \left\| G_j (G_j^tG_j)^{-1}G_j^{t}\bar{\alpha_j} - \bar{\alpha_j} \right\|^2
\end{align}
    Let `G_j(G_j^tG_j)^{-1}G_j^t = X`; then `X^t = G_j(G_j^tG_j)^{-1}G_j^t = X`, so `X` is symmetric. Moreover, `XX^t = XX = G_j(G_j^tG_j)^{-1}G_j^t G_j(G_j^tG_j)^{-1}G_j^t = G_j(G_j^tG_j)^{-1}G_j^t = X`. It implies that
\begin{align}
J(\alpha) &= \sum_{j=1}^{N} \left\| X \bar{\alpha_j}-\bar{\alpha_j} \right\|^2 =\sum_{j=1}^{N} \left\| (X-I)\bar{\alpha_j} \right\|^2 = \sum_{j=1}^{N}\bar{\alpha_j}^t (X-I)^t(X-I)\bar{\alpha_j} \\
&= \sum_{j=1}^{N} \bar{\alpha_j}^t (X^t-I)(X-I) \bar{\alpha_j} = \sum_{j=1}^{N} \bar{\alpha_j}^t (X^tX-X-X^t+I) \bar{\alpha_j} \\
&= \sum_{j=1}^{N}\bar{\alpha_j}^t (X-X-X^t+I)\bar{\alpha_j} = \sum_{j=1}^{N}\bar{\alpha_j}^t (I-X)\bar{\alpha_j} \\
&= \sum_{j=1}^{N} \bar{\alpha_j}^t(I-G_j(G_j^tG_j)^{-1}G_j^{t})\bar{\alpha_j} =\sum_{j=1}^{N}\bar{\alpha_j}^t \bar{G_j}\bar{\alpha_j}
\end{align}
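
The two properties of `X` used above (symmetry and `XX = X`), and the resulting expression for the per-window residual, can be confirmed numerically; the sketch below uses random stand-ins for `I` and `bar{alpha_j}`:

```python
import numpy as np

rng = np.random.default_rng(1)
eps, W = 1e-4, 9
I = rng.random(W)
alpha_bar = np.append(rng.random(W), 0.0)

G = np.vstack([np.column_stack([I, np.ones(W)]), [np.sqrt(eps), 0.0]])
X = G @ np.linalg.inv(G.T @ G) @ G.T            # projection onto the columns of G

print(np.allclose(X, X.T))                      # X is symmetric
print(np.allclose(X @ X, X))                    # X is idempotent

# The residual of the optimal fit equals alpha_bar^t (I - X) alpha_bar.
p_star = np.linalg.solve(G.T @ G, G.T @ alpha_bar)
residual = np.sum((G @ p_star - alpha_bar) ** 2)
print(np.allclose(residual, alpha_bar @ (np.eye(W + 1) - X) @ alpha_bar))
```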
  • Besides, `G_j^tG_j`, `(G_j^tG_j)^{-1}`, `(G_j^tG_j)^{-1}G_j^t`, and `G_j(G_j^tG_j)^{-1}G_j^t` are as follows:
\begin{align} G_j^tG_j &= 
\begin{pmatrix} I_1^j & \cdots & I_W^j & \sqrt{\epsilon} \\ 1 & \cdots & 1 & 0 \end{pmatrix}
\begin{pmatrix}
I_1^j & 1 \\
\vdots & \vdots  \\
I_W^j & 1 \\
\sqrt{\epsilon} & 0
\end{pmatrix} =
\begin{pmatrix}
\epsilon+\sum_{i=1}^{W}I_i^2 & \sum_{i=1}^{W}I_i \\
\sum_{i=1}^{W}I_i & W
\end{pmatrix} \\ &= 
\begin{pmatrix}
W(\sigma^2+\mu^2)+\epsilon & W\mu \\
W\mu & W
\end{pmatrix} \\ \\

(G_j^tG_j)^{-1} &= \frac{1}{W^2 (\sigma^2+\mu^2) +W\epsilon-W^2\mu^2}
\begin{pmatrix} W & -W\mu \\ -W\mu & W( \sigma^2+\mu^2 )+\epsilon
\end{pmatrix} \\ &= \frac{1}{W^2\sigma^2 +W\epsilon}
\begin{pmatrix} W & -W\mu \\ -W\mu & W(\sigma^2+\mu^2 )+\epsilon
\end{pmatrix} \\ \\

(G_j^tG_j)^{-1}G_j^t &= \frac{1}{W^2\sigma^2 +W\epsilon}
\begin{pmatrix} W & -W\mu \\ -W\mu & W(\sigma^2+\mu^2 )+\epsilon
\end{pmatrix}
\begin{pmatrix} I_1^j & \cdots & I_W^j & \sqrt{\epsilon} \\ 1 & \cdots & 1 & 0 \end{pmatrix} \\ 
&= \frac{1}{W\sigma^2+\epsilon}
\begin{pmatrix} I_1^j-\mu & \cdots & I_W^j-\mu & \sqrt{\epsilon} \\
-\mu I_1^j+\sigma^2+\mu^2+\frac{\epsilon}{W} & \cdots & -\mu I_W^j+\sigma^2+\mu^2+\frac{\epsilon}{W} & -\mu\sqrt{\epsilon}
\end{pmatrix} \\ \\

G_j(G_j^tG_j)^{-1}G_j^t &
\end{align}
\begin{align} \  
&= \frac{1}{W\sigma^2+\epsilon}
\begin{pmatrix}
I_1^j & 1 \\
\vdots & \vdots  \\
I_W^j & 1 \\
\sqrt{\epsilon} & 0
\end{pmatrix}
\begin{pmatrix} I_1^j-\mu & \cdots & I_W^j-\mu & \sqrt{\epsilon} \\
-\mu I_1^j+\sigma^2+\mu^2+\frac{\epsilon}{W} & \cdots & -\mu I_W^j+\sigma^2+\mu^2+\frac{\epsilon}{W} & -\mu\sqrt{\epsilon}
\end{pmatrix} \\ &= \frac{1}{W\sigma^2+\epsilon}
\begin{pmatrix}
\ddots & \  & \  & \sqrt{\epsilon}(I_1^j-\mu) \\
\  & I_uI_v - I_u\mu+\sigma^2+\mu^2-I_v\mu+\frac{\epsilon}{W} & \  & \vdots \\
\  &  \  & \ddots & \sqrt{\epsilon}(I_W^j-\mu) \\
\sqrt{\epsilon}(I_1^j-\mu) & \cdots & \sqrt{\epsilon}(I_W^j-\mu) & \epsilon
\end{pmatrix}
\end{align}
    where `u, v in [1, W]` are integers. From the equations above, `(bar{G_j})_{uv}`, the `(u, v)` element of `bar{G_j}`, is
\begin{align} (\bar{G_j})_{uv} &= (I- G_j(G_j^tG_j)^{-1}G_j^t )_{uv} \\
&=\delta_{uv} - \frac{1}{W\sigma^2+\epsilon} \left( I_uI_v-I_u\mu+\sigma^2+\mu^2-I_v\mu+\frac{\epsilon}{W} \right) \\
&= \delta_{uv} - \frac{1}{W\sigma^2+\epsilon} \left( \sigma^2+\frac{\epsilon}{W}+(I_u-\mu)(I_v-\mu) \right) \\ &= \delta_{uv} - \frac{1}{W} \left( 1+\frac{1}{\sigma^2+\frac{\epsilon}{W}}(I_u-\mu)(I_v-\mu) \right)
\end{align}
    where `delta_{uv}` is the Kronecker delta, which equals `1` if `u=v` and `0` otherwise.
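
This closed form can be checked against the direct computation of `bar{G_j} = I - G_j(G_j^tG_j)^{-1}G_j^t`; a small sketch with random intensities (note that `sigma^2` is the biased window variance):

```python
import numpy as np

rng = np.random.default_rng(2)
eps, W = 1e-4, 9
I = rng.random(W)
mu, var = I.mean(), I.var()                    # window mean and (biased) variance

G = np.vstack([np.column_stack([I, np.ones(W)]), [np.sqrt(eps), 0.0]])
G_bar = np.eye(W + 1) - G @ np.linalg.inv(G.T @ G) @ G.T

# Closed form for the top-left W x W block of G_bar.
d = I - mu
closed = np.eye(W) - (1.0 + np.outer(d, d) / (var + eps / W)) / W

print(np.allclose(G_bar[:W, :W], closed))      # True
```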
  • Let the top-left `W times W` submatrix of `bar{G_j}` be `[bar{G_j}]`. Now, `J(alpha)` can be written in the form `m^t L m`, where `L` is the matting Laplacian. Here, `m` is an `N times 1` matte vector whose `k`-th element `m_k` is the matte value of the `k`-th pixel. Since each pixel belongs to several windows, there are multiple `(i, j)` pairs such that `alpha_i^j = m_k`.
\begin{align} J(\alpha) &= \sum_{j=1}^{N}\bar{\alpha_j}^t \bar{G_j}\bar{\alpha_j} = \sum_{j=1}^{N}
\begin{pmatrix} \alpha_1^j & \cdots & \alpha_W^j & 0 \end{pmatrix} \bar{G_j}
\begin{pmatrix} \alpha_1^j \\ \vdots \\ \alpha_W^j \\ 0 \end{pmatrix} \\ &= \sum_{j=1}^{N}
\begin{pmatrix} \alpha_1^j & \cdots & \alpha_W^j \end{pmatrix} [\bar{G_j}]
\begin{pmatrix} \alpha_1^j \\ \vdots \\ \alpha_W^j \end{pmatrix} = m^t L m \\ \\ &=
\begin{pmatrix} m_1 & \cdots & m_N \end{pmatrix}
\begin{pmatrix}
L_{11} & \cdots & L_{1N} \\
\vdots & \ddots & \vdots \\
L_{N1} & \cdots & L_{NN}
\end{pmatrix}
\begin{pmatrix} m_1 \\ \vdots \\ m_N \end{pmatrix} \\ \\ &=
\begin{matrix}
m_1 L_{11} m_1 + m_1 L_{12} m_2 + \cdots + m_1 L_{1N} m_N + \ \\
\cdots \\
m_i L_{i1} m_1 + m_i L_{i2} m_2 + \cdots + m_i L_{iN} m_N + \ \\
\cdots \\
m_N L_{N1} m_1 + m_N L_{N2} m_2 + \cdots + m_N L_{NN} m_N
\end{matrix}
\end{align}
  • `L_{kl}` appears only in the `m_k L_{kl} m_l` term. Therefore, `L_{kl}` is the sum of the coefficients of all `alpha_{i_1}^{j_1} alpha_{i_2}^{j_2}` products that correspond to `m_k m_l` in ` \sum_{j=1}^{N}\bar{\alpha_j}^t \bar{G_j}\bar{\alpha_j} `.
   In other words, let `w_{kl}` be the window-set of windows that contain both the `k`-th and `l`-th pixels. For each `w in w_{kl}` there is a pair of positions `(p_w, q_w)` such that the `p_w`-th and `q_w`-th entries of `bar{alpha_w}` are `m_k` and `m_l`, respectively. Therefore, `L_{kl}` is as follows:
\begin{align} L_{kl} &= \sum_{w \in w_{kl}} (\bar{G_w})_{p_w q_w} = \sum_{w \in w_{kl}} \left( \delta_{p_w q_w} - \frac{1}{W} \left( 1+\frac{1}{\sigma_w^2+\frac{\epsilon}{W}}(I_{p_w}-\mu_w)(I_{q_w}-\mu_w) \right) \right)
\end{align}
  • `L` is symmetric since `L_{kl}=L_{lk}`.
  • Finally, `J=m^t L m`, so it can be minimized by solving `Lm=0`, since `frac{partial J}{partial m}=2m^t L=0`.
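
Putting the pieces together, the sketch below assembles `L` for a small grayscale image by accumulating the closed-form window blocks `[bar{G_w}]` into an `N times N` matrix. It is only an illustration: the window radius, `epsilon`, and the dense storage are arbitrary choices, and for simplicity only windows fully contained in the image are used.

```python
import numpy as np

def matting_laplacian(img, eps=1e-7, r=1):
    """Dense matting Laplacian of a grayscale image (illustrative sketch).

    For every (2r+1)x(2r+1) window fully inside the image, the block
    delta_uv - (1/W)(1 + (I_u - mu)(I_v - mu) / (var + eps/W))
    is accumulated into L at the window's pixel indices.
    """
    h, w = img.shape
    N, W = h * w, (2 * r + 1) ** 2
    idx = np.arange(N).reshape(h, w)
    L = np.zeros((N, N))
    for y in range(r, h - r):
        for x in range(r, w - r):
            win = idx[y - r:y + r + 1, x - r:x + r + 1].ravel()
            I = img[y - r:y + r + 1, x - r:x + r + 1].ravel()
            d = I - I.mean()
            block = np.eye(W) - (1.0 + np.outer(d, d) / (I.var() + eps / W)) / W
            L[np.ix_(win, win)] += block       # scatter-add over the window's pixels
    return L

rng = np.random.default_rng(3)
L = matting_laplacian(rng.random((8, 8)))
print(np.allclose(L, L.T))                     # L is symmetric
```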


3. Adding Constraints

  • Any constant `m` is in the nullspace of `L`. Consider `cL(1, cdots, 1)^t` for `c in mathbb{R}`; its `i`-th element is `c sum_{j=1}^N L_{ij}`. For the window-set `w_{ii}` of windows that include the `i`-th pixel, let `p_w` denote the position of `m_i` in `bar{alpha_w}` for each `w in w_{ii}`. Then,
\begin{align} \sum_{j=1}^{N} L_{ij} &= \sum_{w \in w_{ii}} \sum_{u \in w} (\bar{G_w})_{p_w u} = \sum_{w \in w_{ii}} \sum_{u \in w} \left( \delta_{p_w u} - \frac{1}{W}\left( 1+\frac{1}{\sigma_w^2+\frac{\epsilon}{W}}(I_{p_w}-\mu_w)(I_u-\mu_w) \right) \right) \\ &= 
\sum_{w \in w_{ii}} \left( \sum_{u \in w}\delta_{p_w u} - \sum_{u \in w} \frac{1}{W}\left( 1+\frac{1}{\sigma_w^2+\frac{\epsilon}{W}}(I_{p_w}-\mu_w)(I_u-\mu_w) \right) \right) \\ &= 
\sum_{w \in w_{ii}} \left(
1 - 1 - \frac{1}{W\sigma_w^2+\epsilon}(I_{p_w}-\mu_w)\sum_{u \in w}(I_u-\mu_w) \right) \\ &=
\sum_{w \in w_{ii}} \left( - \frac{1}{W\sigma_w^2+\epsilon}(I_{p_w}-\mu_w) (W\mu_w - W\mu_w) \right) \\ &=
\sum_{w \in w_{ii}} 0 = 0
\end{align}
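
The row-sum argument can also be checked per window, since each block `[bar{G_w}]` already has zero row sums; a tiny sketch with a random window:

```python
import numpy as np

rng = np.random.default_rng(4)
eps, W = 1e-4, 9
I = rng.random(W)
d = I - I.mean()

# Closed-form window block [G_bar_w] from section 2.
block = np.eye(W) - (1.0 + np.outer(d, d) / (I.var() + eps / W)) / W

# Each row sums to zero, so the assembled L maps constant vectors to zero.
print(np.allclose(block @ np.ones(W), 0.0))    # True
```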
  • So user scribbles are required: the user marks some pixels in the image as known foreground and some as known background. Let `U` be an `N times 1` vector whose `i`-th element is `1` if pixel `i` is a known foreground pixel and `0` otherwise, let `F={f | U_f=1}`, and let `B` be the set of known background pixels.
\begin{align} J = \min_m \ m^t L m + \lambda \sum_{f \in F} (m_f - 1)^2 + \lambda \sum_{b \in B} m_b^2 =
\min_m \ m^t L m + \lambda (m-U)^t D (m-U)
\end{align}
   where `D` is an `N times N` diagonal matrix with `D_{ii}=1` if pixel `i` is scribbled (known foreground or background) and `0` otherwise. Without the background term, the trivial constant matte `m=(1, cdots, 1)^t` would minimize the cost. Note that `U^t D = U^t` because `U` is zero at background scribbles.
  • To minimize `J`, we need the following:
\begin{align} \frac{\partial J}{\partial m} &= 2m^t L + \lambda(m-U)^t(D+D^t)=2m^t L +2\lambda(m^t-U^t)D \\ &= 2m^t (L+\lambda D) - 2\lambda U^t D =0 \\ \\ &\Rightarrow
m^t (L+\lambda D) = \lambda U^t D = \lambda U^t \\ \\&\Rightarrow
(L+\lambda D)^t m = (L+\lambda D) m =  \lambda U
\end{align}
   Therefore, the solution can be obtained from the linear system `(L+\lambda D) m = \lambda U`, where `lambda` is a large number, for example `100`.
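
As an end-to-end sanity check, here is a minimal sketch (Python with NumPy/SciPy) that rebuilds the small grayscale Laplacian from the earlier sketch, marks one foreground and one background pixel on a synthetic two-region image, and solves the sparse system; the image, scribble positions, and `lambda = 100` are all illustrative choices:

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import spsolve

def matting_laplacian(img, eps=1e-7, r=1):
    # Same closed-form assembly as the earlier sketch, repeated for self-containment.
    h, w = img.shape
    N, W = h * w, (2 * r + 1) ** 2
    idx = np.arange(N).reshape(h, w)
    L = np.zeros((N, N))
    for y in range(r, h - r):
        for x in range(r, w - r):
            win = idx[y - r:y + r + 1, x - r:x + r + 1].ravel()
            I = img[y - r:y + r + 1, x - r:x + r + 1].ravel()
            d = I - I.mean()
            L[np.ix_(win, win)] += np.eye(W) - (1.0 + np.outer(d, d) / (I.var() + eps / W)) / W
    return L

# Synthetic image: dark left half (background), bright right half (foreground).
h, w = 10, 10
img = np.zeros((h, w))
img[:, w // 2:] = 1.0
L = csr_matrix(matting_laplacian(img))

# Scribbles: U marks known foreground, D marks every scribbled pixel.
U = np.zeros(h * w)
U[5 * w + 8] = 1.0                      # foreground scribble (right half)
scribbled = U.copy()
scribbled[5 * w + 1] = 1.0              # background scribble (left half)
D = diags(scribbled)
lam = 100.0

m = spsolve((L + lam * D).tocsr(), lam * U)
print(m.reshape(h, w).round(2))         # ~0 on the left half, ~1 on the right
```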


4. The eigenvectors of `L`

  • Without user scribbles, minimizing `J=m^t L m` amounts to solving `Lm=0`.
  • Consider `Lv=lambda v`, where `v` is an eigenvector of `L` and `lambda` is its eigenvalue. If we choose an eigenvalue `lambda` close to `0`, then `Lv=lambda v approx 0=Lm`.
  • This eigenvector `v` looks like a grayscale image.
  • Let the set `Phi={v | Lv=lambda v, lambda approx 0}`. Then, for `s=|Phi|`, `Lv_1 + cdots + Lv_s = L(v_1 + cdots + v_s) approx 0`. It means that any combination of the elements of `Phi` can be a candidate for `m`. More formally, there exist constants `c_i in mathbb{R}` such that `m approx sum_{i=1}^{s} c_i v_i` and `sum_{i=1}^{s} c_i=1`. However, it is more realistic to let the weights vary per pixel: define `N times 1` vectors `beta_i` such that `m approx sum_{i=1}^{s} beta_i v_i` elementwise and `sum_{i=1}^{s} beta_{i, j}=1`, where `beta_{i, j}` denotes the `j`-th element of `beta_i` for `1 leq j leq N`.
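
The near-null eigenvectors can be inspected on the same synthetic two-region image; a small sketch (the Laplacian builder is repeated so the snippet runs on its own, and all sizes are illustrative):

```python
import numpy as np

def matting_laplacian(img, eps=1e-7, r=1):
    # Same closed-form assembly as in section 2, repeated for self-containment.
    h, w = img.shape
    N, W = h * w, (2 * r + 1) ** 2
    idx = np.arange(N).reshape(h, w)
    L = np.zeros((N, N))
    for y in range(r, h - r):
        for x in range(r, w - r):
            win = idx[y - r:y + r + 1, x - r:x + r + 1].ravel()
            I = img[y - r:y + r + 1, x - r:x + r + 1].ravel()
            d = I - I.mean()
            L[np.ix_(win, win)] += np.eye(W) - (1.0 + np.outer(d, d) / (I.var() + eps / W)) / W
    return L

h, w = 10, 10
img = np.zeros((h, w))
img[:, w // 2:] = 1.0                      # two-region image: dark left, bright right
L = matting_laplacian(img)

vals, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
print(vals[:2])                            # both are close to zero for this image

# The bright half (a plausible matte) lies almost entirely in the span of these
# near-null eigenvectors, i.e. it is approximately a combination of them.
m_true = img.ravel()
V = vecs[:, :2]
print(np.linalg.norm(V @ (V.T @ m_true) - m_true))   # small residual
```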
  • Meanwhile, it is likely that most elements of `m` are `0` or `1` in the ground truth. This is because `m_i` lies strictly between `0` and `1` only when the `i`-th pixel is on the boundary of the foreground. So, if `m` is well estimated, most of its elements should be `0` or `1`.
  • Therefore, `J` can be reformed as follows:
\begin{align} J= \min \sum_{i=1}^{s} \sum_{j=1}^{N} \left( | \beta_{i, j} |^\rho +  | 1-\beta_{i, j} |^\rho \right)
\end{align}
   Each element of `beta_i` is pushed toward `0` or `1` when this cost is minimized (for `rho < 1`). This cost function can produce a good matte even when user scribbles are not given.
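
A quick numerical illustration of why this penalty favors `0`/`1` values, assuming `rho < 1` (here `rho = 0.5`):

```python
import numpy as np

# The per-element penalty |t|^rho + |1 - t|^rho with rho < 1 is smallest at the
# endpoints t = 0 and t = 1, so minimizing it pushes each beta_{i, j} toward 0 or 1.
rho = 0.5
t = np.linspace(0.0, 1.0, 101)
cost = np.abs(t) ** rho + np.abs(1.0 - t) ** rho
print(t[np.argmin(cost)], cost.min())      # 0.0 1.0 : minimized at an endpoint
print(cost[50])                            # ~1.41 at t = 0.5, larger than at 0 or 1
```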



