
# 1. Color Line Assumption

• Matting is a technique that extracts the foreground from an image.
• Matting relies on the color line assumption: within a small window $w$ of an image, the foreground colors lie on a single line in color space, and so do the background colors.
• More specifically, for a pixel $i$ in $w$, there exist $\beta_i$, $\gamma_i$, $F_1$, $F_2$, $B_1$, and $B_2$ such that $F_i=\beta_i F_1+(1-\beta_i)F_2$ and $B_i=\gamma_i B_1+(1-\gamma_i)B_2$, where $F_i$ and $B_i$ are the foreground and background components of the pixel $i$.
• Moreover, the intensity $I_i$ of the pixel $i$ is a blend of $F_i$ and $B_i$: $I_i=\alpha_i F_i+(1-\alpha_i)B_i$ for some $\alpha_i$.
• Substituting the first two equations into the third gives $I_i=\alpha_i(\beta_i F_1+(1-\beta_i)F_2)+(1-\alpha_i)(\gamma_i B_1+(1-\gamma_i) B_2)$.
• It can also be represented as
$$\begin{pmatrix} F_2-B_2 & F_1-F_2 & B_1-B_2 \end{pmatrix} \begin{pmatrix} \alpha_i \\ \alpha_i\beta_i \\ (1-\alpha_i)\gamma_i \end{pmatrix} = I_i-B_2 \\ \begin{pmatrix} \alpha_i \\ \alpha_i\beta_i \\ (1-\alpha_i)\gamma_i \end{pmatrix} = \begin{pmatrix} F_2-B_2 & F_1-F_2 & B_1-B_2 \end{pmatrix}^{-1} (I_i-B_2)= \begin{pmatrix} r_1^t \\ r_2^t \\ r_3^t \end{pmatrix} (I_i-B_2)$$
• So $\alpha_i=r_1^t I_i-r_1^t B_2$. Since $r_1$ and $B_2$ are fixed within the window, this means that for every window, $\alpha_i$ and $I_i$ are in an affine relationship.
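To make the algebra concrete, here is a minimal numeric sketch (the colors and mixing weights below are arbitrary values chosen for illustration, not from the text) that composites a pixel under the color line model and then recovers $\alpha_i$ with the inverse above:

```python
import numpy as np

# Arbitrary color-line endpoints (RGB), chosen only for illustration.
F1, F2 = np.array([0.9, 0.2, 0.1]), np.array([0.7, 0.4, 0.2])
B1, B2 = np.array([0.1, 0.3, 0.8]), np.array([0.2, 0.2, 0.6])

alpha, beta, gamma = 0.6, 0.3, 0.8      # ground-truth mixing weights
F = beta * F1 + (1 - beta) * F2         # foreground color on its line
B = gamma * B1 + (1 - gamma) * B2       # background color on its line
I = alpha * F + (1 - alpha) * B         # composited pixel intensity

# Columns are F2-B2, F1-F2, B1-B2; the unknown vector is
# (alpha, alpha*beta, (1-alpha)*gamma).
M = np.column_stack([F2 - B2, F1 - F2, B1 - B2])
x = np.linalg.solve(M, I - B2)
print(x[0])                             # ~0.6: the first component is alpha
```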

# 2. Matting Laplacian

• From the affine relationship between $I_i$ and $\alpha_i$, we have $\alpha_i \approx a I_i+b$ for some $a$ and $b$. Now consider the cost function $J(\alpha, a, b)$:
$$J(\alpha, a, b)= \sum_{j=1}^{N} \sum_{i \in w_j} (\alpha_i^j - (a_j I_i^j+b_j))^2 + \epsilon\|a_j\|_2^2$$
where $N$ is the number of windows in the image, which equals the number of pixels because one window is created per pixel. As the regularization term $\epsilon\|a_j\|_2^2$ is minimized, $a_j$ is pushed toward zero, which drives $\alpha_i \approx b_j$, a locally constant matte. Here $I_i$ is assumed to be a scalar for simplicity, but the derivation extends to three-dimensional color vectors.
• This cost function can be rewritten as follows:
\begin{align} J(\alpha, a, b) &= \sum_{j=1}^{N} \left\|\begin{pmatrix} I_1^j & 1 \\ \vdots & \vdots  \\ I_W^j & 1 \\ \sqrt{\epsilon} & 0 \end{pmatrix} \begin{pmatrix} a_j \\ b_j \end{pmatrix} -\begin{pmatrix} \alpha_1^j \\ \vdots \\ \alpha_W^j \\ 0 \end{pmatrix} \right\| ^2 = \sum_{j=1}^{N} \left\| G_j \begin{pmatrix}  a_j \\ b_j \end{pmatrix} - \bar{\alpha_j} \right\|^2 \end{align}
where W is the number of pixels in each window.
• Let the optimal $a_j$ and $b_j$ for the window $w_j$ be $a_j^\ast$ and $b_j^\ast$. Then, from the normal equations,
$$\begin{pmatrix} a_j^{\ast} \\ b_j^{\ast} \end{pmatrix} = (G_j^tG_j)^{-1}G_j^{t}\bar{\alpha_j}.$$
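As a sketch of this step (scalar intensities and a made-up window; `fit_window` is a hypothetical helper, not from the text), the per-window coefficients can be computed directly from the normal equations:

```python
import numpy as np

def fit_window(I_w, alpha_w, eps=1e-4):
    """Fit alpha ~ a*I + b over one window by regularized least squares."""
    W = len(I_w)
    # G_j stacks rows (I_i, 1) plus the regularization row (sqrt(eps), 0).
    G = np.vstack([np.column_stack([I_w, np.ones(W)]),
                   [np.sqrt(eps), 0.0]])
    alpha_bar = np.append(alpha_w, 0.0)     # the alpha vector padded with 0
    a, b = np.linalg.solve(G.T @ G, G.T @ alpha_bar)
    return a, b

# Tiny check on synthetic data: alpha = 2*I + 0.1 should be recovered.
I_w = np.array([0.1, 0.2, 0.3, 0.4])
print(fit_window(I_w, 2 * I_w + 0.1))       # ~(2.0, 0.1)
```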
• Now, the cost function $J(\alpha)$ at these optimal $a_j$ and $b_j$ is
\begin{align}  J(\alpha) &=\sum_{j=1}^{N} \left\| G_j \begin{pmatrix} a_j^{\ast} \\ b_j^{\ast} \end{pmatrix} - \bar{\alpha_j} \right\|^2 = \sum_{j=1}^{N} \left\| G_j (G_j^tG_j)^{-1}G_j^{t}\bar{\alpha_j} - \bar{\alpha_j} \right\|^2 \end{align}
Let $G_j(G_j^tG_j)^{-1}G_j^{t}=X$. Then $X^t= G_j(G_j^tG_j)^{-1}G_j^{t}=X$, so $X$ is symmetric. Moreover, $XX^t=XX=G_j(G_j^tG_j)^{-1}G_j^{t} G_j(G_j^tG_j)^{-1}G_j^{t}= G_j(G_j^tG_j)^{-1}G_j^{t}=X$, so $X$ is also idempotent (it is the orthogonal projection onto the column space of $G_j$). It implies that
\begin{align} J(\alpha) &= \sum_{j=1}^{N} \left\| X \bar{\alpha_j}-\bar{\alpha_j} \right\|^2 =\sum_{j=1}^{N} \left\| (X-I)\bar{\alpha_j} \right\|^2 = \sum_{j=1}^{N}\bar{\alpha_j}^t (X-I)^t(X-I)\bar{\alpha_j} \\&= \sum_{j=1}^{N} \bar{\alpha_j}^t (X^t-I)(X-I) \bar{\alpha_j} = \sum_{j=1}^{N} \bar{\alpha_j}^t (X^tX-X-X^t+I) \bar{\alpha_j} \\&= \sum_{j=1}^{N}\bar{\alpha_j}^t (X-X-X^t+I)\bar{\alpha_j} = \sum_{j=1}^{N}\bar{\alpha_j}^t (I-X)\bar{\alpha_j} \\&= \sum_{j=1}^{N} \bar{\alpha_j}^t(I-G_j(G_j^tG_j)^{-1}G_j^{t})\bar{\alpha_j} =\sum_{j=1}^{N}\bar{\alpha_j}^t \bar{G_j}\bar{\alpha_j} \end{align}
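A quick numerical check of the two properties of $X$ used above (any full-column-rank matrix works in place of $G_j$; the $G$ below is random):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.random((10, 2))                   # stands in for some G_j
X = G @ np.linalg.inv(G.T @ G) @ G.T      # projection onto col(G)

print(np.allclose(X, X.T))                # True: X is symmetric
print(np.allclose(X @ X, X))              # True: X is idempotent
```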
• In addition, $G_j^tG_j$, $(G_j^tG_j)^{-1}$, $(G_j^tG_j)^{-1}G_j^t$, and $G_j(G_j^tG_j)^{-1}G_j^t$ are as follows:
\begin{align} G_j^tG_j &= \begin{pmatrix} I_1^j & \cdots & I_W^j & \sqrt{\epsilon} \\ 1 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} I_1^j & 1 \\ \vdots & \vdots \\ I_W^j & 1 \\ \sqrt{\epsilon} & 0 \end{pmatrix} = \begin{pmatrix} \epsilon+\sum_{i=1}^{W}I_i^2 & \sum_{i=1}^{W}I_i \\ \sum_{i=1}^{W}I_i & W \end{pmatrix} \\ &= \begin{pmatrix} W(\sigma^2+\mu^2)+\epsilon & W\mu \\ W\mu & W \end{pmatrix} \\ \\ (G_j^tG_j)^{-1} &= \frac{1}{W^2 (\sigma^2+\mu^2) +W\epsilon-W^2\mu^2} \begin{pmatrix} W & -W\mu \\ -W\mu & W( \sigma^2+\mu^2 )+\epsilon \end{pmatrix} \\ &= \frac{1}{W^2\sigma^2 +W\epsilon}\begin{pmatrix} W & -W\mu \\ -W\mu & W(\sigma^2+\mu^2 )+\epsilon\end{pmatrix} \\ \\ (G_j^tG_j)^{-1}G_j^t &= \frac{1}{W^2\sigma^2 +W\epsilon} \begin{pmatrix} W & -W\mu \\ -W\mu & W(\sigma^2+\mu^2 )+\epsilon\end{pmatrix} \begin{pmatrix} I_1^j & \cdots & I_W^j & \sqrt{\epsilon} \\ 1 & \cdots & 1 & 0 \end{pmatrix} \\ &= \frac{1}{W\sigma^2+\epsilon} \begin{pmatrix} I_1^j-\mu & \cdots & I_W^j-\mu & \sqrt{\epsilon} \\ -\mu I_1^j+\sigma^2+\mu^2+\frac{\epsilon}{W} & \cdots & -\mu I_W^j+\sigma^2+\mu^2+\frac{\epsilon}{W} & -\mu\sqrt{\epsilon}\end{pmatrix} \\ \\ G_j(G_j^tG_j)^{-1}G_j^t &= \frac{1}{W\sigma^2+\epsilon} \begin{pmatrix} I_1^j & 1 \\ \vdots & \vdots \\ I_W^j & 1 \\ \sqrt{\epsilon} & 0 \end{pmatrix} \begin{pmatrix} I_1^j-\mu & \cdots & I_W^j-\mu & \sqrt{\epsilon} \\ -\mu I_1^j+\sigma^2+\mu^2+\frac{\epsilon}{W} & \cdots & -\mu I_W^j+\sigma^2+\mu^2+\frac{\epsilon}{W} & -\mu\sqrt{\epsilon}\end{pmatrix} \\ &= \frac{1}{W\sigma^2+\epsilon} \begin{pmatrix} \ddots & & & \sqrt{\epsilon}(I_1^j-\mu) \\ & I_uI_v - I_u\mu+\sigma^2+\mu^2-I_v\mu+\frac{\epsilon}{W} & & \vdots \\ & & \ddots & \sqrt{\epsilon}(I_W^j-\mu) \\ \sqrt{\epsilon}(I_1^j-\mu) & \cdots & \sqrt{\epsilon}(I_W^j-\mu) & \epsilon \end{pmatrix} \end{align}
where $\mu$ and $\sigma^2$ are the mean and variance of the intensities in the window, and $u, v \in \{1, \dots, W\}$ are integers. From the equations above, $(\bar{G_j})_{uv}$, the $(u, v)$ element of $\bar{G_j}$, is
\begin{align} (\bar{G_j})_{uv} &= \left( I - G_j(G_j^tG_j)^{-1}G_j^t \right)_{uv} \\ &= \delta_{uv} - \frac{1}{W\sigma^2+\epsilon} \left( I_uI_v-I_u\mu+\sigma^2+\mu^2-I_v\mu+\frac{\epsilon}{W} \right) \\ &= \delta_{uv} - \frac{1}{W\sigma^2+\epsilon} \left( \sigma^2+\frac{\epsilon}{W}+(I_u-\mu)(I_v-\mu) \right) \\ &= \delta_{uv} - \frac{1}{W} \left( 1+\frac{1}{\sigma^2+\frac{\epsilon}{W}}(I_u-\mu)(I_v-\mu) \right) \end{align}
where $\delta_{uv}$ is the Kronecker delta, which is 1 if $u=v$ and 0 otherwise.
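The closed form can be checked numerically against the direct computation $I - G_j(G_j^tG_j)^{-1}G_j^t$; a small sketch with an arbitrary window:

```python
import numpy as np

W, eps = 9, 1e-4
rng = np.random.default_rng(0)
I_w = rng.random(W)                          # arbitrary window intensities
mu, var = I_w.mean(), I_w.var()              # mu and sigma^2 of the window

# Direct computation of G_bar = I - G (G^t G)^{-1} G^t.
G = np.vstack([np.column_stack([I_w, np.ones(W)]), [np.sqrt(eps), 0.0]])
G_bar = np.eye(W + 1) - G @ np.linalg.inv(G.T @ G) @ G.T

# Closed form: delta_uv - (1/W)(1 + (I_u-mu)(I_v-mu) / (sigma^2 + eps/W)).
d = I_w - mu
closed = np.eye(W) - (1.0 + np.outer(d, d) / (var + eps / W)) / W
print(np.allclose(G_bar[:W, :W], closed))    # True
```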
• Let the top-left $W\times W$ submatrix of $\bar{G_j}$ be $[\bar{G_j}]$. Now, $J(\alpha)$ can be represented in the form $m^t L m$, where $L$ is the matting Laplacian. Here $m$ is an $N\times 1$ matte vector whose $k$-th element $m_k$ is the matte of the $k$-th pixel. Since windows overlap, there are multiple $(i, j)$ pairs such that $\alpha_i^j = m_k$.
\begin{align} J(\alpha) &= \sum_{j=1}^{N}\bar{\alpha_j}^t \bar{G_j}\bar{\alpha_j} = \sum_{j=1}^{N} \begin{pmatrix} \alpha_1^j & \cdots & \alpha_W^j & 0 \end{pmatrix} \bar{G_j} \begin{pmatrix} \alpha_1^j \\ \vdots \\ \alpha_W^j \\ 0 \end{pmatrix} \\ &= \sum_{j=1}^{N} \begin{pmatrix} \alpha_1^j & \cdots & \alpha_W^j \end{pmatrix} [\bar{G_j}]\begin{pmatrix} \alpha_1^j \\ \vdots \\ \alpha_W^j \end{pmatrix} = m^t L m \\ \\ &= \begin{pmatrix} m_1 & \cdots & m_N \end{pmatrix} \begin{pmatrix} L_{11} & \cdots & L_{1N} \\\vdots & \ddots & \vdots \\L_{N1} & \cdots & L_{NN} \end{pmatrix}\begin{pmatrix} m_1 \\ \vdots \\ m_N \end{pmatrix} \\ \\ &= \begin{matrix}m_1 L_{11} m_1 + m_1 L_{12} m_2 + \cdots + m_1 L_{1N} m_N + \ \\\cdots \\m_i L_{i1} m_1 + m_i L_{i2} m_2 + \cdots + m_i L_{iN} m_N + \ \\\cdots \\m_N L_{N1} m_1 + m_N L_{N2} m_2 + \cdots + m_N L_{NN} m_N\end{matrix} \end{align}
• $L_{kl}$ appears only in the $m_k L_{kl} m_l$ term. Therefore, $L_{kl}$ is the sum of the coefficients of the $\alpha_{i_1}^{j_1} \alpha_{i_2}^{j_2}$ pairs that correspond to $m_k m_l$ in $\sum_{j=1}^{N}\bar{\alpha_j}^t \bar{G_j}\bar{\alpha_j}$.

In other words, let $w_{kl}$ be the set of windows that contain both the $k$-th and $l$-th pixels. For each $w \in w_{kl}$ there are within-window indices $p_w$ and $q_w$ such that $(\bar{\alpha_w})_{p_w} = m_k$ and $(\bar{\alpha_w})_{q_w} = m_l$, so $L_{kl}$ can be read off by collecting the corresponding coefficients of $\bar{\alpha_w}^t \bar{G_w} \bar{\alpha_w}$ over $w \in w_{kl}$. Therefore, $L_{kl}$ is as follows:
\begin{align} L_{kl} &= \sum_{w \in w_{kl}} (\bar{G_w})_{p_w q_w} = \sum_{w \in w_{kl}} \left( \delta_{p_w q_w} - \frac{1}{W} \left( 1+\frac{1}{\sigma_w^2+\frac{\epsilon}{W}}(I_{p_w}-\mu_w)(I_{q_w}-\mu_w) \right) \right) \end{align}
• $L$ is symmetric since $L_{kl}=L_{lk}$.
• Finally, $J=m^tLm$, so $J$ can be minimized by solving $Lm=0$, because $\frac{\partial J}{\partial m}=2m^t L=0$.
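Putting the pieces together, the matting Laplacian can be assembled by scattering each window's $[\bar{G_w}]$ block into a sparse $N\times N$ matrix. The following is a minimal sketch (not the text's code) for a grayscale image, restricting windows to the image interior for simplicity:

```python
import numpy as np
from scipy.sparse import coo_matrix

def matting_laplacian(img, eps=1e-4, r=1):
    """Accumulate the closed-form window blocks into a sparse L.

    img: 2-D grayscale array in [0, 1]; r: window radius, so W = (2r+1)^2.
    Windows are restricted to the image interior for simplicity.
    """
    h, w = img.shape
    idx = np.arange(h * w).reshape(h, w)
    W = (2 * r + 1) ** 2
    rows, cols, vals = [], [], []
    for y in range(r, h - r):
        for x in range(r, w - r):
            win = img[y - r:y + r + 1, x - r:x + r + 1].ravel()
            ids = idx[y - r:y + r + 1, x - r:x + r + 1].ravel()
            d = win - win.mean()
            # (G_bar)_{uv} for every pixel pair (u, v) in this window.
            g = np.eye(W) - (1.0 + np.outer(d, d) / (win.var() + eps / W)) / W
            rows.append(np.repeat(ids, W))
            cols.append(np.tile(ids, W))
            vals.append(g.ravel())
    rows, cols, vals = map(np.concatenate, (rows, cols, vals))
    # coo_matrix sums duplicate (row, col) entries: the sum over windows.
    return coo_matrix((vals, (rows, cols)), shape=(h * w, h * w)).tocsr()
```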

# 3. User Scribbles

• Any constant $m$ is in the null space of $L$. Consider $cL(1, \cdots, 1)^t$ for $c \in \mathbb{R}$; its $i$-th element is $c\sum_{j=1}^N L_{ij}$. For the window-set $w_{ii}$ of windows that include the $i$-th pixel, let $p_w$ be the within-window index of the $i$-th pixel for $w \in w_{ii}$. Then,
\begin{align} \sum_{j=1}^{N} L_{ij} &= \sum_{w \in w_{ii}} \sum_{u \in w} (\bar{G_w})_{p_w u} = \sum_{w \in w_{ii}} \sum_{u \in w} \left( \delta_{p_w u} - \frac{1}{W}\left( 1+\frac{1}{\sigma_w^2+\frac{\epsilon}{W}}(I_{p_w}-\mu_w)(I_u-\mu_w) \right) \right) \\ &= \sum_{w \in w_{ii}} \left( \sum_{u \in w}\delta_{p_w u} - \sum_{u \in w} \frac{1}{W}\left( 1+\frac{1}{\sigma_w^2+\frac{\epsilon}{W}}(I_{p_w}-\mu_w)(I_u-\mu_w) \right) \right) \\ &= \sum_{w \in w_{ii}} \left( 1 - 1 - \frac{1}{W\sigma_w^2+\epsilon}(I_{p_w}-\mu_w)\sum_{u \in w}(I_u-\mu_w) \right) \\ &= \sum_{w \in w_{ii}} \left( - \frac{1}{W\sigma_w^2+\epsilon}(I_{p_w}-\mu_w) (W\mu_w - W\mu_w) \right) \\ &= \sum_{w \in w_{ii}} 0 = 0 \end{align}
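This zero-row-sum property can be sanity-checked with the `matting_laplacian` sketch above:

```python
import numpy as np

# Reuses matting_laplacian from the sketch above on an arbitrary image.
rng = np.random.default_rng(1)
L = matting_laplacian(rng.random((16, 16)))
print(np.abs(L @ np.ones(L.shape[0])).max())   # ~0: constants are in the null space
```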
• So user scribbles are required, which mark some pixels in the image as known foreground or background. Let $S$ be the set of scribbled pixels, and let $U$ be an $N\times 1$ vector whose $i$-th element is 1 if the $i$-th pixel is a known foreground pixel and 0 otherwise.
\begin{align} J = \min_m \; m^t L m + \lambda \sum_{s \in S} (m_s - U_s)^2 = \min_m \; m^t L m + \lambda (m-U)^t D (m-U) \end{align}
where $D$ is an $N\times N$ diagonal matrix with $D_{ii}=1$ if $i \in S$ and $D_{ii}=0$ otherwise. Note that $D$ must mark the background scribbles as well; otherwise the trivial constant matte $m=(1, \cdots, 1)^t$ would minimize $J$.
• To minimize $J$, set its derivative with respect to $m$ to zero:
\begin{align} \frac{\partial J}{\partial m} &= 2m^t L + \lambda(m-U)^t(D+D^t) = 2m^t L + 2\lambda(m^t-U^t)D \\ &= 2m^t (L+\lambda D) - 2\lambda U^t D = 0 \\ \\ &\Rightarrow m^t (L+\lambda D) = \lambda U^t D = \lambda U^t \\ \\ &\Rightarrow (L+\lambda D)^t m = (L+\lambda D) m = \lambda U \end{align}
Therefore, the solution is obtained from the linear system $(L+\lambda D) m = \lambda U$, where $\lambda$ is a large number, for example, 100.
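A minimal sketch of this solve (reusing the `matting_laplacian` sketch above; the scribble masks and the `solve_matte` name are assumptions for illustration):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

def solve_matte(img, fg_mask, bg_mask, lam=100.0):
    """Solve (L + lam*D) m = lam*U for the matte m.

    fg_mask, bg_mask: boolean scribble masks with the same shape as img.
    Note: with interior-only windows, unscribbled border pixels leave zero
    rows in L; real implementations cover every pixel with some window.
    """
    L = matting_laplacian(img)                     # from the sketch above
    U = fg_mask.ravel().astype(float)              # 1 on foreground scribbles
    D = diags((fg_mask | bg_mask).ravel().astype(float))
    m = spsolve((L + lam * D).tocsr(), lam * U)
    return np.clip(m, 0.0, 1.0).reshape(img.shape)
```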

# 4. The Eigenvectors of L

• Without user scribbles, minimizing $J=m^t L m$ amounts to solving $Lm=0$.
• Consider $Lv=\lambda v$, where $v$ is an eigenvector of $L$ and $\lambda$ is its eigenvalue. If we choose an eigenvector whose eigenvalue is close to 0, then $Lv=\lambda v \approx 0 = Lm$, so $v$ approximately solves the unconstrained problem.
• Such an eigenvector $v$ can be visualized as a grayscale image.
• Let $\Phi=\{v \mid Lv=\lambda v,\ \lambda \approx 0\}$ and $s=|\Phi|$. Then $Lv_1 + \cdots + Lv_s=L(v_1 + \cdots + v_s) \approx 0$, so any combination of the elements of $\Phi$ is a candidate for $m$. More formally, there exist constants $c_i \in \mathbb{R}$ such that $m \approx \sum_{i=1}^{s} c_i v_i$ and $\sum_{i=1}^{s} c_i=1$. However, it is more flexible to define $N\times 1$ vectors $\beta_i$ such that $m \approx \sum_{i=1}^{s} \beta_i \odot v_i$ (an elementwise product, so each pixel gets its own weights) and $\sum_{i=1}^{s} \beta_{i, j}=1$ for each $1 \leq j \leq N$, where $\beta_{i, j}$ denotes the $j$-th element of $\beta_i$.
• Meanwhile, most elements of $m$ are likely to be 0 or 1 in the ground truth. This is because $m_i$ lies strictly between 0 and 1 only when the $i$-th pixel is on the boundary of the foreground. So, if $m$ is well estimated, most of its elements should be 0 or 1.
• Therefore, $J$ can be reformulated as follows:
\begin{align} J= \min \sum_{i=1}^{s} \sum_{j=1}^{N} \left( | \beta_{i, j} |^\rho + | 1-\beta_{i, j} |^\rho \right) \end{align}
Minimizing this cost forces each $\beta_{i, j}$ toward 0 or 1. This cost function produces a good solution even when user scribbles are not given.
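The near-null eigenvectors themselves can be obtained with a sparse symmetric eigensolver; a minimal sketch reusing the `matting_laplacian` function from above (the image is a random stand-in):

```python
import numpy as np
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(2)
img = rng.random((24, 24))                  # stands in for a real image
L = matting_laplacian(img)                  # from the sketch above

# Smallest eigenpairs of the symmetric PSD matting Laplacian. 'SM' can be
# slow on large images; shift-invert is the usual speed-up in practice.
vals, vecs = eigsh(L, k=4, which='SM')
print(vals)                                 # eigenvalues near 0
# Each column of vecs, reshaped to img.shape, looks like a grayscale image.
```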

