The generator is a typical seq2seq model. For example, "I'm good" is similar to "I'm XXX", but is not the answer to "How are you?".
 | Maximum Likelihood | Reinforcement Learning |
---|---|---|
Objective Function | \(\frac{1}{N} \sum_{i=1}^{N} \log P_{\theta}\left(\hat{x}^{i} \mid c^{i}\right)\) | \(\frac{1}{N} \sum_{i=1}^{N} R\left(c^{i}, x^{i}\right) \log P_{\theta}\left(x^{i} \mid c^{i}\right)\) |
Gradient | \(\frac{1}{N} \sum_{i=1}^{N} \nabla \log P_{\theta}\left(\hat{x}^{i} \mid c^{i}\right)\) | \(\frac{1}{N} \sum_{i=1}^{N} R\left(c^{i}, x^{i}\right) \nabla \log P_{\theta}\left(x^{i} \mid c^{i}\right)\) |
Training Data | \(\left\{\left(c^{1}, \hat{x}^{1}\right), \ldots,\left(c^{N}, \hat{x}^{N}\right)\right\}\), where \(R\left(c^{i}, \hat{x}^{i}\right)=1\) | \(\left\{\left(c^{1}, x^{1}\right), \ldots,\left(c^{N}, x^{N}\right)\right\}\) obtained from interaction, weighted by \(R(c^i, x^i)\) |
Warning: Sometimes the discriminator may immediately distinguish the fake one-hot score: compared to the true label, where exactly one dimension contains 1, the fake score vector is diffused, e.g. [0.8, 0.1, 0, ..., 0.1]. WGAN alleviates this problem by constraining the discriminator to a 1-Lipschitz function, whereby the discriminator's vision becomes 'fuzzy' and the scores are not separated too sharply.
Note: Likelihood only gives the similarity of the generation to the real data, but gives no indication of the quality of generated results that are not in the dataset. Likelihood also does not guarantee the diversity of the generation.
The Inception network outputs a 1000-dimensional vector, where each dimension represents the probability of one class. IS uses the KL-divergence of $p(y|x)$ and $p(y)$. In summary: a higher Inception Score (IS) means better quality and diversity.
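A minimal sketch of the IS computation, assuming `p_yx` holds classifier outputs $p(y|x)$, one row per generated sample (the toy matrices are invented):

```python
import numpy as np

# Sketch of Inception Score: exp of the mean KL(p(y|x) || p(y)).
def inception_score(p_yx):
    p_y = p_yx.mean(axis=0)  # marginal p(y) over generated samples
    kl = (p_yx * (np.log(p_yx) - np.log(p_y))).sum(axis=1)
    return float(np.exp(kl.mean()))

sharp_diverse = np.eye(4) * 0.96 + 0.01  # confident per-sample, uniform marginal
blurry = np.full((4, 4), 0.25)           # maximally uncertain: IS = 1
```

Confident, diverse predictions give a high score; uniform predictions give the minimum score of 1.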
$\mu,\Sigma$ represent the mean and covariance matrix of the real image set $x$ and the generated set $g$; $\text{Tr}$ is the trace of the matrix. A covariance matrix over raw pixels would be as large as pixel×pixel dimensions and spatially expensive, so FID instead uses the 2048-dimensional predictor output. In summary: a lower Fréchet Inception Distance (FID) means the generated distribution is closer to the real images. A better real image set and a lower FID mean better generation.
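A sketch of the FID formula under the simplifying assumption of diagonal covariances, where the matrix square root of $\Sigma_r\Sigma_g$ reduces to an elementwise square root (a full implementation would use `scipy.linalg.sqrtm`):

```python
import numpy as np

# FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)),
# specialized here to diagonal covariances (an assumption for brevity).
def fid_diagonal(mu_r, var_r, mu_g, var_g):
    diff = mu_r - mu_g
    trace_term = np.sum(var_r + var_g - 2.0 * np.sqrt(var_r * var_g))
    return float(diff @ diff + trace_term)

mu, var = np.zeros(3), np.ones(3)
```

Identical statistics give FID 0; shifting the generated mean by 1 in each of the 3 dimensions gives FID 3.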
TODO DCGAN
TODO ALI
InfoGAN splits the input z into two parts: c, which encodes a different feature in each dimension, and z' as input noise. A Classifier predicts c from the output x of the Generator, which supervises the Generator to output x carrying the features of c. Without the Discriminator, the Generator could simply hide c somewhere in x, which would benefit the Classifier's prediction but generate bad results.
The Encoder maps x to a latent code z, regularized by imposing a prior distribution $p(z)$ over the latent distribution. The objective of the Encoder is to minimize the reconstruction error and keep z as close to the normal distribution as possible. The Discriminator distinguishes real, generated, and reconstructed images.
Initialize Enc, Dec, and D. In each iteration:

- Sample m images $\{ x^{1}, x^{2}, \ldots, x^{m} \}$ from the database distribution $P_{data}(x)$.
- Generate m codes $\{\tilde{z}^{1}, \tilde{z}^{2}, \ldots, \tilde{z}^{m}\}$ from the encoder, $\tilde{z}^i = Enc(x^i)$.
- Generate m images $\{\tilde{x}^{1}, \tilde{x}^{2}, \ldots, \tilde{x}^{m}\}$ from the decoder, $\tilde{x}^i = Dec(\tilde{z}^i)$.
- Sample m noise samples $\{z^{1}, z^{2}, \ldots, z^{m}\}$ from the prior $P_{prior}(z)$.
- Generate m images $\{\hat{x}^{1}, \hat{x}^{2}, \ldots, \hat{x}^{m}\}$ from the decoder, $\hat{x}^i = Dec(z^i)$.
- Update Enc to decrease the MSE reconstruction error $\lVert \tilde{x}^i - x^i \rVert$ and decrease $\textit{KL-divergence}(P(\tilde{z}^i | x^i) \Vert P(z))$.
- Update Dec to decrease the MSE reconstruction error $\lVert \tilde{x}^i - x^i \rVert$ and increase the binary cross entropy terms $D(\tilde{x}^i)$ and $D(\hat{x}^i)$.
- Update D to increase $D(x^i)$ and decrease $D(\tilde{x}^i)$ and $D(\hat{x}^i)$.

Info: Another kind of discriminator can be implemented to output three labels: real, generated, and reconstructed.
The Encoder takes x from the dataset and generates a code z, forming a pair (x, z) (with distribution P(x, z)) for the discriminator. P(x, z) comes from the encoder. The Decoder takes z' sampled from the prior distribution and generates an image x', forming a pair (x', z') (with distribution Q(x', z')) for the discriminator. Q(x', z') comes from the decoder. Ideally, P(x, z) will be the same as Q(x', z'): the code z from the encoder will be similar to a code sampled from the prior distribution, and the image x' from the decoder will be real. Then Enc(x) = z => Dec(z) = x for all x, and Dec(z) = x => Enc(x) = z for all z.
Initialize Enc, Dec, and D. In each iteration:

- Sample m images $\{ x^{1}, x^{2}, \ldots, x^{m} \}$ from the database distribution $P_{data}(x)$.
- Generate m codes $\{\tilde{z}^{1}, \tilde{z}^{2}, \ldots, \tilde{z}^{m}\}$ from the encoder, $\tilde{z}^i = Enc(x^i)$.
- Sample m noise samples $\{z^{1}, z^{2}, \ldots, z^{m}\}$ from the prior $P_{prior}(z)$.
- Generate m images $\{\tilde{x}^{1}, \tilde{x}^{2}, \ldots, \tilde{x}^{m}\}$ from the decoder, $\tilde{x}^i = Dec(z^i)$.
- Update D to increase $D(x^i, \tilde{z}^i)$ and decrease $D(\tilde{x}^i, z^i)$.
- Update Enc to decrease $D(x^i, \tilde{z}^i)$.
- Update Dec to increase $D(\tilde{x}^i, z^i)$.

Note: It doesn’t matter whether D gives the positive score to the pair from Enc or to the other one. What matters is that the scores D gives to Enc and Dec should be opposite. The objectives of Enc and Dec should also be the opposite of the objective of D, so that D cannot discriminate between the generated pairs after the adversarial training.
TODO
Note: Considering that $f$ is a convex function, the above inequality can be established according to Jensen's inequality:
$ \varphi\left(\frac{1}{b-a} \int_{a}^{b} f(x) d x\right) \leq \frac{1}{b-a} \int_{a}^{b} \varphi(f(x)) d x $
The $f$ function can be different. Take $x$ as $\frac{p(x)}{q(x)}$:
\(\begin{aligned}
D_{f}(P \| Q) &=\int_{x} q(x) f\left(\frac{p(x)}{q(x)}\right) d x \\
&=\int_{x} q(x)\left(\max _{t \in \operatorname{dom}\left(f^{*}\right)}\left\{\frac{p (x)}{q(x)} t-f^{*}(t)\right\}\right) d x \\
&\color{red}{\geq} \int_{x} q(x)\left(\frac{p(x)}{q(x)} \color{red}{D(x)}-f^{*}( \color{red}{D(x)})\right) d x \\
&=\int_{x} p(x) D(x) d x-\int_{x} q(x) f^{*}(D(x)) d x \\
\end{aligned}\)
For the divergence between $P$ and $Q$:
\(\begin{aligned}
D_{f}(P \| Q) &\approx \max _{\mathrm{D}} \int_{x} p(x) D(x) d x-\int_{x} q(x) f^{*} (D(x)) d x \\
&=\max _{\mathrm{D}}\left\{E_{x \sim P}[D(x)]-E_{x \sim Q}\left[f^{*}(D(x))\right] \right\} \\
&\quad\text { Samples from P } \quad \text { Samples from Q } \\
\end{aligned}\)
The problem may be caused by a bad selection of divergence, as the picture below shows for the KL-divergence, where the distribution of the well-trained generator falls between the peaks of the true data distribution.
Info: Images are a low-dimensional manifold in a high-dimensional space. If we hypothesize the space is only 3D, the images form a line or plane in this 3D space.
When the two distributions do not overlap, the objective reduces to the constant $2\log 2$ term, the gradient of which is zero, which could cause the training to stop. The JS-divergence then stays at $\log 2$ (ignoring the coefficient 2 ahead) instead of decreasing toward 0, which is the final target of the generator.

Note: Strictly, according to the above equation of the objective of the discriminator, the JS-divergence stays close to $\log 2$ if there is barely any overlap between the distributions. This constant $\log 2$ gives no guide for the generator to judge which situation is worse: $P_{G_0}$ and $P_{G_1}$ are considered equally bad, so the generator will not update from $P_{G_0}$ to $P_{G_1}$. Whenever the distributions do not overlap and the JS-divergence is $\log 2$, a binary classifier always achieves 100% accuracy.

Most important: For the discriminator, a binary classifier, as long as the generated samples are classified as fake, the loss is constant regardless of the deviation between the current $P_G$ and $P_{data}$. This property of the JS-divergence makes it impossible to measure the distance between non-overlapping distributions, and hence the generator cannot be optimized properly.
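The saturation can be checked numerically: for discrete distributions with disjoint supports, the JS-divergence equals $\log 2$ no matter how far apart the supports sit (toy distributions below are invented):

```python
import numpy as np

# Toy check: JS(P || Q) = log 2 for any pair of disjoint-support P, Q.
def js_divergence(p, q):
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0  # 0 * log 0 contributes nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.5, 0.0, 0.0])
q_near = np.array([0.0, 0.0, 1.0, 0.0])  # support adjacent to p's
q_far = np.array([0.0, 0.0, 0.0, 1.0])   # support farther away
```

Both `q_near` and `q_far` give exactly the same divergence from `p`, so the generator gets no signal about which is closer.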
sigmoid nonlinear transform.
Assume two discrete distributions $P_r$ and $P_{\theta}$, each with $l$ possible states $x$ or $y$ respectively. There are infinitely many ways to move the earth around, but we only need to find the optimal one for EMD. $\inf$ denotes the infimum, or the greatest lower bound, roughly meaning the minimum; its opposite is $\sup$ for the supremum. TODO: the derivation.
A differentiable function is 1-Lipschitz if and only if it has gradients with a norm less than or equal to 1 everywhere. The gradient penalty constrains the L2-norm of the discriminator's gradient to at most 1 everywhere, approximating 1-Lipschitz continuity. TODO: paper, https://blog.csdn.net/c9Yv2cf9I06K2A9E/article/details/87220341
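A minimal sketch of the original WGAN weight-clipping trick, an earlier and cruder way to bound the Lipschitz constant (the weight matrix below is invented; $c = 0.01$ is the value used in the WGAN paper):

```python
import numpy as np

# After each discriminator update, clip every weight into [-c, c].
def clip_weights(weights, c=0.01):
    return [np.clip(w, -c, c) for w in weights]

weights = [np.array([[0.5, -0.002], [0.03, -0.9]])]
clipped = clip_weights(weights)
```

Clipping bounds the weights but not the gradient norm directly, which is one motivation for the gradient penalty.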
TODO paper read https://blog.csdn.net/a312863063/article/details/88125429
TODO paper read
TODO paper read https://www.cnblogs.com/king-lps/p/8552177.html
WGAN replaces the log-based divergence with the Wasserstein distance, and the discriminator is trained k times per generator update so that it is fully trained.
- Sample m examples $\{ x^{1}, x^{2}, \ldots, x^{m} \}$ from the database distribution $P_{data}(x)$.
- Sample m noise samples $\{z^{1}, z^{2}, \ldots, z^{m}\}$ from the prior $P_{prior}(z)$.
- Sample another m noise samples $\{z^{1}, z^{2}, \ldots, z^{m}\}$ from the prior $P_{prior}(z)$ for the generator update.

TODO: update $\tilde{V}$ with gradient penalty, check code.
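A sketch of the gradient-penalty term, assuming a made-up linear critic $D(x) = w \cdot x$ so its input gradient is simply $w$; a real implementation differentiates through the network (autograd) at the interpolated points:

```python
import numpy as np

# WGAN-GP penalty: sample points between real and fake data, then push the
# critic's gradient norm at those points toward 1.
np.random.seed(0)
w = np.array([3.0, -4.0])  # ||w|| = 5, so this toy critic is 5-Lipschitz
x_real = np.random.randn(8, 2)
x_fake = np.random.randn(8, 2)
eps = np.random.rand(8, 1)
x_hat = eps * x_real + (1 - eps) * x_fake  # interpolate real and fake
grad = np.broadcast_to(w, x_hat.shape)     # dD/dx for the linear critic
grad_norm = np.linalg.norm(grad, axis=1)
penalty = float(np.mean((grad_norm - 1.0) ** 2))  # (||grad|| - 1)^2
```

For this critic every gradient norm is 5, so the penalty is $(5-1)^2 = 16$ and training would shrink $w$.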
Reason to add noise: without noise, the network only learns the map from the input (e.g. the text 'train', 'car', etc.) to the average of multiple positive samples (e.g. the average of the front and side views of a train), which is very bad.
There are two commonly used prototype structures for the Discriminator:
Progressively generate larger-resolution images by stacking generators.
TODO
TODO
TODO
A reconstruction error is built in to guide each branch of the network to reconstruct images in its own domain after encoding. The Discriminator improves the generation quality and fixes the blurry images produced by the autoencoder (which learns by averaging). After separate training, each row of the network has the ability to reconstruct images in its own domain.
For example, black hair in domain A may be projected to [0, 0, 1] by $EN_X$, but black hair in domain B may be projected to [1, 0, 0] by $EN_Y$.

TODO Couple GAN, UNIT
TODO ComboGAN
TODO XGAN TODO DTN: unsupervised
TODO Gram matrix
Each dimension of the input vector represents some characteristic: changing the input noise changes some characteristics of the output image. The output of the discriminator is commonly a scalar; a larger value means more realistic, while a smaller value means fake.
Algorithm Initialize $\theta_{d}$ for $\mathrm{D}$, and $\theta_{g}$ for $\mathrm{G}$
TODO VAE
TODO TODO result in what?
TODO structured learning
argmax function.

Info: Manifold, in short, generalizes the triangle, circle, square, etc. to higher dimensions. If we take a closer look at a small piece of an N-dimensional manifold (N-manifold), it locally resembles N-dimensional Euclidean space.

The ensemble of images from the sampling constitutes the data distribution $P_{data}(x)$, which only occupies a tiny part of the whole high-dimensional space. A point outside of the manifold has a low probability of being a true sample. Building a generator is, at a high level, the process of building a mapping from one known distribution to the data distribution.
$P_{data}(x)$: the probability of the data at sampling point $x$. $P_G(x; \theta)$: the probability of the generator's distribution at sampling point $x$ with parameter $\theta$. TODO: Expectation Maximization. No matter what the mean and variance of the GMM look like, it fails to resemble the target distribution. $G$ is a network which defines a probability distribution $P_G$, mapping the input distribution to the target.
Info: Div is the divergence between two distributions, e.g. KL-divergence, JS-divergence.

Recall the objective function for the generator G:
\(G^{*}=\arg \min _G \operatorname{Div}\left(P_{G}, P_{\text {data}}\right)\)
For the discriminator D, real samples are labeled 1 and generated samples are labeled 0, so D is a binary classifier. Given G, the optimal $D^*$ maximizes the value function below, assuming $D(x)$ can represent any function (a neural network can indeed be regarded as a piecewise function mathematically).

\(\begin{aligned} V &=E_{x \sim P_{\text {data}}}[\log D(x)]+E_{x \sim P_{G}}[\log (1-D(x))] \\[15px] &=E_{x \sim P_{\text {data}}}\left[\log \frac{P_{\text {data}}(x)}{P_{\text {data}}(x)+P_{G}(x)}\right]+E_{x \sim P_{G}}\left[\log \frac{P_{G}(x)}{P_{\text {data}}(x)+P_{G}(x)}\right] \\[15px] &=\int_{x} P_{\text {data}}(x) \log \frac{P_{\text {data}}(x)}{P_{\text {data}}(x)+P_{G}(x)} d x+\int_{x} P_{G}(x) \log \frac{P_{G}(x)}{P_{\text {data}}(x)+P_{G}(x)} d x \\[15px] &=\int_{x} P_{\text {data}}(x) \log \frac{\frac{1}{2}P_{\text {data}}(x)}{\frac{1}{2}[P_{\text {data}}(x)+P_{G}(x)]} d x+\int_{x} P_{G}(x) \log \frac{\frac{1}{2}P_{G}(x)}{\frac{1}{2}[P_{\text {data}}(x)+P_{G}(x)]} d x \\[15px] &=-2\log 2 + \int_{x} P_{\text {data}}(x) \log \frac{P_{\text {data}}(x)}{\frac{1}{2}[P_{\text {data}}(x)+P_{G}(x)]} d x+\int_{x} P_{G}(x) \log \frac{P_{G}(x)}{\frac{1}{2}[P_{\text {data}}(x)+P_{G}(x)]} d x \\[15px] &=-2\log 2 + K L\left(P_{\text {data}} \| \frac{P_{\text {data}}(x)+P_{G}(x)}{2}\right) + K L\left(P_{\text {G}} \| \frac{P_{\text {data}}(x)+P_{G}(x)}{2}\right) \\[15px] &=-2\log 2 + \color{red}{2JS\left(P_{\text {data}} \| P_{\text {G}}\right)} \end{aligned}\)
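A quick discrete sanity check of this identity (the two toy distributions are invented): with the optimal $D(x) = P_{data}(x)/(P_{data}(x)+P_G(x))$, the value $V$ equals $-2\log 2 + 2\,JS$.

```python
import numpy as np

# Plug the optimal discriminator into V and compare with -2 log 2 + 2 JS.
p_data = np.array([0.6, 0.3, 0.1])
p_g = np.array([0.2, 0.3, 0.5])
d_opt = p_data / (p_data + p_g)
v = float(np.sum(p_data * np.log(d_opt)) + np.sum(p_g * np.log(1 - d_opt)))

m = 0.5 * (p_data + p_g)
js = float(0.5 * np.sum(p_data * np.log(p_data / m))
           + 0.5 * np.sum(p_g * np.log(p_g / m)))
```

The two quantities agree to machine precision, matching the derivation term by term.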
TODO: inverse KL divergence
The objective of G is to minimize the divergence between $P_G$ and $P_{data}$; the divergence is measured by D, which maximizes the cross entropy as a binary classifier. Fixing the generator $G$, update the discriminator $D$; fixing the discriminator $D$, update the generator $G$, minimizing the loss function.

Critical: Actually, after we get the optimal $D_i^*$ by maximizing the JS-divergence for the fixed $G_i$, we do gradient descent, optimizing $G_i$ a little bit per step to finally get $G_{i+1}$. However, in the process of the gradient descent from $G_i$ to $G_{i+1}$, the divergence function $V(G_i, D_i)$ may change a lot, which causes the maximum of the JS-divergence to shift somewhere else.
Real images sampled from $P_{data}$ are positive examples; generated images are negative examples. The discriminator is trained k times per iteration so that it is fully trained.
- Sample m examples $\{ x^{1}, x^{2}, \ldots, x^{m} \}$ from the database distribution $P_{data}(x)$.
- Sample m noise samples $\{z^{1}, z^{2}, \ldots, z^{m}\}$ from the prior $P_{prior}(z)$.
- Sample another m noise samples $\{z^{1}, z^{2}, \ldots, z^{m}\}$ from the prior $P_{prior}(z)$ for the generator update.

Note that it is impossible for $D$ to find the maximum value of $\max_D V(G,D)$, which means that the JS-divergence can never be reached, because:
1. The learning ability of $D$ is limited by the number of iterations and the learning rate, so $D$ cannot converge in a few update steps.
2. Even with infinite training steps and convergence achieved, $D$ is likely to fall into local minima.
3. Even if not trapped in local minima, the capacity and generalization of $D$ are limited, which contradicts the assumption that $D$ can represent all functions.
Tip: The difference between the Blinn-Phong model and the Phong model is the half vector, computed by adding the incident vector and the viewing direction vector. With the dot product, computing the reflected vector is less efficient than computing the half vector.
Info: An ideal mirror's BRDF is a Dirac δ function (distribution). The total area under the BRDF curve is the albedo, specifying what fraction of incident light is reflected in total, as opposed to being absorbed or transmitted. For the perfect reflector, the area under the curve sums to 1, because it reflects all of the incident light, but in one direction. In this case, the δ function, with an area of 1, is an infinitely thin and infinitely tall spike at $x = 0$, which corresponds to an ideal reflective material.

Note: Obviously, a real material cannot be a perfect specular reflector. First, the area under the curve is less than 1 because some of the light is absorbed. More importantly, the reflection peak cannot be infinitely thin, so the reflection will be blurred ever so slightly. This implies that the peak will not be infinitely high: the wider the peak, the lower it has to be to maintain the total area.
Medium | $\eta ^*$ |
---|---|
Vacuum | 1.0 |
Air (sea level) | 1.00029 |
Water (20°C) | 1.333 |
Glass | 1.5-1.6 |
Diamond | 2.42 |
The index of refraction is wavelength dependent. Total internal reflection: light incident on the boundary at a large enough angle may not exit the medium.
Tip: The sphere is a symmetric object, which means that a light ray refracted into the sphere will always be refracted out.
Info: The BTDF (bidirectional transmittance distribution function) is similar to the BRDF (bidirectional reflectance distribution function) but for the opposite side of the surface; it is used to evaluate how outgoing refracted light distributes at the incident point. Further, the BSDF (bidirectional scattering distribution function) is the generalization of the BRDF and BTDF. Moreover, the BSSRDF (bidirectional scattering-surface reflectance distribution function) describes the relationship between outgoing radiance and incident flux, including phenomena like subsurface scattering.
Fresnel reflection: predicts the reflectance of smooth surfaces, and depends solely on the refractive index and the angle of incidence. The inverted reflection in water gradually vanishes as the incident angle between the normal of the horizontal plane and the incident light increases.
Tip: The index of refraction of a conductor is actually a complex number, which contains two numbers $n,\ k$.
Info: Polarization is defined relative to the plane of incidence, i.e. the plane that contains the incoming and reflected rays as well as the normal to the sample surface. Perpendicular (s-)polarization is the polarization where the electric field is perpendicular to the plane of incidence. Parallel (p-)polarization is the polarization where the electric field is parallel to the plane of incidence.
Microfacet BRDF: distribution of microfacets’ normals
Info: Three important geometric effects to consider with microfacet reflection models. (a) Masking: the microfacet of interest isn’t visible to the viewer due to occlusion by another microfacet. (b) Shadowing: analogously, light doesn’t reach the microfacet. (c) Interreflection: light bounces among the microfacets before reaching the viewer. The shadowing-masking effect is obvious when the angle between the incident light or viewing direction and the normal is around 90°, which we call the grazing angle.
for outgoing direction w_o:
move light to illuminate surface with a thin beam from w_o
for each incoming direction w_i:
move sensor to be at direction w_i from surface
measure incident radiance
Tip: Irradiance is analogous to electric field intensity; radiant flux is analogous to magnetic flux.
Differential irradiance incoming: $d E\left(\omega_{i}\right)=L\left(\omega_{i}\right) \cos \theta_{i} d \omega_{i}$
Differential radiance exiting (due to $dE(\omega_i)$): $dL_r(\omega_r)$
The BRDF can be understood as the way energy is distributed after a point on the surface absorbs energy from all incoming directions and then reflects it.
Caveat: we assume that all directions point outwards and the domain of integration is the hemisphere, which is why $\cos\theta$ replaces $\max(\cos\theta, 0)$.
The volume rendering equation is a more general form of the rendering equation.
The solution of the rendering equation is presented in the following part.
Understanding of the rendering equation
Tip: $-\omega$ is because the light direction points opposite to the incident direction with regard to $x^\prime$.
Tip: Using the substitution rule for integrals, changing variables from the solid angle around the current object to the surface of the reflected light source converts $d\omega_i$ to $dv$, which will be detailed in the solution part.
Equation of the first kind:
\[g(t)=\int_{a}^{b} K(t, s) f(s) \mathrm{d} s\]
Equation of the second kind:
\[\varphi(t)=f(t)+\lambda \int_{a}^{b} K(t, s) \varphi(s) \mathrm{d} s\]
$K$ is a linear operator if it has two properties:
\[f(x + y) = f(x) + f(y)\] \[f(cx) = cf(x)\]
The most common examples of linear operators $K$ are differentiation and integration:
\[(K \bullet f)(u)=\frac{\partial f}{\partial u}(u)\] \[(K \bullet f)(u)=\int k(u, v) f(v) d v\]
Monte Carlo Integration:
\[\int f(x) dx \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f\left(X_{i}\right)}{p\left(X_{i}\right)}\]

But these simplifications are not reasonable.
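The estimator can be sanity-checked on a known integral, e.g. $\int_0^\pi \sin x\,dx = 2$ with uniform $p(x) = 1/\pi$ (a toy sketch, not tied to rendering):

```python
import math
import random

# Monte Carlo estimate of the integral of sin(x) over [0, pi] (true value 2).
random.seed(0)
N = 200_000
total = 0.0
for _ in range(N):
    x = random.uniform(0.0, math.pi)
    total += math.sin(x) / (1.0 / math.pi)  # f(X_k) / p(X_k)
estimate = total / N
```

More samples shrink the variance of the estimate at a rate of $1/\sqrt{N}$.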
Tip: In the left image of the above figure, the regions not reached by direct illumination are shadowed in black, whereas diffuse ambient light actually illuminates them. In the right image, the left side of the rectangular box is shaded in red, and the right side of the cube is shaded in green. This phenomenon, in which objects or surfaces are colored by the reflection of colored light from nearby surfaces, is called color bleeding.
Use Monte Carlo method to solve this integration numerically
\[\int_{a}^{b} f(x) \mathrm{d} x \approx \frac{1}{N} \sum_{k=1}^{N} \frac{f\left(X_{k}\right)}{p\left(X_{k}\right)} \quad X_{k} \sim p(x)\]

Shading algorithm for direct illumination:
def shade(p, wo)
Randomly choose **N** directions wi~pdf
Lo = 0.0
for each wi:
Trace a ray r(p, wi)
if ray r hit the light:
Lo += (1 / N) * L_i * f_r * cosine / pdf(wi)
return Lo
def shade(p, wo)
Randomly choose **N** directions wi~pdf
Lo = 0.0
for each wi:
Trace a ray r(p, wi)
if ray r hit the light:
Lo += (1 / N) * L_i * f_r * cosine / pdf(wi)
# Add the following branch
elif ray r hit an object at q:
Lo += (1 / N) * shade(q, -wi) * f_r * cosine / pdf(wi)
# minus wi is due to the reversed direction of the light towards object p
return Lo
num_rays explodes as num_bounce goes up; num_rays will not explode if and only if $\color{red}{N = 1}$.
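A toy count of the fan-out (numbers are illustrative): each shading point spawning N rays gives N**b rays after b bounces.

```python
# Ray count after b bounces when each shading point spawns N rays.
def rays_after(N, bounces):
    return N ** bounces

fanout = [rays_after(100, b) for b in range(4)]  # grows exponentially
path = [rays_after(1, b) for b in range(4)]      # stays constant when N = 1
```

This is why path tracing shoots a single ray per bounce and recovers quality by averaging many paths per pixel instead.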
Path Tracing: only 1 ray is traced at each shading point
def shade(p, wo)
# modify N to One
Randomly choose **One** directions wi~pdf
Trace a ray r(p, wi)
if ray r hit the light:
return (1 / N) * L_i * f_r * cosine / pdf(wi)
elif ray r hit an object at q:
return (1 / N) * shade(q, -wi) * f_r * cosine / pdf(wi)
Info: Distributed Ray Tracing: $N \neq 1$
def ray_generation(p, wo)
Uniformly choose **N** sample positions within the pixel
pixel_radiance = 0.0
for each sample in the pixel:
shoot a ray r(cam_pos, camera_to_sample)
if ray r hit the scene at p:
pixel_radiance += 1 / N * shade(p, - camera_to_sample)
return pixel_radiance
Cutting off at a fixed num_bounce cuts energy and reduces realism.

def shade(p, wo)
# Add the following branch
Manually specify a probability p_RR
Randomly select ksi in a uniform distribution in [0, 1]
if ksi > p_RR:
return 0.0
Randomly choose **One** directions wi~pdf
Trace a ray r(p, wi)
if ray r hit the light:
return (1 / N) * L_i * f_r * cosine / pdf(wi) / p_RR
elif ray r hit an object at q:
return (1 / N) * shade(q, -wi) * f_r * cosine / pdf(wi) / p_RR
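Russian roulette keeps the estimator unbiased: returning $X/p_{RR}$ with probability $p_{RR}$ and 0 otherwise has expectation $X$. A toy simulation (the value $X = 5$ is made up):

```python
import random

# Empirical check that the RR estimator's mean converges to X.
random.seed(1)
X, p_rr, N = 5.0, 0.8, 200_000
total = 0.0
for _ in range(N):
    if random.random() < p_rr:  # survive with probability p_RR
        total += X / p_rr       # boost surviving samples by 1 / p_RR
mean = total / N
```

Dividing by `p_rr` exactly compensates for the terminated paths, so paths can end randomly without biasing the image.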
def shade(p, wo)
# Contribution from the light source
Uniformly sample the light at x' (pdf_light = 1 / A)
L_dir = 0.0
Shoot a ray from p to x'
if the ray is not blocked in the middle:
L_dir = Li * f_r * cosθ * cosθ' / |x' - p|^2 / pdf_light
# Contribution from other reflections
## Test RR
Manually specify a probability p_RR
Randomly select ksi in a uniform distribution in [0, 1]
if ksi > p_RR:
return L_dir
## Ray Trace
L_indir = 0.0
Uniformly sample the hemisphere toward wi (pdf_hemi = 1 / 2pi)
Trace a ray r(p, wi)
if ray r hit a non-emitting object at q:
L_indir = shade(q, -wi) * f_r * cosθ / pdf_hemi / p_RR
return L_dir + L_indir
Pinhole Camera Model
Generate an image by casting one ray per pixel
Info: we usually don’t care about how to solve the equation, because numerical optimization will always help us find an approximate solution of the problem corresponding to the given implicit surface equation.
Tip: The point-in-polygon test: given a point $p$ and a polygon $Q$, draw a ray from the query point $p$ to another point far away in the plane, which should be outside $Q$, then count the number of intersections of this ray with $Q$. If there is an odd number of intersections, the point $p$ is inside polygon $Q$; otherwise it is outside.
Only care about the 0- and 1-intersection cases; ignore edge cases with multiple intersections.
Box is the intersection of 3 pairs of slabs
Axis-Aligned Bounding Box (AABB): the smallest enclosing box with any side along either x, y, z axis.
Tip: Choosing the split position at the median object can be implemented by sorting the barycenters of the triangles and selecting the median one, which takes $O(n\log n)$ time, while the quickselect algorithm for finding the k-th smallest value reduces the time complexity to $O(n)$.
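A sketch of quickselect for the k-th smallest value in expected $O(n)$ time (function name and test values are illustrative):

```python
import random

# Quickselect: partition around a random pivot, then recurse into the side
# containing the k-th smallest element (0-indexed).
def quickselect(items, k):
    items = list(items)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        highs = [x for x in items if x > pivot]
        if k < len(lows):
            items = lows
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            k -= len(lows) + len(pivots)
            items = highs
```

Each iteration discards a constant fraction of the list in expectation, giving linear expected time versus $O(n \log n)$ for sorting.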
Intersect(Ray ray, BVH node) {
if (ray misses node.bbox)
return;
if (node is a leaf node){
test intersection with all objects;
return closest intersection;
}
hit1 = Intersect(ray, node.child1);
hit2 = Intersect(ray, node.child2);
return the closer of hit1, hit2; // ATTENTION: closer hit point is returned!
}
Take $f(x, y, z)=\left(2-\sqrt{x^{2}+y^{2}}\right)^{2}+z^{2}-1$ as an example:
Take $f(x, y, z)=x^{2}+y^{2}+z^{2}-1$ as an example:
Signed distance functions (SDF): determine the distance of a given point $x$ from the boundary of $\omega$. The function has positive values at points $x$ outside $\omega$ and decreases in value as $x$ approaches the boundary of $\omega$, where the signed distance function is zero.
Instead of Booleans, gradually blend surfaces together using distance functions: every point in space has a distance function value, and blending two objects in the space is done by combining their two distance functions. The boundary forms where the blended distance function is 0.
More details about distance function, SDF code about various primitives and useful tutorial
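A sketch of blending two sphere SDFs with a "smooth minimum" (the particular polynomial smooth-min below is one common choice, not taken from these notes; `k` is an assumed blend width):

```python
import math

# Hard union of two SDFs is min(d1, d2); the smooth minimum merges the
# surfaces near where they meet.
def sphere_sdf(p, center, r):
    return math.dist(p, center) - r

def smooth_min(d1, d2, k=0.5):
    h = max(k - abs(d1 - d2), 0.0) / k
    return min(d1, d2) - h * h * k * 0.25

origin = (0.0, 0.0, 0.0)
d1 = sphere_sdf(origin, (0.9, 0.0, 0.0), 1.0)   # negative: origin is inside
d2 = sphere_sdf(origin, (-0.9, 0.0, 0.0), 1.0)
blended = smooth_min(d1, d2)                     # deeper than either alone
```

Where the two distances are far apart the blend reduces to a plain `min`, so only the overlap region is rounded.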
Provides much more explicit control over shape (like a texture)
Take $f(u, v)=((2+\cos u) \cos v,(2+\cos u) \sin v, \sin u)$ as an example
Note: A Bézier curve is a representation of explicit geometry since it is a way of parameter mapping.
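Bézier evaluation by repeated linear interpolation (de Casteljau's algorithm) can be sketched as follows (the 2D control points are made up):

```python
# De Casteljau: repeatedly lerp adjacent control points until one remains.
def lerp(a, b, t):
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)

def bezier(ctrl, t):
    pts = list(ctrl)
    while len(pts) > 1:
        pts = [lerp(p, q, t) for p, q in zip(pts, pts[1:])]
    return pts[0]

ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
```

At $t = 0$ and $t = 1$ the curve passes through the first and last control points exactly.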
“Same” means both the direction and the amount of the value.

Tip: piecewise segments in a 2D curve vs. patches in a 3D surface.
Tip: $n$ is vertex degree. In graph theory, the degree of a vertex is the number of edges that are incident to the vertex.
Note: Loop is the family name of Charles Loop, the inventor of Loop subdivision, so it has nothing to do with looping.
Note: $q$ is a normalized parameter of the plane, where $a^2+b^2+c^2 = 1$.
Note: For each step, the edge with the least quadric error is selected to be collapsed. Clearly, this is a way of reaching a local optimum at each step in the hope of achieving the global optimum, namely a greedy strategy. Strictly speaking, this approach does not guarantee that the globally optimal solution is reached, but this insignificant error is tolerated by default in CG.
Texture is applied to Surface
We don’t care how the mapping of triangles between the model and the texture is produced; the mapping and the definition of primitives (triangles) are assumed known.
Tip: Whether or not the texture is square, $u$ and $v$ are in the range $(0,1)$.
e.g. $A$ is when $(\alpha, \beta, \gamma) = (1, 0, 0)$
Barycentric coordinate of centroid: $(\alpha, \beta, \gamma) = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$, then $(x, y)=\frac{1}{3} A+\frac{1}{3} B+\frac{1}{3} C$
Linear interpolate values at vertices
Note: barycentric coordinates are not invariant under projection! Thus interpolate every time after projections.
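A sketch of computing barycentric coordinates from signed areas and interpolating per-vertex values (the triangle and vertex colors are made up):

```python
# Barycentric coordinates of p in triangle ABC via cross-product areas,
# then linear interpolation of per-vertex attributes.
def barycentric(p, a, b, c):
    def cross(u, v, w):
        return (v[0] - u[0]) * (w[1] - u[1]) - (v[1] - u[1]) * (w[0] - u[0])
    area = cross(a, b, c)
    alpha = cross(p, b, c) / area
    beta = cross(a, p, c) / area
    gamma = 1.0 - alpha - beta
    return alpha, beta, gamma

a, b, c = (0.0, 0.0), (3.0, 0.0), (0.0, 3.0)
centroid = (1.0, 1.0)
alpha, beta, gamma = barycentric(centroid, a, b, c)
color = tuple(alpha * va + beta * vb + gamma * vc
              for va, vb, vc in zip((255, 0, 0), (0, 255, 0), (0, 0, 255)))
```

The centroid gives $(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$, so the interpolated color is the equal mix of the three vertex colors.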
for each rasterized screen sample (x, y): // Usually a pixel's center
(u, v) = evaluate texture coordinate at (x, y); // Using barycentric coordinates
texcolor = texture.sample(u, v); // get texture (u,v)
set sample's color to texcolor; // Usually the diffuse albedo Kd of Blinn-Phong reflectance model
Info: For each point on the object or scene, mapping it to the corresponding point on a low-resolution texture gives non-integer texture coordinates. Rounding is the common way to obtain an integer texel coordinate. As a result, multiple pixels of the object or scene may correspond to the same texel of the texture, which makes the generated figure blurry and of low quality. Interpolation handles this.
Black points: indicate texture sample locations (the center of texel)
For the nearest method of texture sampling, take the red point as an example: any mapping point in the square where the red point and $u_{11}$ are located will take $u_{11}$ as its texel. When the object or scene is very large compared to the texture, multiple pixels will map to the same texel such as $u_{11}$. That is why jaggies and artifacts are conspicuous in the nearest-neighbor figure above.
Bilinear interpolation takes the 4 nearest sample locations with texture values as labeled and blends the information of the 4 texels, so the result is smoother and more realistic.
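The bilinear fetch can be sketched as two horizontal lerps followed by one vertical lerp (the 2×2 "texture" below is invented):

```python
import math

# Bilinear texture fetch: lerp between the four nearest texels.
def bilinear(tex, u, v):
    x0, y0 = math.floor(u), math.floor(v)
    s, t = u - x0, v - y0
    u00, u10 = tex[y0][x0], tex[y0][x0 + 1]
    u01, u11 = tex[y0 + 1][x0], tex[y0 + 1][x0 + 1]
    top = u00 + s * (u10 - u00)      # horizontal lerp, row y0
    bottom = u01 + s * (u11 - u01)   # horizontal lerp, row y0 + 1
    return top + t * (bottom - top)  # vertical lerp between the rows

tex = [[0.0, 1.0],
       [1.0, 2.0]]
```

At texel centers the fetch returns the texel value exactly; in between it varies continuously, removing the blocky look of nearest sampling.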
Supersampling for antialiasing: yes, high quality, but costly.
Info: Some data structure like K-d tree, segment tree, binary indexed tree etc. are good ways to solve range query problem.
There are a total of $\log(n)$ images.
Info: We call this kind of structure, with images that represent the same content at a series of lower and lower sampling rates, an image pyramid in the computer vision field.
The distance between a pixel and its neighbour in screen space is 1.
Denote the pixel $(x,y)$ in screen space and texel $(u,v)$ in texture space
Denote by $\psi$ the mapping from image space to texture space, approximated as a linear mapping; thus $u = \psi_{x}(x, y), v = \psi_{y}(x, y)$.
Jacobian matrix $J$ is the best linear approximation of $\psi$ in a neighbourhood of $(u,v)$ where $\psi$ is differentiable.
Recall that in a linear transformation \(\left[\begin{array}{ll}a_{x} & b_{x} \\ a_{y} & b_{y}\end{array}\right]\left[\begin{array}{l}x \\ y\end{array}\right]=\left[\begin{array}{l}x^{\prime} \\ y^{\prime}\end{array}\right]\), \(\left[\begin{array}{ll}a_{x} & b_{x} \\ a_{y} & b_{y}\end{array}\right]\) is the transformation matrix, and its two columns \(\left[\begin{array}{l}a_{x} \\ a_{y}\end{array}\right]\) and \(\left[\begin{array}{l}b_{x} \\ b_{y}\end{array}\right]\) are the two basis vectors of the new space if the original basis is \(\left[\begin{array}{l}1 \\ 0\end{array}\right]\) and \(\left[\begin{array}{l}0 \\ 1\end{array}\right]\). Thus, we can take the Jacobian matrix as the transformation matrix mapping the pixel $(x, y)$ in screen space to $(u, v)$ in texture space.
In screen space, take the neighbouring offsets $(1, 0)$ and $(0, 1)$ respectively; the corresponding texel coordinates are computed by multiplying by the transformation Jacobian matrix $J$:
Tip: Each column of Jacobian matrix is the new basis vector of the transformed space. More details
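A sketch of picking the mip level from the screen-space derivatives of $(u, v)$, i.e. the columns of $J$: take the longer derivative as the footprint size $L$ and set $D = \log_2 L$ (the derivative values below are made up):

```python
import math

# Mip level from the texture-space footprint of one screen-space pixel.
def mip_level(dudx, dvdx, dudy, dvdy):
    L = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
    return math.log2(L)

level = mip_level(4.0, 0.0, 0.0, 2.0)  # footprint spans 4 texels
```

A footprint of 1 texel gives level 0 (full resolution); doubling the footprint raises the level by 1.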
But with bilinear filtering only, $D$ is clamped to the nearest level; floating-point $D$ is not supported and the level $D$ is not continuous.
Trilinear interpolation: perform two bilinear interpolations in neighbouring levels, then interpolate between the two results.
Info: N x anisotropic filtering in video games means the original figure is copied at reduced sizes up to N times along the horizontal and vertical axes. No matter whether N increases to 4x, 8x, 16x or more, the ceiling of the space consumption is 4 times the original texture. Generally, as long as the graphics memory is enough, larger anisotropy has little influence on computing performance.
Info: Classic model in CG: Utah teapot, Stanford bunny, Stanford dragon, Cornell Box
Approximate derivatives at $p$ are \(dp/du = c_1 * [h(u+1) - h(u)] \\ dp/dv = c_2 * [h(v+1) - h(v)]\)
Note: These are all in local coordinate. Thus, the perturbed normal need to be transformed to world coordinate.
Info: In DirectX on Windows, dynamic tessellation is a self-adaptive technique to tessellate object geometry in the displacement mapping stage, which dynamically splits the object's primitives into smaller triangles to match the frequency requirements of displacement mapping.
Perlin noise is a good example of a noise function used to generate marble cracks, mountain fluctuations, etc.