\pdfoutput=1
\documentclass[a4paper,12pt,titlepage, twoside]{article}
\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage{amssymb,amsmath}
\usepackage{algorithm,algpseudocode}
\usepackage[title,titletoc]{appendix}
\begin{document}
% todo: compute Inception accuracy
\section{Image Dewarping}
The fish-eye images are heavily distorted, and standard neural network detectors pretrained on ImageNet have difficulty recognizing objects in them correctly. The main idea of this section is to dewarp the fish-eye images into an undistorted representation on which such detectors can be applied.
\subsection{Scene localization}
\label{sec:scene_localization}
Before we start decomposing the image, we need to compensate for another hardware error of the camera. As can be seen in the image, the scene is not exactly in the middle. It is not even a circle, but rather an ellipse. The properties of the ellipse could be measured manually, but our system will be deployed on multiple cameras, and due to uncertainty in the manufacturing process, each scene has a different location and distortion. Since we need high precision for further position estimation, a universal algorithm for detecting the ellipse has been introduced.
The algorithm is based on optimization. It takes an image and produces the parameters of the ellipse. From observation, the ellipse is always axis-aligned: its major axis is either horizontal or vertical. The equation used is rather unusual, but it allows faster evaluation of the cost function.
\begin{equation}
\frac{(x-s_x)^2}{a} + \frac{(y-s_y)^2}{1} = r^2
\end{equation}
Now we need to find the parameters $s_x, s_y, a, r$: the centre $(s_x, s_y)$, the aspect parameter $a$, and the size $r$ (the semi-axes of the ellipse are $r\sqrt{a}$ and $r$).
The original image $I$ of size $H \times W$ with channels $I_1, I_2, I_3$ is transformed into a mask $M$ of the same size by thresholding: a pixel belongs to the scene if the sum of its channel values on the 8-bit scale is greater than or equal to 1. \cite{lukacs1997real}
\begin{equation*}
M_{x,y} = \begin{cases}
1 & \text{if} \quad \sum_{i=1}^{3} I_{i,x,y} \geq 1 \\
0 & \text{otherwise}
\end{cases}
\end{equation*}
In the mask $M$, pixels with value 1 represent the scene and pixels with value 0 represent the background.
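For illustration, a minimal NumPy sketch of this thresholding step could look as follows (the function name \texttt{scene\_mask} is ours, not part of the described system):
\begin{verbatim}
import numpy as np

def scene_mask(image):
    # image: uint8 array of shape (H, W, 3)
    # A pixel belongs to the scene (value 1) if the sum of its
    # three channel values on the 8-bit scale is >= 1,
    # i.e. the pixel is not completely black.
    channel_sum = image.astype(np.int32).sum(axis=2)
    return (channel_sum >= 1).astype(np.uint8)
\end{verbatim}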
We create an additional mask $E(s_x, s_y, a, r)$ of the ellipse as
\begin{equation*}
E_{x,y}(s_x, s_y, a, r) = \begin{cases}
1 & \text{if} \quad \frac{(x-s_x)^2}{a} + \frac{(y-s_y)^2}{1} \leq r^2 \\
0 & \text{otherwise}
\end{cases}
\end{equation*}
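A corresponding, purely illustrative sketch of the ellipse mask (with the index $E_{x,y}$ stored as \texttt{E[y, x]} in the array) could be:
\begin{verbatim}
import numpy as np

def ellipse_mask(H, W, s_x, s_y, a, r):
    # E[y, x] = 1 for pixels inside the ellipse
    # (x - s_x)^2 / a + (y - s_y)^2 <= r^2, and 0 otherwise.
    y, x = np.mgrid[0:H, 0:W]
    inside = (x - s_x) ** 2 / a + (y - s_y) ** 2 <= r ** 2
    return inside.astype(np.uint8)
\end{verbatim}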
The cost function $C(M, E(s_x, s_y, a, r))$ penalizes pixels that are masked as scene but lie outside the ellipse, as well as pixels that are masked as background but lie inside the ellipse.
\begin{equation}
C(M, s_x, s_y, a, r) = \sum_{x = 0}^{W-1} \sum_{y = 0}^{H-1} \left[ E_{x,y}(s_x, s_y, a, r) \cdot (1-M_{x,y}) + \left(1 - E_{x,y}(s_x, s_y, a, r)\right) \cdot M_{x,y} \right]
\end{equation}
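Under the same assumptions, the cost can be evaluated directly on the two masks, reusing the \texttt{ellipse\_mask} sketch above:
\begin{verbatim}
def ellipse_cost(M, s_x, s_y, a, r):
    # Count background pixels inside the ellipse plus
    # scene pixels outside the ellipse.
    H, W = M.shape
    E = ellipse_mask(H, W, s_x, s_y, a, r)
    return int((E * (1 - M) + (1 - E) * M).sum())
\end{verbatim}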
The algorithm could evaluate all combinations of parameters, but the number of evaluated combinations can be greatly reduced by searching in a coarse-to-fine manner.
\begin{verbatim}
x, y, a, r := initialize_params()
center_step, a_step, r_step := initialize_steps()
while not converged:
    x_list := [x - center_step, x, x + center_step]
    y_list := [y - center_step, y, y + center_step]
    a_list := [a - a_step, a, a + a_step]
    r_list := [r - r_step, r, r + r_step]
    x, y, a, r := select_best(x_list, y_list, a_list, r_list)
    center_step := center_step / 2
    a_step := a_step / 2
    r_step := r_step / 2
return x, y, a, r
\end{verbatim}
This algorithm quickly finds the ellipse with very high precision. Furthermore, the most expensive function, \texttt{select\_best}, can be highly parallelized, since the cost of each candidate parameter combination can be evaluated independently.
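A minimal sketch of the whole coarse-to-fine search, assuming the \texttt{ellipse\_cost} sketch above and a simple step-size stopping criterion (the threshold \texttt{min\_step} is our assumption, not specified in the text), might be:
\begin{verbatim}
def fit_ellipse(M, x, y, a, r, center_step, a_step, r_step,
                min_step=0.5):
    # Coarse-to-fine search: evaluate the 3^4 parameter
    # combinations around the current estimate, keep the best
    # one, and halve all step sizes until they become small.
    while center_step > min_step:
        candidates = [(cx, cy, ca, cr)
                      for cx in (x - center_step, x, x + center_step)
                      for cy in (y - center_step, y, y + center_step)
                      for ca in (a - a_step, a, a + a_step)
                      for cr in (r - r_step, r, r + r_step)]
        x, y, a, r = min(candidates,
                         key=lambda p: ellipse_cost(M, *p))
        center_step /= 2
        a_step /= 2
        r_step /= 2
    return x, y, a, r
\end{verbatim}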
\subsection{Fish-eye model}
For proper detection and image dewarping we need to know the transformations between the real-world coordinates $x^w, y^w, z^w$ and their projection onto the captured frame $x^f, y^f$. After applying the algorithm from \ref{sec:scene_localization}, we know where in the frame the scene is projected. First, we will
\end{document}