
Chapter 1
INTRODUCTION
1.1. Motivation
Computer vision research has expanded rapidly over the last few years, and human pose estimation is one area that has made great progress. Given an image, a human pose estimation system must determine the pose of an articulated body, which consists of joints and rigid parts, using image-based observations [wikipedia]. Its importance stems from the abundance of applications that can benefit from such a technology. Among the most popular are human-computer interaction, robotics, smart devices (e.g. smartphones), and security (e.g. person tracking). It also benefits marker-less motion capture.
Traditionally, a human pose can be accurately reconstructed from motion captured with markers attached to body parts. However, the use of markers makes pose estimation impractical in real-life settings, since it requires multiple devices and considerable expense. As a result, an increasing number of studies have focused on marker-less methods. Despite many years of research, human pose estimation still presents a unique set of challenges. The most noticeable are the variability of human visual appearance in images, variability in lighting conditions, variability in human physique, partial occlusions due to self-articulation and layering of objects in the scene, and the complexity of the human skeletal structure. These issues call for a method capable of estimating a consistent pose while dealing with all of the aforementioned challenges.
In recent literature, deep learning based methods and large-scale datasets have made single-person pose estimation increasingly reliable. Not only are body parts detected with excellent precision, but the spatial relations between parts can also be learned directly by a neural network. These approaches have been the center of research progress, but the single-person setting fails to represent a realistic sample of real-world images. Clearly, the multi-person case deserves more attention, since it reflects the real-world task.
A principled solution for multi-person pose estimation first detects all body part hypotheses and then assembles them into plausible poses by minimizing a joint objective. In Deepcut and Deeper-cut, the problem is cast as an Integer Linear Program (ILP), which yields feasible solutions with a certified optimality gap. Combined with neural network variants, these methods obtain state-of-the-art results for multi-person pose estimation. The inference time, however, grows with the number of people and is often too long for practical applications. Thus, in this thesis, we specify and construct an additional module for the Deeper-cut framework that detects and localizes persons in images <constraint + sparse graph: not yet included>. The reason for adding such a module is that, with fewer body part hypotheses, the optimization can run faster <specific time>.
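To illustrate why fewer body part hypotheses help, the following sketch counts the pairwise terms an assembly step would have to consider if every pair of hypotheses were a candidate connection. Both the fully connected assumption and the hypothesis counts are illustrative; they are not measurements from Deepcut or Deeper-cut.

```python
def pairwise_terms(num_hypotheses: int) -> int:
    """Count unordered pairs among the body part hypotheses.

    Assumes a fully connected graph over hypotheses, which is an
    illustrative simplification rather than the exact ILP formulation.
    """
    return num_hypotheses * (num_hypotheses - 1) // 2


# Hypothetical counts: a whole image versus a single person-sized crop.
for count in (150, 50):
    print(count, "hypotheses ->", pairwise_terms(count), "pairs")
```

Under this assumption the number of pairs grows quadratically, so reducing the hypotheses to a third cuts the pairwise terms by roughly a factor of nine.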
1.2. Thesis objective and contents
The purpose of this thesis is first to introduce a novel multi-person pose estimation approach that jointly solves the tasks of detection and pose estimation. This joint formulation contrasts with previous strategies, which first detect people and subsequently estimate their body pose. It can disambiguate multiple and potentially overlapping persons, and can assemble body parts even for rare body articulations. Furthermore, we propose a local pose estimation method that uses a person detector to crop the image regions containing each person of interest and feeds these regions to the Deeper-cut framework as input. We then evaluate our approach on the MPII Human Pose dataset and the WAF dataset to demonstrate the improvement quantitatively. <constraint + graph still not added>
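The cropping step can be sketched as follows, assuming the person detector returns axis-aligned boxes in pixel coordinates. The function name, the box format (x_min, y_min, x_max, y_max), and the margin parameter are hypothetical choices for illustration, not part of the Deeper-cut framework.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), assumed format


def expand_and_clip_boxes(
    boxes: List[Box], width: int, height: int, margin: float = 0.1
) -> List[Box]:
    """Enlarge each person box by a relative margin and clip it to the image.

    The margin keeps limbs that extend slightly beyond the detected box;
    its default value here is an arbitrary illustrative choice.
    """
    regions = []
    for x_min, y_min, x_max, y_max in boxes:
        pad_x = int(margin * (x_max - x_min))
        pad_y = int(margin * (y_max - y_min))
        left = max(0, x_min - pad_x)
        top = max(0, y_min - pad_y)
        right = min(width, x_max + pad_x)
        bottom = min(height, y_max + pad_y)
        regions.append((left, top, right, bottom))
    return regions


# Two hypothetical detections in a 200x100 image.
regions = expand_and_clip_boxes([(10, 20, 60, 90), (150, 0, 200, 100)], 200, 100)
print(regions)
```

With an image stored as a NumPy array, each region would be extracted as image[top:bottom, left:right]. The (left, top) offset has to be added back to any joint coordinates estimated inside the crop to recover positions in the full image.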


1.3. Contributions and thesis overview
1.3.1. Contributions
1.3.2. Thesis overview
The rest of this thesis is organized as follows.
Chapter 2 details the structure and implementation of the Deepcut and Deeper-cut approaches. It also provides the theoretical background on 2D human pose estimation, the basics of neural networks as used in pose estimation, and the optimization strategy used in the framework.
Chapter 3 presents our contribution to the framework. More specifically, we describe our body part detector implementation and how it is connected to the Deeper-cut framework.
Chapter 4 describes how we conducted our experiments and reports the evaluation results in comparison with Deeper-cut.
The thesis ends with the conclusion and future work in Chapter 5, where we propose some approaches that could improve Deeper-cut in the next stage.