Zygmunt L. Szpak
Computer Vision & Machine Learning

Ellipse Fitting

$$\newcommand{\T}{\mathsf{T}} \newcommand{\ve}[1]{\mathbf{#1}} \newcommand{\M}[1]{\mathbf{#1}} \newcommand{\costAML}{J_{\mathrm{AML}}} \newcommand{\costALS}{J_{\mathrm{ALS}}} \newcommand{\de}{\boldsymbol{\theta}} \newcommand{\estALS}{\widehat{\de}_{\mathrm{ALS}}} \newcommand{\costDIR}{J_{\mathrm{DIR}}} \newcommand{\Lambdab}{\boldsymbol{\Lambda}} \newcommand{\cov}[2][]{\Lambdab_{#2}^{#1}} \newcommand{\estDIR}{\widehat{\de}_{\mathrm{DIR}}} \newcommand{\estGAML}{\widehat{\de}_{\mathrm{GAML}}} \newcommand{\estAML}{\widehat{\de}_{\mathrm{AML}}} \newcommand{\etab}{\boldsymbol{\eta}} \newcommand{\kappab}{\boldsymbol{\kappa}}$$ Together with Wojciech Chojnacki, I have been studying the classic problem of fitting an ellipse to data. At first glance it may seem that fitting an ellipse to data would be an easy task. It is after all a familiar and simple shape, and surely the problem has been completely solved by now. Unfortunately, this is not the case. Ellipse fitting harbours many technical difficulties.

The problem boils down to the fact that the task of ellipse fitting falls into the category of errors-in-variables regression and is inherently non-linear. Nikolai Chernov does an excellent job of explaining why errors-in-variables models pose so many challenges. I recommend his historic overview on the subject in the first chapter of his book (the first chapter is available for download for free). In general you will find many fascinating and well written papers on errors-in-variables modelling on his website.

(It was with great sadness that I discovered that Nikolai Chernov recently passed away.)

Existing ellipse fitting methods

In a nutshell, there are two possible ways of fitting an ellipse to data. The first is called orthogonal distance regression, where one minimises a geometrically meaningful error: the orthogonal distance between data points and the ellipse. This is the cost function that arises naturally when one assumes independent Gaussian noise in the data points and applies the principle of maximum likelihood estimation. However, the implementation of this method is quite involved. In its classic formulation it requires, for each iteration, computing the roots of a quartic polynomial as an intermediate step. This step can pose many numerical problems.
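To make the quartic-root step concrete, here is a small Python sketch (my own illustration, not the classic algorithm itself) for the simplest case of an axis-aligned ellipse $$x^2/a^2 + y^2/b^2 = 1$$ centred at the origin. The closest point to $$(p,q)$$ satisfies $$x = a^2 p/(t+a^2)$$, $$y = b^2 q/(t+b^2)$$, where $$t$$ is the largest real root of a quartic; the function name is hypothetical.

```python
import numpy as np

def point_to_ellipse_distance(p, q, a, b):
    """Orthogonal distance from (p, q) to the ellipse x^2/a^2 + y^2/b^2 = 1.

    The Lagrange conditions reduce to the quartic
    (a p)^2 (t + b^2)^2 + (b q)^2 (t + a^2)^2 - (t + a^2)^2 (t + b^2)^2 = 0,
    whose largest real root t yields the nearest point on the ellipse.
    """
    A = np.array([1.0, a * a])            # polynomial t + a^2 (highest degree first)
    B = np.array([1.0, b * b])            # polynomial t + b^2
    A2, B2 = np.convolve(A, A), np.convolve(B, B)
    poly = (a * p) ** 2 * np.pad(B2, (2, 0)) \
         + (b * q) ** 2 * np.pad(A2, (2, 0)) \
         - np.convolve(A2, B2)
    roots = np.roots(poly)
    t = max(r.real for r in roots if abs(r.imag) < 1e-7)
    x = a * a * p / (t + a * a)           # nearest point on the ellipse
    y = b * b * q / (t + b * b)
    return np.hypot(p - x, q - y)
```

Even in this toy setting one can see the numerical delicacy: the roots cluster as the point approaches an axis or the centre of the ellipse.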

An alternative approach is to minimise an algebraic cost function. Here one does not minimise a geometrically meaningful cost function. Instead, the cost function is based on how much a data point fails to satisfy an ellipse equation.

To clarify, if a data point $$(m_1,m_2)$$ is to lie on an ellipse, then with $$x = m_1$$ and $$y = m_2$$ it must satisfy the equation $$ax^2 + bxy + cy^2 + dx + ey + f = 0.$$ This is actually the general equation of a conic: it represents an ellipse, a parabola or a hyperbola depending on whether the discriminant $$\Delta = b^2 - 4ac$$ is negative, zero or positive. So for a data point to satisfy an ellipse equation, the parameters must additionally satisfy the ancillary constraint $$b^2-4ac < 0$$.

Let $$\mathbf{\theta} = [a,b,c,d,e,f]^\T$$ be the vector of parameters, let $$\ve{x} = [m_1,m_2]^\T$$ be the vector of variables and let $$\ve{u}(\ve{x}) = [m_{1}^2, m_1 m_2, m_{2}^2, m_{1}, m_{2}, 1]^\T$$ be the vector of transformed data points. Then we can write the equation of a conic as $$\ve{\theta}^{\T}\ve{u}(\ve{x}) = 0$$.

One possible measure of how much a data point fails to satisfy the ellipse equation is the algebraic least squares residual
$$\frac{| \ve{\theta}^{\T}\ve{u}(\ve{x}) |^2}{\|\ve{\theta}\|^2} = \frac{\ve{\theta}^\T \ve{u}(\ve{x}) \ve{u}(\ve{x})^\T \ve{\theta}}{\|\ve{\theta}\|^2},$$ where $$\|\ve{\theta}\|^2 = \theta_1^2 + \ldots + \theta_6^2$$. Based on this observation, the algebraic least squares (ALS) cost function is $$\costALS(\ve{\theta}; \ve{x}_1, \ve{x}_2, \ldots, \ve{x}_N) = \frac{\sum_{n=1}^N\ve{\theta}^\T \ve{u}(\ve{x}_n) \ve{u}(\ve{x}_n)^\T \ve{\theta}}{\|\ve{\theta}\|^2} = \frac{\ve{\theta}^\T \M{M} \ve{\theta}}{\ve{\theta}^\T \ve{\theta}},$$ where $$\M{M} = \sum_{n=1}^N \ve{u}(\ve{x}_n) \ve{u}(\ve{x}_n)^\T$$. The algebraic least squares cost function is so popular because it is very easy to minimise. It has the form of the Rayleigh quotient that is frequently encountered in computer vision problems. The parameter vector $$\estALS$$ that minimises the algebraic cost function is the eigenvector of $$\M{M}$$ associated with its smallest eigenvalue. So an eigendecomposition is all that is needed to find the estimate. This is simple to implement and very efficient.
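The eigendecomposition step can be sketched in a few lines of Python with numpy (my own illustration; the function name is hypothetical):

```python
import numpy as np

def fit_conic_als(points):
    """Algebraic least squares conic fit.

    The minimiser of the Rayleigh quotient theta^T M theta / theta^T theta
    is the unit eigenvector of M associated with its smallest eigenvalue.
    """
    x, y = points[:, 0], points[:, 1]
    # Rows are the transformed data points u(x_n).
    U = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    M = U.T @ U                    # M = sum_n u(x_n) u(x_n)^T
    w, V = np.linalg.eigh(M)       # eigh: M is symmetric, eigenvalues ascending
    return V[:, 0]                 # eigenvector of the smallest eigenvalue
```

For points sampled exactly on a conic, $$\M{M}$$ has a (numerically) zero eigenvalue and the corresponding eigenvector recovers the conic up to scale.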

However, notice that the ancillary constraint $$b^2 - 4ac < 0$$ is not enforced in the process. This means that potentially the estimate $$\estALS$$ could be a hyperbola or parabola instead of an ellipse. To overcome this limitation, Fitzgibbon et al. (1999) proposed a modification to the algebraic cost function that guaranteed an ellipse fit. They called it the Direct Ellipse Fit (DIR).

The main insight is that the ancillary constraint $$b^2 - 4ac < 0$$ can be written in matrix notation as $$\de^\T \M{F} \de > 0$$, where

$$\M{F} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \otimes \begin{bmatrix} 0 & 0 & 2 \\ 0 & -1 & 0 \\ 2 & 0 & 0 \end{bmatrix}$$
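As a quick numerical sanity check (my own, with arbitrary example values), $$\de^\T \M{F} \de$$ indeed evaluates to $$4ac - b^2 = -\Delta$$:

```python
import numpy as np

# F = [[1, 0], [0, 0]] kron [[0, 0, 2], [0, -1, 0], [2, 0, 0]]
F = np.kron(np.array([[1.0, 0.0], [0.0, 0.0]]),
            np.array([[0.0, 0.0, 2.0], [0.0, -1.0, 0.0], [2.0, 0.0, 0.0]]))

# For any conic parameters theta = [a, b, c, d, e, f]:
a, b, c, d, e, f = 2.0, 1.0, 3.0, -1.0, 0.5, -4.0
theta = np.array([a, b, c, d, e, f])
assert np.isclose(theta @ F @ theta, 4 * a * c - b**2)  # equals -Delta
```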

An ALS-based estimator that guarantees an ellipse is the solution of the following optimisation problem $$\min_{\de} \sum_{n=1}^N\de^\T \ve{u}(\ve{x}_n) \ve{u}(\ve{x}_n)^\T \de \quad \text{subject to} \quad \de^\T \M{F} \de = 1.$$ Its solution coincides with the minimiser of $$\costDIR(\de; \ve{x}_1, \ve{x}_2, \ldots, \ve{x}_N) = \frac{\de^\T\M{M}\de}{\de^\T\M{F}\de}.$$

Because the matrix $$\M{F}$$ does not depend on the parameters $$\de$$, this is again a Rayleigh quotient problem. It too can be solved with the help of an eigendecomposition.
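A minimal Python sketch of the idea (my own simplification: the constrained problem is solved via the eigenvectors of $$\M{M}^{-1}\M{F}$$, selecting the unique positive eigenvalue; numerically stabler variants of the direct fit exist and are preferable in practice):

```python
import numpy as np

def fit_ellipse_direct(points):
    """Direct ellipse fit sketch: minimise theta^T M theta subject to
    theta^T F theta = 1, via the eigenvector of M^{-1} F associated
    with its unique positive eigenvalue."""
    x, y = points[:, 0], points[:, 1]
    U = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    M = U.T @ U
    F = np.kron(np.array([[1.0, 0.0], [0.0, 0.0]]),
                np.array([[0.0, 0.0, 2.0], [0.0, -1.0, 0.0], [2.0, 0.0, 0.0]]))
    w, V = np.linalg.eig(np.linalg.solve(M, F))
    k = np.argmax(w.real)          # the single positive eigenvalue
    theta = V[:, k].real
    return theta / np.linalg.norm(theta)
```

Because the selected eigenvector satisfies $$\M{F}\de = \mu \M{M}\de$$ with $$\mu > 0$$, it obeys $$\de^\T \M{F}\de > 0$$, i.e. $$b^2 - 4ac < 0$$, so the returned conic is always an ellipse.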

Our first contribution

One drawback of the direct ellipse fit is that it often produces biased estimates. When points are sampled from only a portion of the ellipse, then the estimated ellipse is often smaller than what it should be. One reason for the inferior performance of the direct ellipse fit estimator is that it does not consider the uncertainty of a data point in the cost function.

Assuming that the uncertainty of each data point $$\ve{x}_n$$ can be described by a $$2 \times 2$$ covariance matrix $$\cov{\ve{x}_n}$$, a so-called approximate maximum likelihood cost function can be devised $$\costAML(\de; \ve{x}_1, \ve{x}_2, \ldots, \ve{x}_N) = \sum_{n=1}^N \frac{\de^\T \M{A}_n \de}{\de^\T \M{B}_n \de}$$ with $$\M{A}_n = \ve{u}(\ve{x}_n) \ve{u}(\ve{x}_n)^\T \quad \text{and} \quad \M{B}_n = \partial_{\ve{x}}\ve{u}(\ve{x}_n) \cov{\ve{x}_n} \partial_{\ve{x}}\ve{u}(\ve{x}_n)^\T$$ for each $$n = 1, \dots, N$$. Here, for any length-$$2$$ vector $$\ve{y}$$, $$\partial_{\ve{x}}\ve{u}(\ve{y})$$ denotes the $$6 \times 2$$ matrix of the partial derivatives of the function $$\ve{x} \mapsto \ve{u}(\ve{x})$$ evaluated at $$\ve{y}$$, and, for each $$n = 1, \dots, N$$, $$\cov{\ve{x}_n}$$ is a $$2 \times 2$$ symmetric covariance matrix describing the uncertainty of the data point $$\ve{x}_n$$.

The cost function arises from a first order propagation of the uncertainty in the original $$\ve{x}_n$$ to the transformed data space $$\ve{u}(\ve{x}_n)$$. For small noise levels it is a very good approximation of the maximum likelihood cost function.
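The ingredients $$\M{A}_n$$ and $$\M{B}_n$$ are straightforward to compute; the following Python sketch (my own illustration, with a hypothetical function name) evaluates the AML cost directly from its definition:

```python
import numpy as np

def J_AML(theta, points, covs):
    """Approximate maximum likelihood cost:
    sum_n (theta^T A_n theta) / (theta^T B_n theta)."""
    total = 0.0
    for (m1, m2), cov in zip(points, covs):
        u = np.array([m1**2, m1 * m2, m2**2, m1, m2, 1.0])
        du = np.array([[2 * m1, 0.0],       # 6x2 Jacobian of x -> u(x)
                       [m2,     m1],
                       [0.0,    2 * m2],
                       [1.0,    0.0],
                       [0.0,    1.0],
                       [0.0,    0.0]])
        A = np.outer(u, u)                  # A_n = u u^T
        B = du @ cov @ du.T                 # B_n: first-order noise propagation
        total += (theta @ A @ theta) / (theta @ B @ theta)
    return total
```

For noiseless data lying exactly on the conic described by $$\de$$, every numerator $$\de^\T \M{A}_n \de = (\de^\T\ve{u}(\ve{x}_n))^2$$ vanishes and the cost is zero.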

Our approach for guaranteeing an ellipse fit for the AML cost function rests on the following observations:

1. Start with an estimate $$\estDIR$$ that guarantees an ellipse so that we know that our parameters lie within the ellipse region of the parameter space.
2. For the estimate to become a hyperbola, the discriminant $$\Delta = b^2 - 4ac$$ has to pass through zero. If the cost function contains a term that divides by $$\Delta$$, the cost becomes very large as we approach a parabola, because we are dividing by a number that approaches zero. Since the cost goes to infinity at the parabola boundary, we should, in theory, never cross it and end up with a hyperbola.
3. Conceptually, we are introducing a barrier term that ensures that our estimator always stays in the ellipse region of the search space.

The equation for the barrier function is

$$g(\de) = \frac{\|\de\|^4}{(\de^\T \M{F}\de)^2} = \|\de\|^4\Delta^{-2}.$$
The complete cost function that is minimised is $$J_{\mathrm{GAML}}(\de; \ve{x}_1, \ve{x}_2, \ldots, \ve{x}_N) = \costAML(\de; \ve{x}_1, \ve{x}_2, \ldots, \ve{x}_N) + \alpha g(\de),$$ where $$\alpha$$ is a very small number.
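To see the barrier in action, here is a small Python check (my own illustration with arbitrary example parameters) that $$g(\de)$$ grows rapidly as the discriminant approaches zero:

```python
import numpy as np

# Constraint matrix F, so that theta^T F theta = 4ac - b^2 = -Delta
F = np.kron(np.array([[1.0, 0.0], [0.0, 0.0]]),
            np.array([[0.0, 0.0, 2.0], [0.0, -1.0, 0.0], [2.0, 0.0, 0.0]]))

def barrier(theta):
    """g(theta) = ||theta||^4 / (theta^T F theta)^2."""
    return np.linalg.norm(theta)**4 / (theta @ F @ theta)**2

circle = np.array([1.0, 0.0, 1.0, 0.0, 0.0, -1.0])            # Delta = -4
near_parabola = np.array([1.0, 1.999, 1.0, 0.0, 0.0, -1.0])   # Delta ~ -0.004
assert barrier(near_parabola) > 1e4 * barrier(circle)
```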

A Gauss-Newton type method can be used to solve this problem. The only ingredients needed are the Jacobian and Hessian of the cost function. We chose to re-write the cost function as a sum of squares, and used the Levenberg--Marquardt method to optimise it. Occasionally, for ill-posed problems, the Levenberg--Marquardt algorithm takes such large steps that it overshoots the barrier function! We detect such cases, return to the previous iterate and decrease the step size to ensure that the barrier is not breached.

You can download a MATLAB implementation of our algorithm here. You can also read the accompanying publication here.

Our second contribution

To be truly useful in a variety of contexts the ellipse estimate should come with a measure of uncertainty that is geometrically meaningful. Our second contribution is the derivation of a covariance matrix for our approximate maximum likelihood ellipse estimate. To facilitate the interpretation of the uncertainty measure, we propagate the uncertainty in the algebraic ellipse parameters through to geometrically meaningful ellipse parameters, such as the centre, axes and orientation of the ellipse. We also propose an ellipse-specific confidence region; that is, a confidence region in the plane which is meant to cover the in-plane locus of the true ellipse with a given high probability. See Figure 1 for an example.

Figure 1: Example of a geometrically meaningful uncertainty measure and ellipse-specific confidence region.

We also discovered that it is possible to guarantee an ellipse estimate through a suitable choice of algebraic parameters. We noted that for $$\de$$ to describe an ellipse we must have $$a \neq 0$$ and $$c \neq 0$$, for otherwise---should $$a = 0$$ or $$c = 0$$ hold---the discriminant $$\Delta = b^2 - 4ac = b^2$$ would be non-negative, and this would contradict the fact that for ellipses the discriminant is always negative. Given this and the fact that the scale of $$\de$$ is irrelevant as far as the determination of the underlying ellipse is concerned, it is safe to assume that $$a = 1$$. Under this assumption the condition $$\Delta < 0$$ becomes $$4c > b^2$$, and letting $$b = 2p$$, we see that $$c > p^2$$, or, equivalently, $$c = p^2 + q^{-2}$$ for some $$q \neq 0$$. This argument indicates that the set of ellipses can be parametrised as $$\de = \kappab(\etab), \quad \kappab(\etab) = [1, 2p, p^2+q^{-2}, r, s, t]^\T,$$ with $$\etab = [p,q,r,s,t]^\T$$ running over all possible length-$$5$$ vectors.
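The parametrisation is easy to exercise numerically; a small Python sketch (the function name is my own):

```python
import numpy as np

def kappa(eta):
    """Map eta = [p, q, r, s, t] (q != 0) to conic parameters theta
    that are guaranteed to describe an ellipse:
    a = 1, b = 2p, c = p^2 + q^{-2}."""
    p, q, r, s, t = eta
    return np.array([1.0, 2 * p, p * p + q**-2, r, s, t])
```

For any such $$\etab$$ the discriminant is $$b^2 - 4ac = 4p^2 - 4(p^2 + q^{-2}) = -4/q^2 < 0$$, so the constraint holds by construction.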

The discovery of this implicit ellipse parametrisation suggests that the barrier term $$g(\de)$$ can be omitted, thus resulting in an even simpler algorithm. Nevertheless, some important nuances need to be taken into account to ensure a fully reliable implementation of the algorithm for scenarios where data points are sampled from only a very short segment of the ellipse (a very ill-posed problem). For example, while the parametrisation guarantees ellipses it also admits degenerate ellipses, so special stopping criteria are required to terminate the algorithm if it approaches a degenerate ellipse solution.

More details on the algorithm, as well as comprehensive experimental results, can be found in the accompanying publication. You can also download a MATLAB implementation of our algorithm here.

last update: Tue Oct 06, 2015