Abstract
In a series of earlier papers on this subject, we proposed a novel statistical hypothesis testing method for the detection of objects in noisy images. The method uses results from percolation theory and random graph theory. We developed algorithms that allowed us to detect objects of unknown shapes in the presence of nonparametric noise of unknown level and unknown distribution. No boundary shape constraints were imposed on the objects; only a weak bulk condition on the object's interior was required. Our algorithms have linear complexity and exponential accuracy. In the present paper, we describe an implementation of our nonparametric hypothesis testing method. We provide a program that can be used for statistical experiments in image processing. This program is written in the statistical programming language R.
Introduction
Assume we observe a noisy digital image on a screen of N × N pixels. Object detection and image reconstruction for noisy images are two of the cornerstone problems of image analysis. In the series of papers Langovoy and Wittich (2009a), Davies et al. (2009), Langovoy and Wittich (2009b) and Langovoy and Wittich (2010), we proposed a new efficient technique for quick detection of objects in noisy images. Our approach uses mathematical percolation theory. Detection of objects in noisy images is the most basic problem of image analysis. Indeed, when one looks at a noisy image, the first question to ask is whether there is any object at all. This is also a primary question of interest in such diverse fields as, for example, cancer detection (Ricci-Vitiani et al. (2007)), automated urban analysis (Negri et al. (2006)), detection of cracks in buried pipes (Sinha and Fieguth (2006)), and other possible applications in astronomy, electron microscopy and neurology. Moreover, if there is just random noise in the picture, it does not make sense to run computationally intensive image reconstruction procedures on this particular picture. Surprisingly, the vast majority of image analysis methods, both in statistics and in engineering, skip this stage and start immediately with image reconstruction. The crucial difference of our method is that we do not impose any shape or smoothness assumptions on the boundary of the object. This permits the detection of nonsmooth, irregular or disconnected objects in noisy images, under very mild assumptions on the object's interior. This is especially suitable, for example, when one has to detect a highly irregular non-convex object in a noisy image, which is usually the case in the aforementioned fields of automated urban analysis, cancer detection and detection of cracks in materials.
Although our detection procedure works for regular images as well, it is precisely the class of irregular images of unknown shape where our method can be most advantageous. We approached the object detection problem as a hypothesis testing problem within the class of statistical inverse problems in spatial statistics. We were able to extend our approach to the nonparametric case of unknown noise density, and this density was not assumed to be smooth or even continuous. It is even possible that the noise distribution does not have a density, or that the noise distribution is heavy-tailed. We have shown that there is a deep connection between the spatial structure chosen for the discretisation of the image, the type of the noise distribution on the image, and the statistical properties of object detection. These results seem to be of independent interest for the field of spatial statistics. In our previous papers, we considered the case of square lattices in Langovoy and Wittich (2009a) and Langovoy and Wittich (2009b), triangular lattices in Davies et al. (2009) and Langovoy and Wittich (2010), and even the case of general periodic lattices in Langovoy and Wittich (2010). In all those cases, we proved that our detection algorithms have linear complexity in terms of the number of pixels on the screen. These procedures are not only asymptotically consistent; their accuracy grows exponentially with the "number of pixels" in the object of detection. All of the detection algorithms have a built-in data-driven stopping rule, so there is no need for human assistance to stop an algorithm at an appropriate step. In view of the above, our method can be viewed as an unsupervised learning method, in the language of machine learning. This makes our results valuable for the field of machine learning as well. Indeed, we not only propose an unsupervised method, but also prove the method's consistency and even establish its rates of convergence.
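To make the detection idea concrete, the following is a minimal illustrative sketch in R, not the authors' actual procedure: pixels are thresholded into "black" and "white", and the size of the largest 4-connected black cluster is computed by an iterative depth-first search. The threshold value and the cluster-size cutoff used in a real test would be the data-driven quantities from the papers; here they are placeholders.

```r
# Illustrative sketch only: threshold a grey-level image and return the
# size of the largest 4-connected cluster of above-threshold pixels.
# A percolation-based test would reject the "pure noise" hypothesis
# when this size exceeds a suitably calibrated cutoff.
largest_cluster <- function(img, thr) {
  mask <- img > thr                      # "black" pixels after thresholding
  n <- nrow(img); m <- ncol(img)
  visited <- matrix(FALSE, n, m)
  best <- 0L
  for (i in seq_len(n)) for (j in seq_len(m)) {
    if (mask[i, j] && !visited[i, j]) {
      # iterative depth-first search over the 4-neighbourhood
      stack <- list(c(i, j)); visited[i, j] <- TRUE; size <- 0L
      while (length(stack) > 0) {
        p <- stack[[length(stack)]]; stack[[length(stack)]] <- NULL
        size <- size + 1L
        for (d in list(c(-1L, 0L), c(1L, 0L), c(0L, -1L), c(0L, 1L))) {
          q <- p + d
          if (q[1] >= 1 && q[1] <= n && q[2] >= 1 && q[2] <= m &&
              mask[q[1], q[2]] && !visited[q[1], q[2]]) {
            visited[q[1], q[2]] <- TRUE
            stack[[length(stack) + 1]] <- q
          }
        }
      }
      best <- max(best, size)
    }
  }
  best
}
```

This brute-force scan is already linear in the number of pixels, which matches the complexity claim above; the production algorithms achieve the same order with additional refinements.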
In this paper, we describe an algorithmic implementation of our nonparametric hypothesis testing / image processing method. We also provide the actual program that was used for our research at its initial stages. This program is written in the statistical programming language R. It is known that, as a programming language, R is rather slow. At present, we have implemented our algorithms in C++, and they run substantially faster. However, R is a free and convenient language for statistical programming, since R has many advanced statistical procedures already built in. The language also has convenient tools for generating random variables of various types, so R is well suited for statistical experiments with our method. The paper is organized as follows. Section 2.1 describes some basic methods for converting, storing and processing images in R. Our implementation of lattice neighborhood structures is given in Section 2.2. The Depth First Search algorithm and its R implementation for image analysis are described in Section 3. In Section 4 we explain how the fast Newman-Ziff algorithm from discrete algorithmic probability can be effectively used in our image analysis method. The beginning of Section 4 describes the theory behind the Newman-Ziff approach, while Subsection 4.1 introduces our modification of this algorithm, suitable for percolation probabilities that are inhomogeneous over the lattice. This modification allows for finer simulation studies for any given finite screen size, which is particularly useful for small images. Subsections 4.2 and 4.3 are devoted to R implementations of the Newman-Ziff algorithm and the modified Newman-Ziff algorithm, respectively.
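As a preview of the Newman-Ziff idea discussed in Section 4, the following is a hedged sketch, with illustrative names and details rather than the implementation described there. For site percolation on an L × L square lattice, sites are occupied one at a time in random order, and clusters are maintained with a union-find structure, so the largest cluster size is available after every single site addition rather than recomputed from scratch for each occupation probability.

```r
# Illustrative sketch of the Newman-Ziff sweep (not the paper's code):
# occupy sites of an L x L square lattice one by one in random order,
# merging clusters via union-by-size, and record the largest cluster
# after each step.
newman_ziff_largest <- function(L, seed = 1) {
  set.seed(seed)
  N <- L * L
  parent <- rep(0L, N)                  # 0 marks a not-yet-occupied site
  csize  <- integer(N)                  # cluster size, stored at each root
  find <- function(i) { while (parent[i] != i) i <- parent[i]; i }
  neighbours <- function(s) {           # 4-neighbourhood, s = (r-1)*L + c
    r <- (s - 1L) %/% L + 1L; c <- (s - 1L) %% L + 1L
    out <- integer(0)
    if (r > 1L) out <- c(out, s - L)
    if (r < L)  out <- c(out, s + L)
    if (c > 1L) out <- c(out, s - 1L)
    if (c < L)  out <- c(out, s + 1L)
    out
  }
  largest <- integer(N)
  best <- 0L
  order_ <- sample.int(N)               # random occupation order
  for (k in seq_len(N)) {
    s <- order_[k]
    parent[s] <- s; csize[s] <- 1L      # occupy site s as its own cluster
    for (nb in neighbours(s)) {
      if (parent[nb] != 0L) {           # merge with each occupied neighbour
        rs <- find(s); rn <- find(nb)
        if (rs != rn) {
          if (csize[rs] < csize[rn]) { tmp <- rs; rs <- rn; rn <- tmp }
          parent[rn] <- rs
          csize[rs] <- csize[rs] + csize[rn]
        }
      }
    }
    best <- max(best, csize[find(s)])
    largest[k] <- best
  }
  largest                               # largest cluster after 1..N sites
}
```

One full sweep over the lattice thus yields the largest-cluster curve at all occupation levels simultaneously, which is what makes the algorithm attractive for percolation-based simulation studies. The inhomogeneous modification of Subsection 4.1 changes how the occupation order is drawn; the union-find bookkeeping stays the same.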