Homogeneity measures and iterative clustering approach for object detection in an image
- PhD Student:
- Arthur Marzinkowski
- Co-Advisors :
- Salem Benferhat
- Anastasia Paparrizou
- Co-Supervisor :
- Cédric Piette
- Funding : Artois
- PhD defended on :
- Nov 28, 2025
Automatic processing of unstructured data, such as images, videos, maps, or texts, is a major challenge in artificial intelligence. Although these data are increasingly available and rich in information, they remain difficult to exploit directly. Unlike structured data, which can be processed readily with statistical analysis or represented with graphs and databases, visual and textual data require intermediate processing steps before they can be exploited. Geographic maps illustrate this difficulty particularly well: they are composed of multiple graphic and textual elements, but they cannot be interpreted directly without preprocessing to structure them.
The data we work with in this thesis are images of digital maps. In many image processing applications, such as text recognition or object detection, systems produce bounding boxes. These boxes, often axis-aligned, frame objects of interest or informative zones of the visual content.
In the first part, we focus on the automatic detection of aligned text boxes, in particular the text that makes up the legend region of the map. After extracting the text from the images with OCR tools, we apply an iterative clustering process to the resulting bounding boxes. Five main criteria are used: text alignment, distance between the textual zones, background color, character color, and font size. For each of these criteria, we define similarity measures. We propose a method that iteratively combines the clusters obtained from each criterion. The study reveals two essential results: first, using multiple criteria gives better results than those obtained with a simple distance (e.g., Euclidean) between text boxes; second, it confirms the effectiveness of the priority order we intuitively defined among the criteria for legend detection.
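As an illustration only, the idea of applying several criteria in a priority order can be sketched as follows. This is a minimal sketch under assumptions, not the method developed in the thesis: the `TextBox` fields, the three predicates, their thresholds, and the choice to refine clusters by splitting them into connected components are all hypothetical, and only three of the five criteria are shown (box height stands in for font size).

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class TextBox:
    """Hypothetical axis-aligned text box as an OCR tool might return it."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height, used here as a rough proxy for font size
    text: str = ""


# Hypothetical pairwise criteria; thresholds are arbitrary illustration values.
def left_aligned(a, b, tol=5.0):
    return abs(a.x - b.x) <= tol


def vertically_close(a, b, factor=1.5):
    gap = max(a.y, b.y) - min(a.y + a.h, b.y + b.h)
    return gap <= factor * max(a.h, b.h)


def similar_size(a, b, ratio=1.3):
    return max(a.h, b.h) <= ratio * min(a.h, b.h)


def split_by(cluster, similar):
    """Split one cluster into connected components of the 'similar' relation."""
    parent = list(range(len(cluster)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(cluster)), 2):
        if similar(cluster[i], cluster[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i, box in enumerate(cluster):
        groups.setdefault(find(i), []).append(box)
    return list(groups.values())


def iterative_clustering(boxes, criteria):
    """Refine the clustering one criterion at a time, in priority order."""
    clusters = [list(boxes)]
    for similar in criteria:
        clusters = [part for c in clusters for part in split_by(c, similar)]
    return clusters
```

In this sketch the order of the `criteria` list plays the role of the priority between criteria: each criterion only subdivides the clusters left by the previous ones. For example, `iterative_clustering(boxes, [left_aligned, vertically_close, similar_size])` first groups boxes by column, then by vertical proximity, then by height.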
In the second part, we widen our case study: instead of comparing only two bounding boxes, we now consider a set of boxes, which we simply call a “cluster.” Unlike the simple case of a pair of boxes, the analysis of a whole cluster requires global reasoning that takes into account the collective organization of the text boxes in the image. Some criteria extend easily to this situation, while others introduce new considerations needed to evaluate more complex configurations.
In addition to this approach of clustering, we develop a pairing algorithm between different text boxes. This method associates, for example, graphic symbols present in the map with the texts that describe them.
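To make the pairing idea concrete, here is a minimal sketch that assumes a simple greedy nearest-center strategy. The thesis's actual pairing algorithm is not described here, so the function names, the `(x, y, w, h)` box representation, and the `max_dist` parameter are hypothetical.

```python
from math import hypot


def center(box):
    """Center of an axis-aligned box given as (x, y, w, h)."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def pair_symbols_with_texts(symbols, texts, max_dist=float("inf")):
    """Greedy one-to-one pairing: repeatedly take the closest unused pair.

    Returns a dict mapping symbol index -> text index.
    """
    candidates = []
    for si, s in enumerate(symbols):
        sx, sy = center(s)
        for ti, t in enumerate(texts):
            tx, ty = center(t)
            candidates.append((hypot(sx - tx, sy - ty), si, ti))
    candidates.sort()  # closest pairs first

    used_symbols, used_texts, pairs = set(), set(), {}
    for dist, si, ti in candidates:
        if dist > max_dist:
            break  # all remaining candidates are farther away
        if si not in used_symbols and ti not in used_texts:
            pairs[si] = ti
            used_symbols.add(si)
            used_texts.add(ti)
    return pairs
```

A greedy strategy is easy to follow but is not guaranteed to minimize the total distance over all pairs; an optimal assignment method could be substituted if that matters.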
The last part of this work addresses the detection of objects in digital maps. We are specifically interested in the objects defined in the legend, and we seek to identify them in the rest of the map. To do this, we use different similarity measures that allow us to compare legend objects with those detected in the map. We illustrate our method by detecting objects representing lifting stations and manholes in a wastewater network.