pert. Often these methods do not take into account, or do not take full advantage of, this information throughout the learning iterations. Moreover, as a query is refined at each iteration, more and more relevant images are returned. However, not all of these images may contribute to the learning process of a given image classifier. To overcome these issues, active learning (AL) strategies can be embedded into the CBIR process.
AL is a machine learning paradigm that selects the most informative samples for the learning process. It allows a small set of unlabeled learning samples to be selected and displayed, iteratively, for expert annotation. The annotated set is then used to train a classifier. Several active learning techniques [18,19] have been developed using different selection strategies to obtain the most informative samples. Although well known and widely used in different domains, many of them are infeasible in the medical context because of its inherent constraints (e.g. dealing with large datasets, interactive response times, and minimal expert intervention in the learning process).
To take these characteristics into account, we propose in the next section an active learning strategy dedicated to RF in the CBIR process, based on uncertainty and diversity criteria. We focus mainly on the medical context, specifically the diagnosis of breast cancer.
We propose an approach, named Medical Active leaRning and Retrieval (MARRow), that applies active learning strategies to content-based breast image retrieval. In the first iteration, the expert performs the traditional annotation process (indicating relevant and irrelevant images, according to a given query image), as in the classic RF loop. From the second iteration onward, the images selected and retrieved for the active learning process are those that will contribute most to the learning process of a given classifier. Unlike works in the literature, in our approach the most informative images are those that present the best balance between similarity to the query image and certain degrees of diversity and uncertainty. In other words, they are images from different classes that are difficult to differentiate when the semantics of the query image are compared with those of the retrieved images (e.g. images at the boundaries of two different or overlapping classes).
For instance, Fig. 1 shows an example of two informative (uncertain) images located at the boundary between two different classes, benign and malignant lesions, respectively, which will be presented to the expert (instead of only images closer to the query center). Both images (regions of interest from different classes) present a high degree of similarity regarding their lesions (highlighted by dashed lines) and other tissues. Through our approach, we can balance the learning process with the set of images that will contribute most to reaching a higher classifier accuracy faster. Consequently, this improves the quality of the images returned in the CBIR process, because the classifier is trained with the most informative (similar and uncertain) images.
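To make the balance between similarity, uncertainty, and diversity concrete, the following is a minimal sketch of one possible selection rule. It is not the scoring used by MARRow: the function name `select_informative`, the additive score, the weights, and the use of a classifier probability `p_relevant` as the uncertainty source are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def uncertainty(p_relevant):
    """1.0 when p = 0.5 (on the decision boundary), 0.0 when p is 0 or 1."""
    return 1.0 - abs(2.0 * p_relevant - 1.0)


def select_informative(query, candidates, n, w_sim=1.0, w_unc=1.0, w_div=1.0):
    """Greedily pick n images balancing similarity, uncertainty, and diversity.

    candidates: list of (image_id, features, p_relevant) tuples, where
    p_relevant is a hypothetical probability given by the current classifier.
    The additive score and the weights are illustrative assumptions.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < n:

        def score(candidate):
            _, feats, p = candidate
            # Similarity to the query, mapped into (0, 1].
            sim = 1.0 / (1.0 + euclidean(query, feats))
            # Diversity: distance to the closest already selected image,
            # mapped into [0, 1); it is 0 while nothing has been selected.
            div = min((euclidean(feats, s[1]) for s in selected), default=0.0)
            div = div / (1.0 + div)
            return w_sim * sim + w_unc * uncertainty(p) + w_div * div

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [image_id for image_id, _, _ in selected]


# Toy 2-D features: "a" is closest to the query but confidently classified,
# "b" and "c" are uncertain near-duplicates, "d" is uncertain and elsewhere.
toy_candidates = [
    ("a", (0.1, 0.0), 0.95),
    ("b", (0.5, 0.0), 0.50),
    ("c", (0.6, 0.1), 0.50),
    ("d", (0.0, 0.6), 0.45),
]
picked = select_informative((0.0, 0.0), toy_candidates, n=2)
print(picked)
```

In this toy setting the rule picks "b" and then "d". The closest image "a" is skipped because the classifier is already confident about it, and "c" is skipped because it is nearly a duplicate of "b".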
Algorithm 1 and Fig. 2 present the main steps of our proposed approach. The dashed lines (Fig. 2) represent the cycle of the incremental learning process. In Step 1, given an image dataset I and a query image q, the best descriptor (the best pair of feature extractor and distance function) is selected. We analyzed several sets of feature extractors Fi and distance functions Dj (Algorithm 1, Line 1), due to their importance to the retrieval process. Afterwards, low-level features are extracted from I using the best feature extractor, generating the learning set Z2 (Algorithm 1, Line 2). We also extract features from q using the same extractor.
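Step 1 can be illustrated with a small sketch that evaluates every pair of feature extractor and distance function and keeps the best one. The passage does not state how "best" is measured, so the criterion used here (leave-one-out precision at k on a small labeled sample) is an assumption, as are the function names and the two toy extractors.

```python
import itertools
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def precision_at_k(features, labels, dist, k):
    """Leave-one-out retrieval: each image queries all the others."""
    total = 0.0
    for i, query in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        ranked = sorted(others, key=lambda j: dist(query, features[j]))
        hits = sum(1 for j in ranked[:k] if labels[j] == labels[i])
        total += hits / k
    return total / len(features)


def select_descriptor(images, labels, extractors, distances, k=1):
    """Return (score, extractor_name, distance_name) of the best pair.

    The evaluation criterion is an assumption made for this sketch.
    Ties are resolved in favor of the pair evaluated first.
    """
    best = None
    for (f_name, extract), (d_name, dist) in itertools.product(
        extractors.items(), distances.items()
    ):
        feats = [extract(img) for img in images]
        score = precision_at_k(feats, labels, dist, k)
        if best is None or score > best[0]:
            best = (score, f_name, d_name)
    return best


# Toy "images": flat tuples of pixel intensities. The classes differ in
# texture (intensity range), not in brightness (mean intensity).
toy_images = [(1, 1, 1, 1), (0, 2, 0, 2), (5, 5, 5, 5), (4, 6, 4, 6)]
toy_labels = ["smooth", "rough", "smooth", "rough"]
toy_extractors = {
    "mean": lambda img: (sum(img) / len(img),),
    "range": lambda img: (max(img) - min(img),),
}
toy_distances = {"euclidean": euclidean, "manhattan": manhattan}

best_pair = select_descriptor(toy_images, toy_labels, toy_extractors, toy_distances)
print(best_pair)
```

Here the "range" extractor wins because it separates the two toy classes, whereas the "mean" extractor ranks images of the other class first. With real data, the extractors and distance functions would be the Fi and Dj analyzed in the text.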
In Step 2, the learning set is partitioned into k clusters using a given clustering method. After the clustering process, the set of centroids C is generated (Algorithm 1, Line 3). In addition to the images from C, we select the images most similar to q. To do so, we obtain the desired number of images from LS, which is sorted in increasing order of distance from q (Algorithm 1, Lines 4-5). Those images are then presented to the expert for annotation (as relevant/irrelevant). The annotated images constitute the initial training set Z1 (Algorithm 1, Line 6), which is used in the classifier training process, generating the first instance of the learning model M (Step 3, Line 7).
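The construction of the first set of images shown to the expert can be sketched as follows. The text leaves the clustering method open, so plain k-means with a deterministic initialization is used here as an assumption. Taking the image nearest to each cluster center as the "centroid image", and the names `kmeans` and `initial_candidates`, are also illustrative choices rather than the authors' implementation.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centers (assumption)."""
    centers = [points[i] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: euclidean(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group; keep the old
        # center if a group happens to be empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return centers


def initial_candidates(learning_set, query, k, n_similar):
    """Images to annotate first: one representative per cluster plus the
    n_similar images closest to the query, without duplicates.

    learning_set: dict mapping image_id -> feature tuple.
    """
    ids = list(learning_set)
    centers = kmeans([learning_set[i] for i in ids], k)
    # Representative of a cluster: the image nearest to its center.
    representatives = [
        min(ids, key=lambda i: euclidean(learning_set[i], c)) for c in centers
    ]
    # Rank all images by increasing distance from the query.
    ranked = sorted(ids, key=lambda i: euclidean(learning_set[i], query))
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(representatives + ranked[:n_similar]))


toy_learning_set = {
    "img1": (0.0, 0.0),
    "img2": (10.0, 10.0),
    "img3": (0.0, 1.0),
    "img4": (1.0, 0.0),
    "img5": (10.0, 11.0),
    "img6": (11.0, 11.0),
}
to_annotate = initial_candidates(toy_learning_set, (0.9, 0.2), k=2, n_similar=2)
print(to_annotate)
```

The returned identifiers are the images that would be shown to the expert. Once labeled as relevant or irrelevant, they form the initial training set Z1 used to fit the first model M with whatever classifier is chosen.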