
Sehar, Krishnamoorthi, and Kumar: Deep Learning Model-Based Detection of Anemia from Conjunctiva Images

Abstract

Objectives

Anemia is characterized by a reduction in red blood cells, leading to insufficient levels of hemoglobin, the molecule responsible for carrying oxygen. The current standard method for diagnosing anemia involves analyzing blood samples, a process that is time-consuming and can cause discomfort to participants. This study offers a comprehensive analysis of non-invasive anemia detection using conjunctiva images processed through various machine learning and deep learning models. The focus is on the palpebral conjunctiva, which is highly vascular and unaffected by melanin content.

Methods

Conjunctiva images from both anemic and non-anemic participants were captured using a smartphone. A total of 764 conjunctiva images were augmented to 4,315 images using the deep convolutional generative adversarial network model to prevent overfitting and enhance model robustness. These processed and augmented images were then utilized to train and test multiple models, including statistical regression, machine learning algorithms, and deep learning frameworks.

Results

The stacking ensemble framework, which includes the models VGG16, ResNet-50, and InceptionV3, achieved a high area under the curve score of 0.97. This score demonstrates the framework’s exceptional capability in detecting anemia through a non-invasive approach.

Conclusions

This study introduces a noninvasive method for detecting anemia using conjunctiva images obtained with a smartphone and processed using advanced deep learning techniques.


I. Introduction

Anemia is characterized by a deficiency of red blood cells or a reduction in the concentration of hemoglobin, the molecule responsible for carrying oxygen. This leads to a diminished oxygen supply to the body’s tissues. Studies have shown that globally, 40% of children aged 6–59 months, 37% of pregnant women, and 30% of non-pregnant women aged 15–49 are affected by anemia [1]. The groups most vulnerable to anemia include menstruating women and adolescent girls, infants under 2 years of age, pregnant women, and women in the postpartum period [2]. Additionally, anemia rates in India have been on the rise, with 57% of women and 67% of children currently suffering from the condition [3,4]. The palpebral conjunctiva of the eye, which is highly vascularized, is particularly useful for studying anemia detection. This is because conjunctival pallor, or a reduction in the redness of the eye, is readily apparent when hemoglobin levels in the blood are low. The complete blood count is a standard laboratory test that provides detailed information about the various types and concentrations of blood cells. A traditional method for determining hemoglobin content is the cyanmethemoglobin method, which involves using cyanide to convert hemoglobin into a measurable, stable form.

1. Literature Review

Related studies have described a diverse array of methodologies and technologies employed to detect anemia, a significant global health issue. These methods range from image processing techniques to the use of new, reliable, and fast machine learning algorithms. Bevilacqua et al. [1] utilized a specialized acquisition device comprising a head-mounted device, a Raspberry Pi, and a Pi-Noir camera, complemented by LEDs, with a Java application providing the software. The captured images were converted to the CIELAB color space, and the automated algorithm started with Canny edge detection, followed by computing a logical mask to mark the brightest region pixels. A support vector machine (SVM) classifier was then trained and tested on the dataset, using the average a* value as input for classification. Bauskar et al. [5] collected a dataset of 99 images from both anemic and non-anemic subjects using a standard camera. Images of the interior conjunctiva were taken and underwent region segmentation using a triangle threshold algorithm, followed by conversion from RGB to LAB color space. The relationship between the hemoglobin level and the a* component of the extracted region was analyzed using the Pearson correlation coefficient. Classification of anemia was subsequently performed using a modified version of the SVM classifier to improve accuracy. Fuadah et al. [6] developed a noninvasive anemia detection system for pregnant women. The process began with manual cropping of the palpebral conjunctiva, followed by conversion of the original RGB image to HSV and grayscale color spaces. During the feature extraction stage, first-order statistical parameters of the image were extracted. Finally, a k-nearest neighbor (KNN) classifier was trained and tested with the feature vectors.
Wu et al. [7] utilized a deep convolutional generative adversarial network (DCGAN) for data augmentation to identify tomato leaf disease. The dataset comprised 1,500 images across five classes. The study involved developing a DCGAN and training classification models such as AlexNet, GoogLeNet, VGGNet, and ResNet using the augmented dataset. Additionally, the paper highlighted the importance of addressing the unequal distribution of information by employing generative adversarial networks that convert images into images, and by using convolutional neural networks with multiple scales to extract outstanding features. Appiahene et al. [8] analyzed images of the eye conjunctiva, palpable palm, and fingernails to detect iron deficiency anemia. The images were captured using a Samsung Galaxy Tab 7A by trained professionals, focusing on individuals aged 6–59 months. Preprocessing in LAB color space was conducted before region of interest (ROI) segmentation, followed by the extraction of mean a*, b*, and G parameters. The model achieved higher accuracy on palm images, with a result of 99.12%.
Purwanti et al. [9] introduced a noninvasive mobile health application designed to detect anemia by analyzing images of the conjunctiva. The final method achieved an intersection over union value of approximately 72.05% for segmentation purposes and a classification accuracy of 91.43%. The methods discussed in the literature consistently demonstrate high accuracy and sensitivity.
The literature review focused on image processing techniques that incorporated automatic segmentation and feature extraction methods [10,11,12,13,14]. These studies highlighted the potential of noninvasive imaging techniques targeting the palm, nail, and conjunctiva regions for anemia detection. Specifically, the research utilized sophisticated algorithms and models, including logistic regression, KNN classifiers, and SVMs, to classify the images. This advancement marks a significant development in the field of noninvasive anemia diagnosis. The integration of image processing, deep learning, and user-friendly applications is essential for effective anemia detection across diverse populations. The accuracy levels achieved suggest that imaging of the conjunctiva could serve as a reliable tool in anemia screening, thereby eliminating the need for blood draws. The purpose of this study was to analyze conjunctiva images taken with smartphones to detect anemia using machine learning and deep learning techniques. The objectives included acquiring images according to specific protocols and developing appropriate image processing algorithms for preprocessing and segmenting the ROI. Features were then extracted to identify anemia using both machine learning and deep learning models. The Introduction section highlights the importance of detecting anemia and reviews existing methodologies, including preliminary research in this area. The Methods section describes the data acquisition process and the techniques employed in this study. The Results section presents the outcomes, and the Discussion section assesses the effectiveness of the proposed method.

II. Methods

The methodology, illustrated in Figure 1, began with data collection via a smartphone from participants diagnosed with anemia and those without, as well as from an online database. These data were then preprocessed using a blurring technique. For the segmentation of the ROI, the image was converted to the CIELAB color space. Relevant color features were extracted from the conjunctiva region. Finally, the performance of both machine learning and deep learning models was evaluated to assess their effectiveness in detecting anemia. The ROI in this study was the palpebral conjunctiva, which was chosen for its high vascularity and lack of influence from melanin content.
Figure 1. Overall workflow. ROI: region of interest.

1. Data Collection

The images of the eye were captured using an iPhone XR equipped with a 12-megapixel camera (Figure 1). The flash was disabled to prevent overexposure and light reflections, which could alter the essential color properties of the image. The focus and zoom settings were adjusted to capture detailed views of the conjunctiva. A total of 764 conjunctiva images were analyzed in this study. Of these, 54 images were captured using an iPhone, and 710 images were downloaded from an online database. The downloaded dataset included images of the conjunctiva from both eyes, captured using a smartphone, and encompassed participants aged 16–59 years [15]. The images directly acquired for the study included participants aged 18–50 years.
The directly acquired conjunctiva images included 13 from participants with anemia and 41 from participants without anemia. As shown in Table 1, the dataset comprised a total of 764 participants, with 439 having anemia and 325 not having anemia.
Table 1
Dataset information

Dataset          Total   Age group   Anemic (male/female)   Non-anemic (male/female)
Online dataset   710     16–59 yr    102 / 324              206 / 78
Acquired         54      18–53 yr    4 / 9                  28 / 13

2. Image Processing

Each acquired image underwent preprocessing to enable segmentation of the conjunctiva region from the eye image. Figure 2 illustrates the complete preprocessing flow. First, the raw image was resized to 200 × 200 pixels, standardizing the size for subsequent steps. Histogram equalization was then applied to the RGB image to distribute pixel values more evenly, thereby balancing the contrast. The final step applied a Gaussian blur filter to suppress overexposure and light reflections introduced during image acquisition; the Gaussian blur effectively reduced the high-frequency components of the image. The blurred image was then converted to the CIELAB color space, and the a* channel was selected for further processing because the redness of the conjunctiva captured in this channel is directly associated with the anemic condition. This information is crucial for feature extraction after segmentation, where the mean values of the channels are calculated.
Figure 2. Preprocessing of the acquired images.
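A minimal sketch of this preprocessing pipeline using OpenCV is shown below; the file name and the 5 × 5 blur kernel are illustrative assumptions, since the exact filter parameters are not reported in the text.

```python
import cv2

def preprocess(path):
    """Resize, equalize, blur, and convert an eye image to CIELAB."""
    img = cv2.imread(path)                         # OpenCV loads images as BGR
    img = cv2.resize(img, (200, 200))              # standardize to 200 x 200

    # Per-channel histogram equalization to balance contrast.
    img = cv2.merge([cv2.equalizeHist(c) for c in cv2.split(img)])

    # Gaussian blur suppresses high-frequency content such as specular
    # reflections; the 5 x 5 kernel size is an assumed value.
    img = cv2.GaussianBlur(img, (5, 5), 0)

    # Convert to CIELAB; the a* channel carries the red-green information
    # associated with conjunctival pallor.
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    return lab, cv2.split(lab)[1]                  # full LAB image and a* channel

lab, a_channel = preprocess("conjunctiva.jpg")     # hypothetical file name
```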

Segmentation of the image identified the ROI (i.e., the palpebral conjunctiva, which is crucial for anemia detection). The focus of the study was the redness of the conjunctiva, from which all color features were extracted. The segmentation method employed was the simple linear iterative clustering (SLIC) superpixel algorithm, which we chose for its ability to preserve important image details while segmenting a specific part of the image. The results of the segmentation process using the SLIC method are presented in Figure 3.
Figure 3. Segmentation output.
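The sketch below illustrates SLIC superpixel segmentation with scikit-image; the segment count, compactness, and the redness-based selection of conjunctiva superpixels are simplifying assumptions standing in for the study's ROI extraction.

```python
import numpy as np
from skimage import io, color
from skimage.segmentation import slic

img = io.imread("conjunctiva.jpg")                 # hypothetical file name

# n_segments and compactness are assumed hyperparameters.
segments = slic(img, n_segments=100, compactness=10, start_label=1)

# Rank superpixels by mean a* (redness) in CIELAB and keep the reddest
# ones as a rough proxy for the vascular conjunctiva region.
lab = color.rgb2lab(img)
redness = {s: lab[segments == s, 1].mean() for s in np.unique(segments)}
reddest = sorted(redness, key=redness.get, reverse=True)[:10]
mask = np.isin(segments, reddest)                  # boolean ROI mask
roi = img * mask[..., None]                        # zero out non-ROI pixels
```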

According to the literature [3,16], there is a strong correlation between the hemoglobin level and the color properties of the conjunctiva region in the RGB color space. An unhealthy pale appearance of the palpebral conjunctiva indicates anemia: this membrane, which lines the inner eyelids and covers the white of the eye, appears less red than normal, suggesting reduced blood flow. The mucosa in this area is very thin, allowing the underlying vessels to be clearly visible, which facilitates the assessment of pallor [17]. The level of redness in the conjunctiva directly reflects the anemic state: greater redness suggests a higher hemoglobin value and a non-anemic condition, while pallor suggests the opposite. The conjunctiva region, as captured in the original RGB space, is separated into its three color channels: R (red), G (green), and B (blue). The mean pixel value is calculated for each channel, and the hemoglobin value is then estimated using formula (1). The computed hemoglobin values are compared with standard values to determine the presence of anemia.
Hb = 10.1457 + 0.2238 × R − 0.2608 × G − 0.0611 × B    (1)
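As a worked example, the sketch below applies formula (1) to the mean channel values of a masked conjunctiva region; the 12 g/dL cutoff is an illustrative assumption, since the text only states that the estimate is compared with standard values.

```python
import numpy as np

def estimate_hb(rgb, mask):
    """Estimate hemoglobin (g/dL) from the mean R, G, B over the ROI (formula 1)."""
    r, g, b = (rgb[..., c][mask].mean() for c in range(3))
    return 10.1457 + 0.2238 * r - 0.2608 * g - 0.0611 * b

# Dummy data standing in for a segmented conjunctiva image and its ROI mask.
rgb = np.random.randint(0, 256, (200, 200, 3)).astype(float)
mask = np.zeros((200, 200), dtype=bool)
mask[80:120, 60:140] = True

hb = estimate_hb(rgb, mask)
# Assumed screening cutoff for illustration; clinical thresholds vary by age and sex.
print("anemic" if hb < 12.0 else "non-anemic", f"(Hb = {hb:.2f} g/dL)")
```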

3. Machine Learning Methods

Machine learning models, including SVM, KNN, naïve Bayes (NB), and decision tree classifiers, were evaluated for their ability to distinguish anemic from non-anemic images [3,4,18]. The SVM model used a linear kernel function, a tolerance of 0.1, and a regression epsilon of 0.1; it was trained on 80% of the data, with 10% used for validation and the remaining 10% for testing. The SVM seeks the hyperplane that maximizes the margin between the two classes, where the margin is the distance between the hyperplane and the nearest data points of each class, known as the support vectors. The KNN model classifies an image as anemic or non-anemic based on a distance-based similarity measure, using features such as the mean blue, green, and red pixel values, which reflect the blood characteristics of the conjunctiva. The decision tree model partitions the feature space with simple decision rules and robustly classifies images into anemic and non-anemic categories. The NB model is simple, effective, and efficient when handling a large dataset. In classifying anemia, it calculates the probability of each class from the color and intensity features extracted from the image channels; each feature contributes independently to the probability that an image indicates anemia, simplifying computation and enhancing the model's speed and scalability. Typical features derived from conjunctiva images for NB classification include color histograms, edge detection results, and statistical texture measures, which help differentiate the hemoglobin levels reflected in the eye images.
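A hedged scikit-learn sketch of this classical pipeline follows; the feature matrix of mean R, G, B values and the 80/10/10 split follow the text, while the random seed is an assumption, min_samples_split stands in for the minimum parent size, and the reported epsilon parameter (which belongs to support vector regression) has no counterpart in the SVC classifier used here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# X: one row per image holding the mean R, G, B of the conjunctiva ROI;
# y: 1 = anemic, 0 = non-anemic. Dummy data for illustration.
X = np.random.rand(764, 3) * 255
y = np.random.randint(0, 2, 764)

# 80% train, 10% validation, 10% test, as described in the text.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

models = {
    "SVM": SVC(kernel="linear", tol=0.1),            # linear kernel, tolerance 0.1
    "KNN": KNeighborsClassifier(n_neighbors=5),      # K = 5, distance-based
    "NB": GaussianNB(),                              # independent-feature assumption
    "Decision tree": DecisionTreeClassifier(min_samples_split=10),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "validation accuracy:", model.score(X_val, y_val))
```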

4. Deep Learning Models

DCGAN is an augmentation method that was employed to increase the dataset size and thereby enhance deep learning performance. The DCGAN architecture is illustrated in Figure 4.
Figure 4. Deep convolutional generative adversarial network (DCGAN) architecture: (A) generator and (B) discriminator. ReLU: rectified linear unit.
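A minimal PyTorch sketch consistent with the generator/discriminator structure in Figure 4 is given below; the layer widths and the 64 × 64 output resolution are assumptions, as the exact dimensions are not listed in the text.

```python
import torch.nn as nn

class Generator(nn.Module):
    """Maps a 100-d noise vector to a 64x64 RGB image (assumed sizes)."""
    def __init__(self, z_dim=100, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, ch * 8, 4, 1, 0), nn.BatchNorm2d(ch * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 8, ch * 4, 4, 2, 1), nn.BatchNorm2d(ch * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1), nn.BatchNorm2d(ch * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ch * 2, ch, 4, 2, 1), nn.BatchNorm2d(ch), nn.ReLU(True),
            nn.ConvTranspose2d(ch, 3, 4, 2, 1), nn.Tanh(),  # 3-channel image in [-1, 1]
        )

    def forward(self, z):                 # z has shape (N, z_dim, 1, 1)
        return self.net(z)

class Discriminator(nn.Module):
    """Classifies 64x64 images as real or generated."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.BatchNorm2d(ch * 2), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch * 2, ch * 4, 4, 2, 1), nn.BatchNorm2d(ch * 4), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch * 4, ch * 8, 4, 2, 1), nn.BatchNorm2d(ch * 8), nn.LeakyReLU(0.2, True),
            nn.Conv2d(ch * 8, 1, 4, 1, 0), nn.Sigmoid(),    # probability of "real"
        )

    def forward(self, x):
        return self.net(x).view(-1)
```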

The images generated by the DCGAN significantly increased the diversity of the dataset, which benefited classification accuracy. The DCGAN-augmented images were fed to the deep learning models, namely a voting ensemble model, a stacking ensemble model, and a GoogLeNet model.

1) Ensemble models

The voting ensemble model, shown in Figure 5, combines the strengths of VGG16, ResNet-50, and InceptionV1 by aggregating their key features. This integration increases accuracy beyond the capabilities of any individual model. In this ensemble, voting is conducted by each component classifier, which provides a probability distribution. These distributions are then averaged in a process known as soft voting. The class label with the highest average probability is selected as the final prediction.
Figure 5. Architecture of the voting ensemble model. ReLU: rectified linear unit.
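The soft-voting step itself is straightforward to express; the sketch below assumes each base network returns per-class probabilities (the values shown are hypothetical).

```python
import numpy as np

def soft_vote(prob_list):
    """Average per-model class probabilities and pick the argmax class."""
    avg = np.mean(prob_list, axis=0)      # shape: (n_samples, n_classes)
    return avg.argmax(axis=1), avg

# Hypothetical probabilities from VGG16, ResNet-50, and Inception
# for 2 images over the classes [non-anemic, anemic].
p_vgg = np.array([[0.30, 0.70], [0.80, 0.20]])
p_res = np.array([[0.40, 0.60], [0.55, 0.45]])
p_inc = np.array([[0.25, 0.75], [0.70, 0.30]])

labels, avg = soft_vote([p_vgg, p_res, p_inc])
print(labels)   # [1 0]: first image anemic, second non-anemic
```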

Stacking, also known as stacked generalization (Figure 6), is an ensemble learning technique that combines multiple classification or regression models through a meta-classifier or meta-regressor. Initially, the stacked ensemble trains several base-level models (specifically, VGG16, ResNet-50, and InceptionV1) on the entire training dataset. Once these base models are trained, their outputs are used as inputs to the meta-classifier, which makes the final prediction. The final output is derived by combining the predictions of the individual models, yielding a system that is adaptable and typically produces higher accuracy and more stable predictions across various types of input images.
Figure 6. Stacking ensemble model architecture. ReLU: rectified linear unit.
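A minimal sketch of the stacking step follows: base-model probability outputs become the feature vector of a meta-classifier. Logistic regression is assumed as the meta-classifier, since the text does not name one.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def meta_features(p_vgg, p_res, p_inc):
    """Stack each base network's P(anemic) into one feature vector per image."""
    return np.column_stack([p_vgg, p_res, p_inc])

# Dummy base-model outputs and labels for illustration.
rng = np.random.default_rng(0)
p_vgg, p_res, p_inc = rng.random(200), rng.random(200), rng.random(200)
y = rng.integers(0, 2, 200)

meta = LogisticRegression().fit(meta_features(p_vgg, p_res, p_inc), y)
final_pred = meta.predict(meta_features(p_vgg, p_res, p_inc))
```

In practice, the meta-features are typically generated from held-out predictions rather than the same training data, to avoid leaking labels into the meta-classifier.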

The third model, known as GoogLeNet or InceptionV1, features multiple parallel layers of varying types, as shown in Figure 7. Each Inception module contains 1 × 1, 3 × 3, and 5 × 5 convolutional filters, along with a 3 × 3 max pooling layer. These branches process the same input, and their outputs are concatenated, capturing information at multiple scales. The input is a three-channel image measuring 224 × 224 pixels. To help prevent overfitting, the architecture includes a dropout layer, followed by a linear layer and a softmax function. During training, GoogLeNet employs batch normalization, ReLU activations, and RMSprop as the optimizer.
Figure 7. Architecture of the GoogLeNet model.
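A sketch of a single Inception module in Keras follows; the filter counts are assumptions, and the dimension-reducing 1 × 1 convolutions that GoogLeNet places before the 3 × 3 and 5 × 5 filters are omitted for brevity.

```python
from tensorflow.keras import Model, layers

def inception_module(x, f1=64, f3=128, f5=32, fp=32):
    """Parallel 1x1, 3x3, and 5x5 convolutions plus 3x3 max pooling, concatenated."""
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(x)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(fp, 1, padding="same", activation="relu")(bp)
    return layers.Concatenate()([b1, b3, b5, bp])   # multi-scale feature map

inputs = layers.Input(shape=(224, 224, 3))          # three-channel 224 x 224 input
x = inception_module(inputs)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.4)(x)                          # dropout to curb overfitting
outputs = layers.Dense(2, activation="softmax")(x)  # anemic vs. non-anemic
model = Model(inputs, outputs)
```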


III. Results

The dataset of 764 images was divided into three subsets: 70% for training, 10% for validation using 10-fold cross-validation, and 20% for testing. These subsets were used in the following models. The confusion matrix for the decision tree model is displayed in Figure 8. The minimum parent size was set at 10, resulting in an accuracy of 70.15%. From the testing data, 64 true positives and 108 true negatives were recorded. For the SVM model, the parameters were set with a linear kernel function, a tolerance of 0.1, and an epsilon of regression also at 0.1. The KNN model was configured with 5 as the number of neighbors (K). The images generated by the DCGAN model increased the dataset size from 764 to 4,315 images, as shown in Table 2, and these images were then used in the deep learning models for classification.
Figure 8. Confusion matrices and receiver operating characteristic (ROC) curves for the (A) machine learning models and (B) deep learning models. SVM: support vector machine, KNN: k-nearest neighbor, AUC: area under the curve.

Table 2
Evaluation of the models’ performance

Technique                  Accuracy (%)   Precision (%)   Recall (%)   F1-score (%)
SVM                        57             57              NaN          72
NB                         52             55              78           65
KNN                        78.2           80.6            81.6         81.14
Decision tree              70.17          69.62           84.61        76.3
Regression model           83.3           81.25           86.61        84
Voting ensemble model      58.7           67.2            58.3         62.5
GoogLeNet                  76.4           71              76           74
Stacking ensemble model    89.48          88              95           91

SVM: support vector machine, NB: naïve Bayes, KNN: k-nearest neighbor.

The augmented images were input into various deep learning models, including stacking and voting ensemble models, as well as the GoogLeNet model. We analyzed the performance of these models using several metrics: accuracy, precision, recall, and F1-score. Accuracy indicates the proportion of instances correctly classified within the dataset. Precision assesses the accuracy of the predicted positive instances. Recall, also known as sensitivity or true positive rate (TPR), measures the proportion of actual positive instances that the model correctly identified. The F1-score represents the harmonic mean of precision and recall. The confusion matrix for the deep learning models, including the GoogLeNet model and the stacking ensemble model, is shown in Figure 8B.
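These metrics follow directly from the confusion-matrix counts; the sketch below computes them, along with the AUC, using scikit-learn on hypothetical labels and scores.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Hypothetical ground truth, hard predictions, and anemia probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]

print("accuracy :", accuracy_score(y_true, y_pred))   # correct / total
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN), the TPR
print("f1-score :", f1_score(y_true, y_pred))         # harmonic mean of the two
print("AUC      :", roc_auc_score(y_true, y_prob))    # area under the ROC curve
```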
With an area under the curve (AUC) of 0.87, the corresponding model is deemed effective, given its strong ability to differentiate between classes. Its receiver operating characteristic (ROC) curve shows that a high TPR is maintained even at low threshold settings, while the false positive rate (FPR) remains low; this indicates the model’s utility in identifying positive cases early. A perfect model would have an AUC of 1, whereas an AUC of 0.5 indicates performance no better than random guessing. The AUC of 0.97 achieved by the stacking ensemble, as shown in the ROC curve in Figure 8B, indicates nearly perfect discrimination: the model consistently achieved a high TPR while keeping the FPR low across various threshold settings. According to Table 2, the regression model performed best with non-augmented data, achieving an accuracy of 83.3%, followed by the KNN model at 78.2%. For the augmented dataset, the stacking ensemble model achieved the highest accuracy, classifying images with 89.48% accuracy.

IV. Discussion

The dataset comprised 764 conjunctiva images, which were divided into 70% for training, 10% for validation, and 20% for testing the aforementioned models. These models were assessed using performance metrics and compared to identify the most effective model for accurate anemia classification. Initially, various machine learning models were applied to the dataset for classification purposes, with the KNN model achieving the highest accuracy at 78.2%.
The augmentation through the DCGAN model expanded the dataset to 4,315 images, which were then classified using deep learning models. When trained with non-augmented images, these models achieved accuracies of 55.6% for the voting ensemble, 73% for GoogLeNet, and 82.4% for the stacking ensemble. Comparing the voting ensemble, stacking ensemble, and GoogLeNet, the stacking technique achieved the highest accuracy, reaching 89.4% with the DCGAN-augmented data, followed by GoogLeNet at 76.4%. The introduction of DCGAN-based data augmentation therefore substantially enhanced model performance, with the stacking ensemble showing the most notable improvement. Given that deep learning models often perform better with larger datasets, the performance boost following augmentation is encouraging. Additionally, the ensemble results highlight how leveraging the strengths of multiple models can yield high accuracy. In summary, deep learning, especially the stacking ensemble model applied to the augmented dataset, substantially outperformed the traditional machine learning models. The analysis of the AUC and ROC curves supports the model’s robust capability to identify positive cases early, which is advantageous for anemia screening. Future data collection efforts should cover a diverse range of demographics to improve the model’s generalizability across different groups.

Notes

Conflict of Interest

No potential conflict of interest relevant to this article was reported.


References

1. Bevilacqua V, Dimauro G, Marino F, Brunetti A, Cassano F, Di Maio A, et al. A novel approach to evaluate blood parameters using computer vision techniques. In: Proceedings of the 2016 IEEE International Symposium on Medical Measurements and Applications (MeMeA); 2016 May 15–18; Benevento, Italy. p. 1–6. https://doi.org/10.1109/MeMeA.2016.7533760.
2. Delgado-Rivera G, Roman-Gonzalez A, Alva-Mantari A, Saldivar-Espinoza B, Zimic M, Barrientos-Porras F, et al. Method for the automatic segmentation of the palpebral conjunctiva using image processing. In: Proceedings of the 2018 IEEE International Conference on Automation/XXIII Congress of the Chilean Association of Automatic Control (ICA-ACCA); 2018 Oct 17–19; Concepcion, Chile. p. 1–4. https://doi.org/10.1109/ICA-ACCA.2018.8609744.
3. Srivastava SK. Adoption of electronic health records: a roadmap for India. Healthc Inform Res. 2016;22(4):261–9. https://doi.org/10.4258/hir.2016.22.4.261.
4. Chaparro CM, Suchdev PS. Anemia epidemiology, pathophysiology, and etiology in low- and middle-income countries. Ann N Y Acad Sci. 2019;1450(1):15–31. https://doi.org/10.1111/nyas.14092.
5. Bauskar S, Jain P, Gyanchandani M. A noninvasive computerized technique to detect anemia using images of eye conjunctiva. Pattern Recognit Image Anal. 2019;29:438–46. https://doi.org/10.1134/S1054661819030027.
6. Fuadah YN, Sa'idah S, Wijayanto I, Patmasari R, Magdalena R. Non invasive anemia detection in pregnant women based on digital image processing and K-nearest neighbor. In: Proceedings of the 2020 3rd International Conference on Biomedical Engineering (IBIOMED); 2020 Oct 6–8; Yogyakarta, Indonesia. p. 60–4. https://doi.org/10.1109/IBIOMED50285.2020.9487605.
7. Wu Q, Chen Y, Meng J. DCGAN-based data augmentation for tomato leaf disease identification. IEEE Access. 2020;8:98716–28. https://doi.org/10.1109/ACCESS.2020.2997001.
8. Appiahene P, Arthur EJ, Korankye S, Afrifa S, Asare JW, Donkoh ET. Detection of anemia using conjunctiva images: a smartphone application approach. Med Nov Technol Devices. 2023;18:100237. https://doi.org/10.1016/j.medntd.2023.100237.
9. Purwanti E, Amelia H, Bustomi MA, Yatijan MA, Putri RN. Anemia detection using convolutional neural network based on palpebral conjunctiva images. In: Proceedings of the 2023 14th International Conference on Information & Communication Technology and System (ICTS); 2023 Oct 4–5; Surabaya, Indonesia. p. 117–22. https://doi.org/10.1109/ICTS58770.2023.10330869.
10. Dimauro G, Baldari L, Caivano D, Colucci G, Girardi F. Automatic segmentation of relevant sections of the conjunctiva for non-invasive anemia detection. In: Proceedings of the 2018 3rd International Conference on Smart and Sustainable Technologies (SpliTech); 2018 Jun 26–29; Split, Croatia. p. 1–5.
11. Kasiviswanathan S, Vijayan TB, John S. Ridge regression algorithm based non-invasive anaemia screening using conjunctiva images. J Ambient Intell Humaniz Comput. 2020 Oct 22 [Epub]. https://doi.org/10.1007/s12652-020-02618-3.
12. Sevani N, Fredicia, Persulessy GB. Detection anemia based on conjunctiva pallor level using k-means algorithm. IOP Conf Ser Mater Sci Eng. 2018;420:012101. https://doi.org/10.1088/1757-899X/420/1/012101.
13. Mannino RG, Myers DR, Tyburski EA, Caruso C, Boudreaux J, Leong T, et al. Smartphone app for noninvasive detection of anemia using only patient-sourced photos. Nat Commun. 2018;9(1):4924. https://doi.org/10.1038/s41467-018-07262-2.
14. Kumar RD, Guruprasad S, Kansara K, Rao KR, Mohan M, Reddy MR, et al. A novel noninvasive hemoglobin sensing device for anemia screening. IEEE Sens J. 2021;21(13):15318–29. https://doi.org/10.1109/JSEN.2021.3070971.
15. Dimauro G, Griseta ME, Camporeale MG, Clemente F, Guarini A, Maglietta R. An intelligent non-invasive system for automated diagnosis of anemia exploiting a novel dataset. Artif Intell Med. 2023;136:102477. https://doi.org/10.1016/j.artmed.2022.102477.
16. Dimauro G, Guarini A, Caivano D, Girardi F, Pasciolla C, Iacobazzi A. Detecting clinical signs of anaemia from digital images of the palpebral conjunctiva. IEEE Access. 2019;7:113488–98. https://doi.org/10.1109/ACCESS.2019.2932274.
17. Epomedicine. Pallor [Internet]. Epomedicine; 2013 [cited 2025 Jan 5]. Available from: https://epomedicine.com/clinical-medicine/clinical-examination-pallor/.
18. Pflipsen M, Massaquoi M, Wolf S. Evaluation of the painful eye. Am Fam Physician. 2016;93(12):991–8.