Deep Learning for Skin Cancer Classification

Published: September 29th, 2024



I wanted to work on a project using real-world data to further develop my skills in deep learning, so I set out to recreate a publication from a major scientific journal. This led me to the Nature publication titled Human–computer collaboration for skin cancer recognition. In this article, I outline my approach to recreating this publication and walk through the code I used to do so. The code and data I used for this project can be found on my GitHub.


I used images from the HAM10000 dataset. HAM10000 is a dataset for skin cancer detection using annotated lesion images. The dataset can be accessed here. I performed some preparatory work to label and organize the images into two directories, one for training (80%) and one for testing (20%), based on their class. The CSV file, ground-truth.csv, contains all the labels I used for this process. Both ground-truth.csv and the train/test directories, within the images folder, are available in the GitHub repository.
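As a reference, below is a minimal sketch of how that labeling and splitting could be scripted. The column names (image, label), the raw-image directory, and the use of scikit-learn's train_test_split are illustrative assumptions rather than the exact script in the repository.

```python
import os
import shutil

import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed layout: ground-truth.csv has an image-ID column ("image") and a
# lesion-class column ("label"); adjust these names to match the actual file.
df = pd.read_csv("ground-truth.csv")

# Stratified 80/20 split so each class keeps the same proportions in both sets.
train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)

for split_name, split_df in (("train", train_df), ("test", test_df)):
    for _, row in split_df.iterrows():
        # images/train/<class>/ and images/test/<class>/ subfolders per class.
        class_dir = os.path.join("images", split_name, row["label"])
        os.makedirs(class_dir, exist_ok=True)
        # "images/raw" is a placeholder for wherever the unsplit JPEGs live.
        shutil.copy(os.path.join("images", "raw", f"{row['image']}.jpg"), class_dir)
```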


Once the necessary libraries were imported, I prepared the images for training and testing. I resized all the images to 224x224 pixels, flipped them horizontally at random, converted them to tensors, and normalized the pixel values to have a mean of 0.5 and a standard deviation of 0.5. I then loaded the image datasets from two folders, train and test, located in the images directory. The images are organized in subfolders by class, for all seven classes. Next, I created two DataLoaders, one for training and one for testing, which batch the images during training and testing. The train DataLoader shuffles the images, while the test DataLoader does not.
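These pre-processing and loading steps map directly onto torchvision transforms, ImageFolder datasets, and DataLoaders. The following is a sketch of that setup; the batch size of 32 matches the training description later in the article.

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Resize, random horizontal flip, tensor conversion, and per-channel
# normalization to mean 0.5 and standard deviation 0.5.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

# ImageFolder infers the seven class labels from the subfolder names.
train_dataset = datasets.ImageFolder("images/train", transform=transform)
test_dataset = datasets.ImageFolder("images/test", transform=transform)

# The train DataLoader shuffles; the test DataLoader does not.
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
```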


The convolutional neural network (CNN) I developed, FourLayer, balances complexity and efficiency to effectively classify skin cancer images. It has four convolutional layers that progressively increase the number of filters, allowing it to capture both simple and complex features in the images. Batch normalization is applied after each convolution to stabilize learning, and max-pooling reduces spatial dimensions while retaining essential features, thereby decreasing computational requirements. FourLayer concludes with two fully connected layers, which map the learned features into a decision space for classification. A dropout rate of 0.3 is incorporated to reduce overfitting, making FourLayer more robust and generalizable. Below is the model, FourLayer, I developed.


Custom FourLayer CNN
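To give a sense of what this architecture looks like in PyTorch, here is a compact sketch of a model like FourLayer. The specific filter counts, kernel sizes, and hidden-layer width are assumptions chosen to illustrate the structure described above, not necessarily the exact values in my implementation.

```python
import torch.nn as nn

class FourLayer(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        # Four conv blocks with progressively more filters, each followed by
        # batch normalization, ReLU, and 2x2 max-pooling (224 -> 14 spatially).
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Two fully connected layers with dropout of 0.3 between them.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 14 * 14, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```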

For training, I initially used my MacBook Pro with Apple's M3 Pro CPU+GPU system on a chip. I implemented an 80/20 train/test split and processed images in batches of 32, resulting in a per-epoch training time of 10 minutes. While this may seem short, running 100 epochs would take over 16 hours. To address this, I leveraged AWS's cloud computing resources by running an on-demand g6.xlarge EC2 instance with an NVIDIA L4 GPU. I set up a Jupyter Lab server on the EC2 instance to utilize CUDA and the L4 GPU. With this setup, I completed 100 epochs in just over 90 minutes, more than 11 times faster than my local hardware. My training loop is below.
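A minimal sketch of such a training loop is shown below. It assumes the FourLayer sketch and train_loader from earlier; the choice of the Adam optimizer and a learning rate of 1e-3 are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Prefer the CUDA GPU on the EC2 instance, fall back to Apple's MPS backend
# on the MacBook, and finally to the CPU.
device = torch.device(
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)

model = FourLayer().to(device)                      # FourLayer sketch from above
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer/lr assumed

for epoch in range(100):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:             # train_loader from above
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    print(f"Epoch {epoch + 1}/100 - loss: {running_loss / len(train_loader):.4f}")
```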


I also implemented Gradient-weighted Class Activation Mapping (Grad-CAM) to further verify that the model is learning the right features. Grad-CAM is a technique that visualizes which areas of an image a CNN focuses on when making a prediction by highlighting the regions most important for the model's decision. Grad-CAM helps humans interpret and understand the model by showing which features influence the classification. In this case, Grad-CAM is crucial for verifying that the model is learning the right features, such as the borders, texture, shape, and color of the lesion, which are critical for classifying skin cancer images.


To generate the Grad-CAM heatmap, I loaded my 100-epoch trained skin cancer recognition model and a sample image. The image was resized to 224x224 pixels and normalized with a mean and standard deviation of 0.5, matching the image pre-processing. The pre-processed image was then converted into a PyTorch tensor and passed through the model. A forward hook was registered to capture the feature maps from the final convolutional layer. After the model's forward pass, the captured feature maps were used to compute the class activation map (CAM) based on the weights of the predicted class. The CAM was resized to match the dimensions of the original image, and a threshold of 0.5 was applied to highlight the most relevant regions the model used to predict the lesion class. I also applied a Gaussian filter with a σ of 1 to smooth the CAM slightly.


In the heatmap, red areas indicate regions where the model places the highest importance, while yellow and green areas show moderate relevance. Blue and cyan regions denote areas of low importance. Referring to the color bar on the right of the plots, red areas are closer to 1 (higher importance), while blue areas are closer to 0 (lower importance). This visualization helps interpret which parts of the lesion—such as its borders, texture, and pigmentation—were crucial for classification. The code for generating Grad-CAM heatmaps and several examples are below.
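Below is a compact sketch of that Grad-CAM computation against the FourLayer sketch from earlier. It follows the standard gradient-weighted formulation (average-pooled gradients weighting the feature maps of the last convolutional layer); the checkpoint filename, sample image path, and hooked layer index are placeholders, not the exact names from my repository.

```python
import numpy as np
import torch
from PIL import Image
from scipy.ndimage import gaussian_filter, zoom
from torchvision import transforms

# Load the trained model (checkpoint filename is a placeholder).
model = FourLayer()
model.load_state_dict(torch.load("fourlayer_100epochs.pth", map_location="cpu"))
model.eval()

# Hook the final convolutional layer to capture its feature maps and gradients.
feature_maps, gradients = [], []
target_layer = model.features[-4]  # last Conv2d in the FourLayer sketch above
target_layer.register_forward_hook(lambda m, inp, out: feature_maps.append(out))
target_layer.register_full_backward_hook(lambda m, gin, gout: gradients.append(gout[0]))

# Pre-process the sample image the same way as during training (no flip).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])
image = Image.open("sample_lesion.jpg").convert("RGB")  # placeholder path
input_tensor = preprocess(image).unsqueeze(0)

# Forward pass, then backpropagate the score of the predicted class.
output = model(input_tensor)
predicted_class = output.argmax(dim=1).item()
model.zero_grad()
output[0, predicted_class].backward()

# Weight each feature map by its average gradient and combine into the CAM.
weights = gradients[0].mean(dim=(2, 3), keepdim=True)
cam = torch.relu((weights * feature_maps[0]).sum(dim=1)).squeeze().detach().numpy()

# Normalize to [0, 1], upscale to the original image size, smooth with a
# Gaussian filter (sigma = 1), and keep only regions above the 0.5 threshold.
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
cam = zoom(cam, (image.height / cam.shape[0], image.width / cam.shape[1]))
cam = gaussian_filter(cam, sigma=1)
cam = np.where(cam >= 0.5, cam, 0.0)
```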




Grad-CAM for Malignant Melanoma
Grad-CAM for Benign Melanocytic Nevi
Grad-CAM for Dermatofibroma
Grad-CAM for Vascular Lesion