{"id":4346,"date":"2021-08-02T08:26:59","date_gmt":"2021-08-02T06:26:59","guid":{"rendered":"http:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/?p=4346"},"modified":"2022-09-07T10:51:38","modified_gmt":"2022-09-07T08:51:38","slug":"gradient-weighted-class-activation-mapping-with-fruit-images","status":"publish","type":"post","link":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/2021\/08\/02\/gradient-weighted-class-activation-mapping-with-fruit-images\/","title":{"rendered":"Gradient-weighted Class Activation Mapping with fruit images"},"content":{"rendered":"\n<p>In a previous blog we documented methods and code for recognizing fruits with a neural network on a raspberry pi. This time we want to go one step further and describe the method and code to create heatmaps for fruit images. The method we use here is called Gradient-weighted Class Activation Mapping (Grad-CAM). A heatmap is telling us which pixel of an fruit image leads to the neural network&#8217;s decision to assign the input image to a specific class.<\/p>\n\n\n\n<p>We believe that heatmaps are a very useful information especially during the validation after a neural network is trained. We want to know, if the neural networks is really making the right decision upon the given information (such as a fruit image). A good example is the classification of wolf and husky dog images made once by researchers (&#8220;Why Should I Trust You?&#8221;, See Reference). The researchers had actually pretty good results, until somebody figured out, that most wolf images were taken in a snowy environment, while the husky images were not. The neural network mostly associated a snowy environment with a wolf.  The husky dog was therefore classified as wolf, when the image was taken with snow in the background.<\/p>\n\n\n\n<p>For a deeper neural network validation, we can use heatmaps to see how the classification decision was made. Below we will show you how we generate heatmaps from fruit images. 
The <a href=\"https:\/\/keras.io\/examples\/vision\/grad_cam\/\">Keras website<\/a> helped us a lot to write our code, see also Reference.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The  Setup<\/h2>\n\n\n\n<p>We use three different <em>classes<\/em> of images: Apfel (appel) images, orange images and tomate (tomato) images, see Figure 1. The list <em>classes<\/em>, see code below, contains strings describing the classes. We filled in the training, validation and test directories with fruit images, similar to those in Figure 1. Each class of fruit images went into its own directory named Apfel, Orange and Tomate.  <\/p>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"357\" height=\"119\" src=\"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/07\/fruits.png\" alt=\"\" class=\"wp-image-4371\" srcset=\"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/07\/fruits.png 357w, https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/07\/fruits-300x100.png 300w\" sizes=\"auto, (max-width: 357px) 100vw, 357px\" \/><figcaption>Figure 1: Images of an appel, an orange and a tomato<\/figcaption><\/figure>\n<\/div>\n\n\n<p>In the code below we use the paths<em> traindir<\/em>, a <em>validdir<\/em> and a<em> testdir<\/em>. The weights and the structure of the neural network model (<em>modelsaved<\/em> and <em>modeljson<\/em>) is saved into the model directory .<\/p>\n\n\n\n<p>The classes Apfel, Orange and Tomate are associated to numbers with the python <em>classnum<\/em> dictionary.  
The variable <em>dim<\/em> defines the size of the images.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">basepath = \"\/home\/......\/Session1\"\ntraindir = os.path.join(basepath, \"pics\" , \"train\")\nvaliddir = os.path.join(basepath, \"pics\" , 'valid')\ntestdir = os.path.join(basepath, \"pics\" , 'test')\nmodel_path = os.path.join(basepath, 'models')\n\nclasses = [\"Apfel\",\"Orange\",\"Tomate\"]\nclassnum = {\"Apfel\":0, \"Orange\":1, \"Tomate\":2}\n\nnet = 'convnet'\n\nnow = datetime.datetime.now()\n\nmodelweightname = f\"model_{net}_{now.year}-{now.month}-{now.day}_callback.h5\"\nmodelsaved = f\"model_{net}.h5\"\nmodeljson = f\"model_{net}.json\"\n\n\ndim = (100,100, 3)<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">The Model<\/h2>\n\n\n\n<p>The following code shows the convolutional neural network (CNN) code in Keras. We have four convolutional layers and two dense layers. The last dense layer has three neurons. The output of one neuron indicates if an image is predicted as an apple, an orange or a tomato. 
Since the output is exclusive (an apple cannot be a tomato), we use the softmax activation function for the last layer.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def create_conv_net():\n    model = Sequential()\n    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same', input_shape=(dim[0], dim[1], 3)))\n    model.add(BatchNormalization())\n    model.add(MaxPooling2D((2, 2)))\n    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))\n    model.add(BatchNormalization())\n    model.add(MaxPooling2D((2, 2)))\n    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))\n    model.add(BatchNormalization())\n    model.add(MaxPooling2D((2, 2)))\n    model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))\n    model.add(BatchNormalization())\n    model.add(MaxPooling2D((2, 2)))\n    model.add(Flatten())\n    model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))\n    model.add(Dense(len(classes), activation='softmax'))\n    #opt = Adam(lr=1e-3, decay=1e-3 \/ 200)\n    return model<\/pre>\n\n\n\n<p>The function <em>create_conv_net<\/em> creates the model. 
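Softmax turns the three raw outputs into probabilities that sum to one, so the classes compete with each other. A small standalone sketch (toy logits in plain NumPy rather than Keras; the values are made up for illustration):

```python
import numpy as np

def softmax(z):
    # subtract the max before exponentiating for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

# hypothetical raw scores (logits) of the last dense layer for one image,
# in the order Apfel, Orange, Tomate
logits = np.array([2.0, 0.5, -1.0])
probs = softmax(logits)

print(probs)             # one probability per class
print(probs.sum())       # the probabilities sum to 1.0
print(np.argmax(probs))  # index of the predicted class (0 = Apfel)
```

Because exactly one class wins, taking the argmax of the softmax output yields the predicted class directly.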
It is compiled with the <em>categorical_crossentropy<\/em> loss function, see code below.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">model = create_conv_net()\nmodel.compile(Adam(lr=.00001), loss=\"categorical_crossentropy\", metrics=['accuracy'])<\/pre>\n\n\n\n<p>The focus here is to describe the Grad-CAM method, not the training of the model, so we leave out the specifics of training. More details on training can be found <a href=\"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/2020\/03\/26\/fruit-recognition-on-a-raspberry-pi\/\">here<\/a>. The code below loads a previously prepared model checkpoint (Keras method <em>load_weights<\/em>) and shows the structure of the model with the <em>summary<\/em> method. This is useful information, because we have to identify the last convolutional layer; it is needed later.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">model.load_weights(os.path.join(model_path,\"model_convnet_2021-7-1_callback.h5\"))\nmodel.summary()<\/pre>\n\n\n\n<p>Now we have to create a new model with two output layers. In Figure 2 you see a simplified representation of the CNN we use. The last layer (orange) represents the dense layer; the last CNN layer is green. Due to the filters of the CNN layer we have numerous channels. The Grad-CAM method requires the outputs of these channels. 
So we need to create a new model, based on the one above, which outputs both the feature maps of the channels from the last CNN layer and the classification decision from the dense layer.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"565\" src=\"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/07\/GradCAM-1024x565.png\" alt=\"\" class=\"wp-image-4393\" srcset=\"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/07\/GradCAM-1024x565.png 1024w, https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/07\/GradCAM-300x166.png 300w, https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/07\/GradCAM-768x424.png 768w, https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/07\/GradCAM.png 1486w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Figure 2: Simplified CNN <\/figcaption><\/figure>\n\n\n\n<p>The code below iterates through the layers of the CNN model in reverse order and finds the last <em>batch_normalization<\/em> layer. We modeled the neural network so that each CNN layer is followed by a <em>batch_normalization<\/em> layer, so we pick the last <em>batch_normalization<\/em> layer as an output. 
The code produces a new model <em>gradModel<\/em> with <em>model<\/em>&#8217;s input, and with <em>model<\/em>&#8217;s last CNN layer (actually the last <em>batch_normalization<\/em> layer) and last dense layer as outputs.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">gradModel = None\n\nfor layer in reversed(model.layers):\n    if \"batch_normalization\" in layer.name:\n        print(layer.name)\n        gradModel = Model(inputs=model.inputs, outputs=[model.get_layer(layer.name).output, model.output])\n        break<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">The Grad-CAM method<\/h2>\n\n\n\n<p>The function <em>getGrad<\/em> below executes the model <em>gradModel<\/em> with an image given as the parameter <em>img<\/em>. It returns the results (<em>conv, predictions<\/em>) of the last CNN layer (actually <em>batch_normalization<\/em>) and the last dense layer. We are now interested in the gradients of the <em>loss<\/em> of a specific class with respect to the feature maps returned from the last CNN layer. The dictionary <em>classnum<\/em> (defined above) maps a class name to a class number, which addresses the loss inside <em>predictions<\/em>, see the command below.<\/p>\n\n\n\n<p><em> loss = predictions[:, classnum[classname]]<\/em><\/p>\n\n\n\n<p>These gradients are calculated with <em>tape<\/em>&#8217;s <em>gradient<\/em> method. 
The function <em>getGrad<\/em> returns the input image, the output of the last CNN layer (<em>conv<\/em>) and the gradients of the last CNN layer.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def getGrad(img, classname):\n\n    testpics = np.array([img], dtype=np.float32)\n\n    with tf.GradientTape() as tape:\n\n        conv, predictions = gradModel(tf.cast(testpics, tf.float32))\n\n        loss = predictions[:, classnum[classname]]\n\n    grads = tape.gradient(loss, conv)\n\n    return img, conv, grads<\/pre>\n\n\n\n<p>The function <em>getHeatMap<\/em> below creates a heatmap from the output of the image&#8217;s last CNN layer and its gradients. Inside <em>getHeatMap<\/em>, the Tensorflow method <em>reduce_mean<\/em> takes the mean values (<em>pooled_grads<\/em>) of <em>grads<\/em>. The mean value is an indication of how important the output of a channel from the last CNN layer is. The function multiplies these mean values with the corresponding CNN layer outputs (<em>convpic<\/em>) and sums up the result into <em>heatmap<\/em>. 
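The channel weighting just described can be sketched with small NumPy arrays. The shapes and random values below are made-up stand-ins (the real code works on TensorFlow tensors from the last batch_normalization layer), but the arithmetic is the same:

```python
import numpy as np

# toy stand-ins: a 4x4 feature map with 3 channels and its gradients
conv = np.random.rand(4, 4, 3).astype(np.float32)
grads = np.random.rand(4, 4, 3).astype(np.float32)

# mean gradient per channel = importance weight of that channel
pooled_grads = grads.mean(axis=(0, 1))           # shape (3,)

# weighted sum over the channels gives the raw heatmap
heatmap = (conv * pooled_grads).sum(axis=-1)     # shape (4, 4)

# the same result written as a matrix product over the channel axis
heatmap2 = conv @ pooled_grads[..., np.newaxis]  # shape (4, 4, 1)
assert np.allclose(heatmap, heatmap2.squeeze())
```

Channels whose gradients are large on average therefore dominate the heatmap, which is exactly the Grad-CAM idea.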
Since we want to have an image to look at, the function <em>getHeatMap<\/em> rectifies, normalizes and resizes the heatmap before returning it.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def getHeatMap(conv, grads):\n    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))\n    convpic = conv[0]\n    heatmap = convpic @ pooled_grads[..., tf.newaxis]\n    heatmapsq = tf.squeeze(heatmap)\n    heatmapnorm = tf.maximum(heatmapsq, 0) \/ tf.math.reduce_max(heatmapsq)\n    heatmapnp = heatmapnorm.numpy()\n    heatmapresized = cv2.resize(heatmapnp, (dim[0], dim[1]), interpolation = cv2.INTER_AREA)*255\n    heatmapresized = heatmapresized.astype(\"uint8\")\n    return heatmapresized<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Testing the images<\/h2>\n\n\n\n<p>The apple, orange and tomato images are stored in separate directories. So the function <em>testgrads<\/em>, see code below, collects the image paths from the test directory into the <em>pics0<\/em>, <em>pics1<\/em> and <em>pics2<\/em> lists. They are then concatenated into the <em>pics<\/em> list.<\/p>\n\n\n\n<p>The function <em>testgrads<\/em> loads and normalizes the images and predicts their classifications (variable <em>predictions<\/em>). It calls the <em>getGrad<\/em> function and the <em>getHeatMap<\/em> function to receive a <em>heatmap<\/em> for each image. NumPy&#8217;s <em>argmax<\/em> method outputs a number which indicates whether the image is an apple, an orange or a tomato, and stores the result in <em>pos<\/em>. Finally each <em>heatmap<\/em>, which is a grayscale image, is converted into a color image (apple heatmaps are colored green, orange heatmaps blue and tomato heatmaps red). 
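The coloring step amounts to writing the grayscale heatmap into a single channel of an empty color image. A minimal sketch with a toy 2x2 heatmap (channel indices follow OpenCV's BGR convention, as in the code):

```python
import numpy as np

# toy grayscale heatmap with values in 0..255
heatmap = np.array([[0, 128], [64, 255]], dtype=np.uint8)

# empty BGR image; channel 1 is green in OpenCV's channel order
colored = np.zeros((2, 2, 3), dtype=np.uint8)
colored[:, :, 1] = heatmap   # an apple heatmap becomes green

print(colored[1, 1])  # [  0 255   0] -> pure green at the hottest pixel
```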
The function <em>testgrads<\/em> then returns the original images together with a list of colored heatmaps.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def testgrads(picdir):\n\n    pics0 = [os.path.join(picdir, classes[0], f) for f in os.listdir(os.path.join(picdir, classes[0]))]\n    pics1 = [os.path.join(picdir, classes[1], f) for f in os.listdir(os.path.join(picdir, classes[1]))]\n    pics2 = [os.path.join(picdir, classes[2], f) for f in os.listdir(os.path.join(picdir, classes[2]))]\n  \n    pics = pics0 + pics1 + pics2\n\n    imagelist = []\n    \n    for pic in pics:\n        img = cv2.resize(cv2.imread(pic, cv2.IMREAD_COLOR), (dim[0], dim[1]), interpolation = cv2.INTER_AREA)\n        imagelist.append(img)\n    \n    train_data = np.array(imagelist, dtype=np.float32)\n    \n    train_data -= train_data.mean()\n    train_data \/= train_data.std()\n    \n    predictions = model.predict(train_data)\n    \n    heatmaps = []\n    \n    for i in range(len(train_data)):\n        heatmapc = np.zeros((dim[0], dim[1], 3), 'uint8')\n        pos = np.argmax(predictions[i])\n        img, conv, grads = getGrad(train_data[i], classes[pos])\n        heatmap = getHeatMap(conv, grads)\n        \n        # map the predicted class to a BGR channel: Apfel is green (1), Orange is blue (0), Tomate is red (2)\n        if pos == 0:\n            posadj = 1\n        elif pos == 1:\n            posadj = 0\n        elif pos == 2:\n            posadj = 2\n        \n        heatmapc[:,:,posadj] = heatmap[:,:]\n        \n        heatmaps.append(heatmapc)\n    \n    return imagelist, heatmaps<\/pre>\n\n\n\n<p>Below is the code which calls the function <em>testgrads<\/em>. 
The parameter <em>testdir<\/em> is the path to the test images.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">imagelist, heatlist = testgrads(testdir)<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Result<\/h2>\n\n\n\n<p>In the top row of Figure 3 you find three images (apple, orange and tomato) passed through the <em>testgrads<\/em> function. In the middle row, you find the outputs of the <em>testgrads<\/em> function: the visualized heatmaps. The bottom row of Figure 3 shows images which were merged with OpenCV&#8217;s <em>addWeighted<\/em> method. So the heatmap pixels indicate which group of pixels of the original image led to the prediction decision. You see that black heatmap pixels indicate that the background pixels do not contribute to the decision. The same is true for the fruit stems. 
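OpenCV's addWeighted computes a per-pixel weighted sum, dst = src1*alpha + src2*beta + gamma. The plain-NumPy sketch below makes the merge explicit; the blend weights 0.6/0.4 are assumptions for illustration, not necessarily the values used for Figure 3:

```python
import numpy as np

def blend(img, heat, alpha=0.6, beta=0.4, gamma=0.0):
    # same formula as cv2.addWeighted: dst = img*alpha + heat*beta + gamma
    out = img.astype(np.float32) * alpha + heat.astype(np.float32) * beta + gamma
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.full((2, 2, 3), 200, dtype=np.uint8)   # toy gray "fruit image"
heat = np.zeros((2, 2, 3), dtype=np.uint8)
heat[:, :, 1] = 255                              # toy green heatmap channel

print(blend(img, heat)[0, 0])  # [120 222 120]
```

The green channel is lifted wherever the heatmap is hot, while the other channels are only dimmed by alpha, which is why the heatmap regions stand out in the merged images.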
<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"349\" height=\"345\" src=\"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/07\/gradcamresult-1.png\" alt=\"\" class=\"wp-image-4440\" srcset=\"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/07\/gradcamresult-1.png 349w, https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/07\/gradcamresult-1-300x297.png 300w\" sizes=\"auto, (max-width: 349px) 100vw, 349px\" \/><figcaption>Figure 3: Original and heatmap images<\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\">References<\/h2>\n\n\n\n<p>Keras: <a href=\"https:\/\/keras.io\/examples\/vision\/grad_cam\/\">https:\/\/keras.io\/examples\/vision\/grad_cam\/<\/a><\/p>\n\n\n\n<p>pyimagesearch: <a href=\"https:\/\/www.pyimagesearch.com\/2020\/03\/09\/grad-cam-visualize-class-activation-maps-with-keras-tensorflow-and-deep-learning\/\">https:\/\/www.pyimagesearch.com\/2020\/03\/09\/grad-cam-visualize-class-activation-maps-with-keras-tensorflow-and-deep-learning\/<\/a><\/p>\n\n\n\n<p>Why Should I Trust You: <a href=\"https:\/\/arxiv.org\/pdf\/1602.04938.pdf\">https:\/\/arxiv.org\/pdf\/1602.04938.pdf<\/a><\/p>\n\n\n\n<p>Fruit Recognition: <a href=\"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/2020\/03\/26\/fruit-recognition-on-a-raspberry-pi\/\">https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/2020\/03\/26\/fruit-recognition-on-a-raspberry-pi\/<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In a previous blog we documented methods and code for recognizing fruits with a neural network on a raspberry pi. This time we want to go one step further and describe the method and code to create heatmaps for fruit images. The method we use here is called Gradient-weighted Class Activation Mapping (Grad-CAM). 
A heatmap &hellip; <a href=\"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/2021\/08\/02\/gradient-weighted-class-activation-mapping-with-fruit-images\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Gradient-weighted Class Activation Mapping with fruit images<\/span><\/a><\/p>\n","protected":false},"author":24,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[4,6,3,5],"class_list":["post-4346","post","type-post","status-publish","format-standard","hentry","category-allgemein","tag-ai","tag-classification","tag-deep-learning","tag-ki"],"_links":{"self":[{"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/posts\/4346","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/users\/24"}],"replies":[{"embeddable":true,"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/comments?post=4346"}],"version-history":[{"count":161,"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/posts\/4346\/revisions"}],"predecessor-version":[{"id":4830,"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/posts\/4346\/revisions\/4830"}],"wp:attachment":[{"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/media?parent=4346"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/categories?post=4346"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmo
tion\/wp-json\/wp\/v2\/tags?post=4346"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}