{"id":3967,"date":"2021-02-05T09:13:32","date_gmt":"2021-02-05T08:13:32","guid":{"rendered":"http:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/?p=3967"},"modified":"2022-09-07T10:53:25","modified_gmt":"2022-09-07T08:53:25","slug":"barcode-detection-with-a-neural-network","status":"publish","type":"post","link":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/2021\/02\/05\/barcode-detection-with-a-neural-network\/","title":{"rendered":"Barcode detection with a neural network"},"content":{"rendered":"\n<p>In late 2020 we were teaching a class called embedded systems, which also includes a semester assignment. Due to the overall situation in 2020 it was difficult to do it at the university, so we decided to issue the semester assignment as homework. One of the assignment topics was the detection of barcodes with a neural network. Such an assignment is not too far-fetched for an embedded systems class, because embedded systems capable of running neural networks are available nowadays, such as the Jetson Nano from <a href=\"https:\/\/developer.nvidia.com\/EMBEDDED\/jetson-nano-developer-kit\">NVIDIA<\/a>. <\/p>\n\n\n\n<p>The idea was to have a camera take a stream of images and show it on the display as a video. As soon as an object with a barcode shows up in the scene, the pixels in the area of the barcode are highlighted and the content of the barcode is shown above the area. <\/p>\n\n\n\n<p>Many of the topics in this post were covered before, so we refer to previous posts, e.g. during the explanation of the published code.<\/p>\n\n\n\n<p>The first task we assigned the students was to collect images with barcodes. Each student had to photograph 500 images. They collected them everywhere, such as from products in grocery stores, kitchens, bathrooms etc. 
Since we had seven students taking this assignment, we finally had 3500 images with barcodes.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Labeling<\/h2>\n\n\n\n<p> The next task of the assignment was to label the images. The program <a href=\"http:\/\/labelme.csail.mit.edu\/Release3.0\/\">labelme<\/a> is an application in which a user can label areas of an image by drawing polygons around objects, see Figure 1. The user can save the labeled image to a json file. The json file contains the complete image in a base64-encoded format and the vertices of the polygons.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/02\/label.png\" alt=\"\" class=\"wp-image-3985\" width=\"510\" height=\"308\" srcset=\"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/02\/label.png 954w, https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/02\/label-300x181.png 300w, https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/02\/label-768x464.png 768w\" sizes=\"auto, (max-width: 510px) 100vw, 510px\" \/><figcaption>Figure 1: Labeling with Labelme<\/figcaption><\/figure>\n<\/div>\n\n\n<p>The students did this for each image, so we had 3500 labeled images in json files, which we stored in a directory <em>fullpathdata<\/em>, see code below. 
The code below defines further global variables which are used throughout this post.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">dim = (128, 128) \n\npath =  r'...\/Dokumente\/Barcode'\ndirjson = \"train\/json\"\ndirimages = \"train\/images\"\ndirmasks = \"train\/masks\"\n\ndirjsonvalid = \"valid\/json\"\ndirimagesvalid = \"valid\/images\"\ndirmasksvalid = \"valid\/masks\"\n\ndirjsontest = \"test\/json\"\ndirimagestest = \"test\/images\"\ndirmaskstest = \"test\/masks\"\n\ndirmodels = \"models\"\nmodelname = \"model-bar.h5\"\nmodelweightname = \"model-bar-check.h5\"\nmodelnamejson = \"model-bar.json\"\n\ndirchecks = \"checks\"\ndirdata = \"data\/jsons\"\n\nfullpathdata = os.path.join(path, dirdata)\n\nfullpathjson = os.path.join(path, dirjson)\nfullpathimages = os.path.join(path, dirimages)\nfullpathmasks = os.path.join(path, dirmasks)\n\nfullpathjsonvalid = os.path.join(path, dirjsonvalid)\nfullpathimagesvalid = os.path.join(path, dirimagesvalid)\nfullpathmasksvalid = os.path.join(path, dirmasksvalid)\n\nfullpathjsontest = os.path.join(path, dirjsontest)\nfullpathimagestest = os.path.join(path, dirimagestest)\nfullpathmaskstest = os.path.join(path, dirmaskstest)\n\nfullpathmodels = os.path.join(path, dirmodels)\nfullpathchecks = os.path.join(path, dirchecks)<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Data creation<\/h2>\n\n\n\n<p>Within the next task we sorted the data into three directories: <em>fullpathjson<\/em>, <em>fullpathjsonvalid<\/em> and <em>fullpathjsontest<\/em>. 
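The directory tree has to exist before any files are sorted into it. A minimal sketch that creates it up front (a temporary directory stands in for the post's <em>path<\/em> variable, which points to a local folder):

```python
import os
import tempfile

# Hypothetical stand-in for the post's `path` variable.
path = tempfile.mkdtemp()

# Same layout as the global variables above.
subdirs = [
    "data/jsons",
    "train/json", "train/images", "train/masks",
    "valid/json", "valid/images", "valid/masks",
    "test/json", "test/images", "test/masks",
    "models", "checks",
]
for d in subdirs:
    # exist_ok makes the setup idempotent across repeated runs
    os.makedirs(os.path.join(path, d), exist_ok=True)

created = all(os.path.isdir(os.path.join(path, d)) for d in subdirs)
```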
We decided to use 78% of the json files as training data, 20% as validation data, and the remaining 2% as test data.<\/p>\n\n\n\n<p>The code below reads the filenames in <em>fullpathdata<\/em>, shuffles them, and distributes them into the lists <em>validlist<\/em>, <em>testlist<\/em> and <em>trainlist<\/em>.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">jsonlist = [filename for filename in os.listdir(fullpathdata) if filename.endswith(\".json\")]\n\nshuffle(jsonlist)\n\nnum = 20*len(jsonlist) \/\/ 100\nvalidlist = jsonlist[:num]\ntestlist = jsonlist[num:num+num\/\/10]\ntrainlist =  jsonlist[num+num\/\/10:]<\/pre>\n\n\n\n<p>Finally the json files are copied into the  <em>fullpathjson<\/em>, <em>fullpathjsonvalid<\/em> and <em>fullpathjsontest<\/em> directories, see code below.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">for filename in trainlist:\n    shutil.copy(os.path.join(fullpathdata, filename), os.path.join(fullpathjson, filename))\n    \nfor filename in validlist:\n    shutil.copy(os.path.join(fullpathdata, filename), os.path.join(fullpathjsonvalid, filename))\n    \nfor filename in testlist:\n    shutil.copy(os.path.join(fullpathdata, filename), os.path.join(fullpathjsontest, filename))<\/pre>\n\n\n\n<p>During training we often found that one of the images was corrupted. This is sometimes hard to see directly in the error messages of the training process. For this reason it makes sense to check the data in advance. 
The function <em>testMasks<\/em> below opens the json files, loads the data, decodes the images and reads in the polygons (&#8220;shapes&#8221;). Each image must have exactly one label, which we confirm with the <em>assert<\/em> command. If the assert fails, the function stops executing, and we remove the json file. It is like sorting out rotten fruit.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def testMasks(sourcejsonsdir):\n    count = 0\n    directory = sourcejsonsdir\n    for filename in os.listdir(directory):\n        if filename.endswith(\".json\"):\n            print(\"{}:{}\".format(count,os.path.join(directory, filename)))\n            f = open(os.path.join(directory, filename))\n            data = json.load(f)\n\n            img_arr = data['imageData']  \n            imgdata = base64.b64decode(img_arr)\n\n            img = cv2.imdecode(np.frombuffer(imgdata, dtype=np.uint8), flags=cv2.IMREAD_COLOR)\n\n            assert (len(data['shapes']) == 1)\n\n            for shape in data['shapes']:\n                print(shape['label'])\n                \n            count += 1\n            \n            f.close()<\/pre>\n\n\n\n<p>Below we execute the function<em> testMasks<\/em> against the directories  <em>fullpathjson<\/em>, <em>fullpathjsonvalid<\/em> and <em>fullpathjsontest<\/em>.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">testMasks(fullpathjson)\ntestMasks(fullpathjsonvalid)\ntestMasks(fullpathjsontest)<\/pre>\n\n\n\n<p>The images taken by the students are not square. 
For training, however, the model we use requires square images. The function <em>getRect<\/em> below extracts a square from the original image and returns four values: the upper left corner coordinates (<em>ld, lw<\/em>) and the square&#8217;s edge length twice (the same value appears in both positions). The upper left corner coordinates are randomly chosen within constraints, and the square&#8217;s edge is set to 90% of the width or height of the original image, whichever is smaller.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def getRect(img):\n    \n    width = img.shape[0]\n    height = img.shape[1]\n    lw = 0\n    ld = 0\n    side = 0\n    \n    if height &gt; width:\n    \n        widthscale = int(0.9*width)\n        left = width - widthscale\n        lw = randint(0, left)\n        down = height - widthscale\n        ld = randint(0, down)\n        side = widthscale\n    \n    else:\n    \n        heightscale = int(0.9*height)\n        down = height - heightscale\n        ld = randint(0, down)\n        left = width - heightscale\n        lw = randint(0, left)\n        side = heightscale\n    \n    \n    return (ld,lw,int(side), int(side))<\/pre>\n\n\n\n<p>The function <em>getFrameRGB<\/em> is a utility function to extract a square image from the original image using the return information from the function <em>getRect<\/em>. 
It thus returns a square subimage of the original image.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def getFrameRGB(img, rect):\n\n    retimg = np.zeros((rect[2], rect[3], 3), 'uint8')\n    \n    assert(rect[2] == rect[3]) \n\n    retimg[:,:,:] = img[rect[1]:rect[1]+rect[3],rect[0]:rect[0] + rect[2],:]\n    \n    assert (rect[2] == retimg.shape[0])\n    assert (rect[2] == retimg.shape[1])\n    \n    return retimg<\/pre>\n\n\n\n<p> The function <em>getFrameGrey<\/em> is a utility function to extract a greyscale square image from an original image using the return information from the function <em>getRect<\/em>. It corresponds to the above function, but it is only used for greyscale images. We will later use this function for the masks. <\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def getFrameGrey(img, rect):\n\n    retimg = np.zeros((rect[2], rect[3]), 'uint8')\n    \n    assert(rect[2] == rect[3]) \n\n    retimg[:,:] = img[rect[1]:rect[1]+rect[3],rect[0]:rect[0] + rect[2]]\n    \n    assert (rect[2] == retimg.shape[0])\n    assert (rect[2] == retimg.shape[1])\n    \n    return retimg<\/pre>\n\n\n\n<p>The functions <em>getFrameRGB<\/em> and <em>getFrameGrey<\/em> described above are both used in the function <em>createMasks<\/em> below. What <em>createMasks<\/em> basically does is iterate through a directory containing json files. It opens each json file, decodes the image and stores it in a variable <em>img<\/em>. Square information is retrieved from <em>img<\/em> with the function <em>getRect<\/em>. 
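The indexing convention shared by these helpers can be sanity-checked in isolation. Below is a condensed, deterministic restatement (a center crop instead of a random one; <em>center_square<\/em> is our own helper for the sketch, not part of the post's code):

```python
import numpy as np

def center_square(img, scale=0.9):
    # Condensed restatement of getRect's geometry. Note that the post's
    # code reads img.shape[0] as "width" and img.shape[1] as "height";
    # we keep that convention so the slicing below matches getFrameRGB.
    width, height = img.shape[0], img.shape[1]
    side = int(scale * min(width, height))
    lw = (width - side) // 2   # offset along axis 0
    ld = (height - side) // 2  # offset along axis 1
    return (ld, lw, side, side)

img = np.zeros((480, 640, 3), np.uint8)  # a 480x640 stand-in image
rect = center_square(img)

# The same slicing getFrameRGB performs with getRect's return value:
crop = img[rect[1]:rect[1] + rect[3], rect[0]:rect[0] + rect[2], :]
```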
<\/p>\n\n\n\n<p>Each json file contains a polygon with its vertices. The vertices are parsed out and a new mask image is generated from them. The image <em>mask<\/em> is a greyscale image whose pixels are set to 255 if they lie inside the polygon and to 0 if they lie outside.<\/p>\n\n\n\n<p>Finally we have two images: <em>img_resized<\/em> and <em>mask_resized<\/em>, which were copied from <em>img<\/em> and <em>finalmask<\/em>.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def createMasks(pre, sourcejsonsdir, destimagesdir, destmasksdir):\n\n    assocf = open(os.path.join(path,\"assoc_orig.txt\"), \"w\")\n    \n    count = 0\n    directory = sourcejsonsdir\n    for filename in os.listdir(directory):\n        if filename.endswith(\".json\"):\n            print(\"{}:{}\".format(count,os.path.join(directory, filename)))\n            \n            f = open(os.path.join(directory, filename))\n            data = json.load(f)\n            img_arr = data['imageData']  \n            imgdata = base64.b64decode(img_arr)\n\n            img = cv2.imdecode(np.frombuffer(imgdata, dtype=np.uint8), flags=cv2.IMREAD_COLOR)\n\n            img_resized = img.copy()\n            rect = getRect(img_resized)\n\n            finalmask = np.zeros((img_resized.shape[0], img_resized.shape[1]), 'uint8')\n            mthresh = np.zeros((img_resized.shape[0], img_resized.shape[1]), 'uint8')\n            masks=[]\n\n            for shape in data['shapes']:\n\n                vertices = np.array([[point[1],point[0]] for point in shape['points']])\n                vertices = vertices.astype(int)\n\n                rr, cc = polygon(vertices[:,0], vertices[:,1], img.shape)\n                mask_orig = np.zeros((img.shape[0], img.shape[1]), 'uint8')\n                mask_orig[rr,cc] = 
255\n                masks.append(mask_orig)\n\n            for m in masks:\n                _,mthresh = cv2.threshold(m,1,255,cv2.THRESH_BINARY_INV)\n                finalmask = cv2.bitwise_and(finalmask,finalmask,mask = mthresh)\n                finalmask += m\n\n\n            img_resized = img.copy()\n            mask_resized = finalmask.copy()\n            \n            img_store = cv2.resize(getFrameRGB(img_resized, rect), dim, interpolation = cv2.INTER_AREA) \n            mask_store = cv2.resize(getFrameGrey(mask_resized, rect), dim, interpolation = cv2.INTER_AREA)\n            alpha = 0.8 + 0.4*random()\n            beta = int(random()*15)\n            img_store = cv2.convertScaleAbs(img_store, alpha=alpha, beta=beta)\n                \n            cv2.imwrite(os.path.join(destimagesdir, \"{}_0_{:05d}.png\".format(pre,count)), img_store)\n            cv2.imwrite(os.path.join(destmasksdir, \"{}_0_{:05d}.png\".format(pre,count)), mask_store)\n\n            img_store = cv2.resize(getFrameRGB(img_resized, rect), dim, interpolation = cv2.INTER_AREA) \n            mask_store = cv2.resize(getFrameGrey(mask_resized, rect), dim, interpolation = cv2.INTER_AREA)\n            alpha = 0.8 + 0.4*random()\n            beta = int(random()*15)\n            img_store = cv2.convertScaleAbs(img_store, alpha=alpha, beta=beta)\n            img_store = cv2.rotate(img_store, cv2.ROTATE_90_CLOCKWISE)\n            mask_store = cv2.rotate(mask_store, cv2.ROTATE_90_CLOCKWISE)\n            \n            cv2.imwrite(os.path.join(destimagesdir, \"{}_90_{:05d}.png\".format(pre,count)), img_store)\n            cv2.imwrite(os.path.join(destmasksdir, \"{}_90_{:05d}.png\".format(pre,count)), mask_store)\n\n            img_store = cv2.resize(getFrameRGB(img_resized, rect), dim, interpolation = cv2.INTER_AREA) \n            mask_store = cv2.resize(getFrameGrey(mask_resized, rect), dim, interpolation = cv2.INTER_AREA)\n            alpha = 0.8 + 0.4*random()\n            beta = int(random()*15)\n  
          img_store = cv2.convertScaleAbs(img_store, alpha=alpha, beta=beta)\n            img_store = cv2.rotate(img_store, cv2.ROTATE_180)\n            mask_store = cv2.rotate(mask_store, cv2.ROTATE_180)\n            \n            cv2.imwrite(os.path.join(destimagesdir, \"{}_180_{:05d}.png\".format(pre,count)), img_store)\n            cv2.imwrite(os.path.join(destmasksdir, \"{}_180_{:05d}.png\".format(pre,count)), mask_store)\n\n            img_store = cv2.resize(getFrameRGB(img_resized, rect), dim, interpolation = cv2.INTER_AREA) \n            mask_store = cv2.resize(getFrameGrey(mask_resized, rect), dim, interpolation = cv2.INTER_AREA)\n            alpha = 0.8 + 0.4*random()\n            beta = int(random()*15)\n            img_store = cv2.convertScaleAbs(img_store, alpha=alpha, beta=beta)\n            img_store = cv2.rotate(img_store, cv2.ROTATE_90_COUNTERCLOCKWISE)\n            mask_store = cv2.rotate(mask_store, cv2.ROTATE_90_COUNTERCLOCKWISE)\n            \n            cv2.imwrite(os.path.join(destimagesdir, \"{}_270_{:05d}.png\".format(pre,count)), img_store)\n            cv2.imwrite(os.path.join(destmasksdir, \"{}_270_{:05d}.png\".format(pre,count)), mask_store)\n\n            count += 1\n            \n            f.close()\n\n        else:\n            continue\n    assocf.close()<\/pre>\n\n\n\n<p>Now we do data augmentation within <em>createMasks<\/em>. The function <em>createMasks<\/em> resizes <em>img_resized<\/em> and <em>mask_resized<\/em> and moves the resized images into <em>img_store<\/em> and <em>mask_store<\/em>. Both are now square images since <em>createMasks<\/em> utilizes <em>getRect<\/em>, <em>getFrameRGB<\/em> and <em>getFrameGrey<\/em>. The brightness and the contrast of <em>img_store<\/em> are randomly changed with the opencv <em>convertScaleAbs<\/em> method. Then a rotation is applied to the image. <\/p>\n\n\n\n<p>This is done four times. 
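The four near-identical save blocks could equally be written as one loop. A sketch of the idea, using numpy's <em>rot90<\/em> in place of <em>cv2.rotate<\/em> so it runs without OpenCV (the suffix strings mirror the filenames written by <em>createMasks<\/em>):

```python
import numpy as np

rng = np.random.default_rng(0)
img_store = rng.integers(0, 256, (128, 128, 3), dtype=np.uint8)   # stand-in image
mask_store = rng.integers(0, 2, (128, 128), dtype=np.uint8) * 255  # stand-in mask

outputs = {}
# suffix -> number of counterclockwise 90-degree turns;
# np.rot90 with k=3 corresponds to cv2.ROTATE_90_CLOCKWISE, k=2 to
# ROTATE_180, k=1 to ROTATE_90_COUNTERCLOCKWISE.
for suffix, k in [("0", 0), ("90", 3), ("180", 2), ("270", 1)]:
    # fresh brightness/contrast jitter per variant, as in createMasks
    alpha = 0.8 + 0.4 * rng.random()
    beta = int(rng.random() * 15)
    jittered = np.clip(alpha * img_store + beta, 0, 255).astype(np.uint8)
    outputs[suffix] = (np.rot90(jittered, k), np.rot90(mask_store, k))
```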
So, for each input image, <em>createMasks<\/em> creates four rotated output images with randomized brightness and contrast and saves them to a directory.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">createMasks(\"train\", fullpathjson, fullpathimages, fullpathmasks)\ncreateMasks(\"valid\", fullpathjsonvalid, fullpathimagesvalid, fullpathmasksvalid)\ncreateMasks(\"test\", fullpathjsontest, fullpathimagestest, fullpathmaskstest)<\/pre>\n\n\n\n<p>In the above code you see how <em>createMasks <\/em>is applied to the images inside the train, validation and test directories. The output images are stored in the <em>fullpathimages,  fullpathimagesvalid<\/em> and <em>fullpathimagestest<\/em> directories and the output masks in the  <em>fullpathmasks, fullpathmasksvalid<\/em> and <em>fullpathmaskstest<\/em> directories. Since we originally have 3500 images, the data augmentation in the code above produces 3500 times 4, i.e. 14000 images.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Training<\/h2>\n\n\n\n<p>We generated the model for training with<em> get_unet <\/em>(see below), which was written by Tobias Sterbak. The original code can be found <a href=\"https:\/\/www.depends-on-the-definition.com\/unet-keras-segmenting-images\/\">here<\/a>.  Only a few modifications have been made, basically in one of the last lines:<\/p>\n\n\n\n<p> <em>c10 = Conv2D(2, (1, 1), activation=&#8221;softmax&#8221;) (c9)<\/em><\/p>\n\n\n\n<p>The output of the model will therefore be a two-layer image, which represents a mask indicating the barcode area of an input image. Pixels of the first layer indicate a barcode if set to 1, and pixels of the second layer indicate background if set to 1. 
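To make the two-layer output concrete: collapsing such a softmax mask back to a single 0\/255 image is a one-liner with argmax. A sketch with a random stand-in for the prediction (not the trained model's output):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for one 128x128 model prediction: per-pixel softmax over
# two channels (channel 0 = barcode, channel 1 = background).
logits = rng.random((128, 128, 2)).astype(np.float32)
pred = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# A pixel belongs to the barcode where channel 0 wins.
mask = np.where(np.argmax(pred, axis=-1) == 0, 255, 0).astype(np.uint8)
```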
The code will not be explained any further, so we refer to the <a href=\"https:\/\/www.depends-on-the-definition.com\/unet-keras-segmenting-images\/\">original code<\/a>.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def conv2d_block(input_tensor, n_filters, kernel_size=3, batchnorm=True):\n    # first layer\n    x = Conv2D(filters=n_filters, kernel_size=(kernel_size, kernel_size), kernel_initializer=\"he_normal\",\n               padding=\"same\")(input_tensor)\n    if batchnorm:\n        x = BatchNormalization()(x)\n    x = Activation(\"relu\")(x)\n    # second layer\n    x = Conv2D(filters=n_filters, kernel_size=(kernel_size, kernel_size), kernel_initializer=\"he_normal\",\n               padding=\"same\")(x)\n    if batchnorm:\n        x = BatchNormalization()(x)\n    x = Activation(\"relu\")(x)\n    return x\n\ndef get_unet(input_img, n_filters=16, dropout=0.5, batchnorm=True):\n    # contracting path\n    c1 = conv2d_block(input_img, n_filters=n_filters*1, kernel_size=3, batchnorm=batchnorm)\n    p1 = MaxPooling2D((2, 2)) (c1)\n    p1 = Dropout(dropout*0.5)(p1)\n\n    c2 = conv2d_block(p1, n_filters=n_filters*2, kernel_size=3, batchnorm=batchnorm)\n    p2 = MaxPooling2D((2, 2)) (c2)\n    p2 = Dropout(dropout)(p2)\n\n    c3 = conv2d_block(p2, n_filters=n_filters*4, kernel_size=3, batchnorm=batchnorm)\n    p3 = MaxPooling2D((2, 2)) (c3)\n    p3 = Dropout(dropout)(p3)\n\n    c4 = conv2d_block(p3, n_filters=n_filters*8, kernel_size=3, batchnorm=batchnorm)\n    p4 = MaxPooling2D(pool_size=(2, 2)) (c4)\n    p4 = Dropout(dropout)(p4)\n    \n    c5 = conv2d_block(p4, n_filters=n_filters*16, kernel_size=3, batchnorm=batchnorm)\n    \n    # expansive path\n    u6 = Conv2DTranspose(n_filters*8, (3, 3), strides=(2, 2), padding='same') (c5)\n    u6 = concatenate([u6, 
c4])\n    u6 = Dropout(dropout)(u6)\n    c6 = conv2d_block(u6, n_filters=n_filters*8, kernel_size=3, batchnorm=batchnorm)\n\n    u7 = Conv2DTranspose(n_filters*4, (3, 3), strides=(2, 2), padding='same') (c6)\n    u7 = concatenate([u7, c3])\n    u7 = Dropout(dropout)(u7)\n    c7 = conv2d_block(u7, n_filters=n_filters*4, kernel_size=3, batchnorm=batchnorm)\n\n    u8 = Conv2DTranspose(n_filters*2, (3, 3), strides=(2, 2), padding='same') (c7)\n    u8 = concatenate([u8, c2])\n    u8 = Dropout(dropout)(u8)\n    c8 = conv2d_block(u8, n_filters=n_filters*2, kernel_size=3, batchnorm=batchnorm)\n\n    u9 = Conv2DTranspose(n_filters*1, (3, 3), strides=(2, 2), padding='same') (c8)\n    u9 = concatenate([u9, c1], axis=3)\n    u9 = Dropout(dropout)(u9)\n    c9 = conv2d_block(u9, n_filters=n_filters*1, kernel_size=3, batchnorm=batchnorm)\n    \n    c10 = Conv2D(2, (1, 1), activation=\"softmax\") (c9)\n    \n    model = Model(inputs=[input_img], outputs=[c10])\n    return model<\/pre>\n\n\n\n<p>Below you see the code of the data generator used during training. The computer will not be able to load 14000 images into memory, so we have to use batches. The function <em>generatebatchdata<\/em> delivers batches of training data for the training process. Note that <em>generatebatchdata<\/em> additionally randomizes the brightness and contrast of 70% of the training images to augment the data further.  
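The jitter applied here amounts to scaling each pixel by alpha (contrast, drawn from [0.8, 1.2)) and shifting it by beta (brightness, drawn from [0, 15)), saturated to the 0..255 range. A numpy sketch of the same transform (for non-negative inputs, <em>cv2.convertScaleAbs<\/em> behaves like this clip; <em>jitter<\/em> is our own helper name):

```python
import numpy as np

def jitter(img, alpha, beta):
    # numpy equivalent of cv2.convertScaleAbs for non-negative inputs:
    # scale by alpha, shift by beta, saturate to the uint8 range.
    return np.clip(alpha * img.astype(np.float32) + beta, 0, 255).astype(np.uint8)

img = np.full((4, 4, 3), 200, np.uint8)
bright = jitter(img, alpha=1.2, beta=10)  # 1.2*200 + 10 = 250
dark = jitter(img, alpha=0.8, beta=0)     # 0.8*200 = 160
```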
The code was described in previous posts, so we will omit any more description.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def generatebatchdata(batchsize, fullpathimages, fullpathmasks):\n  \n    imagenames = os.listdir(fullpathimages)\n    imagenames.sort()\n\n    masknames = os.listdir(fullpathmasks)\n    masknames.sort()\n\n    for i in range(len(imagenames)):\n        assert(imagenames[i] == masknames[i])\n\n    while True:\n        batchstart = 0\n        batchend = batchsize    \n        \n        while batchstart &lt; len(imagenames):\n            \n            imagelist = []\n            masklist = []\n            \n            limit = min(batchend, len(imagenames))\n\n            for i in range(batchstart, limit):\n                if imagenames[i].endswith(\".png\"):\n                    img = cv2.imread(os.path.join(fullpathimages,imagenames[i]),cv2.IMREAD_COLOR )\n                    if random.random() &gt; 0.3:\n                        alpha = 0.8 + 0.4*random.random();\n                        beta = int(random.random()*15)\n                        img = cv2.convertScaleAbs(img, alpha=alpha, beta=beta)\n                    imagelist.append(img)\n                if masknames[i].endswith(\".png\"):\n                    img0 = np.zeros(dim, 'uint8')\n                    img1 = np.zeros(dim, 'uint8')\n                    img0 = cv2.imread(os.path.join(fullpathmasks,masknames[i]),cv2.IMREAD_UNCHANGED)\n                    img0 = np.where(img0 &gt; 0, 1, 0)   \n                    img1 = np.where(img0 &gt; 0, 0, 1)  \n                    img = np.zeros((dim[0], dim[1], 2),'uint8')\n                    msum = np.sum(np.array(img0) + np.array(img1))\n                    assert(msum == dim[0]*dim[1])\n                    img[:,:,0] = img0[:,:]\n              
      img[:,:,1] = img1[:,:]\n                    masklist.append(img)\n\n            train_data = np.array(imagelist, dtype=np.float32)\n            train_mask= np.array(masklist, dtype=np.float32)\n\n            train_data -= train_data.mean()\n            train_data \/= train_data.std()\n            \n            yield (train_data,train_mask)    \n\n            batchstart += batchsize   \n            batchend += batchsize<\/pre>\n\n\n\n<p>We decided to use a batch size of ten for model training. The input layer is instantiated and assigned to <em>input_img<\/em>. The function <em>get_unet<\/em> creates the model and the model is compiled.  <\/p>\n\n\n\n<p>We use predefined callback functions for training, such as early stopping. The callback functions were described in previous posts as well.<\/p>\n\n\n\n<p>The variables <em>stepstrainimages<\/em> and <em>stepsvalidimages<\/em> are needed for the <em>fit<\/em> method to set its parameters <em>steps_per_epoch<\/em> and <em>validation_steps<\/em>, see code below.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">batchsize = 10\n\ninput_img = Input((dim[0], dim[1], 3), name='img')\nmodel = get_unet(input_img, n_filters=1, dropout=0.0, batchnorm=True)\nmodel.compile(optimizer=Adam(), loss=\"categorical_crossentropy\", metrics=[\"accuracy\"])\n\n#model.load_weights(os.path.join(path, dirmodels,modelname))\n\ncallbacks = [\n    EarlyStopping(patience=10, verbose=1),\n    ReduceLROnPlateau(factor=0.1, patience=3, min_lr=0.00001, verbose=1),\n    ModelCheckpoint(os.path.join(fullpathmodels,modelweightname), verbose=1, save_best_only=True, save_weights_only=True)\n]\n\nstepstrainimages = len(os.listdir(fullpathimages))\/\/batchsize\nstepsvalidimages = len(os.listdir(fullpathimagesvalid))\/\/batchsize<\/pre>\n\n\n\n<p>Below you 
find the code to instantiate the data generators for training and validation and the <em>fit<\/em> method to train the model. The number of epochs was set to 20, but the code below can be executed several times in a row (especially if you use a jupyter notebook). After training, the model&#8217;s weights (<em>modelname<\/em>) and the model structure (<em>modelnamejson<\/em>) are saved.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">generator_train = generatebatchdata(batchsize, fullpathimages, fullpathmasks)\ngenerator_valid = generatebatchdata(batchsize, fullpathimagesvalid, fullpathmasksvalid)\nmodel.fit(generator_train,steps_per_epoch=stepstrainimages, epochs=20, callbacks=callbacks, validation_data=generator_valid, validation_steps=stepsvalidimages)\n\nmodel.save_weights(os.path.join(path, dirmodels,modelname))\njson_model = model.to_json()\nwith open(os.path.join(path, dirmodels,modelnamejson), \"w\") as json_file:\n    json_file.write(json_model)<\/pre>\n\n\n\n<p>For testing the model, we use the code below. It loads all test images into memory. Since there are only a limited number of them, storing them in an array will not cause a memory problem. 
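A back-of-envelope check supports this, assuming the test split holds roughly 2% of the 14000 augmented images, i.e. about 280 images of 128x128x3 float32 values:

```python
# Rough memory estimate for the in-memory test array (assumed counts).
images = 280
bytes_per_image = 128 * 128 * 3 * 4  # float32 = 4 bytes per value
total_mb = images * bytes_per_image / 2**20
```

That comes to about 52 MB, which fits into memory comfortably.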
The images are then standardized with the mean and the standard deviation.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">imagetestlist = []\n\nimagetestnames = os.listdir(fullpathimagestest)\nimagetestnames.sort()\n\nfor imagename in imagetestnames:\n    if imagename.endswith(\".png\"):\n        imagetestlist.append(cv2.imread(os.path.join(fullpathimagestest,imagename),cv2.IMREAD_COLOR ))\n        \ntest_data = np.array(imagetestlist, dtype=np.float32)\ntest_data -= test_data.mean()\ntest_data \/= test_data.std()\n\npredictions = model.predict(test_data, batch_size=1, verbose=1)<\/pre>\n\n\n\n<p>The <em>predict<\/em> method above predicts the test images and returns the result in the <em>predictions<\/em> variable.<\/p>\n\n\n\n<p>Below you find the code to lift the pixel values to 255 wherever the predicted mask pixel is above 0.5.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">ind = 56\nplt.imshow(imagetestlist[ind])\n\nimg = predictions[ind][:,:,0]\nimg = np.where(img &gt; 0.5, 255, 0)\nplt.imshow(img)<\/pre>\n\n\n\n<p>Figure 2 shows two images: the original image on the left and the predicted mask image on the right. You can see that the mask image clearly shows the location of the barcode pixels. 
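The visual impression can also be quantified, e.g. with the intersection over union of the predicted and ground-truth masks. This metric is our addition, not part of the original pipeline; a sketch on synthetic 0\/255 masks:

```python
import numpy as np

def iou(pred_mask, true_mask):
    # Both masks are 0/255 uint8 images; compare them as booleans.
    p, t = pred_mask > 0, true_mask > 0
    inter = np.logical_and(p, t).sum()
    union = np.logical_or(p, t).sum()
    return inter / union if union else 1.0

true = np.zeros((128, 128), np.uint8)
true[30:90, 40:100] = 255   # ground-truth barcode region
pred = np.zeros((128, 128), np.uint8)
pred[30:90, 50:110] = 255   # prediction shifted by 10 pixels
score = iou(pred, true)     # overlap 3000 px, union 4200 px
```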
<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/02\/testpic.png\" alt=\"\" class=\"wp-image-3981\" width=\"456\" height=\"226\" srcset=\"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/02\/testpic.png 743w, https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/02\/testpic-300x149.png 300w\" sizes=\"auto, (max-width: 456px) 100vw, 456px\" \/><figcaption>Figure 2: Predicted mask<\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\">Live Example<\/h2>\n\n\n\n<p>The idea of this project was to use a live video stream and apply the predict function to the images of the video stream to generate masks. The masks can be overlaid with the original images and shown on the display.<\/p>\n\n\n\n<p>Another idea is to apply barcode scanning software to the original image to read the barcode&#8217;s content. The content is then put as text onto the display.<\/p>\n\n\n\n<p>Below we first open the model structure file and read it into <em>loaded_model_json<\/em>. The function <em>model_from_json<\/em> moves the structure into <em>loaded_model<\/em>. The code then loads the weights, stored in <em>modelname<\/em>, into <em>loaded_model<\/em>. <\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">json_file = open(os.path.join(fullpathmodels, modelnamejson), 'r')\nloaded_model_json = json_file.read()\njson_file.close()\nloaded_model = model_from_json(loaded_model_json)\n\nloaded_model.load_weights(os.path.join(path, dirmodels,modelname))<\/pre>\n\n\n\n<p>The images of the video stream are not square, so we need to extract square images from them. 
For this we have written the function <em>getRectStatic<\/em>. It basically does the same as the function <em>getRect<\/em>, but with the randomization of the crop position coordinates taken out. <\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def getRectStatic(img):\n\n    # note: img.shape[0] is the number of rows; the names width and\n    # height are swapped here, but they are used consistently below\n    width = img.shape[0]\n    height = img.shape[1]\n    lw = 0\n    ld = 0\n    side = 0\n\n    if height &gt; width:\n\n        widthscale = int(0.9*width)\n        left = width - widthscale\n        lw = left\/\/2\n        down = height - widthscale\n        ld = down\/\/2\n        side = widthscale\n\n    else:\n\n        heightscale = int(0.9*height)\n        down = height - heightscale\n        ld = down\/\/2\n        left = width - heightscale\n        lw = left\/\/2\n        side = heightscale\n\n    return (ld, lw, int(side), int(side))<\/pre>\n\n\n\n<p>Below you find the code which displays the webcam image overlaid with the predicted mask image. The OpenCV method <em>VideoCapture<\/em> instantiates a video stream object. Inside the endless while loop, an image is read from the video stream. The image is then passed to <em>getRectStatic<\/em> to obtain the coordinates of a square region. The square image <em>img<\/em> is then extracted from <em>frame<\/em>. The image <em>img<\/em> is resized and appended to the one-element list <em>imgpredict<\/em>, which is then standardized with the mean and the standard deviation.<\/p>\n\n\n\n<p>The Keras method <em>predict<\/em> processes <em>imgpredict<\/em> and moves the predicted output (one mask image) into the one-element <em>predictions<\/em> list. 
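The cropping logic of <em>getRectStatic<\/em> can be sanity-checked on a synthetic frame; here is a minimal sketch (the function body is condensed from the listing above, and the 480x640 frame size is an assumption for a typical webcam):

```python
import numpy as np

def getRectStatic(img):
    # condensed version of the function above: pick a centered square
    # covering 90% of the shorter side of the frame
    width = img.shape[0]
    height = img.shape[1]
    if height > width:
        side = int(0.9 * width)
        lw = (width - side) // 2
        ld = (height - side) // 2
    else:
        side = int(0.9 * height)
        ld = (height - side) // 2
        lw = (width - side) // 2
    return (ld, lw, int(side), int(side))

# synthetic 480x640 BGR frame (rows x columns), as a webcam would deliver
frame = np.zeros((480, 640, 3), dtype=np.uint8)

d, w, side, _ = getRectStatic(frame)
crop = frame[w:w + side, d:d + side, :]

print(crop.shape)  # (432, 432, 3) for the 480x640 input
```

The returned crop is square and lies fully inside the frame, which is exactly what the resize step in the video loop relies on.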
The NumPy <em>where<\/em> method sets each pixel value of the mask image to 255 or to 0, depending on whether the value is above 0.3 or not.<\/p>\n\n\n\n<p>The code below resizes the mask image and determines the largest contour of the mask image with the OpenCV method <em>findContours<\/em>. The contour is used to compute a rectangle around it (<em>minAreaRect<\/em>) and to draw it on the display (<em>drawContours<\/em>). <\/p>\n\n\n\n<p>The function <em>decode<\/em> is a <a href=\"https:\/\/pypi.org\/project\/pyzbar\/\">pyzbar<\/a> library function. It computes the content of the barcode from a barcode image and returns it in the variable <em>result<\/em>. The code iterates through the structure of the variable <em>result<\/em> and moves the recognized barcode content into <em>barcodetext<\/em>. The string in <em>barcodetext<\/em> is put onto the display as well.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">vid = cv2.VideoCapture(0)\n\nsaveimg = np.zeros((dim[0], dim[1], 3), \"uint8\")\n\ncount = 0\n\nbarcodetext = \"\"\n\nwhile(True):\n\n    ret, frame = vid.read()\n\n    d,w,side,_ = getRectStatic(frame)\n\n    img = np.zeros((side, side, 3), \"uint8\")\n\n    img[::] = frame[w:w+side, d:d+side,:]\n\n    imgpredict = []\n    imgpredict.append(cv2.resize(img, dim, interpolation = cv2.INTER_AREA))\n    imgpredict = np.array(imgpredict, dtype=np.float32)\n    imgpredict -= imgpredict.mean()\n    imgpredict \/= imgpredict.std()\n\n    predictions = loaded_model.predict(imgpredict, batch_size=1, verbose=0)\n\n    prediction = predictions[0][:,:,0]\n    prediction = np.where(prediction &gt; 0.3, 255, 0)\n    prediction = np.array(prediction, \"uint8\")\n\n    predresized = np.zeros((side, side, 3), np.uint8)\n    predresized[:,:,2] = cv2.resize(prediction, (side, side), interpolation = cv2.INTER_AREA)[:,:]\n    contours, hierarchy = cv2.findContours(prediction,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)\n    cnts = sorted(contours, key=cv2.contourArea)\n\n    result = decode(img)\n\n    if len(cnts) &gt; 0:\n        rect = cv2.minAreaRect(cnts[-1])\n        box = cv2.boxPoints(rect)\n        box *= side\/dim[0]\n        box = box.astype(int)\n        img = cv2.drawContours(img,[box],0,(0,255,0),2)\n\n    newimg = cv2.addWeighted(img, 0.5, predresized, 0.5, 0)\n\n    # read the barcode content; the copy of the annotated frame is taken\n    # after newimg has been created, so newimg is always defined here\n    for index,i in enumerate(result):\n        barcodetext = i.data.decode(\"utf-8\")\n        saveimg = newimg.copy()\n\n    newimg = cv2.putText(newimg, barcodetext, (20,30), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,255,255), 2, cv2.LINE_AA)\n\n    cv2.imshow('frame',newimg)\n\n    key = cv2.waitKey(1) &amp; 0xFF\n    if key == ord('q'):\n        break\n    if key == ord('s'):\n        cv2.imwrite(os.path.join(fullpathimagestest, f\"{count}.png\"), cv2.resize(img, dim, interpolation = cv2.INTER_AREA))\n        count += 1\n\nvid.release()\ncv2.destroyAllWindows()<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Results<\/h2>\n\n\n\n<p>In Figure 3 you find a screenshot of the displayed video stream. Here somebody holds a bottle with a barcode into the scene of the webcam. The predicted mask of the barcode is overlaid onto the original video stream. You find the green rectangle around the mask. In the upper left corner you can find the content of the barcode computed by the pyzbar library. 
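The step from binary mask to box can, for an axis-aligned rectangle, also be done without OpenCV; here is a minimal NumPy sketch (an alternative to <em>minAreaRect<\/em>, assuming the mask contains a single blob — the synthetic mask below is made up for illustration):

```python
import numpy as np

# synthetic binary mask with one rectangular blob, as np.where produces it
mask = np.zeros((8, 8), dtype=np.uint8)
mask[2:5, 3:7] = 255

# bounding box of all non-zero pixels, as (x0, y0, x1, y1)
ys, xs = np.nonzero(mask)
box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
print(box)
# (3, 2, 6, 4)
```

Unlike <em>minAreaRect<\/em>, this box is never rotated, which is often good enough for cropping the barcode region before handing it to a barcode reader.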
<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/files\/2021\/02\/video.png\" alt=\"\" class=\"wp-image-4077\" width=\"288\" height=\"291\" \/><figcaption>Figure 3: Live barcode detection and barcode content<\/figcaption><\/figure>\n<\/div>\n\n\n<p>The live video stream actually works quite well. It shows the position of a barcode whenever you put one in front of the webcam. However, the barcode reading software (pyzbar) needs several retries to compute the content of the barcode correctly. In general we can say that neural networks are very well suited for determining the barcode position.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Acknowledgement<\/h2>\n\n\n\n<p>Thank you very much to the master&#8217;s students of the class embedded systems of winter semester 2020. Seven students labeled 3500 images for their semester assignments, which is very time-consuming work.<\/p>\n\n\n\n<p>Also special thanks to the University of Applied Sciences Albstadt-Sigmaringen for providing the infrastructure and the equipment to enable this class.<\/p>\n","protected":false,"excerpt":{"rendered":"<p>In late 2020 we were teaching a class called embedded systems, which includes a semester assignment, as well. Due to the overall situation in 2020 it was difficult to do it at the university, so we decided to issue the semester assignment as a home work. 
One of topics of the assignments was the detection &hellip; <a href=\"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/2021\/02\/05\/barcode-detection-with-a-neural-network\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Barcode detection with a neural network<\/span><\/a><\/p>\n","protected":false},"author":24,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[4,3,5,7],"class_list":["post-3967","post","type-post","status-publish","format-standard","hentry","category-allgemein","tag-ai","tag-deep-learning","tag-ki","tag-neural-network"],"_links":{"self":[{"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/posts\/3967","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/users\/24"}],"replies":[{"embeddable":true,"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/comments?post=3967"}],"version-history":[{"count":291,"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/posts\/3967\/revisions"}],"predecessor-version":[{"id":4831,"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/posts\/3967\/revisions\/4831"}],"wp:attachment":[{"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/media?parent=3967"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmotion\/wp-json\/wp\/v2\/categories?post=3967"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www3.hs-albsig.de\/wordpress\/point2pointmot
ion\/wp-json\/wp\/v2\/tags?post=3967"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}