Volume 16, Issue 1, Mar. 2018

Delowar Hossain, Genci Capi, Mitsuru Jindai. Optimizing Deep Learning Parameters Using Genetic Algorithm for Object Recognition and Robot Grasping[J]. Journal of Electronic Science and Technology, 2018, 16(1): 11-15. doi: 10.11989/JEST.1674-862X.61103113

Optimizing Deep Learning Parameters Using Genetic Algorithm for Object Recognition and Robot Grasping

doi: 10.11989/JEST.1674-862X.61103113
  • Author Bio:

    Delowar Hossain received the B.Sc. and M.Sc. degrees in computer science and engineering from University of Rajshahi, Rajshahi in 2010 and 2012, respectively. He was a lecturer at the Department of Computer Science and Engineering, Dhaka International University, Dhaka from 2012 to 2014. He is currently pursuing the Ph.D. degree with the Graduate School of Science and Engineering for Education, University of Toyama, Toyama. He is also working as a visiting researcher with Hosei University, Tokyo. His research interests include industrial robots, deep learning, artificial intelligence, computer vision, image processing, intelligent robotics, learning, and evolution.

    Genci Capi received the B.Eng. degree from Polytechnic University of Tirana in 1993 and the Ph.D. degree from Yamagata University, Yamagata in 2002. He was a researcher with the Department of Computational Neurobiology, Advanced Telecommunication Research Institute International, Kyoto from 2002 to 2004. In 2004, he joined the Department of System Management, Fukuoka Institute of Technology, Fukuoka as an assistant professor, and in 2006 he was promoted to associate professor. In 2010, he joined the Department of Electrical and Electronic Systems Engineering, University of Toyama, as a professor. He is currently a professor with the Department of Mechanical Engineering, Hosei University, Tokyo. His research interests include intelligent robots, brain-machine interfaces, multi-robot systems, humanoid robots, learning, and evolution.

    Mitsuru Jindai received the Ph.D. and M.S. degrees from Okayama University, Okayama in 1999 and 1996, respectively. From 1999 to 2004, he was an assistant professor with Ehime University, Matsuyama. He was an associate professor and a professor with Okayama Prefectural University, Okayama from 2004 to 2011 and from 2011 to 2013, respectively. Currently, he is a professor with the Graduate School of Science and Engineering, University of Toyama. His current research interests include human-robot interaction and image processing.

  • Authors’ information: D. Hossain and M. Jindai are with the Graduate School of Science and Engineering for Education, University of Toyama, Toyama 935-0343 (e-mail: delowar_cse_ru@yahoo.com; jindai@eng.u-toyama.ac.jp); G. Capi (corresponding author) is with the Department of Mechanical Engineering, Hosei University, Tokyo 184-0002 (e-mail: capi@hosei.ac.jp).
  • Received Date: 2016-11-03
  • Rev Recd Date: 2017-12-01
  • Publish Date: 2018-03-01


Figures(6)  / Tables(1)



Abstract: The performance of deep learning (DL) networks has been improved by elaborating network structures. However, DL networks have many parameters that strongly influence network performance. We propose a genetic algorithm (GA) based deep belief neural network (DBNN) method for robot object recognition and grasping. The method optimizes the parameters of the DBNN, such as the number of hidden units, the number of epochs, and the learning rates, in order to reduce the error rate and the network training time of object recognition. After recognizing objects, the robot performs pick-and-place operations. We build a database of six objects for the experiments. Experimental results demonstrate that our method performs well on the optimized robot object recognition and grasping tasks.

    • Deep learning (DL) has driven an upsurge in artificial intelligence research since Hinton et al.[1] first proposed it. Since then, DL has been widely applied in fields such as complex image recognition, signal recognition, automotive systems, texture synthesis, military and surveillance applications, and natural language processing. The main focus of deep architectures is to explain the statistical variations in data and to automatically discover feature abstractions, from lower-level features to higher-level concepts. The aim is to learn feature hierarchies in which lower-level features are composed into higher-level feature abstractions.

      Research then began on analyzing DL networks for robotics applications, in which object recognition is a crucial research area. Nevatia et al.[2] introduced the object recognition process in 1977. Since then, researchers have proposed different methods for different object recognition problems[3]-[7]. Nowadays, DL is gaining popularity in robotic object recognition, and many researchers have applied DL[8]-[13] to various robot tasks. These contributions make robots useful in industrial applications as well as household work.

      However, creating and training DL networks requires significant effort and computation. DL networks have many parameters that influence network performance, and researchers have recently worked on integrating evolutionary algorithms with DL in order to optimize that performance. Young et al.[14] addressed multi-node evolutionary neural networks for DL, automating network selection on computational clusters through hyper-parameter optimization with a genetic algorithm (GA). A multilayer DL network using a GA was proposed by Lamos-Sweeney[15]; this method reduced the computational complexity and increased the overall flexibility of the DL algorithm. Lander[16] implemented an evolutionary technique to find the optimal abstract features for each auto-encoder, increasing the overall quality and abilities of DL. Shao et al.[17] developed an evolutionary learning methodology using multi-objective genetic programming to generate domain-adaptive global feature descriptors for image classification.

      In this paper, we propose an autonomous robot object recognition and grasping system using a GA and the deep belief neural network (DBNN) method. The GA is applied to optimize DBNN parameters such as the number of epochs, the number of hidden units, and the learning rates of the hidden layers, which strongly influence the structure and the performance of DL networks. After the parameters are optimized, objects are recognized using the DBNN method. The robot then generates a trajectory from its initial position to the object grasping position, picks up the object, and places it in a predefined position.

      The rest of the paper is organized as follows: the DBNN method is described in Section 2; the evolution of DBNN parameters is described in Section 3; the GA evolution results are presented in Section 4; the GA and DBNN implementation on the robot is shown in Section 5. Finally, we conclude the paper and outline future work in Section 6.

    • In this paper, we design the robot object recognition and grasping system by applying the DBNN method and the GA: the DBNN method is used for object recognition, and the GA is used to optimize the DBNN parameters.

    • A DBNN is a generative graphical model composed of stochastic latent variables, with multiple hidden layers of units between the input and output layers. A DBNN consists of a stack of restricted Boltzmann machines (RBMs). An RBM consists of a visible layer and a hidden layer, or of two adjacent hidden layers. The neurons of one layer are fully connected to the neurons of the next layer, but neurons within the same layer are not connected to each other. An RBM reaches thermal equilibrium when the units of the visible layer are clamped. The general structure of a DBNN is shown in Fig. 1. The DBNN has two basic properties: 1) The DBNN uses a top-down, layer-by-layer learning procedure. It has generative weights in the first two hidden layers, which determine how the variables in one layer depend on the variables of another layer, and discriminative weights in the last hidden layer, which are used to classify the objects. 2) After layer-wise learning, the values of the hidden units can be derived by a bottom-up pass, starting from the visible data vector in the bottom layer.

      Figure 1.  General structure of a DBNN method.

      The energy of the joint configuration $\{ v, h\} $ between the visible layer and the hidden layer of an RBM, with bias values, is

      $$E(v, h) = - \sum\limits_{i = 1}^{{n_v}} {{a_i}{v_i}} - \sum\limits_{j = 1}^{{n_h}} {{b_j}{h_j}} - \sum\limits_{i = 1}^{{n_v}} {\sum\limits_{j = 1}^{{n_h}} {{v_i}{w_{ij}}{h_j}} }$$

      where v is the set of visible units with $v \in {\{ 0, 1\} ^{{n_v}}}$ and h is the set of hidden units with $h \in {\{ 0, 1\} ^{{n_h}}}$; nv is the total number of units in the visible layer and nh is the total number of units in the hidden layer; a is the bias vector of the visible units and b is the bias vector of the hidden units; and w represents the weights between visible and hidden units.
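      As a concrete illustration, the energy above can be evaluated directly. The NumPy sketch below is our own illustration (not code from the paper) for small binary unit vectors:

```python
import numpy as np

def rbm_energy(v, h, a, b, w):
    """Energy E(v, h) of a joint RBM configuration.

    v: binary visible vector (n_v,);  h: binary hidden vector (n_h,)
    a: visible biases (n_v,);         b: hidden biases (n_h,)
    w: weight matrix (n_v, n_h)
    """
    return -(a @ v) - (b @ h) - (v @ w @ h)

# Tiny example: 3 visible units, 2 hidden units, zero biases.
v = np.array([1.0, 0.0, 1.0])
h = np.array([0.0, 1.0])
a = np.zeros(3)
b = np.zeros(2)
w = np.full((3, 2), 0.5)
print(rbm_energy(v, h, a, b, w))  # -> -1.0
```

      Lower energy corresponds to more probable joint configurations under the RBM's Boltzmann distribution.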

      We apply the DBNN method for object recognition in the proposed experiments. The structure of the implemented DBNN method is shown in Fig. 2. It consists of one visible layer, three hidden layers, and one output layer. The visible layer consists of 784 neurons representing the input image. The GA[18] finds the optimal number of hidden units in each layer; in our implementation, the numbers of hidden units in the 1st, 2nd, and 3rd hidden layers are 535, 229, and 355, respectively. The output layer consists of six different object classes. For sampling, we apply two different methods: contrastive divergence (CD) and persistent contrastive divergence (PCD). In the first hidden layer, we apply PCD sampling, as PCD explores the entire input domain. In the second and third hidden layers, we apply CD sampling, as CD explores near the input examples. By combining the two sampling methods, the proposed DBNN method can gather optimal features for recognizing objects.

      Figure 2.  Structure of the proposed DBNN method.
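      For readers who want to experiment with the network shape above (784 visible units; GA-optimized hidden layers of 535, 229, and 355 units; six softmax outputs), a minimal bottom-up pass might look as follows. This is a sketch only: the random weights are placeholders, and RBM pre-training via CD/PCD is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
# Layer sizes from the paper: 784 inputs, GA-optimized hidden layers, 6 classes.
sizes = [784, 535, 229, 355, 6]
weights = [rng.normal(0.0, 0.01, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def forward(v):
    """Bottom-up pass: sigmoid activations in the hidden layers,
    softmax over the six object classes at the output."""
    for w, b in zip(weights[:-1], biases[:-1]):
        v = sigmoid(v @ w + b)
    return softmax(v @ weights[-1] + biases[-1])

probs = forward(rng.random(784))
print(probs.shape)  # six class probabilities that sum to 1
```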

      After training on the input features, a fine-tuning operation is performed using back propagation in order to reduce the discrepancies between the original data and its reconstruction. We use the softmax function to classify the objects. The back propagation operation terminates when one of the following conditions is satisfied: 1) the best performance, defined by the mean squared error (MSE), is reached; 2) the maximum number of validation checks, i.e., 6, is reached; 3) the minimum gradient is reached; or 4) the maximum epoch number, i.e., 200, is reached.
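      The four termination conditions can be expressed as a single predicate. In the sketch below, max_val_fails=6 and max_epochs=200 come from the paper, while the MSE goal and minimum-gradient threshold are illustrative assumptions:

```python
def should_stop(mse, grad_norm, val_fails, epoch,
                mse_goal=0.0, min_grad=1e-5, max_val_fails=6, max_epochs=200):
    """True when any of the four termination conditions holds."""
    return (mse <= mse_goal                # 1) best performance (MSE goal)
            or val_fails >= max_val_fails  # 2) maximum validation checks
            or grad_norm <= min_grad       # 3) minimum gradient
            or epoch >= max_epochs)        # 4) maximum epoch number

print(should_stop(mse=0.02, grad_norm=0.1, val_fails=6, epoch=50))  # -> True
```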

    • Optimization is the process of making the DBNN method perform better. In our implementation, the aim is to optimize the DBNN parameters in order to improve the performance and the quality of the DL network structure.

      To optimize the DBNN parameters, we apply an evolutionary algorithm, the GA. A real-valued parallel GA[18] is employed in our optimization process, which outperforms the single-population GA in terms of solution quality. Using the parallel GA, we optimize the number of hidden units, the number of epochs, and the learning rates to reduce the error rate and training time. Our main contributions to the parallel GA are the design of the fitness function and the design of the parameters of the GA structure.

      The main goal is to find the optimal numbers of hidden units, numbers of epochs, and learning rates. Therefore, the fitness is evaluated to minimize the error rate and the network training time. The fitness function is defined in terms of four quantities: eBBP, the number of misclassifications divided by the total number of test data before back propagation; eABP, the number of misclassifications divided by the total number of test data after back propagation; tBBP, the training time before back propagation; and tDBP, the training time during the back propagation operation.
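      One plausible way to combine these four quantities into a single value to minimize is a weighted sum. The sketch below is an assumption for illustration; it does not reproduce the paper's exact formula or weighting:

```python
def fitness(e_bbp, e_abp, t_bbp, t_dbp, w_err=1.0, w_time=1.0):
    """Fitness to minimize: error rates before/after back propagation plus
    training times before/during back propagation. The unit weighting
    (w_err = w_time = 1.0) is an assumption, not the paper's formula."""
    return w_err * (e_bbp + e_abp) + w_time * (t_bbp + t_dbp)
```

      In practice the time terms would need to be scaled so that neither error nor time dominates the search.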

      The GA functions and parameters are presented in Table 1. A variety of mutation rates were tried, and the rates listed were found to be the best.

      Function name                                     Parameters
      Number of subpopulations                          4
      Initial number of individuals (subpopulation)     25, 25, 25, 25
      Crossover probability                             0.8
      Mutation rate (subpopulation)                     0.100, 0.030, 0.010, 0.003
      Isolation time                                    10 generations
      Migration rate                                    10%
      Results on screen                                 Every 1 generation
      Competition rate                                  10%
      Termination                                       30 generations

      Table 1.  GA functions and parameters
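      For reference, the Table 1 settings can be collected in a configuration mapping. The values are taken from the table; the dictionary itself and its key names are our own sketch:

```python
# GA settings from Table 1 (key names are our own).
ga_config = {
    "subpopulations": 4,
    "initial_individuals": [25, 25, 25, 25],
    "crossover_probability": 0.8,
    "mutation_rates": [0.100, 0.030, 0.010, 0.003],
    "isolation_time_generations": 10,
    "migration_rate": 0.10,
    "results_every_generations": 1,
    "competition_rate": 0.10,
    "termination_generations": 30,
}

# The four subpopulations together give the population size of 100
# reported in Section 4.
assert sum(ga_config["initial_individuals"]) == 100
```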

    • A real-valued parallel GA was employed in conjunction with the mutation, selection, and crossover operations. The real-valued GA performs better than the binary GA. The population size of the GA is 100. The best objective value per subpopulation is presented in Fig. 3; the best objective value, 6.54599, is reached at the 24th generation and is shown in red. The fitness values of each individual through evolution are presented in Fig. 4, which also shows how the number of individuals varies in every subpopulation during evolution. In the initial generation, the fitness values started from 11.5. The worst individuals were removed from the less successful subpopulations. After seven generations, the individuals with indexes from 50 to 70 converged better. Convergence was most successful at the end of the optimization, at the 24th generation.

      Figure 3.  Best objective values per subpopulation.

      Figure 4.  Fitness values of individuals over all generations.

      The performance of the GA was compared with arbitrarily selected DBNN parameters in [9]. In the case of the arbitrarily selected parameters, the numbers of hidden units in the three hidden layers are 1000, 1000, and 2000; the numbers of epochs are 200, 200, and 200; the learning rate of the 1st hidden layer is 0.001; and the learning rates of the 2nd and 3rd hidden layers are both 0.1. In the case of the GA, the optimal numbers of hidden units in the three hidden layers are 535, 229, and 355; the numbers of epochs are 120, 207, and 221; the learning rate of the 1st hidden layer is 0.04474; and the learning rates of the 2nd and 3rd hidden layers are both 0.44727. In both cases, the error rate is nearly the same, i.e., 0.0133. The training time is reduced by 84% using the optimized DBNN parameters compared with the arbitrary DBNN parameters.

    • To verify the optimized DBNN parameters, experiments on robot object recognition were conducted. A database of six robot-graspable objects, including four different types of screwdrivers, a round ball caster, and a small electric battery, was built for the experiments. The database consists of 1200 images (200 images for each object) in different orientations, positions, and lighting conditions.

      First, a universal serial bus (USB) camera takes a snapshot of the experimental environment, and the snapshot is converted to a grayscale image. Then, a morphological structuring element operation is applied to detect the objects present in the environment. The detected objects are separated based on their centroids into images of 28 pixels × 28 pixels. Each image is converted to an input vector of 784 neurons by a reshaping operation; in addition, normalization and shuffling operations are performed. These input vectors then pass through the three hidden layers. As output, the DBNN generates six probability values for each input vector, because the database is trained on six different object classes. From these probability values, the objects are recognized. For example, an object image, a red-black screwdriver in this case, is considered as input in Fig. 5. After this 28 pixels × 28 pixels image is processed, the input vector passes through the three hidden layers. As output, the DBNN generates six probability values: 0.0001, 0.0028, 0.9999, 0.0001, 0.0000, and 0.0000. The highest probability value is 0.9999, which belongs to the 3rd object class. Other objects can be recognized in the same way.

      Figure 5.  Sample object recognition process for a red-black screwdriver.
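      The image preprocessing steps above (crop around the centroid, resize to 28 × 28, reshape to a 784-element vector, normalize) can be sketched as follows. The nearest-neighbor resize is an assumption, as the paper does not specify its resampling method:

```python
import numpy as np

def preprocess(gray_crop):
    """Convert a grayscale object crop into a 784-element DBNN input vector.

    The 28x28 resize uses nearest-neighbor index sampling (an assumption;
    the paper does not specify its resampling method)."""
    h, w = gray_crop.shape
    rows = np.arange(28) * h // 28
    cols = np.arange(28) * w // 28
    img28 = gray_crop[np.ix_(rows, cols)]        # 28 x 28 resized crop
    vec = img28.astype(np.float64).reshape(784)  # flatten to 784 neurons
    return vec / 255.0                           # normalize to [0, 1]

crop = np.random.default_rng(1).integers(0, 256, size=(120, 90))
x = preprocess(crop)
print(x.shape)  # (784,)
```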

    • For object grasping, a PUMA robot manipulator from DENSO Corporation is used. The robot has six degrees of freedom, and its gripper can grasp any object within 32 mm in depth and 68 mm in width. A sequence of experiments for object recognition and robot pick-and-place operations in different positions, orientations, and lighting conditions was run. Snapshots of the real-time experiments are shown in Fig. 6.

      Figure 6.  Snapshots for object recognition and robot grasping process.

      We designed a graphical user interface (GUI). When the user requests an intended object by clicking on the GUI, the robot recognizes the requested object using the DBNN method. The robot finds the grasping position based on the object center and generates a trajectory from the initial position to the object grasping position. After reaching the object position, the robot adjusts its gripper orientation based on the object orientation. The robot then grasps the intended object, generates another trajectory to the predefined destination position, places the object, and returns to the initial position. In the same way, the remaining objects can be recognized and the robot can perform the pick-and-place operations.
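      The grasping sequence can be summarized as an ordered list of robot commands. The interface below is hypothetical (the real DENSO controller API differs); a logging stand-in is used to verify the action order:

```python
class LogRobot:
    """Minimal stand-in that records the commanded action sequence
    (hypothetical interface; the real DENSO controller API differs)."""
    def __init__(self):
        self.log = []
        self.destination = "bin"
        self.initial_position = "home"
    def move_to(self, where): self.log.append(("move", where))
    def align_gripper(self, angle): self.log.append(("align", angle))
    def close_gripper(self): self.log.append(("close",))
    def open_gripper(self): self.log.append(("open",))

def pick_and_place(robot, grasp_position, orientation):
    robot.move_to(grasp_position)          # trajectory to the object
    robot.align_gripper(orientation)       # match gripper to object pose
    robot.close_gripper()                  # grasp
    robot.move_to(robot.destination)       # trajectory to drop-off
    robot.open_gripper()                   # place
    robot.move_to(robot.initial_position)  # return to the initial position

bot = LogRobot()
pick_and_place(bot, (0.3, 0.1), 45.0)
print([a[0] for a in bot.log])  # -> ['move', 'align', 'close', 'move', 'open', 'move']
```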

    • We proposed a GA-based DBNN parameter optimization method for robot object recognition and grasping. The proposed method optimizes the number of hidden units, the number of epochs, and the learning rates in three hidden layers. We applied the optimized method to real-time object recognition and robot grasping tasks: the objects were recognized using the optimized DBNN method, and the robot then grasped the objects and placed them in the predefined position. The experimental results show that the proposed method is efficient for robot object recognition and grasping tasks.

      The most important part of future work is to integrate a multi-population genetic algorithm (MPGA) with the DBNN method. We therefore plan to investigate MPGA performance with different numbers of subpopulations and other genetic parameters.

