Overview
The Raspberry Pi (RPi) team is responsible for two main functional requirements.
The first requirement is for the RPi to serve as the communication hub between three other subsystems: the Arduino board (Arduino), the Nexus 7 tablet (Android) and the computer on which the robot's algorithms are run (Algorithm). To achieve this, the RPi must communicate with Arduino via a USB serial connection, with Android over Bluetooth and with Algorithm over Wi-Fi, relaying messages between these subsystems with as little delay as possible so that the robot can run quickly.
The second requirement is for the robot to detect and classify 5 of the 15 possible symbols that are placed randomly within the maze during its exploration.
Getting Started
Before we work on fulfilling the functional requirements, we first set up the RPi such that it is easy for us to work with, based on the instructions provided on NTULearn.
Setting-up the RPi’s Operating System
The operating system provided on NTULearn is Raspbian (Jessie). In addition to installing Raspbian on the micro SD card, we configured the RPi to be remotely accessible over Secure Shell (SSH) so that we can work on the RPi from our own computers.
Setting-up RPi’s Interfaces
For the RPi to serve as a Wi-Fi access point, we configured it as a host access point, set up the network interface and set up a DHCP server. We also enabled IP forwarding so that our computers retain internet access while connected to the RPi, for ease of software development.
For the RPi to connect to Android via Bluetooth, we ran the RPi's Bluetooth daemon in compatibility mode, made the RPi discoverable to Android, configured the Bluetooth connection between the RPi and Android using the Serial Port Profile, and automated the connection.
Software Development
Since Raspbian (Jessie) comes with Python (versions 2.7 and 3.4), we decided to use Python 3.4 as our programming language. It helps that Python is easy to learn and has many libraries that help us fulfil our functional requirements.
System Architecture – Communications
Connections
We used the PySerial library to establish a serial connection between the RPi and Arduino. While practising with our robot, we found that whenever the Arduino disconnects from and reconnects to the RPi, the port device name changes ('/dev/ttyACM0', '/dev/ttyACM1', and so on). Instead of hard-coding all possible names or using a long chain of if-else statements, we used the port name '/dev/serial/by-id/usb-Arduino__www.arduino.cc__0043_75232303235351F091C0-if00' to create the serial connection. This 'port' is a symbolic link that always points to the correct device name, so no extra code is needed to create the connection.
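A minimal sketch of how such a connection can be opened with PySerial; the baud rate here is an assumption and must match whatever the Arduino is configured to use:

    import serial

    # The by-id path is a symlink that always resolves to the current /dev/ttyACM*
    # device, so the same string works even after the Arduino re-enumerates.
    ARDUINO_PORT = '/dev/serial/by-id/usb-Arduino__www.arduino.cc__0043_75232303235351F091C0-if00'
    BAUD_RATE = 115200  # assumption; must match the Arduino sketch

    arduino = serial.Serial(ARDUINO_PORT, BAUD_RATE, timeout=1)
    line = arduino.readline()  # read one newline-terminated message from the Arduino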
We used the PyBluez library to access the RPi's Bluetooth functionality and connect the RPi to Android over Bluetooth. Between the two, the RPi serves as the Bluetooth server while Android takes on the role of client: before connection, the RPi listens on a radio frequency communication (RFCOMM) channel and advertises its service on that channel. Android can then discover the RPi and connect to it, establishing the connection.
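A sketch of the server side of this setup using PyBluez; the service name is a placeholder rather than the one we actually advertised:

    import bluetooth

    # Listen on any free RFCOMM channel and advertise it using the Serial Port
    # Profile, so that Android can discover the service and connect as a client.
    server_sock = bluetooth.BluetoothSocket(bluetooth.RFCOMM)
    server_sock.bind(("", bluetooth.PORT_ANY))
    server_sock.listen(1)

    bluetooth.advertise_service(
        server_sock,
        "RPiServer",  # placeholder service name
        service_classes=[bluetooth.SERIAL_PORT_CLASS],
        profiles=[bluetooth.SERIAL_PORT_PROFILE],
    )

    client_sock, client_info = server_sock.accept()  # blocks until Android connects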
We connected the RPi to Algorithm directly over network sockets, via the socket library. As with the connection between the RPi and Android, the RPi serves as the server and Algorithm connects to it as a client.
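A sketch of the corresponding TCP server for the Algorithm connection; the port number is an arbitrary example:

    import socket

    HOST = ''    # listen on all interfaces, including the RPi's Wi-Fi access point
    PORT = 8080  # example port; any free port agreed on with the Algorithm team works

    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((HOST, PORT))
    server_sock.listen(1)

    algo_sock, algo_addr = server_sock.accept()  # blocks until Algorithm connects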
Parallel-Processing Communication
To handle communications between all subsystems with as little delay as possible, we used the 'multiprocessing' library in our main communication class, 'MultiProcessComms'. In addition, we defined our own message protocol so that we know which subsystem each message came from and which subsystem it should be forwarded to. To facilitate asynchronous communication, we used 3 queues: a message queue for messages to Algorithm and Arduino, a message queue for messages to Android, and an image queue.
The message queue holds messages that the RPi has read from any of the 3 established connections and that the RPi will write to Arduino or Algorithm, once it has determined whom each message is for.
The image queue holds pictures that the RPi has taken with its PiCamera, to be sent to the image processing server for image recognition.
We spawned a process for each task that the RPi needs to perform: reading incoming messages from Arduino, reading incoming messages from Android, reading incoming messages from Algorithm, writing messages to Arduino or Algorithm, writing messages to Android, and image processing. These processes (separate OS processes, not threads) run in parallel, so no task is held up waiting for another, as it would be if the tasks were run sequentially.
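A simplified, self-contained sketch of this pattern with just one reader and one writer; the worker functions, header value and payload are placeholders, not the actual handlers in MultiProcessComms:

    import time
    from multiprocessing import Process, Queue

    def read_from_arduino(message_queue):
        # Placeholder for the real read loop: each message read from the serial
        # connection would be tagged with a header and placed on the shared queue.
        while True:
            message_queue.put(('AL', 'sensor-data'))  # hypothetical header and payload
            time.sleep(1)

    def write_to_algorithm(message_queue):
        # Placeholder for the real write loop: forward queued messages to Algorithm.
        while True:
            header, payload = message_queue.get()  # blocks until a message arrives
            print('forwarding', header, payload)

    if __name__ == '__main__':
        message_queue = Queue()
        workers = [
            Process(target=read_from_arduino, args=(message_queue,)),
            Process(target=write_to_algorithm, args=(message_queue,)),
        ]
        for worker in workers:
            worker.daemon = True  # stray workers do not outlive the main program
            worker.start()
        time.sleep(5)  # let the demonstration run briefly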
Each read process retrieves messages from its respective connection and, depending on the header, places each message in the appropriate message queue so that it reaches the correct recipient. If any of the subsystems disconnects while a message is being read, an exception is raised to our MultiProcessComms object, which handles it by terminating the relevant processes before restarting the connection and the processes. The processes have to be re-instantiated because the connection instances inside them are not shared state that MultiProcessComms has access to.
The write process takes the first message from the message queue when the queue is not empty, inspects the message header to determine the target, then forwards the message to either Arduino or Algorithm. The Android-write process behaves just like the other write process, except that it writes only to Android. If any of the subsystems disconnects while a message is being written, a 'connection-dropped' value is updated. An exception is then raised to our MultiProcessComms object, which handles it by terminating the correct processes based on the 'connection-dropped' value before restarting the connection and the processes.
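A sketch of the routing step inside the write process, assuming the serial and socket objects created earlier; the 'AR' header is a hypothetical stand-in for our actual protocol strings:

    def route_message(message, arduino_serial, algo_socket):
        # 'AR' here hypothetically means "for Arduino"; anything else is assumed
        # to be destined for Algorithm.
        header, payload = message
        if header == 'AR':
            arduino_serial.write(payload.encode())
        else:
            algo_socket.sendall(payload.encode())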
We keep a separate write process and queue for Android alone so that, if Android's connection to the RPi drops, our robot can continue to run while attempting to reconnect to Android, since message-passing between Arduino, Algorithm and the RPi is not interrupted.
Note that any message that was being sent when a connection dropped is lost. We have another version of MultiProcessComms that uses double-ended queues and can reinsert such a message into the correct queue to be sent again. Although it is more reliable, we did not use this version because it is less efficient and because our subsystems rarely disconnected during our trial runs.
The image-processing process takes the first picture from the image queue and sends it to the image processing server. After receiving the server's reply, the id of each detected symbol is paired with its corresponding obstacle coordinates to form a string containing the symbol's details. This string is then placed in the to-Android message queue used by the Android-write process.
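A sketch of how the server's reply could be paired with the obstacle coordinates and queued for Android, together with the shared 'image_count' state mentioned below; the function name and string format are illustrative, not our exact protocol:

    from multiprocessing import Value

    # Shared counter of symbols detected so far, visible to every process.
    image_count = Value('i', 0)

    def handle_image_result(obstacle_coords, detected_ids, android_queue):
        # Pair each detected symbol id with the obstacle coordinates sent by
        # Algorithm and queue the resulting string for the Android-write process.
        for (x, y), symbol_id in zip(obstacle_coords, detected_ids):
            if symbol_id is not None:
                android_queue.put('IMG,{},{},{}'.format(symbol_id, x, y))
                with image_count.get_lock():
                    image_count.value += 1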
Once all symbols have been detected (tracked using a shared state, 'image_count'), the RPi sends a message to Algorithm to let it know, so that Algorithm can resume normal exploration, which is faster than exploration with image recognition.
Image Recognition
Overview
We used convolutional neural networks (CNNs) to detect the symbols, as they are among the most accurate approaches for this kind of computer vision task. Our strategy is to capture an image with the RPi's PiCamera once our robot detects an obstacle (based on a message sent by Algorithm), then send the image over ImageZMQ to our image processing server for recognition. Doing so means that we can use more powerful CNNs without worrying about the RPi's limited resources. Once the image processing server has finished processing the image, it sends the result back to the RPi, which forwards it to Android for display.
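A sketch of the capture-and-send step using the picamera and imagezmq libraries; the server address and resolution are placeholders:

    import picamera
    import picamera.array
    import imagezmq

    # Connect to the image processing server; the address here is a placeholder.
    sender = imagezmq.ImageSender(connect_to='tcp://192.168.1.100:5555')

    with picamera.PiCamera(resolution=(640, 480)) as camera:
        with picamera.array.PiRGBArray(camera) as output:
            camera.capture(output, format='bgr')            # one frame as a numpy array
            reply = sender.send_image('rpi', output.array)  # blocks until the server replies
            # 'reply' would contain the detection result, e.g. the formatted symbol ids.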
Feature Learning
We used a Faster R-CNN (Region-based Convolutional Neural Network) pre-trained on COCO (Common Objects in Context) as our base model because it had the highest accuracy among the models in the model zoo. We performed transfer learning using TensorFlow, a deep-learning framework, on 200+ hand-labelled images (pictures of the symbols). After training the model for ~6000 epochs, we achieved confidence scores of up to 99% for each class when performing inference on images captured in the maze.
However, under different lighting conditions, the confidence scores of true positives dropped to ~70%, especially for red-coloured symbols, which appear darker, while some false positives reached ~90%. Furthermore, there were cases where our robot moved too close to the obstacle when taking a picture, resulting in symbols being detected with significantly lower accuracy.
We remedied this problem by gathering more training images for our model, in which the symbols and obstacles are placed much closer to the camera. In addition, the positions of the obstacles and our robot were varied with respect to the ceiling lights during data collection, to better account for different lighting conditions. In total, 75 training images were obtained (5 per symbol), of which 20% (15 images) were used for validation. We performed transfer learning again on this new data for ~6000 epochs, reaching a validation loss of 0.1.
After the second training phase, true positives are almost always detected with high confidence scores, while false positives only reach confidence scores of ~20%. To maximise the number of true positives, we set a confidence threshold of 70%.
Feature Engineering
One of the requirements is to display the locations (coordinates) of the detected symbols in the maze, in addition to the detected symbols' ids. Before the RPi captures an image of an obstacle, our robot positions itself directly in front of the obstacle, such that at most 3 grid cells of potential obstacles are present in the image. Algorithm then sends the obstacle coordinates for the left, middle and right positions (or '-1' if a position has no obstacle) to the RPi. The RPi then sends the image to the image processing server.
The image processing server passes the image through the object detection model for inference, which returns the bounding boxes of the symbols it has detected, sorted by confidence score. The x-coordinate values of the bounding boxes are then compared against thresholds to determine each symbol's position in the image (left, middle or right), and the symbol ids are formatted accordingly. Next, the image processing server sends the formatted symbol ids back to the RPi, which maps each symbol to its obstacle coordinates before sending it to Android for display.
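A sketch of the left/middle/right mapping; the even one-third splits stand in for the thresholds we actually tuned for our camera placement:

    def position_of(box, image_width):
        # box = (xmin, ymin, xmax, ymax) in pixels.
        centre_x = (box[0] + box[2]) / 2
        if centre_x < image_width / 3:
            return 'left'
        if centre_x < 2 * image_width / 3:
            return 'middle'
        return 'right'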
Besides using feature engineering to derive the symbols' locations, we made use of the following features to maximise detection accuracy: the bottom-right y-coordinate of the bounding boxes and the confidence scores of red-coloured symbols.
There are cases where our model detects a symbol in the background of a captured image, resulting in a false positive. Since these background symbols have smaller bounding boxes, which in turn have lower bottom-right y-coordinate values, we set a threshold that filters these false positives out.
Since red-coloured symbols are occasionally detected with lower confidence scores, we use a separate, lower confidence threshold of 50% for them.
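Both checks can be expressed as a single filter applied to each detection; the pixel threshold and the class names for red symbols below are illustrative assumptions:

    DEFAULT_THRESHOLD = 0.70  # confidence threshold for most symbols
    RED_THRESHOLD = 0.50      # lower threshold for red-coloured symbols
    MIN_BOTTOM_Y = 300        # illustrative pixel value separating foreground from background

    RED_CLASSES = {'red_circle', 'red_arrow'}  # hypothetical class names for red symbols

    def keep_detection(class_name, score, box):
        # box = (xmin, ymin, xmax, ymax) in pixels, with y increasing downwards.
        threshold = RED_THRESHOLD if class_name in RED_CLASSES else DEFAULT_THRESHOLD
        if score < threshold:
            return False
        # Background false positives have smaller boxes whose bottom edge (ymax)
        # sits higher up in the image, so reject detections whose ymax is too small.
        return box[3] >= MIN_BOTTOM_Y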
We also experimented with checking correctness using the bounding box's area and height-to-width ratio, but ultimately did not use these features as the others were sufficient.