Week 8: 16th July – 22nd July

After attaining our minimum viable product (seamlessly integrating step 1 of the procedure), we decided to prioritize finishing the entire application. Firstly, we created a counter that loops through the numbers 0 to 4. Then, after some trial and error, we managed to get the code to remove any existing 3D objects, and we pieced these two parts together. When the trigger for the next step is activated, the counter increases by 1, representing the user's progress through the 5 steps. When the counter changes, an update listener removes the current 3D objects and overlays the 3D object corresponding to the new counter value, allowing us to transition from step to step using a trigger.
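In outline, the step logic looks something like the sketch below. This is a simplified, hypothetical version of our code: the StepOverlay interface stands in for the actual ARCore/rendering calls that attach and detach the 3D objects.

```java
// Minimal sketch of the step counter described above. StepOverlay is a
// hypothetical stand-in for the code that actually swaps the 3D objects.
public class StepCounter {

    /** Hypothetical wrapper for "remove current 3D objects / show new ones". */
    public interface StepOverlay {
        void clear();                 // remove any existing 3D objects from the scene
        void show(int stepIndex);     // overlay the 3D object for the given step
    }

    private static final int TOTAL_STEPS = 5;   // steps 0 to 4
    private int current = 0;
    private final StepOverlay overlay;

    public StepCounter(StepOverlay overlay) {
        this.overlay = overlay;
        overlay.show(current);
    }

    /** Called by the trigger (currently a button press). */
    public void nextStep() {
        if (current >= TOTAL_STEPS - 1) {
            return;                   // already on the final step
        }
        current++;                    // counter increases by 1
        overlay.clear();              // listener removes the current 3D objects...
        overlay.show(current);        // ...and overlays the object for the new value
    }

    public int getCurrent() {
        return current;
    }
}
```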

 

Currently, the trigger is a button press, but we decided to implement an automatic step tracker so that the transitions are smoother and the experience is more hands-free. To achieve that, we would have to improve the machine learning model to make it much more reliable, and also develop an algorithm to track the steps based on the visual input. As of now, the algorithm is ready, while the machine learning model is still in the midst of training.
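For reference, the step-tracking algorithm boils down to something like the sketch below, building on the counter sketch above: only advance when the detector has reported the object expected for the current step with high confidence over several consecutive frames. The class names, thresholds and the Detection type here are placeholders rather than our final implementation.

```java
// Hedged sketch of the automatic step tracker: advance only after the detector
// has seen the expected object for several consecutive frames.
public class AutoStepTracker {

    /** Minimal stand-in for one object-detection result. */
    public static class Detection {
        public final String label;
        public final float confidence;
        public Detection(String label, float confidence) {
            this.label = label;
            this.confidence = confidence;
        }
    }

    private static final float MIN_CONFIDENCE = 0.6f;  // placeholder threshold
    private static final int REQUIRED_FRAMES = 5;      // debounce over consecutive frames

    private final StepCounter counter;
    private final String[] expectedLabelPerStep;       // label that signals each step is done
    private int consecutiveHits = 0;

    public AutoStepTracker(StepCounter counter, String[] expectedLabelPerStep) {
        this.counter = counter;
        this.expectedLabelPerStep = expectedLabelPerStep;
    }

    /** Feed the detections from each camera frame into the tracker. */
    public void onFrame(java.util.List<Detection> detections) {
        String expected = expectedLabelPerStep[counter.getCurrent()];
        boolean seen = false;
        for (Detection d : detections) {
            if (d.label.equals(expected) && d.confidence >= MIN_CONFIDENCE) {
                seen = true;
                break;
            }
        }
        consecutiveHits = seen ? consecutiveHits + 1 : 0;
        if (consecutiveHits >= REQUIRED_FRAMES) {
            consecutiveHits = 0;
            counter.nextStep();       // same trigger path as the button press
        }
    }
}
```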

Week 7: 8th July – 15th July

A Glimmer of Hope for the App Development

After hitting brick wall after brick wall while trying to integrate our machine learning model with ARCore, we finally got a glimpse of how the integration could work. We have managed to get TensorFlow to run alongside ARCore and to display the models at the position that TensorFlow defines. Although we have now hit our Minimum Viable Product (MVP), much more can still be done to integrate the remaining features and make the app experience as seamless as possible.
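Roughly speaking, the glue between the two looks like the sketch below: take the centre of the bounding box that TensorFlow reports (in screen pixels) and hit-test it against ARCore's tracked surfaces to get an anchor for the model. The RectF box is a placeholder for our detector's output; Frame.hitTest and HitResult.createAnchor are standard ARCore calls.

```java
// Sketch: turn a TensorFlow detection into an ARCore anchor by hit-testing
// the centre of the detected bounding box against ARCore's tracked geometry.
import android.graphics.RectF;
import com.google.ar.core.Anchor;
import com.google.ar.core.Frame;
import com.google.ar.core.HitResult;

public class DetectionAnchorHelper {

    /** Returns an anchor at the first surface behind the detection centre, or null. */
    public static Anchor anchorFromDetection(Frame frame, RectF boxInScreenPixels) {
        float centerX = boxInScreenPixels.centerX();
        float centerY = boxInScreenPixels.centerY();
        for (HitResult hit : frame.hitTest(centerX, centerY)) {
            return hit.createAnchor();   // attach the 3D model to this anchor afterwards
        }
        return null;                     // nothing trackable behind the detection yet
    }
}
```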

Week 6: 1st July – 7th July

Completion of All Essential Animations

After some time fiddling around, we have finally finished making the essential animations using Autodesk Maya:

1) Insertion of pin, card tray is ejected (but not completely)

2) Card tray is ejected completely, pin is removed

3) Flipping the NanoSIM around

4) Inserting the NanoSIM into the card tray

5) Inserting the card tray back into the smartphone

 

Brick Wall in App Integration

Although we have finally managed to get our animations to work in Android Studio, and to have them play on their own without the need for an additional button, we have hit another brick wall in trying to integrate the machine learning model with ARCore. Currently, ARCore does not allow its camera feed to be shared directly with TensorFlow machine learning models. Thus, we have to think of new ways to capture images from ARCore and pass them to the machine learning model in a format it can receive. We tried looking for existing examples of integrating machine learning with AR; however, most examples use TensorFlow image classification rather than object detection, and classification does not provide the tracking that we believe would be useful for AR. Despite the lack of relevant case studies, we managed to get a hint of how to capture images from ARCore and pass them to the TensorFlow model. However, more will need to be done to properly integrate our object detection model.
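The hint essentially comes down to ARCore's Frame.acquireCameraImage(), which hands back the current camera frame as a YUV android.media.Image. The sketch below shows the general shape of what we are attempting; ObjectDetector is a hypothetical stand-in for our TensorFlow wrapper, which would also handle converting the YUV image into a format the model can receive.

```java
// Sketch: pull the camera frame out of ARCore and hand it to the detector.
import android.media.Image;
import com.google.ar.core.Frame;
import com.google.ar.core.exceptions.NotYetAvailableException;

public class FrameCapture {

    /** Placeholder for our TensorFlow wrapper; it would convert YUV to RGB itself. */
    public interface ObjectDetector {
        void run(Image cameraImage);
    }

    public static void feedDetector(Frame frame, ObjectDetector detector) {
        try (Image image = frame.acquireCameraImage()) {   // current camera frame, YUV_420_888
            detector.run(image);                           // hand it to the model wrapper
        } catch (NotYetAvailableException e) {
            // The camera image is not available on the very first frames; simply skip them.
        }
    }
}
```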

Week 5: 24th June – 30th June

Encoding the Animations

Before embarking on incorporating both apps, Wei Bin decided to try to get animations working in an Android AR app, so that everyone could continue with their work without having to wait to find out whether the animation file is compatible. Following the same process as for incorporating the machine learning model, he searched for similar apps and managed to find one done by the Google ARCore development team. However, the code is extremely confusing, and he has not yet found the portion that loads the animation file, and thus has not yet been able to load our animation to see whether it works. We are slowly figuring out the different parts of the code through online tutorials and should be able to get it up by next week.

Improving the Machine Learning Model

Our previous attempt at producing a model did not perform as well as we had expected: it was unable to pick up intermediate steps where the card slot had been ejected only partially. As we envisioned the app to be seamless, it was critical for it to identify intermediate steps and guide the user along. Hence, Jian Xian saw the need to retrain the model to account for these transitions. In the image below, he had to explicitly label the partially ejected card slot to improve the accuracy of the model.

To test whether the model works, he integrated it into an Android app, and it was exciting when the app managed to identify the intermediate step. This is illustrated in the screenshot below:

Week 4: 17th June – 23rd June

Training the Model

With ample data, it was time to train the model. The model selected was the Single Shot Detector (SSD), as it provides the best accuracy-to-latency trade-off compared to the alternatives. The model has been pre-trained on the COCO dataset, which gives it a foundation for recognising curves and edges and makes it possible for developers like us to fine-tune it for our purpose.

The model was trained on Jian Xian's home desktop GPU; however, he had to tweak the training input size and batch size due to hardware limitations such as GPU memory, which restricted the speed at which the model could be trained. After several rounds of trial and error, he managed to find a configuration that ran sufficiently fast on his GPU. After running it for 7 hours, the model had been trained to step 27818, achieving a loss ranging from 0.8 to 1.5, down from an initial loss of approximately 7. He stopped the process at this point, as continued training may cause the model to over-fit the data and render it inaccurate. The image below illustrates the checkpoints recorded by the programme during training. Subsequently, Jian Xian was able to produce a crude model for us to test in our app.

Testing Out the New Machine Learning Model

The machine learning model was finally ready after we requested some changes to it. To understand how to properly incorporate it into an app, Wei Bin first downloaded the official TensorFlow demo app from GitHub. We loaded in our model in place of theirs, but started to hit multiple bugs along the way. The settings we initially used to export our ML model differed from theirs and caused our file to be 4x the size of their model, so we changed the export settings and got one step closer (as indicated by a different error code). After that, we faced the problem of an insufficient memory buffer and had to redefine parts of the code to track down the cause. After a long round of googling (there are multiple reasons why this problem can arise and multiple ways to define the memory buffer), we changed one of the settings (quantized = false) to allocate a buffer 4x the size, which gave us the correct memory size. With that out of the way, all that is left is to look into how to merge the two apps together.
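For the curious, the reason the quantized flag matters is that a quantized model stores each colour channel as a single byte, while a float model needs four bytes per value, so the input buffer has to be four times larger. A simplified sketch of that buffer allocation is shown below, with INPUT_SIZE as a placeholder for the model's actual input resolution.

```java
// Why quantized = false needs a buffer 4x the size: float32 vs uint8 inputs.
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class InputBufferExample {

    private static final int INPUT_SIZE = 300;   // e.g. a 300x300 SSD input (placeholder)
    private static final int NUM_CHANNELS = 3;   // RGB

    public static ByteBuffer allocateInput(boolean isQuantized) {
        int bytesPerChannel = isQuantized ? 1 : 4;           // uint8 vs float32
        ByteBuffer buffer = ByteBuffer.allocateDirect(
                INPUT_SIZE * INPUT_SIZE * NUM_CHANNELS * bytesPerChannel);
        buffer.order(ByteOrder.nativeOrder());
        return buffer;
    }
}
```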

Designing and Tweaking Object Assets on Autodesk Maya

With the switch from Unity to Autodesk Maya, it was time to learn the mechanics of the new software. Unfortunately, although Autodesk Maya and Autodesk Fusion 360 are made by the same company, and we have some experience with the latter from attending a 3D printing workshop in Year 1 Semester 1, their similarities pretty much end there, for the two cater to very different purposes: Fusion 360 is mainly for designing prototypes (basically a watered-down version of 3DS Max), while Maya can be used for 3D modelling and animation. Even though Choy Boy had some prior experience with animation (using software such as Adobe After Effects), he had some difficulty navigating the different controls.

We have also purchased a 3D model of the smartphone we are working on. The model initially looked like this (lighting on the centre-right of the phone):

Although it looks quite similar to the physical smartphone we had, there was a problem: the screen was too white. Thus, Choy Boy had to change the material of the screen using Hypershade:

Here, a black colour was used and the diffuse property was reduced drastically (to 0.136 in this picture). Thus, the phone now looked more accurate:

As the model did not come with the card tray, the pin to remove the card tray or a NanoSIM, he had to model those from scratch. Modelling them in Fusion 360 would have been much easier; however, the final choice was to use Maya, as models designed in the latter program can be used for animations straight away.

The final models looked like this:

Week 3: 10th June – 16th June

Data Collection

After understanding how TensorFlow works, Jian Xian began to extract images for training. To obtain acceptable results, approximately 500 labelled images were required. To obtain them, Jian Xian took a video of the smartphone from multiple angles and under different light intensities. To be certain that the model will work, the video contains segments where the object is partially obscured. In addition, the camera's field of view contains other objects that the model needs to distinguish from the target, which increases the robustness of the model.

The frames were then extracted using ffmpeg, a powerful multimedia tool, on Ubuntu. Suitable, clear frames were then chosen and labelled using the LabelImg software. A screenshot illustrating the labelling process is shown below, with labels identifying the presence of an oppoR9 and a cardslot. This was then repeated for another few hundred images.

Linking Image Buttons

After struggling to ensure images looked correct on image buttons, Wei Bin encountered another problem: defining what actually happens when a button is pressed.

Although it sounds simple, there was no clear way of getting the button to redirect to another page upon being tapped, so he had to go back to googling. He found two methods of detecting a click (tap): setting up an OnClickListener, or giving the button an onClick attribute. He chose the onClick attribute as he found it more elegant and it required less code.

After that, he found out how to switch activities at the push of a button. Learning that Android launches activities through something known as an "intent", together with a quick search, allowed us to write the code needed to switch between screens, so the button could trigger the switch to another screen. With functioning buttons and multiple screens, the majority of the front page was completed.
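In essence there are two pieces: the button declares android:onClick="openSteps" in the layout XML, and the activity supplies a matching public method that fires an Intent. The sketch below is a minimal illustration; StepActivity and the method name are placeholder names rather than our actual classes.

```java
// Sketch: an onClick attribute wired to a public method that switches activities.
import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.view.View;

public class MainActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);   // layout containing the image button
    }

    /** Wired up via android:onClick="openSteps" on the image button in the XML. */
    public void openSteps(View view) {
        startActivity(new Intent(this, StepActivity.class));   // switch to the next screen
    }
}
```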

Week 2: 3rd June – 9th June

Designing the Front Page using XML Coding

As we waited for the machine learning models to be ready, the Android Studio team decided to work on the user interface first. Although Android Studio provides a drag-and-drop tool to assist in designing pages (strictly speaking, activities), Wei Bin ran into some problems whilst using it: the alignment became a huge mess, as everything was aligned to the top right. After quite a bit of googling and cycling between different layouts (linear layout, relative layout…), we managed to find a solution by attaching each object to the bottom of the one above it and inserting blank spaces to widen elements where necessary. Coding the image buttons in the activities also threw up unexpected issues: as the buttons are much smaller than the original images, we had to play with values such as padding to make sure the images were not too zoomed in.
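For illustration, the same "attach each view below the previous one" idea can be expressed programmatically, as in the sketch below; our app actually does this in the XML layout, and the IDs, sizes and padding values here are placeholders.

```java
// Illustration of a relative layout where each view is anchored below the last.
import android.content.Context;
import android.view.View;
import android.widget.ImageButton;
import android.widget.RelativeLayout;

public class FrontPageLayout {

    public static RelativeLayout build(Context context) {
        RelativeLayout root = new RelativeLayout(context);

        ImageButton first = new ImageButton(context);
        first.setId(View.generateViewId());
        first.setPadding(24, 24, 24, 24);            // keep the image from looking too zoomed in
        RelativeLayout.LayoutParams firstParams = new RelativeLayout.LayoutParams(
                RelativeLayout.LayoutParams.MATCH_PARENT,
                RelativeLayout.LayoutParams.WRAP_CONTENT);
        root.addView(first, firstParams);

        ImageButton second = new ImageButton(context);
        RelativeLayout.LayoutParams secondParams = new RelativeLayout.LayoutParams(
                RelativeLayout.LayoutParams.MATCH_PARENT,
                RelativeLayout.LayoutParams.WRAP_CONTENT);
        secondParams.addRule(RelativeLayout.BELOW, first.getId());   // attach below the previous view
        root.addView(second, secondParams);

        return root;
    }
}
```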

Getting Acquainted with Machine Learning

The next step forward was to understand the basics of machine learning, the different kinds of algorithms used and which of them are suitable for object detection. The popular models are Single Shot Detection (SSD), You Only Look Once (YOLO) and Faster R-CNN (Region-based Convolutional Neural Network). Jian Xian attempted to understand the basics of these models, but it proved very difficult as they are highly mathematical, and it was not practical to study each one in depth. Hence, he opted to understand the overarching TensorFlow process, from data labelling to pre-training and fine-tuning. In addition, he had to read up on important configurable parameters such as the learning rate, quantisation, decay factors, etc.

By reading up on the topic, he managed to understand how machine learning works at a surface level, just enough for him to configure the parameters needed.

 

Week 1: 27th May – 2nd June

First Steps into Augmented Reality

The first step is usually the toughest but most important one. After we decided on building an Augmented Reality Instructional Manual, we set out to build a basic AR app. This basic app would only be able to place an object on the screen and would serve as a template for us to add our own features to.

As Wei Bin was in charge of the development of the app, he went through Codecademy's guide to Java so he would better understand the language that would be used to code our app. Then, as he had no prior knowledge of app development or AR, he looked up online tutorials on how to build a basic Android AR app using ARCore and Android Studio. However, as those guides seemed to be outdated, he faced multiple bugs along the way and the app would not even compile properly. After a long two days of googling and looking through online tutorials for bug fixes, he finally managed to find a solution and got the AR app to work properly.

Initial Designs

Choy Boy created some designs of how the app might look:

Introduction to Machine Learning

The learning curve for machine learning is inherently steep, as it requires a good background in Python, which TensorFlow is built on. TensorFlow is a software development kit (SDK) for machine learning offering hundreds of configurations that developers can tweak for their project. Machine learning is used for everything from audio translation to natural language processing to object detection.

For the purpose of this project, as Jian Xian is in charge of the machine learning, he focused on object detection, where the app needs to know whether the object is in the field of view. To begin, he had to set up his development environment in Ubuntu, as it is much more convenient to program on a Unix system. After a few days, he had set up the environment and begun reading up on the process of machine learning.