Death of a Good LiPo

It’s a sad day today.

We went flying, as usual, at our usual spot. However, I made a fatal mistake regarding where I positioned myself.

Although Kanesh had told us not to let the plane fly behind us, the reality is that weather conditions and pilot competence are not perfect.

As I was flying several rounds around the field, the wind picked up, causing the plane to fly directly over my head. Because I was standing right beside the road lined with trees, my view of the plane ended up being blocked by the foliage.

Because the Skysurfer had a tendency to roll to the left, I would normally hold the aileron stick to the right to correct this. With the plane being out of direct line of sight (and also because the white plane was flying against white sky), I lost orientation and let go of the stick, causing the Skysurfer to roll over and enter a nosedive at nearly full throttle.

The good news was that the plane missed the trees and flew between openings in the canopy, so that spared us the effort of rushing to get bamboo poles to try to recover the plane.

The bad news was that the plane crashed nose first into Old Holland Road, a proper paved driveway made of rock solid asphalt. This caused the foam at the nose area to disintegrate, crushing the battery inside. It also destroyed the FPV camera (the lens cracked and was dislodged from the PCB).

It is truly sad that because of my incompetence, a brand new battery (one that is famous in the drone racing scene) was destroyed. On the bright side, no servos were damaged. All the nose needed was a bit more fibre tape and it was back in action with a new battery.

Skysurfer Returns

Over the past few days, as I did my revision during recess week, I could not help but think about the plane’s inability to fly on Monday. Sure, it was beaten up and the aircraft’s nose had already been compressed inwards, but this could not have contributed to it not being able to fly afterwards. After all, past experience with the plane taught me that all it takes is balancing the CG before the plane is airworthy again.

To verify that what caused the plane to crash was actually the CG, I went out to Old Holland Road again this morning (1st October 2020). The wind at the time of my arrival was calm, perfect for flying. However, rain was imminent in the distance (buildings started disappearing) and that caused the wind to pick up speed again. I had to act fast and got to work as soon as I could.

The first few flights involved throwing the plane without throttle to see if it would glide. Once I was satisfied, the real tests began.

The first powered flight was rather shaky. Using a lighter 1500mAh battery meant that the aircraft was tail heavy and I had quite a challenge controlling the plane. Note how in the onboard camera footage, the plane had a tendency to pitch up, and a hard landing caused the camera to be ejected from the canopy (no damage though):

However, it wasn’t long before the 1500mAh battery was expended (somehow, running the motor also caused quite a bit of voltage sag), leaving me with only the heavier 2200mAh battery to fly with. That being said, it balanced out the CG perfectly!

I intended to stay for a while longer since the first storm in front of me had blown away (and MSS’s lightning website did not report any lightning activity). However, a bigger storm was brewing behind me and I started hearing thunder in the distance (upon checking the MSS website, massive lightning activity was reported around the Tuas area), so I had to pack up and call it a day.

Rise…and Fall of the Skysurfer

This session marks the third flight of our original Skysurfer. As it had already been tested and trimmed in previous sessions, we were expecting to do 1 or 2 warm up flights before mounting the node for tests. Unfortunately, Singapore being just 1 degree north of the equator means that weather conditions vary greatly from day to day. Below is footage of the second flight of the day, which ended in a crash:

The crash here was entirely due to pilot error: as the white plane was flying against a white backdrop of clouds, the pilot over-banked the plane during a turn, resulting in it going inverted. Not realising this, he pushed the elevator forward with the intention of gaining altitude, but because the plane was inverted, that caused it to nose down instead, leading to a crash.

Note that the plane was also rather wobbly throughout the flight. This was because the pilot had last flown about 3 weeks prior and had zero stick time on the simulator since then due to school commitments.

As this crash occurred at significant speed (evidenced by the loud thud), it broke the plastic gears of the elevator and rudder servos, causing quite a bit of downtime as we replaced them (we brought spares, thankfully) and taped up broken pieces of foam.

Second Attempt…and Second Crash

After patching up the plane and waiting out a passing drizzle, we were ready for another go. We noticed this time that the plane’s CG had shifted: it could no longer fly with the battery right up in the nose as this would lead to a 737MAX nose down at launch. We suspect that this might be due to the additional layers of fibre tape used to hold the nose together after the first crash.
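
To illustrate why a little tape at the nose matters, here is a rough sketch of the usual CG calculation (the mass-weighted average of component positions, x_cg = sum(m_i * x_i) / sum(m_i)). All masses and positions below are made-up placeholders, not measurements of our Skysurfer, and the function name is ours:

```python
def centre_of_gravity(parts):
    """Return the CG position (cm behind the nose) for (mass_g, position_cm) pairs."""
    total_mass = sum(mass for mass, _ in parts)
    total_moment = sum(mass * position for mass, position in parts)
    return total_moment / total_mass


# Hypothetical values for illustration only (grams, cm behind the nose tip).
airframe = (600.0, 30.0)
battery_in_nose = (180.0, 10.0)
battery_moved_back = (180.0, 16.0)
nose_tape = (15.0, 3.0)

cg_before = centre_of_gravity([airframe, battery_in_nose])
cg_taped = centre_of_gravity([airframe, battery_in_nose, nose_tape])
cg_fixed = centre_of_gravity([airframe, battery_moved_back, nose_tape])

print(f"before tape: {cg_before:.1f} cm")
print(f"with tape:   {cg_taped:.1f} cm (further forward)")
print(f"battery aft: {cg_fixed:.1f} cm (moved back again)")
```

Because the tape sits so far ahead of the CG, even a small mass pulls the balance point forward, and sliding the battery back is the simplest way to compensate.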

Shifting the battery eventually got the plane to a flyable state. Here is some onboard FPV footage (please excuse the choppiness; it was downscaled from 60 to 30fps):

For reasons unknown, the onboard camera stopped recording before the crash occurred.

Here is footage from the ground:

From the footage, it seems as though this crash was caused by the battery shifting forward mid flight, throwing the plane’s centre of gravity off balance. We deduced this to be the likely cause because as the plane pitched down towards the end of the video, the pilot reported having zero control despite having the control stick pulled all the way back.

After this, the plane could no longer fly as well as before. Besides the CG being off, the wind was also picking up speed, which made the plane significantly more difficult to control in flight. This, coupled with the fact that we had already been on the field for quite some time (around 3 hours), led us to call it a day with about 2 out of 4 batteries left untouched.

With that, we strapped the collector node to a quadcopter belonging to Kanesh since the plane was no longer flyable. Below are two graphs – the first is of the data actually saved to the SD card of the ground node. The second is of the data transmitted from the ground node to the collector node (on a UAV). It worked as expected (after dealing with a timezone display issue where the timezone was assumed to be GMT+0730) – the sampled data matched the collected data!

Data from Ground Node

Data from Collector
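
On the timezone display issue mentioned above: one way to avoid depending on whatever offset a plotting tool assumes is to convert the microsecond timestamps with an explicit offset. This is only a sketch of the idea, not the code behind our graphs. The fixed UTC+8 offset for Singapore and the function name are our own choices, and the timestamp is just an example value:

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset for Singapore, stated explicitly rather than looked up.
SGT = timezone(timedelta(hours=8), "SGT")


def format_sample_time(timestamp_us, tz=SGT):
    """Convert microseconds since the Unix epoch to an ISO 8601 string in tz."""
    seconds, micros = divmod(timestamp_us, 1_000_000)
    utc = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)
    return utc.astimezone(tz).isoformat()


example = format_sample_time(1600768700999984)
print(example)
```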

Pixhawk Riding on Skysurfer

Due to space constraints in the fuselage of the Skysurfer, we were left with no choice but to mount the Pixhawk onto the canopy.

Pixhawk Mount 1.0

Initially, we attempted to mount the Pixhawk onto the existing foam canopy provided.

Pixhawk on Foam Canopy

However, there are more electronic components that have to be attached to the plane along with the Pixhawk, e.g. the safety switch, buzzer, GPS module, telemetry, etc.

Therefore, there is a need for us to redesign the canopy to accommodate these various electronics.

Pixhawk Mount 2.0

We did not have a CAD model of the Skysurfer’s canopy, hence we had to manually measure the current canopy and model it in Autodesk Inventor. This new design consists of a platform to mount the GPS module, a hole for the safety switch, an opening for the Micro USB connection and a cut-out to allow auxiliary connectors to be connected.

Pixhawk Mount 2.0

Subsequently, we realised that with the Pixhawk exposed during flight, the air pressure in the cavity may affect the barometer reading in the Pixhawk, hence we had to find a way to cover up the canopy.

Pixhawk Mount 3.0

In this new design, we offset the backing by another 10 degrees for a better fit. Next, we added 3 mounting holes for the FPV camera and a few cut-outs for wiring.

(My Autodesk Inventor decided to die on me, so I had to use Fusion 360 for quick prototyping.)

Design Improvisation

Printing in Progress…

Ready to Takeoff

Sampling

We’ve set the following objectives to guide us in the design of the actual data logging stage:

  1. Samples should be timestamped with absolute time
  2. Different nodes should start/take each sample at the same time (within a certain degree of error)

As such, to simplify setup and maintain accuracy of sample times over a long period of time, we decided to attach a GPS module to the board.

Additionally, we also identified the following methods we can use to save power:

  1. Power off the GPS/SD Card when not in use
    1. Power on the SD Card when necessary:
      1. Client attempts to read file
      2. Samples have to be written to a file
    2. Power on the GPS at a fixed interval, say, every hour, synchronise time, then power off again
  2. Power on the transceiver on the ESP32 at a low duty cycle eg for 1s every minute
  3. Put the ESP32 into light-sleep mode between samples

With these in mind, our firmware does the following for sampling:

  1. Wait until a rising edge is seen on the PPS signal from the GPS module
  2. Configure the Ultra-Low Power Processor on the ESP32 to wake then halt immediately at a fixed period. This is for two reasons:
    1. A wake fires an ISR on the main cores which will then take a sample from the sensor
    2. This wake-halt sequence can be clocked by the 32kHz crystal for the RTC (the main crystal is disabled in light-sleep mode)
  3. Samples are buffered to reduce the number of write operations needed, making use of the 1MB of PSRAM on the WROVER module

Samples are buffered into two alternating buffers (ie samples are written to one until full. While the first buffer is written to file, samples are written to the second buffer). Buffers are prefixed with the following header:

  1. HDR as a header marker (3 bytes)
  2. Number of microseconds since Unix Epoch (uint64_t, 8 bytes)
  3. Size of one sample in bytes (1 byte)
  4. Number of samples following this header (uint32_t, 4 bytes)
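
For anyone parsing these files on a PC, the header fields listed above can be sketched with Python's `struct` module. Note the assumptions: the post does not state the byte order or whether the firmware struct is padded, so little-endian with no padding is a guess on our part, and the function names are made up for this example:

```python
import struct

# Assumed layout: little-endian, tightly packed.
#   3s -> "HDR" marker, Q -> uint64 microseconds since the Unix epoch,
#   B  -> size of one sample in bytes, I -> uint32 number of samples.
HEADER_FORMAT = "<3sQBI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 3 + 8 + 1 + 4 bytes


def pack_header(timestamp_us, sample_size, sample_count):
    """Build the header that precedes one buffer of samples."""
    return struct.pack(HEADER_FORMAT, b"HDR", timestamp_us, sample_size, sample_count)


def unpack_header(raw):
    """Parse a header and return (timestamp_us, sample_size, sample_count)."""
    marker, timestamp_us, sample_size, sample_count = struct.unpack(
        HEADER_FORMAT, raw[:HEADER_SIZE]
    )
    if marker != b"HDR":
        raise ValueError("buffer does not start with the HDR marker")
    return timestamp_us, sample_size, sample_count


header = pack_header(1600768700999984, 2, 512)
print(HEADER_SIZE, unpack_header(header))
```

With this layout, the payload following the header is simply `sample_size * sample_count` bytes long, which is what lets a reader skip from one header to the next.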

The header and buffer are then written to the SD card. A crude version of the workflow described above has been implemented. However, multiple issues still remain:

  1. The RTC is still clocked by the internal 150kHz RC oscillator – when we configure the RTC to use the 32kHz crystal, ISRs are fired much faster than expected (eg the ISR fires at 50Hz when clocked by the internal oscillator, but fires at 250Hz when clocked by the 32kHz crystal). This appears to be a software issue – we’ve verified that the crystal is presenting a 32.768kHz signal to the ESP32 by probing with an oscilloscope.
  2. An ISR was configured for rising edges on the PPS signal. The ISR logs the current absolute time (in us) before updating the system time (based on the PPS info). The output is as follows:
I (497416) logging_task: pulse: 1600768700999984
I (498416) logging_task: pulse: 1600768701999956
I (499416) logging_task: pulse: 1600768702999980
I (500416) logging_task: pulse: 1600768703999980
I (501416) logging_task: pulse: 1600768704999987
I (502416) logging_task: pulse: 1600768705999972
I (503416) logging_task: pulse: 1600768706999987
I (504416) logging_task: pulse: 1600768707999974
I (505416) logging_task: pulse: 1600768708999980
I (506416) logging_task: pulse: 1600768709999972
I (507416) logging_task: pulse: 1600768710999980
I (508416) logging_task: pulse: 1600768711999975
I (509416) logging_task: pulse: 1600768712999972
I (510416) logging_task: pulse: 1600768713999980
I (511416) logging_task: pulse: 1600768714999980
I (512416) logging_task: pulse: 1600768716000006
I (513416) logging_task: pulse: 1600768716999954
I (514416) logging_task: pulse: 1600768717999980
I (515416) logging_task: pulse: 1600768718999988
I (516416) logging_task: pulse: 1600768719999972

The time logged varies substantially from line to line (even if the system clock runs slow, we expected the difference to be consistent). A cursory examination suggests that the system time falls about 20us behind the time reported by the GPS module for every second that passes. Over an hour, this adds up to a difference of 72ms (20us × 3600), which presents issues if we only power on the GPS module for time syncing once in a while. We’ve come up with a few possible reasons for this:

  1. Inaccuracies in the main crystal/internal RC oscillator (due to the issue discussed above)
  2. Interrupt latency (can it really vary that much?)
    1. This was discussed here; however, switching to an FPGA or another microcontroller is not possible at this point in the project
  3. The PPS signal that our GPS module outputs is not accurate enough. The datasheet specifies a deviation of less than 1us, so if the module performs to spec, this cannot be the cause.
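
To put numbers on the "about 20us" estimate, the pulse timestamps from the log above can be reduced to their offset from the nearest whole second. This is a quick offline sketch rather than part of the firmware, and the helper name is ours:

```python
# Timestamps copied from the "pulse:" log lines (microseconds since the Unix epoch).
PULSES_US = [
    1600768700999984, 1600768701999956, 1600768702999980, 1600768703999980,
    1600768704999987, 1600768705999972, 1600768706999987, 1600768707999974,
    1600768708999980, 1600768709999972, 1600768710999980, 1600768711999975,
    1600768712999972, 1600768713999980, 1600768714999980, 1600768716000006,
    1600768716999954, 1600768717999980, 1600768718999988, 1600768719999972,
]


def offsets_from_whole_second(pulses_us):
    """Signed distance (us) of each timestamp from the nearest whole second."""
    offsets = []
    for timestamp in pulses_us:
        remainder = timestamp % 1_000_000
        if remainder > 500_000:
            remainder -= 1_000_000  # just before the next second, so negative
        offsets.append(remainder)
    return offsets


offsets = offsets_from_whole_second(PULSES_US)
mean_offset_us = sum(offsets) / len(offsets)
spread_us = max(offsets) - min(offsets)
# If the clock were left unsynchronised, the per-second loss would accumulate.
drift_per_hour_ms = -mean_offset_us * 3600 / 1000

print(f"mean offset: {mean_offset_us:.2f} us, spread: {spread_us} us")
print(f"projected drift without re-sync: {drift_per_hour_ms:.1f} ms per hour")
```

For this short log the mean works out slightly above the 20us eyeballed earlier, and the spread between the earliest and latest edge is larger than the mean itself, which is why interrupt latency remains on our list of suspects.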

More work is needed to resolve these issues. However, even with these limitations in mind, it should be possible to present a full proof-of-concept of what we set out to do.

Power Optimisation

We measured 15mA of quiescent current draw on the 5V rail of the sensor (with the ESP32 held in reset). We traced it to the following:

  • 2x power indicator LEDs – approximately 2.6mA each, totalling 5.2mA
  • 2x AZ1117C voltage regulators – datasheet specifies 4mA (typ) to 6mA (max) of quiescent current draw.

With this in mind, we swapped the AZ1117C regulators for AP2114s, which have a maximum quiescent current of 90uA.

Masking off other components before removing regulator with hot air

Getting a crude current estimation from the power supply

We should be able to cut quiescent current down to less than 1mA (when powered from the 5V rail) if we disabled the LEDs. Although quiescent current is only a small part of the overall power budget, the 15mA we were observing is still a fairly large amount of power just wasted.
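
The arithmetic behind these figures can be sketched as follows. It assumes the LEDs and the two regulators are the only contributors to the quiescent draw, which is a simplification, and the function name is ours:

```python
LED_MA = 2.6          # measured, per power indicator LED
AZ1117C_TYP_MA = 4.0  # datasheet typical quiescent current
AZ1117C_MAX_MA = 6.0  # datasheet maximum quiescent current
AP2114_MAX_MA = 0.09  # 90uA maximum quiescent current


def quiescent_ma(led_count, regulator_count, regulator_ma):
    """Total quiescent current in mA for the given LEDs and regulators."""
    return led_count * LED_MA + regulator_count * regulator_ma


before_typ = quiescent_ma(2, 2, AZ1117C_TYP_MA)
before_max = quiescent_ma(2, 2, AZ1117C_MAX_MA)
after_swap = quiescent_ma(2, 2, AP2114_MAX_MA)
after_swap_no_leds = quiescent_ma(0, 2, AP2114_MAX_MA)

print(f"AZ1117C + LEDs: {before_typ:.1f} to {before_max:.1f} mA")
print(f"AP2114 + LEDs:  {after_swap:.2f} mA")
print(f"AP2114 only:    {after_swap_no_leds:.2f} mA")
```

The measured 15mA falls inside the range predicted for the original regulators, and with the LEDs disabled the two AP2114s alone stay well under the 1mA target.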

Code Refactoring

Since we now have multiple nodes which can read/write to SD cards (only one of the hand soldered prototypes worked well, the other had signal integrity issues), we can finally write received data from the client to the SD card.

Current Software Architecture

For the sake of simplicity, handling of received packets was done entirely in the ESP-NOW ISR. This worked fine when MtftpClient did not actually write contents to a file (but rather just logged metadata).

However, once MtftpClient started writing to the SD card, the write operations took a substantial amount of time, leading to severe packet loss (as the ESP-NOW ISR took a long time to exit). The obvious solution is to push received data to a ring buffer, then MtftpClient::loop() reads from this ring buffer and actually writes the data to the SD card. While this works fine, we run into yet another issue here – since the server is transmitting data packets as fast as it can read from the SD card, the client cannot actually write data to the SD card fast enough. This results in the ring buffer overflowing.

Rather than bolting on flow control, we can refactor the client to handle incoming packets differently – all received packets are pushed to a ring buffer, then MtftpClient::loop() parses incoming packets and writes them to the SD card as seen below.

Refactored Architecture

By refactoring the packet handling on the client in this way, we solve two problems – firstly, we no longer perform long operations in the ISR. Secondly, if we make the ring buffer larger than the maximum size of the data transferred in a window (plus a bit of overhead for the ring buffer itself to manage the items), we can now buffer an entire window worth of packets. MtftpClient::loop() will only send an ACK once an entire window has been processed successfully, which will only happen once all the writes are done.

This also doubles up as flow control:

  1. The server will send an entire window at once
  2. The client buffers (and requests retransmission of missing blocks if necessary) the entire window in memory
  3. The client slowly reads from the ring buffer and writes data to the SD card
  4. The client acknowledges the window once all data has been processed
  5. The server continues transmission (ie go back to Step 1)
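
The five steps above can be modelled in a few lines of Python. This is a toy simulation of the idea, not our firmware: the window and block sizes are arbitrary, retransmission of missing blocks is left out, and every name in it is invented for the example:

```python
from collections import deque

WINDOW_SIZE = 4  # blocks per window (arbitrary for this sketch)
BLOCK_SIZE = 8   # bytes per block (arbitrary for this sketch)


def transfer(data, ring_capacity=WINDOW_SIZE):
    """Simulate a windowed transfer; returns (bytes written, ACKs sent)."""
    if ring_capacity < WINDOW_SIZE:
        raise ValueError("ring buffer must hold at least one full window")
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    ring = deque()
    sd_card = bytearray()
    acks_sent = 0
    for start in range(0, len(blocks), WINDOW_SIZE):
        # Steps 1-2: the server sends a whole window, the client only buffers it.
        for block in blocks[start:start + WINDOW_SIZE]:
            if len(ring) >= ring_capacity:
                raise OverflowError("ring buffer overflow")
            ring.append(block)
        # Step 3: the client drains the ring buffer to the (slow) SD card.
        while ring:
            sd_card.extend(ring.popleft())
        # Step 4: the ACK goes out only after every write is done, which is
        # what stops the server from sending the next window too early.
        acks_sent += 1
    return bytes(sd_card), acks_sent


payload = bytes(range(100))
received, acks = transfer(payload)
print(received == payload, acks)
```

The key property is that the server never has more than one window in flight, so a ring buffer sized for one window cannot overflow no matter how slow the SD card is.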

With this implemented, we can fully test the communication between the client and server. A test with 2 nodes next to each other transferring a 1Mb file was successful: we confirmed that the transferred file is identical to the original by hashing both and comparing the results.
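
A comparison like this can be done on a PC along the following lines. The post does not say which hash we used, so SHA-256 here is an assumption, and the function names and the throwaway test files are purely illustrative:

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=4096):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(original_path, received_path):
    """True if both files have the same SHA-256 digest."""
    return file_digest(original_path) == file_digest(received_path)


# Self-check with temporary files standing in for the original and the copies.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.bin")
    received = os.path.join(tmp, "received.bin")
    damaged = os.path.join(tmp, "damaged.bin")
    payload = bytes(range(256)) * 16
    contents = {original: payload, received: payload, damaged: payload[:-1] + b"\x00"}
    for path, content in contents.items():
        with open(path, "wb") as handle:
            handle.write(content)
    intact_ok = files_match(original, received)
    damaged_ok = files_match(original, damaged)

print(intact_ok, damaged_ok)
```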

However, in a realistic environment, there would be interference from the environment (whether from Wi-Fi networks, or even from our UAV remote control equipment), which will cause packets to be lost (but not corrupted, the frame check sequence ensures that data is valid). This presents problems especially when key signaling packets (eg ACK) are lost, and one side is left waiting forever for a packet that was already lost. Although the current time-out functionality will handle these cases, relying entirely on the time-out to handle missing packets affects throughput severely. Certain lost packets (eg the server’s SYNC response) can be detected by the other party and handled immediately without having to wait for the timeout.

Identifying these cases and handling them separately should improve throughput.

Separately, multiple bugs were fixed:

  1. Race condition between onTimeout call and loop
  2. The counter used to track buffered ESP-NOW packets was observed to underflow and then lock up the entire system, since it never fell below MAX_BUFFERED_TX. This was fixed by using a counting Semaphore instead of just a variable.
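
The counting semaphore fix can be illustrated with Python's `threading.BoundedSemaphore`. This is only a model of the idea and not the firmware itself: the value of MAX_BUFFERED_TX, the list standing in for the radio queue and both function names are made up for the example:

```python
import threading

MAX_BUFFERED_TX = 4  # hypothetical limit for this sketch

# Each permit is one free slot for a packet waiting to be sent.
tx_slots = threading.BoundedSemaphore(MAX_BUFFERED_TX)


def try_send(packet, radio_queue):
    """Queue a packet if a slot is free; never blocks."""
    if not tx_slots.acquire(blocking=False):
        return False  # all slots are in use, caller should retry later
    radio_queue.append(packet)
    return True


def on_send_complete(radio_queue):
    """Called once per transmitted packet to free its slot."""
    radio_queue.pop(0)
    # A bounded semaphore raises ValueError on an extra release instead of
    # silently wrapping around like a plain unsigned counter would.
    tx_slots.release()


queue = []
results = [try_send(n, queue) for n in range(6)]
on_send_complete(queue)
retry_ok = try_send(99, queue)
print(results, retry_ok)
```

Unlike a bare counter updated from both the send path and the callback, acquiring and releasing permits is atomic, so the two sides cannot race each other into an impossible count.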

On our most recent test flight last Sunday, we strapped the collector to a UAV and monitored the ground node. Although none of the nodes locked up this time, the data written to the SD card by the collector was corrupted. More debugging is needed to identify and fix this issue.

Edit: The data corruption observed in data written by the collector has been fixed. file_offset is incremented after every write of an in-order, non-buffered block, but writing the buffer to file after handling missing blocks did not increment file_offset.
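
A simplified model of the fix is shown below. It is not the MtftpClient code: the class and method names are invented, and it only demonstrates the general idea that routing every write through one helper which advances file_offset makes it impossible for one code path to forget:

```python
class BlockWriter:
    """Toy stand-in for the client's file writing logic."""

    def __init__(self):
        self.file = bytearray()
        self.file_offset = 0
        self.pending = bytearray()  # blocks held back while a gap is filled

    def _write(self, data):
        """Single write path: store data at file_offset, then advance it."""
        end = self.file_offset + len(data)
        if len(self.file) < end:
            self.file.extend(bytes(end - len(self.file)))
        self.file[self.file_offset:end] = data
        self.file_offset = end

    def write_in_order(self, block):
        self._write(block)

    def buffer_block(self, block):
        self.pending.extend(block)

    def flush_pending(self):
        # The original bug was equivalent to writing here without moving
        # file_offset, so the next in-order block overwrote this data.
        self._write(bytes(self.pending))
        self.pending.clear()


writer = BlockWriter()
writer.write_in_order(b"AAAA")
writer.buffer_block(b"BBBB")
writer.flush_pending()
writer.write_in_order(b"CCCC")
print(bytes(writer.file), writer.file_offset)
```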

Shoddy Soldering Joints

Testing of FPV monitors usually involves a great deal of plugging and unplugging of power cables.

Such actions usually wear out components faster than expected, thus exposing weak points. One such weak point was discovered on the XT60 power connector of the FPV monitor which we took from the seniors:


Notice how the solder joint came cleanly off the connector. This implies that whoever did the soldering was not very experienced. The temperature was hot enough for the solder to soak into the wire, but not hot enough for it to adhere to the XT60 connector. What’s more, since this fault was hidden under a layer of heatshrink, it would have been impossible to spot until it gave way (which, thankfully, happened in the lab and not while we were flying)!

I proceeded to resolder the connector and all has been well ever since.

PCB Assembly

Since we made use of leadless and fine-pitch components in our node design, we decided to try out using solder paste and reflowing the entire board with a hot plate.

Target PCB (Center) Surrounded by Scrap Boards

We taped down scrap PCBs around our PCB to ensure that it cannot shift around. The solder paste stencil was then taped to the surface on one end after aligning it with the pads on the PCB.

The boards were then populated by hand:

Fully Populated Board Before Reflow

FPV Camera Tests

Though not critical to the success of our project, one of the functions we were looking to incorporate was the ability to have a live first person video (FPV) view of the airplane.

To achieve this, we require 4 main components:

  1. Camera (airside)
  2. Video Transmitter (airside)
  3. Video receiver (landside)
  4. Monitor (landside)

Note: The term airside implies that components labelled as such will be on the plane in the air while those marked as landside will remain on the ground at all times.

After doing some research and banking on our previous RC knowledge, we came up with the following list of items to buy from Taobao:

Camera: TOP RC Spotter v2 5.8GHz VTx + Camera

Our camera of choice!

Weighing just 9g, this camera is perfect for a plane with little cargo space like our Skysurfer.

To us, the main selling point was the fact that it was an integrated system: a camera and video transmitter with antenna all in one unit!

Besides saving space, this also means that the wiring job we have to do in order to get the live video feed working is reduced to just providing power to the unit (no need to worry about operating voltage differences between transmitter and camera, or to figure out how to manage the rat’s nest of wires between the two).

We opted for the version with the cloverleaf antenna (that mushroom thing sticking out in the picture above) to reduce the likelihood of multipath interference.

Video Transmitter: ….??

As mentioned above, the video transmitter is integrated with the camera, so one less thing to worry about. Next.

Video Receiver: Eachine RC832

Our video receiver of choice

There was no specific reason as to why we chose this model over others, other than the fact that it just works.

Anyway, shortly after I began my first tests with the camera, I started wishing that I had gotten a receiver with more frills. More on that later…

Monitor: Taken from MnT Lab

We did not buy a monitor off Taobao as during our first few visits to the lab, we realised that previous batches of PS9888 students bought a handful of monitors, together with spares that were never opened. So in order to save some money, we opted to use those hand-me-downs instead.

The requirement here was simple: all we needed was a monitor that will not display a blue screen (but rather static snow) on signal loss. This is because monitors that blue screen cut out completely when video signal dips below a certain level. As the likelihood of this happening when flying at distance is high, we need a screen that shows snow instead as these will still display a partial picture, giving us a fighting chance at flying our precious model back to safety.

Blue screen bad!

Snow good!

Functionality Test…

After plugging everything in, I realised that there was one simple (but rather serious) problem: I had no idea what channel my glorious video signal was being beamed on! There are tens of channels on the 5.8GHz spectrum, so it was quite unrealistic and impractical to cycle through them one by one. It’s quite unfortunate that the RC832 unit we bought did not have auto scanning capabilities.

Seniors to the rescue!

That was until I walked to the area behind the MnT lab and noticed this receiver sitting in a box:

“Auto Scan” 

Wasting no time, I set it up immediately and in no time, it automatically tuned to the correct channel, and was even kind enough to tell me what channel it was on!

Press the CH and FR buttons simultaneously and hey presto! Auto scan initialises!

Knowing the exact channel of the FPV transmitter, I could now tune our RC832 using the lookup table provided by the manufacturer.
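
For reference, the lookup can also be scripted. The table below lists the band frequencies commonly published for 5.8GHz FPV gear. We are reproducing them from general hobby knowledge, so treat them as an assumption and check them against the table that comes with your own receiver, since the band names and ordering on a particular unit may differ:

```python
# Commonly published 5.8GHz FPV channel frequencies in MHz (assumed, verify
# against the manufacturer's table before relying on them).
BANDS_MHZ = {
    "A": [5865, 5845, 5825, 5805, 5785, 5765, 5745, 5725],
    "B": [5733, 5752, 5771, 5790, 5809, 5828, 5847, 5866],
    "E": [5705, 5685, 5665, 5645, 5885, 5905, 5925, 5945],
    "F": [5740, 5760, 5780, 5800, 5820, 5840, 5860, 5880],
    "R": [5658, 5695, 5732, 5769, 5806, 5843, 5880, 5917],
}


def find_channels(freq_mhz):
    """Return every (band, channel number) pair that uses freq_mhz."""
    return [
        (band, index + 1)
        for band, frequencies in BANDS_MHZ.items()
        for index, frequency in enumerate(frequencies)
        if frequency == freq_mhz
    ]


print(find_channels(5800))
print(find_channels(5880))
```

Some frequencies appear in more than one band, so the function returns a list rather than a single answer.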

Everything works but…

Knowing that everything worked was a good sign. However, the lack of an auto scanning function on our RC832 meant that we had to use the Quanum receiver bought by our seniors. This led to another problem: the FPV camera/transmitter we purchased came with a cloverleaf antenna soldered on, whereas the Quanum receiver only came with inferior rubber ducky antennas. Mixing both types of antennas is not ideal as that would reduce our video link range significantly.

FPV Camera with antenna soldered on

Although we bought extra cloverleaf antennas, we intended for them to be used with the RC832 only – which used RP-SMA connectors. The Quanum receiver uses SMA connectors. Bummer!

SMA on the seniors’ receiver

RP-SMA on the receiver we bought

Although leaving everything on one channel and not touching it afterwards would be the ideal scenario, this is unrealistic. We would be flying on weekends at Old Holland Road, one of Singapore’s few designated drone flying zones, meaning that the area will be jam packed with other hobbyists doing FPV flying. It is quite likely that we will have to change frequencies in order to avoid interference (and crashing as a result).

With that, we figured that the best compromise was to bring both receivers: the Quanum to verify the channel that the camera is on, and the RC832, which we then tune to that frequency.