Rise…and Fall of the Skysurfer

This session marks the third flight of our original Skysurfer. As it had already been tested and trimmed in previous sessions, we were expecting to do 1 or 2 warm-up flights before mounting the node for tests. Unfortunately, Singapore being just 1 degree north of the equator means that weather conditions vary greatly from day to day. Below is footage of the second flight of the day, which ended in a crash.

The crash here was entirely due to pilot error: as the white plane was flying against a white backdrop of clouds, the pilot over-banked the plane during a turn, resulting in it going inverted. Not realising this, he pushed the elevator forward with the intention of gaining altitude, but because the plane was inverted this caused it to nose down instead, leading to a crash.

Note that the plane was also rather wobbly throughout the flight. This was because the pilot had last flown about 3 weeks prior and had had zero stick time on the simulator since then due to school commitments.

As this crash occurred at significant speed (evidenced by the loud thud), it broke the plastic gears of the elevator and rudder servos, causing quite a bit of downtime as we replaced them (we brought spares, thankfully) and taped up broken pieces of foam.

Second Attempt…and Second Crash

After patching up the plane and waiting out a passing drizzle, we were ready for another go. We noticed this time that the plane’s CG had shifted: it could no longer fly with the battery right up in the nose as this would lead to a 737MAX nose down at launch. We suspect that this might be due to the additional layers of fibre tape used to hold the nose together after the first crash.

Shifting the battery eventually got the plane to a flyable state. Here is some onboard FPV footage (please excuse the choppiness; it was downscaled from 60 to 30fps):

For reasons unknown, the onboard camera stopped recording before the crash occurred.

Here is footage from the ground:

From the footage, it seems as though this crash was caused by the battery shifting forward mid flight, throwing the plane’s centre of gravity off balance. We deduced this to be the likely cause because as the plane pitched down towards the end of the video, the pilot reported having zero control despite having the control stick pulled all the way back.

After this, the plane could no longer fly as well as before. Besides the CG being off, the wind was also picking up, which made the plane significantly more difficult to control in flight. This, coupled with the fact that we had been on the field for quite some time already (around 3 hours), led us to call it a day with about 2 out of 4 batteries left untouched.

With that, we strapped the collector node to a quadcopter belonging to Kanesh since the plane was no longer flyable. Below are two graphs – the first is of the data actually saved to the SD card of the ground node. The second is of the data transmitted from the ground node to the collector node (on a UAV). It worked as expected (after dealing with a timezone display issue where the timezone was assumed to be GMT+0730) – the sampled data matched the collected data!

Data from Ground Node

Data from Collector

Sampling

We’ve set the following objectives to guide us in the design of the actual data logging stage:

  1. Samples should be timestamped with absolute time
  2. Different nodes should start/take each sample at the same time (within a certain degree of error)

As such, to simplify setup and maintain accuracy of sample times over a long period of time, we decided to attach a GPS module to the board.

Additionally, we also identified the following methods we can use to save power:

  1. Power off the GPS/SD Card when not in use
    1. Power on the SD Card when necessary:
      1. Client attempts to read file
      2. Samples have to be written to a file
    2. Power on the GPS at a fixed interval, say, every hour, synchronise time, then power off again
  2. Power on the transceiver on the ESP32 at a low duty cycle eg for 1s every minute
  3. Put the ESP32 into light-sleep mode between samples

With these in mind, our firmware does the following for sampling:

  1. Wait until a rising edge is seen on the PPS signal from the GPS module
  2. Configure the Ultra-Low Power Processor on the ESP32 to wake then halt immediately at a fixed period. This is for two reasons:
    1. A wake fires an ISR on the main cores which will then take a sample from the sensor
    2. This wake-halt sequence can be clocked by the 32kHz crystal for the RTC (the main crystal is disabled in light-sleep mode)
  3. Samples are buffered to reduce the number of write operations needed, making use of the 1MB of PSRAM on the WROVER module

Samples are buffered into two alternating buffers (ie samples are written to one buffer until it is full; while that buffer is being written to file, new samples go into the other). Buffers are prefixed with the following header:

  1. HDR as a header marker (3 bytes)
  2. Number of microseconds since Unix Epoch (uint64_t, 8 bytes)
  3. Size of one sample in bytes (1 byte)
  4. Number of samples following this header (uint32_t, 4 bytes)
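
The header layout above can be sketched with Python's struct module (used here for brevity rather than the firmware's own language). The little-endian, unpadded layout is an assumption, as the field order and widths are the only things stated above. The sample size of 6 bytes and count of 4096 are hypothetical values chosen for illustration.

```python
import struct

# Field order and widths follow the list above: marker, timestamp,
# sample size, sample count.
# "<" (little-endian, no padding) is an assumption, not something stated in the post.
HEADER = struct.Struct("<3sQBI")

def pack_header(timestamp_us, sample_size, sample_count):
    """Build the header that prefixes one buffer of samples."""
    return HEADER.pack(b"HDR", timestamp_us, sample_size, sample_count)

def unpack_header(raw):
    """Parse a header, rejecting anything that does not start with the marker."""
    marker, timestamp_us, sample_size, sample_count = HEADER.unpack(raw[:HEADER.size])
    if marker != b"HDR":
        raise ValueError("missing HDR marker")
    return timestamp_us, sample_size, sample_count

# Hypothetical buffer: 4096 samples of 6 bytes each.
header = pack_header(1600768700000000, 6, 4096)
```

With this layout the header occupies 3 + 8 + 1 + 4 = 16 bytes, so a reader can skip from one header to the next by adding 16 plus sample size times sample count.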

The header and buffer are then written to the SD card. A crude version of the workflow described above has been implemented. However, multiple issues still remain:

  1. The RTC is still clocked by the internal 150kHz RC oscillator – when we configure the RTC to use the 32kHz crystal, ISRs are fired much faster than expected (eg the ISR fires at 50Hz when clocked by the internal oscillator, but fires at 250Hz when clocked by the 32kHz crystal). This appears to be a software issue – we’ve verified that the crystal is presenting a 32.768kHz signal to the ESP32 by probing with an oscilloscope.
  2. An ISR was configured for rising edges on the PPS signal. The ISR logs the current absolute time (in us) before updating the system time (based on the PPS info). The output is as follows:
I (497416) logging_task: pulse: 1600768700999984
I (498416) logging_task: pulse: 1600768701999956
I (499416) logging_task: pulse: 1600768702999980
I (500416) logging_task: pulse: 1600768703999980
I (501416) logging_task: pulse: 1600768704999987
I (502416) logging_task: pulse: 1600768705999972
I (503416) logging_task: pulse: 1600768706999987
I (504416) logging_task: pulse: 1600768707999974
I (505416) logging_task: pulse: 1600768708999980
I (506416) logging_task: pulse: 1600768709999972
I (507416) logging_task: pulse: 1600768710999980
I (508416) logging_task: pulse: 1600768711999975
I (509416) logging_task: pulse: 1600768712999972
I (510416) logging_task: pulse: 1600768713999980
I (511416) logging_task: pulse: 1600768714999980
I (512416) logging_task: pulse: 1600768716000006
I (513416) logging_task: pulse: 1600768716999954
I (514416) logging_task: pulse: 1600768717999980
I (515416) logging_task: pulse: 1600768718999988
I (516416) logging_task: pulse: 1600768719999972

The time logged varies substantially from line to line (even though the system time is slower, we still expected a consistent difference). A cursory examination suggests that the system time falls about 20us behind the time reported by the GPS module for every second that passes. Over an hour, this results in a difference of 72ms (this presents issues if we only power on the GPS module for time syncing once in a while). We’ve come up with a few possible reasons for this:

  1. Inaccuracies in the main crystal/internal RC oscillator (due to the issue discussed above)
  2. Interrupt latency (can it really vary that much?)
    1. This was discussed here; however, switching to an FPGA or another microcontroller is not possible at this point in the project
  3. The PPS signal that our GPS module outputs is not accurate enough. The datasheet specifies a deviation of less than 1us, so if the module performs to spec, this cannot be the cause.
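
The per-second lag and its spread can be read off the log mechanically. The Python sketch below uses the first five log lines only and assumes that each PPS edge falls exactly on a whole second, so the distance of each logged time from the nearest second is the lag accumulated since the previous pulse.

```python
import re

# First five lines of the log above, copied verbatim.
LOG = """\
I (497416) logging_task: pulse: 1600768700999984
I (498416) logging_task: pulse: 1600768701999956
I (499416) logging_task: pulse: 1600768702999980
I (500416) logging_task: pulse: 1600768703999980
I (501416) logging_task: pulse: 1600768704999987
"""

def offsets_us(log):
    """Signed distance (in us) of each logged time from the nearest whole second."""
    result = []
    for match in re.finditer(r"pulse: (\d+)", log):
        frac = int(match.group(1)) % 1_000_000
        # Values just below a second boundary are reported as negative offsets.
        result.append(frac - 1_000_000 if frac >= 500_000 else frac)
    return result

offsets = offsets_us(LOG)
mean_offset = sum(offsets) / len(offsets)   # average lag per second
spread = max(offsets) - min(offsets)        # line-to-line variation
```

For these five lines the offsets range from -44us to -13us, which is the line-to-line variation that a constant oscillator error alone would not explain.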

More work is needed to resolve these issues. However, even with these limitations in mind, it should be possible to present a full proof-of-concept of what we set out to do.

Power Optimisation

We measured 15mA of quiescent current draw on the 5V rail of the sensor (with the ESP32 held in reset). We traced it to the following:

  • 2x power indicator LEDs – approximately 2.6mA each, totalling 5.2mA
  • 2x AZ1117C voltage regulators – datasheet specifies 4mA (typ) to 6mA (max) of quiescent current draw.

With this in mind, we swapped the AZ1117C regulators for AP2114s, which have a maximum quiescent current of 90uA.

Masking off other components before removing regulator with hot air

Getting a crude current estimation from the power supply

We should be able to cut quiescent current down to less than 1mA (when powered from the 5V rail) if we disable the LEDs. Although quiescent current is only a small part of the overall power budget, the 15mA we were observing is still a fairly large amount of power just wasted.
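
As a sanity check, the figures above can be added up in a few lines of Python. The sketch assumes the LEDs and regulators are the only contributors to the quiescent draw, which is what our tracing suggested.

```python
LED_MA = 2.6               # per indicator LED, as measured above
AZ1117C_MA = (4.0, 6.0)    # datasheet typical and maximum, per regulator
AP2114_MAX_MA = 0.09       # 90uA maximum, per regulator

def budget_ma(leds, regulators, regulator_ma):
    """Total quiescent current for a given number of LEDs and regulators."""
    return leds * LED_MA + regulators * regulator_ma

before_typ = budget_ma(2, 2, AZ1117C_MA[0])     # about 13.2 mA
before_max = budget_ma(2, 2, AZ1117C_MA[1])     # about 17.2 mA
after_no_leds = budget_ma(0, 2, AP2114_MAX_MA)  # about 0.18 mA
```

The measured 15mA sits between the typical and maximum estimates, and the post-swap estimate with the LEDs disabled is well under the 1mA target.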

Code Refactoring

Since we now have multiple nodes which can read/write to SD cards (only one of the hand soldered prototypes worked well, the other had signal integrity issues), we can finally write received data from the client to the SD card.

Current Software Architecture

For the sake of simplicity, handling of received packets was done entirely in the ESP-NOW ISR. This worked fine when MtftpClient did not actually write contents to a file (but rather just logged metadata).

However, once MtftpClient started writing to the SD card, the write operations took a substantial amount of time, leading to severe packet loss (as the ESP-NOW ISR took a long time to exit). The obvious solution is to push received data to a ring buffer, then MtftpClient::loop() reads from this ring buffer and actually writes the data to the SD card. While this works fine, we run into yet another issue here – since the server is transmitting data packets as fast as it can read from the SD card, the client cannot actually write data to the SD card fast enough. This results in the ring buffer overflowing.

Rather than bolting on flow control, we can refactor the client to handle incoming packets differently – all received packets are pushed to a ring buffer, then MtftpClient::loop() parses incoming packets and writes them to the SD card as seen below.

Refactored Architecture

By refactoring the packet handling on the client in this way, we solve two problems – firstly, we no longer perform long operations in the ISR. Secondly, if we make the ring buffer larger than the maximum size of the data transferred in a window (plus a bit of overhead for the ring buffer itself to manage the items), we can now buffer an entire window worth of packets. MtftpClient::loop() will only send an ACK once an entire window has been processed successfully, which will only happen once all the writes are done.

This also doubles up as flow control:

  1. The server will send an entire window at once
  2. The client buffers (and requests retransmission of missing blocks if necessary) the entire window in memory
  3. The client slowly reads from the ring buffer and writes data to the SD card
  4. The client acknowledges the window once all data has been processed
  5. The server continues transmission (ie go back to Step 1)
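
The flow above can be modelled in a few lines of Python. This is a toy sketch rather than our firmware: on_receive() stands in for the ESP-NOW callback, loop() for MtftpClient::loop(), a bytearray for the file on the SD card, and retransmission of missing blocks is left out entirely. The class and attribute names are made up for illustration.

```python
from collections import deque

BLOCK_LEN = 247

class WindowedClient:
    """Toy model of the refactored client (names are hypothetical)."""

    def __init__(self, window_size):
        self.window_size = window_size
        # Sized to hold one whole window so the callback never has to drop a packet.
        self.ring = deque(maxlen=window_size)
        self.written = bytearray()   # stands in for the file on the SD card
        self.processed = 0
        self.acks = []               # block numbers we have acknowledged

    def on_receive(self, block_no, payload):
        # Callback context: enqueue only, never touch the SD card here.
        if len(self.ring) == self.ring.maxlen:
            raise OverflowError("ring buffer full")
        self.ring.append((block_no, payload))

    def loop(self):
        # Main-task context: handle at most one packet per call (the slow write).
        if not self.ring:
            return
        block_no, payload = self.ring.popleft()
        self.written += payload
        self.processed += 1
        if self.processed == self.window_size or len(payload) < BLOCK_LEN:
            self.acks.append(block_no)   # ACK only after the writes are done
            self.processed = 0

client = WindowedClient(window_size=4)
for n in range(4):                       # the server sends a whole window at once
    client.on_receive(n, bytes([n]) * BLOCK_LEN)
for _ in range(4):                       # the client drains it at its own pace
    client.loop()
```

Because the ACK is only appended once the last block of the window has been written, the server cannot start the next window before the client has caught up.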

With this implemented, we can fully test the communication between the client and server. A test with 2 nodes next to each other transferring a 1MB file was successful: we confirmed that the transferred file matches the original by comparing their hashes.

However, in a realistic environment, there would be interference from the environment (whether from Wi-Fi networks, or even from our UAV remote control equipment), which will cause packets to be lost (but not corrupted, the frame check sequence ensures that data is valid). This presents problems especially when key signaling packets (eg ACK) are lost, and one side is left waiting forever for a packet that was already lost. Although the current time-out functionality will handle these cases, relying entirely on the time-out to handle missing packets affects throughput severely. Certain lost packets (eg the server’s SYNC response) can be detected by the other party and handled immediately without having to wait for the timeout.

Identifying these cases and handling them separately should improve throughput.

Separately, multiple bugs were fixed:

  1. Race condition between onTimeout call and loop
  2. The counter used to track buffered ESP-NOW packets was observed to underflow, which then locked up the entire system since the counter never fell below MAX_BUFFERED_TX again. This was fixed by using a counting semaphore instead of just a variable.
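
The counting-semaphore idea can be illustrated in Python with threading.BoundedSemaphore. This is only a sketch of the pattern, not our firmware code: the value of MAX_BUFFERED_TX, the 10ms wait, and the function names try_send and on_sent are assumptions made for the example.

```python
import threading

MAX_BUFFERED_TX = 8   # hypothetical limit; the real value is not given in this post

# One slot per packet that may be queued in the radio at any time.
slots = threading.BoundedSemaphore(MAX_BUFFERED_TX)

def try_send(packet, send):
    """Take a slot before handing a packet to the radio; give up if none frees up."""
    if not slots.acquire(timeout=0.01):
        return False
    send(packet)
    return True

def on_sent():
    """Send-complete callback: hand the slot back."""
    slots.release()

sent = []
results = [try_send(n, sent.append) for n in range(MAX_BUFFERED_TX + 1)]
on_sent()                            # one packet leaves the radio
retry = try_send(99, sent.append)    # so the next send gets a slot
```

Unlike a bare counter, the semaphore cannot wrap around: taking a slot blocks (or times out) at zero, and a BoundedSemaphore raises ValueError if it is released more often than it was acquired.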

On our most recent test flight last Sunday, we strapped the collector to a UAV and monitored the ground node. Although none of the nodes locked up this time, the data written to the SD card by the collector was corrupted. More debugging is needed to identify and fix this issue.

Edit: The data corruption observed in data written by the collector has been fixed. file_offset is incremented after every write of an in-order, non-buffered block, but writing the buffer to file after handling missing blocks did not increment file_offset.

PCB Assembly

Since we made use of leadless and fine-pitch components in our node design, we decided to try out using solder paste and reflowing the entire board with a hot plate.

Target PCB (Center) Surrounded by Scrap Boards

We taped down scrap PCBs around our PCB to ensure that it cannot shift around. The solder paste stencil was then taped to the surface on one end after aligning it with the pads on the PCB.

The boards were then populated by hand:

Fully Populated Board Before Reflow

Protocol Optimisations

As mentioned in prior posts, we implemented a protocol based off TFTP, which we call Modified Trivial File Transfer Protocol (mtftp). The diagram above illustrates how it works:

  1. The Client (collector on the UAV) sends a Read Request (RRQ), which specifies what file (file index) and where in the file (file offset) to transmit. A window size is also transmitted which indicates how many data packets the Server should send before waiting for a response.
  2. The Server then sends up to window size DATA packets to the Client (the end of file (EOF) is indicated by sending a partial DATA packet, or zero bytes if the file size is a multiple of block length).
  3. The Client then sends an Acknowledge (ACK) packet with the block number of the largest block it successfully received.
    1. This allows for correction of missing blocks: for example, if the Client receives block number 0, 1, 2, 4, 5, 6, 7 with a window size of 8, the Client can detect that block 3 is missing, sending an ACK for block 2.
    2. If the Client receives a block of less than 247 bytes, this indicates the end of file. For example, if the client receives block 0, 1, 2, 3, where block 3 contains 200 bytes, it will not expect any further blocks and sends an ACK for block 3.
    3. If the final block in a window (ie block 7 when window size = 8) is missing, the client will never know that the window has ended. In this case, both the client and server will timeout after a short delay, terminating the entire transfer.
  4. If EOF has not been reached, the Server then advances the file offset by the number of bytes acknowledged, then starts a new window with block number 0.
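
Steps 3 and 4 boil down to two small calculations, sketched here in Python. The function names are made up for illustration, and returning -1 when block 0 itself is missing is an assumption, since how that case is signalled is not described above.

```python
BLOCK_LEN = 247

def last_in_order_block(received, window_size):
    """Highest block number n such that blocks 0..n all arrived.

    Returns -1 if block 0 is missing (an assumption for this sketch).
    """
    seen = set(received)
    expected = 0
    while expected < window_size and expected in seen:
        expected += 1
    return expected - 1

def next_offset(file_offset, acked_block):
    """File offset for the next window, given the block number that was ACKed.

    Only valid while EOF has not been reached, ie all ACKed blocks are full.
    """
    return file_offset + (acked_block + 1) * BLOCK_LEN

ack = last_in_order_block([0, 1, 2, 4, 5, 6, 7], window_size=8)   # block 3 is missing
resume_at = next_offset(0, ack)
```

For the example in step 3.1 the client ACKs block 2, so the server resumes at byte 741 (three full blocks) with block number 0 of a new window.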

We use the following terms:

  1. Block – A single chunk of data in a DATA packet. The maximum length of a block is fixed to 247 bytes in our application since ESPNOW has a maximum payload length of 250 bytes, and we use 3 bytes (1 for opcode, 2 for block number) as a packet header.
  2. Window – Transmission of up to window size DATA packets before a response is expected from the client.
  3. Transfer – The transfer of multiple Windows to transfer a file until EOF.
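
To make the terms concrete, here is a Python sketch of how a file could be cut into blocks and wrapped into DATA packets. The opcode value and the little-endian byte order are assumptions for illustration, as only the field widths are given above.

```python
import struct

BLOCK_LEN = 247     # 250-byte ESP-NOW payload minus the 3-byte packet header
OPCODE_DATA = 3     # hypothetical value; opcode numbers are not listed in this post

def split_blocks(data):
    """Cut data into blocks; a short (possibly empty) final block marks EOF."""
    blocks = [data[i:i + BLOCK_LEN] for i in range(0, len(data), BLOCK_LEN)]
    if not blocks or len(blocks[-1]) == BLOCK_LEN:
        blocks.append(b"")   # file size is a multiple of the block length
    return blocks

def data_packet(block_no, block):
    """1-byte opcode + 2-byte block number + up to 247 bytes of data."""
    # "<" (little-endian) is an assumption made for this sketch.
    return struct.pack("<BH", OPCODE_DATA, block_no) + block

exact = [len(b) for b in split_blocks(bytes(494))]    # two full blocks, then empty
ragged = [len(b) for b in split_blocks(bytes(500))]   # two full blocks, then 6 bytes
full_packet = data_packet(0, bytes(BLOCK_LEN))
```

A full DATA packet comes to exactly 250 bytes, which is why the block length is fixed at 247.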

Initial tests demonstrate that the above works in close proximity, but when the Client and Server are far from each other in an urban environment (SPMS Atrium), packet loss is non-trivial. As mentioned above, if the Client receives block 0, 1, 2, 4, 5, 6, 7, it sends an ACK for block 2 and the Server starts transmitting the data at block 3 again. This is wasteful – the Client has already received data for block 4, 5, 6, 7.


The flowchart on the left illustrates the initial implementation of the mtftp client. Solid lines depict flows involving actual packet transmission. Blocks received in-order are written to disk (block numbers 0 and 1 in the example at the bottom). However, when a block is missing (block number 2), all future blocks are discarded (block number 3, 4, 5, 6, 7, 8) because we cannot save them to file – there is no guarantee we’ll receive the missing block, and even so, we would need a way to mark that block as missing. Rather, the client sends an ACK for block 1 (the last in-order block received) and the server resumes transmission at block 2. This is obviously inefficient, especially when the window size is large, say, 32 blocks. If block 1 is lost, 30 other blocks that may be received successfully are discarded.

Our solution to this is to introduce re-transmission – missing block numbers are stored and subsequent blocks are buffered. This is illustrated by the yellow blocks in the flowchart on the right. At the end of the window, the client transmits an RTX packet containing the missing block numbers. The server then re-transmits these blocks, and the client writes the initially missing blocks along with the buffered blocks to disk. This eliminates the issue described above.

However, there is still another issue here – if the RTX or ACK packets (depicted by the thick black lines in the flowchart on the right) are lost, the communication times out. If it was an RTX requesting one of the earlier blocks in the window, most of the window will be discarded again. A possible fix for this is for the client to re-transmit an RTX or ACK should communication time out at this point. However, a quick implementation proved to be buggy and was reverted. We will revisit this issue later as we would like to implement other functionality first.
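
The bookkeeping behind re-transmission can be sketched in Python as follows. The helper names are hypothetical and the 4-byte blocks are only there to keep the example short; the actual client works on 247-byte blocks and a fixed-size buffer.

```python
def missing_blocks(received, last_block):
    """Block numbers in 0..last_block that have not arrived yet (the RTX list)."""
    seen = set(received)
    return [n for n in range(last_block + 1) if n not in seen]

def assemble(buffered, retransmitted):
    """Merge buffered and re-transmitted blocks back into file order."""
    blocks = {**buffered, **retransmitted}
    return b"".join(blocks[n] for n in sorted(blocks))

# Blocks 0, 1, 3 and 4 arrive; block 2 is lost on the way.
buffered = {n: bytes([n]) * 4 for n in (0, 1, 3, 4)}
rtx = missing_blocks(buffered, last_block=4)

# The server re-sends only what was asked for, then everything is written in order.
resent = {n: bytes([n]) * 4 for n in rtx}
window_data = assemble(buffered, resent)
```

Only the one missing block crosses the air a second time, instead of everything from block 2 onwards.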

Synchronising Sample Time

One of the objectives we have been working towards is ensuring samples from different nodes are time-synchronised (ie sampled at the same point in time). As mentioned here, GPS receivers are one method to obtain accurate timestamps.

The commonly available U-blox Neo-6M GPS receiver outputs a 1 Hz timepulse, which is configurable from 0.25 Hz to 1 kHz. It would be entirely possible to configure the timepulse frequency to match our target sampling frequency, then use an edge of the timepulse to start sampling. However, keeping a GPS receiver constantly powered on has severe implications on battery life.

As presented in Multi-Channel Data Acquisition System with Absolute Time Synchronization:

Measurement of the last (1000th) sample within a given second disables the timer/counter reset (it still increases every clock cycle) and introduces the microcontroller into the awaiting mode (waiting for the next synchronization pulse). When the pulse arrives, the timer/counter value is first read and then it is immediately reset. If the local clock has its nominal frequency (48 MHz), the read value should be equal to C, which corresponds to 1 ms. If it is not, the difference between C and the number of measured cycles is used to adjust C. This technique compensates for possible drifts of the local-oscillator frequency, i.e., it ensures that recorded samples are uniformly distributed over time with 1 ms spacing between them.

We can tweak this slightly by assuming that the drift of our local oscillator is relatively fixed and will not change that rapidly. This allows us to compute C periodically, say, every hour by powering on the GPS receiver, getting a few timepulse edges, then powering off the GPS receiver. The sampling rate until the next synchronisation to GPS time will then be based off the computed value of C.

The use of an ESP32 here complicates things:

  1. Hardware timers are available. Although these timers support periodic mode (ie fire an ISR at a certain interval), they are clocked by the APB clock which may not be stable over time.
  2. An RTC timer is available, which supports the use of an external 32.768 kHz crystal, but there is no support for using this timer to fire interrupts on the main SoC.

With these constraints in mind, we propose the following method of synchronising the time at which sensors are sampled:

  1. Start a hardware timer
  2. Wait for at least 2 timepulse edges, store the difference of hardware timer value at each edge as C
    1. Set system (RTC) time based on the timepulse and NMEA sentences (use this to mark time when storing samples)
  3. C represents the number of timer counts (based off APB clock) in a 1 second interval. Start a periodic hardware timer with period = C / target sampling frequency
    1. Configure the periodic ISR to start sampling of sensors
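
Steps 2 and 3 can be sketched in Python as below, using three timer captures taken from the experiment log further down in this post. Since C divided by the sampling frequency is generally not a whole number of counts, the sketch computes each sample deadline from the sample index instead of repeatedly adding a rounded period. That choice, and the function names, are assumptions for illustration rather than what our firmware does.

```python
def counts_per_second(edge_counts):
    """Average timer counts between consecutive timepulse edges (this is C)."""
    if len(edge_counts) < 2:
        raise ValueError("need at least two timepulse edges")
    return (edge_counts[-1] - edge_counts[0]) / (len(edge_counts) - 1)

def sample_deadlines(c, sampling_hz, n):
    """Timer counts (relative to the last edge) at which the next n samples are due."""
    # Rounding each deadline separately keeps the rounding error from accumulating.
    return [round(k * c / sampling_hz) for k in range(1, n + 1)]

edges = [81362899, 82362890, 83362880]   # hardware timer value at three edges
c = counts_per_second(edges)
deadlines = sample_deadlines(c, sampling_hz=80, n=8)
```

With these captures C comes out at 999990.5 counts, and the eighth sample is due one count earlier than it would be with a nominal 12500-count period.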

A quick experiment suggests that for the particular ESP32-DevKitC that was used, C is approximately 999990:

I (81745) sampling_task: pulse: 81362899
I (82745) sampling_task: pulse: 82362890
I (83745) sampling_task: pulse: 83362880
I (84745) sampling_task: pulse: 84362870
I (85745) sampling_task: pulse: 85362860
I (86745) sampling_task: pulse: 86362850
I (87745) sampling_task: pulse: 87362840
I (88745) sampling_task: pulse: 88362830
I (89745) sampling_task: pulse: 89362820
I (90745) sampling_task: pulse: 90362809
I (91745) sampling_task: pulse: 91362799
I (92745) sampling_task: pulse: 92362789
I (93745) sampling_task: pulse: 93362779
I (94745) sampling_task: pulse: 94362769
I (95745) sampling_task: pulse: 95362759
I (96745) sampling_task: pulse: 96362749
I (97745) sampling_task: pulse: 97362739
I (98745) sampling_task: pulse: 98362729
I (99745) sampling_task: pulse: 99362719

The observed drift of approximately 10 us every second would result in 36 ms of drift over 1 hour, and 864 ms over 24 hours. Given that our target sampling frequency of 80 Hz has a period of 12.5 ms, if uncorrected, this drift would be an issue.
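
The arithmetic behind these figures, as a short Python check. It assumes the drift stays constant at 10 us per second, which is exactly what still needs to be confirmed.

```python
DRIFT_US_PER_S = 10                    # C is about 999990 instead of 1000000
SAMPLE_PERIOD_US = 1_000_000 / 80      # period of the 80 Hz target sampling rate

def drift_us(seconds):
    """Accumulated drift after the given time, assuming a constant drift rate."""
    return DRIFT_US_PER_S * seconds

def seconds_until_one_sample_off():
    """Time after which two unsynchronised nodes are a full sample period apart."""
    return SAMPLE_PERIOD_US / DRIFT_US_PER_S

per_hour_ms = drift_us(3600) / 1000
per_day_ms = drift_us(24 * 3600) / 1000
one_sample_s = seconds_until_one_sample_off()
```

At this rate the accumulated drift reaches one full sample period after 1250 seconds, so an hourly resynchronisation without correcting for C would not be enough.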

Further testing for a longer period of time is necessary to establish whether the amount of drift is constant (and probably temperature dependent). Exploring the drift of the RTC clock (once our PCBs with a RTC crystal arrive and are assembled) may also provide alternative possibilities for synchronisation.

Speed Optimisations

The below speeds were observed using a SanDisk Ultra 16GB A1 MicroSD Card on a hand-wired prototyping board with 2MB files.

The very first iteration of the readFile function was a naive implementation:

  1. Open file pointer to the requested file
  2. Seek to the requested offset
  3. Read requested number of bytes
  4. Close file pointer
  5. Return data

This was slow: it took 11953726us to transfer 1048579 bytes (85.6kbyte/s).
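
For reference, this is how the speed figure follows from the raw numbers. The Python sketch assumes the elapsed time is in microseconds and that a kbyte is 1024 bytes, which is the combination that reproduces the quoted figure.

```python
def kbyte_per_s(num_bytes, elapsed_us):
    """Throughput in kbyte/s, taking 1 kbyte = 1024 bytes."""
    return num_bytes / 1024 / (elapsed_us / 1_000_000)

naive_speed = kbyte_per_s(1048579, 11953726)
```

The result is a little under 85.7 kbyte/s, consistent with the figure quoted above.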

The obvious optimisation that can be made here is to persist the file pointer between calls to readFile:

  1. Check if a file pointer is open and, if so, whether it is for the correct file
    1. If not, close the current file pointer and open the requested one
  2. Seek to the requested offset
  3. Read requested number of bytes
  4. Return data

This improved speeds slightly to 124.2kbyte/s.

We can make use of a byte buffer to buffer data from the SD card, allowing us to read in large chunks:

  1. Attempt to read requested number (usually 247) of bytes from the buffer
    1. If successful, return data
  2. Else, read (up to) 8192 bytes from the SD card and write it into the buffer
  3. Read up to requested number of bytes from the buffer again
  4. Return data
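
The buffering steps above can be sketched in Python as follows. An in-memory io.BytesIO stands in for the file on the SD card, the class name is made up, and the sketch only handles sequential reads (no seeking), so it is an illustration of the idea rather than our readFile implementation.

```python
import io

CHUNK = 8192    # bytes fetched from the card per refill
BLOCK = 247     # bytes handed out per request

class BufferedBlockReader:
    """Serve small sequential reads from a large in-memory chunk."""

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.buffer = b""
        self.card_reads = 0     # how many times we actually hit the card

    def read(self, n=BLOCK):
        if len(self.buffer) < n:
            # Not enough buffered data: fetch another large chunk from the card.
            self.buffer += self.fileobj.read(CHUNK)
            self.card_reads += 1
        data, self.buffer = self.buffer[:n], self.buffer[n:]
        return data

source = bytes(range(256)) * 64                    # 16384 bytes of test data
reader = BufferedBlockReader(io.BytesIO(source))
blocks = []
while True:
    block = reader.read()
    blocks.append(block)
    if len(block) < BLOCK:                         # a short block marks the end
        break
```

In this run 67 blocks are served with only 3 reads from the underlying file, the last of which just detects the end of the file.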

This improves speeds to 229.8kbyte/s.

To prevent buffering too many ESP-NOW packets (otherwise Wi-Fi memory allocations start to fail), code was written to track the number of buffered packets. This code issued a 10ms delay if there were too many buffered packets. It was observed that most of the time after the delay, there were 0 buffered packets left. This means that the 10ms delay might be too much, resulting in dead time where no more data was being sent. Removing this delay improved speeds to 338.8kbyte/s.

While writing this post, we came across this comment which suggested using setvbuf to buffer reads. Using this over our home-brewed ring buffer solution improves speeds substantially to 440.7kbyte/s. This isn’t too far from the 549250 bytes/s we observed when testing transfer speeds without doing any processing, though we need to test this out in the field to confirm its performance.

File Transfer

Now that we’ve started on the protocol for file transfer, all we have to do is to attach it to ESP-NOW and the SDMMC controller for data transmission and retrieval respectively as illustrated in the diagram below.

We soldered a quick prototype of an SD Card socket connected to an ESP32-DevKitC:

For the software, we add another layer over mtftp to manage sessions: the collector broadcasts a synchronisation packet. The node(s), upon receiving this packet, echo it back to the collector and add the collector as an ESP-NOW peer (necessary for unicast communication between two ESP-NOW devices). Upon receiving the echoed synchronisation packet, the collector also adds the node as an ESP-NOW peer and hands over control to the mtftp handlers.

Output from the node (top) and collector (bottom)

The software discussed above can be found here.

Communication Protocols

In a previous post, we examined the different options we had (Layer 3 in the OSI Model) for transmitting data.

We’ve focused our efforts so far on working with ESP-NOW for a few reasons:

  1. ESP-NOW does not require devices to be connected to the same Wi-Fi network
  2. TCP has overheads (eg handshakes) which we do not fully need.

Working with UDP would allow us to customise the later protocols to suit our needs, but UDP is still built for communication over an IP network.

ESP-NOW suits our use case perfectly:

  1. Devices do not need to be connected to the same Wi-Fi network
  2. Conceptually, it performs the same higher-throughput-no-features role that UDP does
  3. ESP-NOW supports basic encryption
  4. ESP-NOW has built-in error detection
  5. ESP-NOW supports basic re-transmission of packets that are not received

Our application has the following requirements:

  1. Guaranteed, in-order transmission of files
  2. Ability to discover available files on a remote node
  3. Ability to resume transmission from an arbitrary offset in a file

With these in mind, we implemented a modified version of the Trivial File Transfer Protocol (TFTP). TFTP is a very simple protocol used in resource-constrained environments which makes it particularly suitable for our application. Customisations:

Window Size: We implement the Window size option. TFTP’s default behavior of acknowledging every data packet is known to affect throughput. Acknowledging multiple data packets at a time improves throughput.

Binary Packing: TFTP’s use of ASCII strings is inefficient, limiting how much information we can pack into a single packet in certain situations. We opt to identify files by a number as filenames are not important in our application.

File Offsets: Read requests specify not only the file index, but also the offset at which to start the transfer. This allows data transfers to be resumed (eg after the UAV goes out of range).
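
To illustrate the difference binary packing makes, here is a Python sketch of a read request carrying a file index, file offset and window size. The field widths, byte order and opcode value are all hypothetical and chosen only for the example. The standard TFTP read request is built as described in RFC 1350 (2-byte opcode, then NUL-terminated filename and mode strings) with a made-up filename.

```python
import struct

OPCODE_RRQ = 1   # hypothetical opcode value
# Hypothetical layout: opcode (1 byte), file index (2), file offset (4), window size (2).
RRQ = struct.Struct("<BHIH")

def pack_rrq(file_index, file_offset, window_size):
    """Build a binary read request."""
    return RRQ.pack(OPCODE_RRQ, file_index, file_offset, window_size)

def unpack_rrq(raw):
    """Return (file_index, file_offset, window_size) from a binary read request."""
    opcode, file_index, file_offset, window_size = RRQ.unpack(raw)
    if opcode != OPCODE_RRQ:
        raise ValueError("not a read request")
    return file_index, file_offset, window_size

# Resume file 3 at byte 741 with a window of 8 blocks.
binary_rrq = pack_rrq(3, 741, 8)

# A plain TFTP read request for comparison (the filename is made up).
tftp_rrq = struct.pack(">H", 1) + b"data_0003.bin\0" + b"octet\0"
```

Under these assumptions the binary request is 9 bytes against 22 bytes for the ASCII form, and it carries the offset and window size as well.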

The work-in-progress implementation discussed above can be found here. A test project integrating both client and server together can be found here.