Sampling

We’ve set the following objectives to guide us in the design of the actual data logging stage:

  1. Samples should be timestamped with absolute time
  2. Different nodes should start/take each sample at the same time (within a certain degree of error)

As such, to simplify setup and maintain accuracy of sample times over a long period of time, we decided to attach a GPS module to the board.

We also identified the following methods to save power:

  1. Power off the GPS/SD Card when not in use
    1. Power on the SD Card when necessary:
      1. Client attempts to read file
      2. Samples have to be written to a file
    2. Power on the GPS at a fixed interval, say, every hour, synchronise time, then power off again
  2. Power on the transceiver on the ESP32 at a low duty cycle eg for 1s every minute
  3. Put the ESP32 into light-sleep mode between samples

With these in mind, our firmware does the following for sampling:

  1. Wait until a rising edge is seen on the PPS signal from the GPS module
  2. Configure the Ultra-Low Power Processor on the ESP32 to wake then halt immediately at a fixed period. This is for two reasons:
    1. A wake fires an ISR on the main cores, which then takes a sample from the sensor
    2. This wake-halt sequence can be clocked by the 32kHz RTC crystal (the main crystal is disabled in light-sleep mode)
  3. Samples are buffered to reduce the number of write operations needed, making use of the 1MB of PSRAM on the WROVER module

Samples are buffered into two alternating buffers: samples are written to one buffer until it is full; while that buffer is being written to file, new samples go into the second buffer. Buffers are prefixed with the following header:

  1. HDR as a header marker (3 bytes)
  2. Number of microseconds since Unix Epoch (uint64_t, 8 bytes)
  3. Size of one sample in bytes (1 byte)
  4. Number of samples following this header (uint32_t, 4 bytes)

The header and buffer are then written to the SD card. A crude version of the workflow described above has been implemented. However, multiple issues still remain:

  1. The RTC is still clocked by the internal 150kHz RC oscillator – when we configure the RTC to use the 32kHz crystal, ISRs fire much faster than expected (eg the ISR fires at 50Hz when clocked by the internal oscillator, but at 250Hz when clocked by the 32kHz crystal). This appears to be a software issue – we’ve verified with an oscilloscope that the crystal presents a 32.768kHz signal to the ESP32.
  2. An ISR was configured for rising edges on the PPS signal. The ISR logs the current absolute time (in us) before updating the system time (based on the PPS info). The output is as follows:
I (497416) logging_task: pulse: 1600768700999984
I (498416) logging_task: pulse: 1600768701999956
I (499416) logging_task: pulse: 1600768702999980
I (500416) logging_task: pulse: 1600768703999980
I (501416) logging_task: pulse: 1600768704999987
I (502416) logging_task: pulse: 1600768705999972
I (503416) logging_task: pulse: 1600768706999987
I (504416) logging_task: pulse: 1600768707999974
I (505416) logging_task: pulse: 1600768708999980
I (506416) logging_task: pulse: 1600768709999972
I (507416) logging_task: pulse: 1600768710999980
I (508416) logging_task: pulse: 1600768711999975
I (509416) logging_task: pulse: 1600768712999972
I (510416) logging_task: pulse: 1600768713999980
I (511416) logging_task: pulse: 1600768714999980
I (512416) logging_task: pulse: 1600768716000006
I (513416) logging_task: pulse: 1600768716999954
I (514416) logging_task: pulse: 1600768717999980
I (515416) logging_task: pulse: 1600768718999988
I (516416) logging_task: pulse: 1600768719999972

The time logged varies substantially from line to line (even though the system time runs slower, we expected a consistent difference). A cursory examination suggests that the system time falls behind the time reported by the GPS module by about 20us for every second that passes. Over an hour, this accumulates to 72ms (a problem if we only power on the GPS module for time syncing once in a while). We’ve come up with a few possible reasons for this:

  1. Inaccuracies in the main crystal/internal RC oscillator (due to the issue discussed above)
  2. Interrupt latency (can it really vary that much?)
    1. This was discussed here; however, switching to an FPGA or another microcontroller is not possible at this point in the project
  3. The PPS signal that our GPS module outputs is not accurate enough. The datasheet specifies a deviation of less than 1us; if the module performs to spec, this cannot be the cause.

More work is needed to resolve these issues. However, even with these limitations in mind, it should be possible to present a full proof-of-concept of what we set out to do.

Code Refactoring

Since we now have multiple nodes which can read/write to SD cards (only one of the hand-soldered prototypes worked well; the other had signal integrity issues), we can finally write data received on the client to the SD card.

Current Software Architecture

For the sake of simplicity, handling of received packets was done entirely in the ESP-NOW ISR. This worked fine while MtftpClient did not actually write contents to a file (but rather just logged metadata).

However, once MtftpClient started writing to the SD card, the write operations took a substantial amount of time, leading to severe packet loss (as the ESP-NOW ISR took a long time to exit). The obvious solution is to push received data to a ring buffer, then MtftpClient::loop() reads from this ring buffer and actually writes the data to the SD card. While this works fine, we run into yet another issue here – since the server is transmitting data packets as fast as it can read from the SD card, the client cannot actually write data to the SD card fast enough. This results in the ring buffer overflowing.

Rather than bolting on flow control, we can refactor the client to handle incoming packets differently – all received packets are pushed to a ring buffer, then MtftpClient::loop() parses incoming packets and writes them to the SD card as seen below.

Refactored Architecture

By refactoring the packet handling on the client in this way, we solve two problems – firstly, we no longer perform long operations in the ISR. Secondly, if we make the ring buffer larger than the maximum size of the data transferred in a window (plus a bit of overhead for the ring buffer itself to manage the items), we can now buffer an entire window worth of packets. MtftpClient::loop() will only send an ACK once an entire window has been processed successfully, which will only happen once all the writes are done.

This also doubles up as flow control:

  1. The server will send an entire window at once
  2. The client buffers (and requests retransmission of missing blocks if necessary) the entire window in memory
  3. The client slowly reads from the ring buffer and writes data to the SD card
  4. The client acknowledges the window once all data has been processed
  5. The server continues transmission (ie go back to Step 1)

With this implemented, we can fully test the communication between the client and server. A test with 2 nodes next to each other transferring a 1MB file was successful; the transferred file was confirmed identical by hashing it and comparing against the original file.

However, in a realistic environment, there would be interference from the environment (whether from Wi-Fi networks, or even from our UAV remote control equipment), which will cause packets to be lost (but not corrupted, the frame check sequence ensures that data is valid). This presents problems especially when key signaling packets (eg ACK) are lost, and one side is left waiting forever for a packet that was already lost. Although the current time-out functionality will handle these cases, relying entirely on the time-out to handle missing packets affects throughput severely. Certain lost packets (eg the server’s SYNC response) can be detected by the other party and handled immediately without having to wait for the timeout.

Identifying these cases and handling them separately should improve throughput.

Separately, multiple bugs were fixed:

  1. Race condition between onTimeout call and loop
  2. The counter used to track buffered ESP-NOW packets was observed to underflow, then lock up the entire system since it never fell below MAX_BUFFERED_TX. This was fixed by using a counting semaphore instead of a plain variable.

On our most recent test flight last Sunday, we strapped the collector to a UAV and monitored the ground node. Although none of the nodes locked up this time, the data written to the SD card by the collector was corrupted. More debugging is needed to identify and fix this issue.

Edit: The data corruption observed in data written by the collector has been fixed. file_offset is incremented after every write of an in-order, non-buffered block, but writing the buffer to file after handling missing blocks did not increment file_offset.

Protocol Optimisations

As mentioned in prior posts, we implemented a protocol based on TFTP, which we call the Modified Trivial File Transfer Protocol (mtftp). The diagram above illustrates how it works:

  1. The Client (collector on the UAV) sends a Read Request (RRQ), which specifies which file (file index) and where in the file (file offset) to transmit from. A window size is also transmitted, which indicates how many data packets the Server should send before waiting for a response.
  2. The Server then sends up to window size DATA packets to the Client (the end of file (EOF) is indicated by sending a partial DATA packet, or zero bytes if the file size is a multiple of block length).
  3. The Client then sends an Acknowledge (ACK) packet with the block number of the largest block it successfully received.
    1. This allows for correction of missing blocks: for example, if the Client receives block number 0, 1, 2, 4, 5, 6, 7 with a window size of 8, the Client can detect that block 3 is missing, sending an ACK for block 2.
    2. If the Client receives a block of less than 247 bytes, this indicates the end of file. For example, if the client receives block 0, 1, 2, 3, where block 3 contains 200 bytes, it will not expect any further blocks and sends an ACK for block 3.
    3. If the final block in a window (ie block 7 when window size = 8) is missing, the client will never know that the window has ended. In this case, both the client and server will timeout after a short delay, terminating the entire transfer.
  4. If EOF has not been reached, the Server then advances the file offset by the number of bytes acknowledged, then starts a new window with block number 0.

We use the following terms:

  1. Block – A single chunk of data in a DATA packet. The maximum length of a block is fixed to 247 bytes in our application since ESP-NOW has a maximum payload length of 250 bytes, and we use 3 bytes (1 for opcode, 2 for block number) as a packet header.
  2. Window – Transmission of up to window size DATA packets before a response is expected from the client.
  3. Transfer – The transfer of multiple Windows to transfer a file until EOF.

Initial tests demonstrate that the above works in close proximity, but when the Client and Server are far from each other in an urban environment (SPMS Atrium), packet loss is non-trivial. As mentioned above, if the Client receives blocks 0, 1, 2, 4, 5, 6, 7, it sends an ACK for block 2 and the Server starts transmitting again from block 3. This is wasteful – the Client has already received blocks 4, 5, 6 and 7.


The flowchart on the left illustrates the initial implementation of the mtftp client. Solid lines depict flows involving actual packet transmission. Blocks received in-order are written to disk (block numbers 0 and 1 in the example at the bottom). However, when a block is missing (block number 2), all subsequent blocks are discarded (block numbers 3 to 8) because we cannot save them to file – there is no guarantee we’ll receive the missing block, and even if we did, we would need a way to mark that block as missing. Instead, the client sends an ACK for block 1 (the last in-order block received) and the server resumes transmission at block 2. This is obviously inefficient, especially when the window size is large, say, 32 blocks: if block 1 is lost, 30 other blocks that may have been received successfully are discarded.

Our solution to this is to introduce re-transmission – missing block numbers are stored and subsequent blocks are buffered. This is illustrated by the yellow blocks in the flowchart on the right. At the end of the window, the client transmits a RTX packet containing the missing block numbers. The server then re-transmits these blocks, and the client writes the initially missing blocks along with the buffered blocks to disk. This eliminates the issue described above.

However, there is still another issue here – if the RTX or ACK packets (depicted by the thick black lines in the flowchart on the right) are lost, the communication times out. If it was an RTX requesting one of the earlier blocks in the window, most of the window will be discarded again. A possible fix is for the client to re-transmit the RTX or ACK should communication time out at this point. However, a quick implementation proved to be buggy and was reverted. We will revisit this issue later, as we would like to implement other functionality first.

Synchronising Sample Time

One of the objectives we have been working towards is ensuring samples from different nodes are time-synchronised (ie sampled at the same point in time). As mentioned here, GPS receivers are one method to obtain accurate timestamps.

The commonly available U-blox Neo-6M GPS receiver outputs a 1 Hz timepulse, which is configurable between 0.25 Hz and 1 kHz. It would be entirely possible to configure the timepulse frequency to match our target sampling frequency, then use an edge of the timepulse to start sampling. However, keeping a GPS receiver constantly powered on has severe implications on battery life.

As presented in Multi-Channel Data Acquisition System with Absolute Time Synchronization:

Measurement of the last (1000th) sample within a given second disables the timer/counter reset (it still increases every clock cycle) and introduces the microcontroller into the awaiting mode (waiting for the next synchronization pulse). When the pulse arrives, the timer/counter value is first read and then it is immediately reset. If the local clock has its nominal frequency (48 MHz), the read value should be equal to C, which corresponds to 1 ms. If it is not, the difference between C and the number of measured cycles is used to adjust C. This technique compensates for possible drifts of the local-oscillator frequency, i.e., it ensures that recorded samples are uniformly distributed over time with 1 ms spacing between them.

We can tweak this slightly by assuming that the drift of our local oscillator is relatively fixed and will not change rapidly. This allows us to compute C periodically, say, every hour: power on the GPS receiver, capture a few timepulse edges, then power it off again. The sampling rate until the next synchronisation to GPS time is then based on the computed value of C.

The use of an ESP32 here complicates things:

  1. Hardware timers are available. Although these timers support a periodic mode (ie fire an ISR at a fixed interval), they are clocked by the APB clock, which may not be stable over time.
  2. An RTC timer is available, which supports the use of an external 32.768 kHz crystal, but there is no support for using this timer to fire interrupts on the main SoC

With these constraints in mind, we propose the following method of synchronising the time at which sensors are sampled:

  1. Start a hardware timer
  2. Wait for at least 2 timepulse edges, store the difference of hardware timer value at each edge as C
    1. Set system (RTC) time based on the timepulse and NMEA sentences (use this to mark time when storing samples)
  3. C represents the number of timer counts (based on the APB clock) in a 1 second interval; start a periodic hardware timer with period = C / target sampling frequency
    1. Configure the periodic ISR to start sampling of sensors

A quick experiment suggests that for the particular ESP32-DevKitC that was used, C is approximately 999990:

I (81745) sampling_task: pulse: 81362899
I (82745) sampling_task: pulse: 82362890
I (83745) sampling_task: pulse: 83362880
I (84745) sampling_task: pulse: 84362870
I (85745) sampling_task: pulse: 85362860
I (86745) sampling_task: pulse: 86362850
I (87745) sampling_task: pulse: 87362840
I (88745) sampling_task: pulse: 88362830
I (89745) sampling_task: pulse: 89362820
I (90745) sampling_task: pulse: 90362809
I (91745) sampling_task: pulse: 91362799
I (92745) sampling_task: pulse: 92362789
I (93745) sampling_task: pulse: 93362779
I (94745) sampling_task: pulse: 94362769
I (95745) sampling_task: pulse: 95362759
I (96745) sampling_task: pulse: 96362749
I (97745) sampling_task: pulse: 97362739
I (98745) sampling_task: pulse: 98362729
I (99745) sampling_task: pulse: 99362719

The observed drift of approximately 10 us every second would result in 36 ms of drift over 1 hour, and 864 ms over 24 hours. Given that our target sampling frequency of 80 Hz has a period of 12.5 ms, if uncorrected, this drift would be an issue.

Further testing for a longer period of time is necessary to establish whether the amount of drift is constant (and probably temperature dependent). Exploring the drift of the RTC clock (once our PCBs with a RTC crystal arrive and are assembled) may also provide alternative possibilities for synchronisation.

Speed Optimisations

The below speeds were observed using a SanDisk Ultra 16GB A1 MicroSD card on a hand-wired prototyping board with 2MB files.

The very first iteration of the readFile function was a naive implementation:

  1. Open file pointer to the requested file
  2. Seek to the requested offset
  3. Read requested number of bytes
  4. Close file pointer
  5. Return data

This was slow: it took 11953726us (about 12s) to transfer 1048579 bytes (85.6kbyte/s).

The obvious optimisation that can be made here is to persist the file pointer between calls to readFile:

  1. Check if file pointer is open, if the current file pointer is for the correct file
    1. If not, close the current file pointer and open the requested one
  2. Seek to the requested offset
  3. Read requested number of bytes
  4. Return data

This improved speeds slightly to 124.2kbyte/s.

We can make use of a byte buffer to buffer data from the SD card, allowing us to read in large chunks:

  1. Attempt to read the requested number (usually 247) of bytes from the buffer
    1. If successful, return data
  2. Else, read (up to) 8192 bytes from the SD card and write it into the buffer
  3. Read up to requested number of bytes from the buffer again
  4. Return data

This improves speeds to 229.8kbyte/s.

To prevent buffering too many ESP-NOW packets (otherwise Wi-Fi memory allocations start to fail), code was written to track the number of buffered packets. This code issued a 10ms delay whenever there were too many buffered packets. It was observed that, most of the time after the delay, there were 0 buffered packets left. This suggests the 10ms delay was too long, resulting in dead time where no more data was being sent. Removing this delay improved speeds to 338.8kbyte/s.

While writing this post, we came across this comment, which suggested using setvbuf to buffer reads. Using this over our home-brewed ring buffer solution improves speeds substantially to 440.7kbyte/s. This isn’t too far from the 549250 bytes/s we observed when testing transfer speeds without doing any processing, though we need to test this in the field to confirm its performance.

File Transfer

Now that we’ve started on the protocol for file transfer, all we have to do is to attach it to ESP-NOW and the SDMMC controller for data transmission and retrieval respectively as illustrated in the diagram below.

We soldered a quick prototype of an SD card socket connected to an ESP32-DevKitC:

For the software, we add another layer over mtftp to manage sessions: the collector broadcasts a synchronisation packet. The node(s), upon receiving this packet, echo it back to the collector and add the collector as an ESP-NOW peer (necessary for unicast communication between two ESP-NOW devices). Upon receiving the echoed synchronisation packet, the collector also adds the node as an ESP-NOW peer and hands over control to the mtftp handlers.

Output from the node (top) and collector (bottom)

The software discussed above can be found here.

Communication Protocols

In a previous post, we examined the different options we had (Layer 3 in the OSI Model) for transmitting data.

We’ve focused our efforts so far on working with ESP-NOW for a few reasons:

  1. ESP-NOW does not require devices to be connected to the same Wi-Fi network
  2. TCP has overheads (eg handshakes) which we do not need.

Working with UDP would allow us to customise the higher layers to suit our needs, but UDP is still built for communication over an IP network.

ESP-NOW suits our use case perfectly:

  1. Devices do not need to be connected to the same Wi-Fi network
  2. Conceptually, it performs the same higher-throughput-no-features role that UDP does
  3. ESP-NOW supports basic encryption
  4. ESP-NOW has built-in error detection
  5. ESP-NOW supports basic re-transmission of packets that are not received

Our application has the following requirements:

  1. Guaranteed, in-order transmission of files
  2. Ability to discover available files on a remote node
  3. Ability to resume transmission from an arbitrary offset in a file

With these in mind, we implemented a modified version of the Trivial File Transfer Protocol (TFTP). TFTP is a very simple protocol used in resource-constrained environments, which makes it particularly suitable for our application. Our customisations:

Window Size: We implement the Window size option. TFTP’s default behavior of acknowledging every data packet is known to affect throughput. Acknowledging multiple data packets at a time improves throughput.

Binary Packing: TFTP’s use of ASCII strings is inefficient, limiting how much information we can pack into a single packet in certain situations. We opt to identify files by a number as filenames are not important in our application.

File Offsets: Read requests specify not only the file index, but also the offset at which to start the transfer. This allows resuming of data transfers (eg if the UAV goes out of range).

The work-in-progress implementation discussed above can be found here. A test project integrating both client and server together can be found here.

ESP-NOW Data Transfer

After realising the need for faster data transfer, we started looking at the ESP32 and the various protocols it supports. Although using layer 4 protocols like TCP or UDP is possible, the overheads of handling connections to an ad-hoc network could be problematic. We thus examined ESP-NOW, a protocol defined by Espressif which sends data over raw 802.11 packets directly without the need for devices to be on the same WLAN network.

Packet Capture of ESP-NOW data being sent and the resulting acknowledgement packets

Code was written to test the effects of the following parameters on the throughput of ESP-NOW:

  1. Wi-Fi Data Rate
  2. Broadcast or Unicast:
    1. Sending packets directly to a peer identified by their MAC address results in the peer sending an Acknowledgement packet back
    2. Broadcasting packets (send to MAC FF:FF:FF:FF:FF:FF) will not cause Acknowledgement packets to be sent, potentially speeding up transfer.

The test code transmits 250 byte (maximum supported packet length) ESP-NOW packets as fast as possible from a device (henceforth the master) to another device (henceforth the slave) for 3 seconds, sleeps for 10 seconds, then repeats.

The slave device then transmits the number of packets received at 1 second intervals back to the master device, which then prints out this number.

The following data rates have been observed between two ESP32-DevKitCs approximately 1m apart in a standard household environment (ie with Wi-Fi interference):

Transmission Type   Wi-Fi Data Rate          Max Throughput Observed (bytes in 1s)
Unicast             WIFI_PHY_RATE_1M_L        89250
Broadcast           WIFI_PHY_RATE_1M_L       101000
Unicast             WIFI_PHY_RATE_11M_L      349250
Broadcast           WIFI_PHY_RATE_11M_L      497000
Unicast             WIFI_PHY_RATE_54M        778500
Broadcast           WIFI_PHY_RATE_54M        963000
Unicast             WIFI_PHY_RATE_MCS0_LGI   442000
Broadcast           WIFI_PHY_RATE_MCS0_LGI   502250
Unicast             WIFI_PHY_RATE_MCS4_LGI   756750
Broadcast           WIFI_PHY_RATE_MCS4_LGI   930750
Unicast             WIFI_PHY_RATE_MCS7_LGI   224750
Broadcast           WIFI_PHY_RATE_MCS7_LGI   706250
Unicast             WIFI_PHY_RATE_MCS0_SGI   464750
Broadcast           WIFI_PHY_RATE_MCS0_SGI   535000
Unicast             WIFI_PHY_RATE_MCS4_SGI   771750
Broadcast           WIFI_PHY_RATE_MCS4_SGI   945250
Unicast             WIFI_PHY_RATE_MCS7_SGI   734250
Broadcast           WIFI_PHY_RATE_MCS7_SGI   833500

We can observe that sending broadcast instead of unicast packets speeds up data transmissions slightly. Increasing the Wi-Fi bit rate also has a significant impact on data rates, but comes at the cost of significantly more packet loss (packets not acknowledged by peer).

Quick testing in outdoor conditions (ESP32-DevKitCs placed at a distance of 20m with line-of-sight visibility) shows similar (or slightly better) data rates at WIFI_PHY_RATE_1M_L and WIFI_PHY_RATE_11M_L compared to the results above. Data rates of up to 549250 bytes/s were observed at WIFI_PHY_RATE_24M. Almost all packets were lost at WIFI_PHY_RATE_54M.

12-bit data sampled at 80Hz over one week results in ~553Mbit or 70875 kbyte of data, which would take approximately 145s to transmit at 500000 bytes/s.

These data rates were also observed using the on-board PCB antenna. An alternative option is to connect an external antenna to ESP32-WROOM-32U modules, which is likely to improve the data rate.

Data Transfer Rates

Examining Requirements

At our meeting with Prof Benoit today for possible applications of our project, it was suggested that we look into infra-sound and pressure sensors.

While these are interesting sensors, their sampling rate requirements of 80Hz or more are problematic.

Initially, we envisioned applications like monitoring of temperature, sampling once every minute at most. Assuming 12 bits per sample, over one week this works out to 7 days * 24 hours * 60 minutes * 12 bits ≈ 118kbit of data collected. With the SX1262 supporting LoRa speeds of up to 62.5kbit/s, this is easily transferred within a few seconds.

However, sampling (even just one) sensor at 80Hz is almost two orders of magnitude larger than what we planned for!

If we were to consider even just 12 bits per sample again (a higher resolution is likely necessary for audio/pressure), we are looking at 7 days * 24 hours * 3600 seconds * 80 * 12 bits ≈ 553Mbit of data.

Transferring 553Mbit of data at 62.5kbit/s will take over 9000s of continuous transmission. When we consider that we could be receiving data from multiple sensors at the same time, along with how far a LoRa signal reaches (and the duty cycle limits we should respect), it’s clear that our existing design cannot scale to this level. And we have not even considered protocol overheads for ensuring data integrity.

Alternatives

The next easily accessible protocol that comes to mind is Wi-Fi. We take a close look at the ESP32, a microcontroller from Espressif Systems that comes with a 2.4GHz radio. We have quite a few options here:

Wi-Fi

Espressif’s documentation claims a maximum in-air throughput of 30Mbit/s over UDP. However, these are lab conditions and real-world performance will likely be lower. We also have to consider the overheads of joining a network.

Raw Packets/ESP-NOW

The ESP32 supports transmitting raw 802.11 packets, which leads to some interesting possibilities like receiving packets over a distance of kilometres. Our application is slightly different – we cannot carry such a large antenna on a UAV, and we don’t need 10km of range, but we do need a higher data transfer rate than what was achieved in the video.

ESP-NOW has been empirically tested to achieve rates of up to 460kb/s, but this is assumed to use the default Wi-Fi bit rate of 1Mbit/s. The bit rate is configurable.

Moving Forward

It is quite clear that LoRa will not support such high data transfer rates. Alternatives like Wi-Fi on an ESP32 will support much higher data transfer rates in theory, but we still need to benchmark the different options and evaluate whether the theoretical performance is achievable in our setup.