As mentioned in prior posts, we implemented a protocol based on TFTP, which we call Modified Trivial File Transfer Protocol (mtftp). The diagram above illustrates how it works:
- The Client (collector on the UAV) sends a Read Request (RRQ), which specifies what file (file index) and where in the file (file offset) to transmit. A window size is also transmitted which indicates how many data packets the Server should send before waiting for a response.
- The Server then sends up to window size DATA packets to the Client. The end of file (EOF) is indicated by a partial DATA packet, or by a zero-byte DATA packet if the file size is an exact multiple of the block length.
- The Client then sends an Acknowledge (ACK) packet with the block number of the last block it received in order, that is, the highest block number before any gap.
- This allows for correction of missing blocks: for example, if the Client receives blocks 0, 1, 2, 4, 5, 6, 7 with a window size of 8, it detects that block 3 is missing and sends an ACK for block 2.
- If the Client receives a block shorter than 247 bytes, this indicates the end of file. For example, if the Client receives blocks 0, 1, 2, 3, where block 3 contains 200 bytes, it will not expect any further blocks and sends an ACK for block 3.
- If the final block in a window (i.e. block 7 when window size = 8) is missing, the Client will never know that the window has ended. In this case, both the Client and Server time out after a short delay, terminating the entire transfer.
- If EOF has not been reached, the Server then advances the file offset by the number of bytes acknowledged, then starts a new window with block number 0.
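The Server side of the steps above can be sketched roughly as follows. This is a minimal illustration, not the actual mtftp code: the function names `build_window` and `advance_offset` are hypothetical, and the file is assumed to be held in memory as `bytes`.

```python
BLOCK_LEN = 247  # maximum block length used in this post


def build_window(data: bytes, offset: int, window_size: int) -> list[bytes]:
    """Return the DATA payloads for one window starting at `offset`."""
    blocks = []
    for _ in range(window_size):
        block = data[offset:offset + BLOCK_LEN]
        blocks.append(block)
        offset += len(block)
        if len(block) < BLOCK_LEN:
            # A partial (or zero-byte) block marks EOF; nothing follows it.
            break
    return blocks


def advance_offset(offset: int, blocks: list[bytes], acked_block: int) -> int:
    """Advance the file offset by the bytes in blocks 0..acked_block."""
    return offset + sum(len(b) for b in blocks[:acked_block + 1])
```

For a 512-byte file and a window size of 8, `build_window` yields blocks of 247, 247 and 18 bytes. An ACK for block 1 would move the offset to 494, and the next window would start again at block number 0 from there.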
We use the following terms:
- Block – A single chunk of data in a DATA packet. The maximum length of a block is fixed to 247 bytes in our application since ESPNOW has a maximum payload length of 250 bytes, and we use 3 bytes (1 for opcode, 2 for block number) as a packet header.
- Window – Transmission of up to window size DATA packets before a response is expected from the client.
- Transfer – A sequence of Windows that together carry one file, ending at EOF.
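To make the block size arithmetic concrete, here is a small sketch of how a DATA packet could be packed into the 250-byte payload. The 1-byte opcode and 2-byte block number come from the description above. The little-endian byte order and the opcode value `OP_DATA = 3` (borrowed from TFTP as a placeholder) are assumptions, as are the helper names.

```python
import struct

MAX_PAYLOAD = 250                       # ESPNOW payload limit stated above
HEADER = struct.Struct("<BH")           # opcode (1 byte) + block number (2 bytes); byte order assumed
BLOCK_LEN = MAX_PAYLOAD - HEADER.size   # 250 - 3 = 247
OP_DATA = 3                             # hypothetical opcode value


def pack_data(block_no: int, block: bytes) -> bytes:
    """Build a DATA packet: 3-byte header followed by the block."""
    if len(block) > BLOCK_LEN:
        raise ValueError("block exceeds maximum block length")
    return HEADER.pack(OP_DATA, block_no) + block


def unpack_data(packet: bytes) -> tuple[int, int, bytes]:
    """Split a DATA packet into (opcode, block number, block)."""
    opcode, block_no = HEADER.unpack_from(packet)
    return opcode, block_no, packet[HEADER.size:]
```

A full 247-byte block produces a packet of exactly 250 bytes, which is why the block length cannot be any larger.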
Initial tests demonstrate that the above works in close proximity, but when the Client and Server are far from each other in an urban environment (SPMS Atrium), packet loss is non-trivial. As mentioned above, if the Client receives blocks 0, 1, 2, 4, 5, 6, 7, it sends an ACK for block 2 and the Server starts transmitting the data at block 3 again. This is wasteful: the Client has already received the data for blocks 4, 5, 6 and 7.
The flowchart on the left illustrates the initial implementation of the mtftp client. Solid lines depict flows involving actual packet transmission. Blocks received in order are written to disk (block numbers 0 and 1 in the example at the bottom). However, when a block is missing (block number 2), all subsequent blocks are discarded (block numbers 3, 4, 5, 6, 7, 8) because we cannot save them to the file: there is no guarantee we’ll receive the missing block, and even if we did, we would need a way to mark that block as missing in the meantime. Instead, the client sends an ACK for block 1 (the last in-order block received) and the server resumes transmission at block 2. This is inefficient, especially when the window size is large, say 32 blocks: if block 1 is lost, the 30 blocks after it are discarded even if they were all received successfully.
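The cost of this in-order-only behaviour can be illustrated with a short sketch. It is a simplified model of the client logic described above, not the real implementation: the function name is hypothetical, blocks are assumed to arrive in increasing order, and returning -1 when block 0 itself is missing is an assumption.

```python
def process_window_in_order(received: list[int]) -> tuple[int, int]:
    """Return (acked_block, discarded_count) for one window.

    `received` holds the block numbers that arrived, in arrival order.
    """
    expected = 0
    discarded = 0
    for block_no in received:
        if block_no == expected:
            expected += 1      # in order: would be written to disk
        else:
            discarded += 1     # arrived after a gap: dropped
    # expected - 1 is the last in-order block (-1 if block 0 never arrived)
    return expected - 1, discarded
```

With a window of 32 blocks where only block 1 is lost, `process_window_in_order([0] + list(range(2, 32)))` acknowledges block 0 and discards 30 blocks.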
Our solution is to introduce re-transmission: missing block numbers are stored and subsequent blocks are buffered. This is illustrated by the yellow blocks in the flowchart on the right. At the end of the window, the client transmits an RTX packet containing the missing block numbers. The server then re-transmits these blocks, and the client writes the initially missing blocks along with the buffered blocks to disk. This eliminates the issue described above.
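The bookkeeping behind the RTX step might look something like the sketch below. The helper names and the use of a dictionary as the block buffer are assumptions made for illustration. `last_block` stands for the final block number of the window, which the client is assumed to know once the window has ended.

```python
def find_missing(received: set[int], last_block: int) -> list[int]:
    """Block numbers in 0..last_block that have not arrived (the RTX list)."""
    return [n for n in range(last_block + 1) if n not in received]


def assemble_window(buffered: dict[int, bytes],
                    retransmitted: dict[int, bytes]) -> bytes:
    """Merge buffered and re-transmitted blocks into in-order file data."""
    merged = {**buffered, **retransmitted}
    if sorted(merged) != list(range(len(merged))):
        raise ValueError("window still has missing blocks")
    return b"".join(merged[n] for n in sorted(merged))
```

In the earlier example where blocks 0, 1, 2, 4, 5, 6, 7 arrive, `find_missing` returns `[3]`. Once block 3 has been re-transmitted, `assemble_window` produces all eight blocks in order, so nothing that was already received has to be sent again.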
However, there is still another issue: if the RTX or ACK packets (depicted by the thick black lines in the flowchart on the right) are lost, the communication times out. If the lost packet was an RTX requesting one of the earlier blocks in the window, most of the window is discarded again. A possible fix is for the client to re-transmit the RTX or ACK if communication times out at this point. However, a quick implementation proved to be buggy and was reverted. We will revisit this issue later, as we would like to implement other functionality first.
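For reference, the general shape of such a retry could be sketched as a bounded loop. This is not the implementation that was reverted, only an illustration of the idea: `send` and `wait_for_reply` are hypothetical callbacks, and the timeout and attempt count are arbitrary placeholder values.

```python
from typing import Callable, Optional


def send_with_retry(send: Callable[[], None],
                    wait_for_reply: Callable[[float], Optional[bytes]],
                    timeout_s: float = 0.1,
                    max_attempts: int = 3) -> Optional[bytes]:
    """Send an RTX or ACK, re-sending if no reply arrives in time.

    Returns the first reply received, or None if every attempt timed out.
    """
    for _ in range(max_attempts):
        send()
        reply = wait_for_reply(timeout_s)  # None signals a timeout
        if reply is not None:
            return reply
    return None
```

A real version would also need the server to tolerate receiving the same RTX or ACK more than once, which this sketch does not address.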