Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protocol gets out of sequence if the client misses packets. #30

Open
fifteenhex opened this issue Aug 27, 2020 · 2 comments
Open

Protocol gets out of sequence if the client misses packets. #30

fifteenhex opened this issue Aug 27, 2020 · 2 comments

Comments

@fifteenhex
Copy link
Contributor

I'm fixing up ethernet on a little embedded thing and I noticed this:

The server sends block 10. The client doesn't see it so it sends an ack for block 9 again expecting get block 10 resent
but gets 11 instead.

Packet dumps:

19:10:13.194542 70:85:c2:8e:66:d7 (oui Unknown) > be:e8:98:09:13:90 (oui Unknown), ethertype IPv4 (0x0800), length 558: (tos 0x0, ttl 64, id 1798, offset 0, flags [DF], proto UDP (17), length 544)
    192.168.3.235.tftp > 192.168.3.245.1028:  516 DATA block 10
        0x0000:  bee8 9809 1390 7085 c28e 66d7 0800 4500  ......p...f...E.
        0x0010:  0220 0706 4000 4011 a896 c0a8 03eb c0a8  ....@.@.........
        0x0020:  03f5 0045 0404 020c 8b4e 0003 000a 0020  ...E.....N......
        0x0030:  521a c9f8 3820 d9f8 3820 120c 1204 c9f8  R...8...8.......
        0x0040:  3820 d9f8 3820 c9f8 c820 7047 fff7 e7bf  8...8.....pG....
        0x0050:  7047 d9f8 c830 41ea 0051 41f4 4061 43f8  pG...0A..QA.@aC.
        0x0060:  2010 7047 08b5 d9f8 0020 d9f8 0430 db07  ..pG.........0..
        0x0070:  0ad5 02eb c002 906c 000d d36c 916c 1b0d  .......l...l.l..
        0x0080:  03eb 1153 8342 00d8 08bd 1e21 fff7 e1ff  ...S.B.....!....
        0x0090:  0130 f2e7 11ee 103f 43f4 8053 01ee 103f  .0.....?C..S...?
        0x00a0:  bff3 6f8f 7047 11ee 103f 11ee 103f 23f4  ..o.pG...?...?#.
        0x00b0:  8053 01ee 103f bff3 6f8f 7047 11ee 100f  .S...?..o.pG....
        0x00c0:  c0f3 0030 7047 10b5 11ee 104f 14f0 0104  ...0pG.....O....
        0x00d0:  27d1 fff7 f3f9 2046 1221 0134 fff7 b9ff  '......F.!.4....
        0x00e0:  b4f5 805f f7d1 0020 fff7 bcff 0023 02ee  ..._.........#..
        0x00f0:  503f d9f8 c830 23f4 7f53 23f0 3f03 43f0  P?...0#..S#.?.C.
        0x0100:  5903 02ee 103f 4ff0 ff33 03ee 103f fff7  Y....?O..3...?..
        0x0110:  9fff 11ee 103f 43f0 0103 01ee 103f bff3  .....?C......?..
        0x0120:  6f8f 11ee 103f 43f0 0403 01ee 103f bff3  o....?C......?..
        0x0130:  6f8f 10bd 10b5 11ee 103f 5b07 09d5 11ee  o........?[.....
        0x0140:  104f fff7 acf9 24f0 0504 01ee 104f bff3  .O....$......O..
        0x0150:  6f8f 10bd 11ee 100f c0f3 8000 7047 0020  o...........pG..
        0x0160:  7047 4ff0 8062 0020 c9f8 3c20 7047 ff22  pGO..b....<.pG."
        0x0170:  024b 1a80 7922 1a80 fee7 b81c 001f 08b5  .K..y"..........
        0x0180:  fff7 88ff bde8 0840 fff7 9dbf 11ee 300f  [email protected].
        0x0190:  40f0 4000 01ee 300f 7047 bff3 5f8f 0022  @[email protected].._.."
        0x01a0:  074b 1a80 bff3 5f8f 0122 1a80 054a 1388  .K...._.."...J..
        0x01b0:  9bb2 bff3 5f8f db04 f9d5 7047 00bf 1444  ...._.....pG...D
        0x01c0:  201f 4044 201f fff7 e8bf fff7 e6bf 0d4b  [email protected]
        0x01d0:  1b78 d92b 13d0 05d8 ae2b 12d0 c22b 06d0  .x.+.....+...+..
        0x01e0:  0020 7047 ee2b 14bf 0020 0520 7047 064b  ..pG.+......pG.K
        0x01f0:  9b8a b3f5 886f 0cbf 0320 0220 7047 0420  .....o......pG..
        0x0200:  7047 0120 7047 003c 001f 0040 001f 08b5  pG..pG.<...@....
        0x0210:  0ef0 8ffe 4ff4 f072 024b 0020 1a60 08bd  ....O..r.K...`..
        0x0220:  00bf 0020 0016 08b5 0246 fff7 d0ff       .........F....
19:10:18.194645 be:e8:98:09:13:90 (oui Unknown) > 70:85:c2:8e:66:d7 (oui Unknown), ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 255, id 47, offset 0, flags [DF], proto UDP (17), length 32)
    192.168.3.245.1028 > 192.168.3.235.tftp:  4 ACK block 9
        0x0000:  7085 c28e 66d7 bee8 9809 1390 0800 4500  p...f.........E.
        0x0010:  0020 002f 4000 ff11 f26c c0a8 03f5 c0a8  .../@....l......
        0x0020:  03eb 0404 0045 000c 0000 0004 0009 0000  .....E..........
        0x0030:  0000 0000 0000 0000 0000 0000            ............
19:10:18.194816 70:85:c2:8e:66:d7 (oui Unknown) > be:e8:98:09:13:90 (oui Unknown), ethertype IPv4 (0x0800), length 558: (tos 0x0, ttl 64, id 2225, offset 0, flags [DF], proto UDP (17), length 544)
    192.168.3.235.tftp > 192.168.3.245.1028:  516 DATA block 11
        0x0000:  bee8 9809 1390 7085 c28e 66d7 0800 4500  ......p...f...E.
        0x0010:  0220 08b1 4000 4011 a6eb c0a8 03eb c0a8  ....@.@.........
        0x0020:  03f5 0045 0404 020c 8b4e 0003 000b 0138  ...E.....N.....8
        0x0030:  0528 13d8 dfe8 00f0 030c 0c10 100e 0849  .(.............I
        0x0040:  1046 20f0 85f9 0038 18bf 0120 4042 08bd  .F.....8....@B..
        0x0050:  0549 f5e7 0549 f3e7 0549 f1e7 4ff0 ff30  .I...I...I..O..0
        0x0060:  f5e7 4574 0220 4f74 0220 5974 0220 6374  ..Et..Ot..Yt..ct
        0x0070:  0220 08b5 1ef0 93fb 0020 08bd 0000 2de9  ..............-.
        0x0080:  f04d 2b4b 8ab0 1b78 0622 8df8 1030 294b  .M+K...x."...0)K

Server output:

ERROR(tftpd): Got ACK with incoherent data packet number. Aborting transfer.
@rgov
Copy link

rgov commented Oct 9, 2020

The serveACK method in tftpserver.py handles this, but the implementation leaves room for improvement, in fact there is a # TODO comment that it is incomplete.

My quick fix was to edit __next_send in state.py to save the file position before the file is read:

self.last_pos = self.file.tell()
fromfile = self.file.read(blksize - len(self.tosend))

In serveACK, I changed the logic to:

            if peer_state.packetnum == num:
                # Got the ACK we expected, carry on
                pass
            elif peer_state.packetnum == num + 1:
                # Got another ACK for the previous packet, send it again
                peer_state.file.seek(peer_state.last_pos, os.SEEK_SET)
                peer_state.packetnum = num
            else:
                # TODO: handle ACK recovery when operating in streaming mode.
                peer_state.state = state.STATE_ERROR
                peer_state.error = proto.ERROR_ILLEGAL_OP
                l.error('Got ACK with incoherent data packet number. '
                        'Aborting transfer.',
                        extra=peer_state.extra(notify.TRANSFER_FAILED))

To understand this a bit better, num is the incoming ACK number and peer_state.packetnum is the last packet number we sent. We expect peer_state.packetnum to equal num in the normal case.

However if we receive the same ACK twice, we want to rewind the file back to the saved last_pos and do another read. The current code simply ignores it and carries on sending the next block even though the client never acknowledged receiving the last one.

I don't know how you are supposed to recover from a bad ACK when a windowsize option has been set, but thankfully my client doesn't set it, so this fix works for me. I also don't know if I handled the wraparound case properly.

@mpetazzoni Could you please consider whether this is a better solution than the current behavior?

@semirke
Copy link

semirke commented Oct 10, 2024

The serveACK method in tftpserver.py handles this, but the implementation leaves room for improvement, in fact there is a # TODO comment that it is incomplete.

My quick fix was to edit __next_send in state.py to save the file position before the file is read:

self.last_pos = self.file.tell()
fromfile = self.file.read(blksize - len(self.tosend))

In serveACK, I changed the logic to:

            if peer_state.packetnum == num:
                # Got the ACK we expected, carry on
                pass
            elif peer_state.packetnum == num + 1:
                # Got another ACK for the previous packet, send it again
                peer_state.file.seek(peer_state.last_pos, os.SEEK_SET)
                peer_state.packetnum = num
            else:
                # TODO: handle ACK recovery when operating in streaming mode.
                peer_state.state = state.STATE_ERROR
                peer_state.error = proto.ERROR_ILLEGAL_OP
                l.error('Got ACK with incoherent data packet number. '
                        'Aborting transfer.',
                        extra=peer_state.extra(notify.TRANSFER_FAILED))

To understand this a bit better, num is the incoming ACK number and peer_state.packetnum is the last packet number we sent. We expect peer_state.packetnum to equal num in the normal case.

However if we receive the same ACK twice, we want to rewind the file back to the saved last_pos and do another read. The current code simply ignores it and carries on sending the next block even though the client never acknowledged receiving the last one.

I don't know how you are supposed to recover from a bad ACK when a windowsize option has been set, but thankfully my client doesn't set it, so this fix works for me. I also don't know if I handled the wraparound case properly.

@mpetazzoni Could you please consider whether this is a better solution than the current behavior?

I think Im in love with you.

Thanks! :)
Seems to work. My client also sets block size and an - I guess - unreal widow size, too.
But your patch seems to work so far. I cannot be 100% sure as the event came up randomly, but has been ok for the last 10 tests in a row.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants