Match against all 8 PNG file signature bytes #190
sudhir-b wants to merge 2 commits into h2non:master
I wrote the APNG matcher and I forgot(?) to update the PNG matcher.
Done, thanks for looking. Is there anything else I need to do?
Looks good to me.
    if (len(buf) > 8 and
            buf[:8] == bytearray([0x89, 0x50, 0x4e, 0x47,
                                  0x0d, 0x0a, 0x1a, 0x0a])):
    -       buf[0] == 0x89 and
startswith() is the quickest ;)
Hmm, indeed (Python 3.12.5, Windows 10; I don't know how these tests would perform on older versions of Python):
python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf[:8] == bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a])"
1000000 loops, best of 5: 232 nsec per loop
python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4E and buf[3] == 0x47 and buf[4] == 0x0D and buf[5] == 0x0A and buf[6] == 0x1A and buf[7] == 0x0A"
1000000 loops, best of 5: 208 nsec per loop
python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf.startswith(b'\x89PNG\r\n\x1a\n')"
5000000 loops, best of 5: 93.2 nsec per loop
And some "what if" testing:
python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf.startswith(bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a]))"
1000000 loops, best of 5: 251 nsec per loop
python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf[:8] == b'\x89PNG\r\n\x1a\n'"
5000000 loops, best of 5: 44.5 nsec per loop
python -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf[0:1] == b'\x89' and buf[1:2] == b'P' and buf[2:3] == b'N' and buf[3:4] == b'G' and buf[4:5] == b'\r' and buf[5:6] == b'\n' and buf[6:7] == b'\x1a' and buf[7:8] == b'\n'"
1000000 loops, best of 5: 398 nsec per loop
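Before reading too much into these timings, it may be worth confirming that the variants really are interchangeable. The sketch below wraps each expression timed above as a function and checks that they all give the same answer on a fully matching buffer, a buffer where only the first four bytes match, and a non-matching buffer. The names (SIG, CANDIDATES, SAMPLES, check_agreement) are made up for illustration and are not part of filetype.

```python
# The signature checks timed in this thread, wrapped as functions so their
# results can be compared on the same inputs.
SIG = b'\x89PNG\r\n\x1a\n'  # same bytes as b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a'

CANDIDATES = {
    "slice == bytearray": lambda buf: buf[:8] == bytearray(
        [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]),
    "per-byte ints": lambda buf: (
        buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4E
        and buf[3] == 0x47 and buf[4] == 0x0D and buf[5] == 0x0A
        and buf[6] == 0x1A and buf[7] == 0x0A),
    "startswith(bytes)": lambda buf: buf.startswith(SIG),
    "startswith(bytearray)": lambda buf: buf.startswith(bytearray(SIG)),
    "slice == bytes": lambda buf: buf[:8] == SIG,
    "per-byte slices": lambda buf: (
        buf[0:1] == b'\x89' and buf[1:2] == b'P' and buf[2:3] == b'N'
        and buf[3:4] == b'G' and buf[4:5] == b'\r' and buf[5:6] == b'\n'
        and buf[6:7] == b'\x1a' and buf[7:8] == b'\n'),
}

# Full match, partial match (first 4 bytes only), and no match.
SAMPLES = {
    "full match": b'\x89PNG\r\n\x1a\n',
    "first half matches": b'\x89PNGFOOBARFOOBAR',
    "no match": b'FOOBARFOOBAR',
}


def check_agreement():
    """Assert every candidate gives the same answer per sample; return the answers."""
    results = {}
    for name, buf in SAMPLES.items():
        answers = {label: bool(check(buf)) for label, check in CANDIDATES.items()}
        assert len(set(answers.values())) == 1, (name, answers)
        results[name] = answers["slice == bytes"]
    return results


if __name__ == "__main__":
    print(check_agreement())
    # {'full match': True, 'first half matches': False, 'no match': False}
```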
Try b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a' instead of bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a]).
$ python3 -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf.startswith(bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a]))"
2000000 loops, best of 5: 173 nsec per loop
$ python3 -m timeit -s "buf = b'\x89PNG\r\n\x1a\n'" "buf.startswith(b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a')"
5000000 loops, best of 5: 60 nsec per loop
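One likely explanation for the gap between these two timings (an assumption, not something measured in the thread): a bytes literal is an immutable constant that the compiler stores once in the function's code object, while bytearray([...]) is a call that builds a new list and a new bytearray every time the expression runs. The two function names below are hypothetical; dis.dis prints their bytecode so the difference can be inspected directly.

```python
import dis


def with_bytearray(buf):
    # `bytearray` is looked up as a global and called on every invocation,
    # allocating a fresh list and a fresh 8-byte bytearray each time.
    return buf.startswith(bytearray([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]))


def with_bytes(buf):
    # A bytes literal is immutable, so it is stored once as a constant
    # in the code object and simply loaded on each call.
    return buf.startswith(b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a')


if __name__ == "__main__":
    dis.dis(with_bytearray)  # includes a load of the global name `bytearray` and a call
    dis.dis(with_bytes)      # includes LOAD_CONST with the signature bytes
```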
Perhaps the speedup comes not from the comparison function but from the data conversion:
$ python3 -m timeit -s "s=0; buf = b'\x89PNG\r\n\x1a\n' + b'x'*100000" "for i in range(100000): s+=1 if buf[0] == b'\x89' and buf[1] == b'P' and buf[2] == b'N' and buf[3] == b'G' and buf[4] == b'\r' and buf[5] == b'\n' and buf[6] == b'\x1a' and buf[7] == b'\n' else 0"
100 loops, best of 5: 3.65 msec per loop
$ python3 -m timeit -s "s=0; buf = b'\x89PNG\r\n\x1a\n' + b'x'*100000" "for i in range(100000): s+=1 if buf[0:1] == b'\x89' and buf[1:2] == b'P' and buf[2:3] == b'N' and buf[3:4] == b'G' and buf[4:5] == b'\r' and buf[5:6] == b'\n' and buf[6:7] == b'\x1a' and buf[7:8] == b'\n' else 0"
10 loops, best of 5: 28.5 msec per loop
$ python3 -m timeit -s "s=0; buf = b'\x89PNG\r\n\x1a\n' + b'x'*100000" "for i in range(100000): s+=1 if buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4e and buf[3] == 0x47 and buf[4] == 0x0d and buf[5] == 0x0a and buf[6] == 0x1a and buf[7] == 0x0a else 0"
20 loops, best of 5: 13.7 msec per loop
b'\x89' compares faster than 0x89 (an integer). I can't say how older versions of Python behave.
I was curious how "buf.startswith(bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a]))" would perform; that's why I tested it.
Here are some more tests. Only the first half of the bytes match:
python -m timeit -s "buf = b'\x89PNGFOOBARFOOBAR'" "buf[:8] == bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a])"
1000000 loops, best of 5: 242 nsec per loop
python -m timeit -s "buf = b'\x89PNGFOOBARFOOBAR'" "buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4E and buf[3] == 0x47 and buf[4] == 0x0D and buf[5] == 0x0A and buf[6] == 0x1A and buf[7] == 0x0A"
2000000 loops, best of 5: 138 nsec per loop
python -m timeit -s "buf = b'\x89PNGFOOBARFOOBAR'" "buf.startswith(b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a')"
5000000 loops, best of 5: 94.8 nsec per loop
python -m timeit -s "buf = b'\x89PNGFOOBARFOOBAR'" "buf[:8] == b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a'"
5000000 loops, best of 5: 73.9 nsec per loop
Zero bytes match:
python -m timeit -s "buf = b'FOOBARFOOBAR'" "buf[:8] == bytearray([0x89, 0x50, 0x4e, 0x47,0x0d, 0x0a, 0x1a, 0x0a])"
1000000 loops, best of 5: 244 nsec per loop
python -m timeit -s "buf = b'FOOBARFOOBAR'" "buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4E and buf[3] == 0x47 and buf[4] == 0x0D and buf[5] == 0x0A and buf[6] == 0x1A and buf[7] == 0x0A"
10000000 loops, best of 5: 33.5 nsec per loop
python -m timeit -s "buf = b'FOOBARFOOBAR'" "buf.startswith(b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a')"
5000000 loops, best of 5: 95.3 nsec per loop
python -m timeit -s "buf = b'FOOBARFOOBAR'" "buf[:8] == b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a'"
5000000 loops, best of 5: 66.5 nsec per loop
buf[:8] == b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a' is faster than buf.startswith(b'\x89\x50\x4e\x47\x0d\x0a\x1a\x0a'), and the byte-by-byte comparison is fastest only when the first byte does not match; otherwise it is slower than these two matchers (at least on my PC).
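The numbers in this thread come from only a couple of machines (one of them Python 3.12.5 on Windows 10), so the ranking may differ on other interpreters. A script along these lines could re-run the three-input comparison using only the standard-library timeit module; the names STATEMENTS, SAMPLES and bench are made up for this sketch. Like `python -m timeit -s "buf = ..."`, it puts the buffer in the setup code and keeps the best of several repeats.

```python
import timeit

# Source text of the signature literal, so the statements contain a constant
# exactly as in the command-line runs.
SIG_SRC = repr(b'\x89PNG\r\n\x1a\n')

STATEMENTS = {
    "per-byte ints": (
        "buf[0] == 0x89 and buf[1] == 0x50 and buf[2] == 0x4E and buf[3] == 0x47 and "
        "buf[4] == 0x0D and buf[5] == 0x0A and buf[6] == 0x1A and buf[7] == 0x0A"
    ),
    "startswith(bytes)": f"buf.startswith({SIG_SRC})",
    "slice == bytes": f"buf[:8] == {SIG_SRC}",
}

SAMPLES = {
    "full match": b'\x89PNG\r\n\x1a\n',
    "first half matches": b'\x89PNGFOOBARFOOBAR',
    "no match": b'FOOBARFOOBAR',
}


def bench(number=1_000_000, repeat=5):
    """Return {(sample, statement): best time per loop in nanoseconds}."""
    results = {}
    for sample, buf in SAMPLES.items():
        for label, stmt in STATEMENTS.items():
            # buf is created once in setup; stmt then runs `number` times per repeat.
            times = timeit.repeat(stmt, setup=f"buf = {buf!r}",
                                  number=number, repeat=repeat)
            results[(sample, label)] = min(times) / number * 1e9
    return results


if __name__ == "__main__":
    for (sample, label), ns in bench().items():
        print(f"{sample:>18} | {label:<17} | {ns:7.1f} ns")
```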
Oops, I didn't see your message when posting this one.
buf[0] == b'\x89' and buf[1] == b'P' and ... will not work because the types differ: indexing bytes returns an int, so each comparison is int == bytes, which is always False:
>>> buf = b'\x89PNG\r\n\x1a\n'
>>> buf[0] == b'\x89'
False
>>> type(buf[0])
<class 'int'>
>>> type(buf[0:1])
<class 'bytes'>
>>> type(0x89)
<class 'int'>
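This type mismatch probably also explains the fast 3.65 msec result earlier: since buf[0] == b'\x89' is always False, the `and` chain stops after the very first comparison, so that benchmark never checked the remaining bytes. A minimal sketch of the three first-byte checks side by side (the function names are hypothetical):

```python
PNG = b'\x89PNG\r\n\x1a\n'


def first_byte_int_vs_bytes(buf):
    # buf[0] is an int (137), so this is int == bytes: always False,
    # whatever the buffer contains.
    return buf[0] == b'\x89'


def first_byte_int(buf):
    # int == int, as in the per-byte checks using 0x89, 0x50, ...
    return buf[0] == 0x89


def first_byte_slice(buf):
    # bytes == bytes: a one-byte slice keeps the bytes type.
    return buf[0:1] == b'\x89'


if __name__ == "__main__":
    for check in (first_byte_int_vs_bytes, first_byte_int, first_byte_slice):
        print(check.__name__, check(PNG))
    # first_byte_int_vs_bytes False
    # first_byte_int True
    # first_byte_slice True
```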
Update the PNG matcher to match all 8 bytes of the PNG signature as in:
https://www.w3.org/TR/png/#5PNG-file-signature
I'm not sure this makes any functional difference, but I spotted a discrepancy and thought I'd open this PR for completeness.
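For reference, a full 8-byte check can be written without a separate length test, because slicing never raises on short input. This is a standalone sketch, not filetype's actual matcher class; PNG_SIGNATURE and looks_like_png are hypothetical names.

```python
# The full 8-byte PNG signature (see the W3C PNG specification linked above).
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def looks_like_png(buf):
    """Return True if buf (bytes or bytearray) starts with the PNG signature."""
    # A buffer shorter than 8 bytes yields a shorter slice, which can never
    # equal the 8-byte signature, so no explicit len() check is needed.
    return buf[:8] == PNG_SIGNATURE
```

With all 8 bytes compared, a buffer that only shares the first four bytes, such as b'\x89PNGFOOBARFOOBAR', is rejected. The slice comparison was also among the fastest variants measured in the discussion.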
As an aside, it looks like the APNG matcher matches against an identical bytearray to begin with. Is there a reason for this difference between the two matchers?