Incorrect usage of sys.getsizeof to calculate the byte size of event data #236
Description
The trigger_batch and trigger functions in the library use sys.getsizeof(event['data']) to measure the size of the event data. However, sys.getsizeof() returns the size of the object in memory, which includes interpreter overhead and does not accurately represent the byte size of the data when it is encoded for transmission over HTTP. This can lead to inconsistencies and false positives when checking against the 10KB limit, resulting in ValueError: Too much data exceptions even when the data is within acceptable limits.
pusher-http-python/pusher/pusher_client.py, lines 117 to 143 at 239d67b
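The check in question compares the in-memory object size against the limit; paraphrased (not the exact source, see the permalink above), it amounts to:

```python
import sys

# Paraphrase of the problematic check; the library's exact constant and wording may differ.
def check_event_size(event):
    if sys.getsizeof(event['data']) > 10240:   # in-memory size, not the encoded payload size
        raise ValueError("Too much data")
```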
Steps to Reproduce:
Modify the trigger_batch function to log both sys.getsizeof(event['data']) and the UTF-8 encoded byte length just before the size check, then trigger a batch in which one event's data contains a non-ASCII character.
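A standalone sketch of that comparison (the batch contents, channel and event names are made up for illustration; the same two values can be logged inside trigger_batch):

```python
import sys

# Illustrative events of the kind passed to trigger_batch.
batch = [
    {"channel": "test-channel", "name": "event-1", "data": "a" * 500},
    {"channel": "test-channel", "name": "event-2", "data": "a" * 9000},
    {"channel": "test-channel", "name": "event-3", "data": "a" * 9000 + "≥"},
]

for event in batch:
    mem_size = sys.getsizeof(event["data"])          # what the library currently checks
    wire_size = len(event["data"].encode("utf-8"))   # what is actually sent over HTTP
    verdict = "rejected ('Too much data')" if mem_size > 10240 else "accepted"
    print(f"{event['name']}: getsizeof={mem_size}  utf8_bytes={wire_size}  -> {verdict}")
```

On CPython 3.x the first two events report a getsizeof close to their UTF-8 size, but the third reports roughly 18 KB against 9003 encoded bytes and is rejected even though its payload is well under the 10KB limit.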
Notice how the result of sys.getsizeof and the UTF-8 encoded byte size are drastically different for the last event just because it contains one non-ASCII character (≥).
Expected Behavior:
The function should allow sending event data that is under the 10KB limit when encoded, without raising an exception.
Actual Behavior:
A ValueError is raised stating "Too much data" even when the actual encoded data size is under 10KB.
Analysis:
Using sys.getsizeof() is not reliable for measuring the size of data to be sent over the network. It measures the memory footprint of the object in Python, which includes additional overhead and does not correspond to the actual size of the encoded data. Here is some more proof of how sys.getsizeof can be wildly inaccurate for calculating the byte size of data:
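For example (the sample strings are illustrative, and the exact sys.getsizeof figures depend on the CPython version and build):

```python
import sys

samples = {
    "100 ASCII chars":            "a" * 100,
    "100 '≥' chars":              "≥" * 100,
    "9000 ASCII chars + one '≥'": "a" * 9000 + "≥",
}

for label, s in samples.items():
    print(f"{label:<28} getsizeof={sys.getsizeof(s):>6}  utf8_bytes={len(s.encode('utf-8')):>6}")
```

A single character outside the ASCII range doubles the string's internal per-character storage on CPython, so getsizeof roughly doubles while the UTF-8 size barely changes.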
Proposed Solution:
Replace the size check using sys.getsizeof(event['data']) with len(event['data'].encode('utf-8')) to accurately measure the byte size of the data once it is encoded.
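A minimal sketch of the adjusted check, assuming the data is already a str (the function name and constant are illustrative, not the library's exact code):

```python
def check_event_size(event, limit_bytes=10 * 1024):
    """Reject event data whose UTF-8 encoded size exceeds the limit."""
    data = event["data"]
    encoded = data.encode("utf-8") if isinstance(data, str) else data
    if len(encoded) > limit_bytes:
        raise ValueError("Too much data")
```

Measuring len() of the encoded bytes matches what is actually transmitted, so ASCII and non-ASCII payloads are treated consistently.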
Additional Information: