
aws sagemaker training pipe mode reading random number of bytes

I am using my own algorithm and loading data in JSON format from S3. Because of the huge size of the data, I need to set up Pipe mode. I have followed the instructions given in https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/pipe_bring_your_own/train.py. As a result, I am able to set up the pipe and read data successfully. The only problem is that the FIFO pipe is not reading the specified number of bytes. For example, given the path to the S3 FIFO channel:

    number_of_bytes_to_read = 555444333
    with open(fifo_path, "rb", buffering=0) as fifo:
        while True:
            data = fifo.read(number_of_bytes_to_read)
            if not data:  # EOF: the channel is exhausted
                break

The length of `data` should be 555444333 bytes, but it is always less, around 12,123,123 bytes or so. The data in S3 looks like this:

s3://s3-bucket/1122/part1.json
s3://s3-bucket/1122/part2.json
s3://s3-bucket/1133/part1.json
s3://s3-bucket/1133/part2.json

and so on. Is there any way to enforce the number of bytes to be read? Any suggestion would be helpful. Thanks.
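For reference, a single `read(n)` on a pipe may legitimately return fewer than `n` bytes; it does not wait for the full count. A minimal sketch of a helper that loops until exactly `n` bytes (or EOF) arrive could look like this (`read_exactly` is a hypothetical name, not part of the SageMaker example):

```python
import io

def read_exactly(stream, n):
    """Read exactly n bytes from a stream, or fewer only at EOF.

    A single read(n) on a pipe may return fewer than n bytes, so we
    accumulate chunks until the requested count is reached or the
    stream ends.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # EOF reached before n bytes arrived
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

This trades one large read for a loop of smaller ones, but guarantees the caller sees `n` bytes whenever the channel still has that much data.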

Asked Nov 21 '25 by AbdulRehmanLiaqat
1 Answer

We just needed to pass a positive value for `buffering`, and the problem was solved. The code below buffers 555444333 bytes and then processes 111222333 bytes on each read. Since our files are JSON, we can convert the incoming bytes to a string and clean it by removing incomplete JSON fragments. The final code looks like:

    number_of_bytes_to_read = 111222333
    number_of_bytes_to_buffer = 555444333
    with open(fifo_path, "rb", buffering=number_of_bytes_to_buffer) as fifo:
        while True:
            data = fifo.read(number_of_bytes_to_read)
            if not data:  # EOF: the channel is exhausted
                break
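The "removing incomplete JSON parts" step can be sketched as follows, assuming the records are newline-delimited JSON (one object per line); the function name `iter_json_records` and the line-based framing are assumptions, not something the answer specifies:

```python
import io
import json

def iter_json_records(fifo, chunk_size):
    """Yield parsed JSON objects from a byte stream of newline-delimited JSON.

    Each read may end in the middle of a record, so the trailing
    partial line is carried over and prepended to the next chunk
    instead of being parsed prematurely.
    """
    leftover = b""
    while True:
        data = fifo.read(chunk_size)
        if not data:  # EOF
            break
        buf = leftover + data
        lines = buf.split(b"\n")
        leftover = lines.pop()  # possibly incomplete last line
        for line in lines:
            if line.strip():
                yield json.loads(line)
    if leftover.strip():  # final record without a trailing newline
        yield json.loads(leftover)
```

With this framing, the exact byte count returned by each `read` no longer matters: records are reassembled across chunk boundaries before parsing.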
Answered Nov 22 '25 by AbdulRehmanLiaqat

