
Reading and processing data from an S3 stream

I’d like to read data from a large file (on the order of gigabytes) in S3 and process it on the fly, as opposed to loading the entire file into memory or caching it locally. In some cases the processing is lengthy and can stall the reading process, so the connection used to stream the data may sit idle for several minutes or longer. Below is a contrived example that demonstrates this:

// Open a streaming GET for the object (AWS SDK for Java 2.x)
InputStream readStream = s3Client.getObject(
        GetObjectRequest.builder().bucket(bucketLocation).key(fileLocation).build());
readStream.readNBytes(100);   // Read the first 100 bytes
Thread.sleep(600000);         // Simulate slow processing: wait for 10 minutes
readStream.readAllBytes();    // Throws SocketException

In this example, the second read attempt throws java.net.SocketException: Connection reset.

I’ve made several attempts to configure the HttpClient to keep the connection open, including the following configuration:

S3Client s3Client = S3Client.builder()
        .httpClient(
            ApacheHttpClient.builder()
                .maxConnections(100)
                .tcpKeepAlive(Boolean.TRUE)
                .connectionTimeToLive(Duration.ofHours(1))
                .connectionMaxIdleTime(Duration.ofHours(1))
                .socketTimeout(Duration.ofHours(1))
                .connectionTimeout(Duration.ofHours(1))
                .build())
        .region(region)
        .credentialsProvider(awsCredentials)
        .build();

Unfortunately, none of these settings seems to resolve this particular problem. Is there anything else I’m missing here? Or is this design inherently flawed?

a63312001 asked Mar 19 '26 23:03



1 Answer

Consider using the Amazon S3 Transfer Manager, rather than the standard client, to work with large Amazon S3 objects. The client class is S3TransferManager.

The Amazon S3 Transfer Manager is an open source, high level file transfer utility for the AWS SDK for Java 2.x. Use it to transfer files and directories to and from Amazon S3.

See the documentation, including examples, in the AWS SDK Java V2 Developer Guide.

Amazon S3 Transfer Manager
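If the processing must stay on the fly, one possible workaround (not part of the answer above, and only a sketch) is to avoid holding a single connection idle: request the object in fixed-size byte ranges, so that each GET finishes before the slow processing starts. The example below only computes the Range header values. The class name RangePlanner and the chunk size are hypothetical, and the SDK call in the comment assumes the s3Client, bucketLocation, and fileLocation names from the question.

```java
import java.util.ArrayList;
import java.util.List;

class RangePlanner {

    /**
     * Splits an object of objectSize bytes into HTTP Range header values
     * ("bytes=start-end", both ends inclusive) of at most chunkSize bytes each.
     */
    static List<String> rangeHeaders(long objectSize, long chunkSize) {
        if (objectSize < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("objectSize must be >= 0 and chunkSize > 0");
        }
        List<String> ranges = new ArrayList<>();
        for (long start = 0; start < objectSize; start += chunkSize) {
            // The last chunk may be shorter than chunkSize.
            long end = Math.min(start + chunkSize, objectSize) - 1;
            ranges.add("bytes=" + start + "-" + end);
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Tiny sizes for illustration; a real object size could come from
        // s3Client.headObject(...).contentLength().
        for (String range : rangeHeaders(10, 4)) {
            System.out.println(range);
            // Assumed usage with the AWS SDK for Java 2.x (not executed here):
            // byte[] chunk = s3Client.getObjectAsBytes(b -> b
            //         .bucket(bucketLocation).key(fileLocation).range(range))
            //     .asByteArray();
            // ... process chunk, however long that takes ...
        }
    }
}
```

Because each range is a separate, short-lived request, a long pause between chunks does not leave an open stream waiting on the server. The trade-off is one request per chunk, so the chunk size should be large enough to keep the request count reasonable.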

smac2020 answered Mar 22 '26 13:03
