I wrote the code below to simulate upload to S3 from Lazy ByteString (which will be received over the network socket. Here, we simulate by reading from a file of size ~100MB). The problem with the code below is that it seems to be forcing the read of entire file into memory instead of chunking it (cbytes) - will appreciate pointers on why chunking is not working:
import Control.Lens
import Network.AWS
import Network.AWS.S3
import Network.AWS.Data.Body
import System.IO
import Data.Conduit (($$+-))
import Data.Conduit.Binary (sinkLbs,sourceLbs)
import qualified Data.Conduit.List as CL (mapM_)
import Network.HTTP.Conduit (responseBody,RequestBody(..),newManager,tlsManagerSettings)
import qualified Data.ByteString.Lazy as LBS
example :: IO PutObjectResponse
example = do
-- To specify configuration preferences, newEnv is used to create a new Env. The Region denotes the AWS region requests will be performed against,
-- and Credentials is used to specify the desired mechanism for supplying or retrieving AuthN/AuthZ information.
-- In this case, Discover will cause the library to try a number of options such as default environment variables, or an instance's IAM Profile:
e <- newEnv NorthVirginia Discover
-- A new Logger to replace the default noop logger is created, with the logger set to print debug information and errors to stdout:
l <- newLogger Debug stdout
-- The payload for the S3 object is retrieved from a file that simulates lazy bytestring received over network
inb <- LBS.readFile "out"
lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (sourceLbs inb)
-- We now run the AWS computation with the overriden logger, performing the PutObject request:
runResourceT . runAWS (e & envLogger .~ l) $
send ((putObject "yourtestenv-change-it-please" "testbucket/test" cbytes) & poContentType .~ Just "text; charset=UTF-8")
main = example >> return ()
Running the executable with RTS -s option shows that entire thing is read into memory (~113MB maximum residency - I did see ~87MB once). On the other hand, if I use chunkedFile, it is chunked correctly (~10MB maximum residency).
It's clear this bit
inb <- LBS.readFile "out"
lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (sourceLbs inb)
should be rewritten as
lenb <- System.IO.withFile "out" ReadMode hFileSize -- evaluates to 104857600 (100MB)
let cbytes = toBody $ ChunkedBody (1024*128) (fromIntegral lenb) (C.sourceFile "out")
As you wrote it, the purpose of conduits is defeated. The entire file would need to be accumulated by LBS.readFile, but then broken apart chunk by chunk when fed to sourceLBS. (If lazy IO is working right, this might not happen.) sourceFile reads the file incrementally, chunk by chunk. It may be that, e.g. toBody accumulates the whole file, in which case the point of conduits is defeated at a different point. Glancing at the source for send and so on I can't see anything that would do this, though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With