I am trying to write a large dataset (10 cols, 100M records) from R to SAP HANA using RJDBC's dbWritetable in the following way
library("RJDBC")
drv <- JDBC("com.sap.db.jdbc.Driver", "/data/hdbclient/ngdbc.jar", "'")
database <- dbConnect( drv,"jdbc:sap://servername", "USER", "PASS")
dbWriteTable(database, "largeSet", largeSet)
This works, but is extremely slow (75k records per HOUR). I have tested RODBC (sqlsave) as well and this shows the same issue.
Looking at the code behind dbWriteTable it seems that writing is record by record (i.e. the same as insert into) and indeed using a line by line insert into using dbSendUpdate shows the same performance. I have verified that the problem is not in the connection speed itself.
ROracle has a bulk_write option which seems to solve this issue, but since I am trying to write to HANA I need RJDBC or RODBC.
Can anyone tell me how I can speed up the write to HANA by running a bulk write or some other method?
If your main goal is a speed-up, without changing too much else, you could switch to the sjdbc package, which is a lot more performant in this respect than RJDBC (which sadly didn't get much attention in recent years).
While I write this and check back on CRAN, it looks like Simon has just recently found back to it and published a new release only a week ago. This does in fact include an improvement to dbSendUpdate:
https://cran.r-project.org/web/packages/RJDBC/NEWS
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With