I know this is something of a "classic question", but does the MySQL/Grails stack (deployed on Tomcat) put a new spin on how to approach storing users' uploaded files?
I like using the database for everything (simpler architecture, and scaling is just scaling the database). But using the filesystem means we don't lard up MySQL with binary files. Some might also argue that Apache (httpd) is faster than Tomcat at serving binary files, although I've seen numbers suggesting that putting Tomcat directly at the front of your site can be faster than going through an Apache (httpd) proxy.
How should I choose where to place users' uploaded files?
Thanks for your consideration, time and thought.
I don't know if one can make general observations about this kind of decision, since it really comes down to what you are trying to do and how high non-functional requirements (NFRs) like performance and response time sit on your application's priority list.
If you have lots of users, uploading lots of binary files, with a system serving large numbers of those uploaded binary files then you have a situation where the costs of storing files in the database include:
Benefits are
Given the same user situation where you store to the filesystem you will need to address
We had a similar problem to solve for our Grails site, where the content editors upload hundreds of pictures a day. We knew that driving all that demand through the application was wasteful when it could be doing other processing (given that the expected demand for pages was going to be in the millions per week, we definitely didn't want images to cripple us).
We ended up creating an upload -> filesystem solution. For each uploaded file, a DB metadata record was created and managed in tandem with the upload process (and, conversely, that record was read when generating the link to the image in the GSP content). We served requests off disk through Apache directly, based on the link requested by the browser. But, and there is always a but, remember that with things like filesystems the content only exists on the machine it was written to.
We had the headache of making sure images got re-synchronised onto every server, since unlike a DB, which sits behind the cluster and enables the cluster to behave uniformly, files are bound to physical locations on a server.
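To make the upload-plus-metadata flow above a bit more concrete, here is a minimal Java sketch. It is not our actual code: the class names `StoredUpload` and `UploadStore`, the `/images` URL prefix, and the choice to keep only a relative path and size in the metadata are all assumptions for illustration. Persisting the returned object to the database (for example as a Grails domain object) is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical metadata holder: in a real application this would be a row in the DB.
final class StoredUpload {
    final String relativePath; // path below the upload root, reused to build the public link
    final long sizeBytes;

    StoredUpload(String relativePath, long sizeBytes) {
        this.relativePath = relativePath;
        this.sizeBytes = sizeBytes;
    }

    // urlPrefix is an assumed location that the front-end web server maps onto the upload root.
    String publicLink(String urlPrefix) {
        return urlPrefix + "/" + relativePath;
    }
}

final class UploadStore {
    // Writes the upload to disk and returns the metadata the caller should persist.
    static StoredUpload save(InputStream data, Path root, String relativePath) throws IOException {
        Path base = root.normalize();
        Path target = base.resolve(relativePath).normalize();
        // Refuse paths such as "../../etc/passwd" that would land outside the upload root.
        if (!target.startsWith(base) || target.equals(base)) {
            throw new IllegalArgumentException("Path escapes upload root: " + relativePath);
        }
        Files.createDirectories(target.getParent());
        long size = Files.copy(data, target, StandardCopyOption.REPLACE_EXISTING);
        return new StoredUpload(relativePath, size);
    }
}
```

The page template would then emit something like `upload.publicLink("/images")`, so the browser fetches the file from the web server rather than through the application. Note that this only writes to the local disk; the synchronisation across servers described above still has to be handled separately.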
Another problem you might run up against with filesystems is folder content size. When you start having folders where there are literally tens of thousands of files in them, the folder scan at the OS level starts to really drag. To avert this problem we had to write code which managed image uploads into yyyy/MM/dd/image.name.jpg folder structures, so that no one folder accumulated hundreds of thousands of images.
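The date-based folder layout can be sketched in a few lines of Java. Again, this is an illustration under assumptions rather than the code we ran: the class name `UploadPaths`, the method `forUpload`, and the idea of stripping the client-supplied name down to its last path segment are mine.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class UploadPaths {
    // One folder level per year, month and day, e.g. 2024/03/05
    private static final DateTimeFormatter DAY_FOLDERS = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Builds root/yyyy/MM/dd/name so that no single folder collects every upload.
    static Path forUpload(Path root, LocalDate uploadDate, String originalName) {
        // Keep only the last path segment so a client-supplied name cannot add folders of its own.
        Path fileName = Paths.get(originalName).getFileName();
        if (fileName == null || fileName.toString().isEmpty()) {
            throw new IllegalArgumentException("Upload has no usable file name");
        }
        return root.resolve(uploadDate.format(DAY_FOLDERS)).resolve(fileName.toString());
    }
}
```

For example, `UploadPaths.forUpload(Paths.get("/var/uploads"), LocalDate.of(2024, 3, 5), "image.name.jpg")` yields `/var/uploads/2024/03/05/image.name.jpg` on a Unix-like system. Two files with the same name uploaded on the same day would collide, so a real implementation would probably add a unique prefix or suffix to the name.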
What I'm implying is that while we got the performance we wanted by not using the DB for BLOB storage, that comes at the cost of development overhead and systems management.