Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best approach to collecting log files from remote machines?

I have over 500 machines distributed across a WAN covering three continents. Periodically, I need to collect text files which are on the local hard disk on each blade. Each server is running Windows server 2003 and the files are mounted on a share which can be accessed remotely as \server\Logs. Each machine holds many files which can be several Mb each and the size can be reduced by zipping.

Thus far I have tried using Powershell scripts and a simple Java application to do the copying. Both approaches take several days to collect the 500Gb or so of files. Is there a better solution which would be faster and more efficient?

like image 231
John Channing Avatar asked Oct 15 '25 07:10

John Channing


2 Answers

I guess it depends what you do with them ... if you are going to parse them for metrics data into a database, it would be faster to have that parsing utility installed on each of those machines to parse and load into your central database at the same time.

Even if all you are doing is compressing and copying to a central location, set up those commands in a .cmd file and schedule it to run on each of the servers automatically. Then you will have distributed the work amongst all those servers, rather than forcing your one local system to do all the work. :-)

like image 64
Ron Savage Avatar answered Oct 17 '25 05:10

Ron Savage


The first improvement that comes to mind is to not ship entire log files, but only the records from after the last shipment. This of course is assuming that the files are being accumulated over time and are not entirely new each time.

You could implement this in various ways: if the files have date/time stamps you can rely on, running them through a filter that removes the older records from consideration and dumps the remainder would be sufficient. If there is no such discriminator available, I would keep track of the last byte/line sent and advance to that location prior to shipping.

Either way, the goal is to only ship new content. In our own system logs are shipped via a service that replicates the logs as they are written. That required a small service that handled the log files to be written, but reduced latency in capturing logs and cut bandwidth use immensely.

like image 41
Godeke Avatar answered Oct 17 '25 05:10

Godeke



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!