Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way to copy files from one directory to another

I need to copy files from one directory to another, depending on the existence of the file name in a table of SQL database.

For this I use the following code:

using(SqlConnection connection = new SqlConnection("datasource or route"))
{

  connection.Open();

  using(SqlCommand cmd = new SqlCommand("SELECT idPic, namePicFile FROM DocPicFiles", connection))
  using (SqlDataReader reader = cmd.ExecuteReader())
  {

    if (reader != null)
    {
      while (reader.Read())
      {
        //picList IS AN ARRAY THAT Contains All the files names in a directory
        if (picList.Any(s => s.Contains(reader["namePicFile"].ToString())))
        {
          File.Copy("theFile  in the Directory or array picList",  "the destiny directory"+ ".jpg", false)
        }
      }
    }
  }
}

Is there any way that this can be done in less time? It takes 1 hour for do that, for 20.876 records.

like image 584
ger Avatar asked Oct 23 '25 19:10

ger


2 Answers

File.Copy is as fast as it gets. You must keep in mind that you depend on the file transfer speed dictated by your hardware and at 20000 files, the latency for data access also comes into play. If you are doing this on a HDD, you could see a big improvement after switching to SSD or some other fast medium.

For this case alone, most likely the hardware is your bottleneck.

EDIT: I consider keeping the connection to the database open for such a long time as a bad practice. I suggest you fetch all the needed data in some in-memory cache (array, list, whatever) and then iterate through that as you copy the files. A db connection is a precious resource and on applications that must handle high concurrency (but not only), releasing the connection fast is a must.

like image 126
Mihai Caracostea Avatar answered Oct 26 '25 08:10

Mihai Caracostea


Allow me to make a guess - No. There is no way to do it faster.

How come I am so confident? Because file copying requires talking to disk and that is a horribly slow operation. Even further, if you try to go for multi-threading, the results will go slower instead of faster because the mechanical operation of moving the head over the disk isn't sequential anymore, which may have been earlier by chance.

See answers to this question I asked earlier.

Switch to SSDs if you aren't yet using them, otherwise you are getting the best already.

Below here is something for us to put into perspective what does slow mean in disk writing when compared to caches. If cache access is taking 10 min., it implies that it takes 2 years to read from disk. All the accesses are shown in the image below. Clearly when your code will execute, the bottleneck will be disk writes. The best you can do it to let the disk writes stay sequential.

enter image description here

like image 31
displayName Avatar answered Oct 26 '25 08:10

displayName