Why is EnumerateFiles much quicker than calculating the sizes?

Tags:

c#

.net

For my WPF project, I have to calculate the total size of the files in a single directory (which could have subdirectories).

Sample 1

DirectoryInfo di = new DirectoryInfo(path);
var totalLength = di.EnumerateFiles("*.*", SearchOption.AllDirectories).Sum(fi => fi.Length);

if (totalLength / 1000000 >= size)
    return true;

Sample 2

 var sizeOfHtmlDirectory = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories);
 long totalLength = 0;
 foreach (var file in sizeOfHtmlDirectory)
 {
     totalLength += new FileInfo(file).Length;
     if (totalLength / 1000000 >= size)
         return true;
 }

Both samples work.

Sample 1 completes massively faster. I've not timed this accurately, but on my PC, using the same folder with the same content and file sizes, Sample 1 takes a few seconds and Sample 2 takes a few minutes.

EDIT

I should point out that the bottleneck in Sample 2 is within the foreach loop! The GetFiles call returns quickly and execution enters the foreach loop quickly.

My question is, how do I find out why this is the case?
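One way to narrow this down is to time each stage separately. The following is a minimal sketch, not a rigorous benchmark: `DirectorySizeTiming` and `Measure` are hypothetical names introduced here, and the OS file cache will favour whichever stage runs later, so it is worth running it more than once and comparing.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class DirectorySizeTiming
{
    // Hypothetical helper: times the path listing, the per-path FileInfo
    // lookups (Sample 2), and the DirectoryInfo enumeration (Sample 1).
    public static void Measure(string path)
    {
        var sw = Stopwatch.StartNew();
        string[] files = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories);
        Console.WriteLine($"GetFiles: {files.Length} paths in {sw.ElapsedMilliseconds} ms");

        // Sample 2 style: build a FileInfo from each path string.
        sw.Restart();
        long totalFromPaths = 0;
        foreach (string file in files)
            totalFromPaths += new FileInfo(file).Length;
        Console.WriteLine($"new FileInfo(path).Length: {totalFromPaths} bytes in {sw.ElapsedMilliseconds} ms");

        // Sample 1 style: FileInfo objects produced by the enumeration itself.
        sw.Restart();
        long totalFromEnumeration = 0;
        foreach (FileInfo fi in new DirectoryInfo(path).EnumerateFiles("*.*", SearchOption.AllDirectories))
            totalFromEnumeration += fi.Length;
        Console.WriteLine($"DirectoryInfo.EnumerateFiles: {totalFromEnumeration} bytes in {sw.ElapsedMilliseconds} ms");
    }
}
```

Calling `DirectorySizeTiming.Measure(path)` on the folder from the question should show which of the three stages accounts for the minutes.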

MyDaftQuestions asked Apr 22 '15 14:04


1 Answer

Contrary to what the other answers indicate, the main difference is not EnumerateFiles vs GetFiles but DirectoryInfo vs Directory. In the latter case you only have strings and have to create new FileInfo instances separately, which is very costly.

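DirectoryInfo returns FileInfo instances that use information cached from the directory enumeration, whereas a FileInfo instance you create directly from a path has no cached data and must query the file system for its metadata when you read Length (more details here and here).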
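As a rough sketch of how the question's early exit (Sample 2) could be combined with the FileInfo instances that DirectoryInfo already returns (Sample 1): `DirectorySize` and `IsAtLeast` are hypothetical names introduced here, and the threshold is taken in bytes rather than in the question's megabyte units.

```csharp
using System.IO;

static class DirectorySize
{
    // Hypothetical helper: returns true as soon as the running total of
    // file lengths under 'path' reaches 'thresholdBytes'.
    public static bool IsAtLeast(string path, long thresholdBytes)
    {
        long total = 0;
        var directory = new DirectoryInfo(path);

        // EnumerateFiles yields FileInfo objects lazily, so the walk can stop
        // early, and no FileInfo has to be rebuilt from a path string.
        foreach (FileInfo fi in directory.EnumerateFiles("*.*", SearchOption.AllDirectories))
        {
            total += fi.Length;
            if (total >= thresholdBytes)
                return true;
        }

        return false;
    }
}
```

Assuming `size` in the question is a whole number of megabytes, the call would look like `DirectorySize.IsAtLeast(path, size * 1000000L)`.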

Relevant quote (via "The Old New Thing"):

In NTFS, file system metadata is a property not of the directory entry but rather of the file, with some of the metadata replicated into the directory entry as a tweak to improve directory enumeration performance. Functions like FindFirstFile report the directory entry, and by putting the metadata that FAT users were accustomed to getting "for free", they could avoid being slower than FAT for directory listings. The directory-enumeration functions report the last-updated metadata, which may not correspond to the actual metadata if the directory entry is stale.

BrokenGlass answered Oct 13 '22 00:10