I want to detect whether I have already seen a file, and would like to identify it by something unique. On Linux there is the inode number together with the device ID (see stat() or fstat()). I assume Windows has something similar.
To start easy, boost::filesystem offers convenient methods, e.g. I can use boost::filesystem::recursive_directory_iterator to traverse the directory tree. The file_status tells me whether an entry is a regular file, but it does not give me the inode number.
The closest thing I found was boost::filesystem::equivalent() taking two paths. I guess this is also the most portable design.
The thing is that I would like to put the inode numbers into a database for quick lookup. I cannot do this with this function; I would have to call equivalent() against every path already in the database.
Am I out of luck, and Boost will not provide such information for portability reasons?
(edit) The intention is to detect duplicates created by hard links during a single scan of a folder tree. equivalent() does exactly that, but I would end up with a quadratic algorithm.
The Windows CRT implementation of stat always reports zero for the inode, so you will have to roll your own. This is because on Windows FindFirstFile is faster than GetFileInformationByHandle, so stat uses FindFirstFile, which does not include the inode information. If you don't need the inode, that is a welcome performance win. But if you do, the following will help.
The NTFS equivalent of the inode is the MFT record number, otherwise known as the file ID. It has slightly different properties, but for most practical purposes it can be used the same way as the inode, i.e. to determine whether two paths point to the same file.
You can use GetFileInformationByHandle or GetFileInformationByHandleEx to retrieve this information. You will first have to call CreateFile to obtain the file handle.
You need only FILE_READ_ATTRIBUTES rights to get the file ID. Specify FILE_SHARE_READ|FILE_SHARE_WRITE|FILE_SHARE_DELETE as the share mode and OPEN_EXISTING as the disposition. Once you have the handle, use one of the GetFileInformation functions to obtain the file ID, then close the handle.
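The information you need is available in the nFileIndexLow and nFileIndexHigh members of BY_HANDLE_FILE_INFORMATION. The file ID is only unique within a volume, so pair it with dwVolumeSerialNumber from the same structure. If ReFS is in use, a 128-bit file ID may be in use instead; to obtain it you must use the newer function, GetFileInformationByHandleEx (with the FileIdInfo information class).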