My file names look the same but they are not.
I copied many_img/ from Debian1 to OS X, then from OS X to Debian2 (for maintenance purposes), using rsync -a -e ssh at each step to preserve everything.
If I run ls many_img/img1/* I get visually the same output on Debian1 and Debian2:
prévisionnel.jpg
But somehow, ls many_img/img1/* | od -c
gives different results:
On Debian1:
0000000 p r 303 251 v i s i o n n e l . j p
0000020 g \n
On Debian2:
0000000 p r e 314 201 v i s i o n n e l . j
0000020 p g \n
As a result, my web app on Debian2 cannot match the picture in the file system with the file name stored in the database.
I thought I might need to change the file name encoding, but it looks like it's already UTF-8 on every OS:
convmv --notest -f iso-8859-15 -t utf8 many_img/img1/*
Returns:
Skipping, already UTF-8
Is there a command to restore all 40 thousand file names on Debian2 to the form they have on Debian1 (without transferring everything again)? I can't tell whether this is a file name encoding problem or something else.
I finally found the command-line conversion tools I was looking for (thanks @Mark for setting me on the right track!)
I didn't know OS X was encoding file names under the hood with a different Unicode normalization form.
The HFS+ file system stores every file name character in UTF-16. Unicode characters are decomposed on OS X, whereas on Linux they are typically precomposed.
For instance, é (Latin small letter e with acute accent) is a single character (U+00E9) in its precomposed form (NFC) on Linux, and is decomposed into a base letter "e" (U+0065) and a combining acute accent (U+0301) in its decomposed form (NFD) on OS X.
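To make the two forms concrete, here is a small Python sketch (not part of the original commands) that builds both spellings of the file name and prints their UTF-8 bytes in octal, in roughly the way od -c shows them. The helper name octal_bytes is made up for this illustration.

```python
import unicodedata

nfc = "pr\u00e9visionnel.jpg"   # precomposed é (U+00E9), as seen on Debian1
nfd = "pre\u0301visionnel.jpg"  # "e" + combining acute (U+0301), as seen on Debian2


def octal_bytes(name):
    """Hypothetical helper: show ASCII bytes as characters and the rest in octal."""
    return " ".join(
        chr(b) if b < 0x80 else format(b, "o") for b in name.encode("utf-8")
    )


print(octal_bytes(nfc))  # p r 303 251 v i s i o n n e l . j p g
print(octal_bytes(nfd))  # p r e 314 201 v i s i o n n e l . j p g

# The strings differ as stored, but are equal once brought to the same form.
print(nfc == nfd)                                # False
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
print(unicodedata.normalize("NFD", nfc) == nfd)  # True
```

The octal values match the od -c output in the question: 303 251 is the precomposed é, and 314 201 is the combining accent that follows a plain e.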
This command, run on Linux, converts a file name from NFD to NFC (add -r to process a directory tree recursively):
convmv --notest --nfc -f utf8 -t utf8 /path/to/my/file
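If convmv is not available, the same renaming can be approximated in Python. This is only a sketch under assumptions: the function name rename_tree_to_nfc and its dry_run parameter are invented here, it assumes a Linux file system that stores names byte for byte, and it skips any entry whose NFC name already exists rather than overwriting it.

```python
import os
import unicodedata


def rename_tree_to_nfc(root, dry_run=True):
    """Hypothetical helper: rename entries under root whose names are not NFC.

    Returns the list of (old_path, new_path) pairs that were (or would be) renamed.
    """
    changes = []
    # Walk bottom-up so a directory's contents are handled before the
    # directory itself is renamed by its parent's iteration.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            target = unicodedata.normalize("NFC", name)
            if target == name:
                continue  # already in NFC
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, target)
            if os.path.lexists(new_path):
                continue  # never overwrite an existing entry
            changes.append((old_path, new_path))
            if not dry_run:
                os.rename(old_path, new_path)
    return changes
```

Calling rename_tree_to_nfc("many_img") first with the default dry_run=True lists what would change, much like running convmv without --notest; passing dry_run=False performs the renames.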
This command, run on OS X, rsyncs over ssh and converts NFD to NFC on the fly:
rsync -a --iconv=utf-8-mac,utf-8 -e ssh path/to/my/local/directory/* user@destinationip:/remote/path/
I tested both methods and they work like a charm.
Note: the --iconv option is only available with rsync 3, whereas OS X provides an old 2.6.9 version by default, so you'll need to update it first.
Typically, to check and upgrade:
rsync --version
brew install rsync
echo 'export PATH=/usr/local/bin:$PATH' >> ~/.profile
The first filename contains the single character é
while the second contains a simple e
followed by the combining character ́
(COMBINING ACUTE ACCENT). They're both valid Unicode, they're just normalized differently. It appears the OS normalized the filename as it created the file.
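Since both names are valid and differ only in normalization, an application can also compare them after normalizing both sides. The following Python sketch illustrates the idea; find_on_disk is a made-up helper, and the asker's web app may well be written in another language.

```python
import os
import unicodedata


def find_on_disk(directory, wanted_name):
    """Hypothetical helper: return the on-disk name that equals wanted_name
    once both are normalized to NFC, or None if there is no such entry."""
    wanted = unicodedata.normalize("NFC", wanted_name)
    for entry in os.listdir(directory):
        if unicodedata.normalize("NFC", entry) == wanted:
            return entry
    return None
```

Listing the directory on every lookup is slow for tens of thousands of files, so renaming the files once is likely the better long-term fix; a lookup like this is mainly useful as a stopgap or as a check.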