Loading Russian (or other non-Latin) filenames with Qt on Windows

I'm trying to load files whose names include non-Latin characters in a Qt/C++ application. The problem was reported by a user with Russian filenames, and I tried to fix it quickly with the code below.

Example filename (I don't read or write Russian!): Летний сад.dgr

bool QDepthmapView::loadFile(const QString &fileName)
{
    m_open_file_name = fileName;
    m_redraw_all = 1;
    // this fixes the problem on Mac OS X but NOT on Windows!
    QByteArray ba = fileName.toUtf8();
    char *file = ba.data();
    // end of fix
    if(pDoc->OnOpenDocument(file)) // quick fix for weird chars (Russian filename bug report)
    {
        // removed 
    }
    return false;
}

The above fix was a quick and dirty thing I found online. It works on my Mac OS X 10.8, but it seems that Windows handles non-ASCII characters differently, and I'm not familiar with how.

I'm looking for a cross-platform solution (the software runs on Windows, Mac and Linux) for loading files with non-ASCII names.

EDIT (regarding the comments below): OnOpenDocument leads to the following code:

int QGraphDoc::OnOpenDocument(char* lpszPathName) 
{

   m_opened_name = QString(lpszPathName);

   int ok = m_meta_graph->read( lpszPathName );
// removed //

}

####
int read( const pstring& filename )
{
// cleared

#ifdef _WIN32
   ifstream stream( filename.c_str(), ios::binary | ios::in );
#else
   ifstream stream( filename.c_str(), ios::in );
#endif

//cleared

   stream.read( (char *) &version, sizeof( version ) );

// cleared
}
####

template <class T>
istream& pmemvec<T>::read( istream& stream, streampos offset )
{
   if (offset != streampos(-1)) {
      stream.seekg( offset );
   }
   // READ / WRITE USES 32-bit LENGTHS (number of elements)
   // n.b., do not change this to size_t as it will cause 32-bit to 64-bit conversion problems
   unsigned int length;
   stream.read( (char *) &length, sizeof(unsigned int) );
   m_length = size_t(length);
   if (m_length >= storage_size()) {
      if (m_data) {
         delete [] m_data;
         m_data = NULL;
      }
      while (m_length >= storage_size())
         m_shift++;
      m_data = new T [storage_size()];
      if (!m_data)
         throw pexception( pexception::MEMORY_ALLOCATION, sizeof(T) * storage_size() );
   }
   if (m_length != 0) {
      stream.read( (char *) m_data, sizeof(T) * streamsize(m_length) );
   }
   return stream;
}
asked Oct 16 '25 16:10 by OHTO


1 Answer

Welcome to the wonderful world of Windows local encodings.

Windows internally works in UTF-16 (as QString does), but its "legacy" narrow-char APIs work with the "local codepage", which normally is the same as the system codepage (although it can be customized on a per-thread basis). For a long time that codepage could not be set to UTF-8 at all; since Windows 10 version 1903, however, an application can opt into UTF-8 as its active codepage through its manifest.

This means that most functions that take char strings and pass them straight to Windows APIs (as the C/C++ file facilities normally do) expect those strings to be encoded in the current codepage.

QString provides the toLocal8Bit method, which returns a narrow-char representation of its content in the current system encoding; that should be the local codepage on Windows and UTF-8 on any sanely configured UNIX.

The problem is that converting a QString to UTF-8 is lossless, since both can represent every Unicode code point, while converting it to the local codepage is not: for example, Russian characters cannot be encoded in Windows-1252, the usual codepage on Western European systems.

For this reason, with toLocal8Bit you can give the stream a file name in the encoding it expects, but you won't be able to open files whose names contain characters outside the current codepage.
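To make the lossless/lossy distinction concrete, here is a small self-contained C++ sketch that needs no Qt. The helper names `to_utf8`, `from_utf8` and `fits_in_cp1252` are hypothetical. It encodes Unicode code points as UTF-8 and decodes them back, and it checks whether every character of a name exists in Windows-1252 (plain ASCII, U+00A0..U+00FF, plus 27 extra characters such as the euro sign placed in bytes 0x80..0x9F). It assumes valid input and does no error handling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Encode code points as UTF-8 (sketch: no checks for surrogates or values above U+10FFFF).
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Decode well-formed UTF-8 back into code points (assumes valid input).
std::u32string from_utf8(const std::string& bytes) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        int extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
        // Keep only the payload bits of the lead byte (0x1F, 0x0F or 0x07).
        char32_t cp = extra == 0 ? b : (b & (0x3F >> extra));
        for (int k = 1; k <= extra; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out += cp;
        i += 1 + extra;
    }
    return out;
}

// True if the code point has a byte in Windows-1252.
bool representable_in_cp1252(char32_t cp) {
    // The 27 non-Latin-1 characters that Windows-1252 maps into bytes 0x80..0x9F.
    static const char32_t extras[] = {
        0x20AC, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, 0x02C6, 0x2030,
        0x0160, 0x2039, 0x0152, 0x017D, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022,
        0x2013, 0x2014, 0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x017E, 0x0178};
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) return true;
    return std::find(std::begin(extras), std::end(extras), cp) != std::end(extras);
}

// True if every character of the name survives conversion to Windows-1252.
bool fits_in_cp1252(const std::u32string& text) {
    return std::all_of(text.begin(), text.end(), representable_in_cp1252);
}
```

For the filename from the question, the UTF-8 round trip returns the original string unchanged, while `fits_in_cp1252(U"Летний сад.dgr")` is false: on a Windows-1252 system the narrow name produced for the C runtime cannot match the file on disk.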

Long story short: the way that usually avoids any problem is to always keep paths as QString and open files with QFile. QFile deals internally with this insanity by calling the "widechar" versions of the Windows APIs with UTF-16 strings, and converting to UTF-8 as appropriate on UNIX systems.
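Outside Qt, standard C++17 offers a comparable idea for code that must keep using std::ifstream (as `read()` in the question does): carry the name as a `std::filesystem::path` instead of a narrow `char*`. This is a swapped-in standard-library technique, not what QFile does internally. The sketch below assumes the name arrives as UTF-8 (for example from `QString::toUtf8()`), and the helper names `path_from_utf8` and `open_utf8` are hypothetical. It also assumes a standard library whose file streams open `path` objects through the wide Windows API; MSVC's does, but check your toolchain (for example older MinGW versions).

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Build a path from UTF-8 bytes. On Windows, fs::path stores wchar_t, so the
// name is converted to UTF-16 here instead of going through the ANSI codepage.
fs::path path_from_utf8(const std::string& utf8) {
#if defined(__cpp_lib_char8_t)
    // C++20: u8path is deprecated; construct from a char8_t string instead.
    return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
    return fs::u8path(utf8);
#endif
}

// Open a file for binary reading; std::ifstream accepts fs::path since C++17.
std::ifstream open_utf8(const std::string& utf8Name) {
    return std::ifstream(path_from_utf8(utf8Name), std::ios::binary | std::ios::in);
}
```

On POSIX systems the UTF-8 bytes are used as-is, which matches what a UTF-8 configured system expects.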

If you really need to work with other file handling functions, you have two choices: either use toLocal8Bit and give up working on files with "non-local" names on Windows, or provide a separate code path for Windows that works with wchar_t strings (down to the wide-char versions of the C library and Windows API functions).
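A minimal sketch of the second option, a separate Windows code path built on the C library's `fopen`/`_wfopen` pair. The helper name `fopen_utf8` is hypothetical; it assumes callers pass UTF-8 (for example `QString::toUtf8()`) and that non-Windows systems use UTF-8 filenames. `MultiByteToWideChar` with `CP_UTF8` and `_wfopen` are real Win32/MSVC CRT functions; MSVC may warn that `_wfopen` is deprecated in favor of `_wfopen_s`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Open a file whose name is UTF-8 encoded. On Windows the name is converted to
// UTF-16 and passed to the wide-char CRT function _wfopen; elsewhere the UTF-8
// bytes go straight to fopen, which is correct on UTF-8 configured systems.
std::FILE* fopen_utf8(const std::string& utf8Name, const char* mode) {
#ifdef _WIN32
    auto widen = [](const std::string& s) {
        if (s.empty()) return std::wstring();
        int n = MultiByteToWideChar(CP_UTF8, 0, s.data(), static_cast<int>(s.size()), nullptr, 0);
        std::wstring w(static_cast<std::size_t>(n), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, s.data(), static_cast<int>(s.size()), &w[0], n);
        return w;
    };
    return _wfopen(widen(utf8Name).c_str(), widen(mode).c_str());
#else
    return std::fopen(utf8Name.c_str(), mode);
#endif
}
```

The same split (UTF-16 on Windows, UTF-8 elsewhere) applies to any other narrow-char call in the loading path, such as the `ifstream` in `read()`.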

answered Oct 18 '25 06:10 by Matteo Italia