F.E.I.S was my first large personal project. It's a cross-platform desktop app written in C++, with SFML and Dear ImGui as its main libraries.
"Special characters" in file names and paths have never worked well in my code, and it took me a really, really long time to learn and understand why that was the case, and also to find a solution I like.
How I got there
~ Denial ~
I like std::filesystem's API, it reminds me of the warmth and sweetness of my native pathlib in Python.
I wanted to use std::filesystem::path in F.E.I.S to:
- Twiddle and store file names and paths
- Open, read and write files
Doing these seemingly simple things while keeping my code cross-platform was such a fucking NIGHTMARE that it fueled me with enough motivation to write this blog post.
On Linux (the OS on which I code and test), everything works fine. I can open a file whose path looks like this:
/home/syméon/charts/ありふれたせかいせいふく.memon
And it works, it opens, it reads it fine, everything works. (notice the é and the Japanese text)
But on Windows, SOL. I get told the file either does not exist or is completely empty.
The Roots of Evil (Encoding)
~ Anger ~
It's kind of a running gag in my blog posts, but the problem at the very heart of all of this is, yet again, encoding problems and Windows shenanigans.
From what I can tell, C++'s standard library just leaves you for dead if you try to do anything concrete with it, as usual.
No matter which way you choose to open a file in C++ (fstream or fopen), behind that, your implementation of the standard library has to use your OS's API to actually open the file for you.
I'm not going to pretend like I actually know what is precisely going on here, but in general, there are close to no issues here on Linux:
- you can store paths and file names encoded as utf-8 in std::strings
- if you do, std::string ↔ std::filesystem::path conversions work fine
- utf-8 works pretty ok up and down Linux, from the GUI down to the syscall
On Windows, it's a nice glass shard salad:
- you can store paths and file names encoded as utf-8 in std::strings, in the same way you can gobble up sand if you feel like it.
- if you do, std::string ↔ std::filesystem::path conversions won't work the way you expect, unless you go through very deliberate efforts to circumvent problems.
- even if you manage to fix that, Windows has two file APIs, only one of which actually handles "special characters", and it looks like none of the standard lib implementations use it for some reason. This means it's literally impossible to open a file with special characters in its path by passing a std::string to another part of C++'s standard lib.
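To make the difference between the two lists a bit more concrete, here is a small sketch, assuming C++17. The standard says a std::filesystem::path built from char data treats those bytes as being in the "native narrow encoding", and the path's native character type is char on Linux but wchar_t on Windows. The names native_paths_are_wide and native_length are hypothetical, made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <string>
#include <type_traits>

// True where native paths are stored as wchar_t (Windows),
// false where they are stored as plain char (Linux, macOS).
constexpr bool native_paths_are_wide =
    std::is_same_v<std::filesystem::path::value_type, wchar_t>;

// Number of native code units stored inside the path object.
// With char paths this is simply the number of bytes handed in;
// with wchar_t paths the bytes have been converted first.
std::size_t native_length(const std::filesystem::path& p) {
    return p.native().size();
}
```

On Linux the utf-8 bytes of "syméon" (7 bytes, since é takes two) go into the path untouched, so native_length reports 7. On Windows the same call has to convert those bytes to wchar_t first, and if the narrow encoding in effect is a legacy code page rather than utf-8, each byte of é is likely to come out as its own wrong character.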
It hurts so bad to see a language which takes itself as seriously as C++ fumble the bag this hard. The official "solution" suggested in C++ is to have some ugly Windows-only code that has to work with concepts that only make sense in MicrosoftLand and have no reason to exist elsewhere, like converting or handling "wide" and "narrow" strings, etc.
Anyway, please bore someone else with your bad design choices. We are in ${current_year}, utf-8 everywhere or nothing, I won't be taking comments.
My solution
~ Bargaining ~
When you have to use std::strings that contain file names and paths, make sure they actually hold utf-8.
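One way to check the "actually hold utf-8" part at the edges of a program is a small validity test. The sketch below is only an illustration: looks_like_utf8 is a hypothetical helper name, and it only checks that the bytes are well-formed utf-8 (no truncated sequences, overlong forms, surrogates or values above U+10FFFF). It cannot tell you where the bytes came from.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
bool looks_like_utf8(const std::string& s) {
    // Smallest code point allowed for 1, 2, 3 and 4 byte sequences.
    constexpr unsigned long min_cp[] = {0x0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < s.size()) {
        const auto lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        unsigned long cp = 0;   // decoded code point
        if (lead < 0x80) { ++i; continue; }  // plain ASCII
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte

        if (s.size() - i <= extra) return false;  // sequence is cut short
        for (std::size_t k = 1; k <= extra; ++k) {
            const auto c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp[extra]) return false;            // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate range
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += extra + 1;
    }
    return true;
}
```

A string that fails this check (for instance a Latin-1 encoded é, which is the single byte 0xE9) is a sign that something upstream handed over bytes in another encoding.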
Add these two tiny functions to your code to correctly convert between std::string and std::filesystem::path:
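#include <filesystem>
#include <string>

// Needs C++20 for std::u8string. Assumes utf8s really holds utf-8.
std::filesystem::path string_to_path(const std::string& utf8s) {
    const std::u8string u8s{utf8s.cbegin(), utf8s.cend()};
    return std::filesystem::path{u8s};
}

// Returns the path as a utf-8 encoded std::string, whatever the OS.
std::string path_to_string(const std::filesystem::path& path) {
    const auto u8s = path.u8string();
    std::string result{u8s.cbegin(), u8s.cend()};
    return result;
}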
They both assume your std::strings contain utf-8 and route conversions via std::u8strings to force the "right" conversions, no matter the OS.
Re-read every single piece of your code where you handle std::strings or std::filesystem::paths and modify it to use these functions if need be (i.e. pretty much always).
Store and handle std::filesystem::paths in your code, not std::strings.
Use the nowide library to open your files. I recommend the standalone branch so you don't have to drag all of fucking boost with it.
To open a file, give nowide your std::filesystem::paths converted back to std::strings with your newly added conversion routines.
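Roughly, that step could look like the sketch below. It assumes the standalone flavour of nowide, whose header is nowide/fstream.hpp and whose namespace is nowide (the boost flavour uses boost/nowide/fstream.hpp and boost::nowide). The __has_include fallback to std is only there so the sketch still compiles on a machine without nowide. It does not fix anything on Windows. write_text and read_text are hypothetical helper names.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <sstream>
#include <string>

#if __has_include(<nowide/fstream.hpp>)
#include <nowide/fstream.hpp>  // standalone nowide
namespace io = nowide;
#else
#include <fstream>             // fallback: plain std streams, no Windows fix
namespace io = std;
#endif

// Same conversion as path_to_string: path -> utf-8 std::string.
std::string path_to_string(const std::filesystem::path& path) {
    const auto u8s = path.u8string();
    return std::string{u8s.cbegin(), u8s.cend()};
}

// Hypothetical helper: writes `text` to `path`, returns false on failure.
bool write_text(const std::filesystem::path& path, const std::string& text) {
    io::ofstream out{path_to_string(path), std::ios::binary};
    if (!out) return false;
    out << text;
    out.flush();
    return static_cast<bool>(out);
}

// Hypothetical helper: reads the whole file, or nothing if it can't be opened.
std::optional<std::string> read_text(const std::filesystem::path& path) {
    io::ifstream in{path_to_string(path), std::ios::binary};
    if (!in) return std::nullopt;
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}
```

The rest of the code only ever sees std::filesystem::path. The utf-8 std::string exists for the single line where the stream is opened.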
Patch the other libs you use so they also use nowide to open files. Easier said than done.
So
~ Depression and Acceptance ~
C++ is a huge joke, if you code in C++ you have some kind of ego problem, you want to prove something to the world. I'm trying to heal from that.