message_data_static

Discussion time!
Context:
In the past week or so, I have been working with @EllipticEllipsis to add "Message extraction support" to ZAPD. The goal was to handle every language/encoding that OoT and MM support.
The first problem was the "format codes" these games use to produce special effects, like printing an item icon in the message, playing a sound, showing a highscore, printing a controller button in the text, etc. We decided to handle those with macros and C preprocessor string concatenation. There are more problems here (mainly enum problems), but I'll list them at the end.
Then we had issues with Japanese, mainly with the two-byte encoding it uses, which is shift-jis. If the text were exported as-is, it would not be easily editable. One solution would be to export the text as-is and tell everyone who wants to edit it to open the file with shift-jis encoding instead of the default utf-8 (it is easy to do with vscode and probably a lot of other editors too). Another possible solution would be to convert the text to utf-8 during extraction and convert it back to shift-jis during compilation. We felt like this is a bigger decision that should be discussed here.
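A minimal sketch of the second option, assuming the message text is plain shift-jis once the format codes have been lifted out into macros. The function names are made up for illustration and are not part of ZAPD, and Python's built-in `shift_jis` codec may not match the game's charset for every symbol.

```python
def sjis_to_utf8(raw: bytes) -> bytes:
    """Extraction side: turn shift-jis bytes into utf-8 so any editor can open them."""
    return raw.decode("shift_jis").encode("utf-8")


def utf8_to_sjis(raw: bytes) -> bytes:
    """Compilation side: turn the edited utf-8 text back into shift-jis."""
    return raw.decode("utf-8").encode("shift_jis")


# Round trip on a small sample string (not taken from the game).
original = "こんにちは".encode("shift_jis")
editable = sjis_to_utf8(original)
assert utf8_to_sjis(editable) == original
```

If the round trip is lossless for every message in the file, the compiled output should not change, which is the property that matters for matching.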
After that, we wanted to extract messages from the iQue version too. The main problem with this version is the lack of pre-leak documentation, so we had to discover how it worked ourselves. We ended up discovering that the iQue version uses the "format codes" of the non-Japanese encoding and adds a few more to handle the Chinese characters as a two-byte encoding (maybe somebody already knew this, idk). We currently don't know what encoding iQue uses; we only know that each 2-byte sequence is directly and sequentially mapped to a texture in the font file (which I named `cn_font_static`). If anybody knows something about the iQue encoding, let us know!
Then MM. The messaging system changed in MM, because of course it would be different. In OoT, each message is just kinda a raw string (`const char[]`) with a few format code shenanigans. MM, on the other hand, decided that each message needs a header before the actual message (which is slightly different for Japanese vs non-Japanese messages). MM also decided it needs something like triple the number of format codes/special characters, without reusing any of OoT's format codes, so a whole new set of macros needs to be made for MM. Also, MM still needs the OoT macros/format codes because it uses them for the ending credits (`staff_message_data_static`). It isn't completely bad, but writing another bunch of dumb macros is tiring.
(I think it is funny that the iQue version is more similar to "normal" OoT than MM.)

Current state:
Currently, ZAPD is able to extract non-Japanese text. For the foreign characters (e.g. é or ü), ZAPD extracts them in a utf-8 compatible way. In the actual compilation phase, a small python script is used to convert those characters back to the corresponding format code. This way, compiling `nes_message_data_static`, `fra_`, `ger_` and `staff_` gives :OK: in the current OoT repo.
[add random screenshot]
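To give an idea of what that conversion step does, here is a rough sketch, not the actual script. The table name, the function name, and especially the byte values are placeholders picked for illustration; the real codes have to come from the game's font table.

```python
# Hypothetical table: these byte values are placeholders, NOT verified game codes.
CHAR_TO_CODE = {"é": 0x96, "ü": 0x9E}


def encode_message(text: str) -> bytes:
    """Replace each mapped character with its one-byte code; keep ASCII as-is."""
    out = bytearray()
    for ch in text:
        if ch in CHAR_TO_CODE:
            out.append(CHAR_TO_CODE[ch])
        else:
            # Raises UnicodeEncodeError on anything that is neither ASCII nor
            # mapped, so a missing table entry is noticed at build time.
            out += ch.encode("ascii")
    return bytes(out)
```

Failing loudly on unmapped characters is a deliberate choice in this sketch: a silently dropped character would shift the rest of the message.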

ZAPD can extract Japanese text right now too, but it is currently limited to extracting it as shift-jis, so external tools would be needed to properly mod those files. I wasn't able to test whether the compilation would be :OK: since this file is not part of the PAL version of the game, but looking at the compiled .o file with vbindiff, it looks like it should be :OK:.
[add japanese screenshot]

iQue is being extracted too, but still has the issue of using an unknown encoding.
[add chinese screenshot]

MM will need to add the message header structs to its repo, but the extraction is working (I'm halfway through writing the macros, but the text is legible).
[add MM screenshot]

Current problems:
Finally, here is a list of problems that need to be discussed:
1. Since the messages are extracted as `char[]`, we can't use the enums we have for sfx, item IDs, etc. as macro arguments.
2. Should we convert the Japanese messages to utf-8 during extraction and back to shift-jis during compilation? Or would it be better to just open those files as shift-jis?
3. iQue: we still don't know which encoding it uses.

Minor problems:
1. When compiling Japanese, for any two-byte character of the form `0xXX5C` (the lower byte is `5C`), the `5C` part is omitted and the rest of the current message is shifted. A workaround is escaping that character, but this is far from optimal.
2. Japanese has an unknown symbol at the very end of `jpn_message_data_static` which is not part of shift-jis. It is not used in normal gameplay (probably). The current workaround is a macro.
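
For the first minor problem, the likely cause is that `5C` is the ASCII backslash, so the compiler reads the second byte of the character as the start of an escape sequence. A sketch of how the escaping workaround could be automated before compilation, assuming that is the cause; the helper names are made up and this is not what ZAPD currently does:

```python
def is_lead_byte(b: int) -> bool:
    """True if b starts a two-byte shift-jis character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def escape_trailing_backslash(raw: bytes) -> bytes:
    """Add an extra 0x5C after every two-byte character whose low byte is 0x5C."""
    out = bytearray()
    i = 0
    while i < len(raw):
        b = raw[i]
        if is_lead_byte(b) and i + 1 < len(raw):
            trail = raw[i + 1]
            out += bytes([b, trail])
            if trail == 0x5C:
                out.append(0x5C)  # the compiler should collapse 5C 5C back to 5C
            i += 2
        else:
            out.append(b)  # single-byte character (ASCII, real escapes, etc.)
            i += 1
    return bytes(out)


# "ソ" is 0x835C in shift-jis, one of the affected characters.
print(escape_trailing_backslash("ソ".encode("shift_jis")).hex())  # 835c5c
```

Walking the bytes character by character matters here: a plain search-and-replace on `5C` would also double the backslashes of real escape sequences in the source.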

I really want to thank @EllipticEllipsis for taking the time to help me make decisions and to investigate encodings outside and inside the game, among other things. Without his help I would have had a lot more trouble with Japanese, and iQue wouldn't even be a possibility.