- >>66575417
- You're overcomplicating it. I worked it out in my head and this is what I came up with; it should be good enough. I haven't tested the code because I can't be bothered with the pregeneration step, but it should work.
- static_start[] is a big-ass const char array (the pregenerated name table, ending with an empty string). It should look something like this:
- "LATIN CAPITAL LETTER A\0\x95" "B\0\x95" "C\0" {...} "Z\0LEFT SQUARE BRACKET\0REVERSE SOLIDUS\0\x81IGHT SQUARE BRACKET\0CIRCUMFLEX ACCENT\0LOW LINE\0GRAVE ACCENT\0LATIN SMALL LETTER A\0\x93" "B\0\x93" "C\0" {...}
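- The marker bytes are easy to get wrong by hand, so it helps to compute them: a continue entry's first byte is 0x80 plus the length of the prefix it shares with the previous name. marker_for is a hypothetical helper, not from the post, and the 127 cap is an assumption that follows from only 7 bits being free.

```c
#include <stddef.h>

#define HIGH 0x80 /* high bit set = continue entry */

/* Hypothetical helper: marker byte for cur when it follows prev.
   n = length of the shared prefix, capped at 127 so it fits in 7 bits. */
unsigned char marker_for(const char* prev, const char* cur)
{
    size_t n = 0;
    while (n < 127 && prev[n] != '\0' && prev[n] == cur[n])
        ++n;
    return (unsigned char)(HIGH + n);
}
```

- For the names above this gives 0x95 for LATIN CAPITAL LETTER B, 0x93 for LATIN SMALL LETTER B and 0x81 for RIGHT SQUARE BRACKET. When putting such a byte into a string literal, split the literal right after the escape ("\x95" "B"): \x consumes every following hex digit, so "\x95B" is a single out-of-range escape, not 0x95 followed by 'B'.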
- my notes:
- [code]
- type: full or continue
- if continue: take the last string and overwrite it from offset n, including the null terminator
- type = full (0xxx xxxx) or continue (1xxx xxxx); for continue, n = first byte - 0x80 (bin 1000 0000)
- [/code]
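- The notes above also pin down the pregeneration step the post skips. A minimal encoder sketch, assuming nonempty ASCII names already in UnicodeData.txt order and an output buffer that is big enough. The name encode_table and the rule "use a continue entry only when the shared prefix is at least 2 bytes" (a 1-byte prefix saves nothing once the marker byte is paid for) are assumptions, not part of the post.

```c
#include <string.h>

#define HIGH 0x80 /* high bit set = continue entry */

/* Hypothetical encoder: writes each name as a full or continue entry,
   then an empty string as the end marker. Returns bytes written.
   Assumes names are nonempty ASCII and out is large enough. */
size_t encode_table(const char* const* names, size_t count, char* out)
{
    char* p = out;
    const char* prev = "";
    for (size_t i = 0; i < count; ++i) {
        const char* cur = names[i];
        size_t n = 0; /* shared prefix length, capped at 127 */
        while (n < 127 && prev[n] != '\0' && prev[n] == cur[n])
            ++n;
        if (n >= 2) {
            *p++ = (char)(HIGH + n); /* continue: marker byte ... */
            strcpy(p, cur + n);      /* ... then the differing tail */
        } else {
            strcpy(p, cur);          /* full: the whole name */
        }
        p += strlen(p) + 1;          /* step past this entry's \0 */
        prev = cur;
    }
    *p++ = '\0';                     /* empty string ends the table */
    return (size_t)(p - out);
}
```

- With this rule RIGHT SQUARE BRACKET after REVERSE SOLIDUS comes out as a full entry rather than "\x81IGHT...", which costs the same number of bytes either way.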
- code:
- [code]
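- #include <stdlib.h>
- #include <string.h>
- #define HIGH 0x80 // first byte 1xxx xxxx = continue entry, 0xxx xxxx = full entry
- #define HARDCODED_VALUE (1024*1024)
- // static_start: the pregenerated table, starting with a full entry and ending with an empty string.
- // ("static" is a C keyword, so the read cursor is called src instead.)
- char* expand(const char* static_start)
- { char* data_start = malloc(HARDCODED_VALUE);
-   if (!data_start) return NULL;
-   char* data = data_start;        // write cursor in the expanded table
-   const char* prev = NULL;        // last expanded name
-   const char* src = static_start; // read cursor in the compressed table
-   for (;;)
-   { unsigned char type = (unsigned char)src[0]; // plain char may be signed
-     if (type >= HIGH) // continue: first n chars of prev, then the rest
-     { size_t n = type - HIGH;
-       memcpy(data, prev, n);
-       strcpy(data+n, src+1); // skip the marker byte
-     } else if (type != '\0') // full name
-     { strcpy(data, src);
-     } else // empty string: end of table
-     { break;
-     }
-     src += strlen(src) + 1;   // next table entry (past its \0)
-     prev = data;
-     data += strlen(data) + 1; // next output slot
-   }
-   size_t len = data-data_start; //replace HARDCODED_VALUE with this+1
-   *data = '\0'; // empty string ends the expanded table too (the +1)
-   return data_start;
- }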
- [/code]
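- The comment on len suggests running the expansion once just to measure the buffer. A sizing pass can get the same number straight from the compressed table without writing anything. expanded_size is a hypothetical name; it assumes the format from the notes (first byte >= 0x80 means keep n = byte - 0x80 chars of the previous name, an empty string ends the table) and returns the "len + 1" figure from that comment.

```c
#include <string.h>

#define HIGH 0x80 /* high bit set = continue entry */

/* Hypothetical sizing pass: bytes needed for the expanded table,
   i.e. every name plus its \0, plus one byte for the final empty string. */
size_t expanded_size(const char* table)
{
    size_t total = 0;
    const char* src = table;
    while (src[0] != '\0') {
        unsigned char type = (unsigned char)src[0];
        size_t entry = strlen(src); /* entry length, marker byte included */
        if (type >= HIGH)
            total += (type - HIGH) + (entry - 1) + 1; /* kept prefix + tail + \0 */
        else
            total += entry + 1;                        /* whole name + \0 */
        src += entry + 1;
    }
    return total + 1; /* the trailing empty string */
}
```

- malloc(expanded_size(static_start)) could then stand in for the malloc(HARDCODED_VALUE) guess.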
- To index, first map the codepoint to a contiguous index (e.g. its 0-based line number in https://unicode.org/Public/UNIDATA/UnicodeData.txt), then do
- [code]
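- #include <string.h>
- const char* name_at(const char* data, unsigned int n) // data: expanded table; n: 0-based index ("index" clashes with the legacy libc index())
- {
-     while (n--) // skip n names, each followed by its \0
-         data += strlen(data) + 1;
-     return data;
- }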
- [/code]
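- For the first step, one option is to load the first field of every UnicodeData.txt line into an array; the file is sorted by codepoint, so a binary search turns a codepoint into its 0-based line number. codepoint_to_index and this array layout are hypothetical, and loading the file is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: codepoints[] holds the first field of each
   UnicodeData.txt line in file order (ascending). Returns the 0-based
   line number of cp, or -1 if cp has no line of its own
   (unassigned, or only covered by a First/Last range pair). */
long codepoint_to_index(const uint32_t* codepoints, size_t count, uint32_t cp)
{
    size_t lo = 0, hi = count; /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (codepoints[mid] < cp)
            lo = mid + 1;
        else if (codepoints[mid] > cp)
            hi = mid;
        else
            return (long)mid;
    }
    return -1;
}
```

- The result is the n to pass to the lookup above; -1 means there is no name line to fetch.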