Hello.
The expertise is here, which is why I am asking about this here.
OK, so I have ported an old *nix program to Mac OS X and used some libraries to make it handle UTF-8 and use UTF-16 internally, so that I can use the ICU library to sort by collation, as defined by the user's locale.
I have some issues:
I am on a 64-bit system, where wchar_t is 32 bits wide, so I guess wide characters end up as UTF-32, but I am going to set that aside for now. The library functions that convert from multibyte to wide characters may differ on each platform, so I really know nothing about how a wcs (wide-character string) will be represented.
I have read up on this, and this is what I have concluded:
I guess that when I read characters from a file (multibyte, where several bytes represent one character outside the 7-bit ASCII range) into wide characters, the buffer will never overflow, whatever the encoding, as long as the wide-character buffer has as many elements as the byte buffer has bytes.
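To make the element-count reasoning concrete, here is a minimal sketch that uses a hand-written UTF-8 decoder instead of the platform's mbstowcs(). It assumes the file is well-formed UTF-8 and models a 32-bit wchar_t with uint32_t; the function names are hypothetical. Each decoded code point consumes at least one input byte, so one element per byte (plus one for the terminator) cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decode UTF-8 (assumed well-formed) into UTF-32.
   uint32_t stands in for a 32-bit wchar_t. Returns the number of code
   points written; each one consumed at least one input byte. */
static size_t utf8_to_utf32(const unsigned char *s, size_t len, uint32_t *dst)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = s[i];
        uint32_t cp;
        size_t extra;
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        if (extra > len - i - 1)
            break;  /* truncated sequence at the end: stop */
        for (size_t k = 1; k <= extra; k++)
            cp = (cp << 6) | (s[i + k] & 0x3F);
        i += 1 + extra;
        dst[n++] = cp;
    }
    return n;
}

/* Allocate one element per input byte plus a terminator, then decode. */
uint32_t *utf8_to_utf32_alloc(const char *mb, size_t *out_len)
{
    size_t len = strlen(mb);
    uint32_t *buf = malloc((len + 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;
    size_t n = utf8_to_utf32((const unsigned char *)mb, len, buf);
    assert(n <= len);  /* never more code points than bytes */
    buf[n] = 0;
    *out_len = n;
    return buf;
}
```

When going through the C library instead, POSIX lets you pass a null destination to mbstowcs() to get the exact number of wide characters needed, which avoids guessing a bound at all.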
Converting from wide characters to multibyte: I reckon that if I always allocate 4 bytes per wide character, I will be fine in all situations.
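A sketch of the wide-to-UTF-8 direction under the same assumptions: UTF-32 code points held in uint32_t, every value a valid Unicode scalar value (at most U+10FFFF, no surrogates), and hypothetical helper names. UTF-8 never needs more than 4 bytes for such a code point, which is what the 4 × n + 1 allocation relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: encode one code point (assumed <= 0x10FFFF and not
   a surrogate) as UTF-8. Returns the number of bytes written, 1..4. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encode n code points into a buffer of 4 * n + 1 bytes:
   4 bytes is the UTF-8 maximum per code point, plus one for the '\0'. */
char *utf32_to_utf8_alloc(const uint32_t *ws, size_t n)
{
    unsigned char *buf = malloc(4 * n + 1);
    if (buf == NULL)
        return NULL;
    size_t used = 0;
    for (size_t i = 0; i < n; i++)
        used += utf8_encode(ws[i], buf + used);
    assert(used <= 4 * n);
    buf[used] = '\0';
    return (char *)buf;
}
```

If the conversion is done with wcstombs() in a UTF-8 locale instead, POSIX likewise allows a null destination so the required byte count can be queried first.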
And regardless of the size of wchar_t: if I allocate a UTF-16 buffer 4 times the length of the wide-character string when converting from wide characters to UTF-16, I should be fine in every case.
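For wide characters to UTF-16, a sketch assuming 32-bit wide characters holding UTF-32 (modeled as uint32_t) shows where the bound comes from: a code point above U+FFFF becomes a surrogate pair, and everything else becomes a single code unit. So 2 units per wide character (4 bytes) is the worst case in this setting. The helper name is hypothetical, and the input is assumed to contain no surrogate values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert n UTF-32 code points (assumed valid) to
   UTF-16. Code points above U+FFFF become a surrogate pair; everything
   else is one code unit. */
uint16_t *utf32_to_utf16_alloc(const uint32_t *ws, size_t n, size_t *out_len)
{
    /* Worst case is 2 units per code point, plus a terminating 0. */
    uint16_t *buf = malloc((2 * n + 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = ws[i];
        if (cp < 0x10000) {
            buf[used++] = (uint16_t)cp;
        } else {
            cp -= 0x10000;
            buf[used++] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
            buf[used++] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
        }
    }
    assert(used <= 2 * n);
    buf[used] = 0;
    *out_len = used;
    return buf;
}
```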
Conversely, when I convert from UTF-16 to wide characters, I will allocate a buffer 4 times the length of the UTF-16 string (for the case where wchar_t is just one byte wide, which is legal according to the standard).
(As I understand it, wchar_t varies in size across platforms, and the encoding it represents varies with its width.)
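The UTF-16 to wide-character direction, sketched for a 32-bit wchar_t modeled as uint32_t (a narrower wchar_t is not covered here), only ever shrinks the element count, because a surrogate pair collapses into one code point. utf16_to_utf32_alloc is a hypothetical name, and unpaired surrogates are simply copied through in this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert len UTF-16 code units to UTF-32. A valid
   surrogate pair (two units) becomes one code point; a lone surrogate is
   passed through unchanged. */
uint32_t *utf16_to_utf32_alloc(const uint16_t *u, size_t len, size_t *out_len)
{
    /* Each output code point consumes at least one input unit,
       so len + 1 elements (including the terminator) always suffice. */
    uint32_t *buf = malloc((len + 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t c = u[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len &&
            u[i + 1] >= 0xDC00 && u[i + 1] <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) + (uint32_t)(u[i + 1] - 0xDC00);
            i++;  /* skip the low surrogate */
        }
        buf[n++] = c;
    }
    assert(n <= len);
    buf[n] = 0;
    *out_len = n;
    return buf;
}
```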
When converting the multibyte text from the file into UTF-16, I guess the same size is enough: a UTF-16 string will never have more code units than the multibyte string has bytes.
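A sketch of UTF-8 to UTF-16 that allocates one UTF-16 unit per input byte, with hand-rolled decoding standing in for whatever library does this in the real program. It assumes well-formed UTF-8, and utf8_to_utf16_alloc is a hypothetical name. Note that the bound holds in code units, not bytes: 1-, 2- and 3-byte sequences become one unit and 4-byte sequences become two, but an all-ASCII string takes twice as many bytes in UTF-16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert well-formed UTF-8 to UTF-16 using a buffer
   of strlen(src) + 1 code units. */
uint16_t *utf8_to_utf16_alloc(const char *src, size_t *out_len)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t len = strlen(src);
    uint16_t *buf = malloc((len + 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = s[i];
        uint32_t cp;
        size_t extra;
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        if (extra > len - i - 1)
            break;  /* truncated sequence at the end: stop */
        for (size_t k = 1; k <= extra; k++)
            cp = (cp << 6) | (s[i + k] & 0x3F);
        i += 1 + extra;
        if (cp < 0x10000) {
            buf[n++] = (uint16_t)cp;          /* 1 to 3 bytes -> 1 unit */
        } else {
            cp -= 0x10000;                    /* 4 bytes -> 2 units */
            buf[n++] = (uint16_t)(0xD800 | (cp >> 10));
            buf[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    assert(n <= len);
    buf[n] = 0;
    *out_len = n;
    return buf;
}
```

With ICU itself, u_strFromUTF8() can be called with a destination capacity of 0 to preflight: it reports the required length through its pDestLength argument (along with U_BUFFER_OVERFLOW_ERROR), so the exact size can be allocated.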
From UTF-16 into multibyte (UTF-8), I will use a buffer 4 times the length of the UTF-16 string to be safe in all cases.
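Finally, UTF-16 to UTF-8. In this sketch (hypothetical name, lone surrogates encoded as 3-byte sequences rather than rejected), a single BMP unit needs at most 3 bytes and a surrogate pair needs 4 bytes for 2 units, so 3 bytes per unit is already the worst case and a 4× buffer leaves some headroom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert len UTF-16 units to a NUL-terminated UTF-8
   string. A lone surrogate is encoded as a 3-byte sequence here (not valid
   UTF-8, but it keeps the size bound simple). */
char *utf16_to_utf8_alloc(const uint16_t *u, size_t len)
{
    /* One unit -> at most 3 bytes; a surrogate pair (2 units) -> 4 bytes.
       So 3 bytes per unit, plus the terminator, is the worst case. */
    unsigned char *buf = malloc(3 * len + 1);
    if (buf == NULL)
        return NULL;
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t c = u[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len &&
            u[i + 1] >= 0xDC00 && u[i + 1] <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) + (uint32_t)(u[i + 1] - 0xDC00);
            i++;  /* consumed the low surrogate too */
        }
        if (c < 0x80) {
            buf[o++] = (unsigned char)c;
        } else if (c < 0x800) {
            buf[o++] = (unsigned char)(0xC0 | (c >> 6));
            buf[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            buf[o++] = (unsigned char)(0xE0 | (c >> 12));
            buf[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            buf[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            buf[o++] = (unsigned char)(0xF0 | (c >> 18));
            buf[o++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            buf[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            buf[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    assert(o <= 3 * len);
    buf[o] = '\0';
    return (char *)buf;
}
```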
If anyone can spot errors in this, I'd be delighted to hear about them!