Saturday, 18 September 2010

wstring vs. ustring

The problem of Cygwin gcc on Windows using 16-bit wide characters, instead of the hard-coded 32 bits that I use, is now solved in the only way possible within C.

The problem is this: I rely on wide characters being signed, irrespective of what the local C implementation uses for wide characters. To allow (in the future) Unicode code points in the upper planes 1), I also implement wide characters as signed integers (assumed to be 4 bytes). The normal wcs string functions, e.g. the comparison wcscmp that replaces strcmp for old narrow characters, vary with the implementation, so I provide my own counterparts in ucstr.h. They're not complete, but they work for my purposes, including ones that aren't on the web.
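A minimal sketch of what such a counterpart could look like, assuming uchar is a typedef for a signed 32-bit integer. The function name ucscmp is made up for illustration; the actual contents of ucstr.h are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: uchar is a signed 32-bit integer, independent of wchar_t. */
typedef int32_t uchar;

/* Hypothetical counterpart to wcscmp for 0-terminated uchar strings.
 * Returns a negative value, 0 or a positive value, like strcmp. */
int ucscmp(const uchar *a, const uchar *b)
{
    assert(a != NULL && b != NULL);
    /* Advance while both strings agree and neither has ended. */
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    /* Compare the first differing elements without risking overflow. */
    return (*a > *b) - (*a < *b);
}
```

Because the element type is fixed at 32 bits, the result does not depend on whether the local wchar_t is 16 or 32 bits wide.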

Fine so far, but since Cygwin uses 16 bits internally for wide chars, wide string literals such as L"ψ⁶ Aur" 2) won't work with my own 32-bit wide chars. Therefore I have to replace them with uchar array declarations like
   uchar _L_psi_e6_Aur[] = {L'ψ',L'⁶',L' ',L'A',L'u',L'r',0};
and replace all occurrences of L"ψ⁶ Aur" with _L_psi_e6_Aur. It's not beautiful, but it's the only way to make the code portable.
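A variant of the same idea, sketched under some assumptions: the characters are written as numeric code points, so the array does not depend on how the compiler reads non-ASCII characters in the source file. The names psi6_Aur and ucslen are hypothetical. The array name also avoids a leading underscore followed by a capital letter, since such identifiers are reserved for the C implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: uchar is a signed 32-bit integer. */
typedef int32_t uchar;

/* "ψ⁶ Aur" as a 0-terminated array of Unicode code points. */
static const uchar psi6_Aur[] = {
    0x03C8,        /* ψ  GREEK SMALL LETTER PSI */
    0x2076,        /* ⁶  SUPERSCRIPT SIX */
    0x0020,        /* space */
    'A', 'u', 'r', /* ASCII letters keep their usual values */
    0              /* terminator */
};

/* Hypothetical counterpart to wcslen: number of elements before the 0. */
size_t ucslen(const uchar *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}
```

With these definitions, ucslen(psi6_Aur) gives 6, the same count whether wchar_t on the platform is 16 or 32 bits wide.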


1) Consider, for example, using a Mesopotamian cuneiform glyph here, which is realistic, since there were cuneiform star catalogues such as MUL.APIN. Cuneiform glyphs are encoded in plane 1 of Unicode, beyond the 16-bit range. Unicode as a whole needs 21 bits, which is such an odd size that the 32 bits provided by an integer are the best match.
2) A star in eastern Auriga, formerly in Telescopium Herschelii.
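The arithmetic behind footnote 1 can be checked with a small sketch. The helper names bits_needed and utf16_units are made up for illustration, and uchar is again assumed to be a signed 32-bit integer. U+12000 is the first code point of the Cuneiform block, and U+10FFFF is the highest code point in Unicode.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: uchar is a signed 32-bit integer. */
typedef int32_t uchar;

/* Smallest number of bits that can hold the non-negative code point c. */
int bits_needed(uchar c)
{
    int n = 0;
    assert(c >= 0);
    while (c > 0) {
        n++;
        c >>= 1;
    }
    return n;
}

/* Number of 16-bit units UTF-16 needs for code point c:
 * anything above U+FFFF takes a surrogate pair. */
int utf16_units(uchar c)
{
    assert(c >= 0);
    return (c > 0xFFFF) ? 2 : 1;
}
```

Here bits_needed(0x12000) is 17 and bits_needed(0x10FFFF) is 21, while utf16_units(0x12000) is 2, so a single 16-bit wide character cannot hold a cuneiform sign but a 32-bit uchar can.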

A crucial question: why not use a programming language other than C? Yes, why not; but the languages not to use for this kind of application are Java and C# (and all .NET languages), because they require loading a heavy virtual machine that is not needed for the conversion process. Other languages not to use here are JavaScript and PHP, because of their web-oriented implementations. Relevant languages are Python and C++. The current implementation will stick to C until a full set of star maps can be generated, because of library spin-offs that can be used in implementing programming language experiments of mine (C++ won't do for parallel programming). Nothing is impossible in C, but many solutions require ugly-as-h*ck code.
