p0z3r

Wednesday, November 09, 2005

Py_UNICODE_SIZE

Unicode continues to be a learning experience. On a recent blog post, a single comment exposed several shortcomings in the SK input box. The biggest was that Unicode output was broken: it was returning the char type instead of the unicode type.

For background on what is going on, see this python-dev mailing list post:
http://mail.python.org/pipermail/python-dev/2005-May/053264.html

Needless to say, Alex and I spent some time on this and found that distributions differ in how they define Py_UNICODE_SIZE. On our systems (Gentoo and openSUSE) it is defined as 4 bytes. Some other distributions (*cough* Mandriva *cough*) define it as 2 bytes.

I think we finally have it nailed down, but if you are running the 3.5 branch from SVN, please update and report any bugs you find. By the way, to see for yourself what your system defines, just run grep Py_UNICODE_SIZE on your system's pyconfig.h.
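If you would rather check from Python than with grep, here is a minimal sketch. It assumes Python 3 and uses sysconfig.get_config_h_filename() to locate pyconfig.h. The helper names parse_unicode_size and code_units are hypothetical illustrations, not part of the SK fix. code_units shows why the size matters: a character outside the Basic Multilingual Plane fits in one unit when Py_UNICODE is 4 bytes but needs a two-unit surrogate pair when it is 2 bytes, so code that assumes one width miscounts on the other. On Python 3.3 and later (PEP 393), strings no longer depend on this setting internally, sys.maxunicode is always 0x10FFFF, and the define may not appear in pyconfig.h at all, in which case the script simply reports that it was not found.

```python
import re
import sys
import sysconfig

# Matches a line like "#define Py_UNICODE_SIZE 4" (same idea as the grep).
_DEFINE_RE = re.compile(r"^\s*#\s*define\s+Py_UNICODE_SIZE\s+(\d+)", re.MULTILINE)


def parse_unicode_size(header_text):
    """Return the numeric Py_UNICODE_SIZE from pyconfig.h text, or None."""
    match = _DEFINE_RE.search(header_text)
    return int(match.group(1)) if match else None


def code_units(text, unicode_size):
    """Count the units `text` would occupy with a given Py_UNICODE size.

    2 bytes (UCS-2/UTF-16): characters above U+FFFF need a surrogate pair.
    4 bytes (UCS-4/UTF-32): every character is exactly one unit.
    """
    if unicode_size == 2:
        return len(text.encode("utf-16-le")) // 2
    if unicode_size == 4:
        return len(text.encode("utf-32-le")) // 4
    raise ValueError("Py_UNICODE_SIZE is expected to be 2 or 4")


def main():
    path = sysconfig.get_config_h_filename()
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            size = parse_unicode_size(fh.read())
    except OSError:
        size = None  # header not installed (e.g. no Python dev package)
    print("pyconfig.h:", path)
    print("Py_UNICODE_SIZE:", size if size is not None else "not found")
    # 0xffff on a 2-byte Python 2 build; always 0x10ffff on Python 3.3+.
    print("sys.maxunicode:", hex(sys.maxunicode))
    sample = "\U0001F600"  # one character outside the BMP
    print("units for U+1F600 -> 2-byte:", code_units(sample, 2),
          "4-byte:", code_units(sample, 4))


if __name__ == "__main__":
    main()
```

Running it prints the header path and the value found (if any), and shows that U+1F600 takes 2 units with a 2-byte Py_UNICODE but only 1 with a 4-byte one.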

Hopefully this will not only fix the Unicode problems but also let Kaddressamba become a reality when 3.5 ships.

2 Comments:

  • Hmm...
    more /usr/include/python2.4/pyconfig.h | grep Py_UNICODE_SIZE
    gives 2 here on Slackware.

    (It was upgraded from 10.0 to 10.2 with "slapt-get --distupgrade", so I'm not sure whether the problem is still in current.)

    What arguments would I have to present to THE PAT (a.k.a. "The One" :) ) to make the switch? I am fairly clueless, but there must have been some reason for it to be 2, right?

    By Daniel "Suslik" D., at 12:05 AM  

  • @daniel:
    I'm not sure exactly why some choose 2 and others 4, but either way we now have a fix that seems to work for both. One person tested it on Mandriva (which uses 2), and it seems to work fine there. This means you should be okay to go ahead with your plans for Kaddressamba.

    By p0z3r, at 1:03 AM  
