[DUG] Upgrading to XE - Unicode strings questions
Ross Levis
ross at stationplaylist.com
Wed Nov 24 15:59:44 NZDT 2010
It's a shame UTF-8 wasn't made the standard in Delphi. It's commonly used in audio file tags, for example, which I have to deal with.
My software needs to search for songs with specific artists or titles, and it sounds like I'm going to have problems where the information is visually the same but entered differently in different parts of the world, using all sorts of 3rd party software.
Ross.
-----Original Message-----
From: delphi-bounces at delphi.org.nz [mailto:delphi-bounces at delphi.org.nz] On Behalf Of Todd
Sent: Wednesday, 24 November 2010 11:27 AM
To: NZ Borland Developers Group - Delphi List
Subject: Re: [DUG] Upgrading to XE - Unicode strings questions
Hi John
You can find out whether a Unicode string lies entirely inside the BMP by
converting it to UTF-32 and checking that the new string is twice the
byte length of the original (UTF-16) string.
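A minimal sketch of that length check, written in Python rather than Delphi purely for illustration. The function name is_bmp_only is hypothetical, and the "-le" codecs are an assumption made so that no byte-order mark is added to either encoding:

```python
def is_bmp_only(text: str) -> bool:
    """Return True if every code point in text lies in the BMP (U+0000..U+FFFF)."""
    # BMP code points take 2 bytes in UTF-16 and 4 bytes in UTF-32.
    # Anything outside the BMP takes 4 bytes in both (a surrogate pair in UTF-16),
    # so the UTF-32 form is exactly twice as long only for BMP-only text.
    utf16_bytes = len(text.encode("utf-16-le"))
    utf32_bytes = len(text.encode("utf-32-le"))
    return utf32_bytes == 2 * utf16_bytes


print(is_bmp_only("Bj\u00f6rk"))       # every code point is inside the BMP
print(is_bmp_only("clef \U0001D11E"))  # U+1D11E needs a surrogate pair in UTF-16
```

An equivalent test that avoids the conversion is to look for any code point above U+FFFF (or, on UTF-16 data, any surrogate code unit).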
> A user could specifically choose to enter that character in either form - this is unlikely, yes. Or, two users using the same codepage could choose to enter the character differently.
>
> Or if your data is coming from two separate external sources.
>
> The *only* way to be sure is to normalise before processing.
>
Agreed. That will eliminate any issues with composite codepoints.
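As a rough illustration of normalising before processing, here is a Python sketch using the standard unicodedata module. The helper name same_text and the choice of NFC (rather than NFD) are assumptions for the example; either form works as long as both sides use the same one:

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "Bj\u00f6rk"     # "ö" as the single code point U+00F6
decomposed = "Bjo\u0308rk"  # "o" followed by combining diaeresis U+0308

print(composed == decomposed)           # raw comparison sees two different strings
print(same_text(composed, decomposed))  # comparison after normalisation
```

For a search feature, the same normalisation would be applied both to the stored tag text and to the search term before matching.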
>> You only ever get issues if you cross codepage boundaries
>> (like for example if you have users in different countries
>> storing data in a database - which is why international
>> databases often use UTF-8 to store data instead of their
>> native charactersets).
>>
> This makes no sense at all to me.
>
> "ö" can be encoded as #$006F + #$0308 **OR** #$00F6, even in UTF-8. Whether you encode using UTF-8, UTF-16 or UTF-32, a single accented character codepoint vs a character followed by a diacritic are still two distinct "character" sequences.
>
True. I think the point is that UTF-8 is usually the most compact lossless
encoding for mostly Latin-script text, regardless of whether the codepoints
are composite or not.
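To make the size and equivalence points concrete, here is a small Python sketch. The helper encoded_sizes is hypothetical, and the "-le" codecs are used only so that no byte-order mark inflates the counts:

```python
CODECS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text: str) -> dict:
    """Return the encoded byte length of text for each codec in CODECS."""
    return {codec: len(text.encode(codec)) for codec in CODECS}


composed = "\u00f6"     # single code point U+00F6
decomposed = "o\u0308"  # U+006F followed by combining U+0308

# The two forms stay distinct byte sequences in every encoding.
for codec in CODECS:
    a, b = composed.encode(codec), decomposed.encode(codec)
    print(codec, a.hex(), b.hex(), a == b)

print(encoded_sizes(composed))
print(encoded_sizes(decomposed))
print(encoded_sizes("\u65e5\u672c\u8a9e"))  # three CJK code points, for contrast
```

The choice of encoding changes only the byte representation; deciding that the two forms mean the same thing still requires normalisation.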
Todd.