[DUG] Upgrading to XE - Unicode strings questions

Colin Johnsun colin.adug at gmail.com
Tue Nov 23 15:22:03 NZDT 2010


Doh! Thanks Jolyon for clearing up that misunderstanding on my part. I was
aware of the surrogate pair issue but I wrongly assumed that it would be
taken care of by the iterator implementation. I guess not.
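
For the record, here is a rough sketch of what a loop that does pair up
surrogates might look like. It is only an illustration, not a tested library
routine: NextCodePoint is a made-up helper name (nothing from the RTL), the
test string is my own, and an unpaired surrogate is simply passed through
unchanged, which may or may not be what you want.

```pascal
program CodePointWalk;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper (not an RTL routine): returns the code point that
// starts at S[Index] and advances Index past the one or two UTF-16 code
// units it occupied.
function NextCodePoint(const S: UnicodeString; var Index: Integer): Integer;
var
  Lead, Trail: Integer;
begin
  Lead := Ord(S[Index]);
  Inc(Index);
  // A high (lead) surrogate must be followed by a low (trail) surrogate.
  if (Lead >= $D800) and (Lead <= $DBFF) and (Index <= Length(S)) then
  begin
    Trail := Ord(S[Index]);
    if (Trail >= $DC00) and (Trail <= $DFFF) then
    begin
      Inc(Index);
      Result := $10000 + ((Lead - $D800) shl 10) + (Trail - $DC00);
      Exit;
    end;
  end;
  // BMP character, or an unpaired surrogate passed through as-is.
  Result := Lead;
end;

var
  S: UnicodeString;
  I, Count: Integer;
begin
  // 'A', then U+1D11E (outside the BMP) as a surrogate pair, then 'B'.
  SetLength(S, 4);
  S[1] := 'A';
  S[2] := WideChar($D834);
  S[3] := WideChar($DD1E);
  S[4] := 'B';

  Count := 0;
  I := 1;
  while I <= Length(S) do
  begin
    WriteLn('U+', IntToHex(NextCodePoint(S, I), 4));
    Inc(Count);
  end;

  // Length counts code units (4 here); Count is the code points seen (3).
  WriteLn('Length(S) = ', Length(S), ', code points = ', Count);
end.
```

The point is just that the loop has to advance by one or two elements per
step, which neither the for..in form nor the plain indexed form will do
for you.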

Thanks again!
Cheers,
Colin

On 23 November 2010 13:06, Jolyon Smith <jsmith at deltics.co.nz> wrote:

> Colin, the for C in loop and the for i := 1 to Length() loops are
> functionally identical!  The only difference is that the “for in” version
> incurs the slight overhead of the enumerator framework invoked by the
> compiler and runtime magic to support that syntax.
>
>
>
> But in neither case will the loop itself help detect/respond to surrogate
> pairs (a single “WideChar” is potentially only ½ the data required to form a
> complete “*character*”).  The only way to reduce an iterator over a string
> to a simple char-wise loop, whether explicit or using enumerators, is to
> first convert to UTF32, the facilities for which in the Delphi RTL are
> <cough> rudimentary, to put it politely.  Non-existent may be nearer the
> mark.
>
>
>
> The precise mechanics of the loop construct used is not material to that
> problem.
>
> However, just as before Unicode when most people didn’t care and just wrote
> code that assumed ANSI==ASCII, these days people won’t care and will write
> code that assumes that Unicode==BMP (Basic Multilingual Plane), ignoring
> surrogate pairs just as they used to ignore extended ASCII and ANSI
> characters.
>
>
>
> And for most people, that will probably actually work.
>
>
>
> J
>
>
>
>
>
> From: delphi-bounces at delphi.org.nz [mailto:delphi-bounces at delphi.org.nz]
> On Behalf Of Colin Johnsun
> Sent: Tuesday, 23 November 2010 14:31
> To: NZ Borland Developers Group - Delphi List
> Subject: Re: [DUG] Upgrading to XE - Unicode strings questions
>
>
>
> I won't answer everything but just on this one question:
>
> On 23 November 2010 11:04, John Bird <johnkbird at paradise.net.nz> wrote:
>
> Extra question:
>
> It looks like code like
>
>    for i := 1 to Length(string1) do
>    begin
>      DoSomethingWithOneChar(string1[i]);
>    end;
>
> cannot be used reliably.   The problem is that Length(string1) cannot be
> treated as a character count: a Unicode character outside the BMP takes 2
> UTF-16 code units (a surrogate pair), so the number of Unicode characters
> in a string can differ from the number of code units that Length reports.
> Still figuring out what is the best practice here, as I have quite a lot
> of string routines.   Should be OK as long as the Unicode text actually is
> ASCII.
>
>
>
>
>
> you can use something like this:
>
>
>
> var
>   C: Char;
> ...
>   for C in String1 do
>   begin
>     DoSomethingWithOneChar(C);
>   end;
>
>
>
> In this case you don't need to know the index of each character, you just
> get the char using the for..in..do loop.