<HTML><HEAD></HEAD>

<BODY dir=ltr>

<DIV dir=ltr>

<DIV style="FONT-FAMILY: 'Arial'; COLOR: #000000; FONT-SIZE: 10pt">

<DIV>As I understand it, iterating over a string with Chars does get around the 
problem of surrogate pairs: whichever character you are currently on might be 
1, 2 or more bytes if it contains surrogate pairs, but it is still just one Unicode 
character.&nbsp;&nbsp; So if all one is after is iterating over the characters in the 
string, your code should be perfect.</DIV>

<DIV>&nbsp;</DIV>

<DIV>My question is: if you are not using <FONT face=Calibri><FONT 
style="FONT-SIZE: 12pt">for C in String1 do</FONT></FONT> and instead want to use</DIV>

<DIV><FONT face=Calibri><FONT style="FONT-SIZE: 12pt">for i:=1 to 

length(string1) do</FONT></FONT></DIV>

<DIV><FONT size=3 face=Calibri></FONT>&nbsp;</DIV>

<DIV><FONT size=3 face=Calibri>what do you use instead of Length to get 
the number of characters in the string in general?&nbsp; Length is not the 
number of characters, it's the number of 16-bit code units (with the two halves 
of a surrogate pair counted separately), if I understand correctly.</FONT></DIV>
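<DIV><FONT size=3 face=Calibri>A minimal sketch of one way to get such a count, 
assuming Delphi 2009 or later (string = UnicodeString, 1-based indexing). 
CountCodePoints is a hypothetical helper name, not an RTL routine, and it counts 
Unicode code points only; combining sequences would still be counted as several 
characters.</FONT></DIV>
<PRE>
program CountCodePointsDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper (not part of the RTL): counts Unicode code points
// in a UTF-16 string by treating a valid surrogate pair as one character.
function CountCodePoints(const S: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I &lt;= Length(S) do
  begin
    // A high surrogate ($D800..$DBFF) followed by a low surrogate
    // ($DC00..$DFFF) forms a single code point, so skip both units.
    if (Ord(S[I]) &gt;= $D800) and (Ord(S[I]) &lt;= $DBFF) and
       (I &lt; Length(S)) and
       (Ord(S[I + 1]) &gt;= $DC00) and (Ord(S[I + 1]) &lt;= $DFFF) then
      Inc(I, 2)
    else
      Inc(I);
    Inc(Result);
  end;
end;

var
  S: string;
begin
  // 'A' followed by U+1D11E, written as its surrogate pair.
  S := 'A' + #$D834 + #$DD1E;
  Writeln(Length(S));          // 3 UTF-16 code units
  Writeln(CountCodePoints(S)); // 2 code points
end.
</PRE>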

<DIV><FONT size=3 face=Calibri></FONT>&nbsp;</DIV>

<DIV><FONT size=3 face=Calibri>Separate issue - I understand that if one wants 

to iterate over the bytes of a string then one uses byte rather than char, and 

then one does have to investigate each byte to see if it is part of a surrogate 

pair.&nbsp; There look to be routines for this – however I am guessing most 

won’t be needing to do this. Fortunately!</FONT></DIV>

<DIV><FONT size=3 face=Calibri></FONT>&nbsp;</DIV>

<DIV><FONT size=3 face=Calibri></FONT>&nbsp;</DIV>

<DIV><FONT size=3 face=Calibri>Also, I think getting what we used to call 
the ASCII value of a character, or creating a character, still works the 
same. In fact, for the English alphabet the codes are the same, I 
understand?&nbsp; Can someone confirm?&nbsp;&nbsp; (I.e. the character might use 2 
bytes if encoded in a Unicode string, but the value stored for ‘A’ is still 41 hex 
or 65 decimal.)&nbsp;&nbsp; Which means I think that one can do</FONT></DIV>

<DIV><FONT size=3 face=Calibri></FONT>&nbsp;</DIV>

<DIV><FONT size=3 face=Calibri></FONT>&nbsp;</DIV>

<DIV><FONT size=3 face=Calibri>var</FONT></DIV>
<DIV><FONT size=3 face=Calibri>&nbsp; code1, code2: Integer;</FONT></DIV>
<DIV><FONT size=3 face=Calibri>&nbsp; char1: AnsiChar;</FONT></DIV>
<DIV><FONT size=3 face=Calibri>&nbsp; char2: Char;</FONT></DIV>
<DIV><FONT size=3 face=Calibri></FONT>&nbsp;</DIV>
<DIV>&nbsp;&nbsp;&nbsp; <FONT size=3 face=Calibri>char1 := 'A';&nbsp;&nbsp; // AnsiChar, 1 byte</FONT></DIV>
<DIV>&nbsp;&nbsp;&nbsp; <FONT size=3 face=Calibri>char2 := 'A';&nbsp;&nbsp; // Unicode Char, 2 bytes</FONT></DIV>
<DIV>&nbsp;&nbsp;&nbsp; <FONT size=3 face=Calibri>code1 := Ord(char1);</FONT></DIV>
<DIV>&nbsp;&nbsp;&nbsp; <FONT size=3 face=Calibri>code2 := Ord(char2);</FONT></DIV>

<DIV><FONT size=3 face=Calibri></FONT>&nbsp;</DIV>

<DIV><FONT size=3 face=Calibri>In this case I think code1 = code2?&nbsp; Can anyone 
confirm this?&nbsp;&nbsp; Of course, once one goes away from English/Latin 8859 
characters this is no longer going to be true.</FONT></DIV>

<DIV><BR></DIV>

<DIV>&nbsp;</DIV>

<DIV style="FONT-FAMILY: 'Arial'; COLOR: #000000; FONT-SIZE: 10pt">John</DIV>

<DIV style="FONT-FAMILY: 'Arial'; COLOR: #000000; FONT-SIZE: 10pt">

<DIV 

style="FONT-STYLE: normal; DISPLAY: inline; FONT-FAMILY: 'Calibri'; COLOR: #000000; FONT-SIZE: small; FONT-WEIGHT: normal; TEXT-DECORATION: none"><FONT 

size=2 face=Arial></FONT></DIV>&nbsp;</DIV>

<DIV 

style="FONT-STYLE: normal; DISPLAY: inline; FONT-FAMILY: 'Calibri'; COLOR: #000000; FONT-SIZE: small; FONT-WEIGHT: normal; TEXT-DECORATION: none">Doh! 

Thanks Jolyon for clearing up that misunderstanding on my part. I was aware of the 
surrogate pair issue but I wrongly assumed that it would be taken care 
of by the iterator implementation. I guess not. 

<DIV>&nbsp;</DIV>

<DIV>Thanks again!</DIV>

<DIV>Cheers,</DIV>

<DIV>Colin</DIV>

<DIV>

<DIV>&nbsp;</DIV>

<DIV class=gmail_quote>On 23 November 2010 13:06, Jolyon Smith <SPAN 

dir=ltr>&lt;<A 

href="mailto:jsmith@deltics.co.nz">jsmith@deltics.co.nz</A>&gt;</SPAN> 

wrote:<BR>

<BLOCKQUOTE 

style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" 

class=gmail_quote>

  <DIV lang=EN-NZ vlink="purple" link="blue">

  <DIV>

  <P class=MsoNormal><SPAN style="COLOR: #1f497d; FONT-SIZE: 11pt">Colin, the 

  for C in loop and the for i := 1 to Length() loops are functionally 

  identical!&nbsp; The only difference is that the “for in” version incurs the 

  slight overhead of the enumerator framework invoked by the compiler and 

  runtime magic to support that syntax.</SPAN></P>

  <P class=MsoNormal><SPAN 

  style="COLOR: #1f497d; FONT-SIZE: 11pt"></SPAN>&nbsp;</P>

  <P class=MsoNormal><SPAN style="COLOR: #1f497d; FONT-SIZE: 11pt">But in 

  neither case will the loop itself help detect/respond to surrogate pairs (a 

  single “WideChar” is potentially only ½ the data required to form a complete 

  “<U>character</U>”).&nbsp; The only way to reduce an iterator over a string to 

  a simple char-wise loop, whether explicit or using enumerators, is to first 

  convert to UTF32, the facilities for which in the Delphi RTL are &lt;cough&gt; 

  rudimentary, to put it politely.&nbsp; Non-existent may be nearer the 

  mark.</SPAN></P>
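  <P class=MsoNormal><SPAN style="COLOR: #1f497d; FONT-SIZE: 11pt">As a rough 
  illustration of the decoding step such a conversion involves, here is a minimal 
  sketch that walks a UTF-16 string and yields full code point values. 
  NextCodePoint is a hypothetical helper, not an RTL routine, and the sketch assumes 
  Delphi 2009 or later with 1-based string indexing. An unpaired surrogate is 
  simply returned as-is.</SPAN></P>
<PRE>
program DecodeSurrogatesDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper (not part of the RTL): returns the code point that
// starts at S[Index] and advances Index past the code units it consumed.
function NextCodePoint(const S: string; var Index: Integer): Cardinal;
var
  Hi, Lo: Cardinal;
begin
  Hi := Ord(S[Index]);
  Inc(Index);
  Result := Hi;
  // Only a high surrogate followed by a low surrogate is combined.
  if (Hi &gt;= $D800) and (Hi &lt;= $DBFF) and (Index &lt;= Length(S)) then
  begin
    Lo := Ord(S[Index]);
    if (Lo &gt;= $DC00) and (Lo &lt;= $DFFF) then
    begin
      Result := $10000 + ((Hi - $D800) shl 10) + (Lo - $DC00);
      Inc(Index);
    end;
  end;
end;

var
  S: string;
  I: Integer;
begin
  // 'A' followed by U+1D11E, written as its surrogate pair.
  S := 'A' + #$D834 + #$DD1E;
  I := 1;
  while I &lt;= Length(S) do
    Writeln(NextCodePoint(S, I)); // prints 65, then 119070 ($1D11E)
end.
</PRE>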

  <P class=MsoNormal><SPAN 

  style="COLOR: #1f497d; FONT-SIZE: 11pt"></SPAN>&nbsp;</P>

  <P class=MsoNormal><SPAN style="COLOR: #1f497d; FONT-SIZE: 11pt">The precise 
  mechanics of the loop construct used are not material to that 
  problem.</SPAN></P>

  <P class=MsoNormal><SPAN 

  style="COLOR: #1f497d; FONT-SIZE: 11pt"></SPAN>&nbsp;</P>

  <P class=MsoNormal><SPAN 

  style="COLOR: #1f497d; FONT-SIZE: 11pt"></SPAN>&nbsp;</P>

  <P class=MsoNormal><SPAN style="COLOR: #1f497d; FONT-SIZE: 11pt">However, just 

  as before Unicode when most people didn’t care and just wrote code that 

  assumed ANSI==ASCII, these days people won’t care and will write code that 

  assumes that Unicode==BMP (Basic Multilingual Plane), ignoring surrogate pairs 

  just as they used to ignore extended ASCII and ANSI characters.</SPAN></P>

  <P class=MsoNormal><SPAN 

  style="COLOR: #1f497d; FONT-SIZE: 11pt"></SPAN>&nbsp;</P>

  <P class=MsoNormal><SPAN style="COLOR: #1f497d; FONT-SIZE: 11pt">And for most 

  people, that will probably actually work.</SPAN></P>

  <P class=MsoNormal><SPAN 

  style="COLOR: #1f497d; FONT-SIZE: 11pt"></SPAN>&nbsp;</P>

  <P class=MsoNormal><SPAN style="COLOR: #1f497d; FONT-SIZE: 11pt">:-)</SPAN></P>

  <P class=MsoNormal><SPAN 

  style="COLOR: #1f497d; FONT-SIZE: 11pt"></SPAN>&nbsp;</P>

  <P class=MsoNormal><SPAN 

  style="COLOR: #1f497d; FONT-SIZE: 11pt"></SPAN>&nbsp;</P>

  <DIV 

  style="BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0cm; PADDING-LEFT: 0cm; PADDING-RIGHT: 0cm; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt">

  <P class=MsoNormal><B><SPAN style="FONT-SIZE: 10pt" 

  lang=EN-US>From:</SPAN></B><SPAN style="FONT-SIZE: 10pt" lang=EN-US> <A 

  href="mailto:delphi-bounces@delphi.org.nz" 

  target=_blank>delphi-bounces@delphi.org.nz</A> [mailto:<A 

  href="mailto:delphi-bounces@delphi.org.nz" 

  target=_blank>delphi-bounces@delphi.org.nz</A>] <B>On Behalf Of </B>Colin 

  Johnsun<BR><B>Sent:</B> Tuesday, 23 November 2010 14:31<BR><B>To:</B> NZ 

  Borland Developers Group - Delphi List</SPAN></P>

  <DIV class=im><BR><B>Subject:</B> Re: [DUG] Upgrading to XE - Unicode strings 

  questions</DIV></DIV>

  <P class=MsoNormal>&nbsp;</P>

  <P style="MARGIN-BOTTOM: 12pt" class=MsoNormal>I won't answer everything but 

  just on this one question:</P>

  <DIV>

  <DIV></DIV>

  <DIV class=h5>

  <DIV>

  <P class=MsoNormal>On 23 November 2010 11:04, John Bird &lt;<A 

  href="mailto:johnkbird@paradise.net.nz" 

  target=_blank>johnkbird@paradise.net.nz</A>&gt; wrote:</P>

  <P class=MsoNormal>Extra question:<BR><BR>It looks like code 

  like<BR><BR>&nbsp;&nbsp; for i:=1 to length(string1) do<BR>&nbsp;&nbsp; 

  begin<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 

  DoSomethingWithOneChar(string1[i]);<BR>&nbsp;&nbsp; end;<BR><BR>cannot be used 

  reliably.&nbsp;&nbsp; The problems are that length(string1) looks like<BR>it 

  cannot be safely used - as unicode characters may include 2 codepoints<BR>and 

  length(string1) highlights that there is a difference between the number<BR>of 

  unicode characters in a string and the number of codepoints.&nbsp;&nbsp; 

  Still<BR>figuring out what is the best practice here, as I have quite a lot of 

  string<BR>routines.&nbsp;&nbsp; Should be OK as long as the unicode text 

  actually is ASCII.</P>

  <DIV>

  <P class=MsoNormal>&nbsp;</P></DIV>

  <DIV>

  <P class=MsoNormal>&nbsp;</P></DIV>

  <DIV>

  <P class=MsoNormal>you can use something like this:</P></DIV>

  <DIV>

  <P class=MsoNormal>&nbsp;</P></DIV>

  <DIV>

  <P class=MsoNormal>var</P></DIV>

  <DIV>

  <P class=MsoNormal>&nbsp; C: Char;</P></DIV>

  <DIV>

  <P class=MsoNormal>...</P></DIV>

  <DIV>

  <P class=MsoNormal>&nbsp; for C in String1 do</P></DIV>

  <DIV>

  <P class=MsoNormal>&nbsp; begin</P></DIV>

  <DIV>

  <P class=MsoNormal>&nbsp;&nbsp;&nbsp; DoSomethingWithOneChar(C);</P></DIV>

  <DIV>

  <P class=MsoNormal>&nbsp; end;</P></DIV>

  <DIV>

  <P class=MsoNormal>&nbsp;</P></DIV>

  <DIV>

  <P class=MsoNormal>In this case you don't need to know the index of each 

  character, you just get the char using the for..in..do loop.</P></DIV>

  <DIV>

  <P class=MsoNormal>&nbsp;</P></DIV>

  <DIV>

  <P class=MsoNormal>&nbsp;</P></DIV>

  <DIV>

  <P 

  class=MsoNormal>&nbsp;</P></DIV></DIV></DIV></DIV></DIV></DIV><BR>_______________________________________________<BR>NZ 

  Borland Developers Group - Delphi mailing list<BR>Post: <A 

  href="mailto:delphi@delphi.org.nz">delphi@delphi.org.nz</A><BR>Admin: <A 

  href="http://delphi.org.nz/mailman/listinfo/delphi" 

  target=_blank>http://delphi.org.nz/mailman/listinfo/delphi</A><BR>Unsubscribe: 

  send an email to <A 

  href="mailto:delphi-request@delphi.org.nz">delphi-request@delphi.org.nz</A> 

  with Subject: unsubscribe<BR></BLOCKQUOTE></DIV>

<DIV>&nbsp;</DIV></DIV>

</DIV></DIV></DIV></BODY></HTML>