<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=iso-8859-1"><meta name=Generator content="Microsoft Word 12 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
span.EmailStyle17
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-NZ link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Colin, the for C in loop and the for i := 1 to Length() loops are functionally identical! The only difference is that the “for in” version incurs the slight overhead of the enumerator framework invoked by the compiler and runtime magic to support that syntax.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>But in neither case will the loop itself help detect/respond to surrogate pairs (a single “WideChar” is potentially only ½ the data required to form a complete “<u>character</u>”). The only way to reduce an iterator over a string to a simple char-wise loop, whether explicit or using enumerators, is to first convert to UTF32, the facilities for which in the Delphi RTL are <cough> rudimentary, to put it politely. Non-existent may be nearer the mark.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>The precise mechanics of the loop construct used is not material to that problem.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>However, just as before Unicode when most people didn’t care and just wrote code that assumed ANSI==ASCII, these days people won’t care and will write code that assumes that Unicode==BMP (Basic Multilingual Plane), ignoring surrogate pairs just as they used to ignore extended ASCII and ANSI characters.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>And for most people, that will probably actually work.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:Wingdings;color:#1F497D'>J</span><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm'><p class=MsoNormal><b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> delphi-bounces@delphi.org.nz [mailto:delphi-bounces@delphi.org.nz] <b>On Behalf Of </b>Colin Johnsun<br><b>Sent:</b> Tuesday, 23 November 2010 14:31<br><b>To:</b> NZ Borland Developers Group - Delphi List<br><b>Subject:</b> Re: [DUG] Upgrading to XE - Unicode strings questions<o:p></o:p></span></p></div><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal style='margin-bottom:12.0pt'>I won't answer everything but just on this one question:<o:p></o:p></p><div><p class=MsoNormal>On 23 November 2010 11:04, John Bird <<a href="mailto:johnkbird@paradise.net.nz">johnkbird@paradise.net.nz</a>> wrote:<o:p></o:p></p><p class=MsoNormal>Extra question:<br><br>It looks like code like<br><br> for i:=1 to length(string1) do<br> begin<br> DoSomethingWithOneChar(string1[i]);<br> end;<br><br>cannot be used reliably. The problems are that length(string1) looks like<br>it cannot be safely used - as unicode characters may include 2 codepoints<br>and length(string1) highlights that there is a difference between the number<br>of unicode characters in a string and the number of codepoints. Still<br>figuring out what is the best practice here, as I have quite a lot of string<br>routines. Should be be OK as long as the unicode text actually is ASCII.<o:p></o:p></p><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>you can use something like this:<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>var<o:p></o:p></p></div><div><p class=MsoNormal> C: Char;<o:p></o:p></p></div><div><p class=MsoNormal>...<o:p></o:p></p></div><div><p class=MsoNormal> for C in String1 do<o:p></o:p></p></div><div><p class=MsoNormal> begin<o:p></o:p></p></div><div><p class=MsoNormal> DoSomethingWithOneChar(C);<o:p></o:p></p></div><div><p class=MsoNormal> end;<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>In this case you don't need to know the index of each character, you just get the char using the for..in..do loop.<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal> <o:p></o:p></p></div></div></div></body></html>