<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=iso-8859-1"><meta name=Generator content="Microsoft Word 12 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p
        {mso-style-priority:99;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:0cm;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
span.EmailStyle18
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-NZ link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>@ Colin : No worries.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>@All : One other thing to point out is that when working with genuine, actual <u>Unicode</u> strings you should be careful to use the correct ANSI() functions... yes, you read that right.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> S := Uppercase(S);<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Will NOT convert Unicode characters (just as it would previously have not converted non-ASCII characters).<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'> S := ANSIUppercase( S );<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>On the other hand will. The same goes for the likes of SameText() vs ANSISameText() etc.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>If you were writing for extended character sets in the past you were most likely already using these routines, but if you weren’t (perhaps because “Delphi doesn’t support extended chars very well”) and are now thinking that by simply upgrading to a Unicode Delphi all such things are magically taken care of, you will be in for a shock.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Better yet, use the routines introduced in the <b>Character</b> unit (why not <b>UnicodeUtils</b>? DOH!)<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>The only problem you then have is if you want to write string handling/manipulating code that will be portable between Unicode and non-Unicode Delphi compilers.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm'><p class=MsoNormal><b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> delphi-bounces@delphi.org.nz [mailto:delphi-bounces@delphi.org.nz] <b>On Behalf Of </b>Colin Johnsun<br><b>Sent:</b> Tuesday, 23 November 2010 15:22<br><b>To:</b> NZ Borland Developers Group - Delphi List<br><b>Subject:</b> Re: [DUG] Upgrading to XE - Unicode strings questions<o:p></o:p></span></p></div><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Doh! Thanks Jolyon for clearing that misunderstanding on my part. I was aware of the surrogate pair issue but I wrongly assumed that this might have been taken care by the iterator implementation. I guess not.<o:p></o:p></p><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>Thanks again!<o:p></o:p></p></div><div><p class=MsoNormal>Cheers,<o:p></o:p></p></div><div><p class=MsoNormal>Colin<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p><div><p class=MsoNormal>On 23 November 2010 13:06, Jolyon Smith <<a href="mailto:jsmith@deltics.co.nz">jsmith@deltics.co.nz</a>> wrote:<o:p></o:p></p><div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;color:#1F497D'>Colin, the for C in loop and the for i := 1 to Length() loops are functionally identical! The only difference is that the “for in” version incurs the slight overhead of the enumerator framework invoked by the compiler and runtime magic to support that syntax.</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;color:#1F497D'>But in neither case will the loop itself help detect/respond to surrogate pairs (a single “WideChar” is potentially only ½ the data required to form a complete “<u>character</u>”). The only way to reduce an iterator over a string to a simple char-wise loop, whether explicit or using enumerators, is to first convert to UTF32, the facilities for which in the Delphi RTL are <cough> rudimentary, to put it politely. Non-existent may be nearer the mark.</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;color:#1F497D'>The precise mechanics of the loop construct used is not material to that problem.</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;color:#1F497D'>However, just as before Unicode when most people didn’t care and just wrote code that assumed ANSI==ASCII, these days people won’t care and will write code that assumes that Unicode==BMP (Basic Multilingual Plane), ignoring surrogate pairs just as they used to ignore extended ASCII and ANSI characters.</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;color:#1F497D'>And for most people, that will probably actually work.</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;font-family:Wingdings;color:#1F497D'>J</span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;color:#1F497D'> </span><o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><span style='font-size:11.0pt;color:#1F497D'> </span><o:p></o:p></p><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm'><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b><span lang=EN-US style='font-size:10.0pt'>From:</span></b><span lang=EN-US style='font-size:10.0pt'> <a href="mailto:delphi-bounces@delphi.org.nz" target="_blank">delphi-bounces@delphi.org.nz</a> [mailto:<a href="mailto:delphi-bounces@delphi.org.nz" target="_blank">delphi-bounces@delphi.org.nz</a>] <b>On Behalf Of </b>Colin Johnsun<br><b>Sent:</b> Tuesday, 23 November 2010 14:31<br><b>To:</b> NZ Borland Developers Group - Delphi List</span><o:p></o:p></p><div><p class=MsoNormal><br><b>Subject:</b> Re: [DUG] Upgrading to XE - Unicode strings questions<o:p></o:p></p></div></div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;margin-bottom:12.0pt'>I won't answer everything but just on this one question:<o:p></o:p></p><div><div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>On 23 November 2010 11:04, John Bird <<a href="mailto:johnkbird@paradise.net.nz" target="_blank">johnkbird@paradise.net.nz</a>> wrote:<o:p></o:p></p><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Extra question:<br><br>It looks like code like<br><br> for i:=1 to length(string1) do<br> begin<br> DoSomethingWithOneChar(string1[i]);<br> end;<br><br>cannot be used reliably. The problems are that length(string1) looks like<br>it cannot be safely used - as unicode characters may include 2 codepoints<br>and length(string1) highlights that there is a difference between the number<br>of unicode characters in a string and the number of codepoints. Still<br>figuring out what is the best practice here, as I have quite a lot of string<br>routines. Should be be OK as long as the unicode text actually is ASCII.<o:p></o:p></p><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>you can use something like this:<o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>var<o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> C: Char;<o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>...<o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> for C in String1 do<o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> begin<o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> DoSomethingWithOneChar(C);<o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> end;<o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>In this case you don't need to know the index of each character, you just get the char using the for..in..do loop.<o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p></div><div><p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'> <o:p></o:p></p></div></div></div></div></div></div><p class=MsoNormal><br>_______________________________________________<br>NZ Borland Developers Group - Delphi mailing list<br>Post: <a href="mailto:delphi@delphi.org.nz">delphi@delphi.org.nz</a><br>Admin: <a href="http://delphi.org.nz/mailman/listinfo/delphi" target="_blank">http://delphi.org.nz/mailman/listinfo/delphi</a><br>Unsubscribe: send an email to <a href="mailto:delphi-request@delphi.org.nz">delphi-request@delphi.org.nz</a> with Subject: unsubscribe<o:p></o:p></p></div><p class=MsoNormal><o:p> </o:p></p></div></div></body></html>