[DUG] Upgrading to XE - Unicode strings questions

Todd todd.martin.nz at gmail.com
Tue Nov 23 16:43:22 NZDT 2010


Hi John
> Extra question:
>
> It looks like code like
>
>      for i:=1 to length(string1) do
>      begin
>              DoSomethingWithOneChar(string1[i]);
>      end;
>
> cannot be used reliably.

I think the solution here is not to concentrate on Unicode vs WideChar
vs AnsiChar, but rather on what DoSomethingWithOneChar() is actually
trying to achieve.

Does the function even make sense for non-ANSI characters? We can only
discuss this meaningfully with a more concrete example.

Todd.

>   The problem is that length(string1) looks like it cannot be safely
> used: a Unicode character may take 2 UTF-16 code units (a surrogate pair),
> so the number of Unicode characters in a string can differ from the number
> of code units that length(string1) returns.   Still figuring out what the
> best practice is here, as I have quite a lot of string routines.   Should
> be OK as long as the Unicode text actually is ASCII.
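
To make the length question concrete, here is a minimal sketch (not a
recommended library routine) that counts code points instead of UTF-16
code units. It assumes the TCharacter class from the Character unit that
ships with XE; CodePointCount is a hypothetical helper name of mine.

```pascal
program CountCodePoints;
{$APPTYPE CONSOLE}

uses
  SysUtils, Character;

// Hypothetical helper: counts Unicode code points rather than UTF-16
// code units. An unpaired surrogate is simply counted as one.
function CodePointCount(const S: string): Integer;
var
  i: Integer;
begin
  Result := 0;
  i := 1;
  while i <= Length(S) do
  begin
    // A high surrogate followed by a low surrogate is one code point.
    if TCharacter.IsHighSurrogate(S[i]) and (i < Length(S)) and
       TCharacter.IsLowSurrogate(S[i + 1]) then
      Inc(i, 2)
    else
      Inc(i);
    Inc(Result);
  end;
end;

var
  s: string;
begin
  // U+1D11E (musical G clef) needs a surrogate pair: $D834 $DD1E.
  s := 'A' + #$D834#$DD1E + 'B';
  Writeln(Length(s));          // 4 UTF-16 code units
  Writeln(CodePointCount(s));  // 3 code points
end.
```

For text that stays inside the Basic Multilingual Plane (which includes
all of ASCII) the two numbers are the same, so the plain for loop keeps
working there.
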
>
> Q2 – With XE do the .pas and .dfm files become unicode text and hence cannot
> be read by earlier Delphi, eg D2007 any more?
>
> Answer - From what I have read this is a project option; yes, the files
> are not portable if saved as Unicode.
>
> Q3 – I do a lot of reading ascii data files, and writing back.   Using
> mainly TFilestream and stringlists.   Does this in general mean I will need
> to use file variables declared as Ansichar and AnsiString instead of Char
> and String?
> (I would prefer to use the standard VCL where possible)
>
> If I have variables
>          as1:Ansistring;
>          s2:string;
>
> Q4 –         if I do s2:=as1  does this convert ansistrings to unicode?
>
> Answer - yes, there are performance issues to watch out for if conversion
> happens a lot.
>
> Q5 – if I do as1:=s2 does this convert a unicode string to ansistring?
>
>      (otherwise how do I do this?)
>
> Answer - yes, there are performance issues to watch out for if conversion
> happens a lot.
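
A small sketch of Q4 and Q5, using the variable names from the question.
The assignments also work without the casts; as far as I know the explicit
casts only make the conversion visible and silence the compiler's implicit
string cast warnings.

```pascal
program StringCasts;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  as1: AnsiString;
  s2: string;
begin
  as1 := 'plain ASCII text';
  s2 := string(as1);       // AnsiString -> UnicodeString, nothing lost
  as1 := AnsiString(s2);   // UnicodeString -> AnsiString, can lose data
  Writeln(s2 = string(as1));
end.
```

Going from string to AnsiString is the risky direction: any character
that is not in the ANSI code page cannot be represented and is replaced
during the conversion.
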
>
> Q6 – I understand any code like
>
>              char1:=string1[i];
>              if char1 in ['a'..'z'] then
>              begin
>                      message:=string1[i]+' - character is lowercase';
>              end;
>
>          will break: Ansi characters are ordinals below 256, so set
> comparisons like ['a'..'z']  or ['a','b','c']  can be used, but this set
> code cannot be used for Unicode characters.   What is the replacement?
>
> Answer - There is a CharInSet call, and numerous extra helper functions
> have been added in TCharacter.
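
A minimal sketch of both replacements, assuming CharInSet from SysUtils
and TCharacter.IsLower from the Character unit as shipped with XE. The
test string is made up for illustration.

```pascal
program LowerCheck;
{$APPTYPE CONSOLE}

uses
  SysUtils, Character;

var
  string1: string;
  i: Integer;
begin
  string1 := 'aB';
  for i := 1 to Length(string1) do
  begin
    // CharInSet keeps the old set syntax; only a..z can match here.
    if CharInSet(string1[i], ['a'..'z']) then
      Writeln(string1[i] + ' - character is lowercase (a..z)');
    // TCharacter.IsLower uses Unicode character properties instead,
    // so it also accepts lowercase letters outside a..z.
    if TCharacter.IsLower(string1[i]) then
      Writeln(string1[i] + ' - character is lowercase (Unicode)');
  end;
end.
```

Which of the two is right depends on whether the routine really means
"a to z" or "any lowercase letter".
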
>
> Q7 – do literals like  #13#10 still mean carriage return and linefeed?  #9
> means tab?
>          if I have code like (logline, string1 and string2 are strings)
>
>          logline:=FormatDateTime('dd-mmm-yyyy hh:nn:ss',now) + string1 + #13#10+#9 + string2;
>          ShowMessage(logline);
>          Button1.hint:=logline;
>          writeln(f,logline);
>
>          these work D5-D2007   - ie a 2 line messagebox text, 2 line hint,
> and 2 lines written to a log file.
>          is this still going to work?
>
>          do carriage returns/tabs/other control characters have to be defined
> differently, eg as constants?
>
> Answer - not figured out yet - anyone else know?
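
On Q7: control character literals such as #13, #10 and #9 keep their
meaning, because those code points are the same in ASCII and Unicode. A
small sketch with made-up values for string1 and string2:

```pascal
program ControlChars;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  logline, string1, string2: string;
  p: Integer;
begin
  string1 := ' first line';
  string2 := 'second line';
  logline := FormatDateTime('dd-mmm-yyyy hh:nn:ss', Now) + string1 +
    #13#10 + #9 + string2;

  p := Pos(#13#10, logline);
  Writeln(p > 0);                      // the CR/LF pair is in the string
  Writeln(Ord(logline[p + 2]) = 9);    // and it is followed by a tab
  Writeln(logline);                    // prints as two lines
end.
```

The only thing I would watch is Writeln to a text file, which as far as I
know converts the Unicode string to ANSI on the way out by default.
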
>
> Q8 – stringlist1.loadfromfile('Test1.txt');
>          what happens if this file is ascii text being read into a stringlist
> which holds unicode strings?
>
> Answer - Default is ANSI text for loadfromfile and savetofile; use the
> overloaded routines that take a TEncoding for Unicode
>
> Q9 -   stringlist1.savetofile('Test1.txt')
>           presumably this is no longer ascii text.   How do I save and read a
> stringlist to/from a file if it is to be Ansi text?
>
> Q10 – If there are complexities in Q8 and Q9 is there a TAnsiStringlist
> type (for ansistrings) as well as a unicode TStringlist type?
>          (I use stringlists a lot)
>
> Answer - Unicode string lists can save to ANSI or Unicode files, so
> TAnsiStringlist is not needed.
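
For Q8 to Q10, a minimal sketch of the TEncoding overloads on TStrings.
The file names are only examples, and the comments describe the default
behaviour as I understand it in XE.

```pascal
program StringListEncoding;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  sl: TStringList;
begin
  sl := TStringList.Create;
  try
    sl.Add('line one');
    sl.Add('line two');

    // No encoding given: written in the system ANSI code page.
    sl.SaveToFile('Test1.txt');
    // Encoding given: written as UTF-8.
    sl.SaveToFile('Test1_utf8.txt', TEncoding.UTF8);

    sl.Clear;
    // No encoding given: a file without a byte order mark is read as ANSI.
    sl.LoadFromFile('Test1.txt');
    Writeln(sl.Count);

    sl.LoadFromFile('Test1_utf8.txt', TEncoding.UTF8);
    Writeln(sl.Count);
  finally
    sl.Free;
  end;
end.
```

So existing ASCII data files should load and save as before with the
one-argument calls.
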
>
> Q11 – do inifiles become unicode too?
>
> Answer - looks like no?  Not clear?  Anyone else know?
>
> Q12 – does Windows Notepad open unicode text files correctly?   or can it
> only be used on Ansi text files?
>
> Anyone know this?
>
> Q13 - It looks like most programmers' editors read and write ascii and
> unicode encoding.....the one I use seems to distinguish between UTF-8 and
> unicode as well – what is the difference?
>
> Anyone know this?
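
On Q13: editors that list "Unicode" next to "UTF-8" usually mean UTF-16
by "Unicode". Both encode the same characters, just with different byte
layouts. A small sketch using TEncoding from SysUtils to show the
difference in size:

```pascal
program EncodingSizes;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  s: string;
begin
  // 'A' followed by U+00E9 (e with acute accent).
  s := 'A' + #$00E9;
  // UTF-8: 1 byte for 'A', 2 bytes for U+00E9.
  Writeln(Length(TEncoding.UTF8.GetBytes(s)));     // 3
  // UTF-16 (TEncoding.Unicode): 2 bytes for each of these characters.
  Writeln(Length(TEncoding.Unicode.GetBytes(s)));  // 4
end.
```

Pure ASCII text is byte-for-byte identical in UTF-8 and ASCII (apart from
an optional byte order mark), which is why UTF-8 files often still open
in older tools.
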
>
> John
>
> _______________________________________________
> NZ Borland Developers Group - Delphi mailing list
> Post: delphi at delphi.org.nz
> Admin: http://delphi.org.nz/mailman/listinfo/delphi
> Unsubscribe: send an email to delphi-request at delphi.org.nz with Subject:
> unsubscribe
>


