[DUG] Upgrading to XE - Unicode strings questions

Tue Nov 23 16:36:21 NZDT 2010

Hi John
> Extra question:
>
> It looks like code like
>
>      for i:=1 to length(string1) do
>      begin
>              DoSomethingWithOneChar(string1[i]);
>      end;
>
> cannot be used reliably.

I think the solution here is not to concentrate so much on Unicode,
but rather on what DoSomethingWithOneChar() is trying to achieve.
Does the function even make sense for non-ANSI characters?

Todd.

> The problem is that Length(string1) cannot safely be used this way: a
> single Unicode character may occupy two Char elements (a UTF-16 surrogate
> pair), so Length(string1) counts UTF-16 code units rather than characters.
> I am still figuring out the best practice here, as I have quite a lot of
> string routines. It should be OK as long as the Unicode text is actually
> ASCII.
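For loops like the one above, one option is to walk the string by code point instead of by Char. The sketch below assumes Delphi 2009 or later (string is the UTF-16 UnicodeString); the helpers IsHighSurrogateChar, IsLowSurrogateChar and CodePointCount are hypothetical names, and the surrogate ranges are written out explicitly rather than relying on any particular RTL helper. Note that even a code point is not always one user-visible character: an accented letter can be stored as a base letter plus a combining mark.

```pascal
program CodePointDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// True if C is the first half (high surrogate) of a UTF-16 surrogate pair.
function IsHighSurrogateChar(C: Char): Boolean;
begin
  Result := (C >= #$D800) and (C <= #$DBFF);
end;

// True if C is the second half (low surrogate) of a UTF-16 surrogate pair.
function IsLowSurrogateChar(C: Char): Boolean;
begin
  Result := (C >= #$DC00) and (C <= #$DFFF);
end;

// Counts code points, whereas Length(S) counts UTF-16 code units (Char elements).
function CodePointCount(const S: string): Integer;
var
  i: Integer;
begin
  Result := 0;
  i := 1;
  while i <= Length(S) do
  begin
    // A valid surrogate pair takes two Char slots but is a single code point.
    // Boolean short-circuiting ({$B-}, the default) keeps S[i + 1] in range.
    if IsHighSurrogateChar(S[i]) and (i < Length(S)) and
       IsLowSurrogateChar(S[i + 1]) then
      Inc(i, 2)
    else
      Inc(i);
    Inc(Result);
  end;
end;

var
  S: string;
begin
  // 'a', U+1D11E MUSICAL SYMBOL G CLEF (stored as #$D834#$DD1E), 'b'
  S := 'a' + #$D834#$DD1E + 'b';
  Writeln(Length(S));          // 4 (UTF-16 code units)
  Writeln(CodePointCount(S));  // 3 (code points)
end.
```

The same while-loop shape can replace the `for i:=1 to length(string1)` loop when DoSomethingWithOneChar needs whole code points rather than single Char values.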
>
> Q2 – With XE, do the .pas and .dfm files become Unicode text, and hence can
> no longer be read by earlier Delphi versions, e.g. D2007?
>
> Answer - It seems to be a project option, from what I have read; and yes,
> files saved as Unicode are not portable.
>
> Q3 – I do a lot of reading of ASCII data files and writing them back,
> mainly using TFileStream and string lists. Does this in general mean I will
> need to declare variables as AnsiChar and AnsiString instead of Char and
> String?
> (I would prefer to use the standard VCL where possible.)
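One approach that avoids switching everything to AnsiString is to keep String in memory and convert only at the file boundary, using the TEncoding class from SysUtils (Delphi 2009 and later). This is a minimal sketch under that assumption; ReadAnsiFile, WriteAnsiFile and data.txt are hypothetical names. TEncoding.ANSI means the system's default ANSI code page, so the bytes on disk should be the same 8-bit text an older Delphi would have written on the same machine.

```pascal
program AnsiFileDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Reads a whole file and decodes its bytes using the system ANSI code page.
function ReadAnsiFile(const FileName: string): string;
var
  Stream: TFileStream;
  Bytes: TBytes;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Bytes, Integer(Stream.Size));
    if Length(Bytes) > 0 then
      Stream.ReadBuffer(Bytes[0], Length(Bytes));
  finally
    Stream.Free;
  end;
  if Length(Bytes) = 0 then
    Result := ''
  else
    Result := TEncoding.ANSI.GetString(Bytes);
end;

// Encodes Text using the system ANSI code page and writes it to FileName.
procedure WriteAnsiFile(const FileName, Text: string);
var
  Stream: TFileStream;
  Bytes: TBytes;
begin
  Bytes := TEncoding.ANSI.GetBytes(Text);
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    if Length(Bytes) > 0 then
      Stream.WriteBuffer(Bytes[0], Length(Bytes));
  finally
    Stream.Free;
  end;
end;

begin
  // The file on disk stays 8-bit; only the in-memory string is UTF-16.
  WriteAnsiFile('data.txt', 'ID,Name' + sLineBreak + '1,Widget');
  Writeln(ReadAnsiFile('data.txt'));
end.
```

Existing code that reads raw bytes into AnsiString variables should also keep working, but converting at the boundary keeps the rest of the program on the standard String type.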
>
> If I have variables
>          as1: AnsiString;
>          s2: string;
>
> Q4 –         If I do s2:=as1, does this convert the AnsiString to Unicode?
>
> Answer - Yes, but there are performance issues to watch out for if the
> conversion happens a lot.
>
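> Q5 – If I do as1:=s2, does this convert a Unicode string to an AnsiString?
>
>      (Otherwise, how do I do this?)
>
> Answer - Yes, but characters that do not exist in the current ANSI code
> page are lost (the compiler warns about this), and as with Q4 there are
> performance issues to watch out for if the conversion happens a lot.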
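Both conversions can be written as explicit casts, which documents where they happen and silences the implicit-cast compiler warnings (W1057 for AnsiString to string, W1058 for string to AnsiString with potential data loss). A minimal sketch, assuming Delphi 2009 or later and reusing the variable names from the question:

```pascal
program StringCastDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  as1: AnsiString;
  s2: string;
begin
  as1 := 'Hello';

  // AnsiString -> string (UnicodeString): every ANSI character has a Unicode
  // equivalent, so nothing is lost. The explicit cast avoids warning W1057.
  s2 := string(as1);

  // string -> AnsiString: characters missing from the current ANSI code page
  // cannot be represented and are replaced by a substitute character.
  // The explicit cast avoids warning W1058 (potential data loss).
  as1 := AnsiString(s2 + ' ' + #$03A9);  // U+03A9 GREEK CAPITAL LETTER OMEGA

  Writeln(s2);
  Writeln(as1);  // what the omega becomes depends on the system code page
end.
```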
>
> Q6 – I understand that code like
>
>              char1:=string1[i];
>              if char1 in ['a'..'z'] then
>              begin
>                      message:=string1[i]+' - character is lowercase';
>              end
>
>          will break. ANSI characters are ordinals below 256, so set tests
> such as ['a'..'z'] or ['a','b','c'] can be used, but a Delphi set holds at
> most 256 elements, so this set code cannot be used for Unicode characters.
> What is the replacement?
>
> Answer - There is the CharInSet function, plus numerous extra
> character-classification functions in TCharacter (Character unit).
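The two replacements side by side; which one is right depends on whether the old code meant "ASCII a to z" or "any lowercase letter". A sketch assuming Delphi 2009 to XE, where CharInSet is in SysUtils and TCharacter.IsLower is in the Character unit; IsAsciiLower and IsAnyLower are hypothetical wrapper names.

```pascal
program LowerCaseDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Character;

// ASCII-only test: CharInSet checks a Char against a set of AnsiChar.
function IsAsciiLower(C: Char): Boolean;
begin
  Result := CharInSet(C, ['a'..'z']);
end;

// Unicode-aware test: TCharacter.IsLower also accepts lowercase letters
// outside ASCII, such as 'é' or 'ß'.
function IsAnyLower(C: Char): Boolean;
begin
  Result := TCharacter.IsLower(C);
end;

var
  string1, msg: string;
  i: Integer;
begin
  string1 := 'aB' + #$00E9;  // 'a', 'B', 'é'
  for i := 1 to Length(string1) do
    if IsAnyLower(string1[i]) then
    begin
      msg := string1[i] + ' - character is lowercase';
      Writeln(msg);
    end;
end.
```

Swapping IsAnyLower for IsAsciiLower in the loop reproduces the old ['a'..'z'] behaviour exactly, so 'é' would no longer be reported.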
>
> Q7 – Do literals like #13#10 still mean carriage return and line feed, and
> does #9 still mean tab?
>          If I have code like this (logline, string1 and string2 are strings):
>
>          logline:=FormatDateTime('dd-mmm-yyyy hh:nn:ss',now) + string1 + #13#10+#9 + string2;
>          ShowMessage(logline);
>          Button1.hint:=logline;
>          writeln(f,logline);
>
>          these work in D5-D2007, i.e. a two-line message box text, a two-line
> hint, and two lines written to a log file.
>          Is this still going to work?
>
>          Do carriage returns, tabs and other control characters have to be
> defined differently, e.g. as constants?
>
> Answer - Not figured out yet. Does anyone else know?
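As far as I know nothing changes here: #13, #10 and #9 are character literals, and in Delphi 2009 and later they are simply WideChar values with the same ordinals, so ShowMessage, Hint and Writeln should see the same CR, LF and TAB characters as before. A sketch of the expression from the question; MakeLogLine is a hypothetical wrapper, and sLineBreak (System unit, #13#10 on Windows) is shown only as an alternative spelling.

```pascal
program LogLineDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Builds a two-line log entry; the second line starts with a tab.
function MakeLogLine(const string1, string2: string; When: TDateTime): string;
begin
  // #13#10 is still CR+LF and #9 is still TAB. On Windows, sLineBreak
  // could be used in place of #13#10 with the same result.
  Result := FormatDateTime('dd-mmm-yyyy hh:nn:ss', When) + ' ' + string1 +
    #13#10 + #9 + string2;
end;

var
  logline: string;
begin
  logline := MakeLogLine('Started', 'Loaded 42 records', Now);
  Writeln(logline);
end.
```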
>
> Q8 – stringlist1.LoadFromFile('Test1.txt');
>          What happens if this file is ASCII text being read into a string
> list, which holds Unicode strings?
>
> Answer - By default LoadFromFile and SaveToFile use ANSI text (LoadFromFile
> recognises a Unicode file by its BOM); use the overloads that take a
> TEncoding parameter to choose the encoding explicitly.
>
> Q9 -   stringlist1.SaveToFile('Test1.txt')
>           Presumably this is no longer ASCII text. How do I save and read a
> string list to/from a file if it is to be ANSI text?
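Passing the encoding explicitly to SaveToFile and LoadFromFile makes the file format obvious rather than relying on defaults or BOM detection. A sketch assuming the TEncoding overloads of TStrings.SaveToFile/LoadFromFile in Delphi 2009 and later; the helper LoadAnsiLines and the second file name are hypothetical. In XE, saving with TEncoding.UTF8 writes the UTF-8 BOM at the start of the file, while TEncoding.ANSI writes none.

```pascal
program StringListEncodingDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Loads FileName as 8-bit ANSI text and returns the number of lines read.
function LoadAnsiLines(List: TStrings; const FileName: string): Integer;
begin
  List.LoadFromFile(FileName, TEncoding.ANSI);
  Result := List.Count;
end;

var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.Add('plain ASCII line');
    List.Add('second line');

    // Explicit encodings make the on-disk format obvious:
    List.SaveToFile('Test1.txt', TEncoding.ANSI);       // 8-bit text, no BOM
    List.SaveToFile('Test1-utf8.txt', TEncoding.UTF8);  // UTF-8 with a BOM

    // Read the ANSI file back, stating its encoding instead of relying on detection.
    List.Clear;
    Writeln(LoadAnsiLines(List, 'Test1.txt'));  // 2
  finally
    List.Free;
  end;
end.
```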
>
> Q10 – If there are complexities in Q8 and Q9, is there a TAnsiStringList
> type (for AnsiStrings) as well as the Unicode TStringList type?
>          (I use string lists a lot.)
>
> Answer - Unicode string lists can save to ANSI or Unicode files, so a
> TAnsiStringList is not needed.
>
> Q11 – Do INI files become Unicode too?
>
> Answer - It looks like no, but it is not clear. Does anyone else know?
>
> Q12 – Does Windows Notepad open Unicode text files correctly, or can it
> only be used on ANSI text files?
>
> Does anyone know?
>
> Q13 - It looks like most programmers' editors read and write both ASCII and
> Unicode encodings. The one I use also seems to distinguish between UTF-8 and
> Unicode; what is the difference?
>
> Does anyone know?
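On Q12 and Q13, as far as I know: Notepad on XP/Vista/7 can open and save ANSI, "Unicode", "Unicode big endian" and UTF-8, and recognises the last three mainly by the byte-order mark (BOM) at the start of the file. In Windows tools "Unicode" usually means UTF-16 little-endian, which is also what TEncoding.Unicode and Delphi's in-memory string use. UTF-8 takes 1 to 4 bytes per code point and is byte-for-byte identical to ASCII for ASCII text; UTF-16 takes 2 or 4. A small sketch comparing the two, assuming Delphi 2009 or later; EncodedSize is a hypothetical helper.

```pascal
program EncodingSizeDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Number of bytes S occupies in the given encoding, not counting any BOM.
function EncodedSize(const S: string; Encoding: TEncoding): Integer;
begin
  Result := Length(Encoding.GetBytes(S));
end;

var
  Sample: string;
begin
  Sample := 'abc' + #$20AC;  // three ASCII letters plus the euro sign U+20AC

  // UTF-8: 1 byte per ASCII letter, 3 bytes for the euro sign; BOM is EF BB BF.
  Writeln('UTF-8 : ', EncodedSize(Sample, TEncoding.UTF8), ' bytes + ',
    Length(TEncoding.UTF8.GetPreamble), '-byte BOM');     // 6 bytes + 3-byte BOM

  // "Unicode" (UTF-16 little-endian): 2 bytes per Char here; BOM is FF FE.
  Writeln('UTF-16: ', EncodedSize(Sample, TEncoding.Unicode), ' bytes + ',
    Length(TEncoding.Unicode.GetPreamble), '-byte BOM');  // 8 bytes + 2-byte BOM
end.
```

For mostly-ASCII data files UTF-8 is therefore the more compact choice, and older 8-bit tools can still read its ASCII parts.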
>
> John
>
> _______________________________________________
> NZ Borland Developers Group - Delphi mailing list
> Post: delphi at delphi.org.nz
> Admin: http://delphi.org.nz/mailman/listinfo/delphi
> Unsubscribe: send an email to delphi-request at delphi.org.nz with Subject:
> unsubscribe