[DUG] Upgrading to XE - Unicode strings questions

Tue Nov 23 13:27:10 NZDT 2010

Just thought I would chime in that I'm really interested in the answers to these questions too (Unicode being something we are also a bit apprehensive of).

-----Original Message-----
From: delphi-bounces at delphi.org.nz [mailto:delphi-bounces at delphi.org.nz] On Behalf Of John Bird
Sent: Tuesday, 23 November 2010 1:04 p.m.
To: NZ Borland Developers Group - Delphi List
Subject: Re: [DUG] Upgrading to XE - Unicode strings questions

Thanks for the references, so I can answer most of the questions now. 
Here is what I understand so far.   If anyone has anything to add, that would 
be useful!

Extra question:

It looks like code like

    for i:=1 to length(string1) do
    begin
            DoSomethingWithOneChar(string1[i]);
    end;

cannot be used reliably.   The problem is that length(string1) returns the 
number of 16-bit Char elements (UTF-16 code units), not the number of unicode 
characters.   A character outside the Basic Multilingual Plane takes 2 Chars 
(a surrogate pair), so string1[i] may be only half a character.   Still 
figuring out what is the best practice here, as I have quite a lot of string 
routines.   Should be OK as long as the unicode text actually is ASCII.
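
A minimal sketch of a loop that steps over whole characters instead of single 
Chars, assuming the string holds valid UTF-16.   CountCharacters is a made-up 
name for illustration; the surrogate ranges are from the Unicode standard 
(high surrogate $D800..$DBFF followed by low surrogate $DC00..$DFFF).

    // Hypothetical helper: counts characters, treating a surrogate pair as one.
    function CountCharacters(const S: string): Integer;
    var
      i: Integer;
    begin
      Result := 0;
      i := 1;
      while i <= Length(S) do
      begin
        // High surrogate followed by a low surrogate = one character in 2 Chars
        if (Ord(S[i]) >= $D800) and (Ord(S[i]) <= $DBFF) and
           (i < Length(S)) and
           (Ord(S[i + 1]) >= $DC00) and (Ord(S[i + 1]) <= $DFFF) then
          Inc(i, 2)
        else
          Inc(i);
        Inc(Result);
      end;
    end;

For pure ASCII text this returns the same value as length(string1), so 
existing loops over ASCII data behave as before.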

Q2 – With XE do the .pas and .dfm files become unicode text and hence cannot
be read by earlier Delphi, eg D2007 any more?

Answer - From what I have read this is an option, and yes, files saved as 
unicode are not portable back to earlier versions.

Q3 – I do a lot of reading ascii data files, and writing back.   Using
mainly TFilestream and stringlists.   Does this in general mean I will need
to use file variables declared as Ansichar and AnsiString instead of Char
and String?
(I would prefer to use the standard VCL where possible)

If I have variables
        as1:Ansistring;
        s2:string;

Q4 –         if I do s2:=as1  does this convert ansistrings to unicode?

Answer - yes, there are performance issues to watch out for if conversion 
happens a lot.

Q5 – if I do as1:=s2 does this convert a unicode string to ansistring?

    (otherwise how do I do this?)

Answer - yes, but any characters that have no equivalent in the Ansi code 
page are lost, and there are performance issues to watch out for if 
conversion happens a lot.
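
A small sketch of Q4 and Q5 using the variables above, with the casts written 
out explicitly.   The program name is made up; the explicit casts are there 
because the compiler warns about implicit string casts, and writing them out 
makes each conversion point visible in the code.

    program CastDemo;
    {$APPTYPE CONSOLE}
    var
      as1: AnsiString;
      s2: string;
    begin
      as1 := 'abc';
      s2 := string(as1);       // Ansi -> unicode: widening, nothing is lost
      as1 := AnsiString(s2);   // unicode -> Ansi: chars outside the Ansi
                               // code page would be lost here
      Writeln(Length(s2), ' ', Length(as1));
    end.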

Q6 – I understand any code like

            char1:=string1[i];
            if char1 in ['a'..'z'] then
            begin
                    message:=string1[i]+' - character is lowercase';
            end

        will break, as a set can only hold ordinal values less than 256.
Ansi characters fit, so set comparisons ['a'..'z']  or ['a','b','c']    can
be used, but this set code cannot be used for unicode characters.   What is
the replacement?

Answer - There is a CharInSet call (in SysUtils) and numerous extra 
housekeeping functions added in TCharacter (in the Character unit).
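
A minimal sketch of the two replacements, assuming Delphi 2009 to XE where 
TCharacter lives in the Character unit.   The program name and the test 
string are made up for illustration.

    program CharTestDemo;
    {$APPTYPE CONSOLE}
    uses
      SysUtils, Character;
    var
      string1: string;
      i: Integer;
    begin
      string1 := 'aB';
      for i := 1 to Length(string1) do
      begin
        // Same meaning as the old "in ['a'..'z']" test, ASCII letters only
        if CharInSet(string1[i], ['a'..'z']) then
          Writeln(string1[i] + ' - character is lowercase (a..z)');
        // Unicode-aware test, also true for lowercase letters outside a..z
        if TCharacter.IsLower(string1[i]) then
          Writeln(string1[i] + ' - character is lowercase (unicode)');
      end;
    end.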

Q7 – do literals like  #13#10 still mean carriage return and linefeed?  #9
means tab?
        if I have code like (logline string1 string2 are string)

        logline:=FormatDateTime('dd-mmm-yyyy hh:nn:ss',now) + string1 +
#13#10+#9 + string2;
        ShowMessage(logline);
        Button1.hint:=logline;
        writeln(f,logline);

        these work D5-D2007   - ie a 2 line messagebox text, 2 line hint,
and 2 lines written to a log file.
        is this still going to work?

        do carriage returns/tabs/other control characters have to be defined
differently, eg as constants?

Answer - not figured out yet - anyone else know?
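
As far as I can tell, character literals such as #13, #10 and #9 keep the 
same meaning; they simply become 16-bit Chars, so no new constants are 
needed.   A minimal sketch, assuming a console program (the program name and 
the test strings are made up).   sLineBreak is the existing System constant, 
which is #13#10 on Windows.

    program LiteralDemo;
    {$APPTYPE CONSOLE}
    uses
      SysUtils;
    var
      logline, string1, string2: string;
    begin
      string1 := ' first';
      string2 := 'second';
      // Same expression as in D5-D2007
      logline := FormatDateTime('dd-mmm-yyyy hh:nn:ss', Now) + string1 +
        #13#10 + #9 + string2;
      Writeln(logline);
      // The literals still have the same ordinal values
      Writeln(Ord(#13), ' ', Ord(#10), ' ', Ord(#9));
      // sLineBreak can be used in place of #13#10
      Writeln(string1 + sLineBreak + #9 + string2);
    end.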

Q8 – stringlist1.loadfromfile('Test1.txt');
        what happens if this file is ascii text being read into a stringlist
which is unicode strings.

Answer - Default is Ansi text for loadfromfile and savetofile; use the 
overloaded routines that take a TEncoding parameter for Unicode

Q9 -   stringlist1.savetofile('Test1.txt')
         presumably this is no longer ascii text.   How do I save and read a
stringlist to/from a file if it is to be Ansi text?

Q10 – If there are complexities in Q8 and Q9 is there a TAnsiStringlist
type (for ansistrings) as well as a unicode TStringlist type?
        (I use stringlists a lot)

Answer - unicodestring lists can save to ascii or unicode files, so 
TAnsiStringlist not needed.
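
A minimal sketch of Q8 to Q10 with one TStringList, assuming Test1.txt exists 
and is Ansi text.   The program name and the output file names are made up; 
TEncoding is declared in SysUtils.

    program StringListDemo;
    {$APPTYPE CONSOLE}
    uses
      SysUtils, Classes;
    var
      stringlist1: TStringList;
    begin
      stringlist1 := TStringList.Create;
      try
        // No encoding given: read as Ansi unless the file starts with a BOM
        stringlist1.LoadFromFile('Test1.txt');
        // No encoding given: written as Ansi text
        stringlist1.SaveToFile('Test1_ansi.txt');
        // Overloads with an explicit encoding
        stringlist1.SaveToFile('Test1_utf8.txt', TEncoding.UTF8);
        stringlist1.SaveToFile('Test1_utf16.txt', TEncoding.Unicode);
      finally
        stringlist1.Free;
      end;
    end.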

Q11 – do inifiles become unicode too?

Answer - looks like no?  Not clear?  Anyone else know?

Q12 – does Windows Notepad open unicode text files correctly?   or can it
only be used on Ansi text files?

Anyone know this?

Q13 - It looks like most programmers' editors read and write both ascii and
unicode encodings.   The one I use seems to distinguish between UTF-8 and
unicode as well – what is the difference?

Anyone know this?

John

_______________________________________________
NZ Borland Developers Group - Delphi mailing list
Post: delphi at delphi.org.nz
Admin: http://delphi.org.nz/mailman/listinfo/delphi
Unsubscribe: send an email to delphi-request at delphi.org.nz with Subject: 
unsubscribe 
