[DUG] Upgrading to XE - Unicode strings questions
David Brennan
dugdavid at dbsolutions.co.nz
Tue Nov 23 13:27:10 NZDT 2010
Just thought I would chime in that I'm really interested in the answers to these questions too (Unicode being something we are also a bit apprehensive of).
-----Original Message-----
From: delphi-bounces at delphi.org.nz [mailto:delphi-bounces at delphi.org.nz] On Behalf Of John Bird
Sent: Tuesday, 23 November 2010 1:04 p.m.
To: NZ Borland Developers Group - Delphi List
Subject: Re: [DUG] Upgrading to XE - Unicode strings questions
Thanks for the references, so I can answer most of the questions now.
Here is what I understand so far; if anyone has anything to add, that would be
useful!
Extra question:
It looks like code like
for i:=1 to length(string1) do
begin
DoSomethingWithOneChar(string1[i]);
end;
cannot be used reliably. The problem is that length(string1) counts UTF-16
code units (Char elements), not Unicode characters: a character (code point)
outside the Basic Multilingual Plane takes 2 code units (a surrogate pair),
so string1[i] may be only half of a character. Still
figuring out what is the best practice here, as I have quite a lot of string
routines. Should be OK as long as the Unicode text actually is ASCII.
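To make the difference between Length and the number of characters concrete, here is a minimal sketch. CountCodePoints is a hypothetical helper (not part of the RTL) and the test string is made up; it only relies on the fixed UTF-16 surrogate ranges.

```pascal
program CodePointCount;
{$APPTYPE CONSOLE}

// Hypothetical helper: counts code points rather than Char elements.
function CountCodePoints(const S: string): Integer;
var
  i: Integer;
begin
  Result := 0;
  i := 1;
  while i <= Length(S) do
  begin
    // A high surrogate ($D800..$DBFF) followed by a low surrogate
    // ($DC00..$DFFF) encodes one code point in two Chars.
    if (Ord(S[i]) >= $D800) and (Ord(S[i]) <= $DBFF) and
       (i < Length(S)) and
       (Ord(S[i + 1]) >= $DC00) and (Ord(S[i + 1]) <= $DFFF) then
      Inc(i, 2)
    else
      Inc(i);
    Inc(Result);
  end;
end;

var
  s: string;
begin
  // 'abc' plus U+1D11E (musical G clef), written as a surrogate pair
  s := 'abc' + #$D834#$DD1E;
  Writeln(Length(s));           // number of Char elements
  Writeln(CountCodePoints(s));  // number of code points
end.
```

For text that stays within ASCII the two counts are the same, so the original loop behaves as before.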
Q2 – With XE do the .pas and .dfm files become unicode text and hence cannot
be read by earlier Delphi, eg D2007 any more?
Answer - From what I have read this is an option; if the files are saved
as Unicode, they are not portable to earlier versions.
Q3 – I do a lot of reading ascii data files, and writing back. Using
mainly TFilestream and stringlists. Does this in general mean I will need
to use file variables declared as Ansichar and AnsiString instead of Char
and String?
(I would prefer to use the standard VCL where possible)
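One possible approach for Q3, as a sketch only: read the file byte for byte into an AnsiString with TFileStream, so no conversion takes place. LoadAnsiFile is a hypothetical helper name and Test1.txt is just an example file that must already exist.

```pascal
program ReadAnsiFile;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: reads a whole file into an AnsiString, byte for byte.
function LoadAnsiFile(const FileName: string): AnsiString;
var
  fs: TFileStream;
  n: Integer;
begin
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    n := fs.Size;
    SetLength(Result, n);
    if n > 0 then
      fs.ReadBuffer(Result[1], n);  // no conversion: one byte per AnsiChar
  finally
    fs.Free;
  end;
end;

var
  data: AnsiString;
begin
  data := LoadAnsiFile('Test1.txt');  // example file name, must exist
  Writeln(Length(data), ' bytes read');
end.
```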
If I have variables
as1:Ansistring;
s2:string;
Q4 – if I do s2:=as1 does this convert ansistrings to unicode?
Answer - yes, there are performance issues to watch out for if conversion
happens a lot.
Q5 – if I do as1:=s2 does this convert a unicode string to ansistring?
(otherwise how do I do this?)
Answer - yes, there are performance issues to watch out for if conversion
happens a lot.
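A small sketch of both conversions, reusing the variable names above. The explicit casts are optional, but they make the conversion visible and avoid the compiler's implicit string cast warnings. Note that going from string to AnsiString can lose any character that is not in the ANSI code page.

```pascal
program StringCasts;
{$APPTYPE CONSOLE}

var
  as1: AnsiString;
  s2: string;
begin
  as1 := 'plain ASCII text';
  // AnsiString -> Unicode string: always safe, but costs a conversion
  s2 := string(as1);
  // Unicode string -> AnsiString: characters outside the ANSI code page are lost
  as1 := AnsiString(s2);
  Writeln(Length(s2), ' ', Length(as1));
end.
```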
Q6 – I understand any code like
char1:=string1[i];
if char1 in ['a'..'z'] then
begin
message:=string1[i]+' - character is lowercase';
end
will break, as Ansi characters are ordinal (less than 256)
and set comparisons ['a'..'z'] or ['a','b','c'] can be used, but this set
code cannot be used for Unicode characters. What is the replacement?
Answer - There is a CharInSet function (in SysUtils), and numerous extra
housekeeping functions have been added in TCharacter (unit Character).
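A minimal sketch of both replacements, assuming the SysUtils and Character units as shipped with the Unicode versions of Delphi. The test string is made up.

```pascal
program LowerCaseCheck;
{$APPTYPE CONSOLE}

uses
  SysUtils, Character;

var
  string1: string;
  i: Integer;
begin
  string1 := 'aBc';
  for i := 1 to Length(string1) do
  begin
    // CharInSet stands in for "char1 in ['a'..'z']". The set itself still
    // holds only single-byte characters, so this tests for ASCII a..z.
    if CharInSet(string1[i], ['a'..'z']) then
      Writeln(string1[i], ' - character is lowercase');
    // TCharacter.IsLower classifies using Unicode character data instead.
    if TCharacter.IsLower(string1[i]) then
      Writeln(string1[i], ' - lowercase according to TCharacter');
  end;
end.
```

CharInSet keeps old set-based code compiling cleanly; TCharacter is the better fit when the text may contain non-ASCII letters.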
Q7 – do literals like #13#10 still mean carriage return and linefeed? #9
means tab?
if I have code like (logline, string1 and string2 are strings)
logline:=FormatDateTime('dd-mmm-yyyy hh:nn:ss',now) + string1 +
#13#10+#9 + string2;
ShowMessage(logline);
Button1.hint:=logline;
writeln(f,logline);
these work D5-D2007 - ie a 2 line messagebox text, 2 line hint,
and 2 lines written to a log file.
is this still going to work?
do carriage returns/tabs/other control characters have to be defined
differently, eg as constants?
Answer - not figured out yet - anyone else know?
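As far as I know, control-character literals are unchanged in the Unicode versions: #13, #10 and #9 still denote the characters with those ordinal values, they are simply 2-byte Chars now. The console sketch below only covers building and printing the string; the ShowMessage, hint and log file cases are not tested here, and the sample text is made up.

```pascal
program ControlChars;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  logline, string1, string2: string;
begin
  string1 := ' first line';
  string2 := 'second line';
  logline := FormatDateTime('dd-mmm-yyyy hh:nn:ss', Now) + string1 +
    #13#10 + #9 + string2;
  Writeln(logline);                   // printed on two lines, second one tabbed
  Writeln(Pos(#13#10, logline) > 0);  // the CR/LF pair is in the Unicode string
end.
```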
Q8 – stringlist1.loadfromfile('Test1.txt');
what happens if this file is ascii text being read into a stringlist
which is unicode strings.
Answer - Default is Ansi text for loadfromfile and savetofile, use the
overloaded routines that take a TEncoding parameter for Unicode
Q9 - stringlist1.savetofile('Test1.txt')
presumably this is no longer ascii text. How do I save and read a
stringlist to/from a file if it is to be Ansi text?
Q10 – If there are complexities in Q8 and Q9 is there a TAnsiStringlist
type (for ansistrings) as well as a unicode TStringlist type?
(I use stringlists a lot)
Answer - unicodestring lists can save to ascii or unicode files, so
TAnsiStringlist not needed.
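A sketch of Q8 to Q10 with a plain TStringList, using the overloads that take a TEncoding. Passing the encoding explicitly avoids relying on the default. Test1_utf8.txt is a made-up second file name for illustration.

```pascal
program StringListEncoding;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  stringlist1: TStringList;
begin
  stringlist1 := TStringList.Create;
  try
    stringlist1.Add('line one');
    stringlist1.Add('line two');
    // Explicitly Ansi (the system code page)
    stringlist1.SaveToFile('Test1.txt', TEncoding.Default);
    // Explicitly UTF-8; this file name is only for illustration
    stringlist1.SaveToFile('Test1_utf8.txt', TEncoding.UTF8);
    // Read the Ansi file back into the (Unicode) stringlist
    stringlist1.LoadFromFile('Test1.txt', TEncoding.Default);
    Writeln(stringlist1.Count);
  finally
    stringlist1.Free;
  end;
end.
```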
Q11 – do inifiles become unicode too?
Answer - looks like no? Not clear? Anyone else know?
Q12 – does Windows Notepad open unicode text files correctly? or can it
only be used on Ansi text files?
Anyone know this?
Q13 - It looks like most programmers editors read and write ascii and
unicode encoding.....the one I use seems to distinguish between UTF-8 and
unicode as well – what is the difference?
Anyone know this?
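A partial answer to Q13, assuming the editor follows the usual Windows convention where "Unicode" means UTF-16: both are encodings of the same Unicode characters and differ only in how many bytes each character takes. The sketch below encodes one made-up string both ways and prints the byte counts.

```pascal
program EncodingSizes;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  s: string;
begin
  s := 'abc' + #$00E9;  // 'abc' followed by U+00E9 (e with acute accent)
  // UTF-8: 1 byte per ASCII character, 2 bytes for U+00E9
  Writeln(Length(TEncoding.UTF8.GetBytes(s)));
  // UTF-16 ("Unicode" in many Windows tools): 2 bytes per Char
  Writeln(Length(TEncoding.Unicode.GetBytes(s)));
end.
```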
John
_______________________________________________
NZ Borland Developers Group - Delphi mailing list
Post: delphi at delphi.org.nz
Admin: http://delphi.org.nz/mailman/listinfo/delphi
Unsubscribe: send an email to delphi-request at delphi.org.nz with Subject:
unsubscribe