[DUG] Upgrading to XE - Unicode strings questions

Tue Nov 23 15:35:47 NZDT 2010

My main remaining question is the best way to handle code that up to now 
looked like:

    for i := 1 to Length(string1) do
    begin
      DoSomethingWithOneChar(string1[i]);
    end;

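If I got the gist correctly, string1[i] is one Char (a 16-bit UTF-16 code 
unit) and Length(string1) is the number of those code units, which is not 
necessarily the number of characters: anything outside the Basic 
Multilingual Plane takes two Chars (a surrogate pair).  This is gonna be 
confusing!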
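
To make the difference concrete, here is a minimal sketch that counts code 
points by stepping over surrogate pairs by hand.  CodePointCount is a name I 
made up for illustration, not an RTL routine, and the sketch assumes a 
Unicode Delphi (D2009 or later) where string is UnicodeString:

    program CountCodePoints;
    {$APPTYPE CONSOLE}

    // Hypothetical helper: counts code points, treating a high surrogate
    // followed by a low surrogate as a single character.
    function CodePointCount(const S: string): Integer;
    var
      i: Integer;
    begin
      Result := 0;
      i := 1;
      while i <= Length(S) do
      begin
        if (S[i] >= #$D800) and (S[i] <= #$DBFF) and (i < Length(S)) and
           (S[i + 1] >= #$DC00) and (S[i + 1] <= #$DFFF) then
          Inc(i, 2)   // surrogate pair: two Chars, one code point
        else
          Inc(i);     // ordinary BMP character (or a lone surrogate)
        Inc(Result);
      end;
    end;

    var
      s: string;
    begin
      // 'A' followed by U+1D11E (musical G clef) as a surrogate pair
      s := 'A' + #$D834#$DD1E;
      Writeln(Length(s));          // 3 code units
      Writeln(CodePointCount(s));  // 2 code points
    end.

For text that stays inside the Basic Multilingual Plane the original loop 
behaves as before, since each Char is then one character.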

Other comments:

Comment 1 - I saw quite a few commentators say that in general they approved 
of the way Unicode has been implemented - everything that was AnsiString 
before is now Unicode consistently throughout the whole language and IDE, 
and in the main the only code that needs altering is where Delphi is 
communicating outside the standard language, i.e.

-DLL calls
-SaveToFile and LoadFromFile and other file access - even here smart 
defaults have been put in to retain expected behaviour.
-Sending strings to COM/TCP etc. - you might need to convert to get the 
encoding expected at the other end.
-Database fields - usually handled by making sure the right encoding is 
sent.
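
As a rough sketch of the file and "convert before sending" cases, assuming 
the TEncoding overloads of SaveToFile/LoadFromFile and the UTF8Encode 
function that ship with the Unicode versions of Delphi.  The file name is 
made up for illustration:

    program EncodingSketch;
    {$APPTYPE CONSOLE}
    uses
      SysUtils, Classes;
    var
      Lines: TStringList;
      Utf8Bytes: UTF8String;
      S: string;
    begin
      S := 'caf' + #$00E9;  // "cafe" with an acute accent on the e

      Lines := TStringList.Create;
      try
        Lines.Add(S);
        // State the encoding explicitly rather than rely on the default.
        Lines.SaveToFile('example.txt', TEncoding.UTF8);
        Lines.LoadFromFile('example.txt', TEncoding.UTF8);
      finally
        Lines.Free;
      end;

      // Convert to UTF-8 bytes before handing the text to a DLL or socket
      // that expects 8-bit data.
      Utf8Bytes := UTF8Encode(S);
      Writeln(Length(S));          // 4 UTF-16 code units
      Writeln(Length(Utf8Bytes));  // 5 bytes, the accented e takes two
    end.

Which encoding the other side wants is of course up to the DLL, protocol or 
database in question; UTF-8 here is only an example.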

Comment 2 - The worst inconveniences are for those who have already tried to 
do some Unicode-type processing using WideChar/WideString and the functions 
that went with them.    Undoing these changes is usually the best way to 
cater for Unicode.    Also some of the routines have horribly confusing 
names, like AnsiPos, which despite the name now works on Unicode strings and 
is still what should be used for searching.    It seems to me that some 
identical routines should be introduced - e.g. called UnicodePos(.....) - 
just so that those who are new to Unicode can use at least a consistently 
named set of tools.    I would probably write wrappers named like this for 
my own use, just to be clear.
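
Such a wrapper could be as small as the sketch below.  UnicodePos is the 
hypothetical name suggested above and does not exist in the RTL; it simply 
forwards to SysUtils.AnsiPos:

    program UnicodePosSketch;
    {$APPTYPE CONSOLE}
    uses
      SysUtils;

    // Hypothetical wrapper: same behaviour as AnsiPos, clearer name.
    function UnicodePos(const SubStr, S: string): Integer;
    begin
      Result := AnsiPos(SubStr, S);
    end;

    begin
      Writeln(UnicodePos('XE', 'Delphi XE'));  // 8
    end.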

Comment 3 - I see a few people arguing that there should have been a 
compiler switch to choose between AnsiString and Unicode strings, to ease 
converting people to D2009/XE. 
There are merits either way on this - in the long term, if everyone is going 
to have to live in a Unicode world then it's probably better to bite the 
bullet and be made to convert code, as eventually you cannot escape it.   In 
such a case a simpler compiler and VCL is a big advantage.     This is sort 
of related to being able to cross compile to 64 bit, iPhone, Android - 
whatever way makes it easy to have these forward looking options.    The 
quite stark reality is that in 5 years it looks like much, but not all, 
commercial software will be running on Windows; it's likely to be a mix of 
Web/iPhone/Android/GoogleOS/MacOS, so the forwards portability of compiling 
Delphi for different environments is way more important than whether it 
should be able to treat string as AnsiString.

Comment 4 - Has anyone at Embarcadero considered 2 ways to make Delphi cross 
platform?    Option A is to go for a native compiler for different OS's - 
best if it can be done.   Option B is the Java route - compile to 
intermediate code for a Delphi Virtual Machine which can run interpreted 
with a runtime on many OS's.   Could be called the Delphi Virtual VCL 
Machine.   The reason why this might be a good way to go is that Pascal, 
which Delphi is built on, was originally designed as a teaching language - 
i.e. a very strongly typed and formally well structured language - so it 
could be about the best candidate around for generalised compiling and a 
simple cross platform runtime.     Also, with Java now owned by Oracle there 
are questions over whether it has such a bright future, and there is room 
for another similar approach.   DotNet is a similar idea too, but will only 
ever really be Windows.   It might not matter too much if a Delphi Virtual 
Machine is slower, so long as it's portable.

[But I digress - The last point is way off topic for Unicode however]

Comment and question 5 - What is the status of Free Pascal/Lazarus wrt 
Unicode?    Does Delphi XE code port to Free Pascal or not?    It's an issue 
to consider as well.