[DUG] Upgrading to XE - Unicode strings questions

Tue Nov 23 16:14:50 NZDT 2010

No on two counts:

String[1] is one WIDE character (a 16-bit UTF-16 code unit), which may or may not be a complete Unicode codePOINT: codepoints above U+FFFF occupy two WideChars, a surrogate pair (and so it equally may or may not be a complete Unicode character, although the definition of what constitutes a "character" in Unicode is a whole separate topic).

Length( s ) will always yield the number of chars in s.

The only wrinkle that Unicode introduces here is that the number of chars no longer == the number of *bytes* (each char is a WIDEChar and therefore 2 bytes).

But you can still reliably index each WIDEChar in a UnicodeString using the [nth] element index.
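
A minimal sketch of the points above, assuming Delphi 2009 or later (where string is UnicodeString). The program name and sample text are illustrative only, and the surrogate range check is written out by hand rather than relying on any particular RTL helper.

```pascal
program WideCharIndexing;

{$APPTYPE CONSOLE}

var
  s: string;   // UnicodeString in Delphi 2009 and later
  i: Integer;
begin
  // 'A', then U+1D11E (musical G clef), which UTF-16 stores as the
  // surrogate pair $D834 $DD1E, i.e. two WideChars for one codepoint.
  s := 'A' + #$D834 + #$DD1E;

  Writeln(Length(s));                 // 3: counts WideChars, not codepoints
  Writeln(Length(s) * SizeOf(Char));  // 6: two bytes per WideChar

  for i := 1 to Length(s) do
    // s[i] is always exactly one WideChar; it is only a whole codepoint
    // when it lies outside the surrogate range $D800..$DFFF.
    if (Ord(s[i]) >= $D800) and (Ord(s[i]) <= $DFFF) then
      Writeln(i, ': half of a surrogate pair')
    else
      Writeln(i, ': complete codepoint');
end.
```

So a loop from 1 to Length(s) still visits every element exactly once; it only becomes a problem if the per-element routine assumes each element is a whole character.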

Strings in COM have always been WideString - conversion to/from UnicodeString is automatic and lossless (in terms of data).

TCP, yes you will have to do work to support Unicode in this area if you haven't already done so (but the internet has - if not entirely, then in large part - been Unicode for a long time now, so you really should have taken care of this already, regardless of the Unicode-ness or otherwise of your Delphi code itself).

But that applies to ANY external systems with which your code interacts that may already be Unicode (or indeed which will remain resolutely ANSI, even if your app becomes Unicode).
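
As a rough sketch of what "doing the work" at such a boundary can look like: encode explicitly on the way out and decode explicitly on the way in. UTF-8 is an assumption here (the external system dictates the real encoding), and the variable names are made up for illustration. TEncoding lives in SysUtils from Delphi 2009 onwards.

```pascal
program Utf8Boundary;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  s, roundTrip: string;
  wire: TBytes;
begin
  s := 'caf' + #$00E9;   // "cafe" with an acute accent: 4 WideChars in memory

  // Encode explicitly at the boundary instead of sending raw WideChars.
  wire := TEncoding.UTF8.GetBytes(s);
  Writeln(Length(wire));   // 5: the accented letter takes two bytes in UTF-8

  // The receiving side decodes with the same, agreed encoding.
  roundTrip := TEncoding.UTF8.GetString(wire);
  Writeln(roundTrip = s);  // TRUE
end.
```

The same pattern applies to a resolutely ANSI peer: pick the encoding the other side expects and convert at the edge, rather than letting the in-memory representation leak out.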

In addition to inconveniences for people who had already done some work to support Unicode, the implementation does little/nothing to encourage or promote *correct* Unicode support in new projects and introduces potential for confusion and mistakes in many areas imho.

The entire string handling area of the RTL should have been thrown out and a properly thought out framework introduced to replace it, and yes, we should have been forced to migrate to the new, consistent and comprehensive string RTL (or at least encouraged, by marking all existing RTL support as "deprecated").

PLUS, for the backwards compatibility crowd, they *could* have supported a "String == Unicode" compiler switch imho (not just an "I wish they had" - I can see technically precisely HOW it would and could have been implemented, and it fits perfectly with their own advice for how to deal with code that is problematic to convert to Unicode).

Whilst at a technical level this may not have been a huge advantage, it certainly would have been a welcome comfort to people facing the job of converting large applications with libraries of - in some cases no longer supported - 3rd party library code, by enabling them to "flag" those units as "ANSI" and deal with the conversion warnings that would have subsequently been emitted by linking with the Unicode VCL.

The only real argument against a compiler switch comes from the view that having two versions of the VCL - one Unicode and one ANSI - would have been required and would have been unworkable.  This is not the case IMHO.  The VCL could have gone unilaterally and fixedly String==UnicodeString whilst allowing us to compile our own units with either String==AnsiString or String==UnicodeString.

As I say, the technique of enforcing ANSI-ness in "unsafe Unicode" units in order to defer the job of migrating those units to Unicode is well documented and is the official advice in such difficult cases.
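
For anyone who hasn't seen that technique, here is a small sketch of it, under the assumption that the "unsafe" code relies on one byte per character. LegacyByteCount is a hypothetical stand-in for such a routine, not anything from the RTL.

```pascal
program AnsiUnitSketch;

{$APPTYPE CONSOLE}

// Hypothetical legacy routine that assumes one byte per character.
// Declaring it with AnsiString (not string) keeps that assumption true
// even though string is now UnicodeString everywhere else.
function LegacyByteCount(const s: AnsiString): Integer;
begin
  Result := Length(s);  // AnsiChar elements, one byte each
end;

var
  u: string;       // UnicodeString
  a: AnsiString;
begin
  u := 'hello';

  // Explicit casts mark the ANSI/Unicode boundary. Without them the
  // compiler still converts, but warns about the implicit string cast.
  a := AnsiString(u);            // may lose data for non-ANSI text
  Writeln(LegacyByteCount(a));   // 5
  u := string(a);                // widening back is lossless
end.
```

In a real unit this means replacing string/Char/PChar with AnsiString/AnsiChar/PAnsiChar throughout, which is exactly the mechanical edit a per-unit switch would have done for us.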

A compiler switch as I envisage it would simply have made that process more straightforward - the net effect would have been the same, which on its own demonstrates that such a switch was in fact technically possible IF IMPLEMENTED IN THAT WAY, despite the protestations to the contrary (which assume a DIFFERENT implementation approach).

Too late now of course.

:)

-----Original Message-----
From: delphi-bounces at delphi.org.nz [mailto:delphi-bounces at delphi.org.nz] On Behalf Of John Bird
Sent: Tuesday, 23 November 2010 15:36
To: NZ Borland Developers Group - Delphi List
Subject: Re: [DUG] Upgrading to XE - Unicode strings questions

My main remaining question is the best way to handle code that up to now 
looked like:

    for i:=1 to length(string1) do
    begin
            DoSomethingWithOneChar(string1[i]);
    end;

If I got the gist correctly, string1[i] is one unicode character, but 
length(string1) is the number of codepoints in the string and not the number 
of characters.  This is gonna be confusing!

Other comments:

Comment 1 - I saw quite a few commentators say that they in general approved 
of the way that the unicode had been implemented - everything that was ansi 
string before is now unicode consistently throughout the whole language and 
IDE, and in the main the only code that needs altering is where Delphi is 
communicating outside the standard language:   ie

-DLL calls
-SavetoFile and LoadFromFile and other file access - even here smart 
defaults have been put in to retain expected behaviour.
-Sending strings to COM/TCP etc you might need to convert to get the kind 
expected
-Database fields - usually handled by making sure the right encoding is 
sent.

Comment 2 - The worst inconveniences are for those who have already tried to 
do some unicode type processing using WideChar, and the functions that were 
used for these.    Undoing these changes is usually the best way to cater 
for unicode.    Also some of the routines introduced then have horribly 
confusing names,  like AnsiPos   which is for searching widechars and is 
still what should be used for searching.    It seems to me that some 
identical routines should be introduced - eg called UnicodePos(.....) 
just so that those who are new to Unicode can use at least a consistently 
named set of tools.    I would probably make routines named like this which 
I use just to be clear.
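
A wrapper of the kind described could be as small as the sketch below. UnicodePos is a hypothetical name (it is not an RTL routine); it simply forwards to the existing AnsiPos in SysUtils.

```pascal
program UnicodePosSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical alias: same behaviour as AnsiPos, clearer name.
function UnicodePos(const SubStr, S: string): Integer;
begin
  Result := AnsiPos(SubStr, S);
end;

begin
  Writeln(UnicodePos('XE', 'Upgrading to XE'));  // 14
end.
```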

Comment 3 - I see a few people arguing that there should have been a 
compiler switch to allow compiling to ansistring  or unicode string 
depending on the compiler switch, to ease converting people to D2009/XE. 
There are merits either way on this - in the long term if everyone is going 
to have to live in a unicode world then it's probably better to bite the 
bullet and be made to convert code as eventually you cannot escape it.   In 
such a case a simpler compiler and VCL is a big advantage.     This is sort 
of related to being able to cross compile to 64 bit, iPhone, Android - 
whatever way makes it easy to have these forward looking options.    The 
quite stark reality is that in 5 years it looks like much but not all 
commercial software will be running on Windows; it's likely to be a mix of 
Web/iPhone/Android/GoogleOS/MacOS   so the forwards portability of compiling 
Delphi for different environments is way more important than whether it 
should be able to do Strings as  AnsiString.

Comment 4 - Has anyone at Embarcadero considered 2 ways to make cross 
platform?    option A is to go for a native compiler for different OS's - 
best if can be done.   option B is the Java route - compile to intermediate 
code for a Delphi Virtual Machine which can run interpreted with a runtime 
on many OS's.   Could be called the Delphi Virtual VCL Machine.   The reason 
why this might be a good way to go is that Delphi was originally designed as 
a teaching language - ie formally very strongly typed and formally well 
structured language- it could be about the best candidate around for 
generalised compiling and a simple cross platform runtime.     Also with 
Java now owned by Oracle there are questions over whether it has such a bright 
future and there is room for another similar approach.   DotNet is a similar 
idea too, but will only ever really be Windows.   With a Delphi Virtual Machine 
it might not matter too much if it's slower, provided it's portable.

[But I digress - The last point is way off topic for Unicode however]

Comment and question 5 - What is the status of Free Pascal/Lazarus wrt 
unicode?    Does Delphi XE code port or not to Free Pascal?    It's an issue 
to consider as well.
