[DUG] ]XE2 string conversion notes

Fri Jan 20 17:29:35 NZDT 2012

That’s all correct as far as I can see … 

About #3: string lists etc. using SaveToFile and LoadFromFile default to an ANSI file format unless a BOM is found or an encoding is specified explicitly.

TEncoding.GetBufferEncoding can be used to detect which encoding a file’s content uses (as in this example: http://docwiki.embarcadero.com/CodeExamples/en/UnicodeConversion_(Delphi) ).
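A minimal sketch of that detection step, assuming a Unicode Delphi (2009 or later). The helper name LoadTextFile is made up for illustration and is not taken from the linked example; with no BOM present, GetBufferEncoding falls back to the default ANSI encoding.

```delphi
program DetectEncodingDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: reads a whole text file into a string, choosing the
// encoding from the BOM (ANSI default when no BOM is present).
function LoadTextFile(const FileName: string): string;
var
  Stream: TFileStream;
  Buffer: TBytes;
  Encoding: TEncoding;
  BomLength: Integer;
begin
  Result := '';
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Buffer, Stream.Size);
    if Length(Buffer) > 0 then
      Stream.ReadBuffer(Buffer[0], Length(Buffer));
  finally
    Stream.Free;
  end;
  if Length(Buffer) = 0 then
    Exit; // empty file, nothing to decode

  Encoding := nil; // nil asks GetBufferEncoding to detect the encoding
  BomLength := TEncoding.GetBufferEncoding(Buffer, Encoding);
  // Skip the BOM bytes and decode the rest with the detected encoding.
  Result := Encoding.GetString(Buffer, BomLength, Length(Buffer) - BomLength);
end;

begin
  if ParamCount >= 1 then
    Writeln(LoadTextFile(ParamStr(1)));
end.
```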

… and the other way around: to decide whether I have to save in Unicode or not, I usually just convert to AnsiString and compare the result with the original (Unicode) string. If the characters are all the same (no Unicode characters replaced by “?”), then I know I can save the text as ANSI instead of UTF-8.
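That round-trip check might look roughly like the sketch below. The function name FitsInAnsi is hypothetical, and the result depends on the ANSI code page of the machine running it. In D2007, where string is already AnsiString, it always returns True.

```delphi
program AnsiRoundTripDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: True when S survives a conversion to the system ANSI
// code page and back unchanged, i.e. no character was replaced.
function FitsInAnsi(const S: string): Boolean;
begin
  Result := string(AnsiString(S)) = S;
end;

begin
  // Plain ASCII text always survives the round trip.
  Writeln(FitsInAnsi('plain text'));
  // #$65E5 is a CJK character; whether it survives depends on the code page.
  Writeln(FitsInAnsi('kanji: ' + #$65E5));
end.
```

When the check fails, the text would be saved with TEncoding.UTF8 (or another Unicode encoding) rather than the ANSI default.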

This allows you to build your own load/save(/append) functions for text files without having to resort to TStringList. TStringList always adds a CR/LF to the last line of text when loading/saving, which can be a bit annoying if you don’t want that.
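A hand-rolled save routine along those lines could be sketched as follows. The name SaveTextFile and the file name demo.txt are made up for illustration. The text is written exactly as passed in, so no line break is appended after the last line.

```delphi
program SaveTextDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: writes Text to FileName in the given encoding,
// preceded by that encoding's BOM (if it has one), without touching the
// line endings.
procedure SaveTextFile(const FileName, Text: string; Encoding: TEncoding);
var
  Stream: TFileStream;
  Preamble, Bytes: TBytes;
begin
  Preamble := Encoding.GetPreamble; // BOM bytes, empty for ANSI
  Bytes := Encoding.GetBytes(Text);
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    if Length(Preamble) > 0 then
      Stream.WriteBuffer(Preamble[0], Length(Preamble));
    if Length(Bytes) > 0 then
      Stream.WriteBuffer(Bytes[0], Length(Bytes));
  finally
    Stream.Free;
  end;
end;

begin
  // The file ends right after "line 2", with no trailing CR/LF.
  SaveTextFile('demo.txt', 'line 1'#13#10'line 2', TEncoding.UTF8);
end.
```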

Kind Regards,
Stefan Mueller 
_______________________
R&D Manager
ORCL Toolbox LLP, Japan
http://www.orcl-toolbox.com

From: delphi-bounces at listserver.123.net.nz [mailto:delphi-bounces at listserver.123.net.nz] On Behalf Of John Bird
Sent: Friday, January 20, 2012 12:36 PM
To: 'NZ Borland Developers Group - Delphi List'
Subject: [DUG] ]XE2 string conversion notes

I am converting source to be D2007 and XE2 compatible, the main issue being just my own string and file reading functions.

I recall Jolyon writing about this some months ago, with his complaints about the confusing naming of some of the routines (ANSIUpperCase for uppercasing Unicode for instance).

From what I have been reading and researching, I wanted to add a few points and list them here to make sure I am on the right track:

1 – Almost everything compiles and runs as is, especially if one has never tried to cater for WideChar and WideString before (that’s where many of the problems come from IMHO).

2 – Some unusual cases – records with definitions such as Name: string[60] will need to be revisited (these are ShortString and still ANSI).

3 – String lists etc. using SaveToFile and LoadFromFile will default to an ANSI file format unless a BOM is found or unless one specifies an encoding explicitly.

4 – Source files similarly will remain as Ansi/Ascii unless Unicode characters are present

5 – Statements like if ThisChar in ['a'..'z'] should be replaced with CharInSet(ThisChar, ['a'..'z']) (the set argument ['a'..'z'] still holds AnsiChar/ASCII characters).

6 – Uppercase, lowercase and more general character tests like the above CharInSet are best replaced with the Character unit functions, e.g.:

        IsLower
        IsUpper
        IsDigit
        ToUpper
        ToLower

    These are all general Unicode routines, many available for either Char or string, and they handle e.g. case conversion according to the general Unicode rules. I.e. don’t use the AnsiUpperCase function, which converts ASCII and works according to the current locale (codepage), so it is not general Unicode conversion as far as I can figure.

7 – To compare strings, use CompareStr (case sensitive) and CompareText (case insensitive), which compare according to general Unicode rules. As I understand it, these also use proper Unicode rules, so that the same character encoded differently in each string (e.g. as a surrogate pair) will still be matched if it is ultimately the same character.

8 – {$IFDEF UNICODE} blocks can be added for code that is only for XE2 etc. and will be ignored by D2007.
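Points 3, 5, 6 and 8 could be combined roughly as in the sketch below, which is meant to compile in both D2007 and XE2. It is an illustration under assumptions, not tested code from this thread: the helper names (IsLowerAsciiLetter, UpperAny, SaveList) and the file name are made up, unit names are written without XE2 scope prefixes (so the project's unit scope names must include System), and the global ToUpper from the Character unit is the XE2-era routine.

```delphi
program XE2CompatDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes
  {$IFDEF UNICODE}, Character{$ENDIF}; // Character unit only exists in Unicode Delphi

// Point 5: set membership test that compiles cleanly in both compilers.
function IsLowerAsciiLetter(C: Char): Boolean;
begin
{$IFDEF UNICODE}
  Result := CharInSet(C, ['a'..'z']);
{$ELSE}
  Result := C in ['a'..'z'];
{$ENDIF}
end;

// Point 6: Character unit routine in XE2, locale-based routine in D2007.
function UpperAny(const S: string): string;
begin
{$IFDEF UNICODE}
  Result := ToUpper(S);
{$ELSE}
  Result := AnsiUpperCase(S);
{$ENDIF}
end;

// Point 3: ask for UTF-8 explicitly where the overload exists.
procedure SaveList(List: TStrings; const FileName: string);
begin
{$IFDEF UNICODE}
  List.SaveToFile(FileName, TEncoding.UTF8);
{$ELSE}
  List.SaveToFile(FileName);
{$ENDIF}
end;

var
  List: TStringList;
begin
  Writeln(IsLowerAsciiLetter('q'));
  Writeln(UpperAny('hello'));
  List := TStringList.Create;
  try
    List.Add('hello');
    SaveList(List, 'demo.txt');
  finally
    List.Free;
  end;
end.
```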

Hope this research is of use to others, please tell me if any of these are wrong.

John Bird
