[DUG] Validating CDS files
Jolyon Smith
jsmith at deltics.co.nz
Mon Jan 17 21:11:10 NZDT 2011
I am thoroughly confused now...
If you are talking about a *stream* that you theorise is being corrupted
mid-flow over the wire, then I don't see how XML really helps ensure the
integrity of that stream, or at least not how it is any more helpful than
other techniques. You can't do anything with XML until the document is
complete, as an incomplete document is not valid no matter how valid it may
be when eventually complete.
So you can't do any validation until your potentially interrupted stream has
finished streaming and you have a complete file to work with.
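
To illustrate the point, here is a minimal Python sketch (not Delphi, and
not tied to MIDAS in any way). The payload and the function name
is_complete_xml are made up for the example. It only shows that a
well-formedness check cannot succeed on a document that was cut off
part way through.

```python
import xml.etree.ElementTree as ET


def is_complete_xml(data: bytes) -> bool:
    """Return True only if data parses as a well-formed XML document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        # Raised for truncated, empty, or otherwise malformed input.
        return False


# Hypothetical payload for illustration, not real ClientDataSet output.
full = b"<DATAPACKET><ROWDATA><ROW id='1'/><ROW id='2'/></ROWDATA></DATAPACKET>"

# Simulate a transfer that was interrupted half way through.
truncated = full[: len(full) // 2]

print(is_complete_xml(full))
print(is_complete_xml(truncated))
```

Note that this only detects truncation or broken markup. A changed
character inside a field value would still parse cleanly.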
That being the case, if XML is a viable way to circumvent your apparent
problem then why can you not simply eliminate the streaming part entirely ?
Dump your output into a file on the server side of your communication, then
transmit the entire file, thereby leveraging the integrity checks of the
connection to ensure reliable transmission.
Hashing a complete file like this, so that the receiving end can compare
digests, is a trivially simple and relatively efficient matter using MD5,
for example.
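
A rough sketch of that check, in Python rather than Delphi. The function
names md5_of_file and transfer_is_intact are invented for this example,
and it assumes the server can send the expected digest alongside the
file.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(path, expected_hex):
    """Compare the local file's digest with the one computed by the sender."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

The sender would compute md5_of_file on its copy and transmit the hex
string with the file. MD5 is adequate for catching accidental damage
like this, though not for defending against deliberate tampering.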
I am not entirely convinced that you are solving the right problem here.
The infrequency of the apparent corruption could just as easily be due to
infrequent access to some data that is already corrupt on the server as to
some sporadic wireless network corruption, no?
-----Original Message-----
From: delphi-bounces at delphi.org.nz [mailto:delphi-bounces at delphi.org.nz] On
Behalf Of Matthew Comb
Sent: Monday, 17 January 2011 19:20
To: NZ Borland Developers Group - Delphi List
Subject: Re: [DUG] Validating CDS files
Paul,
Thanks for your thoughts. I was tending towards reverse engineering the
format, but I could not see any obvious tokens at the footer, and wondered
if someone had beaten me to it.
I actually prefer the simplicity of your idea of compressing/encrypting
the xml file. That's a tidier solution for now until we can get to the
hashing.
Not sure why we haven't thought of that already....
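
One reason compression works as a stopgap is that common containers carry
their own integrity check. A minimal Python sketch, assuming gzip is an
acceptable container (the thread does not say which compressor would be
used). The names pack and unpack_or_none are invented for the example.

```python
import gzip
import zlib


def pack(xml_bytes: bytes) -> bytes:
    """Compress the XML export; gzip appends a CRC32 and length trailer."""
    return gzip.compress(xml_bytes)


def unpack_or_none(blob: bytes):
    """Return the original bytes, or None if the blob is damaged or cut short."""
    try:
        return gzip.decompress(blob)
    except (EOFError, OSError, zlib.error):
        # EOFError: stream ended early; OSError covers BadGzipFile
        # (bad header or CRC mismatch); zlib.error: corrupt deflate data.
        return None
```

A file that unpacks to None was truncated or altered somewhere between
pack and unpack, which is the condition being hunted here.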
Cheers,
Matt.
> Matthew wrote:
>
>> I wasn't suggesting that wireless was changing byte
>> structure, but if you are streaming data, and your datastream
>> gets disconnected, then you could end up with an incomplete transfer.
>>
>> I'm not 100% sure that midas catches all scenarios when
>> working off a remote data instance ?
>>
>> Note we use dbx4mysql + midas.
>>
>> Note also that I cannot rule out the drivers either, and it could
>> also be the data coming out of the db server. It's a tricky one to
>> track down, as you basically have a black box from db ->
>> dbxmysql + midas...
>>
>> What I do know is that it's very rare, e.g. maybe 1 in 10,000
>> usages corrupts the file, and it has occurred in more than 1
>> location, so it does not appear to be station specific.
>>
>> Logically with these stats, I can only put that down to a
>> flaky connection, otherwise the error rate would be higher.
>
> If you can't go the hash route in the short term, it sounds to me like
> you'll need to reverse engineer the cds format to determine whether
> there is an examinable file tail.
>
> I don't imagine the format will be _that_ hard to reverse engineer but
> then I've done a fair bit of binary reverse engineering so maybe I'm
> being a bit cavalier. If you make a number of simple and small cds files
> and compare and contrast them, you can probably work your way up in
> determining the encoding.
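
A small Python sketch of the compare-and-contrast step: given several
sample files, find the longest run of trailing bytes they all share.
The function name common_suffix is invented here, and nothing in it
assumes that the cds format actually has a fixed trailer. An empty
result would simply mean no shared tail was found.

```python
def common_suffix(blobs):
    """Return the longest byte string that every blob ends with."""
    if not blobs:
        return b""
    shortest = min(len(b) for b in blobs)
    n = 0
    # Walk backwards from the end while all blobs agree on the byte.
    while n < shortest and len({b[-1 - n] for b in blobs}) == 1:
        n += 1
    first = blobs[0]
    return first[len(first) - n:]
```

Reading a handful of small saved files as bytes and passing them in as
a list is enough to try it.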
>
> Assuming it was implemented using some form of streaming class model,
> there may well be a common interface and then a cds outputter and an xml
> outputter. If so, there is probably going to be a fairly 1-to-1 mapping
> between the binary format and the xml format in terms of logical schema.
>
> These XML-inspired compressed binary schemas are often all pretty
> similar - they replace the text tags and attributes with binary tokens
> taken from a dictionary to avoid repetition and reduce the size.
>
> As an aside, often you're better off just zipping the xml rather than
> implementing and debugging your own tokenized 'binary xml' engine but
> that's a whole different argument :-)
>
> Anyway, you might get lucky and quickly determine there is a common
> metadata trailer or end of data stream signature in the binary schema,
> similar to how zip files replicate their metadata directory and place it
> at the end of the file.
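
For comparison, this is roughly what a trailer check looks like for the
zip case mentioned above, sketched in Python. A zip archive ends with an
end-of-central-directory record that starts with the bytes PK\x05\x06,
is at least 22 bytes long, and may be followed by a comment of up to
65535 bytes. The function name has_zip_trailer is invented, and whether
cds files have any equivalent marker is exactly the open question.

```python
EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory marker
EOCD_MIN_SIZE = 22              # fixed part of the record
MAX_COMMENT = 65535             # longest allowed archive comment


def has_zip_trailer(data: bytes) -> bool:
    """Cheap truncation check: is a full end-of-central-directory
    record present near the end of the data?"""
    tail = data[-(EOCD_MIN_SIZE + MAX_COMMENT):]
    pos = tail.rfind(EOCD_SIGNATURE)
    # The signature must be followed by at least the fixed-size record.
    return pos != -1 and len(tail) - pos >= EOCD_MIN_SIZE
```

This only says the tail looks complete. It does not verify the contents
the way a hash or a CRC does.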
>
> This may seem overkill for your current problem since you're aiming for
> a short term solution, but having a cds stream viewer which can load
> into a virtual tree view might be a useful in-house debugging tool down
> the track.
>
> Cheers,
> Paul.
>
> _______________________________________________
> NZ Borland Developers Group - Delphi mailing list
> Post: delphi at delphi.org.nz
> Admin: http://delphi.org.nz/mailman/listinfo/delphi
> Unsubscribe: send an email to delphi-request at delphi.org.nz with Subject:
> unsubscribe
>