[DUG] Validating CDS files
Paul Heinz
paul at accredo.co.nz
Mon Jan 17 17:53:06 NZDT 2011
Matthew wrote:
> I wasn't suggesting that wireless was changing byte
> structure, but if you are streaming data, and your datastream
> gets disconnected, then you could end up with an incomplete transfer.
>
> I'm not 100% sure that midas catches all scenarios when
> working off a remote data instance ?
>
> Note we use dbx4mysql + midas.
>
> Note also that I cannot rule out the drivers either and also
> could be the data out of the db server, it's a tricky one to
> track down, as you basically have a black box from db ->
> dbxmysql + midas...
>
> What I do know is that it's very rare. e.g. maybe 1 in 10,000
> usages corrupts the file and has occurred in more than 1
> location, so it does not appear to be station specific.
>
> Logically with these stats, I can only put that down to a
> flakey connection, otherwise the error rate would be more often.
If you can't go the hash route in the short term, it sounds to me like
you'll need to reverse engineer the cds format to determine whether
there is an examinable file tail.
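
For reference, the hash route could look roughly like the sketch below. It
is in Python rather than Delphi purely for brevity, and the '.sha256'
sidecar file and the function names are my own assumptions, not anything
midas or dbx provides.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_sidecar(path):
    """Record the digest next to the file (hypothetical '.sha256' convention)."""
    Path(str(path) + ".sha256").write_text(file_digest(path))


def is_intact(path):
    """True if the file still matches the digest recorded in its sidecar."""
    sidecar = Path(str(path) + ".sha256")
    if not sidecar.exists():
        return False
    return sidecar.read_text().strip() == file_digest(path)
```

The writer calls write_sidecar() once the file is known to be good, and the
reader calls is_intact() before loading it.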
I don't imagine the format will be _that_ hard to reverse engineer but
then I've done a fair bit of binary reverse engineering so maybe I'm
being a bit cavalier. If you make a number of simple and small cds files
and compare and contrast them, you can probably work your way up in
determining the encoding.
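
A throwaway helper for the compare-and-contrast step might look like this
(again Python, and the function names are just mine). It reports which byte
ranges differ between two samples and how many trailing bytes they share,
which is the first thing to check when hunting for a file tail.

```python
def diff_ranges(a: bytes, b: bytes):
    """Return (start, end) offset pairs where a and b differ, end exclusive.

    Only the common length is compared; a difference in total length is
    something the caller should look at separately.
    """
    n = min(len(a), len(b))
    ranges = []
    start = None
    for i in range(n):
        if a[i] != b[i]:
            if start is None:
                start = i  # a differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, n))
    return ranges


def common_suffix_len(a: bytes, b: bytes) -> int:
    """Return how many trailing bytes a and b have in common."""
    n = 0
    limit = min(len(a), len(b))
    while n < limit and a[-1 - n] == b[-1 - n]:
        n += 1
    return n
```

If several small files with different contents all share the same non-empty
suffix, that suffix is a candidate end-of-stream marker.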
Assuming it was implemented using some form of streaming class model,
there may well be a common interface and then a cds outputter and an xml
outputter. If so, there is probably going to be a fairly 1-to-1 mapping
between the binary format and the xml format in terms of logical schema.
These XML-inspired compressed binary schemas are often all pretty
similar - they replace the text tags and attributes with binary tokens
taken from a dictionary to avoid repetition and reduce the size.
As an aside, often you're better off just zipping the xml rather than
implementing and debugging your own tokenized 'binary xml' engine but
that's a whole different argument :-)
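
To illustrate the aside: a compressed stream also gives you truncation
detection more or less for free, because the decompressor refuses a stream
that stops early. A minimal sketch using Python's zlib module as a stand-in
for zipping (pack and unpack are my own names):

```python
import zlib


def pack(xml_text: str) -> bytes:
    """Compress XML text into a zlib stream (which ends with a checksum)."""
    return zlib.compress(xml_text.encode("utf-8"))


def unpack(blob: bytes):
    """Return the XML text, or None if the blob is truncated or corrupt."""
    try:
        return zlib.decompress(blob).decode("utf-8")
    except zlib.error:
        # Raised for incomplete streams and for checksum mismatches.
        return None
```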
Anyway, you might get lucky and quickly determine there is a common
metadata trailer or end of data stream signature in the binary schema,
similar to how zip files replicate their metadata directory and place it
at the end of the file.
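
For comparison, this is roughly what a tail check looks like for zip, where
the layout is documented: the file ends with an "end of central directory"
record that starts with the bytes PK\x05\x06, is at least 22 bytes long, and
is followed only by an optional comment of up to 65535 bytes. Whether cds
has anything equivalent is exactly the open question; this is only a sketch
of the idea, and has_zip_trailer is my own name.

```python
import struct

EOCD_SIG = b"PK\x05\x06"   # end of central directory signature
EOCD_MIN = 22              # fixed-size part of the record
MAX_COMMENT = 0xFFFF       # the comment length field is 16 bits


def has_zip_trailer(path):
    """True if the file ends with a complete end of central directory record."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # seek to the end to learn the file size
        size = f.tell()
        tail_len = min(size, EOCD_MIN + MAX_COMMENT)
        f.seek(size - tail_len)
        tail = f.read(tail_len)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or pos + EOCD_MIN > len(tail):
        return False  # no signature, or the record itself is cut short
    # The comment length is the last field of the fixed part (offset 20).
    comment_len = struct.unpack_from("<H", tail, pos + 20)[0]
    # The record plus its comment must run exactly to the end of the file.
    return pos + EOCD_MIN + comment_len == len(tail)
```

A file that was cut off mid-transfer fails this check without anything
before the tail having to be parsed.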
This may seem overkill for your current problem since you're aiming for
a short term solution, but having a cds stream viewer which can load
into a virtual tree view might be a useful in-house debugging tool down
the track.
Cheers,
Paul.