<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<style>
<!--
 /* Font Definitions */
 @font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:"Trebuchet MS";
        panose-1:2 11 6 3 2 2 2 2 2 4;}
@font-face
        {font-family:Consolas;
        panose-1:2 11 6 9 2 2 4 3 2 4;}
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";
        color:black;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
pre
        {mso-style-priority:99;
        mso-style-link:"HTML Preformatted Char";
        margin:0cm;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";
        color:black;}
tt
        {mso-style-priority:99;
        font-family:"Courier New";}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
        {mso-style-priority:99;
        mso-style-link:"Balloon Text Char";
        margin:0cm;
        margin-bottom:.0001pt;
        font-size:8.0pt;
        font-family:"Tahoma","sans-serif";
        color:black;}
span.HTMLPreformattedChar
        {mso-style-name:"HTML Preformatted Char";
        mso-style-priority:99;
        mso-style-link:"HTML Preformatted";
        font-family:Consolas;
        color:black;}
span.BalloonTextChar
        {mso-style-name:"Balloon Text Char";
        mso-style-priority:99;
        mso-style-link:"Balloon Text";
        font-family:"Tahoma","sans-serif";
        color:black;}
span.apple-style-span
        {mso-style-name:apple-style-span;}
span.webkit-html-tag
        {mso-style-name:webkit-html-tag;}
span.webkit-html-attribute-name
        {mso-style-name:webkit-html-attribute-name;}
span.webkit-html-attribute-value
        {mso-style-name:webkit-html-attribute-value;}
span.EmailStyle26
        {mso-style-type:personal;
        font-family:"Trebuchet MS","sans-serif";
        color:#2D3A44;}
span.doctype
        {mso-style-name:doctype;}
span.start-tag
        {mso-style-name:start-tag;}
span.attribute-name
        {mso-style-name:attribute-name;}
span.attribute-value
        {mso-style-name:attribute-value;}
span.EmailStyle31
        {mso-style-type:personal-reply;
        font-family:"Trebuchet MS","sans-serif";
        color:#2D3A44;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page Section1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.Section1
        {page:Section1;}
-->
</style>
</head>

<body bgcolor=white lang=EN-US link=blue vlink=purple>

<div class=Section1>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>Given that you mention the .FileName property, I assume you are using
TXMLDocument.&nbsp; Forget that and use MSXML directly.&nbsp; It is much better, and
you can load a URL directly without first downloading it to a file.&nbsp; Import the
MSXML 6.0 type library to create MSXML2_TLB.&nbsp; You will probably find that most web
sites use XHTML tags but are still not valid XML.&nbsp; Try extracting everything from
the opening html tag down to the closing html tag and processing only that piece as XML.&nbsp;
For the website used in the sample below, if you download it to a file and strip the
headings before the html tag, it will load properly.&nbsp; You might be able to
find a way around this; I haven&#8217;t looked any further.<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>
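
<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>A minimal sketch of that extraction step, assuming the page source is
already held in a string.&nbsp; ExtractHtmlElement and PageSource are names made up for
illustration, and the search is a plain text match, so an html tag inside a comment
or script would confuse it:<o:p></o:p></span></p>

<pre>// needs SysUtils in the uses clause

// Returns the text from the first &lt;html ...&gt; tag through the first
// &lt;/html&gt; tag, or '' if either one is missing.
function ExtractHtmlElement(const Source: string): string;
const
  CloseTag = '&lt;/html&gt;';
var
  Lower: string;
  StartPos, EndPos: Integer;
begin
  Result := '';
  // LowerCase only changes ASCII letters, so offsets still line up with Source
  Lower := LowerCase(Source);
  StartPos := Pos('&lt;html', Lower);
  EndPos := Pos(CloseTag, Lower);
  if (StartPos &gt; 0) and (EndPos &gt; StartPos) then
    Result := Copy(Source, StartPos, EndPos - StartPos + Length(CloseTag));
end;

// Possible usage (loadXML returns False on a parse error; see parseError):
//   if not oXMLDoc.loadXML(ExtractHtmlElement(PageSource)) then ...
</pre>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>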

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>unit Unit5;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>interface<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>uses<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; Windows, Messages, SysUtils, Variants, Classes, Graphics,
Controls, Forms,<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; Dialogs, StdCtrls, MSXML2_TLB;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>type<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; TForm5 = class(TForm)<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;&nbsp;&nbsp; Button1: TButton;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;&nbsp;&nbsp; procedure Button1Click(Sender: TObject);<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; private<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;&nbsp;&nbsp; { Private declarations }<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; public<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;&nbsp;&nbsp; { Public declarations }<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; end;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; EValidateXMLError = class(Exception)<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; private<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;&nbsp;&nbsp; FErrorCode: Integer;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;&nbsp;&nbsp; FReason: string;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; public<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;&nbsp;&nbsp; constructor Create(aErrorCode: Integer; const
aReason: string; const aLine, aChar, aFilePos : integer; const aSrcText, aURL,
aXPath : string);<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;&nbsp;&nbsp; property ErrorCode: Integer read FErrorCode;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;&nbsp;&nbsp; property Reason: string read FReason;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; end;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>var<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; Form5: TForm5;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>implementation<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>{$R *.dfm}<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>resourcestring<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; RsValidateError = 'XML Validation Error (%.8x) Reason:
%s&nbsp; XPath: %s&nbsp; Line: %d&nbsp; Char: %d&nbsp; File Pos: %d&nbsp; URL:
%s Src Text: %s';<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>constructor EValidateXMLError.Create(aErrorCode: Integer; const
aReason: string; const aLine, aChar, aFilePos : integer; const aSrcText, aURL,
aXPath : string);<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>begin<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; inherited CreateResFmt(@RsValidateError, [AErrorCode,
aReason, aXPath, aLine, aChar, aFilePos, aURL, aSrcText]);<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; FErrorCode := aErrorCode;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; FReason := aReason;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>end;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>procedure TForm5.Button1Click(Sender: TObject);<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>var<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; oXMLDoc: DOMDocument60;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; oError: IXMLDOMParseError2;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>begin<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; oXMLDoc := CoDOMDocument60.Create;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; oXMLDoc.async := FALSE;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; oXMLDoc.setProperty('ProhibitDTD', TRUE);<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; oXMLDoc.resolveExternals := FALSE;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; oXMLDoc.validateOnParse&nbsp; := FALSE;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; oXMLDoc.load('http://w3future.com/weblog/gems/xhtml2.xml');
// load() also accepts file paths; use oXMLDoc.loadXML to load XML held in a
string<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; if oXMLDoc.parseError.errorCode &lt;&gt; S_OK then //
validation is switched off above, but you should still check for load errors.&nbsp;
That is a separate check from validation; look into the schema cache if you want
to validate against an XSD<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; begin<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;&nbsp;&nbsp; oError := oXMLDoc.parseError as
IXMLDOMParseError2;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;&nbsp;&nbsp; raise
EValidateXMLError.Create(oError.errorCode, oError.reason,<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
oError.line, oError.linepos, oError.filepos,<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
oError.srcText, oError.url, oError.errorXPath);<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; end;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp; ShowMessage(oXMLDoc.xml);<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>end;<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>end.<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>
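
<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>Once the page has loaded, the propertyInfo divs from the original
question could probably be picked out with XPath.&nbsp; This is only a sketch: it
assumes the page declares the XHTML namespace, and the x prefix and the
ShowPropertyInfo name are arbitrary choices for illustration.<o:p></o:p></span></p>

<pre>// needs MSXML2_TLB and Dialogs in the uses clause
procedure ShowPropertyInfo(const oXMLDoc: DOMDocument60);
var
  oNodes: IXMLDOMNodeList;
  i: Integer;
begin
  // Bind a prefix to the XHTML namespace; unprefixed names in an XPath
  // expression only match elements that are in no namespace.
  oXMLDoc.setProperty('SelectionNamespaces',
    'xmlns:x="http://www.w3.org/1999/xhtml"');
  oNodes := oXMLDoc.selectNodes('//x:div[@class="propertyInfo"]');
  for i := 0 to oNodes.length - 1 do
    ShowMessage(oNodes.item[i].text); // text of the div and its children
end;
</pre>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'><o:p>&nbsp;</o:p></span></p>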

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>cameron<o:p></o:p></span></p>

<div>

<div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm'>

<p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif";
color:windowtext'>From:</span></b><span style='font-size:10.0pt;font-family:
"Tahoma","sans-serif";color:windowtext'> delphi-bounces@delphi.org.nz
[mailto:delphi-bounces@delphi.org.nz] <b>On Behalf Of </b>Alister Christie<br>
<b>Sent:</b> Friday, 29 January 2010 2:40 p.m.<br>
<b>To:</b> NZ Borland Developers Group - Delphi List<br>
<b>Subject:</b> Re: [DUG] web scraping using IHTMLDocument2<o:p></o:p></span></p>

</div>

</div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

<p class=MsoNormal>Thanks Cameron,<br>
<br>
It does indeed have that header. How do I make this work?<br>
<tt><span style='font-size:10.0pt'>&nbsp; XMLDocument1.FileName :=
'c:\temp\test.htm';</span></tt><span style='font-size:10.0pt;font-family:"Courier New"'><br>
<tt>&nbsp; XMLDocument1.Active := True;</tt><br>
</span>This gives me various errors. I suspect that the file is not valid XML.
Is there some other way of parsing it?<br>
<br>
<br>
<o:p></o:p></p>

<pre>Alister Christie<o:p></o:p></pre><pre>Computers for People<o:p></o:p></pre><pre>Ph: 04 471 1849 Fax: 04 471 1266<o:p></o:p></pre><pre><a
href="http://www.salespartner.co.nz">http://www.salespartner.co.nz</a><o:p></o:p></pre><pre>PO Box 13085<o:p></o:p></pre><pre>Johnsonville<o:p></o:p></pre><pre>Wellington <o:p></o:p></pre>

<p class=MsoNormal><br>
<br>
Cameron Hart wrote: <o:p></o:p></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>Do you know if the websites are XHTML?&nbsp; Do they have
anything like the following at the start of the page?</span><o:p></o:p></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;</span><o:p></o:p></p>

<p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New";
color:windowtext'>&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0
Strict//EN&quot; <a href="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">&quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd&quot;</a>&gt;</span><o:p></o:p></p>

<p class=MsoNormal><span style='font-size:10.0pt;font-family:"Courier New";
color:windowtext'>&lt;html xmlns=<a href="http://www.w3.org/1999/xhtml">&quot;http://www.w3.org/1999/xhtml&quot;</a>&gt;</span><o:p></o:p></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;</span><o:p></o:p></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>If they are, it would be easier to load them into XML documents
and process them that way using the MSXML DOMDocument60.</span><o:p></o:p></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;</span><o:p></o:p></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>cameron </span><o:p></o:p></p>

<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Trebuchet MS","sans-serif";
color:#2D3A44'>&nbsp;</span><o:p></o:p></p>

<div>

<div style='border:none;border-top:solid windowtext 1.0pt;padding:3.0pt 0cm 0cm 0cm;
border-color:-moz-use-text-color -moz-use-text-color'>

<p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif";
color:windowtext'>From:</span></b><span style='font-size:10.0pt;font-family:
"Tahoma","sans-serif";color:windowtext'> <a
href="mailto:delphi-bounces@delphi.org.nz">delphi-bounces@delphi.org.nz</a> [<a
href="mailto:delphi-bounces@delphi.org.nz">mailto:delphi-bounces@delphi.org.nz</a>]
<b>On Behalf Of </b>Alister Christie<br>
<b>Sent:</b> Friday, 29 January 2010 12:22 p.m.<br>
<b>To:</b> NZ Borland Developers Group - Delphi List<br>
<b>Subject:</b> [DUG] web scraping using IHTMLDocument2</span><o:p></o:p></p>

</div>

</div>

<p class=MsoNormal>&nbsp;<o:p></o:p></p>

<p class=MsoNormal>I'm trying to do some web page scraping using
IHTMLDocument2, which is working fairly well and I can grab the second
paragraph on a web page by doing something like:<br>
<br>
<tt><span style='font-size:10.0pt'>p := iDoc.all.tags('P');</span></tt><span
style='font-size:10.0pt;font-family:"Courier New"'><br>
<tt>if p.Length &gt;= 2 then</tt><br>
<tt>&nbsp; result := p.Item(1).InnerText;</tt><br>
</span><br>
Where iDoc is an instance of IHTMLDocument2.<br>
<br>
However, say there is an HTML element like<br>
<br>
<span class=webkit-html-tag><span style='font-size:13.5pt;font-family:"Courier New"'>&lt;div
</span></span><span class=webkit-html-attribute-name><span style='font-size:
13.5pt;font-family:"Courier New"'>class</span></span><span
class=webkit-html-tag><span style='font-size:13.5pt;font-family:"Courier New"'>=&quot;</span></span><span
class=webkit-html-attribute-value><span style='font-size:13.5pt;font-family:
"Courier New"'>propertyInfo</span></span><span class=webkit-html-tag><span
style='font-size:13.5pt;font-family:"Courier New"'>&quot;&gt;</span></span><span
class=apple-style-span><span style='font-size:13.5pt;font-family:"Courier New"'>Price:
</span></span><span class=webkit-html-tag><span style='font-size:13.5pt;
font-family:"Courier New"'>&lt;span&gt;</span></span><span
class=apple-style-span><span style='font-size:13.5pt;font-family:"Courier New"'>Negotiation</span></span><span
class=webkit-html-tag><span style='font-size:13.5pt;font-family:"Courier New"'>&lt;/span&gt;&lt;/div&gt;</span></span><span
style='font-size:13.5pt;font-family:"Courier New"'><br>
<br>
</span>How would I be able to find the divs where
class=&quot;propertyInfo&quot;? (if anyone has much experience with
IHTMLDocument2) <o:p></o:p></p>
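
<p class=MsoNormal>One possible approach, sketched in the same late-bound style as
the snippet above: walk the DIV collection and compare each element's className.&nbsp;
FindPropertyInfo is a hypothetical helper name, and the exact-match test assumes
the div carries only that one class.<o:p></o:p></p>

<pre>// needs MSHTML, ComObj, Variants, Classes and SysUtils in the uses clause
procedure FindPropertyInfo(const iDoc: IHTMLDocument2; Results: TStrings);
var
  divs, d: OleVariant;
  i: Integer;
begin
  divs := iDoc.all.tags('DIV'); // every div element in the document
  for i := 0 to divs.Length - 1 do
  begin
    d := divs.Item(i);
    // className holds the raw class attribute of the element
    if SameText(d.className, 'propertyInfo') then
      Results.Add(d.innerText);
  end;
end;
</pre>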

<pre>-- <o:p></o:p></pre><pre>Alister Christie<o:p></o:p></pre><pre>Computers for People<o:p></o:p></pre><pre>Ph: 04 471 1849 Fax: 04 471 1266<o:p></o:p></pre><pre><a
href="http://www.salespartner.co.nz">http://www.salespartner.co.nz</a><o:p></o:p></pre><pre>PO Box 13085<o:p></o:p></pre><pre>Johnsonville<o:p></o:p></pre><pre>Wellington <o:p></o:p></pre><pre><o:p>&nbsp;</o:p></pre><pre
style='text-align:center'>

<hr size=4 width="90%" align=center>

</pre></div>

</body>

</html>