[DUG] web scraping using IHTMLDocument2

Alister Christie alister at salespartner.co.nz
Fri Jan 29 14:39:49 NZDT 2010


Thanks Cameron,

It does indeed have that header. How do I make this work?
  XMLDocument1.FileName := 'c:\temp\test.htm';
  XMLDocument1.Active := True;
This gives me various errors. I suspect the file is not valid XML. Or is
there some other way of parsing it?
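Below is a minimal sketch of Cameron's DOMDocument60 suggestion. It drives MSXML 6 (ProgID Msxml2.DOMDocument.6.0) through late binding rather than through TXMLDocument, and it assumes the page really is well-formed XHTML.

It handles two likely causes of load errors with such pages:

- MSXML 6 prohibits DTDs by default, so the XHTML DOCTYPE alone makes the load fail until ProhibitDTD is switched off.
- XHTML elements live in the http://www.w3.org/1999/xhtml namespace, so an XPath query needs a bound prefix. The prefix `x` is an arbitrary choice.

The names NewXhtmlDom and ListPropertyInfo are hypothetical, and the file path is the one from the message. Pages that use named entities such as &nbsp; may still fail to load, because the DTD that defines them is deliberately not fetched.

```pascal
program LoadXhtml;

{$APPTYPE CONSOLE}

uses
  SysUtils, Variants, ActiveX, ComObj;

// Hypothetical helper: creates a late-bound MSXML 6 DOM configured for XHTML.
function NewXhtmlDom: OleVariant;
begin
  Result := CreateOleObject('Msxml2.DOMDocument.6.0');
  Result.async := False;
  // MSXML 6 rejects any DOCTYPE unless ProhibitDTD is switched off.
  Result.setProperty('ProhibitDTD', False);
  // Do not fetch the W3C DTD over the network, and do not validate.
  Result.resolveExternals := False;
  Result.validateOnParse := False;
  // XHTML elements are namespaced, so XPath needs a bound prefix.
  Result.setProperty('SelectionLanguage', 'XPath');
  Result.setProperty('SelectionNamespaces',
    'xmlns:x="http://www.w3.org/1999/xhtml"');
end;

// Hypothetical helper: prints the text of every <div class="propertyInfo">.
procedure ListPropertyInfo(const FileName: string);
var
  Doc, Nodes, Node: OleVariant;
  I: Integer;
begin
  Doc := NewXhtmlDom;
  if not Doc.load(FileName) then
    raise Exception.CreateFmt('Line %d: %s',
      [Integer(Doc.parseError.line), string(Doc.parseError.reason)]);
  Nodes := Doc.selectNodes('//x:div[@class="propertyInfo"]');
  for I := 0 to Nodes.length - 1 do
  begin
    Node := Nodes.item[I];
    Writeln(string(Node.text));
  end;
end;

begin
  CoInitialize(nil);
  try
    ListPropertyInfo('c:\temp\test.htm');
  finally
    CoUninitialize;
  end;
end.
```

If the load still fails, the exception's parse-error line and reason usually show whether the page is simply not well-formed.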

Alister Christie
Computers for People
Ph: 04 471 1849 Fax: 04 471 1266
http://www.salespartner.co.nz
PO Box 13085
Johnsonville
Wellington 



Cameron Hart wrote:
>
> Do you know whether the websites are XHTML? Do they have something like
> the following at the start of the page?
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
>
> <html xmlns="http://www.w3.org/1999/xhtml">
>
> If they are, it would be easier to load them into XML documents and
> process them that way, using MSXML's DOMDocument60.
>
> cameron
>
> *From:* delphi-bounces at delphi.org.nz 
> [mailto:delphi-bounces at delphi.org.nz] *On Behalf Of *Alister Christie
> *Sent:* Friday, 29 January 2010 12:22 p.m.
> *To:* NZ Borland Developers Group - Delphi List
> *Subject:* [DUG] web scraping using IHTMLDocument2
>
> I'm trying to do some web page scraping using IHTMLDocument2, which is 
> working fairly well and I can grab the second paragraph on a web page 
> by doing something like:
>
> p := iDoc.all.tags('P');
> if p.Length >= 2 then
>   result := p.Item(1).InnerText;
>
> Where iDoc is an instance of IHTMLDocument2.
>
> However, say there is an HTML element like:
>
> <div class="propertyInfo">Price: <span>Negotiation</span></div>
>
> How would I find the divs where class="propertyInfo"? (That is, if
> anyone has much experience with IHTMLDocument2.)
>
> -- 
> Alister Christie
> Computers for People
> Ph: 04 471 1849 Fax: 04 471 1266
> http://www.salespartner.co.nz
> PO Box 13085
> Johnsonville
> Wellington 
> ------------------------------------------------------------------------
>
> _______________________________________________
> NZ Borland Developers Group - Delphi mailing list
> Post: delphi at delphi.org.nz
> Admin: http://delphi.org.nz/mailman/listinfo/delphi
> Unsubscribe: send an email to delphi-request at delphi.org.nz with Subject: unsubscribe
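For the original IHTMLDocument2 question, one possible approach is to walk every element returned by all.tags('DIV') and test its className property.

The sketch below assumes the typed interfaces from Delphi's MSHTML unit. FindDivsByClass and HasClass are hypothetical helper names. The Demo procedure builds a throwaway document from a string with CoHTMLDocument, a common Delphi idiom, only so that the program runs on its own. With a live page you would pass the existing iDoc instead, for example WebBrowser1.Document as IHTMLDocument2.

```pascal
program FindDivs;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, Variants, ActiveX, MSHTML;

// True if AClassName is one of the space-separated names in ClassAttr.
// Padding with spaces stops 'propertyInfo' matching 'propertyInfoBox'.
// Tabs are not treated as separators in this sketch.
function HasClass(const ClassAttr, AClassName: string): Boolean;
begin
  Result := Pos(' ' + AClassName + ' ', ' ' + ClassAttr + ' ') > 0;
end;

// Adds the innerText of every <div> carrying AClassName to Results.
procedure FindDivsByClass(const Doc: IHTMLDocument2;
  const AClassName: string; Results: TStrings);
var
  Divs: IHTMLElementCollection;
  Elem: IHTMLElement;
  I: Integer;
begin
  // tags() is declared as returning IDispatch; query for the collection.
  Divs := Doc.all.tags('DIV') as IHTMLElementCollection;
  for I := 0 to Divs.length - 1 do
  begin
    // item(index, 0) returns the element at a zero-based position.
    Elem := Divs.item(I, 0) as IHTMLElement;
    // className is used because older IE document modes do not map
    // getAttribute('class') to the class attribute.
    if HasClass(Elem.className, AClassName) then
      Results.Add(Elem.innerText);
  end;
end;

// Demo only: parse an HTML string into a standalone MSHTML document.
procedure Demo;
var
  Doc: IHTMLDocument2;
  Html: OleVariant;
  Found: TStringList;
begin
  Doc := CoHTMLDocument.Create as IHTMLDocument2;
  Html := VarArrayCreate([0, 0], varVariant);
  Html[0] := '<div class="propertyInfo">Price: <span>Negotiation</span></div>' +
    '<div class="other">Ignored</div>';
  Doc.write(PSafeArray(TVarData(Html).VArray));
  Doc.close;

  Found := TStringList.Create;
  try
    FindDivsByClass(Doc, 'propertyInfo', Found);
    Writeln(Found.Text);
  finally
    Found.Free;
  end;
end;

begin
  CoInitialize(nil);
  try
    Demo;
  finally
    CoUninitialize;
  end;
end.
```

Testing className as a space-separated list, rather than comparing it with =, also catches markup such as class="propertyInfo featured".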