[DUG] web scraping using IHTMLDocument2

Fri Jan 29 13:09:30 NZDT 2010

Do you know if the websites are XHTML? Do they have anything like the
following at the start of the page?

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

If they are, it would be easier to load them into XML documents and
process them that way using MSXML's DOMDocument60.
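
A rough sketch of that approach, assuming the page really is well-formed
XHTML and MSXML 6 is installed. CollectFromXhtml and its parameters are
made-up names for illustration only. Named entities such as &nbsp; may
fail to parse because the DTD is not loaded here.

```delphi
uses
  SysUtils, Classes, ComObj;

// Hypothetical helper: adds the text of every <div> whose class attribute
// equals AClassName to Results. Xhtml holds the page source as a string.
procedure CollectFromXhtml(const Xhtml, AClassName: string; Results: TStrings);
var
  doc, nodes: OleVariant;
  ok: Boolean;
  i: Integer;
begin
  // Late-bound MSXML 6 document
  doc := CreateOleObject('Msxml2.DOMDocument.6.0');
  doc.async := False;
  doc.validateOnParse := False;
  doc.resolveExternals := False;
  // MSXML 6 rejects documents containing a DOCTYPE unless this is turned off
  doc.setProperty('ProhibitDTD', False);
  // XHTML elements are in a namespace, so the XPath query needs a prefix for it
  doc.setProperty('SelectionNamespaces',
    'xmlns:x="http://www.w3.org/1999/xhtml"');

  ok := doc.loadXML(Xhtml);
  if not ok then
    raise Exception.Create('XML parse error: ' + string(doc.parseError.reason));

  // Exact match on the class attribute value
  nodes := doc.selectNodes('//x:div[@class="' + AClassName + '"]');
  for i := 0 to nodes.length - 1 do
    Results.Add(nodes.item(i).text);
end;
```

If the page is not strict XHTML, loadXML will most likely fail and the
MSHTML route is the safer one.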

cameron 

From: delphi-bounces at delphi.org.nz [mailto:delphi-bounces at delphi.org.nz]
On Behalf Of Alister Christie
Sent: Friday, 29 January 2010 12:22 p.m.
To: NZ Borland Developers Group - Delphi List
Subject: [DUG] web scraping using IHTMLDocument2

I'm trying to do some web page scraping using IHTMLDocument2, which is
working fairly well and I can grab the second paragraph on a web page by
doing something like:

p := iDoc.all.tags('P');
if p.Length >= 2 then
  result := p.Item(1).InnerText;

Where iDoc is an instance of IHTMLDocument2.

However, say there is an HTML element like:

<div class="propertyInfo">Price: <span>Negotiation</span></div>

How would I be able to find the divs where class="propertyInfo"? (if
anyone has much experience with IHTMLDocument2) 
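
One possible approach, sketched in the same late-bound style as the
snippet above: walk the DIV collection and compare each element's
className. CollectByClass and its parameters are hypothetical names, and
the comparison assumes the element carries exactly one class.

```delphi
uses
  SysUtils, Classes, Variants, MSHTML;

// Hypothetical helper: adds the inner text of every <div> whose class
// is AClassName to Results.
procedure CollectByClass(const iDoc: IHTMLDocument2; const AClassName: string;
  Results: TStrings);
var
  divs, el: OleVariant;
  i: Integer;
begin
  // Same call as for 'P' above, but asking for DIV elements
  divs := iDoc.all.tags('DIV');
  for i := 0 to divs.Length - 1 do
  begin
    el := divs.Item(i);
    // className is the DOM property behind the HTML class attribute
    if SameText(VarToStr(el.className), AClassName) then
      Results.Add(VarToStr(el.innerText));
  end;
end;
```

Calling CollectByClass(iDoc, 'propertyInfo', SomeStringList) on the
example element should add its inner text, "Price: Negotiation" (spacing
may differ). An element with several classes, such as
class="propertyInfo featured", would need the className split on spaces
first.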

-- 
Alister Christie
Computers for People
Ph: 04 471 1849 Fax: 04 471 1266
http://www.salespartner.co.nz
PO Box 13085
Johnsonville
Wellington 