If you want to automatically process web pages to extract data, you have a number of tools available. You can bring a web page down to your computer with "curl" or "wget":
curl http://aplawrence.com > mysite
If you don't really want the HTML, use "lynx -dump https://whatever.com > /yourstorage/whatever.txt" to get a plain-text rendering of the page. Check the man page for other options you might want, such as "-nolist" to suppress the numbered link list.
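Once you have a text dump, ordinary shell tools can trim it down further. Here's a small sketch: lynx -dump normally appends a numbered "References" section listing every link, and grep can pull just the URLs back out of it. The sample file contents below are made up for illustration so the commands run without a network connection.

```shell
# Create a small stand-in for a saved "lynx -dump" text file
printf 'Welcome to the site\n\nReferences\n 1. https://aplawrence.com/Unix/\n 2. https://aplawrence.com/Books/\n' > whatever.txt

# Pull just the URLs out of the dump's link list
grep -Eo 'https?://[^ ]+' whatever.txt
```

The same idea works with awk or sed if you want the link numbers too.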
You can also easily be selective and pull only the data you want from a page with simple Perl scripts.
#!/usr/bin/perl
use LWP::Simple;
$url = 'https://aplawrence.com';
$content = get $url;
print $content;
And then of course you'd process the $content as desired. It's only a little more complex if you are dealing with forms.
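For example, once $content holds the page, a regular expression or two can pull out just the pieces you care about. The snippet below is a minimal sketch of that processing step: it operates on a hard-coded HTML string standing in for a fetched page, so it runs without a network connection, and the variable names are my own choices, not anything standardized.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample HTML standing in for the $content returned by "get $url"
my $content = '<html><head><title>APLawrence.com</title></head>
<body><a href="/Unix/">Unix</a> <a href="/Books/">Books</a></body></html>';

# Pull the page title
my ($title) = $content =~ m{<title>(.*?)</title>}is;
print "Title: $title\n";

# Collect every link target
my @links = $content =~ m{href="([^"]+)"}ig;
print "Link: $_\n" for @links;
```

For a form, the same idea applies, except you would use LWP::UserAgent's post method rather than LWP::Simple's get to submit the form fields before processing the response.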
A book that covers LWP is reviewed at /Books/webc.html.
Got something to add? Send me email.
More Articles by Tony Lawrence © 2012-08-02
Anyone who slaps a 'this page is best viewed with Browser X' label on a Web page appears to be yearning for the bad old days, before the Web, when you had very little chance of reading a document written on another computer, another word processor, or another network. (Tim Berners-Lee)