Alex Iskold posted a great article here http://www.readwriteweb.com/archives/web_30_when_web_sites_become_web_services.php
It reminded me of another blog entry, by Ismael Ghalimi, that I read here
http://itredux.com/blog/2007/02/20/help-needed-for-web-scraping/
... and after looking through the various comments, I got to thinking about what it would take to provide a SIMPLE API over the Web. Four hours later, I came up with
this
http://www.universaldatafeeds.com/data/getxml.aspx
and this
http://www.universaldatafeeds.com/data/getjson.aspx
The first provides an XML API over standard web sites and the second a JSON API. So to consume, say, http://www.diggriver.com as an XML data source, all you have to do is request
http://www.universaldatafeeds.com/data/getxml.aspx?url=http://www.diggriver.com
and presto, you get an XML stream of the web page. Need it in JSON format ...
http://www.universaldatafeeds.com/data/getjson.aspx?url=http://www.diggriver.com
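If you'd rather build these request URLs in code than by hand, here is a minimal Python sketch. The two endpoints are the ones above; the helper name build_feed_url and its fmt parameter are made up for illustration. It appends the page URL exactly as in the examples, since I'm assuming (not claiming) that no extra encoding is expected.

```python
# Endpoints as given in the post.
ENDPOINTS = {
    "xml": "http://www.universaldatafeeds.com/data/getxml.aspx",
    "json": "http://www.universaldatafeeds.com/data/getjson.aspx",
}


def build_feed_url(page_url, fmt="xml"):
    """Hypothetical helper: return the request URL for page_url in the given format."""
    if fmt not in ENDPOINTS:
        raise ValueError("fmt must be 'xml' or 'json', got %r" % (fmt,))
    # Assumption: the page URL is appended as-is, mirroring the examples above.
    return ENDPOINTS[fmt] + "?url=" + page_url


xml_url = build_feed_url("http://www.diggriver.com")
json_url = build_feed_url("http://www.diggriver.com", fmt="json")
```

Fetching xml_url or json_url with any HTTP client should then behave like pasting the URL into the browser.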
But wait ... I don't want the entire contents of the web page, I just want the links ... no problem ...
http://www.universaldatafeeds.com/data/getxml.aspx?url=http://www.diggriver.com~//a
Note the ~//a tacked onto the end of the URL? Everything after the ~ is an XPath query that gets run over the XML data stream.
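For the curious, here is one way a value like that could be split into its page URL and its XPath part. This is only a Python sketch of the idea, not the code running on the site; the function name split_target and the choice to split on the last ~ are my assumptions for illustration.

```python
def split_target(value):
    """Hypothetical: split 'page_url~xpath' into (page_url, xpath).

    Returns (value, None) when no '~' is present.
    """
    # Split on the LAST '~' so that an earlier '~' inside the page URL survives.
    url, sep, xpath = value.rpartition("~")
    if not sep:
        return value, None
    return url, xpath


links_query = split_target("http://www.diggriver.com~//a")
whole_page = split_target("http://www.diggriver.com")
```

One caveat with this approach: a page URL that itself contains a ~ and has no query tacked on would be split in the wrong place, so a real implementation would need a rule for that case.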
But I really don't want all the links, I just want the ones that apply to the Digg items ... no problem
http://www.universaldatafeeds.com/data/getxml.aspx?url=http://www.diggriver.com~//li/a
or how about just the title
http://www.universaldatafeeds.com/data/getxml.aspx?url=http://www.diggriver.com~//title
or how about the Google search results for your name
http://www.universaldatafeeds.com/data/getxml.aspx?url=http://www.google.ca/search?hl=en&q=+Alex+Iskold&btnG=Google+Search&meta=~//div/a
Each of these refines the XPath query to a specific set of nodes. I won't go into all the variations of XPath syntax ... you can Google XPath for that.
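To get a feel for what those three queries select, you can try them locally. The Python sketch below runs //a, //li/a and //title against a small made-up page (the SAMPLE markup is invented, not the real diggriver.com output) using the standard library's xml.etree.ElementTree, which supports only a subset of XPath and wants paths to be relative, hence the leading dot.

```python
import xml.etree.ElementTree as ET

# Invented sample markup, standing in for the XML stream of a web page.
SAMPLE = """<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="/about">About</a>
    <ul>
      <li><a href="/story/1">First story</a></li>
      <li><a href="/story/2">Second story</a></li>
    </ul>
  </body>
</html>"""

root = ET.fromstring(SAMPLE)


def select_text(xpath):
    """Return the text of every element matching xpath, in document order."""
    # ElementTree needs a relative path, so "//a" is run as ".//a".
    return [el.text for el in root.findall("." + xpath)]


all_links = select_text("//a")       # every link on the page
story_links = select_text("//li/a")  # only links that sit directly inside a list item
titles = select_text("//title")      # just the page title
```

Here all_links holds three entries, story_links drops the "About" link because it is not inside an li, and titles holds the single page title.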
I guess what this all demonstrates is that we tend to look for complex solutions sometimes when all we need is a simpler (dare I say Web 2.0) approach.
All this took about 4 hours to create and while it's not as sophisticated as some of the other tools/technologies mentioned ... it sure is simpler and maybe, just maybe "good enough".
Now ... no doubt this site will get hammered and I'll lose my ISP privileges on this beta site so ...
Sponsors ... please step up :-)