January 10, 2005

I was a big fan of geourl.com and have been missing it enough (and have been curious enough about MySQL’s R-tree spatial indexes) that yesterday I threw together a quick scraper in PHP to grab meta tags named either “ICBM” or “geo.position”, along with their values. Then I realized that I should’ve done it in Ruby to take advantage of its multithreading. An hour later I’d ported it, and with the new multithreading I was able to scrape ~8500 websites in two hours, nearly five times faster than the PHP script managed. Today I was going to set up the database, but ran into the problem of how to convert latitude and longitude (all normalized to decimal form) to the WKT (or WKB) format. Any helpful pointers appreciated.
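For the latitude/longitude-to-WKT step, here is a minimal sketch of one way it could go, not the code from the actual scraper. The helper names `parse_position` and `to_wkt_point` are made up for illustration. It assumes the tag content is already in decimal form, with “ICBM” values written as “lat, lon” and “geo.position” values as “lat;lon”, and it assumes the usual WKT convention that x is longitude and y is latitude.

```ruby
# Hypothetical helpers for illustration; the names are not from the scraper.

# Parse the content of an ICBM ("lat, lon") or geo.position ("lat;lon")
# meta tag into [latitude, longitude] floats. Returns nil when the value
# is malformed or out of range.
def parse_position(content)
  parts = content.to_s.strip.split(/\s*[,;]\s*/)
  return nil unless parts.length == 2

  lat, lon = parts.map { |part| Float(part) }
  return nil unless lat.between?(-90.0, 90.0) && lon.between?(-180.0, 180.0)

  [lat, lon]
rescue ArgumentError
  # Float() raises ArgumentError on non-numeric text.
  nil
end

# Format a position as a WKT point. WKT writes points as "POINT(x y)",
# so longitude (x) goes first and latitude (y) second (assumed axis order).
def to_wkt_point(lat, lon)
  format('POINT(%.6f %.6f)', lon, lat)
end

lat, lon = parse_position('40.7, -74.0')
puts to_wkt_point(lat, lon)
```

The resulting string should be something MySQL can turn into a geometry value with GeomFromText() at insert time, which would sidestep building WKB by hand. Six decimal places is an arbitrary choice here.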

Tonight Kristen and I watched Collateral, and it was surprisingly good. I was amazed that Michael Mann was involved with a film that didn’t have his signature glass brick walls in at least five scenes. The active_record write-up is coming along. Interest hasn’t waned; I have a rough draft that needs some polishing, but I’ve been busy with contract work (and writing spiders for geo information). And I’m not posting here with nearly the regularity that I’d like…

Comments

1

So, as I understand it, you developed a multithreaded script in Ruby and it turned out faster than the PHP script, is that right? (Just to make sure that I understood you correctly.) Do you have any problems with multithreading in Ruby? I’ve read somewhere that some threads can be lost, which leads to memory leaks. There is even a product for this, Ruby Thread Validator: http://www.softwareverify.com/rubyThreadValidator/feature.html

Posted by vvlad at May 4, 2005 08:43 AM
