Tutorial Seek Development Blog

IB Personal Project of Jamie Chung

 

Testing the Spider

It is now 4 in the morning, and I have spent the night testing the spider on various tutorial sites.  I spidered all the tutorials on GreyCobra in a couple of minutes and found that they have exactly 229 tutorials.  This automated process will make things easier for webmasters: when they write a new tutorial, they do not have to submit it to the database, because the spider will crawl the site and find it.  The test went smoothly, and I am very pleased with the speed and accuracy of the spider.

My next big opponent is 13Dots.com.  I figure that starting off with a lot of tutorials in the database, at least 1000 right off the bat, would be great for the website, so I might as well test the spider on the larger tutorial communities.

While testing the spider, I made a lot of changes to its algorithm and how it operates.  I changed some of the methods for parsing tutorial URLs and for making sure we grab the correct ones.  We will also depend on the user community to keep the spider in tip-top shape by reporting bad tutorials, so that the bugs can be investigated and the fixes built into the spider.
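To give a rough idea of what "grabbing the correct urls" can involve, here is a small Python sketch. It is only an illustration under stated assumptions, not the spider's actual code: the names LinkCollector and extract_tutorial_urls are made up for this example, and the "/tutorials/" path marker is a guess at how a site might organize its tutorial pages. It collects every link on a page, turns relative links into absolute ones, and keeps only same-site links whose path contains the marker.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_tutorial_urls(html, page_url, path_marker="/tutorials/"):
    """Return the unique tutorial-looking URLs found in html.

    path_marker is an assumption: a real site may need its own rule.
    """
    collector = LinkCollector()
    collector.feed(html)

    site = urlparse(page_url).netloc
    seen = set()
    result = []
    for href in collector.hrefs:
        # Resolve relative links and drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar links
        if parts.netloc != site:
            continue  # skip links that leave the site being crawled
        if path_marker not in parts.path:
            continue  # skip pages that do not look like tutorials
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

Calling extract_tutorial_urls(page_html, "http://www.example.com/index.html") would return the tutorial links on that page in the order they appear, each listed once. A rule this simple will sometimes let a wrong page through, which is where reports of bad tutorials from users would come in.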

Filed under : General, Development
By Jamie
On November 10, 2006
At 4:10 am
 

4 Comments for this post

 
Akash Says:

FINALLY!

 
 
David Leggett Says:

Very nice =)

However, when you consider all of our tutorials published in the forums, tutorials not listed directly in the database, and premium tutorials, you realize that you may be missing several hundred tutorials.

You may want to add a way for webmasters to submit tutorials (or sitemaps to look through).

Looking forward to the next update!

 
 
Dylan Says:

Definitely can’t wait till the next update. Is there any chance you can tell us what the spider is looking for? Like what data it records, etc.

 
 
Eli Says:

Interesting. Just read about it. Good luck with this.

 
