Currently Nutch has an AdaptiveFetchSchedule, which sets the fetch time according to whether a page has been modified or not. What I want to do is to set the fetch time according ...
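For context, here is a rough sketch of the idea behind an adaptive schedule: shorten the re-fetch interval when a page was modified, lengthen it when it was not, and keep the result within bounds. This is not Nutch's actual code; the function names, rates, and bounds below are assumptions chosen for illustration.

```python
# Simplified sketch of an adaptive re-fetch interval.
# NOT Nutch's implementation: names, rates, and bounds are illustrative assumptions.

DAY = 24 * 60 * 60  # seconds in a day


def next_interval(current, modified, inc_rate=0.4, dec_rate=0.2,
                  min_interval=60, max_interval=365 * DAY):
    """Return the next re-fetch interval in seconds.

    current  -- the interval used for the last fetch, in seconds
    modified -- True if the page changed since the previous fetch
    """
    if modified:
        # Page changed: come back sooner next time.
        interval = current * (1.0 - dec_rate)
    else:
        # Page unchanged: wait longer next time.
        interval = current * (1.0 + inc_rate)
    # Clamp so the interval never collapses to zero or grows without bound.
    return max(min_interval, min(max_interval, interval))


def next_fetch_time(fetch_time, interval):
    """Return the next fetch time, given the last fetch time (epoch seconds)."""
    return fetch_time + interval
```

With a 30-day starting interval, a modified page would be revisited after roughly 24 days and an unmodified one after roughly 42 days under these assumed rates. A custom rule would replace the `if modified` branch with whatever signal the fetch time should depend on.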
In my crawler system, I have set the fetch interval to 30 days. I initially set my user agent to, say, "....", and many URLs were rejected. But after changing ...