1. Number of connections to the host at the same time stackoverflow.com How can I handle this? |
2. Can't access Hadoop web UI for job tracker stackoverflow.com I'm trying to set up Hadoop and Nutch to run on EC2. To get started, I have followed the excellent NutchHadoopTutorial. Almost everything works as it should, except that ... |
3. Hadoop to create an Index and Add() it to distributed SOLR... is this possible? Should I use Nutch? ..Cloudera? stackoverflow.com Can I use a MapReduce framework to create an index and somehow add it to a distributed Solr? I have a burst of information (logfiles and documents) that will be transported over ... |
4. How can I develop a web crawler using Nutch in Windows XP? stackoverflow.com I'm totally new to Nutch. I've installed Tomcat and, using NetBeans, I've made a little Java project, which looks like this:
|
5. Writing MetaData inside HDFS stackoverflow.com We are using Nutch to crawl our intranet site. We extract the metadata into an XML file in the indexing phase (we modified the code of indexer.java), and when run in local ... |
6. Run Nutch on existing Hadoop cluster stackoverflow.com We have a Hadoop cluster (Hadoop 0.20) and I want to use Nutch 1.2 to import some files over HTTP into HDFS, but I couldn't get Nutch running on the cluster. I've ... |
7. Increase Java heap space for language-identifier plugin in Nutch stackoverflow.com I am trying to add a new language to Apache Tika's automatic language detection tool. It needs to build a language profile for adding a new language. So I am using ... |
8. Setup Nutch 1.3 and Hadoop stackoverflow.com I am a newbie to Nutch and Hadoop and am trying to follow the tutorial at http://wiki.apache.org/nutch/NutchHadoopTutorial. So I started with the Nutch 1.3 release. Even though Hadoop is included in Nutch, ... |
9. I don't know what the symbol "#" means in the following source of Nutch's HttpBase.java stackoverflow.com When I came to the following source of Nutch's
|
10. Nutch Crawl error - Input path does not exist stackoverflow.com I have Nutch/Hadoop with 2 datanode servers. I try to crawl some URLs, but Nutch fails with this error:
|
11. Do the cancel() and interrupt() methods do duplicate work? stackoverflow.com I read the source of
Instruction 2:
The source of org.apache.nutch.parse.ParseUtil.runParser(Parser p, Content content) is:
|
12. Exploring Nutch over Hadoop stackoverflow.com What can I do with Hadoop and Nutch used as a search engine? I know that Nutch is used to build a web crawler, but I'm not finding ... |
13. Setting up Nutch 1.3 and Hadoop 0.20.2 stackoverflow.com I have a multi-node cluster running on UEC (Ubuntu Enterprise Cloud) and I thought it would be a good idea to set up Nutch with it. However, I found this tutorial unhelpful ... |