task « hadoop « Java Database Q&A

1. Get the task attempt ID for the currently running Hadoop task    stackoverflow.com

The Task Side-Effect Files section of the Hadoop tutorial mentions using the "attemptid" of the task as a unique name. How do I get this attempt ID in ...
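As a quick illustration (plain JDK, no Hadoop dependency): in classic (pre-YARN) Hadoop the attempt ID is typically readable inside a running task as the "mapred.task.id" configuration property, in the form attempt_&lt;jtStart&gt;_&lt;jobNum&gt;_&lt;m|r&gt;_&lt;taskNum&gt;_&lt;attemptNum&gt;. A minimal sketch, assuming that format, of turning it into a unique side-effect file name:

```java
public class AttemptIdDemo {
    // Build a unique side-effect file name from a classic Hadoop task attempt ID.
    // Assumed format: attempt_<jtStart>_<jobNum>_<m|r>_<taskNum>_<attemptNum>
    static String sideEffectName(String attemptId, String prefix) {
        String[] parts = attemptId.split("_");
        if (parts.length != 6 || !parts[0].equals("attempt")) {
            throw new IllegalArgumentException("not a task attempt ID: " + attemptId);
        }
        // parts[3] is "m" for a map attempt, "r" for a reduce attempt;
        // parts[4] is the task number, parts[5] the attempt number.
        return prefix + "-" + parts[3] + "-" + parts[4] + "-" + parts[5];
    }

    public static void main(String[] args) {
        String id = "attempt_200707121733_0003_m_000005_0";
        System.out.println(sideEffectName(id, "side")); // side-m-000005-0
    }
}
```

Because no two attempts of the same task share an attempt number, the resulting names never collide even under speculative execution.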

2. Can I set task memory limit higher than 2GB    stackoverflow.com

Hadoop map-reduce configuration provides the mapred.task.limit.maxvmem and mapred.task.default.maxvmem settings. According to the documentation, both of these are values of type long, that is, a number in bytes that represents the default/upper ...
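Since both properties are longs counted in bytes, values above 2 GB are representable. A hedged mapred-site.xml sketch using the two property names from the question (4 GB shown; whether a given Hadoop version honors these properties should be checked against its documentation):

```xml
<!-- Sketch: values are plain bytes (long), so 4294967296 = 4 GB. -->
<property>
  <name>mapred.task.default.maxvmem</name>
  <value>4294967296</value>
</property>
<property>
  <name>mapred.task.limit.maxvmem</name>
  <value>4294967296</value>
</property>
```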

3. Hadoop task schedulers: Capacity vs Fair sharing or something else?    stackoverflow.com

Background

My employer is progressively shifting our resource-intensive ETL and backend processing logic from MySQL to Hadoop (HDFS & Hive). At the moment everything is still somewhat small ...

4. Hadoop Pipes: how to pass large data records to map/reduce tasks    stackoverflow.com

I'm trying to use map/reduce to process large amounts of binary data. The application is characterized by the following: the number of records is potentially large, such that I don't really ...

5. What types of tasks / applications can use Apache Hadoop for (MapReduce functions)    stackoverflow.com

I don't understand what types of apps can be used with Hadoop. Does each task have to be tailored for Hadoop/MapReduce? For example, can you just associate any long ...

6. Hadoop, running tasks    stackoverflow.com

How do I programmatically add tasks to Hadoop and run them from my Java application? Any ideas? Thanks.

7. number of reducers for 1 task in MapReduce    stackoverflow.com

In a typical MapReduce setup (like Hadoop), how many reducers are used for one task, for example, counting words? My understanding of MapReduce from the Google paper is that only 1 reducer is involved. ...
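In Hadoop the number of reducers is a per-job setting, and with more than one reducer each key is routed to exactly one of them. A plain-JDK sketch of the formula used by Hadoop's default HashPartitioner, (key.hashCode() &amp; Integer.MAX_VALUE) % numReduceTasks, showing why word counts stay correct across several reducers:

```java
import java.util.*;

public class PartitionDemo {
    // Same formula as Hadoop's default HashPartitioner: every occurrence of a
    // given word maps to the same reducer, so per-word counts are never split.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] words = {"the", "quick", "brown", "fox", "the"};
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String w : words) {
            byReducer.computeIfAbsent(partition(w, 3), k -> new ArrayList<>()).add(w);
        }
        System.out.println(byReducer); // disjoint word sets per reducer
    }
}
```

With N reducers the job simply produces N output files (part-00000 .. part-0000N-1) instead of one.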

8. Hadoop Fair Scheduler not assigning tasks to some nodes    stackoverflow.com

I'm trying to run the Fair Scheduler, but it's not assigning Map tasks to some nodes with only one job running. My understanding is that the Fair Scheduler will use ...
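For context, a hedged sketch of how the Fair Scheduler is typically enabled in classic Hadoop (0.20/1.x); the allocation-file path is a hypothetical example, and property names may differ in other versions:

```xml
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <!-- Hypothetical path to the pools/allocations definition. -->
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/path/to/fair-scheduler.xml</value>
</property>
```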

9. General Method for Determining Hadoop Conf Settings on a Single Node Cluster    stackoverflow.com

I am wondering how best to determine the appropriate numbers of map and reduce tasks and the corresponding maximum size of the JVM heap? For those new to Hadoop these ...
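A common rule of thumb (an assumption, not a universal rule): total map + reduce slots roughly equal to the number of cores, and per-task heap (mapred.child.java.opts) chosen so that slots × heap plus the daemons' heaps fits in physical RAM. A hedged mapred-site.xml sketch for a hypothetical 4-core / 4 GB single node:

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value> <!-- map slots on this node -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value> <!-- reduce slots on this node -->
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value> <!-- per-task JVM heap -->
</property>
```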


10. Getting all TaskAttempts of a Task from Hadoop API    stackoverflow.com

I would like to get information on all TaskAttempts of a Task of a Job on Hadoop. org.apache.hadoop.mapred.TaskReport gives information on running Attempts and successful Attempts, but I would like to also ...

11. How to expose the task tracker/job tracker web interface to the public in Hadoop?    stackoverflow.com

I'm trying to monitor different cluster nodes, but every time I have to ssh -X to the node and start the browser to take a look at the status information. Is there any way ...
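In classic Hadoop these pages are already served over plain HTTP (JobTracker web UI on port 50030, each TaskTracker on 50060), so reaching them is usually a matter of bind address and firewall rules rather than X forwarding. A hedged sketch of the relevant mapred-site.xml properties (exposing the UIs publicly is a security risk, so restricting by firewall is advisable):

```xml
<property>
  <name>mapred.job.tracker.http.address</name>
  <value>0.0.0.0:50030</value> <!-- JobTracker web UI -->
</property>
<property>
  <name>mapred.task.tracker.http.address</name>
  <value>0.0.0.0:50060</value> <!-- TaskTracker web UI -->
</property>
```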

12. Why does the number of completed tasks in MapReduce decrease?    stackoverflow.com

When running Hadoop jobs, I noticed that sometimes the number of completed tasks decreases and the number of canceled tasks increases. How is this possible? Why does this happen?

13. How does Hadoop transfer user-defined parameters to tasks?    stackoverflow.com

In Hadoop, how are user-defined configuration parameters transferred to the tasks? For example,

conf.set("myparameter", "somestring")
then I can get the parameter in the map/reduce task with conf.get("myparameter"). Is this done through Serializable? And ...
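It is not Java's Serializable: the job client writes every Configuration entry into an XML file (job.xml) that is shipped with the job, and each task rebuilds its Configuration from that file. A plain-JDK sketch of that idea (an illustration, not Hadoop's actual implementation):

```java
import java.util.*;

public class ConfShippingDemo {
    // Sketch: render configuration entries as the job.xml-style payload that
    // the client ships and each task reads back into its own Configuration.
    static String toJobXml(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder("<configuration>");
        for (Map.Entry<String, String> e : conf.entrySet()) {
            sb.append("<property><name>").append(e.getKey())
              .append("</name><value>").append(e.getValue())
              .append("</value></property>");
        }
        return sb.append("</configuration>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("myparameter", "somestring"); // conf.set(...) on the client
        System.out.println(toJobXml(conf));    // what travels to the task
    }
}
```

Because the values go through a text file rather than Java serialization, only strings (and things convertible to strings) survive the trip.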

14. How to tell Hadoop to not delete temporary directory from HDFS when task is killed?    stackoverflow.com

By default, Hadoop map tasks write processed records to files in a temporary directory at ${mapred.output.dir}/_temporary/_${taskid} . These files sit there until the FileOutputCommitter moves them to ${mapred.output.dir} (after the task finishes successfully). I ...
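One hedged lead: classic Hadoop has a debugging switch that retains a failed or killed task's working files instead of cleaning them up. It targets the task's local directories, so whether it also preserves the HDFS _temporary output should be verified for the version in use:

```xml
<!-- Sketch for classic Hadoop; exact cleanup behavior is version-dependent. -->
<property>
  <name>keep.failed.task.files</name>
  <value>true</value>
</property>
```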