Sort « hadoop « Java Database Q&A





1. Can I get individually sorted Mapper outputs from Hadoop when using zero Reducers?    stackoverflow.com

I have a job in Hadoop 0.20 that needs to operate on large files, one at a time. (It's a pre-processing step to get file-oriented data into a cleaner, line-based ...

2. Sorting large data using MapReduce/Hadoop    stackoverflow.com

I am reading about MapReduce and the following thing is confusing me. Suppose we have a file with 1 million entries (integers) and we want to sort them using MapReduce. The way I ...
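
One common way to get a globally sorted result from MapReduce is to partition keys by range, let each reducer sort its own range, and concatenate the reducer outputs in partition order. The sketch below simulates that idea in plain Python without Hadoop; the function name `range_partition_sort`, the hand-picked `boundaries`, and the random input are all illustrative assumptions, not something taken from the question. In Hadoop itself, `TotalOrderPartitioner` plays the role of the range partitioner, with split points usually sampled from the input.

```python
import bisect
import random

def range_partition_sort(values, boundaries):
    """Partition values by key range, then sort each partition independently."""
    # One bucket per simulated reducer; boundaries are sorted split points
    # (chosen by hand here, which is an assumption for illustration).
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        # bisect_right returns how many boundaries are <= v,
        # which is the index of the bucket that owns v.
        buckets[bisect.bisect_right(boundaries, v)].append(v)
    # Each bucket is sorted on its own, as each reducer sorts only its keys.
    return [sorted(b) for b in buckets]

random.seed(0)
data = [random.randint(0, 999) for _ in range(1000)]  # stand-in for the input file
parts = range_partition_sort(data, boundaries=[250, 500, 750])

# Concatenating the partitions in order yields a fully sorted sequence.
merged = [v for part in parts for v in part]
```

Because every key in partition i is smaller than every key in partition i+1, no final merge step is needed; the output files only have to be read in partition order.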

3. Sort and shuffle optimization in Hadoop MapReduce    stackoverflow.com

I'm looking for a research/implementation based project on Hadoop and I came across the list posted on the wiki page - http://wiki.apache.org/hadoop/ProjectSuggestions. But, this page was last updated in ...

4. Running the Sort example on Hadoop (single-node cluster)    stackoverflow.com

I have installed Hadoop single-node cluster 0.20.2 on Ubuntu 10.04 and run an example using the material of the tutorial I found on this site: http://www.dscripts.net/wiki/setup-hadoop-ubuntu-single-node Now I am trying to ...

5. MapReduce (secondary) sorting / filtering - how?    stackoverflow.com

I have a logfile of timestamped values (concurrent users) of different "zones" of a chatroom webapp in the format "Timestamp; Zone; Value". For each zone exists one value per minute of ...
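
A secondary sort for data of this shape is often built from a composite key: sort on (zone, timestamp), but group on zone alone, so each group sees its values in time order. The following is a minimal plain-Python sketch of that idea, not a Hadoop job; the sample records and the helper `parse` are hypothetical, and only the "Timestamp; Zone; Value" layout comes from the question.

```python
from itertools import groupby

# Hypothetical sample records in the "Timestamp; Zone; Value" format.
records = [
    "2011-05-01 10:02; lobby; 14",
    "2011-05-01 10:01; lounge; 3",
    "2011-05-01 10:01; lobby; 12",
    "2011-05-01 10:02; lounge; 5",
]

def parse(line):
    """Turn one log line into ((zone, timestamp), value)."""
    ts, zone, value = [field.strip() for field in line.split(";")]
    return (zone, ts), int(value)

# Sort on the full composite key, as the shuffle's sort comparator would.
pairs = sorted((parse(line) for line in records), key=lambda kv: kv[0])

# Group on the zone part only, as a grouping comparator would.
grouped = {
    zone: [(key[1], value) for key, value in group]
    for zone, group in groupby(pairs, key=lambda kv: kv[0][0])
}
```

In a real job the same split is usually expressed with a custom partitioner and grouping comparator that look only at the zone, while the sort comparator looks at the whole key.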

6. Hadoop running sort example on single-node cluster    stackoverflow.com

I am trying to run the sort example on a Hadoop single-node cluster. First of all, I start the daemons: hadoop@ubuntu:/home/user/hadoop$ bin/start-all.sh Then I run the random writer example to generate the sequential files as ...

7. Hadoop MapReduce with already sorted files    stackoverflow.com

I'm working with Hadoop MapReduce. I've got data in HDFS and data in each file is already sorted. Is it possible to force MapReduce not to re-sort the data after map ...

8. How to sort (order by) big data with hive efficiently?    stackoverflow.com

I want to sort a big dataset efficiently (i.e. with a custom partitioner, as described here: How does the MapReduce sort algorithm work?), but I want to do it with ...

9. Using sorted tables in Hive    stackoverflow.com

In summary: I feel that my system is ignoring the concept of pre-sorted tables. - I expected to save time on the sorting step because I was using pre-sorted data, but the query plan ...

10. The right place for "io.sort.mb" in Hadoop?    stackoverflow.com

I am a bit confused. In the Hadoop cluster setup, in the section "Real-World Cluster Configurations", an example is given where properties like io.sort.mb & io.sort.factor go in core-site.xml. But ...

11. MapReduce sorted iterator    stackoverflow.com

I am reading the source code of MapReduce to better understand MapReduce's internal mechanism. And I have a problem when trying to understand how data produced in the map phase is merged ...

12. Hadoop combiner sort phase    stackoverflow.com

When running a MapReduce job with a specified combiner, is the combiner run during the sort phase? I understand that the combiner is run on mapper output for each spill, but ...

13. Sorting by value in Hadoop from a file    stackoverflow.com

I have a file containing a String, then a space and then a number on every line. Example:

Line1: Word 2
Line2: Word1 8
Line3: Word2 1
I need to sort the number ...
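
Since MapReduce sorts on keys rather than values, a common approach to this kind of problem is to have the mapper swap the two fields so the number becomes the key. The sketch below simulates that in plain Python using the three example lines from the question; the helper name `map_swap` is hypothetical, and ascending order is an assumption because the excerpt is cut off before stating the desired order.

```python
# The three example lines from the question.
lines = ["Word 2", "Word1 8", "Word2 1"]

def map_swap(line):
    """Emit (number, word) so that sorting by key sorts by the number."""
    word, number = line.rsplit(" ", 1)
    return int(number), word

# sorted() stands in for the shuffle/sort phase, which orders records by key.
swapped = sorted(map_swap(line) for line in lines)

# Swap back to the original "word number" layout for output.
result = [f"{word} {number}" for number, word in swapped]
```

Parsing the number as an integer matters: compared as text, "10" would sort before "2". For descending order, a reversed comparison on the key would be needed instead.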