map « hadoop « Java Database Q&A

1. How Can I Use The Input Logs .PCAP(Binary) With Map Reduce Hadoop    stackoverflow.com

Tcpdump logs are binary files. I want to know which Hadoop FileInputFormat I should use to split the input data into chunks... please help me!

2. How do I concatenate a lot of files into one inside Hadoop, with no mapping or reduction    stackoverflow.com

I'm trying to combine multiple files in multiple input directories into a single file, for various odd reasons I won't go into. My initial try was to write a 'nul' ...

3. Multiple lines of text to a single map    stackoverflow.com

I've been trying to use Hadoop to send N lines to a single map call. I don't require the lines to be split already. I've tried to use NLineInputFormat, ...

4. Having two sets of input combined on hadoop    stackoverflow.com

I have a rather simple Hadoop question, which I'll try to present with an example: say you have a list of strings and a large file and you want each mapper to ...

5. Need help implementing this algorithm with map Hadoop MapReduce    stackoverflow.com

I have an algorithm that goes through a large data set, reads some text files, and searches for specific terms in those lines. I have it implemented in Java, but I ...

6. Hadoop: Mapping binary files    stackoverflow.com

Typically the input file can be partially read and processed by the Mapper function (as with text files). Is there anything that can be done to handle binaries ...

7. Where should Map put temporary files when running under Hadoop    stackoverflow.com

I am running Hadoop 0.20.1 under SLES 10 (SUSE). My Map task takes a file and generates a few more, I then generate my results from these files. I would like to ...

8. How to keep the sequence file created by map in hadoop    stackoverflow.com

I am using Hadoop and working with a map task that creates files that I want to keep. Currently I am passing these files through the collector to the reduce task. ...

9. Using Mapreduce to map multiple unique values not always present on the same lines    stackoverflow.com

I have run into a complex problem with Mapreduce. I am trying to match up 2 unique values that are not always present together in the same line. Once ...

10. Hadoop last map job stuck - Need help    stackoverflow.com

I am doing some text processing using hadoop map-reduce jobs. My job is 99.2% complete and stuck on last map job. The last few lines of the map output show as ...

11. How can I use the map datatype in Apache Pig?    stackoverflow.com

I'd like to use Apache Pig to build a large key -> value mapping, look things up in the map, and iterate over the keys. However, there does not even ...

12. How to map a set of text as a whole to a node?    stackoverflow.com

Suppose I have a plain text file with the following data:

DataSetOne
content
content
content


DataSetTwo
content
content
content
content
...and so on... What ...

13. How we can do a map operation from a file and a cassandra at a time?    stackoverflow.com

I want to run a Hadoop job that maps inputs from a file and from Cassandra at the same time. Is it possible? I know the ways to get file inputs ...

14. Looking for a drop-in replacement for a java.util.Map    stackoverflow.com

Problem

Following up on this question, it seems that a file- or disk-based Map implementation may be the right solution to the problems I mentioned there. Short version:
  • Right now, I have ...

15. Load Multiple files in same map function in Hadoop    stackoverflow.com

I have two data sets: one is historical quote data and the other is historical trade data. The data is split on a per-symbol, per-day basis. My question is how to load two ...

16. Is it possible to run several map task in one JVM?    stackoverflow.com

I want to share large in-memory static data (a RAM Lucene index) among my map tasks in Hadoop. Is there a way for several map/reduce tasks to share the same JVM?

17. Hadoop Recursive Map    stackoverflow.com

I have a requirement that my mapper may in some cases produce a new key/value for another mapper to handle. Is there a sane way to do this? I've ...

18. Providing several non-textual files to a single map in Hadoop MapReduce    stackoverflow.com

I'm currently writing distributed application which parses Pdf files with the help of Hadoop MapReduce. Input to MapReduce job is thousands of Pdf files (which mostly range from 100KB to ~2MB), ...

19. Hadoop Streaming Multiple Files per Map Job    stackoverflow.com

I have a Hadoop streaming setup that works, however there is a bit of overhead when initializing the mappers which is done once per file, and since I am processing many ...

20. Progress rate during map phase (LATE scheduler) - Hadoop    stackoverflow.com

I am trying to find out the progress rate of the map tasks. If someone can help me out it will be great !! Thanks !!

21. Why do I get "security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000"?    stackoverflow.com

$ hdfs dfs -rmr crawl
    11/04/16 08:49:33 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
I'm using hadoop-0.21.0 with the default Single Node Setup configuration.

22. How to get Filename/File Contents as key/value input for MAP when running a Hadoop MapReduce Job?    stackoverflow.com

I am creating a program to analyze PDF, DOC and DOCX files. These files are stored in HDFS. When I start my MapReduce job, I want the map function to have the ...

23. Sorting key-value pairs after map function in mapreduce    stackoverflow.com

I have a file, which contains IP packet headers in text format. After the map function, each reduce method is called for a particular IP address. I want the values in a ...

24. Sorting by values after map function in mapreduce issue , please help    stackoverflow.com

I want to sort my values before passing them to the reduce function. I came to know that this can be achieved by setting the output key comparator class, as given below:

conf.setOutputKeyComparatorClass(SortReducerByValuesKeyComparator.class);
and my class is ...

25. Hadoop MapReduce with a recursive Map    stackoverflow.com

I need to write a MapReduce application in Java that is auto-recursive, meaning that for each line of the input file processed it must check all the lines of the ...

26. Does Hive have its own map reduce program?    stackoverflow.com

I want to implement a Hive + Hadoop MapReduce program in my application. I am still wondering, because I have tried many times to query and find information about MapReduce programs in Hive. My question is, is ...

27. Configure Map Side join for multiple mappers in Hadoop Map/Reduce    stackoverflow.com

I have a question about configuring a map-side inner join for multiple mappers in Hadoop. Suppose I have two very large data sets A and B, I use the same partition and ...

28. Joining hadoop-streaming map outputs and form a single file.    stackoverflow.com

I just want to ask if there is a way of using a reducer, or something like concatenation, to glue together the outputs from my mappers and output them as a single file ...

29. Execute program that creates mp4 local files through a hadoop datanode in the map function    stackoverflow.com

By using java Runtime.getRuntime().exec(command); I want to run a program on a hadoop datanode as part of the map function. This program will create mp4 files on the datanode's local filesystem. ...

30. hadoop : 1 map multiple reducers with each reducer having different functionality? possible?    stackoverflow.com

so here is an example: Is it possible to have same mapper run against multiple reducers at the same time? like

map output : {1:[1,2,3,4,5,4,3,2], 4:[5,4,6,7,8,9,5,3,3,2], 3:[1,5,4,3,5,6,7,8,9,1], so on} ...

31. Why is TeraSort map phase spending significant time in CRC32.update() function?    stackoverflow.com

I am trying to profile which functions consume the most time for a TeraSort Hadoop job. for my test system, I am using a basic 1-node pseudo-distributed setup. This means that ...

32. What's wrong with my Hive-UDF?How to set the map number of hive?    stackoverflow.com

I use Hadoop and Hive to analyse Apache logs and compute access statistics. I wrote a UDF named GetCity to convert the remote_ip to a city name, but when I run "select GetCity(remote_ip) from ...

33. Hadoop options are not having any effect (mapreduce.input.lineinputformat.linespermap, mapred.max.map.failures.percent)    stackoverflow.com

I am trying to implement a MapReduce job where each of the mappers would take 150 lines of the text file, and all the mappers would run simultaneously; also, it should ...

34. What is the purpose of the org.apache.hadoop.mapreduce.Mapper.run() function in Hadoop?    stackoverflow.com

What is the purpose of the org.apache.hadoop.mapreduce.Mapper.run() function in Hadoop? setup() is called before map() and cleanup() is called after map(). The documentation for the run() ...
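
For context on the call order the question describes, here is a minimal sketch in plain Java. `SketchMapper` and `RunOrderDemo` are hypothetical stand-ins written for this listing, not the Hadoop classes; they only mirror the general shape of `run()` as a template method that calls `setup()` once, `map()` once per input record, and `cleanup()` once at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for a mapper base class (assumption: NOT the Hadoop API).
abstract class SketchMapper<V> {
    protected void setup(List<String> out) {}               // one-time initialisation hook
    protected abstract void map(V value, List<String> out); // called once per record
    protected void cleanup(List<String> out) {}             // one-time teardown hook

    // Template method: drives the whole task in a fixed order.
    public void run(Iterator<V> input, List<String> out) {
        setup(out);
        while (input.hasNext()) {
            map(input.next(), out);
        }
        cleanup(out);
    }
}

class RunOrderDemo {
    // Records the order in which the hooks fire for the given input lines.
    static List<String> trace(List<String> lines) {
        List<String> out = new ArrayList<>();
        new SketchMapper<String>() {
            @Override protected void setup(List<String> o) { o.add("setup"); }
            @Override protected void map(String v, List<String> o) { o.add("map:" + v); }
            @Override protected void cleanup(List<String> o) { o.add("cleanup"); }
        }.run(lines.iterator(), out);
        return out;
    }

    public static void main(String[] args) {
        // prints [setup, map:a, map:b, cleanup]
        System.out.println(trace(Arrays.asList("a", "b")));
    }
}
```

Overriding `run()` in such a design is what lets a subclass change the loop itself, for example to process several records per call.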

35. How to set the number of map tasks in hadoop 0.20?    stackoverflow.com

I'm trying to set the number of map tasks to run in hadoop 0.20 environment. I am using the old api. Here are the options I've tried so far:

    conf.set("mapred.tasktracker.map.tasks.maximum", ...

36. Hadoop: what should be mapped and what should be reduced?    stackoverflow.com

This is my first time using map/reduce. I want to write a program that processes a large log file. For example, if I was processing a log file that had records ...

37. Hadoop - increasing map tasks in xml doesn't increase map tasks when run    stackoverflow.com

I added the following in my conf/mapred-site.xml

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
But when I run the job, it still runs 2 maps (which is the default)? ...

38. Hadoop : Multiple Emits from one Map function    stackoverflow.com

I am writing a small Hadoop program in Java. My requirement is to do two emits from a single map method and handle both emits in a single reduce method. ...

39. Hadoop: How to save Map object in configuration    stackoverflow.com

Any idea how I can set a Map object into org.apache.hadoop.conf.Configuration?
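
One possible approach, offered as an assumption rather than as the answer from the thread: since a configuration stores string values, the map can be flattened into a single string in the driver and rebuilt in the task. The sketch below is plain Java; `MapConfCodec` and the property name `my.map` are hypothetical, and keys and values are assumed to be non-null strings.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: flattens a Map<String, String> into one string and back.
class MapConfCodec {
    // Encodes entries as key=value pairs joined by '&'; both parts are URL-encoded
    // so that '&' and '=' inside keys or values cannot break the format.
    static String encode(Map<String, String> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    // Reverses encode(); a null or empty string yields an empty map.
    static Map<String, String> decode(String s) {
        Map<String, String> map = new LinkedHashMap<>();
        if (s == null || s.isEmpty()) return map;
        for (String pair : s.split("&")) {
            int eq = pair.indexOf('=');
            map.put(dec(pair.substring(0, eq)), dec(pair.substring(eq + 1)));
        }
        return map;
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex); // UTF-8 is always available
        }
    }

    private static String dec(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Intended use (not executed here, since it needs Hadoop on the classpath):
    //   driver: conf.set("my.map", MapConfCodec.encode(map));
    //   task:   Map<String, String> m = MapConfCodec.decode(conf.get("my.map"));
}
```

This suits small lookup maps only; for large data, shipping a file to the tasks is likely a better fit than the job configuration.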

40. Hadoop - Creating a single instance of a class for each map() functions inside the Mapper for a particular node    stackoverflow.com

I have a class something like this in Java for Hadoop MapReduce:

public class MyClass {
    public static class MyClassMapper extends Mapper {
        ...

41. how to insert overwrite a table with a column as map in HIVE    stackoverflow.com

I created 2 tables with the same format, CREATE TABLE info(mymap MAP) and CREATE TABLE info_1(mymap MAP). I managed to load some data into info, and now I want to make info_1 a dup ...

42. MapReduce Map Tasks Share Input Data    stackoverflow.com

I've recently started looking into the MapReduce/Hadoop framework and am wondering if my problem truly lends itself to the framework. Consider an example where I have a large set ...

43. Difference and relationship between slots, map tasks, data splits, Mapper    stackoverflow.com

I have gone through a few Hadoop books and papers. A slot is a map/reduce computation unit at a node; it may be a map slot or a reduce slot. As far as I know, a split ...

44. How to read hadoop sequential file?    stackoverflow.com

I have a sequence file which is the output of a Hadoop map-reduce job. In this file the data is written in key-value pairs, and the value itself is a map. I want to read ...

45. Hadoop - One Map and many Reduces    coderanch.com

Hi Chuck Lam, Suppose I have some data and I want to process it iteratively, grouping by a different key each time. I think this could be done by running several Hadoop tasks, but each would have an initial load, that is, the initial I/O and the mapping process. My idea was to map once and then do several reduces. Those reduces would emit ...