HDFS « hadoop « Java Database Q&A

1. Hadoop: map/reduce from HDFS    stackoverflow.com

I may be wrong, but all(?) examples I've seen with Apache Hadoop take as input a file stored on the local file system (e.g. org.apache.hadoop.examples.Grep). Is there a way to load and ...

2. CloudStore vs. HDFS    stackoverflow.com

Does anyone have any familiarity with working with both CloudStore and HDFS? I am interested to see how far CloudStore has been scaled and how heavily it has been ...

3. Writing data to Hadoop    stackoverflow.com

I need to write data into Hadoop (HDFS) from external sources like a Windows box. Right now I have been copying the data onto the namenode and using HDFS's put ...

4. Where does HDFS store files locally by default?    stackoverflow.com

I am running Hadoop with the default configuration on a one-node cluster, and would like to find where HDFS stores files locally. Any ideas? Thanks.

5. Is it possible to use Avro with Hadoop 0.20?    stackoverflow.com

I'm interested in using Avro to save and read files from Hadoop HDFS, and I saw some Jira issues in the Hadoop issue tracker regarding implementing support for Avro, but there were no ...

6. Is it possible to run Hadoop in Pseudo-Distributed operation without HDFS?    stackoverflow.com

I'm exploring the options for running a hadoop application on a local system. As with many applications the first few releases should be able to run on a single node, as long ...

7. Which is the easiest way to combine small HDFS blocks?    stackoverflow.com

I'm collecting logs with Flume to the HDFS. For the test case I have small files (~300kB) because the log collecting process was scaled for the real usage. Is there any easy ...

8. Difference between 'distcp' and 'distcp -update'?    stackoverflow.com

What is the difference between

hadoop distcp
and
hadoop distcp -update
Both of them would do the same work, with only a slight difference in how we call them. Neither of them overwrites an already ...

9. hadoop copy directory    stackoverflow.com

Is there an HDFS API that can copy an entire local directory to HDFS? I found an API for copying files, but is there one for directories?

10. File blocks on HDFS    stackoverflow.com

Does Hadoop guarantee that different blocks from the same file will be stored on different machines in the cluster? Obviously replicated blocks will be on different machines.

11. Managing HDFS in pseudo-distributed Hadoop mode    stackoverflow.com

I want to do some computation with hadoop and mahout on my quad core machine, so I am using hadoop in pseudo-distributed mode. The problem is that the space ...

12. Hadoop, hardware and bioinformatics    stackoverflow.com

We're about to buy new hardware to run our analyses and are wondering if we're making the right decisions. The setting:
We're a bioinformatics lab that will be handling DNA sequencing data. The ...

13. How to read a file from HDFS in a non-Java client    stackoverflow.com

So my MR Job generates a report file, and that file needs to be able to be downloaded by an end-user who needs to click a button on a normal web ...

14. How can I be sure that data is distributed evenly across the Hadoop nodes?    stackoverflow.com

If I copy data from the local system to HDFS, can I be sure that it is distributed evenly across the nodes? P.S. HDFS guarantees that each block will be stored at 3 ...

15. How to store the actual name of a /*url*?    stackoverflow.com

I'm converting a script to HDFS (Hadoop) and I have this cmd:

    tail -n+$indexedPlus1 $seedsDir/*url* | head -n$it_size > $it_seedsDir/urls
With HDFS I need to get the file using ...
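
The pipeline in this question selects a window of lines: `tail -n+K` starts output at line K (1-based) and `head -nM` keeps the first M of what follows. A minimal sketch of the same windowing in Java, assuming the glob matches a single file whose contents are already available as a `Reader`. The class and method names are hypothetical, and how the stream is obtained from HDFS is deliberately left out:

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hadoop: mirrors `tail -n+K | head -nM`.
final class LineWindow {
    // Returns up to `count` lines, starting at 1-based line `startLine`.
    static List<String> window(Reader source, long startLine, long count) {
        BufferedReader reader = new BufferedReader(source);
        return reader.lines()
                .skip(Math.max(0, startLine - 1)) // tail -n+K skips the first K-1 lines
                .limit(count)                     // head -nM keeps at most M lines
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // In-memory stand-in for the seeds file.
        Reader demo = new StringReader("u1\nu2\nu3\nu4\nu5\n");
        System.out.println(window(demo, 3, 2)); // prints [u3, u4]
    }
}
```

Here `startLine` plays the role of `$indexedPlus1` and `count` the role of `$it_size`.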

16. hadoop NullPointerException    stackoverflow.com

I was trying to set up a multi-node Hadoop cluster Michael Noll's way using two computers. When I tried to format HDFS, it showed a NullPointerException.

hadoop@psycho-O:~/project/hadoop-0.20.2$ bin/start-dfs.sh
starting namenode, ...

17. Hadoop HDFS maximum file size    stackoverflow.com

A colleague of mine thinks that HDFS has no maximum file size, i.e., by partitioning into 128 / 256 meg chunks any file size can be stored (obviously the HDFS disk ...
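
The chunking described here is plain arithmetic: a file occupies the ceiling of its size divided by the block size, and the last block may be partial. A minimal sketch, assuming the 128 MB figure from the question as the block size. The class and method names are illustrative, and the sketch says nothing about whether HDFS itself imposes a limit:

```java
// Illustrative arithmetic only; not a Hadoop API.
final class BlockMath {
    static final long MB = 1024L * 1024L;

    // Number of blocks a file of the given size is split into (ceiling division).
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes < 0 || blockSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative, block size positive");
        }
        long full = fileSizeBytes / blockSizeBytes;
        return fileSizeBytes % blockSizeBytes == 0 ? full : full + 1;
    }

    // Size in bytes of the final (possibly partial) block; 0 for an empty file.
    static long lastBlockSize(long fileSizeBytes, long blockSizeBytes) {
        if (fileSizeBytes == 0) {
            return 0;
        }
        long remainder = fileSizeBytes % blockSizeBytes;
        return remainder == 0 ? blockSizeBytes : remainder;
    }

    public static void main(String[] args) {
        long size = 1024L * MB + 1; // 1 GB plus one byte
        System.out.println(blockCount(size, 128L * MB));    // prints 9
        System.out.println(lastBlockSize(size, 128L * MB)); // prints 1
    }
}
```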

18. Moving files in Hadoop using the Java API?    stackoverflow.com

I want to move files around in HDFS using the Java APIs. I cannot figure out a way to do this. The FileSystem class only seems to want to ...

19. How to keep a flat file on HDFS in sync with a large database table?    stackoverflow.com

What's the best way of keeping a flat file on HDFS in sync with a large database table which may have row updates? Tools such as sqoop seem like they'd be useful ...

20. HDFS: Using HDFS API to append to a SequenceFile    stackoverflow.com

I've been trying to create and maintain a Sequence File on HDFS using the Java API without running a MapReduce job as a setup for a future MapReduce job. I ...

21. Programmatically reading the output of Hadoop Mapreduce Program    stackoverflow.com

This may be a basic question, but I could not find an answer for it on Google.
I have a map-reduce job that creates multiple output files in its output directory. My Java ...

22. Hadoop/Pig regular expression matching    stackoverflow.com

This is kind of an odd situation, but I'm looking for a way to filter using something like MATCHES but on a list of unknown patterns (of unknown length). That is, if ...

23. MapReduce shuffle/sort method    stackoverflow.com

Somewhat of an odd question, but does anyone know what kind of sort MapReduce uses in the sort portion of shuffle/sort? I would think merge or insertion (in keeping with ...

24. Exception while executing hadoop job remotely    stackoverflow.com

I am trying to execute a Hadoop job on a remote hadoop cluster. Below is my code.

Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://server:9000/");
conf.set("hadoop.job.ugi", "username");

Job job = new Job(conf, "Percentil Ranking");
job.setJarByClass(PercentileDriver.class);
job.setMapperClass(PercentileMapper.class);
job.setReducerClass(PercentileReducer.class);
job.setMapOutputKeyClass(TestKey.class);
job.setMapOutputValueClass(TestData.class);
job.setOutputKeyClass(TestKey.class);
job.setOutputValueClass(BaselineData.class);

job.setOutputFormatClass(SequenceFileOutputFormat.class);

FileInputFormat.addInputPath(job, new Path(inputPath));

FileOutputFormat.setOutputPath(job, ...

25. How to adapt bin/hdfs for executing from outside $HADOOP_HOME/bin?    stackoverflow.com

I'm trying to modify the hdfs script so that it still functions although not located in $HADOOP_HOME/bin anymore, but when I execute the modified hdfs I get:

hdfs: line 110: ...

26. HadoopFS (HDFS) as distributed file storage    stackoverflow.com

I'm considering using HDFS as a horizontally scaling file storage system for our client video hosting service. My main concern is that HDFS wasn't developed for these needs; it is more "an ...

27. Hadoop fully distributed mode    stackoverflow.com

I am a newbie to Hadoop. I have managed to develop a simple Map/Reduce application that works fine in 'pseudo distributed mode'. I want to test that in 'fully distributed mode'. I ...

28. Hadoop JUnit testing writing/reading to/from the hdfs    stackoverflow.com

I have written classes that write to and read from HDFS. Given certain conditions that occur when these classes are instantiated, they create a specific path and file, and ...

29. Export data from database and write to HDFS(hadoop fs)    stackoverflow.com

Now I am trying to export data from a DB table and write it into HDFS. The problem is: will the name node become a bottleneck? And what is the mechanism, will ...

30. Looking for overall review on Hadoop    stackoverflow.com

I am looking for some performance review on Hadoop (300-600 boxes cluster, commodity hardware), especially on the following aspects:

  1. High concurrent read & write
  2. Web crawling
  3. Mapreduce, parallel computing
  4. Inverted index

31. What is the maximum number of files allowed in a HDFS directory?    stackoverflow.com

What is the maximum number of files and directories allowed in a HDFS (hadoop) directory?

32. Is it possible to append to HDFS file from multiple clients in parallel?    stackoverflow.com

Basically the whole question is in the title. I'm wondering if it's possible to append to a file located on HDFS from multiple computers simultaneously? Something like storing a stream of events constantly produced ...

33. Uploading large gzipped data files to HDFS    stackoverflow.com

I have a use case where I want to upload big gzipped text data files (~ 60 GB) on HDFS. My code below is taking about 2 hours to upload these files ...

34. Why can't hadoop split up a large text file and then compress the splits using gzip?    stackoverflow.com

I've recently been looking into hadoop and HDFS. When you load a file into HDFS, it will normally split the file into 64MB chunks and distribute these chunks around your cluster. ...
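
One property that bears on this question is that a plain gzip stream is meant to be decoded from its first byte: a reader handed only a later chunk of the compressed bytes has no header to start from. A minimal sketch using only `java.util.zip`, with no Hadoop involved; the class and method names are illustrative, and the in-memory data stands in for a large file:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative demo, not Hadoop code.
final class GzipSplitDemo {
    // Compresses `lineCount` text lines into one in-memory gzip stream.
    static byte[] gzipLines(int lineCount) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            for (int i = 0; i < lineCount; i++) {
                gz.write(("log line " + i + "\n").getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // True if the bytes from `offset` to the end decode cleanly as gzip.
    static boolean decodesFrom(byte[] compressed, int offset) {
        try (InputStream in = new GZIPInputStream(
                new ByteArrayInputStream(compressed, offset, compressed.length - offset))) {
            byte[] scratch = new byte[8192];
            while (in.read(scratch) != -1) {
                // Drain the stream; only success or failure matters here.
            }
            return true;
        } catch (IOException e) {
            return false; // bad header, corrupt data, or truncated stream
        }
    }

    public static void main(String[] args) {
        byte[] data = gzipLines(10_000);
        System.out.println("from start:  " + decodesFrom(data, 0));
        System.out.println("from middle: " + decodesFrom(data, data.length / 2));
    }
}
```

Decoding from offset 0 succeeds, while starting in the middle is expected to fail, which is why a chunk of a gzip file cannot simply be handed to an independent reader.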

35. Indexing a HDFS sequence file    stackoverflow.com

What is the best library/way of indexing a very large sequence file (millions of key/value pairs where each value can be of a different length so you cannot have a random ...

36. Trying to use Fuse to mount HDFS. Can't compile libhdfs    stackoverflow.com

I'm attempting to compile libhdfs (a native shared library that allows external apps to interface with hdfs). It's one of the few steps I have to take to mount Hadoop's hdfs ...

37. Programmatic equivalent of 'hadoop fs -tail -f'    stackoverflow.com

I want to tail an hdfs file programmatically using the org.apache.hadoop.fs.FileSystem API. Is there a way to tail the file using the API in a way which is equivalent to hadoop fs ...

38. Parallel Copy to HDFS    stackoverflow.com

What is the best and fastest way to achieve parallel copy to Hadoop from an NFS mount? We have a mount with a huge number of files and we need to copy it ...

39. Using HierarchicalINIConfiguration class on HDFS    stackoverflow.com

I need to parse an ini file (a configuration file with sections) located on HDFS.
I am thinking of using the HierarchicalINIConfiguration class from org.apache.commons.configuration. It has the following constructors:

 HierarchicalINIConfiguration(File file) ...

40. Hadoop: compress file in HDFS?    stackoverflow.com

I recently set up LZO compression in Hadoop. What is the easiest way to compress a file in HDFS? I want to compress a file and then delete the ...

41. HDFS path changing when trying to update files in HDFS    stackoverflow.com

I am new to Hadoop and HDFS, so maybe it is something I am doing wrong when I copy from local (Ubuntu 10.04) to HDFS on a single node on localhost. ...

42. setCompressOutput in Hadoop    stackoverflow.com

When should I use, and when should I not use, FileOutputFormat.setCompressOutput(conf, true);? I heard that it compresses mapper output. Is there any possibility to compress reducer-side output? (If my assumption is wrong, please correct me, how ...

43. Running Hadoop MapReduce, is it possible to call external executables outside of HDFS    stackoverflow.com

Within my mapper I'd like to call external software installed on the worker node outside of the HDFS. Is this possible? What is the best way to do this? I ...

44. Does HDFS encrypt or compress the data while storing?    stackoverflow.com

When I put a file into HDFS, for example

$ ./bin/hadoop dfs -put /source/file input
  • Is the file compressed while storing?
  • Is the file encrypted while storing? Is there a config setting that we can ...

45. How to check whether a file exists or not using hdfs shell commands    stackoverflow.com

I am new to Hadoop and need a little help. Suppose I run a job in the background using a shell script; how do I know whether the job has completed or not? ...

46. LeaseExpiredException: No lease error on HDFS    stackoverflow.com

I am trying to load a large amount of data into HDFS and I sometimes get the error below. Any idea why? The error:

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /data/work/20110926-134514/_temporary/_attempt_201109110407_0167_r_000026_0/hbase/site=3815120/day=20110925/107-107-3815120-20110926-134514-r-00026 File does not exist. Holder DFSClient_attempt_201109110407_0167_r_000026_0 ...

47. HDFS replication factor    stackoverflow.com

When I'm uploading a file to HDFS, if I set the replication factor to 1, will the file's splits reside on a single machine, or will the splits be distributed ...

48. Getting data in and out of Elastic MapReduce HDFS    stackoverflow.com

I've written a Hadoop program which requires a certain layout within HDFS, and afterwards I need to get the files out of HDFS. It works on my single-node ...

49. hadoop api configuration on the client machine    stackoverflow.com

ultra-noob. I have a server machine with cdh3u1 pseudo-distrib, and a client machine with a java application using the cdh3u1 API. How do I configure the client to talk to the ...

50. Difference between hadoop fs -put and hadoop fs -copyFromLocal    stackoverflow.com

-put and -copyFromLocal are documented as identical, while most examples use the verbose variant -copyFromLocal. Why? Same thing for -get and -copyToLocal

51. How can I access hadoop via the hdfs protocol from java?    stackoverflow.com

I found a way to connect to Hadoop via hftp, and it works fine (read only):

    uri = "hftp://172.16.xxx.xxx:50070/";

    System.out.println( "uri: " + ...

52. How do we compare a local file and an HDFS file for consistency?    stackoverflow.com

    public String getDirs() throws IOException{
        fs=FileSystem.get(conf);
        fs.copyFromLocalFile(new Path("/private/tmp/as"), new Path("/test"));
    ...

53. Writing to HDFS : File is overwritten    stackoverflow.com

I am writing to the Hadoop file system. But every time I append something, it overwrites the data instead of adding it to the existing data/file. The code which is doing this is ...

54. Under-replicated blocks count is inaccurate, but why?    stackoverflow.com

I am getting wildly varying reports of under-replicated blocks. I am wondering what's causing this. hadoop dfsadmin -metasave reports ~232,000 MISSING blocks awaiting replication. How do I fix this? Jobs run ...

55. Hadoop: Compressing output of Map-only job    stackoverflow.com

I have a map-only job that outputs in TextOutputFormat. I currently see three ways of compressing my output: 1) by defining map to compress through mapred.compress.map.output.* 2) by defining output to compress through ...

56. Using FileInputFormat.addInputPaths to recursively add HDFS path    stackoverflow.com

I've got a HDFS structure something like

    a/b/file1.gz
    a/b/file2.gz
    a/c/file3.gz
    a/c/file4.gz
I'm using the classic pattern of
    FileInputFormat.addInputPaths(conf, args[0]);
to set my input path for a java map reduce job. This works fine if I specify args[0] as ...
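
`FileInputFormat.addInputPaths` takes its paths as a single comma-separated string. The rest of this question is cut off, so the following is only one possible approach and not necessarily what the asker ended up with. It is a minimal sketch that assumes a list of file paths like the ones above is already at hand, collects their distinct parent directories, and joins them with commas. The class and method names are hypothetical, and obtaining the listing from HDFS is not shown:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hadoop.
final class InputPathJoiner {
    // Joins the distinct parent directories of the given '/'-separated
    // file paths with commas, preserving first-seen order.
    static String parentDirsCsv(List<String> filePaths) {
        Set<String> dirs = new LinkedHashSet<>();
        for (String path : filePaths) {
            int slash = path.lastIndexOf('/');
            if (slash < 0) {
                dirs.add(".");                      // bare file name: current directory
            } else if (slash == 0) {
                dirs.add("/");                      // file directly under the root
            } else {
                dirs.add(path.substring(0, slash)); // everything before the last '/'
            }
        }
        return String.join(",", dirs);
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "a/b/file1.gz", "a/b/file2.gz", "a/c/file3.gz", "a/c/file4.gz");
        System.out.println(parentDirsCsv(files)); // prints a/b,a/c
    }
}
```

The resulting string could then be passed where `args[0]` is used in the question's call.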

57. How to read a file from HDFS through a browser    stackoverflow.com

How can I provide a link to an HDFS file, so that clicking on that URL will download the HDFS file? Please provide your input. Thanks, MRK

58. Need to get rid of part-m-0000* files in HDFS    stackoverflow.com

In HDFS processing, after each job, empty files are created with names like part-m-0000*. Each of these files is empty but they are consuming 64MB of disk space because that is ...

59. Hadoop: Performance degradation when increasing block sizes?    stackoverflow.com

Has anyone seen any performance degradation when increasing the block size in Hadoop? We're setting up a cluster and we're expecting a large amount of data (100s of GBs) coming in ...

60. Compression in Hadoop Sequence File    stackoverflow.com

I have some basic questions about the Hadoop sequence file. 1) To what extent does the default compression codec compress the file? 2) I have a Hadoop sequence file of 100 MB; when I read ...

61. Hadoop libhdfs test running issue - Operation not permitted    stackoverflow.com

I'm using Hadoop 0.20.3. When running the hdfs_test of libhdfs library, I'm getting the following errors: 1.

 Exception in thread "main" org.apache.hadoop.util.Shell$ExitCodeException: chgrp: changing group of `/tmp/testfile.txt': Operation not permitted

at org.apache.hadoop.util.Shell.runCommand(Shell.java:195)
at org.apache.hadoop.util.Shell.run(Shell.java:134)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:286)
at ...