Skip to content

Latest commit

 

History

History
120 lines (82 loc) · 3.8 KB

README.rst

File metadata and controls

120 lines (82 loc) · 3.8 KB

What is all this for?

This is a small library (and some test programs) whose purpose help use Hadoop's HDFS C API with C++ programs.

Most classes follow a RAII idiom and, when errors are detected, exceptions are thrown.

The code is hosted on http://github.com/tmacam/libhdfscpp and is distributed under the Apache 2.0 license.

Dependencies

  • zlib (tested with ubuntu version 1:1.2.3.3.dfsg-7ubuntu1)

    For Debian and Ubuntu installing the package zlib1g-dev should do.

  • Hadoop HDFS (version == 0.18.3)

    AFAIK, we are only compatible with this particular version of Hadoop --- at least with this version libhdfs C API.

  • libhdfs

    Although this is part of the Hadoop distribution and despite the fact that pre-built binaries for this lib are shipped with Hadoop distribution, it is better to rebuild this library from source.

    Here is how:

    • libhdfs source code is inside src/c++/libhdfs/, in Hadoop's distribution.

    • Inside the extras directory in libhdfscpp distribution there are two patches. Copy Makefile.diff to libhdfs source directory.

    • Go to libhdfs directory and apply Makefile's patch:

      cd $HADOOP_HOME/src/c++/libhdfs
      cp Makefile Makefile.orig
      patch Makefile < Makefile.diff
      
    • Edit Makefile you just copied. Modify the parts between # BEGIN tmacam and # END tmacam to make sense for your installation settings. In my last install, I was compiling it for a i386 installation, so, this is what I got in that section of the file:

      # BEGIN tmacam
      # for i386 use OS_ARCH=i386 and N_BITS=32
      OS_ARCH=i386
      N_BITS=32
      # for amd64 us OS_ARCH=amd64 and N_BITS=64
      #OS_ARCH=amd64
      #N_BITS=64
      PLATFORM=linux
      JAVA_HOME=/usr/lib/jvm/java-6-sun
      LIBHDFS_DEST_DIR=/usr/local/libhdfs-0.18.3/
      LIBHDFS_BUILD_DIR=$(LIBHDFS_DEST_DIR)/lib
      LIBHDFS_INCLUDES_DIR=$(LIBHDFS_DEST_DIR)/lib
      #
      # execute 'make help' to see build instructions
      #
      # END tmacam
      
    • To build it, first run make help. It will produce an output similar to this:

      LD_LIBRARY_PATH='/usr/lib/jvm/java-6-sun/jre/lib/i386/server/' make
      LD_LIBRARY_PATH='/usr/lib/jvm/java-6-sun/jre/lib/i386/server/' make install
      

      All you have to do is run these commands.

    • Notice: this Makefile expects a JDK from Sun.

Building

Edit libhdfscpp's Makefile, applying changes similar to those made while building libhdfs. In sum, you have to configure OS_ARCH, PLATFORM, JAVA_HOME, N_BITS and LIBHDFS_HOME. Those settings are right at the top of the Makefile.

How to run those programs

There are lots of library dependencies, libraries not in the default search path, classes not in Java's default CLASSPATH --- it is a royal PITA to run programs compiled with libhdfs and, for that matter, with libhdfscpp. Thus, we recommend running programs that use libhdfscpp as follow:

LD_LIBRARY_PATH=$(cat LDLIBPATH.txt) CLASSPATH=$(cat CLASSPATH.txt) ./hdfsread /user/hadoop/friends-txt-DELME2/part-00001

Notice that both LDLIBPATH.txt and CLASSPATH.txt are produced by libhdfscpp's Makefile.

About pkg-config support

We produce a simple but functional .pc use with pkg-config.

libHDFS references

# vim:syn=rst et ai sts=4 ts=4 sw=4 tw=72: