docs: Add WebHDFS version compatibility details (#4024)
add webhdfs version compatibility
shbhmrzd authored Jan 19, 2024
1 parent 1795da7 commit 60dcec9
Showing 1 changed file with 23 additions and 1 deletion.
24 changes: 23 additions & 1 deletion core/src/services/webhdfs/docs.md
@@ -23,12 +23,34 @@ This service can be used to:

[Hdfs][crate::services::Hdfs] is powered by HDFS's native Java client, so users need to set up the HDFS services correctly. WebHDFS, by contrast, is accessed through an HTTP API and needs no extra setup.

## WebHDFS Compatibility Guidelines

### File Creation and Write

For [file creation and write](https://hadoop.apache.org/docs/r3.1.3/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Create_and_Write_to_a_File) operations,
OpenDAL WebHDFS targets Hadoop Distributed File System (HDFS) versions 2.9 and later.
Creating a file takes two API calls in WebHDFS: the initial `PUT` request goes to the namenode, which by default redirects the client to the datanode that handles the file data.
Setting the optional `noredirect` flag suppresses the redirect. The namenode then returns the datanode URL in the response body, and that URL is used for the second `PUT` request, which carries the actual file data.
OpenDAL automatically sets the `noredirect` flag on the first `PUT` request. The flag is supported starting from HDFS version 2.9.
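
The two-step flow described above can be sketched with the standard library alone. This is an illustration, not OpenDAL's implementation: `create_url` and `datanode_location` are hypothetical helpers, the `overwrite=true` parameter is an assumption, and the JSON handling is a naive string scan that a real client would replace with a JSON parser.

```rust
/// Build the first-step CREATE URL that is sent to the namenode.
/// `noredirect=true` asks the namenode to return the datanode URL in the
/// response body instead of answering with an HTTP redirect.
fn create_url(endpoint: &str, path: &str) -> String {
    format!(
        "{}/webhdfs/v1/{}?op=CREATE&overwrite=true&noredirect=true",
        endpoint.trim_end_matches('/'),
        path.trim_start_matches('/')
    )
}

/// Naively extract the `Location` field from the namenode's JSON body,
/// e.g. `{"Location":"http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE"}`.
/// Returns `None` when the field is missing or malformed.
fn datanode_location(body: &str) -> Option<String> {
    let key = "\"Location\"";
    let after_key = &body[body.find(key)? + key.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let value = after_colon.strip_prefix('"')?;
    let end = value.find('"')?;
    // JSON encoders may escape `/` as `\/`; undo that for the URL.
    Some(value[..end].replace("\\/", "/"))
}
```

The second `PUT` request, carrying the file data, would then be sent to the URL returned by `datanode_location`.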

### Multi-Write Support

OpenDAL WebHDFS supports multi-write operations by creating temporary files in the configured `atomic_write_dir`.
These temporary files are concatenated into the final file when the writer is closed.
Earlier HDFS versions, however, place restrictions on concat:
the target file must not be empty, and its last block must be full. Because of these constraints, the concat operation might fail on HDFS 2.6.
This issue, tracked as [HDFS-6641](https://issues.apache.org/jira/browse/HDFS-6641), has been addressed in later versions of HDFS.
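
As a rough sketch of the final concatenation step, the helper below builds a WebHDFS `CONCAT` request URL, and a second helper encodes the pre-HDFS-6641 restriction exactly as stated above. Both function names, the temporary file paths in the usage note, and the block-size check are illustrative assumptions rather than OpenDAL's actual code.

```rust
/// Build the URL for a WebHDFS `CONCAT` call (sent as an HTTP POST).
/// `sources` are the paths of the temporary files to append to `target`.
fn concat_url(endpoint: &str, target: &str, sources: &[&str]) -> String {
    format!(
        "{}/webhdfs/v1/{}?op=CONCAT&sources={}",
        endpoint.trim_end_matches('/'),
        target.trim_start_matches('/'),
        sources.join(",")
    )
}

/// Encodes the restriction described above for HDFS versions without the
/// HDFS-6641 fix: the target must be non-empty and its last block must be
/// full, i.e. its length is a positive multiple of the block size.
fn concat_target_ok_before_fix(target_len: u64, block_size: u64) -> bool {
    block_size > 0 && target_len > 0 && target_len % block_size == 0
}
```

For example, `concat_url("http://127.0.0.1:9870", "/data/out.bin", &["/tmp/w/part-0", "/tmp/w/part-1"])` produces a URL whose `sources` parameter lists both temporary files, separated by a comma.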

In summary, OpenDAL WebHDFS is designed to work best with HDFS versions 2.9 and later.



## Configurations

- `root`: The root path of the WebHDFS service.
- `endpoint`: The endpoint of the WebHDFS service.
- `delegation`: The delegation token for WebHDFS.
- `atomic_write_dir`: The temporary write directory used for multi-write in WebHDFS. It must be configured for multi-write support.

Refer to [`Builder`]'s public API docs for more information.
