Refactor Binary File Operations #294

Open
@mali-git

Feature request

Implement a centralized, reusable utility module for binary file operations that can be shared across modules. This utility should:

  1. Standardize reading and writing headers, binary data, and indices.
  2. Support modular integration with existing components like EmbeddedStreamData and others.
  3. Reduce code duplication while improving readability and maintainability.
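To make the request more concrete, here is a minimal sketch of what such a shared module could look like. Everything in it is an assumption for illustration, not the existing implementation: the names `BinaryHeader`, `write_header`, `read_header`, `write_binary_file`, and `read_binary_file` are hypothetical, and so is the file layout (a fixed 12-byte header holding the data length and token size, followed by the raw data and a pickled index of `(offset, length)` pairs).

```python
import pickle
import struct
from dataclasses import dataclass
from pathlib import Path
from typing import BinaryIO

# Hypothetical layout, assumed for this sketch only:
# [8-byte data length][4-byte token size][raw data][pickled index]
HEADER_FORMAT = "<QI"  # little-endian: unsigned 64-bit, unsigned 32-bit
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 12 bytes


@dataclass(frozen=True)
class BinaryHeader:
    data_length: int  # size of the data section in bytes
    token_size: int   # bytes per token


def write_header(f: BinaryIO, header: BinaryHeader) -> None:
    """Write the fixed-size header at the current file position."""
    f.write(struct.pack(HEADER_FORMAT, header.data_length, header.token_size))


def read_header(f: BinaryIO) -> BinaryHeader:
    """Read and validate the fixed-size header at the current file position."""
    raw = f.read(HEADER_SIZE)
    if len(raw) != HEADER_SIZE:
        raise ValueError("file is too short to contain a header")
    return BinaryHeader(*struct.unpack(HEADER_FORMAT, raw))


def write_binary_file(
    path: Path, data: bytes, index: list[tuple[int, int]], token_size: int
) -> None:
    """Write header, data section, and pickled index in one place."""
    with open(path, "wb") as f:
        write_header(f, BinaryHeader(len(data), token_size))
        f.write(data)
        f.write(pickle.dumps(index))


def read_binary_file(path: Path) -> tuple[BinaryHeader, bytes, list[tuple[int, int]]]:
    """Read header, data section, and index back from a file."""
    with open(path, "rb") as f:
        header = read_header(f)
        data = f.read(header.data_length)
        if len(data) != header.data_length:
            raise ValueError("data section is truncated")
        index = pickle.loads(f.read())  # the index fills the rest of the file
    return header, data, index
```

With helpers like these, each of the components listed below would call the same header and index routines instead of carrying its own `struct`/`pickle` logic, so a format change would only need to be made once.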

Motivation

Code for reading and writing binary files is currently duplicated across multiple modules and functions, including:

  • EmbeddedStreamData
  • PackedDataGenerator
  • LargeFileLinesReader
  • shuffle_tokenized_data()

This redundancy increases maintenance overhead and the risk of inconsistencies. For example, reading headers, writing index data, and handling binary streams are each implemented in slightly different forms, which can lead to bugs and inefficiencies.

Labels: enhancement