royopa/python-cobol

Python COBOL Copybook Parser

PyPI version | Python 3.8+ | License: GPL-3.0 | Code style: black | Imports: isort

A Python library for parsing and processing COBOL copybook files. It supports REDEFINES, INDEXED BY, and OCCURS clauses, reports parsing problems with informative error messages, and ships with a test suite.

✨ Features

  • Complete COBOL Support: Parse REDEFINES, INDEXED BY, and OCCURS statements
  • Modern Python: Type hints, dataclasses, and modern Python patterns
  • Comprehensive Testing: Extensive test suite with high coverage
  • CLI Interface: Command-line tool for processing COBOL files
  • Library API: Easy-to-use Python API for integration
  • Database Ready: Generate database-safe field names
  • Logging: Built-in logging for debugging and monitoring
  • Error Handling: Robust error handling with informative messages

🚀 Quick Start

Installation

# Install from PyPI
pip install python-cobol

# Install for development
git clone https://github.com/rodrigo/python-cobol.git
cd python-cobol
pip install -e ".[dev]"

Basic Usage

Command Line Interface

# Process a COBOL file with all features enabled
python-cobol example.cbl

# Skip denormalization
python-cobol example.cbl --skip-denormalize

# Enable verbose logging
python-cobol example.cbl --verbose

# See all options
python-cobol --help

Python API

from python_cobol import process_cobol

# Read and process a COBOL file
with open("example.cbl", "r") as f:
    fields = process_cobol(f.readlines())

# Access field information
for field in fields:
    print(f"Field: {field['name']}, Level: {field['level']}")
    if field['pic']:
        print(f"  PIC: {field['pic']}")
        print(f"  Type: {field['pic_info']['type']}")
        print(f"  Length: {field['pic_info']['length']}")

📖 Documentation

Supported COBOL Features

PIC Clauses

  • Character fields: PIC X(10)
  • Numeric fields: PIC 9(5)
  • Signed fields: PIC S9(5)
  • Decimal fields: PIC 9(5)V99
  • Signed decimal: PIC S9(5)V99
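
To make the mapping from PIC strings to types concrete, here is a minimal sketch of one way such a classifier could work. It is not the library's implementation: `PicSketch` and `sketch_parse_pic` are hypothetical names, and the type labels and length convention (the sign and the implied decimal point take no position) are assumptions taken from the `parse_pic_string` example in this README.

```python
import re
from typing import NamedTuple


class PicSketch(NamedTuple):
    """Hypothetical stand-in for the PicInfo model described in this README."""
    type: str
    length: int
    precision: int


# Matches a symbol followed by a repeat count, such as "9(5)" or "X(10)".
_REPEAT = re.compile(r"(.)\((\d+)\)")


def sketch_parse_pic(pic: str) -> PicSketch:
    """Classify a PIC string; illustrative only, not the library's code."""
    # Expand repeat counts: "9(3)" -> "999", "X(2)" -> "XX".
    expanded = _REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), pic.upper())
    signed = expanded.startswith("S")
    body = expanded.lstrip("S")
    if "X" in body or "A" in body:
        return PicSketch("Char", len(body), 0)
    integer_part, _, decimal_part = body.partition("V")
    base = "Float" if decimal_part else "Integer"
    kind = f"Signed {base}" if signed else base
    # Assumption: S and V occupy no position, so only digits are counted.
    return PicSketch(kind, len(integer_part) + len(decimal_part), len(decimal_part))


for pic in ("X(10)", "9(5)", "S9(5)", "9(5)V99", "S9(5)V99"):
    print(pic, sketch_parse_pic(pic))
```

The sketch handles only the five forms listed above; edited pictures, `P` scaling, and `COMP` usages are out of scope.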

OCCURS Clauses

05 FIELD-1 OCCURS 3 TIMES PIC X(10).
05 GROUP-1 OCCURS 2 TIMES.
   10 SUB-FIELD-1 PIC X(5).
   10 SUB-FIELD-2 PIC 9(3).

REDEFINES Clauses

05 FIELD-1 PIC X(10).
05 FIELD-2 REDEFINES FIELD-1 PIC 9(10).
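
Example 3 in this README shows the redefined field being dropped and the redefining field kept. The sketch below illustrates one way that behavior could be implemented on a list of field dictionaries. The function name `sketch_apply_redefines` is hypothetical, and the treatment of group items (dropping their subordinates too) is an assumption rather than confirmed library behavior.

```python
from typing import Dict, List


def sketch_apply_redefines(fields: List[Dict]) -> List[Dict]:
    """Drop each redefined item (and its subordinates); keep the redefiner."""
    redefined = {f["redefines"] for f in fields if f.get("redefines")}
    kept: List[Dict] = []
    skip_below = None  # level of the redefined item currently being skipped
    for field in fields:
        if skip_below is not None and field["level"] > skip_below:
            continue  # subordinate of a redefined group item
        skip_below = None
        if field["name"] in redefined:
            skip_below = field["level"]
            continue
        kept.append(field)
    return kept


fields = [
    {"level": 1, "name": "DATA-RECORD", "pic": None, "redefines": None},
    {"level": 5, "name": "TEXT-FIELD", "pic": "X(20)", "redefines": None},
    {"level": 5, "name": "NUMERIC-FIELD", "pic": "9(20)", "redefines": "TEXT-FIELD"},
]
remaining = sketch_apply_redefines(fields)
print([f["name"] for f in remaining])
```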

INDEXED BY Clauses

05 FIELD-1 OCCURS 3 TIMES INDEXED BY IDX-1 PIC X(10).

API Reference

Core Functions

process_cobol(lines: List[str]) -> List[Dict]

Complete processing pipeline that:

  • Cleans COBOL lines
  • Parses field definitions
  • Handles REDEFINES
  • Denormalizes OCCURS
  • Cleans field names
  • Makes names database-safe

parse_pic_string(pic_str: str) -> PicInfo

Parse a PIC clause and return structured information:

pic_info = parse_pic_string('S9(5)V99')
# Returns: PicInfo(type='Signed Float', length=7, precision=2)

clean_cobol(lines: List[str]) -> List[str]

Convert multi-line COBOL statements to single lines.
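
As an illustration of the idea, the sketch below joins physical lines until it sees a statement-ending period. It is an assumption-laden stand-in, not the library's `clean_cobol`: the name `sketch_join_statements` is hypothetical, and it assumes free-form input without sequence numbers in columns 1-6, with comment lines starting with `*`.

```python
from typing import List


def sketch_join_statements(lines: List[str]) -> List[str]:
    """Join physical lines until a statement-ending period is seen."""
    statements: List[str] = []
    pending: List[str] = []
    for raw in lines:
        text = raw.strip()
        if not text or text.startswith("*"):
            continue  # skip blank lines and (assumed) '*' comment lines
        pending.append(text)
        if text.endswith("."):
            statements.append(" ".join(pending))
            pending = []
    if pending:  # keep an unterminated trailing statement rather than drop it
        statements.append(" ".join(pending))
    return statements


joined = sketch_join_statements([
    "05 FIELD-1",
    "   PIC X(10).",
    "* a comment line",
    "05 FIELD-2 PIC 9(3).",
])
print(joined)
```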

parse_cobol(lines: List[str]) -> List[Dict]

Parse COBOL lines into structured dictionaries.
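
The following sketch shows how a single cleaned line might be turned into a dictionary with a regular expression. The pattern and the function name `sketch_parse_line` are illustrative assumptions; the dictionary keys mirror the `CobolField` attributes listed under Data Models, and the clause order (REDEFINES, OCCURS, INDEXED BY, PIC) is fixed to match the snippets in this README.

```python
import re
from typing import Dict, Optional

# Assumed clause order: REDEFINES, OCCURS ... TIMES, INDEXED BY, PIC.
_LINE = re.compile(
    r"^(?P<level>\d{2})\s+(?P<name>[A-Z0-9-]+)"
    r"(?:\s+REDEFINES\s+(?P<redefines>[A-Z0-9-]+))?"
    r"(?:\s+OCCURS\s+(?P<occurs>\d+)\s+TIMES)?"
    r"(?:\s+INDEXED\s+BY\s+(?P<indexed_by>[A-Z0-9-]+))?"
    r"(?:\s+PIC\s+(?P<pic>\S+?))?"
    r"\s*\.\s*$"
)


def sketch_parse_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one single-line field definition; return None if it does not match."""
    match = _LINE.match(line.strip().upper())
    if match is None:
        return None
    groups = match.groupdict()
    return {
        "level": int(groups["level"]),
        "name": groups["name"],
        "pic": groups["pic"],
        "occurs": int(groups["occurs"]) if groups["occurs"] else None,
        "indexed_by": groups["indexed_by"],
        "redefines": groups["redefines"],
    }


print(sketch_parse_line("05 FIELD-1 OCCURS 3 TIMES INDEXED BY IDX-1 PIC X(10)."))
print(sketch_parse_line("05 FIELD-2 REDEFINES FIELD-1 PIC 9(10)."))
```

A real parser would need to accept clauses in any order and handle `PICTURE`, `USAGE`, and `VALUE`; this sketch deliberately does not.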

denormalize_cobol(lines: List[Dict]) -> List[Dict]

Expand OCCURS clauses into individual fields.
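
A rough sketch of the expansion, modeled on the output shown in Example 2: an OCCURS group is replaced by numbered copies of its subordinate fields, and an elementary OCCURS item is repeated itself. The function name `sketch_denormalize`, the `-N` suffix format, and the restriction to a single nesting level are assumptions made for illustration.

```python
from typing import Dict, List


def sketch_denormalize(fields: List[Dict]) -> List[Dict]:
    """Expand OCCURS items into numbered copies (one nesting level only)."""
    result: List[Dict] = []
    i = 0
    while i < len(fields):
        field = fields[i]
        count = field.get("occurs")
        if not count:
            result.append(field)
            i += 1
            continue
        if field.get("pic"):
            block = [field]  # elementary item: repeat the field itself
            i += 1
        else:
            # Group item: collect subordinates (higher level numbers).
            j = i + 1
            while j < len(fields) and fields[j]["level"] > field["level"]:
                j += 1
            block = fields[i + 1:j]
            i = j
        for n in range(1, count + 1):
            for item in block:
                result.append({**item, "name": f"{item['name']}-{n}", "occurs": None})
    return result


order = [
    {"level": 1, "name": "ORDER-RECORD", "pic": None, "occurs": None},
    {"level": 5, "name": "ORDER-ITEMS", "pic": None, "occurs": 5},
    {"level": 10, "name": "ITEM-CODE", "pic": "X(10)", "occurs": None},
    {"level": 10, "name": "ITEM-QUANTITY", "pic": "9(3)", "occurs": None},
    {"level": 10, "name": "ITEM-PRICE", "pic": "S9(7)V99", "occurs": None},
]
expanded = sketch_denormalize(order)
print([f["name"] for f in expanded][:5])
```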

clean_names(lines: List[Dict], **options) -> List[Dict]

Clean field names with options:

  • ensure_unique_names: Add suffixes for uniqueness
  • strip_prefix: Remove the prefix up to and including the first dash
  • make_database_safe: Replace dashes with underscores
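
The three options can be pictured with the sketch below. It operates on plain name strings for brevity, whereas the documented function takes a list of dictionaries. `sketch_clean_names` is a hypothetical name, and the order of the steps and the numeric suffix used for duplicates are assumptions.

```python
from typing import Dict, List


def sketch_clean_names(
    names: List[str],
    ensure_unique_names: bool = False,
    strip_prefix: bool = False,
    make_database_safe: bool = False,
) -> List[str]:
    """Apply the three documented options to a list of field names."""
    cleaned: List[str] = []
    seen: Dict[str, int] = {}
    for name in names:
        if strip_prefix and "-" in name:
            name = name.split("-", 1)[1]  # drop everything through the first dash
        if ensure_unique_names:
            seen[name] = seen.get(name, 0) + 1
            if seen[name] > 1:
                name = f"{name}-{seen[name] - 1}"  # assumed suffix scheme
        if make_database_safe:
            name = name.replace("-", "_")
        cleaned.append(name)
    return cleaned


print(sketch_clean_names(
    ["WS-CUST-ID", "WS-CUST-ID", "WS-NAME"],
    ensure_unique_names=True,
    strip_prefix=True,
    make_database_safe=True,
))
```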

Data Models

PicInfo

@dataclass
class PicInfo:
    type: str          # 'Char', 'Integer', 'Float', 'Signed Integer', etc.
    length: int        # Total field length
    precision: int     # Decimal places (for numeric fields)

CobolField

@dataclass
class CobolField:
    level: int
    name: str
    pic: Optional[str] = None
    pic_info: Optional[PicInfo] = None
    occurs: Optional[int] = None
    indexed_by: Optional[str] = None
    redefines: Optional[str] = None
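
The usage examples in this README read fields with dictionary syntax (`field['pic_info']['type']`), while the models above are dataclasses. The sketch below re-declares the two models exactly as listed and shows that `dataclasses.asdict` yields the nested dictionary shape used in those examples. Whether the library performs this conversion internally is an assumption; the snippet only demonstrates that the two representations are compatible.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PicInfo:
    type: str
    length: int
    precision: int


@dataclass
class CobolField:
    level: int
    name: str
    pic: Optional[str] = None
    pic_info: Optional[PicInfo] = None
    occurs: Optional[int] = None
    indexed_by: Optional[str] = None
    redefines: Optional[str] = None


balance = CobolField(
    level=5,
    name="CUSTOMER-BALANCE",
    pic="S9(10)V99",
    pic_info=PicInfo(type="Signed Float", length=12, precision=2),
)

# asdict() converts nested dataclasses too, so pic_info becomes a plain dict.
as_dict = asdict(balance)
print(as_dict["pic_info"]["type"])
```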

🔧 Development

Setup Development Environment

# Clone the repository
git clone https://github.com/rodrigo/python-cobol.git
cd python-cobol

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Running Tests

# Run all tests
make test

# Run tests with coverage
make test-cov

# Run specific test file
python -m pytest tests/test_core.py -v

Code Quality

# Format code
make format

# Run linting
make lint

# Run all checks (format, lint, test)
make check

Project Structure

python-cobol/
├── python_cobol/           # Main package
│   ├── __init__.py         # Package initialization
│   ├── core.py             # Core parsing functions
│   ├── models.py           # Data models
│   ├── patterns.py         # Regular expression patterns
│   └── cli.py              # Command-line interface
├── tests/                  # Test suite
│   ├── test_core.py        # Core functionality tests
│   ├── test_example.py     # Integration tests
│   └── example.cbl         # Test COBOL file
├── pyproject.toml          # Project configuration
├── requirements.txt        # Runtime dependencies
├── requirements-dev.txt    # Development dependencies
├── Makefile                # Development tasks
├── .pre-commit-config.yaml # Code quality hooks
└── README.md               # This file

📋 Examples

Example 1: Simple Field Processing

Input COBOL:

01  CUSTOMER-RECORD.
    05  CUSTOMER-ID PIC 9(10).
    05  CUSTOMER-NAME PIC X(50).
    05  CUSTOMER-BALANCE PIC S9(10)V99.

Python Code:

from python_cobol import process_cobol

cobol_lines = [
    "01  CUSTOMER-RECORD.",
    "    05  CUSTOMER-ID PIC 9(10).",
    "    05  CUSTOMER-NAME PIC X(50).",
    "    05  CUSTOMER-BALANCE PIC S9(10)V99."
]

fields = process_cobol(cobol_lines)

for field in fields:
    print(f"{field['name']}: {field['pic_info']['type']}")

Output:

CUSTOMER_RECORD: Group
CUSTOMER_ID: Integer
CUSTOMER_NAME: Char
CUSTOMER_BALANCE: Signed Float

Example 2: OCCURS Processing

Input COBOL:

01  ORDER-RECORD.
    05  ORDER-ITEMS OCCURS 5 TIMES.
        10  ITEM-CODE PIC X(10).
        10  ITEM-QUANTITY PIC 9(3).
        10  ITEM-PRICE PIC S9(7)V99.

Python Code:

from python_cobol import process_cobol

cobol_lines = [
    "01  ORDER-RECORD.",
    "    05  ORDER-ITEMS OCCURS 5 TIMES.",
    "        10  ITEM-CODE PIC X(10).",
    "        10  ITEM-QUANTITY PIC 9(3).",
    "        10  ITEM-PRICE PIC S9(7)V99."
]

fields = process_cobol(cobol_lines)

# Print all denormalized fields
for field in fields:
    print(f"{field['name']}: {field['pic']}")

Output:

ORDER_RECORD: None
ITEM_CODE_1: X(10)
ITEM_QUANTITY_1: 9(3)
ITEM_PRICE_1: S9(7)V99
ITEM_CODE_2: X(10)
ITEM_QUANTITY_2: 9(3)
ITEM_PRICE_2: S9(7)V99
...

Example 3: REDEFINES Processing

Input COBOL:

01  DATA-RECORD.
    05  TEXT-FIELD PIC X(20).
    05  NUMERIC-FIELD REDEFINES TEXT-FIELD PIC 9(20).

Python Code:

from python_cobol import process_cobol

cobol_lines = [
    "01  DATA-RECORD.",
    "    05  TEXT-FIELD PIC X(20).",
    "    05  NUMERIC-FIELD REDEFINES TEXT-FIELD PIC 9(20)."
]

fields = process_cobol(cobol_lines)

# TEXT-FIELD is dropped; the redefining NUMERIC-FIELD is kept along with the group record
for field in fields:
    print(f"{field['name']}: {field['pic']}")

Output:

DATA_RECORD: None
NUMERIC_FIELD: 9(20)

🀝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes
  4. Run tests: make test
  5. Run linting: make lint
  6. Commit your changes: git commit -m 'Add amazing feature'
  7. Push to the branch: git push origin feature/amazing-feature
  8. Open a Pull Request

Code Style

This project uses:

  • Black for code formatting
  • isort for import sorting
  • flake8 for linting
  • mypy for type checking

All code should pass these tools before submission.

📄 License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

🙏 Acknowledgments

  • Original code by Paulus Schoutsen
  • PIC parsing logic inspired by pyCOBOL
  • Community contributors and maintainers

📞 Support

🔄 Changelog

Version 1.0.0

  • Complete refactoring with modern Python practices
  • Added type hints throughout
  • Improved error handling and logging
  • Enhanced CLI interface
  • Comprehensive test suite
  • Modern project structure with pyproject.toml
  • Pre-commit hooks for code quality
  • Detailed documentation and examples

Version 0.1.4 (Original)

  • Basic COBOL parsing functionality
  • Support for REDEFINES, OCCURS, INDEXED BY
  • Simple CLI interface
