A modern, well-structured Python library for parsing and processing COBOL Copybook files. This library provides comprehensive support for COBOL data structures including REDEFINES, INDEXED BY, and OCCURS clauses, with robust error handling and extensive test coverage.
- Complete COBOL Support: Parse REDEFINES, INDEXED BY, and OCCURS statements
- Modern Python: Type hints, dataclasses, and modern Python patterns
- Comprehensive Testing: Extensive test suite with high coverage
- CLI Interface: Command-line tool for processing COBOL files
- Library API: Easy-to-use Python API for integration
- Database Ready: Generate database-safe field names
- Logging: Built-in logging for debugging and monitoring
- Error Handling: Robust error handling with informative messages
# Install from PyPI
pip install python-cobol
# Install for development
git clone https://github.com/rodrigo/python-cobol.git
cd python-cobol
pip install -e ".[dev]"
# Process a COBOL file with all features enabled
python-cobol example.cbl
# Skip denormalization
python-cobol example.cbl --skip-denormalize
# Enable verbose logging
python-cobol example.cbl --verbose
# See all options
python-cobol --help
from python_cobol import process_cobol
# Read and process a COBOL file
with open("example.cbl", "r") as f:
    fields = process_cobol(f.readlines())
# Access field information
for field in fields:
    print(f"Field: {field['name']}, Level: {field['level']}")
    if field['pic']:
        print(f" PIC: {field['pic']}")
        print(f" Type: {field['pic_info']['type']}")
        print(f" Length: {field['pic_info']['length']}")
- Character fields: PIC X(10)
- Numeric fields: PIC 9(5)
- Signed fields: PIC S9(5)
- Decimal fields: PIC 9(5)V99
- Signed decimal: PIC S9(5)V99
OCCURS:
05 FIELD-1 OCCURS 3 TIMES PIC X(10).
05 GROUP-1 OCCURS 2 TIMES.
    10 SUB-FIELD-1 PIC X(5).
    10 SUB-FIELD-2 PIC 9(3).
REDEFINES:
05 FIELD-1 PIC X(10).
05 FIELD-2 REDEFINES FIELD-1 PIC 9(10).
INDEXED BY:
05 FIELD-1 OCCURS 3 TIMES INDEXED BY IDX-1 PIC X(10).
process_cobol runs the complete processing pipeline, which:
- Cleans COBOL lines
- Parses field definitions
- Handles REDEFINES
- Denormalizes OCCURS
- Cleans field names
- Makes names database-safe
parse_pic_string parses a PIC clause and returns structured information:
pic_info = parse_pic_string('S9(5)V99')
# Returns: PicInfo(type='Signed Float', length=7, precision=2)
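The classification above can be approximated in a few lines. This is an illustrative sketch only. classify_pic and PicSketch are hypothetical names, not part of python_cobol. The rules are also assumptions:

- X or A means Char.
- A leading S means signed.
- V marks the implied decimal point.

The sketch ignores editing symbols (Z, commas, and so on) and P scaling.

```python
import re
from dataclasses import dataclass

# Matches a picture symbol followed by a repeat count, e.g. "9(5)" or "X(10)".
_REPEAT = re.compile(r"(.)\((\d+)\)")


@dataclass
class PicSketch:
    type: str
    length: int
    precision: int


def classify_pic(pic: str) -> PicSketch:
    """Hypothetical PIC classifier; not the library's parse_pic_string."""
    # Expand repeat counts first: 'S9(5)V99' -> 'S99999V99'.
    expanded = _REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), pic.upper())
    if "X" in expanded or "A" in expanded:
        # Alphanumeric/alphabetic: every symbol is one character position.
        return PicSketch("Char", len(expanded), 0)
    signed = expanded.startswith("S")
    digits = expanded.lstrip("S")
    # Split on the implied decimal point; decimal_part is '' when there is no V.
    integer_part, _, decimal_part = digits.partition("V")
    kind = "Float" if "V" in digits else "Integer"
    if signed:
        kind = "Signed " + kind
    return PicSketch(kind, len(integer_part) + len(decimal_part), len(decimal_part))
```

With these rules, classify_pic("S9(5)V99") gives type 'Signed Float', length 7, and precision 2. This agrees with the parse_pic_string result quoted for the same clause.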
Convert multi-line COBOL statements to single lines.
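A minimal sketch of this step follows. It assumes free-format input where every statement ends with a period, and where sequence numbers and comment lines have already been removed. join_statements is a hypothetical helper, not the library's own function.

```python
from typing import Iterable, List


def join_statements(lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: join physical lines into one string per period-terminated statement."""
    statements: List[str] = []
    buffer: List[str] = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # skip blank lines
        buffer.append(text)
        if text.endswith("."):
            # Collapse runs of whitespace so the joined statement reads as one line.
            statements.append(" ".join(" ".join(buffer).split()))
            buffer = []
    if buffer:
        # Keep a trailing statement even if its terminating period is missing.
        statements.append(" ".join(" ".join(buffer).split()))
    return statements
```

For example, the two physical lines "05 FIELD-1" and "    PIC X(10)." become the single statement "05 FIELD-1 PIC X(10).".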
Parse COBOL lines into structured dictionaries.
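One way to parse the statements is a single regular expression per statement. The pattern below is a hypothetical simplification, not the one in python_cobol/patterns.py. It assumes clauses appear in the order used in the examples above (REDEFINES, OCCURS, INDEXED BY, PIC), and it skips any statement it cannot match. parse_lines is likewise a hypothetical name.

```python
import re
from typing import Any, Dict, List

# Hypothetical pattern covering the clause order shown in this README.
_FIELD = re.compile(
    r"^(?P<level>\d{1,2})\s+(?P<name>[A-Z0-9-]+)"
    r"(?:\s+REDEFINES\s+(?P<redefines>[A-Z0-9-]+))?"
    r"(?:\s+OCCURS\s+(?P<occurs>\d+)(?:\s+TIMES)?)?"
    r"(?:\s+INDEXED\s+BY\s+(?P<indexed_by>[A-Z0-9-]+))?"
    r"(?:\s+PIC(?:TURE)?\s+(?P<pic>\S+?))?"
    r"\s*\.$",
    re.IGNORECASE,
)


def parse_lines(statements: List[str]) -> List[Dict[str, Any]]:
    """Hypothetical parser: one dict per matched statement."""
    fields: List[Dict[str, Any]] = []
    for statement in statements:
        match = _FIELD.match(statement.strip())
        if match is None:
            continue  # skip statements this simplified pattern does not understand
        field: Dict[str, Any] = match.groupdict()
        field["level"] = int(field["level"])
        field["occurs"] = int(field["occurs"]) if field["occurs"] else None
        fields.append(field)
    return fields
```

The lazy PIC group combined with the trailing `\s*\.$` anchor means that only the statement-ending period is excluded from the captured picture string.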
Expand OCCURS clauses into individual fields.
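Below is a sketch of the expansion. It assumes fields arrive in source order as dictionaries with level, name, and occurs keys; denormalize is a hypothetical helper. An OCCURS group is replaced by numbered copies of its children, which mirrors the ITEM_CODE_1, ITEM_QUANTITY_1, ... ordering in the OCCURS example output. Here the suffix is joined with a dash, which a database-safe renaming step would turn into an underscore.

```python
from typing import Any, Dict, List


def denormalize(fields: List[Dict[str, Any]], suffix: str = "") -> List[Dict[str, Any]]:
    """Hypothetical helper: expand OCCURS into numbered copies; fields must be in source order."""
    result: List[Dict[str, Any]] = []
    i = 0
    while i < len(fields):
        field = fields[i]
        # The field's children are the following entries with a deeper level number.
        end = i + 1
        while end < len(fields) and fields[end]["level"] > field["level"]:
            end += 1
        children = fields[i + 1 : end]
        occurs = field.get("occurs")
        if occurs:
            for n in range(1, occurs + 1):
                copy_suffix = f"{suffix}-{n}"
                if children:
                    # An OCCURS group is replaced by numbered copies of its children.
                    result.extend(denormalize(children, copy_suffix))
                else:
                    # An elementary OCCURS item becomes numbered standalone fields.
                    result.append({**field, "name": field["name"] + copy_suffix, "occurs": None})
        else:
            result.append({**field, "name": field["name"] + suffix})
            result.extend(denormalize(children, suffix))
        i = end
    return result
```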
Clean field names with options:
- ensure_unique_names: Add suffixes for uniqueness
- strip_prefix: Remove prefixes before first dash
- make_database_safe: Replace dashes with underscores
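Below is a sketch of the three options applied to a plain list of names. The function name clean_names, its signature, and the order in which the options run are assumptions, not the library's implementation. The uniqueness step also does not re-check for collisions with names that already end in a numeric suffix.

```python
from collections import Counter
from typing import List


def clean_names(
    names: List[str],
    ensure_unique_names: bool = False,
    strip_prefix: bool = False,
    make_database_safe: bool = False,
) -> List[str]:
    """Hypothetical sketch of the name-cleaning options."""
    cleaned = list(names)
    if strip_prefix:
        # 'CUST-ID' -> 'ID'; names without a dash are left unchanged.
        cleaned = [n.split("-", 1)[1] if "-" in n else n for n in cleaned]
    if make_database_safe:
        # 'CUSTOMER-ID' -> 'CUSTOMER_ID'
        cleaned = [n.replace("-", "_") for n in cleaned]
    if ensure_unique_names:
        totals = Counter(cleaned)
        seen: Counter = Counter()
        unique: List[str] = []
        for name in cleaned:
            if totals[name] > 1:
                # Number every occurrence of a duplicated name: ID -> ID_1, ID_2, ...
                seen[name] += 1
                name = f"{name}_{seen[name]}"
            unique.append(name)
        cleaned = unique
    return cleaned
```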
from dataclasses import dataclass
from typing import Optional


@dataclass
class PicInfo:
    type: str       # 'Char', 'Integer', 'Float', 'Signed Integer', etc.
    length: int     # Total field length
    precision: int  # Decimal places (for numeric fields)


@dataclass
class CobolField:
    level: int
    name: str
    pic: Optional[str] = None
    pic_info: Optional[PicInfo] = None
    occurs: Optional[int] = None
    indexed_by: Optional[str] = None
    redefines: Optional[str] = None
# Clone the repository
git clone https://github.com/rodrigo/python-cobol.git
cd python-cobol
# Install in development mode
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
# Run all tests
make test
# Run tests with coverage
make test-cov
# Run specific test file
python -m pytest tests/test_core.py -v
# Format code
make format
# Run linting
make lint
# Run all checks (format, lint, test)
make check
python-cobol/
├── python_cobol/            # Main package
│   ├── __init__.py          # Package initialization
│   ├── core.py              # Core parsing functions
│   ├── models.py            # Data models
│   ├── patterns.py          # Regular expression patterns
│   └── cli.py               # Command-line interface
├── tests/                   # Test suite
│   ├── test_core.py         # Core functionality tests
│   ├── test_example.py      # Integration tests
│   └── example.cbl          # Test COBOL file
├── pyproject.toml           # Project configuration
├── requirements.txt         # Runtime dependencies
├── requirements-dev.txt     # Development dependencies
├── Makefile                 # Development tasks
├── .pre-commit-config.yaml  # Code quality hooks
└── README.md                # This file
Input COBOL:
01 CUSTOMER-RECORD.
    05 CUSTOMER-ID PIC 9(10).
    05 CUSTOMER-NAME PIC X(50).
    05 CUSTOMER-BALANCE PIC S9(10)V99.
Python Code:
from python_cobol import process_cobol
cobol_lines = [
"01 CUSTOMER-RECORD.",
" 05 CUSTOMER-ID PIC 9(10).",
" 05 CUSTOMER-NAME PIC X(50).",
" 05 CUSTOMER-BALANCE PIC S9(10)V99."
]
fields = process_cobol(cobol_lines)
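for field in fields:
    # Group items such as CUSTOMER-RECORD have no PIC clause, so report them as "Group"
    kind = field['pic_info']['type'] if field['pic'] else "Group"
    print(f"{field['name']}: {kind}")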
Output:
CUSTOMER_RECORD: Group
CUSTOMER_ID: Integer
CUSTOMER_NAME: Char
CUSTOMER_BALANCE: Signed Float
Input COBOL:
01 ORDER-RECORD.
    05 ORDER-ITEMS OCCURS 5 TIMES.
        10 ITEM-CODE PIC X(10).
        10 ITEM-QUANTITY PIC 9(3).
        10 ITEM-PRICE PIC S9(7)V99.
Python Code:
from python_cobol import process_cobol
cobol_lines = [
"01 ORDER-RECORD.",
" 05 ORDER-ITEMS OCCURS 5 TIMES.",
" 10 ITEM-CODE PIC X(10).",
" 10 ITEM-QUANTITY PIC 9(3).",
" 10 ITEM-PRICE PIC S9(7)V99."
]
fields = process_cobol(cobol_lines)
# Print all denormalized fields
for field in fields:
    print(f"{field['name']}: {field['pic']}")
Output:
ORDER_RECORD: None
ITEM_CODE_1: X(10)
ITEM_QUANTITY_1: 9(3)
ITEM_PRICE_1: S9(7)V99
ITEM_CODE_2: X(10)
ITEM_QUANTITY_2: 9(3)
ITEM_PRICE_2: S9(7)V99
...
Input COBOL:
01 DATA-RECORD.
    05 TEXT-FIELD PIC X(20).
    05 NUMERIC-FIELD REDEFINES TEXT-FIELD PIC 9(20).
Python Code:
from python_cobol import process_cobol
cobol_lines = [
"01 DATA-RECORD.",
" 05 TEXT-FIELD PIC X(20).",
" 05 NUMERIC-FIELD REDEFINES TEXT-FIELD PIC 9(20)."
]
fields = process_cobol(cobol_lines)
# Only NUMERIC-FIELD remains after REDEFINES processing
for field in fields:
    print(f"{field['name']}: {field['pic']}")
Output:
DATA_RECORD: None
NUMERIC_FIELD: 9(20)
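REDEFINES processing of this kind can be sketched as a single pass. The pass drops every item named in a REDEFINES clause, together with its subordinate fields, and keeps the redefining item. drop_redefined is a hypothetical helper that assumes field names are unique within the record. It is not the library's own implementation.

```python
from typing import Any, Dict, List, Optional


def drop_redefined(fields: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep redefining fields, drop the items they redefine."""
    redefined = {f["redefines"] for f in fields if f.get("redefines")}
    kept: List[Dict[str, Any]] = []
    skip_level: Optional[int] = None  # level of a removed item whose children are being skipped
    for field in fields:
        if skip_level is not None:
            if field["level"] > skip_level:
                continue  # subordinate of a removed item
            skip_level = None
        if field["name"] in redefined:
            skip_level = field["level"]
            continue
        kept.append(field)
    return kept
```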
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch:
git checkout -b feature/amazing-feature
- Make your changes
- Run tests:
make test
- Run linting:
make lint
- Commit your changes:
git commit -m 'Add amazing feature'
- Push to the branch:
git push origin feature/amazing-feature
- Open a Pull Request
This project uses:
- Black for code formatting
- isort for import sorting
- flake8 for linting
- mypy for type checking
All code should pass these tools before submission.
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
- Original code by Paulus Schoutsen
- PIC parsing logic inspired by pyCOBOL
- Community contributors and maintainers
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: GitHub Wiki
- Complete refactoring with modern Python practices
- Added type hints throughout
- Improved error handling and logging
- Enhanced CLI interface
- Comprehensive test suite
- Modern project structure with pyproject.toml
- Pre-commit hooks for code quality
- Detailed documentation and examples
- Basic COBOL parsing functionality
- Support for REDEFINES, OCCURS, INDEXED BY
- Simple CLI interface