A list of command line tools for manipulating structured text data

markus2330/structured-text-tools

 
 


What follows is a list of text-based file formats with command line tools for manipulating each (with a focus on Linux).

DSV

Delimiter-separated values, including CSV, TSV, etc.

Awk

Awk is a POSIX-standard command line tool for processing this sort of data.

SQL-based utilities

| Name | Programming language and database engine | Features | Usage link | License |
|------|------------------------------------------|----------|------------|---------|
| csvkit | Python, SQLite 3 | Use header row for column names, custom input and output encoding, custom input field separator, custom output field separator, custom output formatting, CSV JOINs, Python module. Excel and JSON to CSV. CSV to JSON. SQL queries for CSV. | Usage | MIT |
| q | Python, SQLite 3 | Use header row for column names, custom input and output encoding, gzipped input, custom input field separator (string literal), custom output field separator, custom output formatting, table JOINs, Python module. | Usage | GNU GPL 3 |
| sqawk | C, SQLite 3 | Use header row for column names, column name aliases, can skip lines until a regexp matches, custom input field separator (string literal, per-file), keep SQLite file, show generated SQL, table JOINs. | Usage | ? |
| Sqawk | Tcl, SQLite 3 | Use header row for column names, custom input field separator (regexp, per-file), custom input record delimiter (regexp, per-file), custom table names, custom output field separator, custom output record separator, merge selected columns into one, ASCII/Unicode table output, CSV input and output, JSON output, Tcl output, table JOINs. | Usage | MIT |
| Squawk | Python, custom SQL interpreter | Access log and CSV input, JSON and CSV output, Python code generation. | | Three-clause BSD |
| termsql | Python, SQLite 3 | Use header rows for column names, custom field separator (regexp), custom record separator (string literal), lines as columns, skip a given number of lines at the beginning and at the end, merge selected columns into one, HTML, CSV, SQL and Tcl output. | Manual | MIT |
| textql | Go, SQLite 3 | Use header rows for column names, keep SQLite file, custom input field separator (string literal). | Usage | MIT |
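Most of the utilities above pair a text parser with SQLite 3. The following is a minimal Python sketch of that general idea, not the implementation of any listed tool: it loads CSV text into an in-memory SQLite table, takes column names from the header row, and runs one query. The function name `query_csv` and the default table name `t` are made up for illustration, and the sketch assumes every row has as many fields as the header.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql, table="t"):
    """Load CSV text into an in-memory SQLite table and run one SQL query.

    The header row supplies the column names; every value is stored as text.
    Assumes each data row has the same number of fields as the header.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    try:
        # Quote identifiers so headers containing spaces or SQL keywords still work.
        columns = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
        conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), data
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


sample = "name,age\nalice,30\nbob,25\n"
result = query_csv(sample, "SELECT name FROM t WHERE CAST(age AS INTEGER) > 26")
print(result)
```

Because the values are stored as text, numeric comparisons need an explicit `CAST`, as in the sample query.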

XML, HTML

  • XMLStarlet

  • xml2 — convert XML and HTML to and from flat, greppable lists of "path=value" statements.

See also: Grep and Sed Equivalent for XML Command Line Processing on StackOverflow.

JSON

| Name and link | Description |
|---------------|-------------|
| jq | A command line tool that implements a functional DSL for creating and manipulating JSON. It can convert JSON to other formats. |
| jshon | Create and manipulate JSON using getopt-style command-line options. |
| json | Similar to jq, written in JavaScript. |
| json2 | Convert JSON to and from flat, greppable lists of "path=value" statements. Modeled after xml2. |
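To make the "path=value" idea concrete, here is a small Python sketch that flattens a decoded JSON value into one line per scalar leaf. The path syntax (slash-separated keys and list indexes) is an assumption chosen for illustration and is not claimed to match the actual output of json2; the function name `flatten` is likewise hypothetical.

```python
import json


def flatten(value, path=""):
    """Yield "path=value" lines for every scalar leaf in a decoded JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, "{}/{}".format(path, key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, "{}/{}".format(path, index))
    else:
        # Re-encode scalars as JSON so strings, numbers, booleans and null
        # remain distinguishable in the flat output.
        yield "{}={}".format(path, json.dumps(value))


doc = json.loads('{"user": {"name": "alice", "tags": ["a", "b"]}, "active": true}')
lines = list(flatten(doc))
print("\n".join(lines))
```

Each output line can then be filtered with ordinary line-oriented tools such as grep. Note that this sketch emits nothing for empty objects and arrays, so it is lossy for such inputs.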

YAML, TOML

Using jq with a format converter like Remarshal appears to be the best option.

INI

| Name and link | Platform | License | Description |
|---------------|----------|---------|-------------|
| IniFile (DOS version) | Windows (x86, x86-64), MS-DOS | Closed-source freeware | Can set and remove properties in INI files. Can retrieve properties as a list of batch file set commands to set the corresponding variables. Changes files in place. |
| crudini | Any with Python 2.x | GNU GPLv2 | Can set and remove properties in INI files. Can retrieve properties as shell script commands to set the corresponding variables. Can output updated INI data or change files in place. |
| initool | Windows, Linux, FreeBSD | MIT | Can set and remove properties in INI files and check for their existence. Outputs updated INI data. |
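The set and remove operations these tools share can be sketched with Python's standard `configparser` module. This is an illustration of the operations only, not the behavior of any tool in the table; the function names `set_property` and `remove_property` are hypothetical. Unlike a careful INI editor, `configparser` lowercases keys and drops comments when it rewrites the data.

```python
import configparser
import io


def _dump(parser):
    """Serialize a ConfigParser back to INI text."""
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


def set_property(ini_text, section, key, value):
    """Return INI text with section/key set to value, creating the section if needed."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, key, value)
    return _dump(parser)


def remove_property(ini_text, section, key):
    """Return INI text without section/key; the section must already exist."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    parser.remove_option(section, key)
    return _dump(parser)


sample = "[server]\nhost = example.org\n"
updated = set_property(sample, "server", "port", "8080")
trimmed = remove_property(updated, "server", "host")
print(trimmed)
```

Both functions return the updated INI data instead of changing a file in place, which keeps them easy to compose in a pipeline.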

Configuration files

  • Augeas can extract data from and modify a number of file formats. However, not all formats are equally well supported, and for some formats only a limited subset of valid files can be parsed.

Bonus round: CLIs for single-file databases

| Name | Description | File format |
|------|-------------|-------------|
| GNU Recutils | "[A] set of tools and libraries to access human-editable, plain text databases called recfiles." | Text-based, roughly "key: value" |
| SDB | "[A] simple string key/value database based on djb's cdb disk storage and supports JSON and arrays introspection." | Binary |
| sqlite3(1) | "[A] simple command-line utility [...] that allows the user to manually enter and execute SQL statements against an SQLite database." | Binary |
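To give a feel for a text-based "key: value" database, the following Python sketch parses a simplified record format: records separated by blank lines, one "key: value" field per line, and lines starting with `#` treated as comments. It is an assumption-laden simplification and not a parser for the full recfile format; the function name `parse_records` is hypothetical.

```python
def parse_records(text):
    """Split text into blank-line-separated records of (key, value) fields."""
    records = []
    current = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record, if there is one.
            if current:
                records.append(current)
                current = []
        elif line.startswith("#"):
            continue  # skip comment lines
        else:
            key, _, value = line.partition(":")
            current.append((key.strip(), value.strip()))
    if current:
        records.append(current)
    return records


sample = "Name: Ada\nEmail: ada@example.org\n\n# a comment\nName: Bob\n"
records = parse_records(sample)
print(records)
```

Fields are kept as a list of pairs instead of a dictionary so that a record can hold the same key more than once.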

License

The content of this document is licensed under the Creative Commons Attribution 4.0 International License. By contributing you agree to release your contribution under this license.

Disclosure

Sqawk, Remarshal and initool were written by the curator of this document.
