Ability to round-trip binary data #4

Open
simonw opened this issue Jun 14, 2020 · 6 comments
Labels
enhancement New feature or request

Comments

@simonw
Owner

simonw commented Jun 14, 2020

e.g. for the binary numbits column in the .coverage SQLite database generated by coveragepy.

Those currently end up represented like this:

[4, 1, "b'\\xfe\\xff\\xfd{\\xe0\\x02\\x10\\x00W}o\\xdb{\\xef}o\\xef\\xbd\\xf7\\x92\\xe8\\x00\\x00\\xca\\t\\xe0\\xfb\\xdf\\x07y\\xdb\\xbe\\xf3\\x97s\\xd7\\xd8\\xeb\\x06\\xd9Y\\x16A\\x17\\xe6\\x02\\x02 @\\x08\\x10\\x00\\xbcH\\xc1$@\\xf7}?\\x01\\x04 \\x00\\x00\\x00\\x00\\x04%\\x00\\x04\\x00\\x00\\x00\\x00\\x00<\\x17H\\x00\\x00\\x12 \\xe9\\xc8\\x08\\x00\\x00\\x00\\x00\\x00\\x00@\\x00\\x00\\x00\\xd4M\\xb5\\x18\\x00w\\xd7\\xdd\\xdd\\xb6m\\xba\\xa9\\xe0\\xa7\\xf3Z\\x82\\xfbN\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00$`\\x00\\x04'"]

Once I implement the load command (#3) these will be a problem, because they won't round-trip correctly.

I need some kind of special-case syntax for storing binary values such that they can be round-tripped properly.
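As a rough illustration of the problem and of one possible special-case syntax: the sketch below assumes the current output comes from serializing with something like `default=str`, and it invents a tagged-object convention (`{"$base64": true, "encoded": ...}`) plus the helper names `encode_value` and `decode_value`. None of these are part of sqlite-diffable; they are hypothetical.

```python
import base64
import json

row = [4, 1, b"\xfe\xff\xfd{\xe0\x02"]

# Lossy: the bytes value becomes the text "b'\\xfe...'" and comes back as str.
lossy = json.loads(json.dumps(row, default=str))


def encode_value(value):
    """Hypothetical encoder: wrap bytes in a tagged object, pass others through."""
    if isinstance(value, bytes):
        return {"$base64": True, "encoded": base64.b64encode(value).decode("ascii")}
    return value


def decode_value(value):
    """Hypothetical decoder: turn the tagged object back into bytes."""
    if isinstance(value, dict) and value.get("$base64") is True:
        return base64.b64decode(value["encoded"])
    return value


line = json.dumps([encode_value(v) for v in row])
restored = [decode_value(v) for v in json.loads(line)]
```

Here `lossy[2]` is a `str`, while `restored` compares equal to the original `row`, which is the property a future load command would need.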

@simonw simonw added the enhancement New feature or request label Jun 14, 2020
@simonw
Owner Author

simonw commented Jun 14, 2020

The .metadata.json file may be the place to do this. Right now the accompanying line_bits.metadata.json for the above table looks like this:

{
    "name": "line_bits",
    "columns": [
        "file_id",
        "context_id",
        "numbits"
    ],
    "schema": "CREATE TABLE line_bits (\n    -- If recording lines, a row per context per file executed.\n    -- All of the line numbers for that file/context are in one numbits.\n    file_id integer,            -- foreign key to `file`.\n    context_id integer,         -- foreign key to `context`.\n    numbits blob,               -- see the numbits functions in coverage.numbits\n    foreign key (file_id) references file (id),\n    foreign key (context_id) references context (id),\n    unique (file_id, context_id)\n)"
}

I could use this to say "the third column is binary, so treat it as such" somehow.

@simonw
Owner Author

simonw commented Jun 14, 2020

Maybe columns could store type information:

    "columns": [
        ["file_id", "integer"],
        ["context_id", "integer"],
        ["numbits", "blob"]
    ]
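A minimal sketch of how a loader might use typed columns like these, assuming (hypothetically) that blob values are stored as hex strings in the data file. The `decode_row` helper and the hex storage format are illustrative assumptions, not existing sqlite-diffable behaviour.

```python
import json

# Metadata in the proposed shape: each column is a [name, type] pair.
metadata = json.loads("""
{
    "name": "line_bits",
    "columns": [
        ["file_id", "integer"],
        ["context_id", "integer"],
        ["numbits", "blob"]
    ]
}
""")


def decode_row(row, columns):
    """Convert hex strings back to bytes for columns declared as blob."""
    return [
        bytes.fromhex(value) if col_type == "blob" and value is not None else value
        for value, (_name, col_type) in zip(row, columns)
    ]


# Hypothetical stored row, with the blob written as a hex string.
stored_row = [3, 1, "36218410420821841042"]
decoded = decode_row(stored_row, metadata["columns"])
```

One caveat with a per-column approach: SQLite is dynamically typed, so a column declared as `blob` can still hold text or integers in individual rows. A per-value marker would handle that case; a per-column type would not.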

@simonw
Owner Author

simonw commented Jun 14, 2020

Here's how sqlite3 .coverage .dump outputs this data:

INSERT INTO line_bits VALUES(1,1,X'0e');
INSERT INTO line_bits VALUES(2,1,X'5a');
INSERT INTO line_bits VALUES(3,1,X'36218410420821841042');
INSERT INTO line_bits VALUES(4,1,X'fefffd7be0021000577d6fdb7bef7d6fefbdf792e80000ca09e0fbdf0779dbbef39773d7d8eb06d959164117e602022040081000bc48c12440f77d3f010420000000000425000400000000003c174800001220e9c80800000000000040000000d44db5180077d7ddddb66dbaa9e0a7f35a82fb4e0000000000000002000000000024600004');
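The same hex representation is easy to produce and reverse from Python, which makes it a candidate encoding. This sketch uses only the standard library; the in-memory table is a cut-down copy of `line_bits` without the foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE line_bits (file_id integer, context_id integer, numbits blob)"
)
# Same statement form as the .dump output above.
conn.execute("INSERT INTO line_bits VALUES(3,1,X'36218410420821841042')")

file_id, context_id, numbits = conn.execute(
    "SELECT file_id, context_id, numbits FROM line_bits"
).fetchone()

# sqlite3 returns BLOB values as bytes; .hex() gives the digits used by .dump.
hex_text = numbits.hex()
literal = "X'{}'".format(hex_text)

# bytes.fromhex() reverses the encoding exactly.
recovered = bytes.fromhex(hex_text)
```

Hex doubles the size of the data, whereas base64 grows it by about a third, so the choice is a trade-off between readability and file size.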

@simonw
Owner Author

simonw commented Jun 14, 2020

I can accompany this with a parametrized test that covers all of the other SQLite types as well.
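A sketch of what such a test could check, written with the standard library only so that it stands alone. The `CASES` list could be handed to `pytest.mark.parametrize` instead of the loop shown here. The `to_json` and `from_json` helpers and the `$base64` marker are hypothetical, standing in for whatever encoding is eventually chosen.

```python
import base64
import json
import sqlite3

# One value per SQLite storage class, keyed by what typeof() should report.
CASES = [
    ("null", None),
    ("integer", 42),
    ("real", 1.5),
    ("text", "héllo"),
    ("blob", b"\x00\xfe\xff"),
]


def to_json(value):
    """Hypothetical encoder: tag bytes, pass everything else through."""
    if isinstance(value, bytes):
        return {"$base64": True, "encoded": base64.b64encode(value).decode("ascii")}
    return value


def from_json(value):
    """Hypothetical decoder: undo the tagging applied by to_json."""
    if isinstance(value, dict) and value.get("$base64") is True:
        return base64.b64decode(value["encoded"])
    return value


def round_trip(value):
    """Store a value, dump it to a JSON line, load it into a fresh database."""
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE t (v)")  # untyped column: value stored as given
    src.execute("INSERT INTO t VALUES (?)", (value,))
    (dumped,) = src.execute("SELECT v FROM t").fetchone()

    line = json.dumps(to_json(dumped))

    dst = sqlite3.connect(":memory:")
    dst.execute("CREATE TABLE t (v)")
    dst.execute("INSERT INTO t VALUES (?)", (from_json(json.loads(line)),))
    return dst.execute("SELECT v, typeof(v) FROM t").fetchone()


results = {name: round_trip(value) for name, value in CASES}
```

Each entry in `results` is a `(value, typeof)` pair read back from the second database, so a test can assert that both the value and its storage class survived the trip.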

@johnnyutahh

johnnyutahh commented Feb 24, 2022

Hello @simonw -- I love this project, thanks for making it happen. Is this issue essentially tracking the work to implement a load command, specifically one that handles dump-and-load of binary data stored in SQLite?

My main usage goal: enable more efficient git a) storage and b) diff-ability of SQLite databases. (Yes, git-based.)

Additional questions:

  1. Are there alternatives to sqlite-diffable other than https://stackoverflow.com/a/21789167/605356 ?
  2. Is there anything I can do to help implement a load command?

Additional reference (for my sake):
https://news.ycombinator.com/item?id=25004913

FYI, here is the output from my environment after installing sqlite-diffable today:

$ sqlite-diffable --version
sqlite-diffable, version 0.2.1
$
$ sqlite-diffable --help
Usage: sqlite-diffable [OPTIONS] COMMAND [ARGS]...

  Tools for dumping/loading a SQLite database to diffable directory structure

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  dump
$
$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.15.7
BuildVersion:	19H1713
$
$ date
Wed Feb 23 21:41:32 CST 2022
$
$ sqlite-diffable load
Usage: sqlite-diffable [OPTIONS] COMMAND [ARGS]...
Try 'sqlite-diffable --help' for help.

Error: No such command 'load'.
$

@johnnyutahh

Checking in - any update(s) on this topic/issue/discussion? ( @simonw )


2 participants