Garble cherry pick #47

Closed
wants to merge 4 commits into from
23 changes: 17 additions & 6 deletions garble.py
@@ -107,10 +107,17 @@ def garble_pii(args):
         source_timestamp == meta_timestamp
     ), "Metadata creation date does not match pii file timestamp"

-    metadata["garble_time"] = datetime.now().isoformat()
+    garble_time = datetime.now()

-    with open(Path("output") / metadata_file_name, "w+") as metafile:
-        json.dump(metadata, metafile, indent=2)
+    metadata["garble_time"] = garble_time.isoformat()
+
+    timestamp = datetime.strftime(garble_time, TIMESTAMP_FMT)
+
+    with open(metadata_file, "w") as original_metafile:
+        json.dump(metadata, original_metafile, indent=2)
Comment on lines +116 to +117
Collaborator

This is the critical piece here. I'm not sure why the other changes were made; renaming the other files seems to break things in linkage-agent-tools. Can you revert the other changes and just do this? Or help me understand what the other changes are needed for.

Collaborator Author

@dehall The other changes are ones we'd have to make in anticipation of merging PR #32 in linkage-agent-tools. They have been part of this PR since before it was cherry-picked into a new PR, but I'm not sure whether there is any discussion of them there. Either way, they won't break linkage-agent-tools once This PR is merged.

Collaborator

@dehall dehall Dec 8, 2022

When I try with this branch on data-owner-tools and that branch on linkage-agent-tools I still get errors, related to the metadata file name in validate.py:

$ py validate.py
site_a.zip
Traceback (most recent call last):
  File "/Users/dehall/linkage-agent-tools/validate.py", line 67, in <module>
    do_validate(config)
  File "/Users/dehall/linkage-agent-tools/validate.py", line 23, in do_validate
    missing, unexpected, metadata_issues = c.validate_all_present()
  File "/Users/dehall/linkage-agent-tools/dcctools/config.py", line 129, in validate_all_present
    metadata_issues.extend(self.validate_metadata(system_zip_path))
  File "/Users/dehall/linkage-agent-tools/dcctools/config.py", line 59, in validate_metadata
    timestamp = datetime.strptime(mname, "%Y%m%dT%H%M%S")
  File "/usr/local/Cellar/python@3.9/3.9.2_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/local/Cellar/python@3.9/3.9.2_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '' does not match format '%Y%m%dT%H%M%S'
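
The ValueError above is what `datetime.strptime` raises when it is handed an empty string. A minimal sketch of one way a validator could report the problem instead of crashing. `parse_metadata_timestamp` is a hypothetical helper, not a function in linkage-agent-tools, and the traceback does not show how `mname` ends up empty:

```python
from datetime import datetime

# Format string copied from the traceback above.
METADATA_TS_FMT = "%Y%m%dT%H%M%S"


def parse_metadata_timestamp(mname):
    """Hypothetical helper: return a datetime, or None if mname is not a valid timestamp."""
    try:
        return datetime.strptime(mname, METADATA_TS_FMT)
    except ValueError:
        # Covers the empty-string case from the traceback as well as malformed names.
        return None


print(parse_metadata_timestamp("20221208T101500"))
print(parse_metadata_timestamp(""))
```

Returning `None` lets the caller append a readable entry to its list of metadata issues rather than aborting the whole validation run.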

And in projects.py, I believe this is because the clk file name format doesn't match what's expected:

$ py projects.py
Traceback (most recent call last):
  File "/Users/dehall/linkage-agent-tools/projects.py", line 120, in <module>
    run_projects(config, args.project)
  File "/Users/dehall/linkage-agent-tools/projects.py", line 61, in run_projects
    run_project(c, timestamp, project_name)
  File "/Users/dehall/linkage-agent-tools/projects.py", line 87, in run_project
    project.upload_clks(system, c.get_clks_raw(system, project_name))
  File "/Users/dehall/linkage-agent-tools/dcctools/config.py", line 221, in get_clks_raw
    with clk_zip.open(project_file) as clk_file:
UnboundLocalError: local variable 'project_file' referenced before assignment
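
An UnboundLocalError like this typically means a variable is only assigned inside a conditional branch that never ran. The sketch below reproduces that pattern under the assumption that the lookup matches on a fixed file name; the real matching logic in `get_clks_raw` is not shown here, and both function names and the file names are hypothetical:

```python
def find_project_file(names, project_name):
    """Hypothetical lookup that assumes some archive member always matches."""
    expected = f"{project_name}.json"
    for name in names:
        if name.endswith(expected):
            project_file = name  # only bound when a name matches
            break
    return project_file  # UnboundLocalError when nothing matched


def find_project_file_checked(names, project_name):
    """Same lookup, but with an explicit error when no member matches."""
    expected = f"{project_name}.json"
    for name in names:
        if name.endswith(expected):
            return name
    raise FileNotFoundError(f"no CLK file ending in {expected!r} among {names}")


# A timestamp-suffixed name no longer ends in "name-sex-dob.json".
renamed = ["output/name-sex-dob20221208T101500.json"]
try:
    find_project_file(renamed, "name-sex-dob")
except UnboundLocalError as err:
    print("unchecked lookup failed:", err)
```

The checked variant fails with a message that names the expected file, which makes a file-name format mismatch easier to spot.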

Rather than try to solve all these together, let's just do one piece at a time, so for this PR let's just keep the necessary snippet and revisit the rest separately.


+
+    with open("output/metadata.json", "w+") as fp:
+        json.dump(metadata, fp, indent=2)

     secret = validate_secret_file(secret_file)
     individuals_secret = derive_subkey(secret, "individuals")
@@ -125,23 +132,27 @@ def garble_pii(args):
                     "The following schema uses doubleHash, which is insecure: " + str(s)
                 )
         output_file = Path(args.outputdir) / os.path.basename(s)
+
+        outfile = str(output_file).replace(".json", f"{timestamp}.json")
+
         subprocess.run(
             [
                 "anonlink",
                 "hash",
                 source_file,
                 individuals_secret,
                 str(s),
-                str(output_file),
+                outfile,
             ],
             check=True,
         )
-        clk_files.append(output_file)
+        clk_files.append(Path(outfile))
     validate_clks(clk_files, metadata_file)
-    return clk_files + [Path(f"output/{metadata_file_name}")]
+    return clk_files + [Path("output/metadata.json")]


 def create_output_zip(clk_files, args):
+    print(args.outputdir, args.outputzip)
     with ZipFile(os.path.join(args.outputdir, args.outputzip), "w") as garbled_zip:
         for output_file in clk_files:
             garbled_zip.write(output_file)
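
One note on the `outfile` line in this hunk: `str.replace` rewrites every occurrence of `.json` in the path, including any in a directory name. A possible alternative using `pathlib` is sketched below. `timestamped_name` is a hypothetical helper, and the value of `TIMESTAMP_FMT` is an assumption borrowed from the format string in the validate.py traceback, since the diff does not show its definition:

```python
from datetime import datetime
from pathlib import Path

# Assumed value; the diff references TIMESTAMP_FMT without showing its definition.
TIMESTAMP_FMT = "%Y%m%dT%H%M%S"


def timestamped_name(output_file, garble_time):
    """Hypothetical helper: insert the timestamp before the final suffix only."""
    output_file = Path(output_file)
    stamp = garble_time.strftime(TIMESTAMP_FMT)
    return output_file.with_name(f"{output_file.stem}{stamp}{output_file.suffix}")


garble_time = datetime(2022, 12, 8, 10, 15, 0)
print(timestamped_name("output/name-sex-dob.json", garble_time))
```

Because only the final path component is rebuilt, a directory such as `out.json.d` is left untouched.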