Separate the DAL and MPI functionality - Add new DDL for multi-table schema for MPI (#828)

* Initial commit - this is cleaning up some of the connection stuff as well as prepping for adding schema migrations
* WIP - still a LOT to do
* More progress - making parallel DAL/MPI to existing
* Create config.py
* Got the tests working - IT WORKS!
* Restored original MPI (postgres.py) but left it in the mpi folder - removed print statements
* restored tests for previous happy path
* additional clean-up - added new table schema into MPI to be able to test in parallel until we are ready to cut over
* Added TODO statements
* minor changes
* fixed issue with message parser outputting "" instead of null
* renamed to generic dal
* Removed message-parser changes - split out into a separate branch/PR
* restructure folders for linkage
* Adding tests
* Update test_mpi.py
* Update utils.py
* Update utils.py
* adding tests for the DAL
* fixed formatting and added a few more tests
* cleaned up some docstrings in the Google style and added argument types
* ran black
* Added more tests and commented out some code that is currently not used
* added tests and commented out code which will be updated in another ticket to get test coverage up
* Update postgres_mpi.py
* Fixed testing errors
* LOTS of changes - finally got it all working - will need to clean up and add more tests
* Delete result.py
* flake8 fixes
* Commented out code that will be changed and added a test case
* Update postgres_mpi.py
* removed print statements that aren't needed
* general clean-up - added TODOs and removed commented-out code
* moved select transaction functionality into the dal - added tests
* renamed dal to DataAccessLayer
* Updated schema with most recent changes - initialize all the new tables in the DAL - create mechanism to get all tables necessary for blocking
* adding and fixing tests with updated schema
* Fixed some DDL errors and added/updated tests to use new schema
* fixed all errors in tests and updated tests for DAL
* fixed some DDL bugs - created the base query for data for linkage purposes - started working on CTE queries in the generate_block_query function
* Created functions to simplify generating the CTE queries and attaching them to the base query - mostly done - just need to handle the given-name scenario
* CTE queries added to base query - just need to write some tests to validate all this and we should be good to go
* Made some minor tweaks to fix some minor bugs - testing query generator in MPI indirectly
* tested MPI functionality in current state and added test cases
* Update test_mpi.py
* change city & state to earthly entities
* update patient bundle
* remove Zakera Ward and Citadel
* update for LoL blocks, not LoLoL
* change tests for LoL output from blocking, not LoLoL
* simplified the code base by a lot - added some tests and fixed some tests - still working on given_name issue
* Completed the given_name scenario, all handled by using the metadata from the tables and FKs to generate the query - added tests to verify
* Update postgres_mpi.py
* Update postgres_mpi.py
* lots of additional functions to simplify the overall workflow
* temp change
* add new MPI connector to linkage tests
* condense address to LoL and expand to include city, state, and zip
* Update postgres_mpi.py
* Made some modifications to the dal and updated test cases - all are passing again
* Added more helper functions - updated MPI - just need to wrap up patient inserts and add tests
* temp update
* Created new core for updated functions and cleaned up functions - some updates/changes to help with implementing MPI in link.py
* Fixed all bugs and have all tests passing for DAL and MPI with additional changes - still need to finish multi-part insert for patient
* Fixed all tests - still need to add a few more tests and finish the full insert
* more helper functions; inserts almost done - tests still required
* INSERTS are COMPLETE!! Just need to add more tests and fix one test
* added additional tests and bolstering for utils and config functions
* Moved new DAL and MPI as main and deleted old code, fixed bugs, added tests
* Tests all working - code is working
* minor clean-up
* Update dal.py
* Update test.yaml
* changed name of test file - weird errors during pick-up testing
* Updated comments to make the functions clearer and did a bit of clean-up
* more doc comments updated
* more comments added
* Some basic clean-up and reformatting
* Added/updated comments - removed some TODOs - other general clean-up
* changes based on testing with Brady
* Update phdi/linkage/core.py (Co-authored-by: Marcelle <[email protected]>)
* Added some tests - updated some tests - updated comments based upon feedback
* Fixed comments and expanded names as suggested in PR
* expanding table names
* fixed tests
* Fix _get_mpi_values and rename to _extract_patient_data
* update to match
* temp
* refer to self
* add function for correctly extracting given_names into their own table
* add tests for extract_given_name
* add tests for _get_mpi_records and update for None values in given names
* pass patient_ids to appropriate tables
* update tests for passing in name_id
* Implement transformations and inserts in the new MPI using the new flattened schema (#840)
* updated migration name
* Removed unused utils - fixed tests - added ordering to records in dict
* provide example in comments
* flake8 fixes
* fixed bug in identifier section of FHIR paths
* Update test_mpi.py
* Remove sorting
* added sort capability to the DAL to insert tables based upon requirements from the DB and FKs - updated tests
* updated identifier.value to identifier.patient_identifier - fixed a bug for finding MRN in the organize block query
* Renamed postgres_mpi.py to just mpi.py - fixed some linting issues - fixed tests
* remove test junk
* fix None and name_id generation issue
* implement new MPI for testing
* tests for correct matching of person and patient IDs
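The FK-aware insert ordering mentioned above ("added sort capability to dal to insert tables based upon requirement from DB and FKs") amounts to a topological sort: a table is inserted only after every table its foreign keys reference. A minimal sketch, assuming a hypothetical child-to-parent dependency map (the table names and the `sort_tables_by_fk` function are illustrative, not the actual phdi DAL API):

```python
# Sketch: order tables so each is inserted after the tables its FKs
# reference (depth-first topological sort over a dependency map).
def sort_tables_by_fk(dependencies: dict) -> list:
    ordered = []
    visited = set()

    def visit(table):
        if table in visited:
            return
        visited.add(table)
        # Visit FK targets (parents) first so they appear earlier.
        for parent in sorted(dependencies.get(table, set())):
            visit(parent)
        ordered.append(table)

    for table in sorted(dependencies):
        visit(table)
    return ordered


# Hypothetical dependency map: child table -> tables its FKs point at.
deps = {
    "person": set(),
    "patient": {"person"},
    "name": {"patient"},
    "given_name": {"name"},
}
print(sort_tables_by_fk(deps))  # ['person', 'patient', 'name', 'given_name']
```

Inserting in this order means a `patient` row never lands before the `person` row its FK requires.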
* removed dummy function
* update enhanced algo test
* fix null test to have a match
* add TODO
* update where connector client is imported from
* Fixed a couple of bugs in the MPI - still running into PK issues (I think this is where we are inserting a name with the same PK value)
* Fixed another bug. New name_id for every instance of name in patient record
* fix indent to allow multiple blocking passes
* simplify tests
* rip Brandon's easter eggs
* modify test_multi_element_blocking for new MPI schema
* remove obsolete function
* set matched if a person_id is supplied, or an external person_id is supplied and a person record is found with it
* Update test_mpi_utils.py
* Update test_mpi_utils.py
* removed test items
* remove old MPI connection env vars
* replace old client with new one
* remove old db table names
* point to feature branch
* clean up linkage function
* rename schema file
* update birthdate dob change
* Updated subquery for given_name to combine all the given_names into a single line separated by a space - ordering by given_name_index
* Update test_mpi.py
* Remove \u200b characters
* file rename
* rename and remove old schema
* update schema to allow varchar(7) for `sex` instead of varchar(3)
* added helper function to convert date -> str; updated tests for functions
* removed for flake8
* delete unused func
* turn off formatting
* added string version of date to test data
* made datetime_to_str more flexible, new tests
* long str -> multi-line
* add additional patient resources to test file for seeding
* add file to seed data in local db
* remove extra break
* Threshold log odds weights (#873)
* Threshold log odds weights
* Fix log-odds scoring
* Another odds update
* Conjoin given name
* clean up
* clean up pt 2
* remove _clean_up
* add additional patients
* make generic connector client name
* finished adding tests for multiple given names
* linebreak for docstring
* name change
* name change part II
* fix base connector client name
* fix base connector client name
* generalize connector client name
* remove copy
* testing
* change default external_source_id to IRIS
* add LAC as external source
* generalize description
* remove on conflict
* add IRIS external source data
* fix quoting
* add external_source_id
* fix index to refer to id, not name
* clean up testing
* add check for external_source_id existing
* Update containers/record-linkage/app/main.py
* add flake8 ignore

---------

Co-authored-by: robertmitchellv <[email protected]>
Co-authored-by: m-goggins <[email protected]>
Co-authored-by: Marcelle <[email protected]>
Co-authored-by: DanPaseltiner <[email protected]>
Co-authored-by: Nick Clyde <[email protected]>
Co-authored-by: bamader <[email protected]>
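The given_name flattening described above ("combine all the given_names into a single line separated by a space - ordering by given_name_index") can be expressed with Postgres's `STRING_AGG` aggregate. The table and column names below are assumptions about the flattened MPI schema, not the project's actual DDL:

```python
# Sketch of a subquery that collapses each name's given_name rows into a
# single space-separated string, ordered by given_name_index.
# Table/column names (given_name, name_id, given_name_index) are assumed.
GIVEN_NAME_SUBQUERY = (
    "SELECT gn.name_id, "
    "STRING_AGG(gn.given_name, ' ' ORDER BY gn.given_name_index) "
    "AS given_names "
    "FROM given_name gn "
    "GROUP BY gn.name_id"
)
```

Joining this subquery back on `name_id` yields one row per name with, e.g., `'JOHN JACOB'` instead of two separate given-name rows.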
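The date-to-string helper referenced above ("made datetime_to_str more flexible, new tests") might look like the following sketch, which tolerates `date`, `datetime`, `None`, and already-formatted strings; the signature and behavior are assumptions, not the real phdi helper:

```python
from datetime import date, datetime


def datetime_to_str(value, include_time=False):
    """Convert a date/datetime to ISO-style text; pass strings through.

    Sketch only: the actual phdi helper may differ. None is returned
    unchanged so callers can insert SQL NULLs directly.
    """
    if value is None:
        return None
    if isinstance(value, str):
        return value  # assume the caller already formatted it
    if isinstance(value, datetime):
        fmt = "%Y-%m-%d %H:%M:%S" if include_time else "%Y-%m-%d"
        return value.strftime(fmt)
    if isinstance(value, date):
        return value.strftime("%Y-%m-%d")
    raise TypeError(f"Unsupported type for datetime_to_str: {type(value)!r}")


print(datetime_to_str(date(2023, 1, 15)))  # 2023-01-15
```

Accepting strings unchanged is what lets test data carry a "string version of date" (as one commit adds) without the helper raising.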
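The "Threshold log odds weights" changes (#873) concern log-odds match scoring, where agreeing fields contribute weights and a pair is declared a match only if the total clears a cutoff. A generic sketch of that idea (the weights, field names, and threshold here are made up for illustration and are not the phdi values):

```python
# Generic sketch of thresholded log-odds record scoring: each field that
# agrees (and is non-empty) adds its log-odds weight; the pair is a
# candidate match when the total clears a threshold. All numbers are
# illustrative, not the project's trained weights.
def log_odds_score(record_a: dict, record_b: dict, weights: dict) -> float:
    return sum(
        weight
        for field, weight in weights.items()
        if record_a.get(field) and record_a.get(field) == record_b.get(field)
    )


weights = {"first_name": 6.8, "last_name": 6.3, "birthdate": 10.1}
a = {"first_name": "JOHN", "last_name": "DOE", "birthdate": "1990-05-01"}
b = {"first_name": "JOHN", "last_name": "DOE", "birthdate": "1990-05-01"}
score = log_odds_score(a, b, weights)
print(score >= 12.2)  # match when the score clears the illustrative threshold
```

Thresholding the summed weights (rather than counting matching fields) lets a strong identifier like birthdate outweigh several weak agreements.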