-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
conformance-profiles: Database table and worker to process data #78
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,25 @@ | ||
# Profile Normalised Data | ||
|
||
Not Written Yet | ||
This will profile all normalised data against the data profiles and store the results in the database. | ||
|
||
To run this: | ||
|
||
`$ node ./src/bin/profile-normalised-data.js` | ||
|
||
It can be stopped at any time and it will not leave the database in a bad state or lose to much work. | ||
|
||
When restarted it will pick up where it left off. | ||
|
||
## Database Storage & Errors | ||
|
||
It will store the results of this in the `normalised_data_profile_results` table. | ||
|
||
Rows are stored per item of normalised data and per profile, | ||
so you should expect this table to have 3 times as many rows as `normalised_data` (if there are 3 profiles). | ||
|
||
For any data profile and normalised data item, there are 4 states: | ||
|
||
* no row - we haven't tried to run the check yet | ||
* a row with `checked=FALSE` - we tried to run the check but it went wrong. See `error_checking_message`. | ||
* a row with `checked=TRUE` and nothing in `results` - we checked it and the data passed the check! | ||
* a row with `checked=TRUE` and things in `results` - we checked it and the data failed the check. See `results`. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
#!/usr/bin/env node | ||
import profile_normalised_data_all from '../lib/profile-normalised-data.js'; | ||
|
||
|
||
profile_normalised_data_all(); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
CREATE TABLE normalised_data_profile_results ( | ||
normalised_data_id BIGINT NOT NULL, | ||
profile_name TEXT NOT NULL, | ||
checked BOOLEAN NOT NULL, | ||
error_checking_message TEXT NULL, | ||
results JSONB NULL, | ||
PRIMARY KEY(normalised_data_id, profile_name), | ||
CONSTRAINT normalised_data_profile_results_normalised_data_id FOREIGN KEY (normalised_data_id) REFERENCES normalised_data(id) | ||
); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
import { database_pool } from './database.js'; | ||
import Settings from './settings.js'; | ||
import apply_data_profile from './util-data-profile.js'; | ||
|
||
|
||
|
||
async function profile_normalised_data_all() { | ||
for(var profile_name of Settings.dataProfiles) { | ||
// not await - run each profile at once | ||
profile_normalised_data_all_for_profile(profile_name); | ||
}; | ||
} | ||
|
||
async function profile_normalised_data_all_for_profile(profile_name) { | ||
|
||
const select_sql = 'SELECT normalised_data.* FROM normalised_data '+ | ||
'LEFT JOIN normalised_data_profile_results '+ | ||
'ON normalised_data_profile_results.normalised_data_id = normalised_data.id AND normalised_data_profile_results.profile_name=$1 '+ | ||
'WHERE normalised_data_profile_results.normalised_data_id IS NULL AND normalised_data.data_deleted=FALSE '+ | ||
'ORDER BY normalised_data.updated_at ASC LIMIT 10'; | ||
|
||
while(true) { | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I usually try to make sure there is some kind of safety catch in a while true, might be worth adding if there is way to do that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Point - The catch() block should have break in it - that will apply in several places tho so I'll do that in another P.R. |
||
let rows = [] | ||
|
||
// Step 1 - load data to process | ||
// we open and make sure we CLOSE the database connection after this, so the DB connection is not held open when processing in an unneeded manner | ||
const client = await database_pool.connect(); | ||
try { | ||
const res_find_raw_data = await client.query(select_sql, [profile_name]); | ||
if (res_find_raw_data.rows.length == 0) { | ||
break; | ||
} | ||
// Make sure we just store raw data and no database cursors, etc | ||
for (var raw_data of res_find_raw_data.rows) { | ||
rows.push(raw_data) | ||
} | ||
} catch(error) { | ||
console.error("ERROR validate_raw_data_all"); | ||
console.error(error); | ||
} finally { | ||
client.release() | ||
} | ||
|
||
// Step 2 - process each item of data we got | ||
for (var raw_data of rows) { | ||
await profile_normalised_data_for_item_for_profile(raw_data, profile_name); | ||
} | ||
|
||
} | ||
|
||
} | ||
|
||
async function profile_normalised_data_for_item_for_profile(normalised_data, profile_name) { | ||
|
||
//console.log("Profiling Normalised Data id "+ normalised_data.id + " for Profile " + profile_name); | ||
|
||
const results = await apply_data_profile(normalised_data.data, profile_name); | ||
|
||
const client = await database_pool.connect(); | ||
try { | ||
if (results.done) { | ||
await client.query( | ||
"INSERT INTO normalised_data_profile_results (normalised_data_id, profile_name, checked, results) VALUES ($1, $2, TRUE, $3)", | ||
[normalised_data.id, profile_name, JSON.stringify(results.results)] | ||
); | ||
} else { | ||
await client.query( | ||
"INSERT INTO normalised_data_profile_results (normalised_data_id, profile_name, checked, error_checking_message) VALUES ($1, $2, FALSE, $3)", | ||
[normalised_data.id, profile_name, results.error] | ||
); | ||
} | ||
} catch(error) { | ||
console.error("ERROR profile_normalised_data_for_item_for_profile"); | ||
console.error(error); | ||
} finally { | ||
client.release() | ||
} | ||
|
||
} | ||
|
||
|
||
export { | ||
profile_normalised_data_all, | ||
}; | ||
|
||
export default profile_normalised_data_all; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use back ticks ` for multi line strings rather than concatenation at run time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ta