Commit

Fixed some typos and did some light-touch refactoring.
mihaeladuta committed Feb 6, 2024
1 parent d0c0ae4 commit d367c5f
Showing 14 changed files with 190 additions and 146 deletions.
Binary file removed data/raw/.DS_Store
Binary file not shown.
8 changes: 4 additions & 4 deletions logs/EnvironmentStatementsInstitutionLevel.log
@@ -1,4 +1,4 @@
2024-02-06 14:48:15,335 [INFO] EnvironmentStatementsInstitutionLevel - read data from 'data/processed/environment_statements/extracted/institution/'
2024-02-06 14:48:15,335 [INFO] EnvironmentStatementsInstitutionLevel - statements: 143, sections: 4
2024-02-06 14:48:15,782 [INFO] EnvironmentStatementsInstitutionLevel - prepared institution statements: 143 records, 5 columns
2024-02-06 14:48:15,828 [INFO] EnvironmentStatementsInstitutionLevel - write dataset to 'data/processed/environment_statements/prepared/EnvironmentStatementsInstitutionLevel.parquet'
2024-02-06 15:34:18,160 [INFO] EnvironmentStatementsInstitutionLevel - read data from 'data/processed/environment_statements/extracted/institution/'
2024-02-06 15:34:18,160 [INFO] EnvironmentStatementsInstitutionLevel - statements: 143, sections: 4
2024-02-06 15:34:18,607 [INFO] EnvironmentStatementsInstitutionLevel - prepared institution statements: 143 records, 5 columns
2024-02-06 15:34:18,655 [INFO] EnvironmentStatementsInstitutionLevel - write dataset to 'data/processed/environment_statements/prepared/EnvironmentStatementsInstitutionLevel.parquet'
8 changes: 4 additions & 4 deletions logs/EnvironmentStatementsUnitLevel.log
@@ -1,4 +1,4 @@
2024-02-06 14:48:15,314 [INFO] EnvironmentStatementsUnitLevel - read data from 'data/processed/environment_statements/extracted/unit/'
2024-02-06 14:48:15,314 [INFO] EnvironmentStatementsUnitLevel - statements: 1874, sections: 4
2024-02-06 14:48:33,869 [INFO] EnvironmentStatementsUnitLevel - prepared statements: 1874 records
2024-02-06 14:48:34,266 [INFO] EnvironmentStatementsUnitLevel - write dataset to 'data/processed/environment_statements/prepared/EnvironmentStatementsUnitLevel.parquet'
2024-02-06 15:34:18,145 [INFO] EnvironmentStatementsUnitLevel - read data from 'data/processed/environment_statements/extracted/unit/'
2024-02-06 15:34:18,145 [INFO] EnvironmentStatementsUnitLevel - statements: 1874, sections: 4
2024-02-06 15:34:37,064 [INFO] EnvironmentStatementsUnitLevel - prepared statements: 1874 records
2024-02-06 15:34:37,510 [INFO] EnvironmentStatementsUnitLevel - write dataset to 'data/processed/environment_statements/prepared/EnvironmentStatementsUnitLevel.parquet'
20 changes: 10 additions & 10 deletions logs/ImpactCaseStudies.log
@@ -1,10 +1,10 @@
2024-02-06 14:48:19,039 [INFO] ImpactCaseStudies - read sheet from 'data/raw/REF-2021-Submissions-All-2022-07-27.xlsx'
2024-02-06 14:48:19,581 [INFO] ImpactCaseStudies - parsed sheet: 6361 records
2024-02-06 14:48:19,582 [INFO] ImpactCaseStudies - rename 'Main panel' to 'Main panel code'
2024-02-06 14:48:19,584 [INFO] ImpactCaseStudies - replace '['/', ':']' with '_' in 'Institution name'
2024-02-06 14:48:19,585 [INFO] ImpactCaseStudies - add columns for panel names
2024-02-06 14:48:19,586 [INFO] ImpactCaseStudies - shift columns from title to the left to fix raw data issue
2024-02-06 14:48:22,744 [INFO] ImpactCaseStudies - replace styling characters in ['1. Summary of the impact', '2. Underpinning research', '3. References to the research', '4. Details of the impact', '5. Sources to corroborate the impact']
2024-02-06 14:48:22,746 [INFO] ImpactCaseStudies - drop columns '['Formal partners', 'Countries', '2. Underpinning research', '5. Sources to corroborate the impact', 'Researcher ORCIDs', 'Main panel code', 'Global research identifiers', 'Grant funding', 'Unit of assessment number', '3. References to the research', 'Institution UKPRN code', 'Is continued from 2014']'
2024-02-06 14:48:22,774 [INFO] ImpactCaseStudies - make categorical ['Joint submission', 'Unit of assessment name', 'Main panel name', 'Multiple submission letter', 'Institution name', 'Multiple submission name']
2024-02-06 14:48:22,980 [INFO] ImpactCaseStudies - write dataset to 'data/processed/sheets/ImpactCaseStudies.parquet'
2024-02-06 15:34:21,959 [INFO] ImpactCaseStudies - read sheet from 'data/raw/REF-2021-Submissions-All-2022-07-27.xlsx'
2024-02-06 15:34:22,544 [INFO] ImpactCaseStudies - parsed sheet: 6361 records
2024-02-06 15:34:22,545 [INFO] ImpactCaseStudies - rename 'Main panel' to 'Main panel code'
2024-02-06 15:34:22,546 [INFO] ImpactCaseStudies - replace '['/', ':']' with '_' in 'Institution name'
2024-02-06 15:34:22,548 [INFO] ImpactCaseStudies - add columns for panel names
2024-02-06 15:34:22,549 [INFO] ImpactCaseStudies - shift columns from title to the left to fix raw data issue
2024-02-06 15:34:25,546 [INFO] ImpactCaseStudies - replace styling characters in ['1. Summary of the impact', '2. Underpinning research', '3. References to the research', '4. Details of the impact', '5. Sources to corroborate the impact']
2024-02-06 15:34:25,549 [INFO] ImpactCaseStudies - drop columns '['Researcher ORCIDs', 'Is continued from 2014', '5. Sources to corroborate the impact', '2. Underpinning research', '3. References to the research', 'Countries', 'Institution UKPRN code', 'Global research identifiers', 'Formal partners', 'Unit of assessment number', 'Grant funding', 'Main panel code']'
2024-02-06 15:34:25,564 [INFO] ImpactCaseStudies - make categorical ['Multiple submission letter', 'Institution name', 'Main panel name', 'Unit of assessment name', 'Multiple submission name', 'Joint submission']
2024-02-06 15:34:25,819 [INFO] ImpactCaseStudies - write dataset to 'data/processed/sheets/ImpactCaseStudies.parquet'
24 changes: 12 additions & 12 deletions logs/Outputs.log
@@ -1,12 +1,12 @@
2024-02-06 14:48:19,045 [INFO] Outputs - read sheet from 'data/raw/REF-2021-Submissions-All-2022-07-27.xlsx'
2024-02-06 14:48:35,874 [INFO] Outputs - parsed sheet: 185353 records
2024-02-06 14:48:35,897 [INFO] Outputs - rename 'Main panel' to 'Main panel code'
2024-02-06 14:48:35,928 [INFO] Outputs - rename 'Output type' to 'Output type code'
2024-02-06 14:48:35,975 [INFO] Outputs - replace '['/', ':']' with '_' in 'Institution name'
2024-02-06 14:48:36,016 [INFO] Outputs - add columns for panel names
2024-02-06 14:48:36,754 [INFO] Outputs - replace styling characters in ['Title']
2024-02-06 14:48:36,791 [INFO] Outputs - add columns for output types names
2024-02-06 14:48:36,795 [INFO] Outputs - make output year categorical
2024-02-06 14:48:36,836 [INFO] Outputs - drop columns '['Main panel code', 'Output type code', 'Institution UKPRN code', 'Unit of assessment number']'
2024-02-06 14:48:36,849 [INFO] Outputs - make categorical ['Propose double weighting', 'Multiple submission letter', 'Non-English', 'Forensic science', 'Multiple submission name', 'Research group', 'Delayed by COVID19', 'Institution name', 'Joint submission', 'Is reserve output', 'Open access status', 'Citations applicable', 'Interdisciplinary', 'Unit of assessment name', 'Criminology', 'Cross-referral requested', 'Output type', 'Main panel name']
2024-02-06 14:48:37,173 [INFO] Outputs - write dataset to 'data/processed/sheets/Outputs.parquet'
2024-02-06 15:34:21,959 [INFO] Outputs - read sheet from 'data/raw/REF-2021-Submissions-All-2022-07-27.xlsx'
2024-02-06 15:34:38,633 [INFO] Outputs - parsed sheet: 185353 records
2024-02-06 15:34:38,656 [INFO] Outputs - rename 'Main panel' to 'Main panel code'
2024-02-06 15:34:38,690 [INFO] Outputs - rename 'Output type' to 'Output type code'
2024-02-06 15:34:38,737 [INFO] Outputs - replace '['/', ':']' with '_' in 'Institution name'
2024-02-06 15:34:38,777 [INFO] Outputs - add columns for panel names
2024-02-06 15:34:39,528 [INFO] Outputs - replace styling characters in ['Title']
2024-02-06 15:34:39,563 [INFO] Outputs - add columns for output types names
2024-02-06 15:34:39,567 [INFO] Outputs - make output year categorical
2024-02-06 15:34:39,612 [INFO] Outputs - drop columns '['Output type code', 'Institution UKPRN code', 'Unit of assessment number', 'Main panel code']'
2024-02-06 15:34:39,626 [INFO] Outputs - make categorical ['Propose double weighting', 'Forensic science', 'Open access status', 'Unit of assessment name', 'Delayed by COVID19', 'Interdisciplinary', 'Output type', 'Multiple submission letter', 'Joint submission', 'Is reserve output', 'Non-English', 'Criminology', 'Multiple submission name', 'Research group', 'Institution name', 'Citations applicable', 'Main panel name', 'Cross-referral requested']
2024-02-06 15:34:39,938 [INFO] Outputs - write dataset to 'data/processed/sheets/Outputs.parquet'
18 changes: 9 additions & 9 deletions logs/ResearchDoctoralDegreesAwarded.log
@@ -1,9 +1,9 @@
2024-02-06 14:48:19,038 [INFO] ResearchDoctoralDegreesAwarded - read sheet from 'data/raw/REF-2021-Submissions-All-2022-07-27.xlsx'
2024-02-06 14:48:19,115 [INFO] ResearchDoctoralDegreesAwarded - parsed sheet: 1888 records
2024-02-06 14:48:19,115 [INFO] ResearchDoctoralDegreesAwarded - rename 'Main panel' to 'Main panel code'
2024-02-06 14:48:19,116 [INFO] ResearchDoctoralDegreesAwarded - replace '['/', ':']' with '_' in 'Institution name'
2024-02-06 14:48:19,117 [INFO] ResearchDoctoralDegreesAwarded - add columns for panel names
2024-02-06 14:48:19,117 [INFO] ResearchDoctoralDegreesAwarded - calculate total number of degrees awarded
2024-02-06 14:48:19,118 [INFO] ResearchDoctoralDegreesAwarded - drop columns '['Main panel code', 'Institution UKPRN code', 'Unit of assessment number']'
2024-02-06 14:48:19,118 [INFO] ResearchDoctoralDegreesAwarded - make categorical ['Joint submission', 'Multiple submission letter', 'Institution name', 'Multiple submission name', 'Main panel name', 'Unit of assessment name']
2024-02-06 14:48:19,135 [INFO] ResearchDoctoralDegreesAwarded - write dataset to 'data/processed/sheets/ResearchDoctoralDegreesAwarded.parquet'
2024-02-06 15:34:21,959 [INFO] ResearchDoctoralDegreesAwarded - read sheet from 'data/raw/REF-2021-Submissions-All-2022-07-27.xlsx'
2024-02-06 15:34:22,043 [INFO] ResearchDoctoralDegreesAwarded - parsed sheet: 1888 records
2024-02-06 15:34:22,043 [INFO] ResearchDoctoralDegreesAwarded - rename 'Main panel' to 'Main panel code'
2024-02-06 15:34:22,044 [INFO] ResearchDoctoralDegreesAwarded - replace '['/', ':']' with '_' in 'Institution name'
2024-02-06 15:34:22,045 [INFO] ResearchDoctoralDegreesAwarded - add columns for panel names
2024-02-06 15:34:22,046 [INFO] ResearchDoctoralDegreesAwarded - calculate total number of degrees awarded
2024-02-06 15:34:22,047 [INFO] ResearchDoctoralDegreesAwarded - drop columns '['Unit of assessment number', 'Institution UKPRN code', 'Main panel code']'
2024-02-06 15:34:22,047 [INFO] ResearchDoctoralDegreesAwarded - make categorical ['Main panel name', 'Multiple submission name', 'Multiple submission letter', 'Institution name', 'Joint submission', 'Unit of assessment name']
2024-02-06 15:34:22,061 [INFO] ResearchDoctoralDegreesAwarded - write dataset to 'data/processed/sheets/ResearchDoctoralDegreesAwarded.parquet'
18 changes: 9 additions & 9 deletions logs/ResearchGroups.log
@@ -1,9 +1,9 @@
2024-02-06 14:48:19,039 [INFO] ResearchGroups - read sheet from 'data/raw/REF-2021-Submissions-All-2022-07-27.xlsx'
2024-02-06 14:48:19,093 [INFO] ResearchGroups - parsed sheet: 2036 records
2024-02-06 14:48:19,093 [INFO] ResearchGroups - rename 'Main panel' to 'Main panel code'
2024-02-06 14:48:19,095 [INFO] ResearchGroups - replace '['/', ':']' with '_' in 'Institution name'
2024-02-06 14:48:19,097 [INFO] ResearchGroups - add columns for panel names
2024-02-06 14:48:19,097 [INFO] ResearchGroups - make group code categorical
2024-02-06 14:48:19,098 [INFO] ResearchGroups - drop columns '['Main panel code', 'Unit of assessment number', 'Institution UKPRN code']'
2024-02-06 14:48:19,098 [INFO] ResearchGroups - make categorical ['Unit of assessment name', 'Main panel name', 'Institution name', 'Joint submission', 'Multiple submission name', 'Multiple submission letter']
2024-02-06 14:48:19,113 [INFO] ResearchGroups - write dataset to 'data/processed/sheets/ResearchGroups.parquet'
2024-02-06 15:34:21,961 [INFO] ResearchGroups - read sheet from 'data/raw/REF-2021-Submissions-All-2022-07-27.xlsx'
2024-02-06 15:34:22,022 [INFO] ResearchGroups - parsed sheet: 2036 records
2024-02-06 15:34:22,022 [INFO] ResearchGroups - rename 'Main panel' to 'Main panel code'
2024-02-06 15:34:22,023 [INFO] ResearchGroups - replace '['/', ':']' with '_' in 'Institution name'
2024-02-06 15:34:22,027 [INFO] ResearchGroups - add columns for panel names
2024-02-06 15:34:22,028 [INFO] ResearchGroups - make group code categorical
2024-02-06 15:34:22,028 [INFO] ResearchGroups - drop columns '['Institution UKPRN code', 'Unit of assessment number', 'Main panel code']'
2024-02-06 15:34:22,028 [INFO] ResearchGroups - make categorical ['Main panel name', 'Unit of assessment name', 'Joint submission', 'Multiple submission letter', 'Multiple submission name', 'Institution name']
2024-02-06 15:34:22,042 [INFO] ResearchGroups - write dataset to 'data/processed/sheets/ResearchGroups.parquet'
18 changes: 9 additions & 9 deletions logs/ResearchIncome.log
@@ -1,9 +1,9 @@
2024-02-06 14:48:19,040 [INFO] ResearchIncome - read sheet from 'data/raw/REF-2021-Submissions-All-2022-07-27.xlsx'
2024-02-06 14:48:20,220 [INFO] ResearchIncome - parsed sheet: 28637 records
2024-02-06 14:48:20,221 [INFO] ResearchIncome - rename 'Main panel' to 'Main panel code'
2024-02-06 14:48:20,227 [INFO] ResearchIncome - replace '['/', ':']' with '_' in 'Institution name'
2024-02-06 14:48:20,230 [INFO] ResearchIncome - add columns for panel names
2024-02-06 14:48:20,233 [INFO] ResearchIncome - make income source categorical
2024-02-06 14:48:20,233 [INFO] ResearchIncome - drop columns '['Main panel code', 'Unit of assessment number', 'Institution UKPRN code']'
2024-02-06 14:48:20,234 [INFO] ResearchIncome - make categorical ['Unit of assessment name', 'Main panel name', 'Multiple submission letter', 'Multiple submission name', 'Institution name', 'Joint submission']
2024-02-06 14:48:20,259 [INFO] ResearchIncome - write dataset to 'data/processed/sheets/ResearchIncome.parquet'
2024-02-06 15:34:21,987 [INFO] ResearchIncome - read sheet from 'data/raw/REF-2021-Submissions-All-2022-07-27.xlsx'
2024-02-06 15:34:23,245 [INFO] ResearchIncome - parsed sheet: 28637 records
2024-02-06 15:34:23,245 [INFO] ResearchIncome - rename 'Main panel' to 'Main panel code'
2024-02-06 15:34:23,252 [INFO] ResearchIncome - replace '['/', ':']' with '_' in 'Institution name'
2024-02-06 15:34:23,255 [INFO] ResearchIncome - add columns for panel names
2024-02-06 15:34:23,258 [INFO] ResearchIncome - make income source categorical
2024-02-06 15:34:23,259 [INFO] ResearchIncome - drop columns '['Institution UKPRN code', 'Unit of assessment number', 'Main panel code']'
2024-02-06 15:34:23,259 [INFO] ResearchIncome - make categorical ['Main panel name', 'Multiple submission name', 'Institution name', 'Multiple submission letter', 'Joint submission', 'Unit of assessment name']
2024-02-06 15:34:23,285 [INFO] ResearchIncome - write dataset to 'data/processed/sheets/ResearchIncome.parquet'
18 changes: 9 additions & 9 deletions logs/ResearchIncomeInKind.log
@@ -1,9 +1,9 @@
2024-02-06 14:48:19,038 [INFO] ResearchIncomeInKind - read sheet from 'data/raw/REF-2021-Submissions-All-2022-07-27.xlsx'
2024-02-06 14:48:19,229 [INFO] ResearchIncomeInKind - parsed sheet: 4093 records
2024-02-06 14:48:19,230 [INFO] ResearchIncomeInKind - rename 'Main panel' to 'Main panel code'
2024-02-06 14:48:19,231 [INFO] ResearchIncomeInKind - replace '['/', ':']' with '_' in 'Institution name'
2024-02-06 14:48:19,232 [INFO] ResearchIncomeInKind - add columns for panel names
2024-02-06 14:48:19,233 [INFO] ResearchIncomeInKind - make income source categorical
2024-02-06 14:48:19,233 [INFO] ResearchIncomeInKind - drop columns '['Institution UKPRN code', 'Unit of assessment number', 'Main panel code']'
2024-02-06 14:48:19,233 [INFO] ResearchIncomeInKind - make categorical ['Unit of assessment name', 'Multiple submission name', 'Joint submission', 'Institution name', 'Main panel name', 'Multiple submission letter']
2024-02-06 14:48:19,248 [INFO] ResearchIncomeInKind - write dataset to 'data/processed/sheets/ResearchIncomeInKind.parquet'
2024-02-06 15:34:21,960 [INFO] ResearchIncomeInKind - read sheet from 'data/raw/REF-2021-Submissions-All-2022-07-27.xlsx'
2024-02-06 15:34:22,161 [INFO] ResearchIncomeInKind - parsed sheet: 4093 records
2024-02-06 15:34:22,161 [INFO] ResearchIncomeInKind - rename 'Main panel' to 'Main panel code'
2024-02-06 15:34:22,163 [INFO] ResearchIncomeInKind - replace '['/', ':']' with '_' in 'Institution name'
2024-02-06 15:34:22,164 [INFO] ResearchIncomeInKind - add columns for panel names
2024-02-06 15:34:22,164 [INFO] ResearchIncomeInKind - make income source categorical
2024-02-06 15:34:22,165 [INFO] ResearchIncomeInKind - drop columns '['Institution UKPRN code', 'Unit of assessment number', 'Main panel code']'
2024-02-06 15:34:22,165 [INFO] ResearchIncomeInKind - make categorical ['Unit of assessment name', 'Institution name', 'Multiple submission name', 'Multiple submission letter', 'Main panel name', 'Joint submission']
2024-02-06 15:34:22,178 [INFO] ResearchIncomeInKind - write dataset to 'data/processed/sheets/ResearchIncomeInKind.parquet'