Gaelic-Algorithmic-Research-Group/DataPipeline_Gaelic_Text

Preprocess for new datasets: preGOC, postGOC, or paired data

How-to guide

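Have a new file? Upload it to the corresponding folder and you're done ✅

  • preGOC: upload it to rowdata/preGOC/
  • postGOC: upload it to rowdata/postGOC/
  • paired data: upload both files to rowdata/pairedData/ with the same name, prefixed with preGOC_ and postGOC_ (e.g. preGOC_unique_name.txt and postGOC_unique_name.txt)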
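Before uploading paired data, a quick local check can catch a file whose counterpart is missing. This is a hypothetical helper (check_pairs is not part of this repo) and assumes pairs follow the preGOC_<name>.txt / postGOC_<name>.txt naming shown in the folder structure:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this repo: list files in a paired-data
# folder that have no counterpart. Assumes pairs are named
# preGOC_<name>.txt and postGOC_<name>.txt. Returns 1 if anything is missing.
check_pairs() {
  local dir="${1:-rowdata/pairedData}"
  local missing=0 f base
  for f in "$dir"/preGOC_*.txt; do
    [ -e "$f" ] || continue          # no matches: the glob stays literal, skip it
    base="${f##*/preGOC_}"           # strip the folder and prefix, keep <name>.txt
    if [ ! -e "$dir/postGOC_$base" ]; then
      echo "missing: postGOC_$base"
      missing=1
    fi
  done
  for f in "$dir"/postGOC_*.txt; do
    [ -e "$f" ] || continue
    base="${f##*/postGOC_}"
    if [ ! -e "$dir/preGOC_$base" ]; then
      echo "missing: preGOC_$base"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it from the repository root, e.g. check_pairs rowdata/pairedData && echo "all pairs complete".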

Output folder: https://github.com/Gaelic-Algorithmic-Research-Group/DataPipeline_Gaelic_Text/tree/main/processed


Folder structure

.
└── Main repo/
    ├── .github/
    │   └── workflows/
    │       └── actions.yml
    ├── rowdata/
    │   ├── preGOC/
    │   │   └── unique_name.txt
    │   ├── postGOC/
    │   │   └── unique_name.txt
    │   └── pairedData/
    │       ├── HERE_YOU_LL_FIND_PAIRED_DATA.MD
    │       ├── preGOC_unique_name.txt
    │       └── postGOC_unique_name.txt
    └── processed/
        ├── preGOC/
        │   └── preGOC_all_single.txt
        ├── postGOC/
        │   └── postGOC_all_single.txt
        └── pairedData/
            ├── HERE_YOU_LL_FIND_PAIRED_DATA.MD
            ├── preGOC_all_paired.txt
            └── postGOC_all_paired.txt
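The processing itself is presumably defined in .github/workflows/actions.yml, which isn't reproduced here. As a rough, hypothetical illustration of how rowdata/ maps to processed/ for single (unpaired) data, build_single concatenates every .txt file in rowdata/<set>/ into processed/<set>/<set>_all_single.txt; the real workflow may also clean, deduplicate, or shuffle.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the repo's actual workflow: concatenate every
# rowdata/<name>/*.txt file into processed/<name>/<name>_all_single.txt.
# Usage: build_single preGOC [repo_root]
build_single() {
  local name="$1" root="${2:-.}"
  local out="$root/processed/$name/${name}_all_single.txt"
  local f
  mkdir -p "$root/processed/$name"
  : > "$out"                          # start from an empty output file
  for f in "$root/rowdata/$name"/*.txt; do
    [ -e "$f" ] || continue           # folder has no .txt files yet
    cat "$f" >> "$out"
    # if the file doesn't end with a newline, add one so files don't run together
    if [ -n "$(tail -c 1 "$f")" ]; then
      printf '\n' >> "$out"
    fi
  done
}
```

For example: for s in preGOC postGOC; do build_single "$s"; done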

Notes

  • Training data shuffling (👍 gradient-friendly)
  • TBC
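Create the folder structure from the repository root (-p creates missing parent folders and makes each command safe to re-run):

mkdir -p processed/preGOC
mkdir -p processed/postGOC
mkdir -p processed/pairedData
mkdir -p rowdata/preGOC
mkdir -p rowdata/postGOC
mkdir -p rowdata/pairedData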
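For the shuffling mentioned in the Notes, paired data needs extra care: each preGOC line must move together with its postGOC counterpart. A minimal sketch, assuming the two paired files are line-aligned (line i of one matches line i of the other) and contain no tab characters; shuffle_paired is a hypothetical name, and shuf comes from GNU coreutils:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: shuffle two line-aligned files with the same permutation.
# Assumes neither file contains tab characters. Requires GNU coreutils (shuf).
# Usage: shuffle_paired pre.txt post.txt pre_out.txt post_out.txt
shuffle_paired() {
  local pre="$1" post="$2" out_pre="$3" out_post="$4"
  local tmp
  if [ "$(wc -l < "$pre")" -ne "$(wc -l < "$post")" ]; then
    echo "line counts differ: $pre vs $post" >&2
    return 1
  fi
  tmp=$(mktemp)
  # paste joins line i of both files with a tab; shuf reorders whole rows,
  # so every pair stays together.
  paste "$pre" "$post" | shuf > "$tmp"
  cut -f1 "$tmp" > "$out_pre"         # left column back to the preGOC side
  cut -f2 "$tmp" > "$out_post"        # right column back to the postGOC side
  rm -f "$tmp"
}
```

For example: shuffle_paired processed/pairedData/preGOC_all_paired.txt processed/pairedData/postGOC_all_paired.txt pre_shuffled.txt post_shuffled.txt. Shuffling the joined rows, rather than each file on its own, is what keeps the pairs aligned.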
