updating Lyon after finding out that many recordings were misattributed (in progress) #309
Replies: 3 comments 9 replies
-
Next, I fixed the metadata:
# working on the metadata
# I took metadata.py from bergelson (within EL1000)
mkdir -p metadata/original/
cp /Users/valentinthouzeau/Documents/projets/SES_language/data_EL1000/lyon/lyon_metadata.csv metadata/original/.
# 20211018 we had generated a new base for the metadata in extra/errors/lyon_EL1000_metadata.csv
# it didn't have all the needed info, so created a script to generate it:
python scripts/fix_metadata.py
# then adapted metadata.py
pip install git+ssh://[email protected]:/EL1000/tools.git --upgrade
python scripts/metadata.py lyon extra/errors/lyon_EL1000_metadata_sibs.csv
child-project compute-durations .
datalad save . -m "metadata"
datalad push
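Since the whole point of this update is that recordings were attached to the wrong children, a quick sanity check on the regenerated metadata may be worth running before the save. The following is only a sketch, not part of scripts/metadata.py: it assumes the recordings metadata is a CSV with `recording_filename` and `child_id` columns (as in the ChildProject recordings format), and the function names are made up for illustration.

```python
import csv
from collections import defaultdict


def conflicting_attributions(rows):
    """Return {recording_filename: [child_ids]} for files linked to several children.

    `rows` is any iterable of dicts with the (assumed) keys
    "recording_filename" and "child_id".
    """
    children = defaultdict(set)
    for row in rows:
        children[row["recording_filename"]].add(row["child_id"])
    # Keep only the recordings that appear under more than one child
    return {name: sorted(ids) for name, ids in children.items() if len(ids) > 1}


def check_csv(path):
    """Run the check on a CSV file (the path is supplied by the caller)."""
    with open(path, newline="") as handle:
        return conflicting_attributions(csv.DictReader(handle))
```

An empty result only means that no recording is listed under two children; it does not prove that each attribution is the right one.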
still left to double check
still left to do
-
To do the next batch of actions, it's better to be on the cluster, so I did the download there. A typical output line: get(ok): recordings/raw/e20130311_153528_008344.wav (file) [from origin...] There are 48 recordings in total; it looks like 39 were downloaded successfully and 9 failed. And yet relaunching the same command doesn't do anything: no error, but no success output either:
Just in case, I tried the next step:
The failure comes from an expectation of _1 and _2. I think this means I should split the recordings that belong to the same session before the conversion, not after. I'll pick this up later.
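To find out which of the 48 recordings actually arrived, one option is to look for dangling symlinks. This is a sketch under an assumption: that the dataset uses the usual git-annex layout, where a file whose content has not been fetched is a symlink pointing at a missing target. The function name and the `.wav` filter are illustrative only.

```python
import os


def split_present_missing(directory, suffix=".wav"):
    """Split the files in `directory` into (present, missing) name lists.

    A file counts as missing when it is a dangling symlink, which is how
    git-annex usually represents content that has not been downloaded.
    """
    present, missing = [], []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(suffix):
            continue
        path = os.path.join(directory, name)
        # os.path.exists follows symlinks, so it is False for a dangling link
        if os.path.exists(path):
            present.append(name)
        else:
            missing.append(name)
    return present, missing
```

Calling it on recordings/raw should give the list of files to retry one by one, which may show whether the 9 failures share something in common.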
-
done:
-
We found out that many of the .its and .wav files associated with specific child-days were wrong, so Lyon needs a major overhaul. In this S&T I'll explain my reasoning in the early stages of correction -- namely correcting the .its, audio recordings, vtc, and metadata. I'm not going to cover fixing the human annotations.
git checkout -b newrecs
scripts/creation.sh
to take notes there.[1]

I haven't been able to get the metadata right yet, so I cannot import annotations yet. Also missing are the analyses using vtc, to be done on oberon once the data are downloaded there.
Footnotes
1. I could have created a new script, but I thought that would be even more confusing, since after this overhaul there will be basically no trace of the previously created dataset.