Need help to checkout, corrupt data? #120

Closed
Slashdacoda opened this issue Apr 12, 2021 · 6 comments

Comments

@Slashdacoda
Contributor

image

image

@kermitt2
Owner

Hello @Slashdacoda !

I've just tried and have seen no problem:

lopez@work:~/tmp$ git clone https://github.com/kermitt2/entity-fishing
Cloning into 'entity-fishing'...
remote: Enumerating objects: 511, done.
remote: Counting objects: 100% (511/511), done.
remote: Compressing objects: 100% (291/291), done.
remote: Total 14526 (delta 222), reused 375 (delta 148), pack-reused 14015
Receiving objects: 100% (14526/14526), 611.58 MiB | 1.98 MiB/s, done.
Resolving deltas: 100% (6480/6480), done.
Updating files: 100% (2798/2798), done.
lopez@work:~/tmp$ 

What's your OS and version of git?

You can always try with the zip, you might be luckier, e.g.:

wget https://github.com/kermitt2/entity-fishing/archive/refs/heads/master.zip

@Slashdacoda
Contributor Author

Hey @kermitt2

Windows 10 Pro:
image

A bit of troubleshooting:

After updating to Git 2.31.1 (https://gitforwindows.org), it is still the same in Git Bash:
image

On Windows Terminal:
image

After installing Cygwin 2.905 (64 bit):
image

I think this is a Windows/filesystem-related problem: https://brendanforster.com/notes/fixing-invalid-git-paths-on-windows/

It looks like a character problem. In my case the message is:
image

The fix is probably related to some character in a path, maybe:

https://github.com/kermitt2/entity-fishing/blob/master/data/corpus/corpus-long/wikipedia/RawText/Alfred_Conkling_Coxe%2C_Sr.

Why is there a %2C (i.e. a comma) in a filename?
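For what it's worth, the %2C only appears in the URL: it is the percent-encoded form of a comma. A minimal sketch of the decoding, using Python's standard `urllib.parse` (the variable names are just for illustration):

```python
from urllib.parse import quote, unquote

# Last path segment of the GitHub URL above, as it appears in the browser.
url_segment = "Alfred_Conkling_Coxe%2C_Sr."

# Percent-decoding gives the file name as it is stored in the repository.
decoded_name = unquote(url_segment)
print(decoded_name)

# Encoding the decoded name again (with no extra "safe" characters)
# reproduces the URL form, so the comma is the only encoded character.
reencoded = quote(decoded_name, safe="")
print(reencoded == url_segment)
```

Note that the decoded name ends with a dot, which may matter more on Windows than the comma does.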

@Slashdacoda
Contributor Author

Slashdacoda commented Apr 13, 2021

Update: on another PC with Windows 10 it works, that's weird ^^

Nevertheless, I will try these steps to fix my environment: https://brendanforster.com/notes/fixing-invalid-git-paths-on-windows/

Slashdacoda added a commit to Slashdacoda/entity-fishing that referenced this issue Apr 13, 2021
Slashdacoda added a commit to Slashdacoda/entity-fishing that referenced this issue Apr 13, 2021
Slashdacoda pushed a commit to Slashdacoda/entity-fishing that referenced this issue Apr 13, 2021
@Slashdacoda
Contributor Author

OK, after all, the problem seems to be the trailing dot in the file names.

My environment can't find the 2 files that follow this naming scheme. I figured it out when recommitting the changed file name:

image

A possible solution is to rename them without the dot at the end of the name. With that proposal in mind, I am asking myself:

  1. Is it enough to rename them, or do we have to change other things in other parts of the project?
  2. Why does only my environment have problems with this naming scheme? "blalb.c." gets identified as a C file.
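As background: Microsoft's file naming guidance advises against names that end in a period or a space, so a trailing dot is a plausible culprit. A rough sketch for listing such names in a working copy, assuming it is run on a machine where the clone succeeds (the function names are made up for this example):

```python
import os


def has_windows_unsafe_ending(name):
    """Return True if a file or directory name ends with '.' or ' '."""
    return name.endswith((".", " "))


def find_unsafe_paths(root):
    """Walk `root` and collect every path whose last component ends badly."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if has_windows_unsafe_ending(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Path taken from the URL quoted earlier in this thread.
    for path in find_unsafe_paths("data/corpus/corpus-long/wikipedia/RawText"):
        print(path)
```

This only reports candidates. It does not explain why one Windows 10 machine fails while another works.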

@kermitt2
Owner

Hello !

The data you are pointing to come from an external evaluation corpus "Wikipedia" created by:

Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. Learning to rank: from pairwise approach to listwise approach. In Zoubin Ghahramani, editor, Machine Learning, Proceedings of the Twenty-Fourth International Conference (ICML 2007), Corvallis, Oregon, USA, June 20-24, 2007, volume 227 of ACM International Conference Proceeding Series, pages 129–136. ACM. DOI <https://doi.org/10.1145/1273496.1273513>.

and they use the Wikipedia article name as file name - bad practice for file portability, but it's not our choice.

The file names are referenced in data/corpus/corpus-long/wikipedia/wikipedia.xml, @docName, that's it.

I guess there is no problem with renaming these files (this corpus is not very useful beyond old system comparison, and is not updated), just be sure to rename them also in the corresponding wikipedia.xml for consistency... PR welcome ! :)
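A hedged sketch of how the rename and the wikipedia.xml update could be kept in sync. It assumes the names appear in the XML literally as `docName="..."` attributes with no XML escaping, which has not been checked against the actual file, and all function names are hypothetical. It has to run on a system where the files can be checked out (e.g. Linux or macOS).

```python
from pathlib import Path


def plan_renames(names):
    """Map each name ending in '.' or ' ' to the name without that ending."""
    plan = {}
    for name in names:
        cleaned = name.rstrip(". ")
        if cleaned and cleaned != name:
            plan[name] = cleaned
    return plan


def update_doc_names(xml_text, plan):
    """Rewrite docName="old" to docName="new" (assumed attribute layout)."""
    for old, new in plan.items():
        xml_text = xml_text.replace(f'docName="{old}"', f'docName="{new}"')
    return xml_text


def rename_and_update(raw_dir, xml_path):
    """Rename the affected files in raw_dir, then patch the XML to match."""
    raw_dir, xml_path = Path(raw_dir), Path(xml_path)
    plan = plan_renames(p.name for p in raw_dir.iterdir())
    # Refuse to overwrite anything: check every target before touching files.
    for new in plan.values():
        if (raw_dir / new).exists():
            raise FileExistsError(raw_dir / new)
    for old, new in plan.items():
        (raw_dir / old).rename(raw_dir / new)
    xml_text = xml_path.read_text(encoding="utf-8")
    xml_path.write_text(update_doc_names(xml_text, plan), encoding="utf-8")
    return plan
```

Calling `rename_and_update("data/corpus/corpus-long/wikipedia/RawText", "data/corpus/corpus-long/wikipedia/wikipedia.xml")` from the repository root would apply it. Reviewing the result with `git status` and `git diff` before committing seems prudent.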

@Slashdacoda
Contributor Author

The checkout problem should be fixed, thx for the support @kermitt2
