Promote speedup to top level #102

Merged
merged 59 commits into from
Aug 23, 2020

Conversation

philipmat
Owner

@philipmat philipmat commented Aug 16, 2020

This PR promotes the speedup folder to the top level, replacing the old code.

  • README.md has been updated to reflect these changes.
  • new module folder created: discogsxml2db. parser.py and exporter.py now reside in that folder; there's a run.py at the top-level which takes the role of python export.py...
  • Tested on Windows 10 with Python 3.8.
  • Minimum Python version 3.6

TODO:

  • ensure PEP8 compliance
  • test on mac
  • have a good Travis build on Linux (though it doesn't mean much)
  • add GitHub Actions
  • remove v1.x folder once @ijabz tests it

@philipmat philipmat requested a review from ijabz August 16, 2020 22:58
@philipmat philipmat added this to the v2.0 milestone Aug 16, 2020
@philipmat philipmat linked an issue Aug 16, 2020 that may be closed by this pull request
@ijabz
Collaborator

ijabz commented Aug 17, 2020

Noted, I will try to test this on Linux at the end of this week.

@philipmat
Owner Author

Thank you @ijabz!

What version of python are you running? Do you think it's ok if 3.6 is the minimum we support?

@ijabz
Collaborator

ijabz commented Aug 23, 2020

There seems to be a bug with the artist export, so I couldn't complete the test:

ubuntu@ip-172-31-39-147:~/code/discogs-xml2db$ python3 run.py --export artist . csv-dir
Traceback (most recent call last):
  File "run.py", line 23, in <module>
    sys.exit(main(arguments))
  File "/home/ubuntu/code/discogs-xml2db/discogsxml2db/exporter.py", line 323, in main
    dry_run=dry_run)
  File "/home/ubuntu/code/discogs-xml2db/discogsxml2db/exporter.py", line 166, in __init__
    ('group_member', _group_members, None), )
NameError: name '_group_members' is not defined
ubuntu@ip-172-31-39-147:~/code/discogs-xml2db$

Also, some simple nice-to-haves would be:

  1. It would be neater if get_latest_dumps actually put files in a dump-dir instead of the current dir.

  2. csv-dir has to exist before running run.py, e.g.:

ubuntu@ip-172-31-39-147:~/code/discogs-xml2db$ python3 run.py --export label dump-dir csv-dir
Traceback (most recent call last):
  File "run.py", line 23, in <module>
    sys.exit(main(arguments))
  File "/home/ubuntu/code/discogs-xml2db/discogsxml2db/exporter.py", line 324, in main
    exporter.export()
  File "/home/ubuntu/code/discogs-xml2db/discogsxml2db/exporter.py", line 71, in export
    return self.export_from_file(self.openfile())
  File "/home/ubuntu/code/discogs-xml2db/discogsxml2db/exporter.py", line 114, in export_from_file
    operations = self.build_ops()
  File "/home/ubuntu/code/discogs-xml2db/discogsxml2db/exporter.py", line 90, in build_ops
    out_file_obj = open_func(os.path.join(self.out_dir, fname), 'wt', newline='', encoding='utf-8')
FileNotFoundError: [Errno 2] No such file or directory: 'csv-dir/label.csv'
It would be better if the directory could be created automatically.

  3. It would be good if run.py --export release dump-dir csv-dir exported all four files. That seems to be the default we usually want, rather than having to specify all four files; instead, it currently does nothing.

@philipmat
Owner Author

Thank you, @ijabz - what version of Python are you running? (python -V)

@ijabz
Collaborator

ijabz commented Aug 23, 2020

It says Python 2.7.6, but I do have Python 3 installed as well, and the current code with the speedup subfolder works for me: exports of Label, Master and Release appear to work.

@philipmat
Owner Author

philipmat commented Aug 23, 2020

Sorry, I meant python3 -V - I'm assuming you're running with python3 run.py..., right?

@philipmat
Owner Author

@ijabz - could you grab latest and try again, please?

It should also automatically create whatever path you specify for output, if it doesn't exist (issue #108).
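
For readers following along, here is a minimal sketch of that create-if-missing behaviour. It assumes the exporter builds its output path by joining an output directory and a file name, as the earlier traceback suggests (`os.path.join(self.out_dir, fname)`). The helper name `open_output_file` is hypothetical and is not taken from the project's code.

```python
import os


def open_output_file(out_dir, fname):
    """Open out_dir/fname for writing, creating out_dir first if needed.

    Hypothetical helper for illustration; not the project's actual code.
    """
    # exist_ok=True makes this a no-op when the directory already exists,
    # and intermediate directories are created as needed.
    os.makedirs(out_dir, exist_ok=True)
    return open(os.path.join(out_dir, fname), "wt", newline="", encoding="utf-8")
```

With something like this in place, `python3 run.py --export label dump-dir csv-dir` would no longer depend on csv-dir existing beforehand.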

@ijabz
Collaborator

ijabz commented Aug 23, 2020

Oh, python3 -V reports 3.4.3, so I need to update then?

@philipmat
Owner Author

Python 3.4 is no longer an active Python release, and on top of that I would rather not have to test against and support older versions.
If at all possible, I'd like 3.6 to be the minimum version to support.
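
One way such a minimum could be enforced is an early version check at the top of the entry script. This is only a sketch: the names `MIN_PYTHON` and `is_supported` are hypothetical, and nothing in this thread says the project does this. It deliberately avoids f-strings so that the check itself still parses on older interpreters.

```python
import sys

MIN_PYTHON = (3, 6)  # minimum version proposed in this thread


def is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if not is_supported():
    # Exit with a readable message instead of a traceback further down.
    sys.exit("Python %d.%d or newer is required; found %s"
             % (MIN_PYTHON + (sys.version.split()[0],)))
```

On a 3.4 interpreter this would stop with a one-line message before any import of third-party packages is attempted.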

@ijabz
Collaborator

ijabz commented Aug 23, 2020

Okay, I git pulled and all exports are now failing. I then updated to 3.7, which didn't have any effect because /usr/bin/python3 still pointed to Python 3.4. I changed the link to point to 3.7, but then it complained about a missing package, so I tried to run pip and hit more problems. Please see below:

ubuntu@ip-172-31-39-147:~/code/discogs-xml2db$ sudo pip3 install -r requirements.txt
Traceback (most recent call last):
  File "/usr/bin/pip3", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python3/dist-packages/pkg_resources.py", line 1479, in <module>
    register_loader_type(importlib_bootstrap.SourceFileLoader, DefaultProvider)
AttributeError: module 'importlib._bootstrap' has no attribute 'SourceFileLoader'
ubuntu@ip-172-31-39-147:~/code/discogs-xml2db$ python3 run.py --export artist csv-dir
Traceback (most recent call last):
  File "run.py", line 18, in <module>
    from discogsxml2db.exporter import main
  File "/home/ubuntu/code/discogs-xml2db/discogsxml2db/exporter.py", line 8, in <module>
    from tqdm import tqdm
ModuleNotFoundError: No module named 'tqdm'
ubuntu@ip-172-31-39-147:~/code/discogs-xml2db$ sudo apt-get install python3-pip
Reading package lists... Done
Building dependency tree
Reading state information... Done
python3-pip is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 9 not upgraded.
ubuntu@ip-172-31-39-147:~/code/discogs-xml2db$ sudo pip3 install -r requirements.txt
Traceback (most recent call last):
  File "/usr/bin/pip3", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python3/dist-packages/pkg_resources.py", line 1479, in <module>
    register_loader_type(importlib_bootstrap.SourceFileLoader, DefaultProvider)
AttributeError: module 'importlib._bootstrap' has no attribute 'SourceFileLoader'
ubuntu@ip-172-31-39-147:~/code/discogs-xml2db$

It is important to note here:
a> I am not a Python developer/user.
b> New users of this library will pick it because of what it does, not because it is written in Python, so they may not be used to Python either.
c> It should be easy to set up the Python part for all users, not just those knowledgeable about Python.

@philipmat
Owner Author

@ijabz - your points are fair and I will try to account for them.

That being said, the minimum to expect is a working Python 3.6 installation and that includes a functioning pip installer.

I will try to prepare a step-by-step document for running this project and make use of pyenv as a way to insulate it against the system Python.

Meanwhile, I have created a series of tests that extract a sample of 1,000 records each of artists, labels, masters, and records, and validate them. The tests pass on Ubuntu latest with Python 3.6, 3.7, and 3.8, so I am going to assume this PR is ready for prime time.
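
As a rough illustration of the sampling idea only (not the tests added in this PR), the first N records can be pulled from a large XML dump without loading it fully. The function name `sample_records` is hypothetical, and the sketch assumes the record tag appears only at record level, which may not hold for every dump.

```python
import itertools
import xml.etree.ElementTree as ET


def sample_records(xml_stream, tag, limit=1000):
    """Yield at most `limit` complete elements named `tag` from an XML stream.

    Hypothetical helper for illustration; assumes `tag` is not nested
    inside other records.
    """
    matches = (
        elem
        # "end" events fire once an element and its children are fully parsed.
        for _event, elem in ET.iterparse(xml_stream, events=("end",))
        if elem.tag == tag
    )
    # islice stops pulling from the parser once `limit` records are produced.
    yield from itertools.islice(matches, limit)
```

A validation test could then iterate over the sample and assert on the fields each record is expected to carry.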

@philipmat philipmat merged commit b786e12 into develop Aug 23, 2020
@philipmat philipmat deleted the merge_speedup branch August 23, 2020 23:46
@ijabz
Collaborator

ijabz commented Aug 24, 2020

I'm happy to upgrade 'System Python' to 3.6. I don't need to have multiple Pythons running; I just don't understand what is wrong with my Python setup.

@philipmat
Owner Author

philipmat commented Aug 24, 2020

@ijabz maybe this will help? https://stackoverflow.com/questions/53407801/how-can-i-adjust-pip3-using-python3-6-not-python3-4-on-ubuntu-14-04

Alternatively, if python3 -V reports 3.6, then you could try python3 -m pip install -r requirements.txt (or sudo that).
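
The reason `python3 -m pip` can help is that it runs pip under whichever interpreter `python3` resolves to, rather than whatever the separate `pip3` script was installed against. The same idea from inside Python is sketched below; the function names are hypothetical, while `sys.executable` and `subprocess.check_call` are standard library.

```python
import subprocess
import sys


def pip_install_command(requirements="requirements.txt"):
    """Build a pip command bound to the interpreter running this script."""
    # sys.executable is the path of the current interpreter, so pip runs
    # under the same Python version that will later run the project.
    return [sys.executable, "-m", "pip", "install", "-r", requirements]


def install_requirements(requirements="requirements.txt"):
    # check_call raises CalledProcessError if pip exits with a non-zero status.
    subprocess.check_call(pip_install_command(requirements))
```

Calling `install_requirements()` from the project folder would be roughly equivalent to the `python3 -m pip install -r requirements.txt` command above.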

@ijabz
Collaborator

ijabz commented Sep 5, 2020

FYI I set up a new Ubuntu machine and this came with Python 3.6.9 and now discogs-xml2db is working.

Development

Successfully merging this pull request may close these issues.

Make speedup subfolder the default and only discogs-xml2db mechanism
2 participants