UTF-8 release
- bump version
- update readme
- add utf-8 convert option
- make sure files are parsed as utf-8
- pull out playlist name extractor
- first update outline & tests with m3u8 extension
- add overall test
- validate file extension
- validate empty playlist file
radujica authored Jan 22, 2021
2 parents 96ea90a + e70b73c commit 8821833
Showing 17 changed files with 251 additions and 32 deletions.
8 changes: 8 additions & 0 deletions CONTRIBUTING.md
@@ -16,4 +16,12 @@ Check out the Makefile for the common commands.

Publishing to pypi is handled through github releases and action,
though version in setup.py needs to be manually bumped.

# Current ideas

- Use more data from the playlist or even the song tags themselves to more accurately find songs, e.g. song duration,
year, album artist
- Try removing the "the" from song names; `The Wolven Storm` is not the same as `Wolven Storm`
- Compute some similarity metrics after finding matches on Spotify; taking the example above, at the moment there is
no way to know which one to search first, but after seeing the artist returned by each attempt, one could deduce the correct one
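
The similarity idea in the last bullet could be sketched roughly as follows. This is only an illustration built on the standard library's `difflib`; the `similarity` and `pick_best_match` helpers and the `(artist, title)` candidate tuples are hypothetical and not part of tospotify.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of how alike two strings are, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pick_best_match(artist: str, title: str,
                    candidates: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Hypothetical helper: pick the (artist, title) candidate closest to the playlist entry."""
    return max(candidates,
               key=lambda c: similarity(artist, c[0]) + similarity(title, c[1]))


# candidates as they might come back from two different search attempts (made-up data)
candidates = [
    ('Some Other Band', 'Wolven Storm'),
    ('Marcin Przybyłowicz', 'The Wolven Storm'),
]
best = pick_best_match('Marcin Przybyłowicz', 'Wolven Storm', candidates)
```

Here the artist match outweighs the slightly different title, so the second candidate would be preferred even though the first has the exact title.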

74 changes: 69 additions & 5 deletions README.md
@@ -1,13 +1,16 @@
# Convert local playlist to Spotify playlist
# Convert local playlists to Spotify playlists

![Build, Test, Lint](https://github.com/radujica/tospotify/workflows/Build,%20Test,%20Lint/badge.svg)
[![PyPI version](https://badge.fury.io/py/tospotify.svg)](https://badge.fury.io/py/tospotify)

Currently works for m3u files; m3u8 support to come!
Supports m3u and m3u8 files in the [Extended format](https://en.wikipedia.org/wiki/M3U) encoded as UTF-8.

Take a look [below](#help) for more details and debugging tips.

## Usage

usage: tospotify [-h] [-v] [--public] [--playlist-id PLAYLIST_ID]
usage: tospotify [-h] [--verbose] [--public] [--convert]
[--playlist-id PLAYLIST_ID]
spotify_username playlist_path

Create/update a Spotify playlist from a local m3u playlist
@@ -20,8 +23,9 @@ Currently works for m3u files; m3u8 support to come!

optional arguments:
-h, --help show this help message and exit
-v, --verbose print all the steps when searching for songs
--verbose print all the steps when searching for songs
--public playlist is public, otherwise private
--convert convert from locale default to utf-8
--playlist-id PLAYLIST_ID
do not create a new playlist, instead update the
existing playlist with this id
@@ -56,4 +60,64 @@ Currently works for m3u files; m3u8 support to come!

### Windows
Same as linux but use `set` instead of `export`



## Help

### Encoding

Seeing unexpected characters in the log messages is a sign of faulty encoding.

This tool uses the [m3u8](https://github.com/globocom/m3u8) library to parse the files, which reads them as utf-8.

Encoding can be checked by opening the playlist in a text editor, such as Notepad++.
If the playlist is in a different encoding,
try using the `--convert` argument which will attempt to convert it to utf-8.

Alternatively, try importing the playlist into your music player and using its export function
to export as utf-8, if it has one. AIMP, for example, can do this.
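
If no suitable text editor is at hand, a quick way to check the encoding is to try decoding the file as utf-8. The snippet below is a minimal sketch and not part of tospotify; the `is_utf8` helper is a hypothetical name, and the demo writes the same line in two encodings to temporary files.

```python
import os
import tempfile


def is_utf8(path: str) -> bool:
    """Return True if the whole file can be decoded as utf-8."""
    try:
        with open(path, mode='r', encoding='utf-8') as file_:
            file_.read()
        return True
    except UnicodeDecodeError:
        return False


# demo: the same playlist line written in two encodings
line = '#EXTINF:290,Eivør - Trøllabundin\n'
with tempfile.TemporaryDirectory() as tmp:
    utf8_path = os.path.join(tmp, 'utf8.m3u8')
    cp1252_path = os.path.join(tmp, 'cp1252.m3u')
    with open(utf8_path, mode='w', encoding='utf-8') as f:
        f.write(line)
    with open(cp1252_path, mode='w', encoding='cp1252') as f:
        f.write(line)
    utf8_ok = is_utf8(utf8_path)      # decodes cleanly
    cp1252_ok = is_utf8(cp1252_path)  # the cp1252 byte for 'ø' is not valid utf-8
```

A file that fails this check is a candidate for `--convert`.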

### Songs missing

This might happen when the file is not actually in the [extended m3u](https://en.wikipedia.org/wiki/M3U) format.
This format looks like

#EXTM3U
#EXTINF:277,Faun - Sieben Raben
/Music/Selection/Faun - Sieben Raben.mp3

and is populated from file tags. For this example, the mp3 file contains the tags
artist = Faun and title = Sieben Raben which then populate the `#EXTINF` line.

If your playlist only contains paths, try importing it in a (different) music player and exporting again.
AIMP, for example, exports in the expected format.
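
To check quickly whether a playlist is in the extended format, one can look for `#EXTINF` lines. This is a rough sketch independent of tospotify's own parsing (which goes through the m3u8 library); `extinf_titles` is a hypothetical helper.

```python
from typing import List


def extinf_titles(playlist_text: str) -> List[str]:
    """Collect the 'artist - title' part of every #EXTINF line."""
    titles = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if line.startswith('#EXTINF:'):
            # format is '#EXTINF:<duration>,<artist - title>'
            _, _, title = line.partition(',')
            titles.append(title)
    return titles


extended = '#EXTM3U\n#EXTINF:277,Faun - Sieben Raben\n/Music/Selection/Faun - Sieben Raben.mp3\n'
paths_only = '/Music/Selection/Faun - Sieben Raben.mp3\n'

found = extinf_titles(extended)      # one entry for the Faun song
missing = extinf_titles(paths_only)  # empty list: nothing to search for
```

An empty result means the playlist only lists paths, which is the situation described above.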

### What does tospotify actually do internally?

It tries various cleaning steps and search queries in an attempt to find the correct songs on Spotify.

The [extended m3u](https://en.wikipedia.org/wiki/M3U) format is important. As mentioned above, the ground truth
is actually the artist and title tags stored in the songs themselves which are then reflected in the playlist.
Looking at the example above, the format is essentially `artist - title`; this implicitly means that dashes `-`
in the artist or title cannot be interpreted properly at the moment. Sorry, AC-DC :(
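
The dash limitation can be seen with plain string splitting. The first two calls mirror the `split('-')` used in `process_song_name`; the last line is only a hypothetical alternative, not what tospotify currently does.

```python
# a plain split on '-' gives two parts: artist and title (still carrying surrounding spaces)
parts = 'Faun - Sieben Raben'.split('-')

# with a dash inside the artist there are three parts,
# so artist and title can no longer be told apart
ambiguous = 'AC-DC - Thunderstruck'.split('-')

# hypothetical alternative: split once on the spaced delimiter ' - '
artist, title = 'AC-DC - Thunderstruck'.split(' - ', 1)
```

Splitting on the spaced delimiter would still fail for names that themselves contain ` - `, so it narrows the problem rather than removing it.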

The tool then uses rules to compute various queries. Take for example the song
`Every Breath You Take` by `Sting and the Police`. This can be stored in many ways. The artist could be
`Sting and the Police`, `Sting;The Police`, `Sting & the Police`, `The Police`, etc.
Then the title could be `Every Breath You Take` but also `Every Breath You Take feat. Sting` and other
variations. Many of these are not found exactly as such on Spotify.
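
As an illustration of what such a rule might look like, the sketch below splits a combined artist field into individual candidates to search for. It is an assumption-laden example, not tospotify's actual rule set; `artist_variants` is a hypothetical helper.

```python
import re
from typing import List


def artist_variants(artist: str) -> List[str]:
    """Hypothetical helper: the original artist field followed by its individual parts."""
    # split on common separators: ';', '&', ',' and the standalone word 'and'
    parts = re.split(r'\s*(?:;|&|,|\band\b)\s*', artist, flags=re.IGNORECASE)
    variants = [artist]
    for part in parts:
        part = part.strip()
        if part and part not in variants:
            variants.append(part)
    return variants


combined = artist_variants('Sting and the Police')
semicolon = artist_variants('Sting;The Police')
single = artist_variants('The Police')
```

Each variant could then be tried as a separate search query, from most to least specific.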

There can also be live versions, e.g. with title `Every Breath You Take [live]`,
covers by other artists, separate recordings of the song,
and the list goes on. This song was actually recorded both by The Police with Sting
and solo by Sting; both versions are available on Spotify!

Bit more complex than it initially seems :)

So this is what tospotify does; it will try to find the correct song through various rules derived from the data in the
playlist.


## Contributing

Take a look at the [CONTRIBUTING](CONTRIBUTING.md) file for more details. Pull requests are welcome!
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,2 +1,2 @@
m3u8==0.5.4
m3u8>=0.7.1
spotipy==2.11.1
4 changes: 2 additions & 2 deletions setup.py
@@ -8,7 +8,7 @@ def readme():

setup(
name='tospotify',
version='0.2',
version='0.3',
description='Create/update a Spotify playlist from a local m3u playlist',
url='https://github.com/radujica/tospotify',
author='Radu Jica',
@@ -29,7 +29,7 @@ def readme():
packages=['tospotify', 'tospotify.types'],
include_package_data=True,
zip_safe=True,
install_requires=['spotipy', 'm3u8'],
install_requires=['spotipy', 'm3u8>=0.7.1'],
entry_points={
'console_scripts': ['tospotify=tospotify.run:main'],
}
3 changes: 3 additions & 0 deletions test/data/cp1252_playlist.m3u
@@ -0,0 +1,3 @@
#EXTM3U
#EXTINF:290,Eivør - Trøllabundin
..\..\Music\Selection\Nordic\Eivør - Trøllabundin.mp3
Empty file added test/data/empty_playlist.m3u
Empty file.
1 change: 1 addition & 0 deletions test/data/path_playlist.m3u
@@ -0,0 +1 @@
/Music/Selection/Faun - Sieben Raben.mp3
5 changes: 5 additions & 0 deletions test/data/utf8_playlist.m3u8
@@ -0,0 +1,5 @@
#EXTM3U
#EXTINF:200,Marcin Przybyłowicz - The Wolven Storm (Priscilla's Song)
/Music/Marcin Przybyłowicz - The Wolven Storm (Priscilla's Song).mp3
#EXTINF:290,Eivør - Trøllabundin
/Music/Eivør - Trøllabundin.mp3
3 changes: 3 additions & 0 deletions test/data/valid_playlist.m3u
@@ -0,0 +1,3 @@
#EXTM3U
#EXTINF:277,Faun - Sieben Raben
/Music/Selection/Faun - Sieben Raben.mp3
32 changes: 32 additions & 0 deletions test/test_integration.py
@@ -0,0 +1,32 @@
import os
from unittest.mock import patch

import pytest

from tospotify.run import main


class MockArgs:
    def __init__(self, playlist_path, playlist_id=None):
        self.verbose = False
        self.convert = False
        self.spotify_username = 'test_username'
        self.public = True
        self.playlist_path = playlist_path
        self.playlist_id = playlist_id


# the point here is to run through the whole flow
@patch('tospotify.run._parse_args')
@patch('tospotify.run.prompt_for_user_token', lambda x, y: 'token')
@patch('tospotify.search._find_track', lambda x, y, z: 'uri')
@patch('tospotify.search.add_tracks', lambda x, y, z: None)
@pytest.mark.parametrize('playlist', [
    os.path.join('test', 'data', 'valid_playlist.m3u'),
    os.path.join('test', 'data', 'empty_playlist.m3u'),
    os.path.join('test', 'data', 'empty_playlist.m3u'),
    os.path.join('test', 'data', 'utf8_playlist.m3u8')
])
def test_integration(mock_function, playlist):
    mock_function.return_value = MockArgs(playlist_path=playlist, playlist_id=1)
    main()
5 changes: 5 additions & 0 deletions test/test_parser.py
@@ -0,0 +1,5 @@
from tospotify.parser import parse_songs


def test_parse_songs():
    pass
5 changes: 2 additions & 3 deletions test/test_processing.py
@@ -6,9 +6,8 @@
@pytest.mark.parametrize('name,expected', [
('The Police', 'the police'),
(' The Police ', 'the police'),
('St-ing; The, Police', 'sting; the, police'),
('ßtïngé', 'tng'),
('é', ''),
('St-ing; {The}, Police', 'st-ing; the, police'),
('ßtïngé', 'ßtïngé'),
('Every Breath You Take (feat. Sting)', 'every breath you take (feat sting)'),
('Every Breath You Take [Acoustic]', 'every breath you take [acoustic]')
])
23 changes: 22 additions & 1 deletion test/test_run.py
@@ -1,9 +1,10 @@
import argparse
import os
from unittest.mock import patch

import pytest

from tospotify.run import _parse_path
from tospotify.run import _parse_path, _m3u_file


@patch('os.getcwd', lambda: '/test/path')
@@ -13,3 +14,23 @@
])
def test__parse_path(path, expected):
    assert _parse_path(path) == os.sep.join(expected)


@pytest.mark.parametrize('playlist_path', [
    'path/to/file.m3u',
    'file.m3u',
    'path/to/file.m3u8',
    'file.m3u8'
])
def test__m3u_extension(playlist_path):
    _m3u_file(playlist_path)


@pytest.mark.parametrize('playlist_path', [
    'path/file.mp3',
    '.m3u',
    'file'
])
def test__m3u_extension_invalid(playlist_path):
    with pytest.raises(argparse.ArgumentTypeError):
        _m3u_file(playlist_path)
39 changes: 39 additions & 0 deletions tospotify/parser.py
@@ -0,0 +1,39 @@
import logging

import m3u8


def convert_utf8(playlist_path: str) -> str:
    """ Convert file to utf-8 after parsing with locale.getpreferredencoding
    :param playlist_path: absolute path of the playlist
    :return: Path of converted file
    """
    logging.warning('Converting file to utf-8')
    path_without_extension = playlist_path.rsplit('.', 1)[0]
    output_path = path_without_extension + '_utf8.m3u8'

    # here it uses the locale.getpreferredencoding, which could be cp1252 for Windows
    with open(playlist_path, mode='r') as input_:
        with open(output_path, encoding='utf-8', mode='w') as output_:
            for line in input_.readlines():
                output_.write(line.encode('utf-8').decode('utf-8'))

    return output_path


def parse_songs(playlist_path: str) -> m3u8.SegmentList:
    """ Parse and return the songs found in the file
    :param playlist_path: absolute path of the playlist
    :type playlist_path: str
    :return:
    """
    # m3u8 uses open(..., encoding='utf-8') which will throw an exception when the file cannot be parsed as utf-8
    playlist = m3u8.load(playlist_path)
    segments = playlist.segments

    if len(segments) <= 0:
        logging.error('Could not find any songs in the file!')

    return segments
12 changes: 8 additions & 4 deletions tospotify/processing.py
@@ -28,16 +28,16 @@ def clean_title(title: str) -> str:
def clean_name(name: str) -> str:
""" Clean either artist or title:
- keep only ascii and some relevant characters: \\,&()[]
- note that single quotes are also removed since Spotify seems to handle those well
- removes these characters: .{}
- removes extra spaces
- strip and lowercase
:param name: artist or song title to clean
:type name: str
:return: cleaned name
:rtype: str
"""
cleaned_name = re.sub(r'[^a-zA-Z0-9\s,;&()\[\]]', '', name)
cleaned_name = re.sub(r'[.{\}]', '', name)
cleaned_name = re.sub(r'\s+', ' ', cleaned_name)
cleaned_name = cleaned_name.strip()
cleaned_name = cleaned_name.lower()
@@ -49,12 +49,16 @@ def clean_name(name: str) -> str:


def process_song_name(song_name: str) -> Tuple[str, str]:
""" Splits m3u line of artist - title and cleans using clean_name
""" Splits m3u line of artist - title and cleans using clean_name.
Note that the Extended M3U8 format requires this format, i.e. artist - title based on file tags.
!Since '-' delimits the artist from the song, this character should not be seen inside the artist or title!
:param song_name:
:type song_name: str
:return: tuple of artist and title
:rtype: (str, str)
:raises ProcessingException: if a song does not obey the Extended M3U8 formatting of "artist - title"
"""
song_split = song_name.split('-')

34 changes: 30 additions & 4 deletions tospotify/run.py
@@ -1,22 +1,42 @@
import argparse
import logging
import os
from typing import Optional

from spotipy import Spotify
from spotipy.util import prompt_for_user_token

from .search import create_spotify_playlist, update_spotify_playlist


def _m3u_file(path: str) -> Optional[str]:
    if not isinstance(path, str):
        raise argparse.ArgumentTypeError('Path must be a string. Encountered type={}'.format(str(type(path))))

    splits = path.rsplit('.', 1)
    if len(splits) == 1:
        raise argparse.ArgumentTypeError('Could not determine file extension')

    filename, extension = splits[0], splits[1]
    if len(filename) == 0:
        raise argparse.ArgumentTypeError('Filename without extension cannot be empty')

    if extension in {'m3u', 'm3u8'}:
        return path

    raise argparse.ArgumentTypeError('Only m3u files are supported. Encountered={}'.format(extension))


def _parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description='Create/update a Spotify playlist from a local m3u playlist')
parser.add_argument('spotify_username',
help='Spotify username where playlist should be updated. '
'Your email address should work just fine, or could find your user id '
'through e.g. the developer console', type=str)
parser.add_argument('playlist_path', help='full path to the playlist', type=str)
parser.add_argument('playlist_path', help='full path to the playlist', type=_m3u_file)
parser.add_argument('--verbose', help='print all the steps when searching for songs', action='store_true')
parser.add_argument('--public', help='playlist is public, otherwise private', action='store_true')
parser.add_argument('--convert', help='convert from locale default to utf-8', action='store_true')
parser.add_argument('--playlist-id', help='do not create a new playlist, '
'instead update the existing playlist with this id', type=str)
parsed_args = parser.parse_args()
@@ -31,6 +51,13 @@ def _parse_path(path: str) -> str:
return os.path.join(os.getcwd(), *path.split(os.sep))


def _extract_playlist_name(playlist_path: str) -> str:
    _, filename = os.path.split(playlist_path)
    playlist_name = str(filename.split('.')[0])

    return playlist_name


def main() -> None:
""" Main entry point to the script """
args = _parse_args()
@@ -50,12 +77,11 @@ def main() -> None:
spot = Spotify(auth=token)

if args.playlist_id is None:
_, filename = os.path.split(playlist_path)
playlist_name = str(filename.split('.')[0])
playlist_name = _extract_playlist_name(playlist_path)
playlist_id = create_spotify_playlist(spot, playlist_name)
logging.info('Created playlist with name={} at id={}'.format(playlist_name, playlist_id))
else:
playlist_id = args.playlist_id
logging.info('Updating existing playlist with id={}'.format(playlist_id))

update_spotify_playlist(spot, playlist_path, playlist_id)
update_spotify_playlist(spot, playlist_path, playlist_id, args.convert)