-
Notifications
You must be signed in to change notification settings - Fork 61
Importing and serializing company data #263
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Ilozuluchris! One more many thanks for the WIP PR. First of all I changed the merge branch to master
, otherwise it would be impossible to browser the changes and picking up only what you've actually changed and what was merged since I created cuducos-rows
branch.
That said you're in the right path. The main point is that rows
powerful import tool can handle all serialization so we can get rid of that in Jarbas source code. So there is still a few extras steps (mostly related to the force_types
kwarg).
In addition, there are some Python style issues that are indeed minor stuff. Just run prospector
before you declare you code ready and probably you can spot them all ; )
import lzma | ||
import rows | ||
import rows.fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really need to import rows
and then rows.fields
?
@@ -23,6 +24,8 @@ def handle(self, *args, **options): | |||
|
|||
self.save_companies() | |||
|
|||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need these extra spaces: according to PEP8 1 line is fine to separate class methods.
return row._asdict() | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need these extra spaces: according to PEP8 1 line is fine to separate class methods.
|
||
|
||
class InputDateField(rows.fields.DateField): | ||
INPUT_FORMAT = '%d/%m/%Y' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd declare this class in the beginning of the file so when you use it in the other class the reader is aware about what this field is ; )
@@ -45,6 +48,12 @@ def save_companies(self): | |||
self.count += 1 | |||
self.print_count(Company, count=self.count) | |||
|
|||
@staticmethod | |||
def transform(row): | |||
return row._asdict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need a method for that… calling _asdict()
in the for
loop seams enough.
@@ -63,17 +72,15 @@ def save_activities(self, row): | |||
|
|||
return [main], secondaries | |||
|
|||
|
|||
def serialize(self, row): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically the great point of using rows
is to get rid of the serialize
method. The idea is to use the force_types
keyword argument when calling rows.import_from_csv
and let rows
serialize stuff as we need it (for example, #211).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to add that comment on the previous review, sorry.
@@ -31,7 +34,7 @@ def save_companies(self): | |||
skip = ('main_activity', 'secondary_activity') | |||
keys = tuple(f.name for f in Company._meta.fields if f not in skip) | |||
with lzma.open(self.path, mode='rt', encoding='utf-8') as file_handler: | |||
for row in csv.DictReader(file_handler): | |||
for row in map(self.transform,rows.import_from_csv(filename_or_fobj=file_handler)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might work but this is not pythonic. First of all, style:
- try to keep things under 80 chars
- list comprehension usually are more readable than
map
- Zen of Pytthon - always add a space after come when separating values (in sequences) and arguments (in function calls) — also PEP8
I'd try something like:
for row in rows.import_from_csv(file_handler):
row = row._asdict() # and we can get rid of the self.transform call
…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay @cuducos Thanks for the review,I would make the necessary changes .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cuducos rows.fields.EmailField raises a value error when it tries to serialize 'ahoy'
IMHO that is expected as |
Yeah,it is...the |
@cuducos I am confused,I do not know how to go about this https://github.com/datasciencebr/jarbas/blob/f95addd5ea4c47f161796cf1e72a58050f66ce60/jarbas/core/management/commands/suspicions.py#L50 since the serialize method actually reshapes rows |
Would you mind expanding on your doubt? I mean… you just highlighted the method we're trying to get rid of, so I don't know exactly where you doubt lies… |
I was talking about this arrangement,am not sure how to go about this with rows |
Ow… you're right! I don't know any thing by heart, maybe here we do need a serialization method!… @turicas, any idea on that? Our input is a CSV like:
And the expected output is something like:
|
Okay I will, thanks. |
…e) of the command class in jarbas.core.management.commands.companies.py
…any more) of the command class in jarbas.core.management.commands.companies.py
@cuducos The travis CI job does not run, the requests page reports this
I believe it has to do with the merge conflict wrt requirements.txt. I tried to resolve myself but that did not work as expected. |
After fixing conflicts it is still red:
|
@Ilozuluchris, I'm not sure why the CodeClimate checks are missing. Gonna fix that and then we come back to this PR, ok? |
@cuducos Okay no problem, I am sorry it took so long to do this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @Ilozuluchris, many thanks for pushing this PR further. I'm just afraid CodeClimate could help you with code style. Would you mind pushing a new commit so the CodeClimate check runs? (I just fixed it).
phone=b'', responsible_federative_entity=b'', situation=b'', zip_code=b'', | ||
situation_date=datetime.date(2005, 9, 24), situation_reason=b'', type='Book', | ||
special_situation_date=None, state=b'', status=b'', trade_name=b'', | ||
special_situation=b'') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about creating a dictionary and unpacking it to enhance the readability here?
kwargs = {
'additional_address_details': b'',
'address': b'',
…
}
create.assert_called_once_with(**kwargs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I will.
@cuducos All checks passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, just some formatting issues (don'tknow why) Code Climate skips.
'special_situation_date': CompaniesDate, | ||
'latitude': rows.fields.FloatField, | ||
'longitude': rows.fields.FloatField | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a contant, could you named it with caps (i.e. COMPANIES_CSV_FIELDS
) and put it before the class
definition (just after the imports)?
Also, in Python we use one of these formatting styles:
COMPANIES_CSV_FIELDS = {
'email': rows.fields.EmailField,
'opening': CompaniesDate,
'situation_date': CompaniesDate,
'special_situation_date': CompaniesDate,
'latitude': rows.fields.FloatField,
'longitude': rows.fields.FloatField
}
Or:
COMPANIES_CSV_FIELDS = {'email': rows.fields.EmailField,
'opening': CompaniesDate,
'situation_date': CompaniesDate,
'special_situation_date': CompaniesDate,
'latitude': rows.fields.FloatField,
'longitude': rows.fields.FloatField}
Also, can you reflect this formatting decision in arbas/core/tests/test_companies_command.py
(line 46), isting one field per line?
This is a WIP but I want to invite @cuducos to review what I have done so far #211