Importing and serializing company data #263

Ilozuluchris · 2017-10-07T13:05:17Z

This is a WIP but I want to invite @cuducos to review what I have done so far #211

cuducos

Hi @Ilozuluchris! Once more, many thanks for the WIP PR. First of all, I changed the merge branch to master; otherwise it would be impossible to browse the changes and pick out only what you've actually changed from what was merged since I created the cuducos-rows branch.

That said, you're on the right track. The main point is that rows' powerful import tool can handle all serialization, so we can get rid of that in the Jarbas source code. So there are still a few extra steps (mostly related to the force_types kwarg).

In addition, there are some minor Python style issues. Just run prospector before you declare your code ready and you can probably spot them all ; )

cuducos · 2017-10-07T13:51:15Z

jarbas/core/management/commands/companies.py

 import lzma
+import rows
+import rows.fields


Do you really need to import rows and then rows.fields?

cuducos · 2017-10-07T13:51:40Z

jarbas/core/management/commands/companies.py

@@ -23,6 +24,8 @@ def handle(self, *args, **options):

        self.save_companies()

+
+


We don't need these extra blank lines: according to PEP8, one blank line is enough to separate class methods.

cuducos · 2017-10-07T13:52:09Z

jarbas/core/management/commands/companies.py

+        return row._asdict()
+
+
+


We don't need these extra blank lines: according to PEP8, one blank line is enough to separate class methods.

cuducos · 2017-10-07T13:52:46Z

jarbas/core/management/commands/companies.py

+
+
+class InputDateField(rows.fields.DateField):
+    INPUT_FORMAT = '%d/%m/%Y'


I'd declare this class at the beginning of the file, so that when you use it in the other class the reader already knows what this field is ; )

cuducos · 2017-10-07T13:58:43Z

jarbas/core/management/commands/companies.py

@@ -45,6 +48,12 @@ def save_companies(self):
                self.count += 1
                self.print_count(Company, count=self.count)

+    @staticmethod
+    def transform(row):
+        return row._asdict()


I don't think we need a method for that… calling _asdict() in the for loop seems enough.

cuducos · 2017-10-07T14:00:53Z

jarbas/core/management/commands/companies.py

@@ -63,17 +72,15 @@ def save_activities(self, row):

        return [main], secondaries

+
    def serialize(self, row):


Basically the great point of using rows is to get rid of the serialize method. The idea is to use the force_types keyword argument when calling rows.import_from_csv and let rows serialize stuff as we need it (for example, #211).
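
To illustrate the idea in the comment above, here is a minimal standard-library sketch of per-column type conversion at import time. It does not use rows itself: FORCE_TYPES, parse_date, parse_float and import_csv are hypothetical names that only mimic what passing force_types to rows.import_from_csv is meant to achieve, and the sample data is made up.

```python
import csv
import io
from datetime import datetime


def parse_date(value):
    """Convert 'dd/mm/YYYY' strings to date objects; empty values become None."""
    if not value:
        return None
    return datetime.strptime(value, '%d/%m/%Y').date()


def parse_float(value):
    """Convert numeric strings to floats; empty values become None."""
    return float(value) if value else None


# Hypothetical mapping: column name -> converter (the role force_types plays).
FORCE_TYPES = {
    'opening': parse_date,
    'latitude': parse_float,
    'longitude': parse_float,
}


def import_csv(file_handler, force_types):
    """Yield one dict per CSV row, converting only the listed columns."""
    for row in csv.DictReader(file_handler):
        yield {
            key: force_types[key](value) if key in force_types else value
            for key, value in row.items()
        }


# Made-up sample data, only for illustration.
sample = io.StringIO(
    'name,opening,latitude,longitude\n'
    'ACME,24/09/2005,-23.55,-46.63\n'
)
companies = list(import_csv(sample, FORCE_TYPES))
```

With the conversion declared once as data, the command no longer needs a per-row serialize step for these columns.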

cuducos

I forgot to add that comment on the previous review, sorry.

cuducos · 2017-10-07T14:04:40Z

jarbas/core/management/commands/companies.py

@@ -31,7 +34,7 @@ def save_companies(self):
        skip = ('main_activity', 'secondary_activity')
        keys = tuple(f.name for f in Company._meta.fields if f not in skip)
        with lzma.open(self.path, mode='rt', encoding='utf-8') as file_handler:
-            for row in csv.DictReader(file_handler):
+            for row in map(self.transform,rows.import_from_csv(filename_or_fobj=file_handler)):


This might work, but it is not pythonic. First of all, style:

try to keep lines under 80 chars

list comprehensions are usually more readable than map (Zen of Python)

always add a space after a comma when separating values (in sequences) and arguments (in function calls), also PEP8

I'd try something like:

    for row in rows.import_from_csv(file_handler):
        row = row._asdict()
        # and we can get rid of the self.transform call
        …
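
As a small self-contained check of the suggestion above, the sketch below compares the map-plus-helper version with the plain loop. It assumes the imported rows behave like namedtuples (as the _asdict() call suggests); Row, its fields and the values are made up for illustration.

```python
from collections import namedtuple

# Stand-in for the rows yielded by the import (made-up fields and values).
Row = namedtuple('Row', ('cnpj', 'name'))
table = [Row('12345678000195', 'ACME'), Row('98765432000110', 'Foo Ltd')]


# Version under review: a helper applied with map().
def transform(row):
    return row._asdict()


with_map = list(map(transform, table))

# Suggested version: call _asdict() directly inside the loop.
with_loop = []
for row in table:
    row = row._asdict()
    with_loop.append(row)
```

Both versions build the same dictionaries, so the helper method adds nothing but indirection.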

Okay @cuducos, thanks for the review. I will make the necessary changes.

Ilozuluchris

@cuducos rows.fields.EmailField raises a value error when it tries to serialize 'ahoy'

cuducos · 2017-10-09T10:31:37Z

rows.fields.EmailField raises a value error when it tries to serialize 'ahoy'

IMHO that is expected as 'ahoy' is not a valid email address… right?

Ilozuluchris · 2017-10-09T19:08:06Z

Yeah, it is. I just mentioned it because that is not how the old serialize method handled invalid email addresses: it returned them as None.
Do you have any corrections to make?
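
If the old behaviour (invalid email becomes None) ever needs to be kept, one option is to wrap a strict converter so that it swallows the ValueError. The sketch below is standard-library only: strict_email, or_none and the deliberately simple regular expression are hypothetical and are not the rows.fields.EmailField implementation.

```python
import re

# Deliberately simple pattern, for illustration only.
EMAIL_PATTERN = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$')


def strict_email(value):
    """Hypothetical strict converter: raises ValueError on invalid input."""
    if not EMAIL_PATTERN.match(value):
        raise ValueError('invalid email: {!r}'.format(value))
    return value


def or_none(converter):
    """Wrap a converter so invalid values become None instead of raising."""
    def wrapper(value):
        try:
            return converter(value)
        except ValueError:
            return None
    return wrapper


lenient_email = or_none(strict_email)
valid = lenient_email('ahoy@example.com')
invalid = lenient_email('ahoy')
```

Whether silently dropping invalid addresses is desirable is a separate decision; the strict behaviour at least makes bad input visible.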

Ilozuluchris · 2017-10-14T23:07:49Z

@cuducos I am confused, I do not know how to go about this https://github.com/datasciencebr/jarbas/blob/f95addd5ea4c47f161796cf1e72a58050f66ce60/jarbas/core/management/commands/suspicions.py#L50 since the serialize method actually reshapes rows.

cuducos · 2017-10-15T00:20:36Z

@cuducos I am confused, I do not know how to go about this

Would you mind expanding on your doubt? I mean… you just highlighted the method we're trying to get rid of, so I don't know exactly where your doubt lies…

Ilozuluchris · 2017-10-15T10:41:41Z

    reserved_keys = (
        'applicant_id',
        'document_id',
        'probability',
        'year'
    )
    hypothesis = tuple(k for k in row.keys() if k not in reserved_keys)
    pairs = ((k, v) for k, v in row.items() if k in hypothesis)
    filtered = filter(lambda x: self.bool(x[1]), pairs)
    suspicions = {k: True for k, _ in filtered} or None

    return dict(
        document_id=document_id,
        probability=probability,
        suspicions=suspicions
    )

I was talking about this arrangement; I am not sure how to go about this with rows.

cuducos · 2017-10-15T10:57:25Z

Oh… you're right! I don't know a solution off the top of my head; maybe here we do need a serialization method!… @turicas, any idea on that?

Our input is a CSV like:

document_id,hypothesis_1,hypothesis_2,hypothesis_3
42,True,False,True

And the expected output is something like:

[
    {
        'document_id': 42,
        'suspicions': {
            'hypothesis_1': True,
            'hypothesis_3': True
        }
    }
]
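
For reference, the reshaping described above can be sketched with the standard library alone. This is only an illustration of the input/output pair, not a rows-based solution: to_bool is a hypothetical stand-in for self.bool, and reshape is a made-up name.

```python
import csv
import io

RESERVED_KEYS = ('applicant_id', 'document_id', 'probability', 'year')


def to_bool(value):
    """Hypothetical stand-in for self.bool: read CSV text as a boolean."""
    return value.strip().lower() in ('true', '1')


def reshape(row):
    """Group every truthy hypothesis column under a single 'suspicions' key."""
    suspicions = {
        key: True
        for key, value in row.items()
        if key not in RESERVED_KEYS and to_bool(value)
    }
    return {
        'document_id': int(row['document_id']),
        'suspicions': suspicions or None,  # None when nothing is flagged
    }


sample = io.StringIO(
    'document_id,hypothesis_1,hypothesis_2,hypothesis_3\n'
    '42,True,False,True\n'
)
output = [reshape(row) for row in csv.DictReader(sample)]
```

The type conversion (integers, booleans) is the part a force_types-style mapping could take over; folding several columns into one nested dictionary is the part that still needs a function like reshape.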

Ilozuluchris · 2017-12-19T13:49:25Z

Hello @cuducos, is there anything left to do wrt issue #211?

cuducos · 2017-12-20T20:35:48Z

Hello @cuducos, is there anything left to do wrt issue #211?

Yep, tests are failing, so there is still something to fix. You might consider adding rows to the requirements.txt ; )

Ilozuluchris · 2017-12-21T00:09:15Z

Okay I will, thanks.

Ilozuluchris · 2017-12-23T21:54:57Z

@cuducos The Travis CI job does not run; the requests page reports this:

GitHub payload is missing a merge commit (mergeable_state: "unknown", merged: false)

I believe it has to do with the merge conflict wrt requirements.txt. I tried to resolve it myself, but that did not work as expected.

cuducos · 2017-12-24T06:08:16Z

The travis CI job does not run

After fixing conflicts it is still red:

======================================================================
ERROR: test_save_companies (jarbas.core.tests.test_companies_command.TestCreate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/python/3.5.4/lib/python3.5/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/opt/python/3.5.4/lib/python3.5/unittest/case.py", line 605, in run
    testMethod()
  File "/opt/python/3.5.4/lib/python3.5/unittest/mock.py", line 1159, in patched
    return func(*args, **keywargs)
TypeError: test_save_companies() missing 1 required positional argument: 'lzma'
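
One plausible cause of this TypeError (an assumption, since the test file is not shown here) is a mismatch between the number of patch decorators and the number of mock parameters: each patch decorator injects exactly one argument, so removing a decorator without removing its parameter leaves the last parameter unfilled. The function names below are made up to reproduce the effect in isolation.

```python
from unittest.mock import patch


@patch('lzma.open')       # top decorator -> last mock argument
@patch('csv.DictReader')  # bottom decorator -> first mock argument
def balanced(dict_reader, lzma_open):
    """Two decorators, two mock parameters: the call succeeds."""
    return 'ok'


@patch('lzma.open')       # only one decorator left...
def unbalanced(dict_reader, lzma_open):
    """...but still two mock parameters: the last one is never filled."""
    return 'ok'


result = balanced()

try:
    unbalanced()
except TypeError as error:
    message = str(error)
```

Here the single mock is bound to dict_reader and lzma_open is reported as missing, the same shape as the traceback above. The fix is to keep decorators and parameters in sync.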

cuducos · 2018-02-23T12:14:49Z

@Ilozuluchris, I'm not sure why the CodeClimate checks are missing. Gonna fix that and then we come back to this PR, ok?

Ilozuluchris · 2018-02-23T12:29:32Z

@cuducos Okay no problem, I am sorry it took so long to do this.

cuducos

Hi @Ilozuluchris, many thanks for pushing this PR further. I'm just afraid CodeClimate could help you with code style. Would you mind pushing a new commit so the CodeClimate check runs? (I just fixed it).

cuducos · 2018-02-25T13:59:43Z

jarbas/core/tests/test_companies_command.py

+                                  phone=b'', responsible_federative_entity=b'', situation=b'', zip_code=b'',
+                                  situation_date=datetime.date(2005, 9, 24), situation_reason=b'', type='Book',
+                                  special_situation_date=None, state=b'', status=b'', trade_name=b'',
+                                  special_situation=b'')


What about creating a dictionary and unpacking it to enhance the readability here?

    kwargs = {
        'additional_address_details': b'',
        'address': b'',
        …
    }
    create.assert_called_once_with(**kwargs)
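
A runnable miniature of that suggestion, using only a handful of the fields from the diff above (the real test would list all of them); create here is a bare Mock standing in for the mocked create of the test.

```python
import datetime
from unittest.mock import Mock

create = Mock()  # stand-in for the mocked create in the test

# Subset of the expected fields, one per line for readability.
expected = {
    'address': b'',
    'situation_date': datetime.date(2005, 9, 24),
    'special_situation_date': None,
    'type': 'Book',
}

create(**expected)  # what the command under test would do
create.assert_called_once_with(**expected)  # raises AssertionError on mismatch
```

Keeping one field per line also makes future diffs of the expected values much easier to review.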

Okay, I will.

Ilozuluchris · 2018-02-25T18:30:55Z

@cuducos All checks passed.

cuducos

Hi, just some formatting issues that (I don't know why) Code Climate skips.

cuducos · 2018-02-26T17:29:52Z

jarbas/core/management/commands/companies.py

+            'special_situation_date': CompaniesDate,
+            'latitude': rows.fields.FloatField,
+            'longitude': rows.fields.FloatField
+        }


This is a constant; could you name it in caps (i.e. COMPANIES_CSV_FIELDS) and put it before the class definition (just after the imports)?

Also, in Python we use one of these formatting styles:

    COMPANIES_CSV_FIELDS = {
        'email': rows.fields.EmailField,
        'opening': CompaniesDate,
        'situation_date': CompaniesDate,
        'special_situation_date': CompaniesDate,
        'latitude': rows.fields.FloatField,
        'longitude': rows.fields.FloatField
    }

Or:

    COMPANIES_CSV_FIELDS = {'email': rows.fields.EmailField,
                            'opening': CompaniesDate,
                            'situation_date': CompaniesDate,
                            'special_situation_date': CompaniesDate,
                            'latitude': rows.fields.FloatField,
                            'longitude': rows.fields.FloatField}

Also, can you reflect this formatting decision in jarbas/core/tests/test_companies_command.py (line 46), listing one field per line?

Ilozuluchris added 6 commits October 4, 2017 00:58

testing to see if it works

3c98772

removed unnecessary import statement

d8ddaac

Importing companies data with rows library

055ee86

some final changes

f14a1e9

Finished with companies data

b4b65ea

fixup! Finished with companies data

c28bb54

cuducos changed the base branch from cuducos-rows to master October 7, 2017 13:50

cuducos suggested changes Oct 7, 2017

Applied suggestions

8a4bc18

Ilozuluchris commented Oct 7, 2017

cuducos mentioned this pull request Nov 7, 2017

Use rows library to convert values #211

Open

cuducos approved these changes Dec 20, 2017

Ilozuluchris added 6 commits December 21, 2017 10:34

Added rows to requirements.txt

ceaf203

Removed test for the serialize method(since it does not exist any mor…

383e578

…e) of the command class in jarbas.core.management.commands.companies.py

fixup! Removed test for the serialize method(since it does not exist …

422dd04

…any more) of the command class in jarbas.core.management.commands.companies.py

Removed patch decorator for csv.DictReader

b3f4e9b

Removed serialize.return_value

6070f7c

Merge branch 'master' into master

57acdf6

Merge branch 'master' into master

a221cd5

Ilozuluchris added 3 commits January 14, 2018 20:32

Trying to fix positional argument issue

662652b

Issue with reading file with rows

1050063

Got test to work

6424c0a

cuducos reviewed Feb 25, 2018

Using a dictionary in assertion of the create mocked object

94bdd5d

cuducos suggested changes Feb 26, 2018

reformatted some necessary dictionaries

be4513c
