-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix zero weights issue #387
Fix zero weights issue #387
Conversation
Thanks @andersonfrailey. When I set up taxdata on my machine last week, I had to delete the .toml files. I then installed needed Julia packages. From my perspective it would be fine to just give a list of needed Julia packages in the documentation. |
Thanks for this, @donboyd5. I'll add some additional notes in the documentation about installing the proper packages. |
Re: the change in
Still have not gotten to the bottom of why they are different. |
Ok. I think this is somehow a random error. I re-ran everything without making any changes and now there are a bunch of variables that are changing. Still unclear why this is happening. |
@andersonfrailey I re-created the puf and then ran the tests locally:
Hope this helps! |
Thanks for re-creating this, @hdoupe. That's the error I got when running the scripts last night. This morning, and previously, I got this error:
I have found two functions with a np.random call that do not have a seed set before them. But this doesn't really explain why we have all of these other changes. |
Hmm strange. I can make a |
@hdoupe if it wouldn't be too much trouble that'd be really helpful to figure out why we're getting different errors. I think I've tracked down where mine is coming from. In the alldata["pencon_p"] = idata["pencon"][idata["spouse"] == 0]
alldata["pencon_s"] = idata["pencon"][idata["spouse"] == 1]
alldata["e00200p"] = idata["e00200"][idata["spouse"] == 0]
alldata["e00200s"] = idata["e00200"][idata["spouse"] == 1]
alldata["e00200"] = alldata["e00200p"] + alldata["e00200s"]
It's still a little odd to me that only 20 records are affected, but I'm confident that this is the source of the test failure I'm getting. I can't explain the error you're getting though, @hdoupe. |
@andersonfrailey I was able to re-create the bug I ran into with a docker image:
Tests with this command:
These commands will mount your code in the docker image. So you can make changes to the code without having to re-build the docker image to test them. Re-build with this command:
I had some issues pushing my changes with the Dockerfile, but it's here if you want to re-build the image:# Dockerfile
FROM continuumio/miniconda3
RUN conda config --append channels conda-forge
# Install dependencies. csk build-env will read the environment.yml file and install
# the packages there into the base conda environment.
RUN conda install pip "python>=3.8" && \
pip install cs-kit pyyaml
COPY . /taxdata
WORKDIR /taxdata
RUN csk build-env
|
@hdoupe I haven't been able to re-create the bug you found since I got it the first time. I'll give it a few more tries just to be safe. In the meantime I've updated the PUF expected results to reflect the new totals. |
After trying again I still haven't been able to reproduce the bug reported by @hdoupe. I'm going to merge this and we'll do a bug fix later if it becomes and issue again. |
This PR addresses #385. All we needed to do was rename the weights for non-filers from
s006
tomatched_weight
so that they were properly appended to the filers. Thanks to @donboyd5 for flagging this issue.A few other notes:
For some reason, the unweighted sum of
e00200
changes slightly. It's unclear why because nothing I changed should have affected this. I want to try and get to the bottom of this before merging this PR.I removed
Project.toml
andManifest.toml
. These two files were used by Julia to specify packages and version numbers used in our stage 2 process. However I don't think they were specified properly because I repeatedly got errors about packages not having the proper dependencies listed inManifest.toml
. I think it would be easier to just include in the documentation instructions for installing the proper packages. I'm interesting in hearing other people's thought here or if they've run into similar troubles.