Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raise if ipf marginals do not #29

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

raise if ipf marginals do not #29

wants to merge 2 commits into from

Conversation

Eh2406
Copy link
Contributor

@Eh2406 Eh2406 commented Apr 21, 2016

No description provided.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 81.474% when pulling 9c710a0 on Eh2406:master into 99f8b83 on UDST:master.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Jan 19, 2017

Just notest that this is still around.

@janowicz what do you think?

@janowicz
Copy link
Contributor

Thanks for the reminder @Eh2406. Prior to the new RuntimeError that this PR introduces, would the existing RuntimeError ("'Maximum number of iterations reached during IPF") already be encountered in situations that this PR is trying to catch? In the ipf.py in this PR, is it strictly true that IPF will not converge if sums.max() - sums.min() is greater than .01, or is it possible that IPF could converge if sums.max() - sums.min() is greater than .01 by a modest amount?

@Eh2406
Copy link
Contributor Author

Eh2406 commented Jan 20, 2017

Thanks for the thoughtful response. I will try and explain so we can improve on my pr. :-)

So this pr is testing for the situation where marginal sum to different values. This easily can occur with rounding. In our uses this led to approximately half the bg being off by one in the number of households. When this form of bad input is given to ipf there is no "correct" output.

So for example in the new test:
cat_owner is (40 + 60) == 100
but
car_color is (50 + 31 + 20) == 101
so constraints.sum() is either 100 or 101 but not both. So with every loop it adjusts to 100 when adjusting to cat_owner then back to 101 when adjusting car_color. The values of constraints oscillate up and down depending on what part of the loop we are in. This infinite oscillation is what I meant by "will not converge."

Prior to the new RuntimeError that this PR introduces, would the existing RuntimeError ("'Maximum number of iterations reached during IPF") already be encountered in situations that this PR is trying to catch?

No, the value of constraints oscillate but the loop only checks at one spot in the cycle so max_iterations will not be hit. But the "correctness" of this output is somewhat arbitrary; checking at a different part of the cycle will give you a different answer.

In the ipf.py in this PR, is it strictly true that IPF will not converge if sums.max() - sums.min() is greater than .01, or is it possible that IPF could converge if sums.max() - sums.min() is greater than .01 by a modest amount?

It will not converge by my definition if sums.max() - sums.min() > 0. .01 is an arbitrary small number, as testing for 0 is generally a bad idea. I have sean no mathematical analysis of what ipf dose with bad input. So I do not know what cut off is best.

Does this make sense? how can I improve this pr?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants