RDRP-1087 outliers #395
When running the pipeline I get the following error:
FileExistsError: File: R:/BERD Results System Development 2023/DAP_emulation/2021_surveys/PNP/06_imputation/backdata_output/PNP_2021_backdata_thousands.csv does not exist
To make it run, you need to remove the suffix _thousands from the filename in dev_configs.
outlier_cond = filtered_df["group_rank"] > filtered_df["clip_higher_than"]

# If lower clipping is specified, add the condition
if lower_clip > 0:
The case where the function "get_clip_bands" has a non-zero lower_clip has not been tested. I'm wondering whether it is worth adding a test for it.
Also, has "flag_outliers" been unit tested for the case where lower_clip is not zero?
Yes, flag_outliers has been unit tested for the case where lower_clip is non-zero. I think you're right: I'll add a unit test for get_clip_bands with a non-zero lower clip as well, and one for when both clips are non-zero.
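For reference, the three cases discussed above (upper clip only, lower clip added, both non-zero) can be illustrated with a small pandas-free sketch. The function name flag_by_rank, its parameters, and the rule that ranks at or below the lower threshold are flagged are assumptions made for illustration; the real flag_outliers and get_clip_bands may define the lower band differently.

```python
def flag_by_rank(ranks, clip_higher_than, clip_lower_than=0):
    """Return a list of booleans, True where a rank falls outside the clip bands.

    Hypothetical helper, not the project's implementation.
    ranks: 1-based rank of each value within its group (1 = smallest).
    clip_higher_than: ranks strictly above this threshold are flagged.
    clip_lower_than: ranks at or below this threshold are flagged
        (assumed rule); 0 disables lower clipping.
    """
    flags = []
    for rank in ranks:
        # The upper condition mirrors the comparison in the reviewed snippet.
        is_outlier = rank > clip_higher_than
        # The lower condition is only added when lower clipping is requested.
        if clip_lower_than > 0:
            is_outlier = is_outlier or rank <= clip_lower_than
        flags.append(is_outlier)
    return flags


ranks = [1, 2, 3, 4, 5]
upper_only = flag_by_rank(ranks, clip_higher_than=4)
both = flag_by_rank(ranks, clip_higher_than=4, clip_lower_than=1)
lower_only = flag_by_rank(ranks, clip_higher_than=5, clip_lower_than=2)
```

A test along these lines would assert on all three results, so a regression in the lower-clip branch cannot hide behind the default of zero.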
Pull Request submission
The module src.outlier_detection.auto_outliers.py has a large function, flag_outliers.
I have created a separate function that indicates the number of rows that should be clipped for outliering, with a unit test.
This unit test illustrates how many rows are flagged as outliers depending on the number of valid entries in a cell.
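As a rough illustration of how the number of clipped rows can depend on cell size, here is a minimal sketch. The name rows_to_clip, the integer clip_percent parameter, and the round-half-up rule are all assumptions; the function added in this PR may round differently or take the clip as a fraction.

```python
def rows_to_clip(n_valid, clip_percent):
    """Number of rows to clip from a cell with n_valid valid entries.

    Hypothetical helper, not the function added in this PR.
    clip_percent is an integer percentage (e.g. 5 for 5%), so the
    rounding below uses integer arithmetic only and avoids
    floating-point surprises. Rounding is half-up (assumed).
    """
    if n_valid < 0:
        raise ValueError("n_valid must be non-negative")
    if not 0 <= clip_percent <= 100:
        raise ValueError("clip_percent must be between 0 and 100")
    # Equivalent to round-half-up of n_valid * clip_percent / 100.
    return (n_valid * clip_percent + 50) // 100


# Show how the count changes with cell size for a 5% clip.
for n_valid in (0, 9, 10, 30, 100):
    print(n_valid, rows_to_clip(n_valid, 5))
```

Under these assumptions, small cells (fewer than 10 valid entries at 5%) have no rows clipped at all, which is the kind of behaviour the unit test is meant to make visible.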
Closes or fixes
Closes #1087
Code
Documentation
Any new code includes all the following forms of documentation:
Args and returns for all major functions
Data
Testing
Peer Review Section
requirements.txt
Final approval (post-review)
The author has responded to my review and made changes to my satisfaction.
Review comments
Insert detailed comments here!
These might include, but not exclusively:
that it is likely to interact with?)
works correctly?)
Your suggestions should be tailored to the code that you are reviewing.
Be critical and clear, but not mean. Ask questions and set actions.