Optimised inference pipeline #104
base: master
Hi @jjhbw, thanks a lot for this; we really appreciate it. We are pleased that you and your lab are enjoying working with the code, and thanks also for the detailed description. We will pull down the PR and test it before giving some feedback later this week.
Great! I'd love to hear your feedback. Note that I've only really changed 'plumbing' code; the model and postprocessing routines are untouched, as the diff shows. Just let me know if anything is unclear.
Hi @jjhbw, I was busy with other stuff. I ended up rolling another version that does not do the caching. Your code relies solely on the chunk grid defined here (line 140 in e019105) and here (line 380 in e019105), which can be considered a seamless tiling of the chunk output. Now, if we check the postproc here (line 298 in e019105), …
Hi @vqdang, thanks for looking over the PR. The chunks overlap each other by a fixed-size strip of padding, and for each chunk the cells within the padding area are discarded. This redundancy should ensure that cells near the border of a chunk are always part of another chunk, so duplicates can be safely discarded; each cell should be present only once in the final set. The image below may help explain the concept a bit (sorry about its low resolution): the chunk boundaries are shown in red and the tile boundaries in green. I will ignore the imperfect tissue segmentation for the purposes of this discussion. I also performed a more thorough investigation by exporting the nuclei found in neighbouring chunks to QuPath; I'll redo it and share a few screenshots.
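The overlap-and-discard scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the helper names (`chunk_grid`, `in_box`, `merge_chunks`) are hypothetical, the grid step is assumed to be `chunk - 2 * pad`, and whether a cell lies "within the padding area" is decided by its centroid. With that step the chunk interiors ("cores") tile the slide exactly, so every centroid falls in exactly one core and each nucleus is kept once. The padding is assumed to be wider than the largest nucleus, so a nucleus owned by a chunk is fully visible in that chunk.

```python
from typing import Iterator, List, Tuple

# (x0, y0, x1, y1) in slide pixel coordinates, half-open: x0 <= x < x1
Box = Tuple[int, int, int, int]
Point = Tuple[float, float]


def chunk_grid(width: int, height: int, chunk: int, pad: int) -> Iterator[Tuple[Box, Box]]:
    """Yield (read_box, core_box) pairs for an overlapping chunk grid (hypothetical helper).

    read_box is the region read from the slide and run through the model: the core
    enlarged by `pad` pixels on every side, clipped to the slide. The core boxes tile
    the slide without gaps or overlap.
    """
    step = chunk - 2 * pad  # assumed core size; neighbouring read boxes overlap by 2 * pad
    if step <= 0:
        raise ValueError("chunk must be larger than 2 * pad")
    for y0 in range(0, height, step):
        for x0 in range(0, width, step):
            x1, y1 = min(x0 + step, width), min(y0 + step, height)
            core = (x0, y0, x1, y1)
            read = (max(x0 - pad, 0), max(y0 - pad, 0),
                    min(x1 + pad, width), min(y1 + pad, height))
            yield read, core


def in_box(p: Point, box: Box) -> bool:
    """True if point p lies inside the half-open box."""
    x0, y0, x1, y1 = box
    return x0 <= p[0] < x1 and y0 <= p[1] < y1


def merge_chunks(per_chunk: List[Tuple[Box, List[Point]]]) -> List[Point]:
    """Keep each chunk's nuclei only if their centroid lies in that chunk's core.

    Nuclei whose centroid is in the padding strip are dropped here; the
    neighbouring chunk whose core contains them reports them instead.
    """
    kept: List[Point] = []
    for core, centroids in per_chunk:
        kept.extend(c for c in centroids if in_box(c, core))
    return kept
```

In this sketch a nucleus near a chunk border is detected by every chunk whose read box covers it, but only the chunk whose core contains its centroid keeps it.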
Branch force-pushed from b701725 to c0cfb16.
…uild a MultiPolygon from a list with only a single Polygon in it.
Branch force-pushed from 99119d4 to 65963c1.
Hey, no movement on this?
Hi all,

First of all, thanks for this amazing project. Our lab is really happy to be able to use it, and I've learned a lot from working with it.

While applying HoverNet to our dataset, I encountered a pretty significant obstacle. We use an HPC cluster equipped with GPUs to run inference across our fairly large dataset (several thousand WSIs). The runtime of the current inference script (`run_infer.py`) reached multiple days for a single WSI before the job was killed by the scheduler.

When reading through the code, I noticed that `run_infer.py` makes quite extensive use of memory-mapped NumPy files. Memory-mapping this way has significant overhead, making it virtually impossible to use on network filesystems like that of our HPC cluster.

I definitely wanted to use your project, so I ended up refactoring the inference script a bit; please find the result in this PR. I also made a few additional optimisations that should improve runtime and code simplicity a bit. The new `infer_simple.py` should be able to completely replace the previous inference code, and it only imports the parts of the existing codebase that it needs (i.e. the `HoVerNet` model definition and the `process` postprocessing routine).

Our lab is using this script 'in production'. I'm happy to polish this PR so it better fits your vision for this project, or to explain my design decisions further. If you prefer to keep the current scripts instead, that's also perfectly fine with me, of course! I'm just happy to have been able to use your work.
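One way to avoid memory-mapped intermediate files is to post-process each chunk in RAM as soon as it is predicted and keep only a compact list of nuclei. The sketch below illustrates that pattern; it is an assumption, not the contents of `infer_simple.py`. `read_region`, `predict` and `postprocess` are hypothetical callables standing in for a slide reader, the `HoVerNet` forward pass and the `process` routine, and their signatures here are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open, slide pixel coordinates


def infer_in_memory(
    boxes: Sequence[Box],
    read_region: Callable[[int, int, int, int], np.ndarray],  # hypothetical: (x, y, w, h) -> pixels
    predict: Callable[[np.ndarray], np.ndarray],               # hypothetical stand-in for the model
    postprocess: Callable[[np.ndarray], List[Tuple[float, float]]],  # hypothetical stand-in for `process`
) -> List[Tuple[float, float]]:
    """Run inference chunk by chunk without any full-slide (memory-mapped) output array.

    Each chunk's prediction map is turned into centroids right away; only the
    current chunk's arrays are held in RAM, never a whole-slide map on disk.
    """
    nuclei: List[Tuple[float, float]] = []
    for x0, y0, x1, y1 in boxes:
        tile = read_region(x0, y0, x1 - x0, y1 - y0)  # pixels of this chunk only
        pred = predict(tile)                           # per-pixel maps for this chunk
        # postprocess returns chunk-local (x, y); shift them to slide coordinates
        nuclei.extend((x + x0, y + y0) for x, y in postprocess(pred))
    return nuclei
```

The trade-off is that peak memory is bounded by the chunk size rather than the slide size, and no random-access writes hit the (network) filesystem.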