-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
17 sparse selection #30
Conversation
39b5a83
to
2cb55d4
Compare
Hi @FreekvanLeijen , the sparse selection function is ready for review. I extended the example script of ps selection to a general depsi processing script, and added the example of sparse selection as the next step. Can you give it a review? I probably need to react on it after I come back. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Ou, thanks for creating this function. I especially like the foreseen extension with other crs!
I added some comments. There seems to be some options for performance improvement.
@@ -42,6 +42,8 @@ def get_free_port(): | |||
# Parameters | |||
method = 'nmad' # Method for selection | |||
threshold = 0.45 # Threshold for selection | |||
dist_thres = 200 # Distance threshold for network selection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this a little longer. I know that I proposed this, but I would like to change, since 'thres' suggests a maximum distance, instead of a minimum distance.
Therefore please change to 'min_dist'.
@@ -42,6 +42,8 @@ def get_free_port(): | |||
# Parameters | |||
method = 'nmad' # Method for selection | |||
threshold = 0.45 # Threshold for selection | |||
dist_thres = 200 # Distance threshold for network selection | |||
include_index = [101] # Force including the 101th point, use None if no point need to be included |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you extent this example to multiple points, so that the formatting is clear?
And would be good to include to nearby points here as well (e.g., 101 and 102, although they may actually be relatively far apart).
@@ -42,6 +42,8 @@ def get_free_port(): | |||
# Parameters | |||
method = 'nmad' # Method for selection | |||
threshold = 0.45 # Threshold for selection | |||
dist_thres = 200 # Distance threshold for network selection | |||
include_index = [101] # Force including the 101th point, use None if no point need to be included | |||
|
|||
# Input data paths | |||
path_slc_zarr = Path("/project/caroline/slc_file.zarr") # Zarr file of all SLCs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make this a variable, specify file path at the beginning of the script.
@@ -121,5 +124,15 @@ def get_free_port(): | |||
else: | |||
stm_ps_reordered.to_zarr(path_ps_zarr) | |||
|
|||
# ---- Processing Stage 2: Network Processing ---- | |||
# Select network points | |||
stm_network = network_stm_seletcion(stm, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
selection
pydepsi/classification.py
Outdated
@@ -126,6 +127,123 @@ def ps_selection( | |||
return stm_masked | |||
|
|||
|
|||
def network_stm_seletcion( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
selection
pydepsi/classification.py
Outdated
if stm_select is None: | ||
stm_select = stm_now.copy() | ||
else: | ||
stm_select = xr.concat([stm_select, stm_now], dim="space") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Matlab, iterative concatenation is much slower than working with an index, because of memory allocation. Not sure how this is in Python. To test. (Therefore the original implementation worked with an index.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @FreekvanLeijen , I test a comparision between working with index and concat on ~1million points. My conclusion is their performance are almost the same. I would use the index here since in general it's more readable. But for the part below, when handling the coords_remain
, I would prefer a coords_remain
for readability.
stm_remain = stm_remain.isel(space=slice(1, None)).copy() | ||
|
||
# Remove points in stm_remain within min_dist of stm_now | ||
coords_remain = np.column_stack( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This calculation is now repeated 1000+ times. Move outside loop. So store the coordinates, work with index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see the comment above
tests/test_classification.py
Outdated
assert np.all(res_nmad["space"].values == np.array([3, 7])) | ||
|
||
|
||
def test_network_stm_seletcion_include_index(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
selection
(check whole file)
Hi @FreekvanLeijen, I applied most of your comments. For the performance improvement with index, I did some test but actually found the performance are quite similar. I partly implemented the index to make the syntax more readable. Can you give another look? |
Fix #17.