Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

17 sparse selection #30

Merged
merged 13 commits into from
Dec 11, 2024
Merged

17 sparse selection #30

merged 13 commits into from
Dec 11, 2024

Conversation

rogerkuou
Copy link
Member

Fix #17.

@rogerkuou rogerkuou force-pushed the 17_sparse_selection branch from 39b5a83 to 2cb55d4 Compare October 21, 2024 15:19
@rogerkuou rogerkuou marked this pull request as ready for review October 21, 2024 15:29
@rogerkuou
Copy link
Member Author

Hi @FreekvanLeijen , the sparse selection function is ready for review. I extended the example script of ps selection to a general depsi processing script, and added the example of sparse selection as the next step.

Can you give it a review? I probably need to react on it after I come back.

Copy link

@FreekvanLeijen FreekvanLeijen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ou, thanks for creating this function. I especially like the foreseen extension with other crs!

I added some comments. There seems to be some options for performance improvement.

@@ -42,6 +42,8 @@ def get_free_port():
# Parameters
method = 'nmad' # Method for selection
threshold = 0.45 # Threshold for selection
dist_thres = 200 # Distance threshold for network selection

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this a little longer. I know that I proposed this, but I would like to change, since 'thres' suggests a maximum distance, instead of a minimum distance.

Therefore please change to 'min_dist'.

@@ -42,6 +42,8 @@ def get_free_port():
# Parameters
method = 'nmad' # Method for selection
threshold = 0.45 # Threshold for selection
dist_thres = 200 # Distance threshold for network selection
include_index = [101] # Force including the 101th point, use None if no point need to be included

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you extent this example to multiple points, so that the formatting is clear?
And would be good to include to nearby points here as well (e.g., 101 and 102, although they may actually be relatively far apart).

@@ -42,6 +42,8 @@ def get_free_port():
# Parameters
method = 'nmad' # Method for selection
threshold = 0.45 # Threshold for selection
dist_thres = 200 # Distance threshold for network selection
include_index = [101] # Force including the 101th point, use None if no point need to be included

# Input data paths
path_slc_zarr = Path("/project/caroline/slc_file.zarr") # Zarr file of all SLCs

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this a variable, specify file path at the beginning of the script.

@@ -121,5 +124,15 @@ def get_free_port():
else:
stm_ps_reordered.to_zarr(path_ps_zarr)

# ---- Processing Stage 2: Network Processing ----
# Select network points
stm_network = network_stm_seletcion(stm,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

selection

@@ -126,6 +127,123 @@ def ps_selection(
return stm_masked


def network_stm_seletcion(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

selection

if stm_select is None:
stm_select = stm_now.copy()
else:
stm_select = xr.concat([stm_select, stm_now], dim="space")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Matlab, iterative concatenation is much slower than working with an index, because of memory allocation. Not sure how this is in Python. To test. (Therefore the original implementation worked with an index.)

Copy link
Member Author

@rogerkuou rogerkuou Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @FreekvanLeijen , I test a comparision between working with index and concat on ~1million points. My conclusion is their performance are almost the same. I would use the index here since in general it's more readable. But for the part below, when handling the coords_remain, I would prefer a coords_remain for readability.

stm_remain = stm_remain.isel(space=slice(1, None)).copy()

# Remove points in stm_remain within min_dist of stm_now
coords_remain = np.column_stack(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This calculation is now repeated 1000+ times. Move outside loop. So store the coordinates, work with index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the comment above

assert np.all(res_nmad["space"].values == np.array([3, 7]))


def test_network_stm_seletcion_include_index():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

selection
(check whole file)

@rogerkuou
Copy link
Member Author

Hi @FreekvanLeijen, I applied most of your comments. For the performance improvement with index, I did some test but actually found the performance are quite similar. I partly implemented the index to make the syntax more readable. Can you give another look?

@rogerkuou rogerkuou merged commit 102ba6f into main Dec 11, 2024
9 checks passed
@rogerkuou rogerkuou deleted the 17_sparse_selection branch January 10, 2025 12:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sparse_selection function
2 participants