I just started using `pod5 subset` to regroup my POD5 data per channel, to optimize `dorado duplex` basecalling speed. The default setting for `--threads` is 4; I was brave and set it to 8 .. well, ..

System: Linux, 128 cores, 1 TB RAM. Load before starting `pod5` was around 34.
Quickly after starting, with a small (7 GB) P2 dataset, the system load went over 200, making the system a bit sluggish. Heavy I/O, I guess. `top` showed the 8 Python processes, each at 700 to 1200% CPU, in various states (R, S, D):
```
top - 21:43:28 up 274 days, 10:38, 29 users,  load average: 214.81, 106.81, 77.11
<...>
  PID USER PR NI VIRT  RES    SHR   S %CPU  %MEM TIME+    COMMAND
19971 USER 20  0 22.2g 349288 51752 R 1210  0.0  36:49.68 python3
19972 USER 20  0 21.6g 288420 51156 S 1041  0.0  37:35.33 python3
19967 USER 20  0 21.7g 377372 50984 D 989.2 0.0  36:09.18 python3
19969 USER 20  0 21.7g 277636 48848 R 972.3 0.0  35:51.98 python3
19974 USER 20  0 21.7g 380280 54068 S 949.0 0.0  38:09.10 python3
19973 USER 20  0 21.7g 362636 48292 R 943.6 0.0  35:44.70 python3
19968 USER 20  0 21.6g 286052 48144 S 875.5 0.0  37:15.61 python3
19970 USER 20  0 21.7g 270932 47140 D 843.0 0.0  36:15.94 python3
```
This is far more than I would expect when using `--threads 8` ...
Is there a smooth way to better control the CPUs used by `pod5 subset`, and thus its I/O throughput, without resorting to something like `taskset`?
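For reference, the kind of external workaround I would rather avoid, as a sketch using standard Linux tools; the `pod5 subset` flags follow the per-channel recipe from the dorado docs, so please check `pod5 subset --help` for your version:

```sh
# Pin the process to 8 cores and lower its CPU/I-O priority.
# pod5's internal thread pools may still oversubscribe, but only
# within the pinned cores.
taskset -c 0-7 nice -n 10 ionice -c 2 -n 7 \
    pod5 subset reads.pod5 --summary summary.tsv --columns channel \
        --output split_by_channel --threads 8
```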
This small dataset took ~7 minutes to finish; the larger datasets are 100x to 200x that size. So I wonder what is considered "best practice" here? On shared servers this is quite a big issue.
Instead of writing a few thousand per-channel POD5 files, wouldn't it be convenient to write one (or a few) larger, per-channel-sorted POD5 file(s)? A rough sketch of what I have in mind follows.
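A minimal, single-threaded sketch with the pod5 Python API, not an official pod5 feature. I am assuming `Reader.reads(selection=...)`, `ReadRecord.to_read()`, and `Writer.add_read()` behave as in the current pod5 docs; untested, and it makes two passes over the input:

```python
import pod5 as p5
from collections import defaultdict

def subset_sorted_by_channel(in_path: str, out_path: str) -> None:
    """Write ONE pod5 file whose reads are grouped by channel."""
    # Pass 1: collect the read ids per channel (metadata only, no signal).
    by_channel = defaultdict(list)
    with p5.Reader(in_path) as reader:
        for record in reader.reads():
            by_channel[record.pore.channel].append(record.read_id)

    # Pass 2: re-read the input channel by channel, appending everything
    # to a single writer, so the reads end up contiguous per channel.
    with p5.Reader(in_path) as reader, p5.Writer(out_path) as writer:
        for channel in sorted(by_channel):
            for record in reader.reads(selection=by_channel[channel]):
                writer.add_read(record.to_read())

subset_sorted_by_channel("input.pod5", "per_channel_sorted.pod5")
```

`dorado duplex` would then only need to open one file per channel group instead of thousands of small ones, at the cost of re-reading the signal data once.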
Any ideas/comments/remarks are welcome :-)