
Executing a concurrent QUEENS run within a QUEENS run does not work #72

Open · gilrrei opened this issue Dec 20, 2024 · 0 comments
Labels: status: open (No solution for this issue has been provided) · topic: other (Issue/PR that do not fit the topic: <type> labels) · type: bug report (Issue/PR to highlight/fix a bug)

gilrrei (Contributor) commented Dec 20, 2024

This issue was originally created by @leahaeusel on gitlab.lrz.de on 2024-09-20.

Motivation and Context

Sometimes it can be helpful to start a second QUEENS run from within the function of a FunctionDriver, e.g. to compute an expectation with the MonteCarloIterator. This would further increase the modularity of QUEENS.
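
For illustration, a minimal sketch of the intended pattern. Only run_iterator is taken from the traceback below; the construction of the inner iterator and settings is left as placeholders, since the concrete arguments depend on the problem:

```python
from queens.main import run_iterator  # entry point, see traceback below


def forward_model(sample):
    """Function that a FunctionDriver would evaluate in the outer run.

    Inside it, a second, independent QUEENS analysis is started, e.g. a
    MonteCarloIterator estimating an expectation for the given sample.
    Object construction is deliberately left as placeholders.
    """
    inner_iterator = ...  # e.g. a MonteCarloIterator configured for `sample`
    inner_settings = ...  # a separate GlobalSettings object for the inner run

    # The nested call this issue is about: a QUEENS run within a QUEENS run.
    run_iterator(inner_iterator, global_settings=inner_settings)

    # Post-process the inner run's output into the value returned to the
    # outer iterator (details depend on the chosen result handling).
    return ...
```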

Current Behavior

QUEENS crashes when run_iterator is called within another run_iterator call.

Script with which this happened: qrun_optimization.py

Error message:

/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/distributed/node.py:182: UserWarning: Port 8787 is already in use. Perhaps you already have a cluster running? Hosting the HTTP server on port 35417 instead
  warnings.warn(
2024-09-20 12:51:03,744 - distributed.scheduler - INFO - State start
2024-09-20 12:51:03,746 - distributed.scheduler - INFO - Scheduler at: tcp://127.0.0.1:40593
2024-09-20 12:51:03,746 - distributed.scheduler - INFO - dashboard at: http://127.0.0.1:35417/status
2024-09-20 12:51:03,746 - distributed.scheduler - INFO - Registering Worker plugin shuffle
2024-09-20 12:51:03,751 - distributed.nanny - INFO - Start Nanny at: 'tcp://127.0.0.1:40085'
2024-09-20 12:51:03,752 - distributed.nanny - ERROR - Failed to start process
Traceback (most recent call last):
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/distributed/nanny.py", line 452, in instantiate
    result = await self.process.start()
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/distributed/nanny.py", line 752, in start
    await self.process.start()
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/distributed/process.py", line 55, in _call_and_set_future
    res = func(*args, **kwargs)
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/distributed/process.py", line 215, in _start
    process.start()
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/multiprocessing/process.py", line 118, in start
    assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children
2024-09-20 12:51:03,752 - distributed.nanny - INFO - Closing Nanny at 'tcp://127.0.0.1:40085'. Reason: nanny-instantiate-failed
  0%|          | 0/1 [00:25<?, ?it/s]
Traceback (most recent call last):
  File "/home/haeusel/workdir/scripts_rep/queens/beam_examples_presentation/optimization/qrun_optimization.py", line 104, in <module>
    main()
  File "/home/haeusel/workdir/scripts_rep/queens/beam_examples_presentation/optimization/qrun_optimization.py", line 95, in main
    run_iterator(iterator, global_settings=global_settings)
  File "/home/haeusel/workdir/queens_rep/queens/main.py", line 68, in run_iterator
    iterator.run()
  File "/home/haeusel/workdir/queens_rep/queens/iterators/iterator.py", line 49, in run
    self.core_run()
  File "/home/haeusel/workdir/queens_rep/queens/iterators/optimization_iterator.py", line 328, in core_run
    self.solution = minimize(
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/scipy/optimize/_minimize.py", line 689, in minimize
    res = _minimize_cg(fun, x0, args, jac, callback, **options)
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/scipy/optimize/_optimize.py", line 1686, in _minimize_cg
    sf = _prepare_scalar_function(fun, x0, jac=jac, args=args, epsilon=eps,
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/scipy/optimize/_optimize.py", line 332, in _prepare_scalar_function
    sf = ScalarFunction(fun, x0, args, grad, hess,
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py", line 158, in __init__
    self._update_fun()
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py", line 251, in _update_fun
    self._update_fun_impl()
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py", line 155, in update_fun
    self.f = fun_wrapped(self.x)
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py", line 137, in fun_wrapped
    fx = fun(np.copy(x), *args)
  File "/home/haeusel/workdir/queens_rep/queens/iterators/optimization_iterator.py", line 223, in objective_function
    f_value = self.eval_model(x_vec)
  File "/home/haeusel/workdir/queens_rep/queens/iterators/optimization_iterator.py", line 368, in eval_model
    f_batch.append(self.model.evaluate(position.reshape(1, -1))["result"].reshape(-1))
  File "/home/haeusel/workdir/queens_rep/queens/models/simulation_model.py", line 35, in evaluate
    self.response = self.interface.evaluate(samples)
  File "/home/haeusel/workdir/queens_rep/queens/interfaces/job_interface.py", line 39, in evaluate
    output = self.scheduler.evaluate(samples_list, driver=self.driver)
  File "/home/haeusel/workdir/queens_rep/queens/schedulers/dask_scheduler.py", line 80, in evaluate
    for future in as_completed(futures):
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/distributed/client.py", line 5680, in __next__
    self.thread_condition.wait(timeout=0.100)
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/threading.py", line 324, in wait
    gotit = waiter.acquire(True, timeout)
KeyboardInterrupt
2024-09-20 12:51:28,866 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:43337. Reason: worker-handle-scheduler-connection-broken
2024-09-20 12:51:28,867 - distributed.core - INFO - Connection to tcp://127.0.0.1:41382 has been closed.
2024-09-20 12:51:28,867 - distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:43337', name: 0, status: running, memory: 0, processing: 1> (stimulus_id='handle-worker-cleanup-1726829488.8676085')
2024-09-20 12:51:28,867 - distributed.worker.state_machine - WARNING - Async instruction for <Task cancelled name="execute('run-5d416ec6-d1b7-4d0d-858f-2fb6cdaeeab1-0')" coro=<Worker.execute() done, defined at /home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/distributed/worker_state_machine.py:3615>> ended with CancelledError
2024-09-20 12:51:28,868 - distributed.scheduler - INFO - Lost all workers
2024-09-20 12:51:28,873 - distributed.nanny - INFO - Closing Nanny gracefully at 'tcp://127.0.0.1:34977'. Reason: worker-handle-scheduler-connection-broken
2024-09-20 12:51:28,882 - distributed.scheduler - INFO - Retire worker addresses (0,)
2024-09-20 12:51:28,884 - distributed.nanny - INFO - Closing Nanny at 'tcp://127.0.0.1:34977'. Reason: nanny-close
2024-09-20 12:51:28,884 - distributed.nanny - INFO - Nanny asking worker to close. Reason: nanny-close
2024-09-20 12:51:32,885 - distributed.nanny - WARNING - Worker process still alive after 4.0 seconds, killing
2024-09-20 12:51:32,929 - distributed.nanny - INFO - Worker process 75004 was killed by signal 9
2024-09-20 12:51:32,929 - distributed.scheduler - INFO - Scheduler closing due to unknown reason...
2024-09-20 12:51:32,930 - distributed.scheduler - INFO - Scheduler closing all comms
/home/haeusel/.miniforge3/envs/queens/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
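
The decisive line in the log is the multiprocessing AssertionError: the nested run asks Dask to start a new Nanny/worker process, but it is most likely already executing inside a worker process that was itself spawned as a daemonic child by the outer run's cluster, and Python forbids daemonic processes from having children. A minimal, QUEENS-independent sketch of that restriction (process roles and names are purely illustrative):

```python
import multiprocessing as mp


def nested_work():
    # Starting a process from inside a daemonic process raises
    # "AssertionError: daemonic processes are not allowed to have children",
    # the same assertion seen in the nanny traceback above.
    grandchild = mp.Process(target=print, args=("inner process",))
    grandchild.start()
    grandchild.join()


if __name__ == "__main__":
    # Stand-in for a nanny-managed Dask worker, which runs as a daemonic
    # child process of the main (outer) QUEENS process.
    worker = mp.Process(target=nested_work, daemon=True)
    worker.start()
    worker.join()
```

At the Dask level this restriction is usually sidestepped by not spawning processes from within a worker, e.g. by giving the inner run a thread-based cluster (distributed.LocalCluster(processes=False)); whether and how QUEENS could expose such an option for nested runs is only a possible direction, not something this report shows to exist today.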
