Executing a concurrent QUEENS run within a QUEENS run does not work #72
Labels

- status: open (no solution for this issue has been provided)
- topic: other (issue/PR that does not fit the `topic: <type>` labels)
- type: bug report (issue/PR that highlights or fixes a bug)
Motivation and Context
Sometimes it can be helpful to start a second QUEENS run within the function of a FunctionDriver, e.g. to compute an expectation with the MonteCarloIterator. Supporting this would further increase the modularity of QUEENS.
Current Behavior
QUEENS crashes when run_iterator is called from within another run_iterator call.
Script with which this happened: qrun_optimization.py
Error message:
```
/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/distributed/node.py:182: UserWarning: Port 8787 is already in use. Perhaps you already have a cluster running? Hosting the HTTP server on port 35417 instead
  warnings.warn(
2024-09-20 12:51:03,744 - distributed.scheduler - INFO - State start
2024-09-20 12:51:03,746 - distributed.scheduler - INFO - Scheduler at: tcp://127.0.0.1:40593
2024-09-20 12:51:03,746 - distributed.scheduler - INFO - dashboard at: http://127.0.0.1:35417/status
2024-09-20 12:51:03,746 - distributed.scheduler - INFO - Registering Worker plugin shuffle
2024-09-20 12:51:03,751 - distributed.nanny - INFO - Start Nanny at: 'tcp://127.0.0.1:40085'
2024-09-20 12:51:03,752 - distributed.nanny - ERROR - Failed to start process
Traceback (most recent call last):
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/distributed/nanny.py", line 452, in instantiate
    result = await self.process.start()
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/distributed/nanny.py", line 752, in start
    await self.process.start()
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/distributed/process.py", line 55, in _call_and_set_future
    res = func(*args, **kwargs)
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/distributed/process.py", line 215, in _start
    process.start()
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/multiprocessing/process.py", line 118, in start
    assert not _current_process._config.get('daemon'), \
AssertionError: daemonic processes are not allowed to have children
2024-09-20 12:51:03,752 - distributed.nanny - INFO - Closing Nanny at 'tcp://127.0.0.1:40085'. Reason: nanny-instantiate-failed
  0%|          | 0/1 [00:25<?, ?it/s]
Traceback (most recent call last):
  File "/home/haeusel/workdir/scripts_rep/queens/beam_examples_presentation/optimization/qrun_optimization.py", line 104, in <module>
    main()
  File "/home/haeusel/workdir/scripts_rep/queens/beam_examples_presentation/optimization/qrun_optimization.py", line 95, in main
2024-09-20 12:51:28,866 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:43337. Reason: worker-handle-scheduler-connection-broken
    run_iterator(iterator, global_settings=global_settings)
  File "/home/haeusel/workdir/queens_rep/queens/main.py", line 68, in run_iterator
    iterator.run()
  File "/home/haeusel/workdir/queens_rep/queens/iterators/iterator.py", line 49, in run
2024-09-20 12:51:28,867 - distributed.core - INFO - Connection to tcp://127.0.0.1:41382 has been closed.
2024-09-20 12:51:28,867 - distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:43337', name: 0, status: running, memory: 0, processing: 1> (stimulus_id='handle-worker-cleanup-1726829488.8676085')
2024-09-20 12:51:28,867 - distributed.worker.state_machine - WARNING - Async instruction for <Task cancelled name="execute('run-5d416ec6-d1b7-4d0d-858f-2fb6cdaeeab1-0')" coro=<Worker.execute() done, defined at /home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/distributed/worker_state_machine.py:3615>> ended with CancelledError
    self.core_run()
  File "/home/haeusel/workdir/queens_rep/queens/iterators/optimization_iterator.py", line 328, in core_run
2024-09-20 12:51:28,868 - distributed.scheduler - INFO - Lost all workers
    self.solution = minimize(
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/scipy/optimize/_minimize.py", line 689, in minimize
    res = _minimize_cg(fun, x0, args, jac, callback, **options)
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/scipy/optimize/_optimize.py", line 1686, in _minimize_cg
2024-09-20 12:51:28,873 - distributed.nanny - INFO - Closing Nanny gracefully at 'tcp://127.0.0.1:34977'. Reason: worker-handle-scheduler-connection-broken
    sf = _prepare_scalar_function(fun, x0, jac=jac, args=args, epsilon=eps,
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/scipy/optimize/_optimize.py", line 332, in _prepare_scalar_function
    sf = ScalarFunction(fun, x0, args, grad, hess,
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py", line 158, in __init__
    self._update_fun()
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py", line 251, in _update_fun
    self._update_fun_impl()
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py", line 155, in update_fun
    self.f = fun_wrapped(self.x)
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py", line 137, in fun_wrapped
    fx = fun(np.copy(x), *args)
  File "/home/haeusel/workdir/queens_rep/queens/iterators/optimization_iterator.py", line 223, in objective_function
    f_value = self.eval_model(x_vec)
  File "/home/haeusel/workdir/queens_rep/queens/iterators/optimization_iterator.py", line 368, in eval_model
    f_batch.append(self.model.evaluate(position.reshape(1, -1))["result"].reshape(-1))
  File "/home/haeusel/workdir/queens_rep/queens/models/simulation_model.py", line 35, in evaluate
    self.response = self.interface.evaluate(samples)
  File "/home/haeusel/workdir/queens_rep/queens/interfaces/job_interface.py", line 39, in evaluate
    output = self.scheduler.evaluate(samples_list, driver=self.driver)
  File "/home/haeusel/workdir/queens_rep/queens/schedulers/dask_scheduler.py", line 80, in evaluate
    for future in as_completed(futures):
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/site-packages/distributed/client.py", line 5680, in __next__
    self.thread_condition.wait(timeout=0.100)
  File "/home/haeusel/.miniforge3/envs/queens/lib/python3.10/threading.py", line 324, in wait
    gotit = waiter.acquire(True, timeout)
KeyboardInterrupt
2024-09-20 12:51:28,882 - distributed.scheduler - INFO - Retire worker addresses (0,)
2024-09-20 12:51:28,884 - distributed.nanny - INFO - Closing Nanny at 'tcp://127.0.0.1:34977'. Reason: nanny-close
2024-09-20 12:51:28,884 - distributed.nanny - INFO - Nanny asking worker to close. Reason: nanny-close
2024-09-20 12:51:32,885 - distributed.nanny - WARNING - Worker process still alive after 4.0 seconds, killing
2024-09-20 12:51:32,929 - distributed.nanny - INFO - Worker process 75004 was killed by signal 9
2024-09-20 12:51:32,929 - distributed.scheduler - INFO - Scheduler closing due to unknown reason...
2024-09-20 12:51:32,930 - distributed.scheduler - INFO - Scheduler closing all comms
/home/haeusel/.miniforge3/envs/queens/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
```
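For context, the first traceback pinpoints the root cause: the outer run executes the driver function on a daemonic dask worker process, and the nested run_iterator then asks a new Nanny to spawn its own worker process, which Python's multiprocessing forbids. The same AssertionError can be reproduced with the standard library alone; a minimal sketch, independent of QUEENS and dask:

```python
import multiprocessing as mp


def grandchild():
    print("never reached")


def daemonic_child():
    # Spawning a child from a daemonic process hits the same check seen in
    # the log above (multiprocessing/process.py, Process.start()):
    # AssertionError: daemonic processes are not allowed to have children
    mp.Process(target=grandchild).start()


if __name__ == "__main__":
    # daemon=True mirrors how a Nanny-managed dask worker process is started
    worker = mp.Process(target=daemonic_child, daemon=True)
    worker.start()
    worker.join()
```

Incidentally, the port warning at the top of the log ("Port 8787 is already in use") confirms that two clusters are being started: the outer run's dashboard holds the default port, and the inner run falls back to a free one.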
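One direction worth investigating (an untested sketch, not verified against QUEENS) is telling dask.distributed to start its worker processes as non-daemonic, which lifts the multiprocessing restriction:

```python
import dask

# Untested assumption: setting this before the *outer* QUEENS run creates its
# cluster makes distributed spawn non-daemonic worker processes, which are
# allowed to have children. Whether the nested scheduler then works
# end-to-end inside QUEENS has not been verified.
dask.config.set({"distributed.worker.daemon": False})
```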