qlauncher.workflow.slurm_job_manager

Summary

Classes:
  • SlurmJobManager

Reference

class qlauncher.workflow.slurm_job_manager.SlurmJobManager(sbatch_exe: str = 'sbatch', scancel_exe: str = 'scancel', slurm_options: dict[str, Any] | None = None, env_setup: list[str] | None = None)

Bases: BaseJobManager

submit(function, cores: int = 1, **kwargs) → str

Submits function for execution as a Slurm job via sbatch.

Parameters:
  • function (Callable[[...], Any]) – Function to be executed.

  • cores (int, optional) – Number of CPU cores per task requested from Slurm (mapped to --cpus-per-task). Defaults to 1.

  • **kwargs – Manager-specific additional arguments.

Returns:

Slurm job ID returned by sbatch.

Return type:

str

Raises:

RuntimeError – If sbatch returns a non-zero exit code.
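
As an illustration of how cores could translate into an sbatch invocation, here is a minimal sketch. It is not taken from QLauncher: build_sbatch_command and parse_job_id are hypothetical helpers, and the use of the sbatch --parsable flag (which prints only the job ID, optionally followed by ";cluster") is an assumption about how the ID might be obtained.

```python
from __future__ import annotations


def build_sbatch_command(script_path: str, cores: int = 1, sbatch_exe: str = "sbatch") -> list[str]:
    """Hypothetical helper: assemble an sbatch command line for one job script."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    # --cpus-per-task is the Slurm option the cores parameter is documented to map to.
    return [sbatch_exe, "--parsable", f"--cpus-per-task={cores}", script_path]


def parse_job_id(returncode: int, stdout: str, stderr: str = "") -> str:
    """Hypothetical helper: return the job ID, or raise as submit() is documented to."""
    if returncode != 0:
        raise RuntimeError(f"sbatch failed with exit code {returncode}: {stderr.strip()}")
    # With --parsable the output is "<job_id>" or "<job_id>;<cluster>".
    return stdout.strip().split(";")[0]
```

In practice the command list would typically be passed to subprocess.run(cmd, capture_output=True, text=True), and the resulting return code and output handed to parse_job_id.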

wait_for_a_job(job_id: str | None = None, timeout: float | None = None)

Waits until a Slurm job finishes and returns its ID.

Parameters:
  • job_id (str | None, optional) – ID of the job to wait for. If None, the first job in jobs that is not yet marked as finished is selected. Defaults to None.

  • timeout (float | None, optional) – Maximum time to wait in seconds. If None, wait indefinitely. Defaults to None.

Raises:
  • ValueError – If job_id is None and there are no jobs left.

  • TimeoutError – If the timeout is exceeded before the job finishes.

  • RuntimeError – If the job disappears from squeue without producing a result file, or if it finishes in a non-successful state.

Returns:

ID of the finished job.

Return type:

str
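
The timeout semantics described above (wait indefinitely when timeout is None, otherwise raise TimeoutError) can be sketched as a generic polling loop. This is an illustrative pattern only, not QLauncher's code: wait_until_finished, is_finished and poll_interval are hypothetical names, and the real method may check job state differently (for example via squeue).

```python
from __future__ import annotations

import time
from typing import Callable


def wait_until_finished(
    is_finished: Callable[[], bool],
    timeout: float | None = None,
    poll_interval: float = 1.0,
) -> None:
    """Poll is_finished() until it returns True or the timeout elapses."""
    # A monotonic clock is unaffected by system clock adjustments.
    deadline = None if timeout is None else time.monotonic() + timeout
    while not is_finished():
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        time.sleep(poll_interval)
```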

read_results(job_id)

Reads the result of a finished job from its output file.

Parameters:

job_id (str) – Slurm job ID returned by submit() or submit_launcher().

Raises:
  • KeyError – If job_id is not known to this manager.

  • FileNotFoundError – If the expected output file does not exist.

Returns:

Deserialized result object produced by the worker process.

Return type:

Result
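
When several jobs are read back, the two documented exceptions can be handled per job so that one missing output does not abort the whole collection. The sketch below assumes only the read_results() behaviour documented above; collect_results is a hypothetical helper and manager is any object exposing that method.

```python
from __future__ import annotations

from typing import Any


def collect_results(manager: Any, job_ids: list[str]) -> tuple[dict[str, Any], dict[str, str]]:
    """Hypothetical helper: read results for many jobs, recording failures per job."""
    results: dict[str, Any] = {}
    errors: dict[str, str] = {}
    for job_id in job_ids:
        try:
            results[job_id] = manager.read_results(job_id)
        except KeyError:
            # Documented: the job ID is not known to this manager.
            errors[job_id] = "unknown job ID"
        except FileNotFoundError:
            # Documented: the expected output file does not exist.
            errors[job_id] = "output file missing"
    return results, errors
```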

cancel(job_id: str) → None

Cancels the given Slurm job via scancel.

Parameters:

job_id (str) – Slurm job ID returned by submit().

Raises:
  • KeyError – If job_id is not known to this manager.

  • RuntimeError – If scancel fails.

clean_up()

Removes temporary files created for all tracked jobs.

run(function: Callable[[...], Any], cores: int = 1, **kwargs) → Any

Convenience method that handles the complete lifecycle of a job: submits it, waits for completion, reads the results, and cleans up.

Parameters:
  • function (Callable[[...], Any]) – Function to be executed.

  • cores (int, optional) – Number of CPU cores per task. Defaults to 1.

  • **kwargs – Manager-specific additional arguments.

Returns:

Result object produced by the job.

Return type:

Any
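
The lifecycle that run() is described as covering can be sketched in terms of the other documented methods. This is a hedged illustration rather than the actual implementation: run_lifecycle is a hypothetical function, manager is any object exposing submit(), wait_for_a_job(), read_results() and clean_up(), and calling clean_up() in a finally block is an assumption about error handling.

```python
from __future__ import annotations

from typing import Any, Callable


def run_lifecycle(manager: Any, function: Callable[..., Any], cores: int = 1, **kwargs: Any) -> Any:
    """Hypothetical equivalent of run(): submit, wait, read results, clean up."""
    job_id = manager.submit(function, cores=cores, **kwargs)
    try:
        manager.wait_for_a_job(job_id)
        return manager.read_results(job_id)
    finally:
        # Assumption: temporary files are removed even if waiting or reading fails.
        manager.clean_up()
```

Calling the individual methods directly instead of run() is useful when several jobs should be submitted before any of them is awaited.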