Dask-Optuna
Dask-Optuna helps improve integration between Optuna and Dask.
What Dask-Optuna does
Dask-Optuna leverages Optuna’s existing distributed optimization capabilities to run optimization trials in parallel on a Dask cluster. It does this by providing a Dask-compatible dask_optuna.DaskStorage storage class, which wraps an Optuna storage class (e.g. Optuna’s in-memory or SQLite storage) and can be used directly by Optuna. For example:
import dask.distributed
import dask_optuna
client = dask.distributed.Client()
# Wraps Optuna's in-memory storage
storage_1 = dask_optuna.DaskStorage()
# Wraps Optuna's SQLite DB storage
storage_2 = dask_optuna.DaskStorage("sqlite:///example.db")
The underlying Optuna storage object lives on the cluster’s scheduler, and any method call on the DaskStorage instance results in the same method being called on the underlying Optuna storage object.
This offers two primary benefits:
- Helps extend Optuna’s InMemoryStorage class to run across multiple processes. This is important when using remote workers in a Dask cluster or in situations where Python’s GIL leads to less-than-ideal parallelization.
- Reduces setup when using persistent storage (e.g. creating a SQLite DB that’s globally available), as the underlying Optuna storage class on the scheduler is accessible to all workers in a Dask cluster.
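The forwarding behavior described above, where a method call on the wrapper results in the same method being called on the wrapped storage object, can be illustrated with a minimal pure-Python sketch. This is not Dask-Optuna’s implementation: the class names InMemoryStore and ForwardingStore and their methods are hypothetical stand-ins, and everything here runs in a single process, whereas DaskStorage relays calls to an object living on the Dask scheduler.

```python
class InMemoryStore:
    """Hypothetical stand-in for an underlying storage object."""

    def __init__(self):
        self._values = {}

    def set_value(self, key, value):
        self._values[key] = value

    def get_value(self, key):
        return self._values[key]


class ForwardingStore:
    """Hypothetical wrapper that forwards method calls to the wrapped object."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self.calls = []  # names of forwarded methods, kept only for illustration

    def __getattr__(self, name):
        # Called only for attributes that are not defined on the wrapper itself
        target = getattr(self._wrapped, name)

        def forward(*args, **kwargs):
            self.calls.append(name)
            return target(*args, **kwargs)

        return forward


backend = InMemoryStore()
store = ForwardingStore(backend)

# Both calls are relayed to the single underlying object
store.set_value("x", 2.0)
result = store.get_value("x")
```

Because every caller talks to the same underlying object through the wrapper, state written by one caller is visible to all others. That is the property that lets an in-memory storage be shared once the wrapped object is hosted in one central place.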
Example
import optuna
import joblib
import dask.distributed
import dask_optuna
def objective(trial):
    # Note: suggest_uniform is deprecated in Optuna 3.0+;
    # trial.suggest_float("x", -10, 10) is the replacement
    x = trial.suggest_uniform("x", -10, 10)
    return (x - 2) ** 2

with dask.distributed.Client() as client:
    # Create a study using Dask-compatible storage
    storage = dask_optuna.DaskStorage()
    study = optuna.create_study(storage=storage)
    # Optimize in parallel on your Dask cluster
    with joblib.parallel_backend("dask"):
        study.optimize(objective, n_trials=100, n_jobs=-1)
    print(f"best_params = {study.best_params}")
Community discussion
Discussions on improving integration between Dask and Optuna are taking place in both the Dask issue tracker and the Optuna issue tracker. Please feel free to join these conversations if you’d like to get involved.
If you have feedback or thoughts on how Dask-Optuna may be improved, please feel free to open an issue in Dask-Optuna’s issue tracker.
FAQ
When would I use this?
Dask-Optuna is useful if you want to use Optuna’s InMemoryStorage when running trials in parallel across multiple processes, or if the workers in your Dask cluster don’t use the same filesystem that your Dask Client uses. If, for example, you’re using a dask.distributed.LocalCluster, whose workers run on the same machine as the client and share its filesystem, you may be better served by using Optuna’s built-in storage classes.