The APIs on this page require Geneva v0.10.0 or later.
- Local Ray
- KubeRay: create a cluster on demand in your Kubernetes cluster.
- Existing Ray Cluster
Ray Clusters
Local Ray
To execute jobs without an external Ray cluster, you can useLocalRayContext. This will auto-create a Ray cluster on your machine. Because it’s on your laptop/desktop, this is only suitable for prototyping on small datasets. But it is the easiest way to get started. Simply define the UDF, add a column, call Connection.local_ray_context(), and trigger the job:
KubeRay
If you have a Kubernetes cluster with kuberay-operator, you can use Geneva to automatically provision RayClusters. To do so, define a Geneva cluster, representing the resource needs, Docker images, and other Ray configurations necessary to run your job. Make sure your cluster has adequate compute resources to provision the RayCluster. Here is an example Geneva cluster definition:External Ray cluster
If you already have a Ray cluster, Geneva can execute jobs against it too. You do so by defining a Geneva cluster withGenevaCluster.create_external() which has the address of the cluster. Here’s an example:
Dependencies and Manifests
Most UDFs require some dependencies: helper libraries likepillow for image processing, pre-trained models like open-clip to calculate embeddings, or even small config files. We have three ways to get them to workers:
- Define dependencies explicitly in a manifest
- Bake dependencies into an image
- Auto-upload local dependencies
Define dependencies explicitly in a manifest
We recommend defining dependencies explicitly: it’s the easiest way to understand exactly what’s running, and the least error-prone. To do so, define a Manifest, like so:pip install lancedb numpy on them. You can also use .requirements_path(path) to point to a local requirements.txt file instead of listing packages inline. Note that attempting to use both .pip() and .requirements_path() will raise an exception.
For conda-based dependencies, use GenevaManifest.create_conda() instead:
Python
.conda_environment_path(path) to point to a local environment.yml file. Note that attempting to use both .conda() and .conda_environment_path() will raise an exception.
Bake dependencies into an image
Because thepip or conda methods involve installing packages, they will incur some startup costs. When your jobs are stable in production, therefore, it will be faster to build all your dependencies into the workers’ images, then specify them like so:
Auto-upload local dependencies
Geneva can package your local environment and send it to Ray workers. This includes the current workspace root (if you’re in a python repo) or the current working directory (if you’re not).GenevaManifest.create_site() additionally uploads your Python site-packages (defined by site.getsitepackages()) to workers. This is not recommended for production use, as it is prone to issues like architecture mismatches of built dependencies, but it can be a good way to iterate quickly during development.
To upload site packages:
Python
What’s in a manifest?
Here’s a summary of what’s in a manifest and how you can define it across the three manifest builder types.| Contents | How you can define it | Factory method |
|---|---|---|
| Local working directory (or workspace root, if in a python repo) | Will be uploaded automatically. | All |
| Local python packages (site-packages) | Uploaded automatically. | create_site() |
| Pip packages to be installed | Use .pip(packages: list[str]) or .requirements_path(path: str). See Ray’s RuntimeEnv docs for details. | create_pip() |
| Conda packages to be installed | Use .conda(deps: dict[str, Any]) or .conda_environment_path(path: str). See Ray’s RuntimeEnv docs for details. | create_conda() |
Local python packages outside of site_packages | Use .py_modules(modules: list[str]) or .add_py_module(module: str). See Ray’s RuntimeEnv docs for details. | All |
| Container image for head node | Use .head_image(head_image: str) or default_head_image() to use the default. Note that, if the image is also defined in the GenevaCluster, the image set here in the Manifest will take priority. | All |
| Container image for worker nodes | Use .worker_image(worker_image: str) or default_worker_image() to use the default for the current platform. As with the head image, this takes priority over any images set in the Cluster. | All |
.delete_local_zips(False) and .local_zip_output_dir(path) then examine the zip files in path.
Putting it all together: Execution Contexts
An execution context represents the concrete execution environment (Cluster and Manifest) used to execute a distributed job. Callingcontext will enter a context manager that will provision an execution cluster and execute the Job using the Cluster and Manifest definitions provided. Because you’ve already defined the cluster and manifest, you can just reference them by name. Note that providing a manifest is optional. Once completed, the context manager will automatically de-provision the cluster.