Config
kubr.config.runner.RunnerConfig
Bases: BaseModel
RunnerConfig is the configuration for the runner.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `experiment` | `ExperimentConfig` | Experiment configuration. | required |
| `container` | `ContainerConfig` | Container configuration. | required |
| `resources` | `ResourceConfig` | Resource configuration. | required |
| `type` | `JobType` | Job type. Defaults to JobType.torchrun. | `JobType.torchrun` |
| `backend` | `JobBackend` | Job backend. Defaults to JobBackend.Volcano. | `JobBackend.Volcano` |
| `code` | `Optional[CodePersistenceConfig]` | Code persistence configuration. Defaults to None. | `None` |
| `data` | `Optional[DataConfig]` | Data configuration. Defaults to None. | `None` |
Source code in kubr/config/runner.py
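The nesting of RunnerConfig can be pictured as a plain mapping. This is a minimal sketch, not kubr's own loader: every value is a made-up placeholder, `missing_sections` is a hypothetical helper, and treating only `experiment`, `container`, and `resources` as mandatory is an assumption based on their descriptions stating no default.

```python
# Hypothetical payload whose keys mirror the RunnerConfig fields listed above.
# All values are illustrative placeholders, not taken from the kubr docs.
runner_config = {
    "experiment": {"name": "demo", "namespace": "default"},
    "container": {"image": "example.com/train:latest"},
    "resources": {"nodes": 1, "gpu": 0},
    # "type" and "backend" are left out: their descriptions state defaults.
    "code": None,  # documented default
    "data": None,  # documented default
}

# Assumption: fields whose descriptions state no default must be supplied.
REQUIRED_SECTIONS = ("experiment", "container", "resources")


def missing_sections(config: dict) -> list:
    """Return the required top-level sections absent from `config`."""
    return [key for key in REQUIRED_SECTIONS if key not in config]


print(missing_sections(runner_config))
print(missing_sections({"experiment": {}}))
```

A check like this only covers the top level; the real model validates nested fields as well, since each section is itself a BaseModel.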
kubr.config.runner.ExperimentConfig
Bases: BaseModel
ExperimentConfig is the configuration for the experiment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the experiment. | required |
| `namespace` | `str` | Namespace of the experiment. | required |
| `queue` | `Optional[str]` | Queue to submit the experiment to. Defaults to "default". | `"default"` |
| `job_retries` | `int` | Number of retries for the job. Defaults to 0. | `0` |
| `worker_max_retries` | `int` | Maximum number of retries for the task. Defaults to 10. | `10` |
Source code in kubr/config/runner.py
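To show which ExperimentConfig fields a caller typically has to supply, here is a stand-in built with the standard library's `dataclasses`, so it runs without kubr installed. `ExperimentConfigSketch` is a hypothetical name and is not the kubr class (which is a BaseModel); the defaults are copied from the field descriptions above, and the namespace value is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExperimentConfigSketch:
    """Stand-in mirroring the documented fields; NOT the kubr class itself."""

    name: str
    namespace: str
    queue: Optional[str] = "default"   # default taken from the description
    job_retries: int = 0               # default taken from the description
    worker_max_retries: int = 10       # default taken from the description


# Only the two fields without a documented default are passed explicitly.
exp = ExperimentConfigSketch(name="demo", namespace="ml-team")
print(exp)
```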
kubr.config.runner.ResourceConfig
Bases: BaseModel
ResourceConfig is the configuration for the resources.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `nodes` | `int` | Number of replicas to run. Defaults to 1. | `1` |
| `cpu` | `int` | Number of CPUs to request. Defaults to 0. | `0` |
| `memory` | `int` | Memory in GB to request. Defaults to 0. | `0` |
| `gpu` | `int` | Number of GPUs to request. Defaults to 0. | `0` |
| `ib` | `Union[int, Literal['auto']]` | Number of Infiniband devices to request. Defaults to 0. | `0` |
| `ib_device` | `str` | Name of the Infiniband device to request. Defaults to "nvidia.com/hostdev". | `"nvidia.com/hostdev"` |
Source code in kubr/config/runner.py
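The `ib` field is the only one in ResourceConfig that accepts two kinds of value: an integer count or the literal string `'auto'`. The sketch below mirrors the documented fields in a standard-library dataclass and checks `ib` by hand. `ResourceConfigSketch` is a hypothetical stand-in, not the kubr class, and the sketch makes no claim about what `'auto'` resolves to at submission time; the resource numbers are placeholders.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass
class ResourceConfigSketch:
    """Stand-in mirroring the documented fields; NOT the kubr class itself."""

    nodes: int = 1
    cpu: int = 0
    memory: int = 0  # gigabytes, per the description
    gpu: int = 0
    ib: Union[int, Literal["auto"]] = 0
    ib_device: str = "nvidia.com/hostdev"

    def __post_init__(self) -> None:
        # Dataclasses do not enforce annotations, so `ib` is checked by hand.
        is_count = isinstance(self.ib, int) and not isinstance(self.ib, bool)
        if not (is_count or self.ib == "auto"):
            raise ValueError(f"ib must be an int or 'auto', got {self.ib!r}")


single_node = ResourceConfigSketch(cpu=64, memory=512, gpu=8)
multi_node = ResourceConfigSketch(nodes=4, gpu=8, ib="auto")
print(single_node)
print(multi_node)
```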
kubr.config.runner.DataConfig
Bases: BaseModel
DataConfig is the configuration for the data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `volumes` | `Optional[List[VolumeMount]]` | List of volumes to mount. Defaults to []. | `[]` |
Source code in kubr/config/runner.py
kubr.config.runner.ContainerConfig
Bases: BaseModel
ContainerConfig is the configuration for the container.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `image` | `str` | Image to run. | required |
| `entrypoint` | `Optional[str]` | Entrypoint to run. Defaults to None. | `None` |
| `env` | `Dict[str, str]` | Environment variables to pass to the entrypoint. Defaults to {}. | `{}` |
| `secrets` | `Optional[List[SecretConfig]]` | Secrets to pass to the entrypoint. Defaults to None. | `None` |
Source code in kubr/config/runner.py
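A ContainerConfig needs only an image; the entrypoint, environment, and secrets are optional. The following stand-in uses a standard-library dataclass to illustrate that shape. `ContainerConfigSketch` is a hypothetical name, not the kubr class; the image and entrypoint strings are placeholders, and `secrets` is typed loosely because the fields of SecretConfig are not documented in this section.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ContainerConfigSketch:
    """Stand-in mirroring the documented fields; NOT the kubr class itself."""

    image: str
    entrypoint: Optional[str] = None
    # default_factory gives each instance its own empty dict.
    env: Dict[str, str] = field(default_factory=dict)
    # SecretConfig's fields are not described here, hence the loose typing.
    secrets: Optional[List[Any]] = None


minimal = ContainerConfigSketch(image="example.com/train:latest")
custom = ContainerConfigSketch(
    image="example.com/train:latest",
    entrypoint="python train.py",   # placeholder command
    env={"NCCL_DEBUG": "INFO"},
)
print(minimal)
print(custom)
```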
kubr.config.runner.VolumeMount
Bases: BaseModel
VolumeMount is the configuration for a volume mount.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the volume. | required |
| `type` | `Literal['hostPath']` | Type of the volume. | required |
| `mount_path` | `str` | Mount path of the volume. | required |
Source code in kubr/config/runner.py
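VolumeMount entries are consumed through the `volumes` list of DataConfig. The sketch below shows how the two fit together, using standard-library dataclasses as stand-ins. `VolumeMountSketch` and `DataConfigSketch` are hypothetical names, not the kubr classes, and the volume name and mount path are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional


@dataclass
class VolumeMountSketch:
    """Stand-in for VolumeMount; NOT the kubr class itself."""

    name: str
    type: Literal["hostPath"]  # the only value the documented type allows
    mount_path: str


@dataclass
class DataConfigSketch:
    """Stand-in for DataConfig; NOT the kubr class itself."""

    # The description gives [] as the default, so a fresh list is used.
    volumes: Optional[List[VolumeMountSketch]] = field(default_factory=list)


data = DataConfigSketch(
    volumes=[
        VolumeMountSketch(name="datasets", type="hostPath", mount_path="/data"),
    ]
)

# `volumes` is Optional, so guard against None before iterating.
mount_paths = [volume.mount_path for volume in (data.volumes or [])]
print(mount_paths)
```

The documented fields cover only the volume's name, type, and in-container mount path; how the host-side location is determined is not described in this section.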