How Model Tuning Works

Traditionally, getting the best-performing model on a dataset involves tedious manual hyperparameter tuning. On Rafiki, model hyperparameter tuning is automated by conducting multiple trials in a train job.

Over the trials, the model is initialized with different hyperparameters (knobs), trained and evaluated. A hyperparameter tuning advisor on Rafiki ingests the validation scores from these trials to suggest better hyperparameters for future trials, so as to maximise the model’s performance on the dataset. At the end of the train job, Rafiki can deploy the best-scoring trials for predictions.

Defining Hyperparameter Search Space

You’ll define a search space of hyperparameters (knob configuration) declaratively with the static method rafiki.model.BaseModel.get_knob_config(). The method should return a mapping of hyperparameter names (knob names) to hyperparameter specifications (knob specifications). A hyperparameter specification is an instance of a class that extends rafiki.model.BaseKnob, which is limited to any of the following: rafiki.model.FixedKnob, rafiki.model.CategoricalKnob, rafiki.model.FloatKnob, rafiki.model.IntegerKnob, rafiki.model.PolicyKnob and rafiki.model.ArchKnob.

Refer to their documentation for more details on each type of knob specification, and refer to Sample Models to see examples of how knob configurations are declared.
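As a rough illustration of a knob configuration, the sketch below declares a search space for a hypothetical model. The knob classes are minimal local stand-ins so that the snippet runs without Rafiki installed; their constructor arguments (a fixed value, a list of choices, a minimum and maximum) are assumptions rather than the documented signatures, and the knob names are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


# Local stand-ins for the knob classes in rafiki.model (assumed shapes only).
class BaseKnob:
    pass


@dataclass
class FixedKnob(BaseKnob):
    value: Any


@dataclass
class CategoricalKnob(BaseKnob):
    values: List[Any]


@dataclass
class IntegerKnob(BaseKnob):
    value_min: int
    value_max: int


@dataclass
class FloatKnob(BaseKnob):
    value_min: float
    value_max: float


class MyModel:  # on Rafiki this would extend rafiki.model.BaseModel
    @staticmethod
    def get_knob_config() -> Dict[str, BaseKnob]:
        # Map each knob name to its knob specification.
        return {
            "max_epochs": FixedKnob(10),
            "hidden_layer_count": IntegerKnob(1, 4),
            "learning_rate": FloatKnob(1e-5, 1e-1),
            "batch_size": CategoricalKnob([16, 32, 64, 128]),
        }


knob_config = MyModel.get_knob_config()
```

Because the method is static, the search space can be inspected without instantiating the model.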

Model Policies

rafiki.model.PolicyKnob is a special type of knob specification that allows Rafiki to configure the behaviour of a model on a trial basis.

In a modern model hyperparameter tuning scheme, a model tends to switch between different “modes”, which we call policies. For example, when you tune your model manually, you might want the model to do early stopping for the first, say, 100 trials, then conduct a final trial for a full, say, 300 epochs. The concept of model policies enables Rafiki’s tuning advisor to externally configure your model to switch between these “modes”.

Your model communicates to Rafiki which policies it supports by adding PolicyKnob(policy_name) to its knob configuration. During training, Rafiki then activates or deactivates the model’s policies on a per-trial basis by realising the value of each PolicyKnob as either True (activated) or False (not activated).

For example, suppose Rafiki’s tuning scheme for your model requires it to engage in early stopping for all trials except the final one. If your model has { 'early_stop': PolicyKnob('EARLY_STOP'), ... }, Rafiki will pass early_stop=False for just the final trial as part of its knobs, and early_stop=True for all other trials. Your model would then do early stopping depending on the value of the knob early_stop.
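The model side of this exchange might look like the sketch below. PolicyKnob here is a local stand-in for rafiki.model.PolicyKnob, and the helper epochs_to_train() and the short epoch count of 20 are hypothetical; only the EARLY_STOP policy name, the early_stop knob name and the 300-epoch figure come from the text above.

```python
class PolicyKnob:
    """Local stand-in for rafiki.model.PolicyKnob (assumed shape)."""

    def __init__(self, policy: str):
        self.policy = policy


class MyModel:
    @staticmethod
    def get_knob_config():
        # Declaring the knob tells the advisor that this policy is supported.
        return {"early_stop": PolicyKnob("EARLY_STOP")}

    def __init__(self, **knobs):
        # The realised knob values for this trial, e.g. early_stop=True.
        self._knobs = knobs

    def epochs_to_train(self, full_epochs: int = 300, short_epochs: int = 20) -> int:
        # Hypothetical helper: train for fewer epochs when the policy is active.
        if self._knobs.get("early_stop", False):
            return short_epochs
        return full_epochs


search_trial = MyModel(early_stop=True)   # any trial before the final one
final_trial = MyModel(early_stop=False)   # the final trial
```

Reading the knob with a default of False keeps the model usable even when the advisor never activates the policy.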

Below is the list of officially recognized model policies:

Policy	Description
`SHARE_PARAMS`	Whether model should load the shared parameters passed in `train()`
`EARLY_STOP`	Whether model should stop training early in `train()`, e.g. with use of early stopping or a reduced number of epochs
`SKIP_TRAIN`	Whether model should skip training its parameters
`QUICK_EVAL`	Whether model should stop evaluation early in `evaluate()`, e.g. by evaluating on only a subset of its validation dataset
`DOWNSCALE`	Whether a smaller version of the model should be constructed e.g. with fewer layers

Model Tuning Schemes

At a model level, Rafiki automatically selects the appropriate tuning scheme (advisor) based on the composition of the model’s knob configuration and the incoming train job’s budget.

Specifically, it employs the following rules, in the given order, to select the type of advisor to use:

Rule	Tuning Scheme
Only `PolicyKnob`, `FixedKnob`	Only conduct a single trial
Only `PolicyKnob`, `FixedKnob`, `FloatKnob`, `IntegerKnob`, `CategoricalKnob`, with policy `SHARE_PARAMS`	Hyperparameter tuning with Bayesian Optimization & cross-trial parameter sharing. Share globally best-scoring parameters across workers in an epsilon-greedy manner. Optionally employ early stopping (`EARLY_STOP` policy) for all trials. More details at Hyperparameter Tuning with Bayesian Optimization & Parameter Sharing.
Only `PolicyKnob`, `FixedKnob`, `FloatKnob`, `IntegerKnob`, `CategoricalKnob`	Hyperparameter tuning with Bayesian Optimization. Optionally employ early stopping (`EARLY_STOP` policy) before the last 1h, and perform standard trials during the last 1h.
Only `PolicyKnob`, `FixedKnob`, `ArchKnob`, with policies `SHARE_PARAMS`, `EARLY_STOP`, `SKIP_TRAIN`, `QUICK_EVAL`, `DOWNSCALE`, and `TIME_HOURS` budget >= 12h	Architecture tuning with cell-based “Efficient Neural Architecture Search via Parameter Sharing”. It conducts ENAS architecture search before the last 12h, then performs the final training of the best architectures found in the last 12h. More details at Architecture Tuning with ENAS.
All others	Hyperparameter tuning with uniformly random knobs
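The rule table can be read as a first-match-wins decision procedure. The sketch below is one possible reading of it, not Rafiki’s actual advisor selection code: the function name, the string labels it returns and the representation of knob types as class-name strings are all assumptions made for illustration.

```python
from typing import Set

BASIC_KNOBS = {"PolicyKnob", "FixedKnob"}
HYPERPARAM_KNOBS = BASIC_KNOBS | {"FloatKnob", "IntegerKnob", "CategoricalKnob"}
ARCH_KNOBS = BASIC_KNOBS | {"ArchKnob"}
ENAS_POLICIES = {"SHARE_PARAMS", "EARLY_STOP", "SKIP_TRAIN", "QUICK_EVAL", "DOWNSCALE"}


def select_tuning_scheme(knob_types: Set[str], policies: Set[str], time_hours: float) -> str:
    """Apply the rules from the table top to bottom; the first match wins."""
    if knob_types <= BASIC_KNOBS:
        return "single trial"
    if knob_types <= HYPERPARAM_KNOBS and "SHARE_PARAMS" in policies:
        return "bayesian optimization + parameter sharing"
    if knob_types <= HYPERPARAM_KNOBS:
        return "bayesian optimization"
    if knob_types <= ARCH_KNOBS and ENAS_POLICIES <= policies and time_hours >= 12:
        return "enas"
    # "All others" in the table
    return "random"
```

Note that, under this reading, an `ArchKnob` model with a budget below 12h falls through to uniformly random knobs rather than ENAS.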

The following subsections briefly explain how to use the various model tuning schemes on Rafiki.

Hyperparameter Tuning with Bayesian Optimization

To tune the hyperparameters of your model, where the hyperparameters are simply floats, integers or categorical values, use rafiki.model.FixedKnob, rafiki.model.CategoricalKnob, rafiki.model.FloatKnob & rafiki.model.IntegerKnob.

Hyperparameter Tuning with Bayesian Optimization & Early Stopping

To additionally employ early stopping during hyperparameter tuning to speed up the tuning process, declare an extra rafiki.model.PolicyKnob of the EARLY_STOP policy (see Model Policies).

Refer to the sample model ./examples/models/image_classification/TfFeedForward.py.
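On the advisor side, the tuning schemes table states that early stopping is employed before the last 1h of the budget and standard trials are run during the last 1h. A minimal sketch of that timing rule follows; the function name and parameters are hypothetical, and treating exactly 1h remaining as part of the final window is an assumption.

```python
def propose_early_stop(hours_left: float,
                       model_supports_early_stop: bool,
                       final_window_hours: float = 1.0) -> bool:
    """Return the value to realise for the EARLY_STOP policy knob in the next trial."""
    if not model_supports_early_stop:
        # The model declared no EARLY_STOP PolicyKnob, so every trial is standard.
        return False
    # Early-stop while outside the final window; run standard trials inside it.
    return hours_left > final_window_hours
```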

Architecture Tuning with ENAS

To tune the architecture for your model with the modern architecture search algorithm “Efficient Neural Architecture Search via Parameter Sharing” (ENAS), declare a rafiki.model.ArchKnob and offer the policies SHARE_PARAMS, EARLY_STOP, SKIP_TRAIN, QUICK_EVAL and DOWNSCALE (see Model Policies). Specifically, you’ll need your model to support parameter sharing, stopping training early, skipping the training step, evaluating on a subset of the validation dataset, and downscaling the model, e.g. to use fewer layers. These policies are critical to the speed & performance of ENAS. See Deep Dive on ENAS to understand more about Rafiki’s implementation of ENAS.

Refer to the sample model ./examples/models/image_classification/TfEnas.py and its corresponding usage script ./examples/scripts/image_classification/run_enas.py to better understand how to do architecture tuning.
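As a rough sketch of what such a declaration could look like, the snippet below builds a knob configuration with one architecture knob and the five policy knobs, then checks that no required policy is missing. PolicyKnob and ArchKnob are local stand-ins; in particular, the ArchKnob argument (a list of choice lists, one per architecture decision) and the operation names are assumptions for illustration and may not match Rafiki’s actual encoding, so treat TfEnas.py as the reference.

```python
from typing import Any, Dict, List, Set

REQUIRED_ENAS_POLICIES = {"SHARE_PARAMS", "EARLY_STOP", "SKIP_TRAIN", "QUICK_EVAL", "DOWNSCALE"}


class PolicyKnob:
    """Local stand-in for rafiki.model.PolicyKnob (assumed shape)."""

    def __init__(self, policy: str):
        self.policy = policy


class ArchKnob:
    """Local stand-in for rafiki.model.ArchKnob (assumed shape)."""

    def __init__(self, items: List[List[Any]]):
        self.items = items


def get_knob_config() -> Dict[str, Any]:
    ops = ["conv3x3", "conv5x5", "avg_pool", "max_pool", "identity"]  # hypothetical ops
    return {
        "cell_archs": ArchKnob([ops] * 4),  # 4 decisions, each choosing one op
        "share_params": PolicyKnob("SHARE_PARAMS"),
        "early_stop": PolicyKnob("EARLY_STOP"),
        "skip_train": PolicyKnob("SKIP_TRAIN"),
        "quick_eval": PolicyKnob("QUICK_EVAL"),
        "downscale": PolicyKnob("DOWNSCALE"),
    }


def missing_enas_policies(knob_config: Dict[str, Any]) -> Set[str]:
    # Collect the policies offered through PolicyKnob declarations.
    offered = {k.policy for k in knob_config.values() if isinstance(k, PolicyKnob)}
    return REQUIRED_ENAS_POLICIES - offered
```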

Deep Dive on ENAS

The ENAS paper outlines a new methodology for automatic neural network construction, speeding up the original Neural Architecture Search (NAS) methodology by 1000x without affecting its ability to search for a competitive architecture. The authors made the crucial observation that two different architectures can share a common subgraph, and that the model parameters in that subgraph can be reused across trials instead of being re-trained from scratch in every trial.

The following is an overview of how ENAS works. As explained in the ENAS paper, during an ENAS search for the best CNN architecture (ENAS Search), there is an alternation between 2 phases: training of the ENAS CNN’s shared parameters (CNN Train Phase), and training of the ENAS controller (Controller Train Phase). While CNN parameters are carried over between phases, the CNN’s shared parameters are not trained during Controller Train Phases. After ENAS Search is done, there is a final training of the best CNN architecture found (ENAS Train), this time initializing its CNN parameters from scratch.

On Rafiki, we’ve replicated the Cell-Based ENAS controller for image classification as one of Rafiki’s tuning schemes and as a Rafiki model, TfEnas, with very close reference to the authors’ code. In this specific setup for ENAS, ENAS Search is done with the construction of a single supergraph of all possible architectures, while ENAS Train is done with the construction of a fixed graph of the best architecture (with slight architectural differences from ENAS Search). Each CNN Train Phase involves training the CNN for 1 epoch, while within each Controller Train Phase, the controller is trained for 30 steps. In each controller step, 10 architectures are sampled from the controller and evaluated on the ENAS CNN by dynamically changing its architecture, and losses based on validation accuracies are back-propagated in the controller to update the controller’s parameters. Each validation accuracy is computed on only a batch of the validation dataset. The alternation between CNN Train Phase and Controller Train Phase happens for X cycles during ENAS Search. Close to the end of training, during ENAS Train, the architecture samples with the highest validation accuracies, this time computed on the full validation dataset, are trained from scratch to arrive at the final best models.
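The alternation described above can be sketched as a simple loop. This is a schematic under stated assumptions, not Rafiki’s implementation: the four callbacks are hypothetical placeholders for the real CNN training, controller sampling, batch evaluation and controller update, and the number of cycles is left as a parameter because the text does not fix it. Only the figures of 1 epoch, 30 controller steps and 10 sampled architectures per step come from the description.

```python
import random

CONTROLLER_STEPS_PER_PHASE = 30  # from the description above
ARCHS_PER_CONTROLLER_STEP = 10   # from the description above


def enas_search(num_cycles, train_cnn_one_epoch, sample_arch,
                eval_on_val_batch, update_controller):
    """Alternate CNN Train Phase and Controller Train Phase for num_cycles cycles."""
    for _ in range(num_cycles):
        # CNN Train Phase: train the shared CNN parameters for 1 epoch.
        train_cnn_one_epoch()
        # Controller Train Phase: the shared CNN parameters are not trained here.
        for _ in range(CONTROLLER_STEPS_PER_PHASE):
            archs = [sample_arch() for _ in range(ARCHS_PER_CONTROLLER_STEP)]
            # Each accuracy is computed on a single batch of validation data.
            accs = [eval_on_val_batch(arch) for arch in archs]
            update_controller(archs, accs)


# Dummy callbacks that only count calls, to show the shape of one run.
counts = {"epochs": 0, "evals": 0, "updates": 0}


def _train():
    counts["epochs"] += 1


def _sample():
    return random.randint(0, 9)


def _eval(arch):
    counts["evals"] += 1
    return random.random()


def _update(archs, accs):
    counts["updates"] += 1


enas_search(2, _train, _sample, _eval, _update)
```

With these figures, each cycle costs 1 CNN epoch plus 30 x 10 = 300 single-batch evaluations, which is why the QUICK_EVAL policy matters for search speed.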

We’ve generalized the ENAS controller, its architecture encoding scheme and its overall tuning scheme on Rafiki, such that Rafiki models can leverage architecture tuning with a flexible architecture encoding, and Rafiki’s application developers can train with these models in an end-to-end manner.

We’ve also devised a simple yet effective strategy to run ENAS in a distributed setting. When given multiple GPUs, Rafiki performs ENAS locally at each worker in a train job, with these workers sharing a central ENAS controller.