rafiki.model

Core Classes

class rafiki.model.BaseModel(**knobs)

Rafiki’s base model class that Rafiki models should extend. Rafiki models should implement all abstract methods according to their associated tasks’ specifications, together with the static method get_knob_config().

In the model’s __init__ method, call super().__init__(**knobs) as the first line, followed by the model’s initialization logic. The model should initialize itself with knobs, a set of generated knob values for the instance, and may save the knobs’ values as attribute(s) of the model instance. These knob values are chosen by Rafiki based on the model’s knob config.

For example:

def __init__(self, **knobs):
    super().__init__(**knobs)
    self.__dict__.update(knobs)
    ...
    self._build_model(self.knob1, self.knob2)
Parameters: knobs (dict[str, any]) – Dictionary of knob values for this model instance

static get_knob_config()

Return a dictionary defining this model class’ knob configuration (i.e. list of knob names, their data types and their ranges).

Returns: Dictionary defining this model’s knob configuration
Return type: dict[str, rafiki.model.BaseKnob]

train(dataset_uri)

Train this model instance with the given dataset and initialized knob values.

Parameters: dataset_uri (str) – URI of the train dataset in a format specified by the task

evaluate(dataset_uri)

Evaluate this model instance with the given dataset after training. This will be called only after the model has been trained.

Parameters: dataset_uri (str) – URI of the test dataset in a format specified by the task
Returns: Accuracy on the test dataset, as a float from 0 to 1
Return type: float
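As a rough illustration of the return value described above, accuracy can be computed as the fraction of test examples predicted correctly. The helper name compute_accuracy and the label lists are hypothetical and not part of Rafiki.

```python
def compute_accuracy(predicted_labels, true_labels):
    """Return the fraction of predictions that match the true labels (0.0 to 1.0)."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError('predicted_labels and true_labels must have the same length')
    if not true_labels:
        return 0.0  # Assumption: an empty test set scores 0.0
    correct = sum(1 for p, t in zip(predicted_labels, true_labels) if p == t)
    return correct / len(true_labels)


# Three of the four predictions match the true labels
accuracy = compute_accuracy([0, 1, 1, 2], [0, 1, 2, 2])
```

A model’s evaluate() could return a value computed this way after running its predictions over the test dataset.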
predict(queries)

Make predictions on a batch of queries with this model instance after training. Each prediction should be JSON serializable. This will be called only after the model has been trained.

Parameters: queries (list[any]) – List of queries, where a query is in a format specified by the task
Returns: List of predictions, where a prediction is in a format specified by the task
Return type: list[any]
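Since each prediction should be JSON serializable, a model may need to convert its raw outputs to plain Python types before returning them. The sketch below is one possible way to do that; the helper to_json_serializable is hypothetical, and its handling of array-like objects via tolist() is an assumption about what the model’s framework produces.

```python
import json


def to_json_serializable(prediction):
    """Convert a prediction into plain Python types that json.dumps accepts."""
    if prediction is None or isinstance(prediction, (str, bool, int, float)):
        return prediction
    if isinstance(prediction, dict):
        return {str(k): to_json_serializable(v) for k, v in prediction.items()}
    if isinstance(prediction, (list, tuple, set)):
        return [to_json_serializable(v) for v in prediction]
    # Assumption: array-like objects (e.g. NumPy arrays) expose tolist()
    if hasattr(prediction, 'tolist'):
        return to_json_serializable(prediction.tolist())
    raise TypeError('Cannot serialize prediction of type ' + type(prediction).__name__)


raw_predictions = [(0.1, 0.9), {'label': 1}]
predictions = [to_json_serializable(p) for p in raw_predictions]
encoded = json.dumps(predictions)  # Succeeds because only plain types remain
```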
dump_parameters()

Return a dictionary of model parameters that fully defines this model instance’s trained state. This dictionary should be serializable by Python’s pickle module. It will be used for trained model serialization within Rafiki. This will be called only after the model has been trained.

Returns: Dictionary of model parameters
Return type: dict[str, any]

load_parameters(params)

Load a dictionary of model parameters into this model instance. This will be used for trained model deserialization within Rafiki. The model will be considered trained subsequently.

Parameters: params (dict[str, any]) – Dictionary of model parameters
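The round trip between dump_parameters() and load_parameters() can be sketched as follows. ToyModel is a hypothetical stand-in that does not extend rafiki.model.BaseModel, so the snippet runs without Rafiki installed; its "weights" and the scale knob are invented for illustration.

```python
import pickle


class ToyModel:
    """Hypothetical stand-in for a rafiki.model.BaseModel subclass."""

    def __init__(self, **knobs):
        self._knobs = knobs
        self._weights = None  # Set by train() or load_parameters()

    def train(self, dataset_uri):
        # Placeholder for real training: derive "weights" from a knob
        self._weights = [self._knobs.get('scale', 1.0) * i for i in range(3)]

    def dump_parameters(self):
        # Everything needed to restore the trained state, as picklable values
        return {'weights': list(self._weights)}

    def load_parameters(self, params):
        self._weights = list(params['weights'])


trained = ToyModel(scale=2.0)
trained.train('unused-dataset-uri')
serialized = pickle.dumps(trained.dump_parameters())

# A fresh instance becomes "trained" once the parameters are loaded
restored = ToyModel(scale=2.0)
restored.load_parameters(pickle.loads(serialized))
```

Keeping the dictionary limited to plain Python values (or values known to be picklable) avoids surprises when the parameters are serialized.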
destroy()

Destroy this model instance, closing any sessions or freeing any connections. No other methods will be called subsequently.

class rafiki.model.BaseKnob(knob_args={})

The base class for a knob type.

Knob Classes

class rafiki.model.CategoricalKnob(values)

Knob type representing a categorical value of type int, float, bool or str. A generated value of this knob would be an element of values.

class rafiki.model.IntegerKnob(value_min, value_max, is_exp=False)

Knob type representing any int value within a specific interval [value_min, value_max]. is_exp specifies whether the knob value should be scaled exponentially.

class rafiki.model.FloatKnob(value_min, value_max, is_exp=False)

Knob type representing any float value within a specific interval [value_min, value_max]. is_exp specifies whether the knob value should be scaled exponentially.

class rafiki.model.FixedKnob(value)

Knob type representing a single fixed value of type int, float, bool or str. Essentially, this represents a knob that does not require tuning.
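To make the four knob types concrete, the sketch below builds a knob config and draws one value per knob. The knob classes here are hypothetical namedtuple stand-ins that only mirror the documented constructor signatures, so the snippet runs without Rafiki. The sampling scheme is also an assumption: in particular, treating is_exp as "sample uniformly in log space" is one plausible reading of "scaled exponentially", not a statement of how Rafiki’s advisor works.

```python
import math
import random
from collections import namedtuple

# Hypothetical stand-ins mirroring the documented constructor signatures
CategoricalKnob = namedtuple('CategoricalKnob', ['values'])
FixedKnob = namedtuple('FixedKnob', ['value'])
IntegerKnob = namedtuple('IntegerKnob', ['value_min', 'value_max', 'is_exp'], defaults=(False,))
FloatKnob = namedtuple('FloatKnob', ['value_min', 'value_max', 'is_exp'], defaults=(False,))


def get_knob_config():
    return {
        'optimizer': CategoricalKnob(['sgd', 'adam']),
        'hidden_units': IntegerKnob(16, 256),
        'learning_rate': FloatKnob(1e-5, 1e-1, is_exp=True),
        'batch_size': FixedKnob(32),
    }


def sample_knob(knob, rng):
    """Draw one value for a knob (assumed sampling scheme, not Rafiki's advisor)."""
    if isinstance(knob, FixedKnob):
        return knob.value
    if isinstance(knob, CategoricalKnob):
        return rng.choice(knob.values)
    if knob.is_exp:
        # Assumption: "scaled exponentially" means uniform in log space
        value = math.exp(rng.uniform(math.log(knob.value_min), math.log(knob.value_max)))
    else:
        value = rng.uniform(knob.value_min, knob.value_max)
    # Guard against floating-point drift just outside the interval
    value = min(knob.value_max, max(knob.value_min, value))
    return round(value) if isinstance(knob, IntegerKnob) else value


rng = random.Random(0)
knobs = {name: sample_knob(knob, rng) for name, knob in get_knob_config().items()}
```

A dictionary like knobs is what a model’s __init__ would receive as **knobs.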

Utility Classes & Methods

rafiki.model.test_model_class(model_file_path, model_class, task, dependencies, train_dataset_uri, test_dataset_uri, queries=[], knobs=None)

Tests whether a model class is properly defined by running a full train-inference flow. The model instance’s methods will be called in an order similar to that in Rafiki. It is assumed that all of the model’s dependencies have been installed in the current Python environment.

Parameters:
  • model_file_path (str) – Path to a single Python file that contains the definition for the model class
  • model_class (type) – The name of the model class inside the Python file. This class should implement rafiki.model.BaseModel
  • task (str) – Task type of model
  • dependencies (dict[str, str]) – Model’s dependencies
  • train_dataset_uri (str) – URI of the train dataset for testing training of the model
  • test_dataset_uri (str) – URI of the test dataset for testing evaluation of the model
  • queries (list[any]) – List of queries for testing predictions with the trained model
  • knobs (dict[str, any]) – Knobs to train the model with. If not specified, knobs from an advisor will be used
Returns:

The trained model
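The kind of train-inference flow such a test exercises can be sketched as below. This is not Rafiki’s implementation: the call order (train, evaluate, dump_parameters, destroy, then load_parameters and predict on a fresh instance) is an assumption, and EchoModel and run_train_inference_flow are hypothetical names used so the snippet runs without Rafiki.

```python
import json
import pickle


class EchoModel:
    """Hypothetical minimal model exposing the BaseModel method names."""

    calls = []  # Records the order in which methods are invoked

    def __init__(self, **knobs):
        self._offset = knobs.get('offset', 0)

    def train(self, dataset_uri):
        EchoModel.calls.append('train')

    def evaluate(self, dataset_uri):
        EchoModel.calls.append('evaluate')
        return 1.0

    def predict(self, queries):
        EchoModel.calls.append('predict')
        return [q + self._offset for q in queries]

    def dump_parameters(self):
        EchoModel.calls.append('dump_parameters')
        return {'offset': self._offset}

    def load_parameters(self, params):
        EchoModel.calls.append('load_parameters')
        self._offset = params['offset']

    def destroy(self):
        EchoModel.calls.append('destroy')


def run_train_inference_flow(model_class, train_uri, test_uri, queries, knobs):
    """Exercise a model class end to end (assumed call order)."""
    model = model_class(**knobs)
    model.train(train_uri)
    score = model.evaluate(test_uri)
    assert 0.0 <= score <= 1.0, 'evaluate() should return a float from 0 to 1'
    params_bytes = pickle.dumps(model.dump_parameters())
    model.destroy()

    # Inference uses a fresh instance restored from the serialized parameters
    model = model_class(**knobs)
    model.load_parameters(pickle.loads(params_bytes))
    predictions = model.predict(queries)
    json.dumps(predictions)  # Raises TypeError if not JSON serializable
    assert len(predictions) == len(queries)
    return model, predictions


model, predictions = run_train_inference_flow(
    EchoModel, 'train-uri', 'test-uri', queries=[1, 2], knobs={'offset': 10})
```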

class rafiki.model.ModelLogger

Allows models to log messages and metrics during model training, and define plots for visualization of model training.

To use this logger, import the global logger instance from the module rafiki.model.

For example:

from rafiki.model import logger, BaseModel
...
class MyModel(BaseModel):
    ...
    def train(self, dataset_uri):
        ...
        logger.log('Starting model training...')
        logger.define_plot('Precision & Recall', ['precision', 'recall'], x_axis='epoch')
        ...
        logger.log(precision=0.1, recall=0.6, epoch=1)
        ...
        logger.log('Ending model training...')
        ...
define_loss_plot()

Convenience method for defining a plot of loss against epoch. To be used with rafiki.model.ModelLogger.log_loss().

log_loss(loss, epoch)

Convenience method for logging loss against epoch. To be used with rafiki.model.ModelLogger.define_loss_plot().

define_plot(title, metrics, x_axis=None)

Defines a plot for a set of metrics for analysis of model training. By default, metrics will be plotted against time.

For example, a model’s precision & recall logged with e.g. log(precision=0.1, recall=0.6, epoch=1) can be visualized in the plots generated by define_plot('Precision & Recall', ['precision', 'recall']) (against time) or define_plot('Precision & Recall', ['precision', 'recall'], x_axis='epoch') (against epochs).

Only call this method in rafiki.model.BaseModel.train().

Parameters:
  • title (str) – Title of the plot
  • metrics (list[str]) – List of metrics that should be plotted on the y-axis
  • x_axis (str) – Metric that should be plotted on the x-axis, against all other metrics. If None (the default), 'time' is used, which is automatically logged

log(msg='', **metrics)

Logs a message and/or a set of metrics at a single point in time.

Logged messages will be viewable on Rafiki’s administrative UI.

To visualize logged metrics on plots, a plot must be defined via rafiki.model.ModelLogger.define_plot().

Only call this method in rafiki.model.BaseModel.train() and rafiki.model.BaseModel.evaluate().

Parameters:
  • msg (str) – Message to be logged
  • metrics (dict[str, int|float]) – Set of metrics & their values to be logged as { <metric>: <value> }, where <value> should be a number.
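The relationship between log() and define_plot() can be pictured with a small in-memory model: each log() call records one point in time, and a plot selects the logged entries that carry both its x-axis metric and one of its y-axis metrics. SketchLogger and its series() method are hypothetical and are not Rafiki’s implementation; treating x_axis=None as 'time' is an assumption based on the parameter description.

```python
import time


class SketchLogger:
    """Hypothetical in-memory logger with the documented method names."""

    def __init__(self):
        self.plots = []    # Plot definitions from define_plot()
        self.entries = []  # One dict per log() call

    def define_plot(self, title, metrics, x_axis=None):
        # Assumption: x_axis=None means "plot against the auto-logged time"
        self.plots.append({'title': title, 'metrics': list(metrics), 'x_axis': x_axis or 'time'})

    def log(self, msg='', **metrics):
        entry = {'time': time.time(), **metrics}
        if msg:
            entry['msg'] = msg
        self.entries.append(entry)

    def series(self, plot, metric):
        """Return (x, y) points for one metric of a defined plot."""
        x_axis = plot['x_axis']
        return [(e[x_axis], e[metric]) for e in self.entries if x_axis in e and metric in e]


logger = SketchLogger()
logger.define_plot('Precision & Recall', ['precision', 'recall'], x_axis='epoch')
logger.log('Starting model training...')
logger.log(precision=0.1, recall=0.6, epoch=1)
logger.log(precision=0.4, recall=0.7, epoch=2)

# Message-only entries carry no 'epoch', so they do not appear on the plot
recall_points = logger.series(logger.plots[0], 'recall')
```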
class rafiki.model.ModelDatasetUtils

Collection of utility methods to help with the loading of datasets.

To use these utility methods, import the global dataset_utils instance from the module rafiki.model.

For example:

from rafiki.model import dataset_utils
...
def train(self, dataset_uri):
    ...
    dataset_utils.load_dataset_of_image_files(dataset_uri)
    ...
load_dataset_of_corpus(dataset_uri, tags=['tag'], split_by='\\n')

Loads dataset with type CORPUS.

Parameters: dataset_uri (str) – URI of the dataset file
Returns: An instance of CorpusDataset.

load_dataset_of_image_files(dataset_uri, image_size=None)

Loads dataset with type IMAGE_FILES.

Parameters:
  • dataset_uri (str) – URI of the dataset file
  • image_size (str) – dimensions to resize all images to (None for no resizing)
Returns:

An instance of ImageFilesDataset.

resize_as_images(images, image_size)

Resize a list of N grayscale images to another size.

Parameters:
  • images (int[][][]) – Images to resize, as a list of N 2D lists (grayscale)
  • image_size (int) – Dimensions to resize all images to (None for no resizing)
Returns:

The resized images, as N 2D NumPy arrays
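What resizing a batch of grayscale images involves can be sketched in pure Python with nearest-neighbour sampling. This is a swapped-in technique for illustration, not the method Rafiki uses: the helpers resize_nearest and resize_all are hypothetical, the sketch assumes an integer image_size means a square size x size output, and it returns nested lists rather than NumPy arrays.

```python
def resize_nearest(image, size):
    """Resize one grayscale image (2D list) to size x size by nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[row * src_h // size][col * src_w // size] for col in range(size)]
        for row in range(size)
    ]


def resize_all(images, size):
    # None means "no resizing", mirroring the documented parameter
    if size is None:
        return images
    return [resize_nearest(image, size) for image in images]


images = [[[0, 255], [128, 64]]]   # N = 1 image of 2 x 2 pixels
resized = resize_all(images, 4)    # Each source pixel becomes a 2 x 2 block
```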

download_dataset_from_uri(dataset_uri)

Download the dataset at the URI if necessary, ensuring that the dataset ends up in the local filesystem.

Parameters: dataset_uri (str) – URI of the dataset file
Returns: File path of the dataset file in the local filesystem
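The "download only if needed" behaviour can be sketched with the standard library. This is an illustrative guess at the logic, not Rafiki’s code: the function name maybe_download, the cache_dir parameter, and the rule that only http and https URIs are downloaded are all assumptions.

```python
import os
import tempfile
import urllib.parse
import urllib.request


def maybe_download(dataset_uri, cache_dir=None):
    """Return a local file path for dataset_uri, downloading only if needed."""
    parsed = urllib.parse.urlparse(dataset_uri)
    if parsed.scheme in ('http', 'https'):
        cache_dir = cache_dir or tempfile.gettempdir()
        file_name = os.path.basename(parsed.path) or 'dataset'
        local_path = os.path.join(cache_dir, file_name)
        if not os.path.exists(local_path):  # Skip the download if already cached
            urllib.request.urlretrieve(dataset_uri, local_path)
        return local_path
    # Assumption: anything else is already a path on the local filesystem
    return dataset_uri


# A local path is returned unchanged, with no network access
local_path = maybe_download('/data/train.zip')
```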
class rafiki.model.ImageFilesDataset(dataset_path, image_size)

Class that helps with loading a dataset of type IMAGE_FILES.

classes is the number of image classes. Each dataset example is a pair (image, class), where each image is a 2D list of grayscale integer values from 0 to 255 and each class is an integer from 0 to (classes - 1).

class rafiki.model.CorpusDataset(dataset_path, tags, split_by)

Class that helps with loading a dataset of type CORPUS.

tags is the expected list of tags for each token in the corpus. Dataset samples are grouped as sentences by a delimiter token corresponding to split_by.

tag_num_classes is a list giving the number of classes for each tag, in the same order as tags. Each dataset sample is [[token, <tag_1>, <tag_2>, …, <tag_k>]], where each token is a string and each <tag_i> is an integer from 0 to (k_i - 1) giving the token’s class for the i-th tag, with k_i the i-th entry of tag_num_classes. Tag values appear in the same order as tags.
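The sample structure described above can be illustrated with an in-memory example. The sketch assumes the corpus has already been read into rows of (token, tag_1, …, tag_k) and that class integers are assigned in order of first appearance; both are assumptions, and group_corpus is a hypothetical helper rather than part of CorpusDataset. Note that '\\n' in Python source is the two-character token backslash-n, matching the documented default for split_by.

```python
def group_corpus(rows, tags, split_by='\\n'):
    """Group (token, tag_1, ..., tag_k) rows into sentences with integer tag classes."""
    # One label-to-class mapping per tag, filled in order of first appearance
    label_maps = [{} for _ in tags]
    sentences, current = [], []
    for token, *labels in rows:
        if token == split_by:  # Delimiter token closes the current sentence
            if current:
                sentences.append(current)
            current = []
            continue
        classes = [
            label_maps[i].setdefault(label, len(label_maps[i]))
            for i, label in enumerate(labels)
        ]
        current.append([token, *classes])
    if current:
        sentences.append(current)
    tag_num_classes = [len(m) for m in label_maps]
    return sentences, tag_num_classes


rows = [
    ('The', 'DET'), ('cat', 'NOUN'),
    ('\\n', '-'),                      # Delimiter row; its tag is ignored
    ('Dogs', 'NOUN'), ('bark', 'VERB'),
]
sentences, tag_num_classes = group_corpus(rows, tags=['pos'])
```

Here each sentence is one dataset sample, and tag_num_classes has one entry because only a single tag was requested.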