Quick Start (Application Developers)

As an App Developer, you can create and manage train and inference jobs on Rafiki.

To learn more about what you can do on Rafiki, explore the methods of rafiki.client.Client.

We assume that you have access to a running instance of Rafiki Admin at <rafiki_host>:<admin_port> and Rafiki Admin Web at <rafiki_host>:<admin_web_port>.

Installing the client

  1. Install Python 3.6 such that the python and pip commands point to the correct installation of Python (see Installing Python)

  2. Clone the project at https://github.com/nginyc/rafiki (e.g. with Git)

  3. Within the project’s root folder, install Rafiki Client’s dependencies by running:

    pip install -r ./rafiki/requirements.txt
    

Initializing the client

Example:

from rafiki.client import Client
client = Client(admin_host='localhost', admin_port=3000)
client.login(email='app_developer@rafiki', password='rafiki')
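
If you would rather not hard-code the Admin's address, the connection settings can be read from the environment. The following is a minimal sketch: the variable names RAFIKI_ADDR and ADMIN_EXT_PORT and the helper client_settings are assumptions made for illustration, and the defaults simply mirror the example above.

```python
import os

def client_settings(env=None):
    """Build keyword arguments for Client() from environment variables.

    RAFIKI_ADDR and ADMIN_EXT_PORT are assumed names; adjust them to
    whatever your deployment actually exports.
    """
    env = os.environ if env is None else env
    return {
        'admin_host': env.get('RAFIKI_ADDR', 'localhost'),
        'admin_port': int(env.get('ADMIN_EXT_PORT', 3000)),
    }

# Demonstrated with an explicit mapping instead of the real environment
settings = client_settings({'RAFIKI_ADDR': '10.0.0.5', 'ADMIN_EXT_PORT': '3000'})
print(settings)
```

With the client installed, the result could then be passed on as Client(**client_settings()).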

Listing available models by task

Example:

client.get_available_models(task='IMAGE_CLASSIFICATION')

Output:

[{'access_right': 'PRIVATE',
 'datetime_created': 'Mon, 17 Dec 2018 07:06:03 GMT',
 'dependencies': {'tensorflow': '1.12.0'},
 'id': '45df3f34-53d7-4fb8-a7c2-55391ea10030',
 'name': 'TfFeedForward',
 'task': 'IMAGE_CLASSIFICATION',
 'user_id': 'fb5671f1-c673-40e7-b53a-9208eb1ccc50'},
 {'access_right': 'PRIVATE',
 'datetime_created': 'Mon, 17 Dec 2018 07:06:03 GMT',
 'dependencies': {'scikit-learn': '0.20.0'},
 'id': 'd0ea96ce-478b-4167-8a84-eb36ae631235',
 'name': 'SkDt',
 'task': 'IMAGE_CLASSIFICATION',
 'user_id': 'fb5671f1-c673-40e7-b53a-9208eb1ccc50'}]
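
Each entry in the output above carries a dependencies mapping from package name to version. As a rough sketch, the mapping can be turned into pip requirement strings, which is handy if you later need to install a model's dependencies locally. The helper name pip_requirements is hypothetical, and the sample data is trimmed from the output above.

```python
def pip_requirements(model):
    """Turn a model's 'dependencies' mapping into pip requirement strings."""
    deps = model.get('dependencies') or {}
    return ['{}=={}'.format(name, version) for name, version in sorted(deps.items())]

# Trimmed copies of the entries returned by get_available_models()
models = [
    {'name': 'TfFeedForward', 'dependencies': {'tensorflow': '1.12.0'}},
    {'name': 'SkDt', 'dependencies': {'scikit-learn': '0.20.0'}},
]

for model in models:
    print(model['name'], pip_requirements(model))
```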

Creating a train job

To create a model training job, you’ll need to submit your dataset and a target task (see Supported Tasks), together with your app’s name. You’ll need to prepare your dataset in a format specified by the target task, and upload it to a publicly accessible URL.

After creating a train job, you can monitor it on Rafiki Admin Web (see Using Rafiki’s Admin Web).

Refer to the parameters of rafiki.client.Client.create_train_job() for configuring how your train job runs on Rafiki, such as enabling GPU usage & specifying which models to use.

Example:

client.create_train_job(
    app='fashion_mnist_app',
    task='IMAGE_CLASSIFICATION',
    train_dataset_uri='https://github.com/nginyc/rafiki-datasets/blob/master/fashion_mnist/fashion_mnist_for_image_classification_train.zip?raw=true',
    test_dataset_uri='https://github.com/nginyc/rafiki-datasets/blob/master/fashion_mnist/fashion_mnist_for_image_classification_test.zip?raw=true',
    budget={ 'MODEL_TRIAL_COUNT': 5 }
)

Output:

{'app': 'fashion_mnist_app',
'app_version': 1,
'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}

Note

The datasets in the above example have been pre-processed to conform to the task’s dataset specification. The code that does this pre-processing from the original Fashion MNIST dataset is available at ./examples/datasets/image_classification/load_mnist_format.py.
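
Because the dataset has to be reachable at a publicly accessible URL, it can help to sanity-check the URIs before submitting a train job. The sketch below only checks that a URI is syntactically an http(s) URL; it does not verify that the file exists or is reachable by Rafiki. The helper name looks_like_public_url is hypothetical.

```python
from urllib.parse import urlparse

def looks_like_public_url(uri):
    """Return True if `uri` is an http(s) URL with a host part."""
    parsed = urlparse(uri)
    return parsed.scheme in ('http', 'https') and bool(parsed.netloc)

train_uri = ('https://github.com/nginyc/rafiki-datasets/blob/master/fashion_mnist/'
             'fashion_mnist_for_image_classification_train.zip?raw=true')

print(looks_like_public_url(train_uri))
print(looks_like_public_url('/home/me/train.zip'))
```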

Listing train jobs

Example:

client.get_train_jobs_of_app(app='fashion_mnist_app')

Output:

[{'app': 'fashion_mnist_app',
'app_version': 1,
'budget': {'MODEL_TRIAL_COUNT': 5},
'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT',
'datetime_stopped': None,
'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8',
'status': 'RUNNING',
'task': 'IMAGE_CLASSIFICATION',
'test_dataset_uri': 'https://github.com/nginyc/rafiki-datasets/blob/master/fashion_mnist/fashion_mnist_for_image_classification_test.zip?raw=true',
'train_dataset_uri': 'https://github.com/nginyc/rafiki-datasets/blob/master/fashion_mnist/fashion_mnist_for_image_classification_train.zip?raw=true'}]

Retrieving the latest train job’s details

Example:

client.get_train_job(app='fashion_mnist_app')

Output:

{'app': 'fashion_mnist_app',
'app_version': 1,
'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT',
'datetime_stopped': 'Mon, 17 Dec 2018 07:11:11 GMT',
'id': 'ec4db479-b9b2-4289-8086-52794ffc71c8',
'status': 'STOPPED',
'task': 'IMAGE_CLASSIFICATION',
'test_dataset_uri': 'https://github.com/nginyc/rafiki-datasets/blob/master/fashion_mnist/fashion_mnist_for_image_classification_test.zip?raw=true',
'train_dataset_uri': 'https://github.com/nginyc/rafiki-datasets/blob/master/fashion_mnist/fashion_mnist_for_image_classification_train.zip?raw=true',
'workers': [{'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT',
            'datetime_stopped': 'Mon, 17 Dec 2018 07:11:14 GMT',
            'model_name': 'SkDt',
            'replicas': 2,
            'service_id': '2ada1ff3-84e9-4eca-bac9-241cd8c765ef',
            'status': 'STOPPED'},
            {'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT',
            'datetime_stopped': 'Mon, 17 Dec 2018 07:11:42 GMT',
            'model_name': 'TfFeedForward',
            'replicas': 2,
            'service_id': '81ff23a7-ddd0-4a62-9d86-a3cc985ca6fe',
            'status': 'STOPPED'}]}
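
The datetime_started and datetime_stopped fields in the output above are HTTP-style date strings. As a small sketch, they can be parsed with the standard library to work out how long a job ran. The helper name job_duration_seconds is hypothetical, and the sample values are copied from the output above.

```python
from email.utils import parsedate_to_datetime

def job_duration_seconds(job):
    """Return the job's run time in seconds, or None if it has not stopped."""
    started = job.get('datetime_started')
    stopped = job.get('datetime_stopped')
    if started is None or stopped is None:
        return None
    delta = parsedate_to_datetime(stopped) - parsedate_to_datetime(started)
    return delta.total_seconds()

job = {
    'datetime_started': 'Mon, 17 Dec 2018 07:08:05 GMT',
    'datetime_stopped': 'Mon, 17 Dec 2018 07:11:11 GMT',
}
print(job_duration_seconds(job))  # 186.0
```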

Listing best trials of the latest train job

Example:

client.get_best_trials_of_train_job(app='fashion_mnist_app')

Output:

[{'datetime_started': 'Mon, 17 Dec 2018 07:09:17 GMT',
'datetime_stopped': 'Mon, 17 Dec 2018 07:11:38 GMT',
'id': '1b7dc65a-87ae-4d42-9a01-67602115a4a4',
'knobs': {'batch_size': 32,
            'epochs': 3,
            'hidden_layer_count': 2,
            'hidden_layer_units': 36,
            'image_size': 32,
            'learning_rate': 0.014650971133579896},
'model_name': 'TfFeedForward',
'score': 0.8269},
{'datetime_started': 'Mon, 17 Dec 2018 07:08:38 GMT',
'datetime_stopped': 'Mon, 17 Dec 2018 07:11:11 GMT',
'id': '0c1f9184-7b46-4aaf-a581-be62bf3f49bf',
'knobs': {'criterion': 'entropy', 'max_depth': 4},
'model_name': 'SkDt',
'score': 0.6686}]
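
The returned list can contain trials from several models. The following sketch picks the highest-scoring trial for each model without assuming anything about the order of the list. The helper name best_trial_per_model is hypothetical, and the sample data is trimmed from the output above.

```python
def best_trial_per_model(trials):
    """Map each model name to its highest-scoring trial."""
    best = {}
    for trial in trials:
        name = trial['model_name']
        if name not in best or trial['score'] > best[name]['score']:
            best[name] = trial
    return best

# Trimmed copies of the trials shown above
trials = [
    {'id': '1b7dc65a-87ae-4d42-9a01-67602115a4a4', 'model_name': 'TfFeedForward', 'score': 0.8269},
    {'id': '0c1f9184-7b46-4aaf-a581-be62bf3f49bf', 'model_name': 'SkDt', 'score': 0.6686},
]

for name, trial in best_trial_per_model(trials).items():
    print(name, trial['id'], trial['score'])
```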

Creating an inference job with the latest train job

Your app’s users will make queries to the /predict endpoint of predictor_host over HTTP.

To create a model serving job, you must first wait for your train job to stop. You then submit the name of the app associated with the train job (which must have a status of STOPPED). The inference job is created from the best trials of that train job.
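
Waiting for the train job to stop can be done by polling its status. The following is a minimal sketch, not part of the Rafiki client: wait_until_stopped is a hypothetical helper that repeatedly calls a function returning the job's details and returns once the status is STOPPED. It only handles the RUNNING and STOPPED statuses shown in this guide, and a stand-in function replaces client.get_train_job so that the sketch runs on its own.

```python
import time

def wait_until_stopped(get_job, poll_seconds=5.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll `get_job()` until the returned job has status 'STOPPED'."""
    waited = 0.0
    while True:
        job = get_job()
        if job.get('status') == 'STOPPED':
            return job
        if waited >= timeout_seconds:
            raise TimeoutError('train job did not stop in time')
        sleep(poll_seconds)
        waited += poll_seconds

# Stand-in for client.get_train_job(app=...): reports RUNNING twice, then STOPPED
_statuses = iter(['RUNNING', 'RUNNING', 'STOPPED'])

def fake_get_train_job():
    return {'app': 'fashion_mnist_app', 'status': next(_statuses)}

job = wait_until_stopped(fake_get_train_job, poll_seconds=0.0)
print(job['status'])
```

Against a real deployment, the stand-in would be replaced with something like lambda: client.get_train_job(app='fashion_mnist_app').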

Example:

client.create_inference_job(app='fashion_mnist_app')

Output:

{'app': 'fashion_mnist_app',
'app_version': 1,
'id': '0477d03c-d312-48c5-8612-f9b37b368949',
'predictor_host': '127.0.0.1:30001',
'train_job_id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}
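
Once the inference job is running, queries go to the /predict endpoint of the returned predictor_host over HTTP. The sketch below builds such a request with the standard library without sending it. The JSON body shape ({'query': ...}) and the helper name build_predict_request are assumptions; check the predictor's documentation for the exact payload it expects.

```python
import json
import urllib.request

def build_predict_request(predictor_host, query):
    """Build a POST request for the predictor's /predict endpoint.

    The {'query': ...} body is an assumed payload shape.
    """
    body = json.dumps({'query': query}).encode('utf-8')
    return urllib.request.Request(
        url='http://{}/predict'.format(predictor_host),
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

# A tiny made-up 2x2 "image" stands in for a real query
request = build_predict_request('127.0.0.1:30001', [[0, 0], [0, 255]])
print(request.get_method(), request.full_url)

# To send it to a running predictor:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode('utf-8'))
```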

Listing inference jobs

Example:

client.get_inference_jobs_of_app(app='fashion_mnist_app')

Output:

[{'app': 'fashion_mnist_app',
  'app_version': 1,
  'datetime_started': 'Mon, 17 Dec 2018 07:15:12 GMT',
  'datetime_stopped': None,
  'id': '0477d03c-d312-48c5-8612-f9b37b368949',
  'predictor_host': '127.0.0.1:30000',
  'status': 'RUNNING',
  'train_job_id': 'ec4db479-b9b2-4289-8086-52794ffc71c8'}]

Retrieving details of running inference job

Example:

client.get_running_inference_job(app='fashion_mnist_app')

Output:

{'app': 'fashion_mnist_app',
'app_version': 1,
'datetime_started': 'Mon, 17 Dec 2018 07:25:36 GMT',
'datetime_stopped': None,
'id': '09e5040e-2134-411b-855f-793927c80b4b',
'predictor_host': '127.0.0.1:30000',
'status': 'RUNNING',
'train_job_id': 'ec4db479-b9b2-4289-8086-52794ffc71c8',
'workers': [{'datetime_started': 'Mon, 17 Dec 2018 07:25:36 GMT',
            'datetime_stopped': None,
            'replicas': 2,
            'service_id': '661035bb-3966-46e8-828c-e200960a76c0',
            'status': 'RUNNING',
            'trial': {'id': '1b7dc65a-87ae-4d42-9a01-67602115a4a4',
                        'knobs': {'batch_size': 32,
                                'epochs': 3,
                                'hidden_layer_count': 2,
                                'hidden_layer_units': 36,
                                'image_size': 32,
                                'learning_rate': 0.014650971133579896},
                        'model_name': 'TfFeedForward',
                        'score': 0.8269}},
            {'datetime_started': 'Mon, 17 Dec 2018 07:25:36 GMT',
            'datetime_stopped': None,
            'replicas': 2,
            'service_id': '6a769007-b18f-4271-b3db-8b60ed5fb545',
            'status': 'RUNNING',
            'trial': {'id': '0c1f9184-7b46-4aaf-a581-be62bf3f49bf',
                        'knobs': {'criterion': 'entropy', 'max_depth': 4},
                        'model_name': 'SkDt',
                        'score': 0.6686}}]}
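
Each worker in the output above serves the model from one trial. As a quick sketch, the models currently being served and their scores can be pulled out of the job's details. The helper name deployed_models is hypothetical, and the sample data is trimmed from the output above.

```python
def deployed_models(inference_job):
    """List (model_name, score) for every running worker of an inference job."""
    return [
        (worker['trial']['model_name'], worker['trial']['score'])
        for worker in inference_job.get('workers', [])
        if worker.get('status') == 'RUNNING'
    ]

# Trimmed copy of the inference job details shown above
inference_job = {
    'predictor_host': '127.0.0.1:30000',
    'workers': [
        {'status': 'RUNNING', 'trial': {'model_name': 'TfFeedForward', 'score': 0.8269}},
        {'status': 'RUNNING', 'trial': {'model_name': 'SkDt', 'score': 0.6686}},
    ],
}

print(deployed_models(inference_job))
```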

Stopping a running inference job

Example:

client.stop_inference_job(app='fashion_mnist_app')

Downloading the trained model for a trial

After running a train job, you might want to download the trained model instance of a trial of the train job, instead of creating an inference job to make predictions. Subsequently, you’ll be able to make batch predictions locally with the trained model instance.

To do this, you must have the trial’s model class file already in your local filesystem, the dependencies of the model must have been installed separately, and the model class must have been imported and passed into rafiki.client.Client.load_trial_model().

To download the model class file, use the method rafiki.client.Client.download_model_file().

Example:

In shell,

# Install the dependencies of the `TfFeedForward` model
pip install tensorflow==1.12.0

In Python,

# Find the best trial for model `TfFeedForward`
trials = [x for x in client.get_best_trials_of_train_job(app='fashion_mnist_app') if x.get('model_name') == 'TfFeedForward']
trial = trials[0]
trial_id = trial.get('id')

# Import the model class
from examples.models.image_classification.TfFeedForward import TfFeedForward

# Load an instance of the model with trial's parameters
model_inst = client.load_trial_model(trial_id, TfFeedForward)

# Make predictions with trained model instance associated with best trial
queries = [[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 0, 0, 7, 0, 37, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 27, 84, 11, 0,0, 0, 0, 0, 0, 119, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 88, 143, 110, 0, 0, 0, 0, 22, 93, 106, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 53, 129, 120, 147, 175, 157, 166, 135, 154, 168, 140, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 11, 137, 130, 128, 160, 176, 159, 167, 178, 149, 151, 144, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0, 2, 1, 0, 3, 0, 0, 115, 114, 106, 137, 168, 153, 156, 165, 167, 143, 157, 158, 11, 0], [0,0, 0, 0, 1, 0, 0, 0, 0, 0, 3, 0, 0, 89, 139, 90, 94, 153, 149, 131, 151, 169, 172, 143, 159, 169, 48, 0], [0, 0, 0, 0, 0, 0, 2, 4, 1, 0, 0, 0, 98, 136, 110, 109, 110, 162, 135, 144, 149, 159, 167, 144, 158, 169, 119, 0], [0, 0, 2, 2, 1, 2, 0, 0, 0, 0, 26, 108, 117, 99, 111, 117, 136, 156, 134, 154, 154, 156, 160, 141, 147, 156, 178, 0], [3, 0, 0, 0, 0, 0, 0, 21, 53, 92, 117, 111, 103, 115, 129, 134, 143, 154, 165, 170, 154, 151, 154, 143, 138, 150,165, 43], [0, 0, 23, 54, 65, 76, 85, 118, 128, 123, 111, 113, 118, 127, 125, 139, 133, 136, 160, 140, 155, 161, 144, 155, 172, 161, 189, 62], [0, 68, 94, 90, 111, 114, 111, 114, 115, 127, 135, 136, 143, 126, 127, 151, 154, 143, 148, 125, 162, 162, 144, 138, 153, 162, 196, 58], [70, 169, 129, 104, 98,100, 94, 97, 98, 102, 108, 106, 
119, 120, 129, 149, 156, 167, 190, 190, 196, 198, 198, 187, 197, 189, 184, 36], [16, 126, 171, 188, 188, 184, 171, 153, 135, 120, 126, 127, 146, 185, 195, 209, 208, 255, 209, 177, 245, 252, 251, 251, 247, 220, 206, 49], [0, 0, 0, 12, 67, 106, 164, 185, 199, 210, 211, 210, 208, 190, 150, 82, 8, 0, 0, 0, 178, 208, 188, 175, 162, 158, 151, 11], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]]
print(model_inst.predict(queries))
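
What predict() returns depends on the task and the model. Assuming, and this is not stated above, that an image classification model returns one list of per-class probabilities per query, the predicted class index for each query could be extracted as sketched below. The helper name predicted_labels and the sample probabilities are made up for illustration.

```python
def predicted_labels(predictions):
    """Return the index of the largest value in each list of class probabilities."""
    return [max(range(len(probs)), key=probs.__getitem__) for probs in predictions]

# Made-up probabilities for two queries over three classes
sample_predictions = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
print(predicted_labels(sample_predictions))  # [1, 0]
```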