Supported Dataset Types
====================================================================
Dataset URIs must use either the ``http`` or the ``https`` protocol.
.. note::
    
    Alternatively, you can use relative file paths (e.g. ``data/dataset.zip``) as dataset URIs,
    but only if you have deployed the full Rafiki stack on your own machine. Such a file path is
    resolved relative to the root of the project directory.
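
The URI rules above can be checked before submitting a dataset. The following is a minimal sketch, not part of Rafiki itself: the function name ``is_supported_dataset_uri`` and the ``local_stack`` flag are hypothetical and only illustrate the rule that relative file paths are accepted solely on a self-deployed stack.

.. code-block:: python

    from urllib.parse import urlparse


    def is_supported_dataset_uri(uri, local_stack=False):
        """Return True if `uri` follows the dataset URI rules described above.

        `local_stack` is a hypothetical flag meaning "the full Rafiki stack
        is deployed on this machine", which is when relative paths are allowed.
        """
        scheme = urlparse(uri).scheme.lower()
        if scheme in ("http", "https"):
            return True
        # No scheme at all: treat as a file path, which must be relative.
        is_relative_path = scheme == "" and bool(uri) and not uri.startswith("/")
        return local_stack and is_relative_path

For instance, ``is_supported_dataset_uri("data/dataset.zip")`` is rejected by default and accepted only when ``local_stack=True`` is passed.
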
.. note::
    Refer to ``./examples/datasets/`` for examples of pre-processing
    common dataset formats to conform to Rafiki's own dataset formats.
.. _`dataset-type:IMAGE_FILES`:
IMAGE_FILES
--------------------------------------------------------------------
The dataset file must be a ``.zip`` archive with an ``images.csv`` file at the root of the archive.
The ``images.csv`` file should be in CSV format with 2 columns, ``path`` and ``class``.
For each row,
    ``path`` should be a file path to a ``.png``, ``.jpg`` or ``.jpeg`` image file within the archive, relative to the root of the directory.
    ``class`` should be an integer from ``0`` to ``k - 1``, where ``k`` is the number of image classes.
An example of ``images.csv`` follows:
.. code-block:: text
    path,class
    image-0-of-class-0.png,0
    image-1-of-class-0.png,0
    ...
    image-0-of-class-1.png,1
    ...
    image-99-of-class-9.png,9
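
An archive in this layout can be assembled with Python's standard library. The sketch below is illustrative only: the helper ``build_image_files_zip`` and its argument layout are assumptions, not part of Rafiki. It expects the images as raw bytes and writes ``images.csv`` at the archive root next to them.

.. code-block:: python

    import csv
    import io
    import zipfile


    def build_image_files_zip(images, zip_path):
        """Write an IMAGE_FILES-style archive to `zip_path`.

        `images` is a list of (path_in_archive, image_bytes, class_index)
        tuples; this argument layout is a hypothetical convention.
        """
        csv_buffer = io.StringIO()
        writer = csv.writer(csv_buffer, lineterminator="\n")
        writer.writerow(["path", "class"])
        with zipfile.ZipFile(zip_path, "w") as archive:
            for path, image_bytes, class_index in images:
                # Store the image and record its path and class in the CSV.
                archive.writestr(path, image_bytes)
                writer.writerow([path, class_index])
            # images.csv must sit at the root of the archive.
            archive.writestr("images.csv", csv_buffer.getvalue())
        return zip_path

The paths passed in are used both as archive member names and as the ``path`` column, so the two stay consistent by construction.
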
    
.. _`dataset-type:CORPUS`:
CORPUS
--------------------------------------------------------------------
The dataset file must be a ``.zip`` archive with a ``corpus.tsv`` file at the root of the archive.
The ``corpus.tsv`` file should be in TSV format with a ``token`` column
and ``N`` other columns with variable names (*tag columns*).
For each row,
    ``token`` should be a string, a token (e.g. word) in the corpus. 
    These tokens should appear in the same order as they do in the text of the corpus.
    To delimit sentences, ``token`` can take the value of ``\n``.
    The other ``N`` columns should be integers from ``0`` to ``k_i - 1``, where ``k_i`` is the number of classes for column ``i``.
    These tag columns describe the corresponding token as part of the text of the corpus, and depend on the task.
An example of ``corpus.tsv`` for POS tagging follows:
.. code-block:: text
    token       tag
    Two         3
    leading     2
    ...
    line-item   1
    veto        5
    .           4
    \n          0
    Professors  6
    Philip      6
    ...
    previous    1
    presidents  8   
    .           4
    \n          0
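
A ``corpus.tsv`` of this shape can be generated from tagged sentences as sketched below. This is only an illustration under stated assumptions: the helper ``build_corpus_zip`` is hypothetical, the sentence delimiter is written as the two literal characters ``\n`` as displayed in the example, and delimiter rows are given tag ``0`` as in the example. Tokens are assumed to contain no tab characters.

.. code-block:: python

    import zipfile

    # Assumption: the delimiter is a literal backslash followed by "n".
    SENTENCE_DELIMITER = "\\n"


    def build_corpus_zip(sentences, zip_path, tag_names=("tag",)):
        """Write a CORPUS-style archive to `zip_path`.

        `sentences` is a list of sentences, each a list of
        (token, tag_1, ..., tag_N) tuples with integer tags.
        """
        rows = [["token", *tag_names]]
        for sentence in sentences:
            for token, *tags in sentence:
                rows.append([token, *tags])
            # Close each sentence with a delimiter row (tags set to 0).
            rows.append([SENTENCE_DELIMITER] + [0] * len(tag_names))
        tsv_text = "".join(
            "\t".join(str(cell) for cell in row) + "\n" for row in rows
        )
        with zipfile.ZipFile(zip_path, "w") as archive:
            # corpus.tsv must sit at the root of the archive.
            archive.writestr("corpus.tsv", tsv_text)
        return zip_path

For a task with several tag columns, pass their names via ``tag_names`` and supply one integer per column in each tuple.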