Machine Learning Manager

Introduction
Interact with your Dataplant’s Machine Learning Manager (MLM)

API Reference

class forepaas.ml.Notebook

Bases: object

get_notebook(notebook_id)

get_notebook returns the configuration of the notebook with the given ID

Parameters:

notebook_id (str) – ID of the notebook to retrieve

Returns:

notebook configuration

Return type:

dict

list_notebooks()

list_notebooks returns the configurations of all notebooks

Returns:

all notebook configurations

Return type:

list

forepaas.ml.count_testing_dataset(pipeline_id=None)

Returns the total number of entries in the test dataset: rows for structured data, or files for unstructured data.

Parameters:

pipeline_id (str) – test dataset pipeline id, defaults to None

Returns:

number of items in the test dataset

Return type:

int

forepaas.ml.count_train_dataset(pipeline_id=None)

Returns the total number of entries in the train dataset: rows for structured data, or files for unstructured data.

Parameters:

pipeline_id (str) – train dataset pipeline id, defaults to None

Returns:

number of items in the train dataset

Return type:

int
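count_train_dataset and count_testing_dataset count rows for structured data and files for unstructured data. The sketch below illustrates that counting rule on local files only; it does not call the MLM. The helper name count_entries, the assumption that structured data is a CSV file with a header row, and the directory-of-files layout for unstructured data are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path
from typing import Union


def count_entries(path: Union[str, Path]) -> int:
    """Hypothetical local analogue of count_train_dataset / count_testing_dataset.

    A directory is treated as unstructured data: one entry per file (recursive).
    Any other path is treated as a structured CSV file with a header row:
    one entry per data row.
    """
    p = Path(path)
    if p.is_dir():
        return sum(1 for f in p.rglob("*") if f.is_file())
    with p.open(newline="") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


# Demo on throwaway data (made-up file names and contents).
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "train.csv"
    csv_path.write_text("age,income\n34,42000\n51,58000\n29,39000\n")
    img_dir = Path(tmp) / "images"
    img_dir.mkdir()
    for name in ("a.png", "b.png"):
        (img_dir / name).touch()
    n_rows = count_entries(csv_path)
    n_files = count_entries(img_dir)

print(n_rows, n_files)  # 3 2
```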

forepaas.ml.create_model(model_configuration)

create_model adds a model to the MLM

Parameters:

model_configuration (dict) – configuration of the model

Returns:

request status response

Return type:

dict

forepaas.ml.create_pipeline(pipeline_configuration, params={})

create_pipeline creates an ML pipeline from a pipeline configuration

Parameters:
  • pipeline_configuration (dict) – pipeline configuration, as a JSON-compatible dict

  • params (dict, optional) – additional arguments

forepaas.ml.delete_model(model)

delete_model removes a model from the MLM

Parameters:

model (str) – model id to remove

Returns:

request status response

Return type:

dict

forepaas.ml.format_scoring(scoring)

format_scoring returns the scoring function described by a scoring configuration

Parameters:

scoring (dict) – scoring configuration

Returns:

scoring function

Return type:

function
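format_scoring turns a scoring configuration into a callable. The configuration schema is not documented here, so the sketch below assumes a hypothetical {"name": ...} key and a small hand-written registry of metrics; format_scoring_sketch, SCORERS, and the metric names are illustrative, not the MLM's actual ones.

```python
from typing import Callable, Dict, Sequence

Scorer = Callable[[Sequence[float], Sequence[float]], float]


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical registry mapping configuration names to scoring functions.
SCORERS: Dict[str, Scorer] = {
    "accuracy": accuracy,
    "mean_absolute_error": mean_absolute_error,
}


def format_scoring_sketch(scoring: dict) -> Scorer:
    """Return the scoring function named by scoring["name"] (assumed schema)."""
    name = scoring.get("name")
    if name not in SCORERS:
        raise ValueError(f"unsupported scoring: {name!r}")
    return SCORERS[name]


score = format_scoring_sketch({"name": "accuracy"})
print(score([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

A registry keyed by name keeps the mapping explicit and makes an unsupported metric fail loudly instead of silently falling back to a default.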

forepaas.ml.get_estimator(model_configuration, path=None)

get_estimator loads a persistent model file from the data store and returns the model. Supported files are pkl, h5, and pth.

Parameters:
  • model_configuration – configuration of the model to load

  • path (str, optional) – path to estimator

Returns:

fitted model

Return type:

fitted model

forepaas.ml.get_hyper_parameters(train=None)

Retrieves the names and values of the hyperparameters set during the hyperparameter tuning step of the MLM pipeline, as key-value pairs

Parameters:

train (dict, optional) – train configuration, defaults to None

Returns:

hyper parameters

Return type:

dict
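get_hyper_parameters returns the tuned hyperparameters as key-value pairs. Assuming the keys match the estimator's constructor arguments (which depends on how the pipeline was configured), the dict can be unpacked directly into the constructor. DummyRegressor and the values shown are hypothetical stand-ins; nothing here calls the MLM.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DummyRegressor:
    """Stand-in estimator; real estimators such as scikit-learn's take kwargs similarly."""
    n_estimators: int = 10
    max_depth: Optional[int] = None


# Hypothetical output of get_hyper_parameters(train): name -> tuned value.
hyper_parameters = {"n_estimators": 200, "max_depth": 6}

# Unpack the key-value pairs as keyword arguments.
model = DummyRegressor(**hyper_parameters)
print(model)  # DummyRegressor(n_estimators=200, max_depth=6)
```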

forepaas.ml.get_model(model)

get_model retrieves a model configuration from the MLM

Parameters:

model (str) – model id

Returns:

request status response

Return type:

dict

forepaas.ml.get_pipeline(pipeline_id, params={})

get_pipeline returns the ML configuration (ML_CONFIG) of a specific ML pipeline

Parameters:
  • pipeline_id (str) – ID of the pipeline to retrieve

  • params (dict, optional) – additional arguments to provide

Returns:

pipeline configuration

Return type:

dict

forepaas.ml.get_pipeline_dataset(pipeline_id, params={})

get_pipeline_dataset returns the dataset section of an ML pipeline configuration

Parameters:
  • pipeline_id (str) – ID of the pipeline whose dataset section to retrieve

  • params (dict, optional) – additional arguments

Returns:

dataset configuration

Return type:

dict

forepaas.ml.get_pipeline_train(pipeline_id, params={})

get_pipeline_train returns the training section of an ML pipeline configuration

Parameters:
  • pipeline_id (str) – ID of the pipeline whose training section to retrieve

  • params (dict, optional) – additional arguments

Returns:

training configuration

Return type:

dict

forepaas.ml.get_testing_dataset(pipeline_id=None)

Returns the test dataset for a pipeline. If the pipeline uses unstructured data, the returned DataFrame contains file paths.

Parameters:

pipeline_id (str) – test dataset pipeline id, defaults to None

Returns:

test dataset

Return type:

pandas.DataFrame

forepaas.ml.get_train_dataset(pipeline_id=None)

Returns the train dataset for a pipeline. If the pipeline uses unstructured data, the returned DataFrame contains file paths.

Parameters:

pipeline_id (str) – train dataset pipeline id, defaults to None

Returns:

Train dataset

Return type:

pandas.DataFrame

forepaas.ml.get_train_scoring_function(train=None)

Retrieves the scoring function used by a specific training configuration

Parameters:

train (dict, optional) – train configuration, defaults to None

Returns:

scoring function

Return type:

function

forepaas.ml.list_model(filter=None)

list_model lists all models in the MLM

Parameters:

filter (dict, optional) – filter to apply to the model list

Returns:

request status response

Return type:

dict

forepaas.ml.list_pipelines(params={})

list_pipelines returns the ML configurations of all pipelines in the MLM

Parameters:

params (dict, optional) – additional arguments

Returns:

pipeline configurations

Return type:

List[dict]

forepaas.ml.predict(data, model_id=None, consumer_id=None, framework='sklearn', input_type='json', uri=None, return_type='dict', headers=None, timeout=360)

Sends data to a deployed ML model, identified by its consumer id, and returns the model's predictions for those features

Parameters:
  • data (pandas.DataFrame or dict) – data to input to the model

  • model_id (str) – model id

  • consumer_id (str) – consumer id

  • framework (str) – ML framework used by the model; supported: sklearn (default), keras, pytorch

  • input_type (str) – data input type; supported: ‘json’ (default), ‘file’

  • uri (str) – URI of the ML service

  • return_type (str) – how the returned data is structured; accepted values are ‘json’ or ‘dataframe’ (the signature default is ‘dict’)

  • headers (dict) – additional headers to be included in request

  • timeout (int) – time in seconds before request times out

Returns:

predictions

Return type:

Union[List[dict], pandas.DataFrame]
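predict accepts data either as a pandas.DataFrame or as a dict. How the client serializes a dict for input_type='json' is not documented here; the sketch below assumes a column-oriented dict ({"column": [values...]}) is turned into a list of row records, the same list-of-dicts shape as the List[dict] return type. columns_to_records and the feature names are hypothetical.

```python
import json
from typing import Any, Dict, List


def columns_to_records(data: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Convert {"col": [v1, v2]} into [{"col": v1}, {"col": v2}] (one dict per row)."""
    lengths = {len(values) for values in data.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of values")
    n_rows = lengths.pop() if lengths else 0
    return [{col: values[i] for col, values in data.items()} for i in range(n_rows)]


# Made-up feature columns for illustration.
features = {"age": [34, 51], "income": [42000, 58000]}
payload = json.dumps(columns_to_records(features))
print(payload)  # [{"age": 34, "income": 42000}, {"age": 51, "income": 58000}]
```

With pandas installed, DataFrame.to_dict(orient="records") produces the same list-of-dicts shape from a DataFrame.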

forepaas.ml.random_split(dataset, params={})

random_split randomly splits a dataset into a train set and a test set

Parameters:
  • dataset (lists, numpy arrays, scipy-sparse matrices or pandas dataframes) – dataset to split

  • params (dict, optional) – additional parameters for splitting, defaults to {}

Returns:

Train and test dataset

Return type:

Tuple(dataset, dataset)
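The input types accepted by random_split mirror those documented for scikit-learn's train_test_split, but the keys accepted in params are not documented here. The pure-Python sketch below shows what a random train/test split does, assuming hypothetical test_size and random_state keys; random_split_sketch is not the library's implementation.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split_sketch(
    dataset: Sequence[T], params: Optional[dict] = None
) -> Tuple[List[T], List[T]]:
    """Shuffle row indices and cut them into (train, test) lists.

    params keys are hypothetical:
      test_size    -- fraction of rows placed in the test set (default 0.25)
      random_state -- seed for a reproducible shuffle (default: unseeded)
    """
    params = params or {}
    test_size = params.get("test_size", 0.25)
    if not 0.0 <= test_size <= 1.0:
        raise ValueError("test_size must be between 0 and 1")
    indices = list(range(len(dataset)))
    random.Random(params.get("random_state")).shuffle(indices)
    n_test = round(len(dataset) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [dataset[i] for i in train_idx], [dataset[i] for i in test_idx]


train, test = random_split_sketch(list(range(10)), {"test_size": 0.2, "random_state": 42})
print(len(train), len(test))  # 8 2
```

The sketch uses None rather than {} as the default for params, the usual Python idiom for avoiding a shared mutable default argument.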

forepaas.ml.save_model(model, conf, execution_id=None)

save_model saves a model to the data store

Parameters:
  • model (scikit-learn model, or a model from another framework that can be saved as .pkl) – model to be saved

  • conf (dict) – configuration of the pipeline

  • execution_id (str, optional) – execution id

Returns:

file information

Return type:

dict

forepaas.ml.save_model_file(model, model_path)

save_model_file saves a persistent model file based on framework to the data store. Currently supports pkl, h5, and pth.

Parameters:
  • model (persistent file) – model file

  • model_path (str) – full file path to save the model to

forepaas.ml.update_model(model, conf)

update_model updates an existing model

Parameters:
  • model (str) – model id to update

  • conf (dict) – configuration of the model

Returns:

request status response

Return type:

dict

forepaas.ml.update_pipeline_dataset(pipeline_id, dataset_conf, params={})

update_pipeline_dataset updates the dataset section of an ML pipeline configuration

Parameters:
  • pipeline_id (str) – ID of the pipeline whose dataset section to update

  • dataset_conf (dict) – dataset configuration

  • params (dict, optional) – additional arguments

forepaas.ml.update_pipeline_train(pipeline_id, train_conf, params={})

update_pipeline_train updates the training section of an ML pipeline configuration

Parameters:
  • pipeline_id (str) – ID of the pipeline whose training section to update

  • train_conf (dict) – configuration of the training

  • params (dict, optional) – additional arguments
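A plausible workflow for update_pipeline_train and update_pipeline_dataset is to fetch the current section with get_pipeline_train or get_pipeline_dataset, change only what is needed, and send it back. The sketch below shows that middle step as a recursive merge on plain dicts. merge_config and the configuration keys are hypothetical, and the forepaas.ml calls appear only in comments because whether the update functions expect a full or a partial section is not documented here.

```python
import copy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of base with updates merged in recursively.

    Nested dicts are merged key by key; any other value in updates replaces
    the value in base. base itself is left unchanged.
    """
    merged = copy.deepcopy(base)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Intended use (not executed here; pipeline_id is a placeholder):
#   train_conf = forepaas.ml.get_pipeline_train(pipeline_id)
#   new_conf = merge_config(train_conf, {...})
#   forepaas.ml.update_pipeline_train(pipeline_id, new_conf)

# Made-up training section for illustration.
train_conf = {"model": "random_forest", "params": {"n_estimators": 100, "max_depth": 5}}
new_conf = merge_config(train_conf, {"params": {"max_depth": 8}})
print(new_conf)  # {'model': 'random_forest', 'params': {'n_estimators': 100, 'max_depth': 8}}
```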