Machine Learning Manager

Introduction

The forepaas.ml module lets you interact with your Dataplant’s Machine Learning Manager (MLM).

API Reference

class forepaas.ml.Notebook

Bases: object

get_notebook(notebook_id)

get_notebook returns the configuration of the notebook with the given ID

Parameters:: notebook_id (str) – ID of the notebook to retrieve
Returns:: notebook configuration
Return type:: dict

list_notebooks()

list_notebooks returns a list of all notebooks

Returns:: all notebook configurations
Return type:: list

forepaas.ml.count_testing_dataset(pipeline_id=None)

Returns the total number of entries in the test dataset: rows for structured data, or files for unstructured data.

Parameters:: pipeline_id (str) – test dataset pipeline id, defaults to None
Returns:: number of items in the test dataset
Return type:: int

forepaas.ml.count_train_dataset(pipeline_id=None)

Returns the total number of entries in the train dataset: rows for structured data, or files for unstructured data.

Parameters:: pipeline_id (str) – train dataset pipeline id, defaults to None
Returns:: number of items in the train dataset
Return type:: int
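The distinction between rows and files can be illustrated with a small standalone sketch. The helper count_entries below is hypothetical and not part of forepaas.ml; it assumes structured data arrives as an in-memory sequence of rows and unstructured data as a local directory of files, which may not match how the data store actually holds them.

```python
import os
import tempfile
from typing import Sequence, Union


def count_entries(dataset: Union[str, Sequence]) -> int:
    """Hypothetical helper: count rows for structured data, files for unstructured data.

    A str is treated as a directory holding unstructured files (assumption);
    anything else is treated as a sequence of rows.
    """
    if isinstance(dataset, str):
        # Unstructured: count every file anywhere below the directory.
        return sum(len(files) for _, _, files in os.walk(dataset))
    # Structured: one entry per row.
    return len(dataset)


rows = [{"x": 1, "y": 0}, {"x": 2, "y": 1}, {"x": 3, "y": 1}]
print(count_entries(rows))  # 3

with tempfile.TemporaryDirectory() as folder:
    for name in ("a.png", "b.png"):
        open(os.path.join(folder, name), "w").close()
    print(count_entries(folder))  # 2
```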

forepaas.ml.create_model(model_configuration)

create_model adds a model to the MLM

Parameters:: model_configuration (dict) – configuration of the model
Returns:: request status response
Return type:: dict

forepaas.ml.create_pipeline(pipeline_configuration, params={})

create_pipeline creates an ML pipeline from a configuration

Parameters:

pipeline_configuration (dict) – pipeline configuration (JSON)
params (dict, optional) – additional arguments

forepaas.ml.delete_model(model)

delete_model removes a model from the MLM

Parameters:: model (str) – model id to remove
Returns:: request status response
Return type:: dict

forepaas.ml.format_scoring(scoring)

format_scoring returns the scoring function described by a scoring configuration

Parameters:: scoring (dict) – scoring configuration
Returns:: scoring function
Return type:: function
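format_scoring maps a scoring configuration to a callable. The sketch below shows one way such a lookup can work; the configuration key name, the SCORERS registry, format_scoring_sketch, and both metric implementations are hypothetical and do not describe the actual configuration schema used by the MLM.

```python
from typing import Callable, Dict, Sequence

Scorer = Callable[[Sequence[float], Sequence[float]], float]


def accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between predictions and true values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical registry: metric name in the configuration -> scoring function.
SCORERS: Dict[str, Scorer] = {
    "accuracy": accuracy,
    "mean_absolute_error": mean_absolute_error,
}


def format_scoring_sketch(scoring: dict) -> Scorer:
    """Return the scoring function named in a hypothetical {'name': ...} configuration."""
    name = scoring["name"]
    try:
        return SCORERS[name]
    except KeyError:
        raise ValueError(f"unsupported scoring function: {name!r}") from None


score = format_scoring_sketch({"name": "accuracy"})
print(score([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Raising on an unknown name, rather than falling back to a default metric, keeps a misconfigured pipeline from silently optimizing the wrong score.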

forepaas.ml.get_estimator(model_configuration, path=None)

get_estimator loads a persistent model file from the data store and returns the model. Supported files are pkl, h5, and pth.

Parameters:

model_configuration (dict) – configuration of the model
path (str, optional) – path to the estimator file

Returns:

fitted model

Return type:

object (framework-specific model, depending on the file type)

forepaas.ml.get_hyper_parameters(train=None)

Retrieves the hyperparameter names and values set during the hyperparameter-tuning step of the MLM pipeline, returned as key-value pairs

Parameters:: train (dict, optional) – train configuration, defaults to None
Returns:: hyper parameters
Return type:: dict
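Because get_hyper_parameters returns plain key-value pairs, the result can typically be unpacked into an estimator's constructor. The sketch below assumes, hypothetically, that a train configuration stores tuned values as a list of {'name', 'value'} entries under a hyper_parameters key; the real train configuration layout may differ. get_hyper_parameters_sketch and DummyEstimator are illustrative stand-ins, not forepaas.ml code.

```python
from typing import Any, Dict


def get_hyper_parameters_sketch(train: dict) -> Dict[str, Any]:
    """Flatten hypothetical [{'name': ..., 'value': ...}] entries into a dict."""
    entries = train.get("hyper_parameters", [])
    return {entry["name"]: entry["value"] for entry in entries}


class DummyEstimator:
    """Stand-in for a real estimator that accepts hyperparameters as keyword arguments."""

    def __init__(self, max_depth: int = 3, learning_rate: float = 0.1) -> None:
        self.max_depth = max_depth
        self.learning_rate = learning_rate


train_conf = {  # hypothetical train configuration
    "hyper_parameters": [
        {"name": "max_depth", "value": 6},
        {"name": "learning_rate", "value": 0.05},
    ]
}
params = get_hyper_parameters_sketch(train_conf)
print(params)  # {'max_depth': 6, 'learning_rate': 0.05}
model = DummyEstimator(**params)
```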

forepaas.ml.get_model(model)

get_model retrieves a model configuration from the MLM

Parameters:: model (str) – model id
Returns:: request status response
Return type:: dict

forepaas.ml.get_pipeline(pipeline_id, params={})

get_pipeline returns the configuration (ML_CONFIG) of a specific ML pipeline.

Parameters:

pipeline_id (str) – ID of the pipeline to retrieve
params (dict, optional) – additional arguments

Returns:

pipeline configuration

Return type:

dict

forepaas.ml.get_pipeline_dataset(pipeline_id, params={})

get_pipeline_dataset returns the dataset section of an ML pipeline configuration

Parameters:

pipeline_id (str) – ID of the pipeline whose dataset section to retrieve
params (dict, optional) – additional arguments

Returns:

dataset configuration

Return type:

dict

forepaas.ml.get_pipeline_train(pipeline_id, params={})

get_pipeline_train returns the training section of an ML pipeline configuration

Parameters:

pipeline_id (str) – ID of the pipeline whose training section to retrieve
params (dict, optional) – additional arguments

Returns:

training configuration

Return type:

dict

forepaas.ml.get_testing_dataset(pipeline_id=None)

get_testing_dataset returns the test dataset for a pipeline. If the pipeline uses unstructured data, the result is a DataFrame of file paths.

Parameters:: pipeline_id (str) – test dataset pipeline id, defaults to None
Returns:: test dataset
Return type:: pandas.DataFrame

forepaas.ml.get_train_dataset(pipeline_id=None)

get_train_dataset returns the train dataset for a pipeline. If the pipeline uses unstructured data, the result is a DataFrame of file paths.

Parameters:: pipeline_id (str) – train dataset pipeline id, defaults to None
Returns:: train dataset
Return type:: pandas.DataFrame

forepaas.ml.get_train_scoring_function(train=None)

Retrieves the scoring function used by a given train configuration

Parameters:: train (dict, optional) – train configuration, defaults to None
Returns:: scoring function
Return type:: function

forepaas.ml.list_model(filter=None)

list_model lists all models in the MLM

Parameters:: filter (dict, optional) – filter to apply to the models
Returns:: request status response
Return type:: dict

forepaas.ml.list_pipelines(params={})

list_pipelines returns the ML configuration of every pipeline in the MLM

Parameters:: params (dict, optional) – additional arguments
Returns:: pipeline configurations
Return type:: List[dict]

forepaas.ml.predict(data, model_id=None, consumer_id=None, framework='sklearn', input_type='json', uri=None, return_type='dict', headers=None, timeout=360)

Sends data to a deployed ML model, identified by its consumer ID, and returns the values the model predicts from those features

Parameters:

data (pandas.DataFrame or dict) – data to pass to the model
model_id (str) – model ID
consumer_id (str) – consumer ID
framework (str) – ML framework used by the model; supported: sklearn (default), keras, pytorch
input_type (str) – data input type, defaults to ‘json’; supported: ‘json’, ‘file’
uri (str) – URI of the ML service
return_type (str) – structure of the returned data; accepted values are ‘json’ or ‘dataframe’
headers (dict) – additional headers to include in the request
timeout (int) – time in seconds before the request times out

Returns:

predictions

Return type:

Union[List[dict], pandas.DataFrame]

forepaas.ml.random_split(dataset, params={})

random_split randomly splits a dataset into a train dataset and a test dataset

Parameters:

dataset (lists, numpy arrays, scipy-sparse matrices or pandas dataframes) – dataset to split
params (dict, optional) – additional parameters for splitting, defaults to {}

Returns:

train and test datasets

Return type:

Tuple[dataset, dataset]
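The accepted input types match those of scikit-learn's train_test_split, so random_split may forward params such as test_size and random_state to it, though this is an assumption. The standard-library sketch below shows the same technique for plain sequences: shuffle the indices with a seeded generator, take the first ceil(test_size × n) as the test set and the rest as the train set. random_split_sketch and its params keys are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split_sketch(
    dataset: Sequence[T], params: Optional[dict] = None
) -> Tuple[List[T], List[T]]:
    """Randomly split a sequence into (train, test).

    Hypothetical params: 'test_size' (fraction, default 0.25) and
    'random_state' (seed, default None), named after scikit-learn's options.
    """
    params = params or {}
    test_size = params.get("test_size", 0.25)
    rng = random.Random(params.get("random_state"))

    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    n_test = math.ceil(test_size * len(dataset))  # round the test set size up

    test = [dataset[i] for i in indices[:n_test]]
    train = [dataset[i] for i in indices[n_test:]]
    return train, test


train, test = random_split_sketch(list(range(8)), {"test_size": 0.25, "random_state": 42})
print(len(train), len(test))  # 6 2
```

Passing a fixed random_state makes the split reproducible across runs, which matters when comparing models trained on the same data.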

forepaas.ml.save_model(model, conf, execution_id=None)

save_model saves a model to the data store

Parameters:

model (scikit-learn model, or a model from another framework that can be saved as .pkl) – model to be saved
conf (dict) – configuration of the pipeline
execution_id (str, optional) – execution ID

Returns:

file information

Return type:

dict

forepaas.ml.save_model_file(model, model_path)

save_model_file saves the model to the data store as a persistent file whose format depends on the framework. Supported formats are pkl, h5, and pth.

Parameters:

model (persistent file) – model file
model_path (str) – full file path to save the model to
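Both save_model_file and get_estimator pick a file format from the extension. The sketch below illustrates that dispatch for .pkl files only, using the standard pickle module and a local path instead of the data store; save_model_file_sketch and load_model_file_sketch are hypothetical, and real .h5 and .pth handling would require Keras and PyTorch, so those branches simply raise here.

```python
import os
import pickle
import tempfile
from typing import Any

SUPPORTED = (".pkl", ".h5", ".pth")


def _extension(path: str) -> str:
    """Return the lower-cased extension, rejecting unsupported file types."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED:
        raise ValueError(f"unsupported model file type: {ext!r}")
    return ext


def save_model_file_sketch(model: Any, model_path: str) -> None:
    """Persist a model to a local file, choosing the format from the extension."""
    if _extension(model_path) != ".pkl":
        # .h5 (Keras) and .pth (PyTorch) need their frameworks' own save functions.
        raise NotImplementedError("only .pkl is handled in this sketch")
    with open(model_path, "wb") as f:
        pickle.dump(model, f)


def load_model_file_sketch(model_path: str) -> Any:
    """Load a model previously written by save_model_file_sketch."""
    if _extension(model_path) != ".pkl":
        raise NotImplementedError("only .pkl is handled in this sketch")
    with open(model_path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model.pkl")
    save_model_file_sketch({"coef": [0.5, -1.2]}, path)
    print(load_model_file_sketch(path))  # {'coef': [0.5, -1.2]}
```

Unpickling can execute arbitrary code, so only load .pkl files from sources you trust.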

forepaas.ml.update_model(model, conf)

update_model updates an existing model

Parameters:

model (str) – model id to update
conf (dict) – configuration of the model

Returns:

request status response

Return type:

dict

forepaas.ml.update_pipeline_dataset(pipeline_id, dataset_conf, params={})

update_pipeline_dataset updates the dataset section of an ML pipeline configuration

Parameters:

pipeline_id (str) – ID of the pipeline whose dataset section to update
dataset_conf (dict) – dataset configuration
params (dict, optional) – additional arguments

forepaas.ml.update_pipeline_train(pipeline_id, train_conf, params={})

update_pipeline_train updates the training section of an ML pipeline configuration

Parameters:

pipeline_id (str) – ID of the pipeline whose training section to update
train_conf (dict) – configuration of the training
params (dict, optional) – additional arguments