Web app API

The web application has an API of several methods:

[HEAD, OPTIONS, GET] /health - Web application status. Must answer with code 200.
[GET] /info - Information about the model, application, and its settings.
[POST] /predict - call model prediction.
[GET] /get-predict/{predict_id} - optional point. Obtaining prediction results.

API

/health

This is a simple web application state API method. Allows monitoring to ensure that the application is not frozen and can process requests.

HEAD

status_code: 200

GET, OPTIONS

status_code: 200
body: {"status": 200}

/info

This method returns information about the application and model, which allows the model client to know which version of the model and application it is working with, as well as some additional settings.

GET

status_code: 200

If debug=False is specified in the application parameters (by default False), returns only informational data:

{
  "model_info": {
    "name": "MyFirstMLupModel",
    "version": "1.0.0.0",
    "type": "sklearn",
    "columns": null
  },
  "web_app_info": {
    "version": "1.0.0.0"
  }
}

If debug=True is specified in the application parameters (the default is False_), it returns the entire config for the model and for the web application:

{
  "web_app_config": {
    "host": "0.0.0.0",
    "port": 8009,
    "web_app_version": "1.0.0.0",
    "column_validation": false,
    "custom_column_pydantic_model": null,
    "mode": "mlup.web.architecture.directly_to_predict.DirectlyToPredictArchitecture",
    "max_queue_size": 100,
    "ttl_predicted_data": 60,
    "ttl_client_wait": 30.0,
    "min_batch_len": 10,
    "batch_worker_timeout": 1.0,
    "is_long_predict": false,
    "show_docs": true,
    "debug": true,
    "throttling_max_requests": null,
    "throttling_max_request_len": null,
    "timeout_for_shutdown_daemon": 3.0,
    "item_id_col_name": "mlup_item_id"
    },
  "model_config": {
    "name": "MyFirstMLupModel",
    "version": "1.0.0.0",
    "type": "sklearn",
    "columns": null,
    "predict_method_name": "predict",
    "auto_detect_predict_params": true,
    "storage_type": "mlup.ml.storage.memory.MemoryStorage",
    "binarization_type": "auto",
    "use_thread_loop": true,
    "max_thread_loop_workers": null,
    "data_transformer_for_predict": "mlup.ml.data_transformers.numpy_data_transformer.NumpyDataTransformer",
    "data_transformer_for_predicted": "mlup.ml.data_transformers.numpy_data_transformer.NumpyDataTransformer",
    "dtype_for_predict": null
  }
}

This allows you to debug and check what configuration is currently in use.

/predict

The model's prediction itself. This is where the client sends data and receives the model's prediction. The API method itself may change depending on the configuration.

POST

If the parameter auto_detect_predict_params=False (default is True_), then the method accepts data in 1 parameter data_for_predict.

Request body:

{
  "data_for_predict": [
    // features_for_obj1: List,
    [1, 2, 3, 4, 5, 6],
    // features_for_obj2: List,
    [7, 6, 5, 4, 3, 2],
    // ...
  ]
}

If auto_detect_predict_params=True (default True), then mlup analyzes the model and its predict method. And converts the arguments of the predict method of the model into parameters accepted by this API method. (_See Description of the application life cycle).

For example, for scikit-learn models, the predict method has 1 argument X: Any. And then the API request will look like this:

Request body:

{
  "X": [
    // features_for_obj1: List,
    [1, 2, 3, 4, 5, 6],
    // features_for_obj2: List,
    [7, 6, 5, 4, 3, 2],
    // ...
  ]
}

If a model has several arguments, and they were all parsed, then they all end up in the API and can be passed into the model’s prediction method.

Request body:

{
  "data_for_predict": [
    // features_for_obj1: List,
    [1, 2, 3, 4, 5, 6],
    // features_for_obj2: List,
    [7, 6, 5, 4, 3, 2],
    // ...
  ],
  "check_something": false,
  // ...
}

If no problems occurred, the method returns the prediction results.

status_code: 200

{
  "predict_result": [
    // predict result for obj1
    [1, 2, 3],
    // predict result for obj2
    [4, 2, 1]
    // ...
  ]
}

In addition, the X-Predict-id header is always returned, even if an error occurs.

The architecture may change the response body returned by this method. For example, for the architecture mlup.web.architecture.worker_and_queue.WorkerAndQueueArchitecture and mlup.web.architecture.batching.BatchingSingleProcessArchitecture, if the is_long_predict=True configuration option is enabled, this method will return predict_id. You can use it to pick up the results later. (See Web app architectures).

{
  "predict_id": "7d9bea07-7505-45ec-a700-40417842025b"
}

/get-predict/{predict_id}

This API method is only used in the mlup.web.architecture.worker_and_queue.WorkerAndQueueArchitecture and mlup.web.architecture.batching.BatchingSingleProcessArchitecture architectures. If the model takes a long time to make a prediction, or the client wants to pick up the prediction results later, he can set the is_long_predict=True configuration parameter and go for the results later. (See Web app architectures).

GET

To get results, you need to send the required predict_id parameter to the URL.

If results for such a predict_id are found, they will be returned with status_code=200.

// results for predict_id = "7d9bea07-7505-45ec-a700-40417842025b"
{
  "predict_result": [
    // predict result for obj1
    [1, 2, 3],
    // predict result for obj2
    [4, 2, 1]
    // ...
  ]
}

And if the prediction results were not found, then after ttl_client_wait has elapsed from the start of the request time, the application will return status_code 408.

API Errors

All errors are in errors.

All error APIs have a common response format. Response format:

{
  "detail": [
    {
      "loc": [
        "string"
      ],
      "msg": "string",
      "type": "string"
    }
  ],
  "predict_id": "string"
}

422

Validation error. If such an error is returned, it means that the sent data was not validated and the request was not accepted.

Example answer:

{
  "detail": [
    {
      "loc": [
        "input_data",
        0
      ],
      "msg": "value is not a valid dict",
      "type": "type_error.dict"
    },
    {
      "loc": [
        "input_data",
        1,
        "col1"
      ],
      "msg": "field required",
      "type": "value_error.missing"
    }
  ],
  "predict_id": "15eb8236-02bf-4658-9c04-cc53f37afdc3"
}

429

This error occurs when the request is not throttled.

Example answer:

{
  "detail": [
    {
      "loc": [], 
      "msg": "Max requests in app. Please try again later.", 
      "type": "throttling_error"
    }
  ],
  "predict_id": "15eb8236-02bf-4658-9c04-cc53f37afdc3"
}

500

Error during model prediction or data transformation.

Example answer:

{
  "detail": [
    {
      "loc": [],
      "msg": "[ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Gather node. Name:'Gather_2' Status Message: indices element out of data bounds, idx=123124124124 must be within the inclusive range [-250002,250001]",
      "type": "predict_error"
    }
  ],
  "predict_id": "0c4b7afe-032e-442f-b265-7773d9d78580"
}

Свой API

As shown in the life_cycle.md section, you can completely customize a web application after it has been created.

Including, you can add new or even change existing API methods.