xgboost
- Version: 1.5.0
- Category: lib
- Cluster: Loki
Description
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework and supports classification, regression, and ranking.
Version 1.5.0 offers:
- GPU acceleration with CUDA 10.2
- Native support for NumPy, Pandas, and DMatrix inputs
- Improved JSON-based model format
- Experimental support for categorical data
- Integration with scikit-learn and cross-validation utilities (see the sketch below)
This module uses Python 3.7, CUDA 10.2, and GCC 8 for compatibility with legacy GPU-based workflows on Loki.
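As a minimal sketch of the scikit-learn integration noted above (the dataset, hyperparameter values, and printed label are illustrative, not recommendations):

import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score

# 5-fold cross-validation using the sklearn-compatible wrapper
X, y = load_iris(return_X_y=True)
model = xgb.XGBClassifier(n_estimators=50, max_depth=4, use_label_encoder=False)
scores = cross_val_score(model, X, y, cv=5)
print("mean accuracy:", scores.mean())

Because XGBClassifier follows the scikit-learn estimator API, it also works with utilities such as GridSearchCV and Pipeline.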
Documentation
usage: xgboost.train(params, dtrain, num_boost_round=10, evals=(), ...)
Common Parameters:
- booster: [gbtree, gblinear, dart]
- objective: [reg:squarederror, binary:logistic, multi:softmax, ...]
- eval_metric: [rmse, mae, logloss, error, merror, ...]
- tree_method: [auto, exact, approx, hist, gpu_hist]
- max_depth: Maximum depth of a tree
- eta: Step-size shrinkage (learning rate)
- subsample: Row sampling rate
- colsample_bytree: Column sampling rate per tree
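To illustrate how these parameters combine, a hypothetical parameter dictionary for a 3-class GPU job might look like this (all values are examples, not tuned recommendations):

params = {
    "booster": "gbtree",
    "objective": "multi:softmax",
    "num_class": 3,              # multi:softmax additionally requires num_class
    "eval_metric": "merror",
    "tree_method": "gpu_hist",   # GPU-accelerated histogram algorithm
    "max_depth": 6,
    "eta": 0.1,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
}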
CLI alternative:
$ xgboost train.conf
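A train.conf for the CLI is a plain key = value file. A hedged sketch, modeled on the configuration files shipped with XGBoost's CLI demos (the file names below are placeholders):

# illustrative train.conf; file names are placeholders
booster = gbtree
objective = binary:logistic
eta = 0.1
max_depth = 6
num_round = 50
# training and evaluation sets in LIBSVM format
data = "train.txt"
eval[test] = "test.txt"
model_out = "xgb.model"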
For Python help:
>>> import xgboost as xgb
>>> help(xgb)
Examples/Usage
Load the module:
$ module load xgboost-py37-cuda10.2-gcc8/1.5.0
Python usage:
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load a small binary-classification dataset and split it
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target)

# Wrap the NumPy arrays in XGBoost's DMatrix container
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

params = {
    "objective": "binary:logistic",  # binary classification, probability output
    "tree_method": "gpu_hist",       # GPU-accelerated histogram algorithm
    "eval_metric": "logloss"
}

# Train for 50 rounds, reporting logloss on the held-out set each round
bst = xgb.train(params, dtrain, num_boost_round=50, evals=[(dtest, "eval")])
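Continuing from the training above, the booster can score the held-out set and be saved in the JSON-based model format mentioned in the description (the filename is illustrative):

# Predicted probabilities, since the objective is binary:logistic
preds = bst.predict(dtest)
# Save in the JSON model format; illustrative filename
bst.save_model("model.json")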
Unload the module:
$ module unload xgboost-py37-cuda10.2-gcc8/1.5.0
Installation
Source code is obtained from the official XGBoost repository: https://github.com/dmlc/xgboost