ml #

VSL Machine Learning (vsl.ml)

VSL aims to provide a robust set of tools for scientific computing with an emphasis on performance and ease of use. In the vsl.ml module, some machine learning models are designed as observers of data, meaning they re-train automatically when data changes, while others do not require this functionality.

Key Features

  • Observers of Data: Some machine learning models in VSL act as observers, re-training automatically when data changes.
  • High Performance: Leverages V’s performance optimizations and can integrate with C and Fortran libraries such as OpenBLAS and LAPACK.
  • Versatile Algorithms: Supports a variety of machine learning algorithms and models.

Usage

Loading Data

The Data struct in vsl.ml is designed to hold data in matrix format for machine learning tasks. Here's a brief overview of how to use it:

Creating a Data Object

You can create a Data object using the following methods:

  • Data.new: Creates a new Data object with specified dimensions.
  • Data.from_raw_x: Creates a Data object from raw x values (without y values).
  • Data.from_raw_xy: Creates a Data object from raw x and y values combined in a single matrix.
  • Data.from_raw_xy_sep: Creates a Data object from separate x and y raw values.

Data Methods

The Data struct has several key methods to manage and manipulate data:

  • set(x, y): Sets the x matrix and y vector and notifies observers.
  • set_y(y): Sets the y vector and notifies observers.
  • set_x(x): Sets the x matrix and notifies observers.
  • split(ratio): Splits the data into two parts based on the given ratio.
  • clone(): Returns a deep copy of the Data object without observers.
  • clone_with_same_x(): Returns a deep copy of the Data object but shares the same x reference.
  • add_observer(obs): Adds an observer to the data object.
  • notify_update(): Notifies observers of data changes.
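As a rough sketch of how the constructors and methods above fit together, the snippet below builds a Data object from a combined x/y table and then splits it. It assumes the module is importable as vsl.ml and that the element type is inferred as f64; the helper name build_data and the sample values are made up for illustration.

```v
import vsl.ml

// build_data is a hypothetical helper; the values are made-up samples.
fn build_data() !&ml.Data[f64] {
	// Each row is one sample: two features followed by its y value.
	xy := [
		[1.0, 2.0, 0.0],
		[3.0, 4.0, 1.0],
		[5.0, 6.0, 0.0],
		[7.0, 8.0, 1.0],
	]
	return ml.Data.from_raw_xy(xy)!
}

fn main() {
	data := build_data()!
	println('samples: ${data.nb_samples}, features: ${data.nb_features}')

	// Ask for 75% of the samples in the first part and the rest in the second.
	first, second := data.split(0.75)!
	println('first: ${first.nb_samples}, second: ${second.nb_samples}')
}
```

Because every constructor returns a result type, the calls are followed by `!` to propagate errors.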

Stat Observer

The Stat struct is an observer of Data, providing statistical analysis of the data it observes. It automatically updates its statistics when the underlying data changes.

Observer Models

The following machine learning models in VSL are compatible with the Observer pattern. This means they can observe data changes and automatically update themselves.

K-Means Clustering

K-Means Clustering is used for unsupervised learning to group data points into clusters. As an observer model, it re-trains automatically when the data changes, which is useful for dynamic datasets that require continuous updates.

K-Nearest Neighbors (KNN)

K-Nearest Neighbors (KNN) is used for classification tasks where the target variable is categorical. As an observer model, it re-trains automatically when the data changes, which is beneficial for datasets that are frequently updated.

Logistic Regression

Logistic Regression is used for binary classification tasks. As an observer model, it automatically updates when data changes, recalculating internal statistics and preparing for retraining.

Support Vector Machine (SVM)

Support Vector Machine (SVM) is used for binary classification with support for non-linear decision boundaries through kernel functions. As an observer model, it marks itself for retraining when data changes.

Decision Tree

Decision Tree can handle both classification and regression tasks. As an observer model, it marks itself for retraining when data changes, allowing the tree to be rebuilt with new data.

Random Forest

Random Forest is an ensemble method combining multiple decision trees. As an observer model, it marks itself for retraining when data changes, allowing the entire forest to be rebuilt.

Non-Observer Models

The following machine learning models in VSL do not require the observer pattern and are trained once on a dataset without continuous updates.

Linear Regression

Linear Regression is used for predicting a continuous target variable based on one or more predictor variables. It is typically trained once on a dataset and used to make predictions without requiring continuous updates. Hence, it is not implemented as an observer model.

fn safe_log_1p_exp #

fn safe_log_1p_exp(z f64) f64

safe_log_1p_exp computes log(1+exp(-z)) safely: when exp(-z) >> 1, it returns -z instead of evaluating the expression directly. This is the case when z < 0 and |z| is very large.
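The idea can be sketched in a few lines of V. This is not the library's implementation: the function name log_1p_exp_sketch and the cutoff of -500.0 are assumptions chosen only to illustrate the shortcut.

```v
import math

// log_1p_exp_sketch is a hypothetical stand-in for safe_log_1p_exp.
// The cutoff of -500.0 is an assumption chosen for this sketch.
fn log_1p_exp_sketch(z f64) f64 {
	if z < -500.0 {
		// exp(-z) >> 1, so log(1 + exp(-z)) is approximately -z
		return -z
	}
	return math.log(1.0 + math.exp(-z))
}

fn main() {
	println(log_1p_exp_sketch(0.0)) // log(2)
	println(log_1p_exp_sketch(-1000.0)) // equals 1000; math.exp(1000.0) alone overflows to +inf
}
```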

fn CriterionType.from #

fn CriterionType.from[W](input W) !CriterionType

fn Data.from_raw_x #

fn Data.from_raw_x[T](xraw [][]T) !&Data[T]

Data.from_raw_x returns a new object with data set from raw x values.
Input:
  xraw -- [nb_samples][nb_features] table with x values (NO y values)
Output:
  new object

fn Data.from_raw_xy #

fn Data.from_raw_xy[T](xyraw [][]T) !&Data[T]

Data.from_raw_xy returns a new object with data set from raw x and y values.
Input:
  xyraw -- [nb_samples][nb_features+1] table with x and y raw values, where the last column contains the y values
Output:
  new object

fn Data.from_raw_xy_sep #

fn Data.from_raw_xy_sep[T](xraw [][]T, yraw []T) !&Data[T]

Data.from_raw_xy_sep accepts two parameters: xraw [][]T and yraw []T. It acts like Data.from_raw_xy, but takes the y data from yraw rather than from the last column of the input table.

fn Data.new #

fn Data.new[T](nb_samples int, nb_features int, use_y bool, allocate bool) !&Data[T]

Data.new returns a new object to hold ML data.
Input:
  nb_samples -- number of data samples (rows in x)
  nb_features -- number of features (columns in x)
  use_y -- use y data vector
  allocate -- allocates x (and y); otherwise, x and y must be set using the set() method
Output:
  new object

fn DecisionTree.new #

fn DecisionTree.new(mut data Data[f64], name string) &DecisionTree

DecisionTree.new returns a new DecisionTree object.
Input:
  data -- x,y data
  name -- unique name of this (observer) object
Output:
  new DecisionTree object

fn ElasticNet.new #

fn ElasticNet.new(mut data Data[f64], name string, alpha f64, l1_ratio f64) &ElasticNet

ElasticNet.new creates a new ElasticNet regression model

fn KNN.new #

fn KNN.new(mut data Data[f64], name string) !&KNN

KNN.new accepts a vsl.ml.Data parameter called data, which will be used to predict values with KNN.predict, and a name for the new object. For example:

mut data := Data.from_raw_xy_sep([[0.0, 0.0], [10.0, 10.0]], [0.0, 1.0])!
mut knn := KNN.new(mut data, 'knn')!

If you then predict with knn.predict(k: 1, to_pred: [9.0, 9.0]), it should return 1.0, as [9.0, 9.0] is closest to [10.0, 10.0] (which is class 1.0).

fn KernelType.from #

fn KernelType.from[W](input W) !KernelType

fn Kmeans.new #

fn Kmeans.new(mut data Data[f64], nb_classes int, name string) &Kmeans

Kmeans.new returns a new K-means model

fn Lasso.new #

fn Lasso.new(mut data Data[f64], name string, alpha f64) &Lasso

Lasso.new creates a new Lasso regression model

fn LinReg.new #

fn LinReg.new(mut data Data[f64], name string) &LinReg

LinReg.new returns a new LinReg object.
Input:
  data -- x,y data
  name -- unique name of this (observer) object

fn LogReg.new #

fn LogReg.new(mut data Data[f64], name string) &LogReg

LogReg.new returns a new LogReg object.
Input:
  data -- x,y data
  name -- unique name of this (observer) object
Output:
  new LogReg object

fn ParamsReg.new #

fn ParamsReg.new[T](nb_features int) &ParamsReg[T]

ParamsReg.new returns a new object to hold regression parameters

fn RandomForest.new #

fn RandomForest.new(mut data Data[f64], name string) &RandomForest

RandomForest.new returns a new RandomForest object.
Input:
  data -- x,y data
  name -- unique name of this (observer) object
Output:
  new RandomForest object

fn RegularizationType.from #

fn RegularizationType.from[W](input W) !RegularizationType

fn Ridge.new #

fn Ridge.new(mut data Data[f64], name string, alpha f64) &Ridge

Ridge.new creates a new Ridge regression model

fn SVM.new #

fn SVM.new(mut data Data[f64], name string) &SVM

SVM.new returns a new SVM object.
Input:
  data -- x,y data (y should be -1.0 or 1.0 for SVM)
  name -- unique name of this (observer) object
Output:
  new SVM object

fn Stat.from_data #

fn Stat.from_data[T](mut data Data[T], name string) &Stat[T]

Stat.from_data returns a new Stat object

fn (Data[T]) set #

fn (mut o Data[T]) set(x &la.Matrix[T], y []T) !

set sets the x matrix and the y vector [optional] and notifies observers.
Input:
  x -- x values
  y -- y values [optional]

fn (Data[T]) set_y #

fn (mut o Data[T]) set_y(y []T) !

fn (Data[T]) set_x #

fn (mut o Data[T]) set_x(x &la.Matrix[T]) !

fn (Data[T]) clone #

fn (o &Data[T]) clone() !&Data[T]

clone returns a deep copy of this object removing the observers

fn (Data[T]) clone_with_same_x #

fn (o &Data[T]) clone_with_same_x() !&Data[T]

clone_with_same_x returns a deep copy of this object without the observers, but keeping the same reference to x

fn (Data[T]) add_observer #

fn (mut o Data[T]) add_observer(obs util.Observer)

add_observer adds an object to the list of interested observers

fn (Data[T]) notify_update #

fn (mut o Data[T]) notify_update()

notify_update notifies observers of updates

fn (Data[T]) split #

fn (o &Data[T]) split(ratio f64) !(&Data[T], &Data[T])

split returns two new objects with the data split into two parts.
Input:
  ratio -- ratio of samples to be put in the first part
Output:
  the two new objects

fn (ParamsReg[T]) init #

fn (mut o ParamsReg[T]) init(nb_features int)

init initializes ParamsReg with nb_features (number of features)

fn (ParamsReg[T]) backup #

fn (mut o ParamsReg[T]) backup()

backup creates an internal copy of parameters

fn (ParamsReg[T]) restore #

fn (mut o ParamsReg[T]) restore(skip_notification bool)

restore restores an internal copy of parameters and notifies observers
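A minimal sketch of the backup/restore cycle, for example when trying a trial parameter value and rolling it back. It assumes the module is importable as vsl.ml; the helper name and values are illustrative, and passing false for skip_notification is assumed to leave observer notification enabled.

```v
import vsl.ml

// bias_after_restore is a hypothetical helper that changes the bias and rolls it back.
fn bias_after_restore() f64 {
	mut params := ml.ParamsReg.new[f64](2)
	params.set_params([0.5, -0.25], 1.0) // theta and b
	params.backup() // keep an internal copy of the current parameters
	params.set_bias(42.0) // trial value
	params.restore(false) // roll back to the copy taken by backup()
	return params.get_bias()
}

fn main() {
	println(bias_after_restore())
}
```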

fn (ParamsReg[T]) set_params #

fn (mut o ParamsReg[T]) set_params(theta []T, b T)

set_params sets theta and b and notifies observers

fn (ParamsReg[T]) set_param #

fn (mut o ParamsReg[T]) set_param(i int, value T)

set_param sets either theta or b (use negative indices for b) and notifies observers.
  i -- index of theta or -1 for bias

fn (ParamsReg[T]) get_param #

fn (o &ParamsReg[T]) get_param(i int) T

get_param returns either theta or b (use negative indices for b).
  i -- index of theta or -1 for bias

fn (ParamsReg[T]) set_thetas #

fn (mut o ParamsReg[T]) set_thetas(theta []T)

set_thetas sets the whole vector theta and notifies observers

fn (ParamsReg[T]) get_thetas #

fn (o &ParamsReg[T]) get_thetas() []T

get_thetas gets a copy of theta

fn (ParamsReg[T]) access_thetas #

fn (o &ParamsReg[T]) access_thetas() []T

access_thetas returns access (slice) to theta

fn (ParamsReg[T]) access_bias #

fn (o &ParamsReg[T]) access_bias() &T

access_bias returns access (pointer) to b

fn (ParamsReg[T]) set_theta #

fn (mut o ParamsReg[T]) set_theta(i int, thetai T)

set_theta sets one component of theta and notifies observers

fn (ParamsReg[T]) get_theta #

fn (o &ParamsReg[T]) get_theta(i int) T

get_theta returns the value of theta[i]

fn (ParamsReg[T]) set_bias #

fn (mut o ParamsReg[T]) set_bias(b T)

set_bias sets b and notifies observers

fn (ParamsReg[T]) get_bias #

fn (o &ParamsReg[T]) get_bias() T

get_bias gets a copy of b

fn (ParamsReg[T]) set_lambda #

fn (mut o ParamsReg[T]) set_lambda(lambda T)

set_lambda sets lambda and notifies observers

fn (ParamsReg[T]) get_lambda #

fn (o &ParamsReg[T]) get_lambda() T

get_lambda gets a copy of lambda

fn (ParamsReg[T]) set_degree #

fn (mut o ParamsReg[T]) set_degree(p int)

set_degree sets p and notifies observers

fn (ParamsReg[T]) get_degree #

fn (o &ParamsReg[T]) get_degree() int

get_degree gets a copy of p

fn (ParamsReg[T]) add_observer #

fn (mut o ParamsReg[T]) add_observer(obs util.Observer)

add_observer adds an object to the list of interested observers

fn (ParamsReg[T]) notify_update #

fn (mut o ParamsReg[T]) notify_update()

notify_update notifies observers of updates

fn (Stat[T]) name #

fn (o &Stat[T]) name() string

name returns the name of this stat object (thus defining the Observer interface)

fn (Stat[T]) update #

fn (mut o Stat[T]) update()

update computes statistics for the given data (an Observer of Data)

fn (Stat[T]) sum_vars #

fn (mut o Stat[T]) sum_vars() ([]T, T)

sum_vars computes the sums along the columns of X and y.
Output:
  t -- scalar t = oᵀy, sum of the y vector: t = Σ_i^m o_i y_i
  s -- vector s = Xᵀo, sum of columns of the X matrix: s_j = Σ_i^m o_i X_ij [n_features]

fn (Stat[T]) copy_into #

fn (o &Stat[T]) copy_into(mut p Stat[T])

copy_into copies stat into p

fn (Stat[T]) str #

fn (o &Stat[T]) str() string

str is a custom str function for observers to avoid printing data

enum CriterionType #

enum CriterionType {
	gini    // Gini impurity (for classification)
	entropy // Information gain / entropy (for classification)
	mse     // Mean Squared Error (for regression)
}

CriterionType represents the splitting criterion

enum KernelType #

enum KernelType {
	linear
	polynomial
	rbf
}

KernelType represents the type of kernel function

enum RegularizationType #

enum RegularizationType {
	none       // No regularization
	l1         // Lasso (L1)
	l2         // Ridge (L2)
	elasticnet // ElasticNet (L1 + L2)
}

RegularizationType specifies the type of regularization

struct Data #

@[heap]
struct Data[T] {
pub mut:
	observable  util.Observable = util.Observable{}
	nb_samples  int // number of data points (samples). number of rows in x and y
	nb_features int // number of features. number of columns in x
	x           &la.Matrix[T] = unsafe { nil } // [nb_samples][nb_features] x values
	y           []T // [nb_samples] y values [optional]
}

struct DecisionTree #

@[heap]
struct DecisionTree {
mut:
	name              string // name of this "observer"
	data              &Data[f64]    = unsafe { nil } // x-y data
	stat              &Stat[f64]    = unsafe { nil } // statistics
	root              &TreeNode     = unsafe { nil } // Root of the tree
	max_depth         int           = -1             // Maximum depth (-1 for unlimited)
	min_samples_split int           = 2              // Minimum samples to split
	min_samples_leaf  int           = 1              // Minimum samples in leaf
	criterion         CriterionType = .gini          // Splitting criterion
	trained           bool
	is_regression     bool // Whether this is a regression task
}

DecisionTree implements a decision tree classifier/regressor (Observer of Data)

fn (DecisionTree) name #

fn (o &DecisionTree) name() string

name returns the name of this DecisionTree object (thus defining the Observer interface)

fn (DecisionTree) update #

fn (mut o DecisionTree) update()

update performs updates after data has been changed (as an Observer)

fn (DecisionTree) set_max_depth #

fn (mut o DecisionTree) set_max_depth(depth int)

set_max_depth sets the maximum depth of the tree

fn (DecisionTree) set_min_samples_split #

fn (mut o DecisionTree) set_min_samples_split(min_samples int)

set_min_samples_split sets the minimum samples required to split

fn (DecisionTree) set_min_samples_leaf #

fn (mut o DecisionTree) set_min_samples_leaf(min_samples int)

set_min_samples_leaf sets the minimum samples in a leaf node

fn (DecisionTree) set_criterion #

fn (mut o DecisionTree) set_criterion(criterion CriterionType)

set_criterion sets the splitting criterion

fn (DecisionTree) train #

fn (mut o DecisionTree) train()

train builds the decision tree

fn (DecisionTree) predict #

fn (o &DecisionTree) predict(x []f64) f64

predict returns the predicted value for a single sample

fn (DecisionTree) predict_batch #

fn (o &DecisionTree) predict_batch(x [][]f64) []f64

predict_batch returns predictions for multiple samples
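A possible end-to-end use of the DecisionTree methods listed above, assuming the module is importable as vsl.ml. The data set, the helper name tree_predict, and the chosen settings are illustrative only.

```v
import vsl.ml

// tree_predict is a hypothetical helper; the four samples are made up.
fn tree_predict() !f64 {
	// Two features per row, class label (0.0 or 1.0) in the last column.
	mut data := ml.Data.from_raw_xy([
		[0.1, 0.2, 0.0],
		[0.2, 0.1, 0.0],
		[0.9, 0.8, 1.0],
		[0.8, 0.9, 1.0],
	])!
	mut tree := ml.DecisionTree.new(mut data, 'tree')
	tree.set_max_depth(3)
	tree.set_criterion(.entropy)
	tree.train()
	return tree.predict([0.85, 0.9])
}

fn main() {
	println(tree_predict()!)
}
```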

fn (DecisionTree) str #

fn (o &DecisionTree) str() string

str is a custom str function for observers to avoid printing data

fn (DecisionTree) get_plotter #

fn (o &DecisionTree) get_plotter() &plot.Plot

get_plotter returns a plot.Plot struct for plotting (2D only)

struct ElasticNet #

@[heap]
struct ElasticNet {
mut:
	name   string
	data   &Data[f64] = unsafe { nil }
	fitted bool
pub mut:
	coef_      []f64
	intercept_ f64
	alpha      f64        = 1.0 // regularization strength
	l1_ratio   f64        = 0.5 // ratio of L1 vs L2 (1.0 = pure Lasso, 0.0 = pure Ridge)
	max_iter   int        = 1000
	tol        f64        = 1e-4
	stat       &Stat[f64] = unsafe { nil }
}

ElasticNet implements linear regression with combined L1 and L2 regularization

fn (ElasticNet) name #

fn (o &ElasticNet) name() string

name returns the model name

fn (ElasticNet) predict #

fn (o &ElasticNet) predict(x []f64) f64

predict returns the prediction for a feature vector

fn (ElasticNet) train #

fn (mut o ElasticNet) train()

train fits the ElasticNet model using coordinate descent

fn (ElasticNet) get_plotter #

fn (o &ElasticNet) get_plotter() &plot.Plot

get_plotter returns a plot for the model

struct KNN #

@[heap]
struct KNN {
mut:
	name    string // name of this "observer"
	data    &Data[f64] = unsafe { nil }
	weights map[f64]f64 // weights[class] = weight
pub mut:
	neighbors []Neighbor
	trained   bool
}

KNN is the struct defining a K-Nearest Neighbors classifier.

fn (KNN) name #

fn (o &KNN) name() string

name returns the name of this KNN object (thus defining the Observer interface)

fn (KNN) set_weights #

fn (mut knn KNN) set_weights(weights map[f64]f64) !

set_weights will set the weights for the KNN. They default to 1.0 for every class when this function is not called.

fn (KNN) update #

fn (mut knn KNN) update()

update performs updates after data has been changed (as an Observer)

fn (KNN) train #

fn (mut knn KNN) train()

train computes the neighbors and weights during training

fn (KNN) predict #

fn (mut knn KNN) predict(config PredictConfig) !f64

predict will find the k points nearest to the specified to_pred. If the value of k results in a draw - that is, a tie when determining the most frequent class in those k nearest neighbors (example: class 1 has 10 occurrences, class 2 has 5 and class 3 has 10) -, k will be decreased until there are no more ties. The worst case scenario is k ending up as 1. Also, it makes sure that if we do have a tie when k = 1, we select the first closest neighbor.
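The tie-handling rule can be illustrated with a small self-contained sketch. It is not the library's code: the helper majority_with_tie_fallback is hypothetical, it ignores class weights, and it assumes the labels are already sorted by increasing distance to the query point.

```v
// majority_with_tie_fallback is a hypothetical helper, not part of vsl.ml.
// sorted_labels must be non-empty and ordered by increasing distance to the query.
fn majority_with_tie_fallback(sorted_labels []f64, k int) f64 {
	mut kk := if k < sorted_labels.len { k } else { sorted_labels.len }
	for kk > 1 {
		// Count the votes among the kk nearest neighbors.
		mut counts := map[f64]int{}
		for label in sorted_labels[..kk] {
			counts[label] = counts[label] + 1
		}
		mut best := sorted_labels[0]
		mut best_count := 0
		mut tie := false
		for label, count in counts {
			if count > best_count {
				best = label
				best_count = count
				tie = false
			} else if count == best_count {
				tie = true
			}
		}
		if !tie {
			return best
		}
		kk-- // draw: retry with one neighbor fewer
	}
	// k == 1: the closest neighbor decides.
	return sorted_labels[0]
}

fn main() {
	// Classes 1.0 and 2.0 tie at k = 4, so k drops to 3 and class 2.0 wins.
	println(majority_with_tie_fallback([1.0, 2.0, 2.0, 1.0], 4))
}
```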

fn (KNN) str #

fn (o &KNN) str() string

str is a custom str function for observers to avoid printing data

fn (KNN) get_plotter #

fn (o &KNN) get_plotter() &plot.Plot

get_plotter returns a plot.Plot struct with the data needed to plot the KNN model.

struct Kmeans #

@[heap]
struct Kmeans {
mut:
	name       string // name of this "observer"
	data       &Data[f64] = unsafe { nil } // x data
	stat       &Stat[f64] = unsafe { nil } // statistics about x (data)
	nb_classes int // expected number of classes
	bins       &gm.Bins = unsafe { nil } // "bins" to speed up searching for data points given their coordinates (2D or 3D only at the moment)
	nb_iter    int // number of iterations
pub mut:
	classes    []int   // [nb_samples] indices of classes of each sample
	centroids  [][]f64 // [nb_classes][nb_features] coordinates of centroids
	nb_members []int   // [nb_classes] number of members in each class
}

Kmeans implements the K-means model (Observer of Data)

fn (Kmeans) name #

fn (o &Kmeans) name() string

name returns the name of this Kmeans object (thus defining the Observer interface)

fn (Kmeans) update #

fn (mut o Kmeans) update()

update performs updates after data has been changed (as an Observer)

fn (Kmeans) nb_classes #

fn (o &Kmeans) nb_classes() int

nb_classes returns the number of classes

fn (Kmeans) set_centroids #

fn (mut o Kmeans) set_centroids(xc [][]f64)

set_centroids sets centroids; e.g. trial centroids xc -- [nb_class][nb_features]

fn (Kmeans) find_closest_centroids #

fn (mut o Kmeans) find_closest_centroids()

find_closest_centroids finds closest centroids to each sample

fn (Kmeans) compute_centroids #

fn (mut o Kmeans) compute_centroids()

compute_centroids updates centroids based on new classes information (from find_closest_centroids)

fn (Kmeans) train #

fn (mut o Kmeans) train(config TrainConfig)

train trains the model
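A sketch of a typical K-means run using the methods above, assuming the module is importable as vsl.ml. The points, the trial centroids, and the helper name cluster are made up; the configuration is passed as an explicit TrainConfig value.

```v
import vsl.ml

// cluster is a hypothetical helper; points and trial centroids are made up.
fn cluster() ![]int {
	// x values only: K-means is unsupervised, so no y column is needed.
	mut data := ml.Data.from_raw_x([
		[0.1, 0.2],
		[0.2, 0.1],
		[0.9, 0.8],
		[0.8, 0.9],
	])!
	mut model := ml.Kmeans.new(mut data, 2, 'kmeans')
	model.set_centroids([[0.0, 0.0], [1.0, 1.0]]) // trial centroids
	model.find_closest_centroids()
	model.train(ml.TrainConfig{
		epochs: 6
	})
	return model.classes
}

fn main() {
	// One class index per sample.
	println(cluster()!)
}
```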

fn (Kmeans) str #

fn (o &Kmeans) str() string

str is a custom str function for observers to avoid printing data

fn (Kmeans) get_plotter #

fn (o &Kmeans) get_plotter() &plot.Plot

get_plotter returns a plot.Plot struct for plotting

struct Lasso #

@[heap]
struct Lasso {
mut:
	name   string // name of this model
	data   &Data[f64] = unsafe { nil } // x-y data
	fitted bool
pub mut:
	coef_      []f64 // learned coefficients (theta)
	intercept_ f64   // learned intercept (bias)
	alpha      f64        = 1.0            // regularization strength
	max_iter   int        = 1000           // maximum iterations
	tol        f64        = 1e-4           // convergence tolerance
	stat       &Stat[f64] = unsafe { nil } // statistics
}

Lasso implements a linear regression model with L1 regularization. It uses the coordinate descent algorithm for optimization.

fn (Lasso) name #

fn (o &Lasso) name() string

name returns the model name

fn (Lasso) predict #

fn (o &Lasso) predict(x []f64) f64

predict returns the prediction for a feature vector

fn (Lasso) train #

fn (mut o Lasso) train()

train fits the Lasso model using coordinate descent

fn (Lasso) get_plotter #

fn (o &Lasso) get_plotter() &plot.Plot

get_plotter returns a plot for the model

struct LinReg #

@[heap]
struct LinReg {
mut:
	// main
	name string // name of this "observer"
	data &Data[f64] = unsafe { nil } // x-y data
	// workspace
	e []f64 // vector e = b⋅o + x⋅theta - y [nb_samples]
pub mut:
	stat   &Stat[f64]      = unsafe { nil } // statistics
	params &ParamsReg[f64] = unsafe { nil }
}

LinReg implements a linear regression model

fn (LinReg) name #

fn (o &LinReg) name() string

name returns the name of this LinReg object (thus defining the Observer interface)

fn (LinReg) predict #

fn (o &LinReg) predict(x []f64) f64

predict returns the model evaluation @ {x;theta,b}.
Input:
  x -- vector of features
Output:
  y -- model prediction y(x)

fn (LinReg) cost #

fn (mut o LinReg) cost() f64

cost returns the cost c(x;theta,b).
Input:
  data -- x,y data
  params -- theta and b
  x -- vector of features
Output:
  c -- total cost (model error)

fn (LinReg) gradients #

fn (mut o LinReg) gradients() ([]f64, f64)

gradients returns ∂C/∂theta and ∂C/∂b.
Output:
  dcdtheta -- ∂C/∂theta
  dcdb -- ∂C/∂b

fn (LinReg) train #

fn (mut o LinReg) train()

train finds theta and b using the closed-form solution.
Input:
  data -- x,y data
Output:
  params -- theta and b
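As an illustration of LinReg.new, train and predict used together, the sketch below fits samples that lie exactly on y = 2x + 1. It assumes the module is importable as vsl.ml; the helper name fit_line and the data are made up.

```v
import vsl.ml

// fit_line is a hypothetical helper; the samples lie exactly on y = 2x + 1.
fn fit_line() !f64 {
	// One feature per row, y value in the last column.
	mut data := ml.Data.from_raw_xy([
		[0.0, 1.0],
		[1.0, 3.0],
		[2.0, 5.0],
		[3.0, 7.0],
	])!
	mut reg := ml.LinReg.new(mut data, 'linreg')
	reg.train() // closed-form solution for theta and b
	return reg.predict([4.0])
}

fn main() {
	// For data on y = 2x + 1, the prediction at x = 4 should be close to 9.
	println(fit_line()!)
}
```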

fn (LinReg) calce #

fn (mut o LinReg) calce()

calce calculates the e vector (saved into o.e).
Output:
  e = b⋅o + x⋅theta - y

fn (LinReg) str #

fn (o &LinReg) str() string

str is a custom str function for observers to avoid printing data

fn (LinReg) get_plotter #

fn (o &LinReg) get_plotter() &plot.Plot

get_plotter returns a plot.Plot struct for plotting the data and the linear regression model

struct LogReg #

@[heap]
struct LogReg {
mut:
	// main
	name string // name of this "observer"
	data &Data[f64] = unsafe { nil } // x-y data
	// workspace
	ybar []f64 // bar{y}: yb[i] = (1 - y[i]) / m
	l    []f64 // vector l = b⋅o + x⋅θ [nb_samples]
	hmy  []f64 // vector e = h(l) - y [nb_samples]
pub mut:
	params &ParamsReg[f64] = unsafe { nil } // parameters: θ, b, λ
	stat   &Stat[f64]      = unsafe { nil } // statistics
}

LogReg implements a logistic regression model (Observer of Data)

fn (LogReg) name #

fn (o &LogReg) name() string

name returns the name of this LogReg object (thus defining the Observer interface)

fn (LogReg) update #

fn (mut o LogReg) update()

update performs updates after data has been changed (as an Observer)

fn (LogReg) predict #

fn (o &LogReg) predict(x []f64) f64

predict returns the model evaluation @ {x;θ,b}.
Input:
  x -- vector of features
Output:
  y -- model prediction y(x) (probability between 0 and 1)

fn (LogReg) cost #

fn (mut o LogReg) cost() f64

cost returns the cost c(x;θ,b).
Output:
  c -- total cost (model error)

fn (LogReg) allocate_gradient #

fn (o &LogReg) allocate_gradient() []f64

allocate_gradient allocates an object to compute gradients

fn (LogReg) gradients #

fn (mut o LogReg) gradients() ([]f64, f64)

gradients returns ∂C/∂θ and ∂C/∂b.
Output:
  dcdtheta -- ∂C/∂θ
  dcdb -- ∂C/∂b

fn (LogReg) allocate_hessian #

fn (o &LogReg) allocate_hessian() ([]f64, []f64, &la.Matrix[f64], &la.Matrix[f64])

allocate_hessian allocates objects to compute the hessian

fn (LogReg) hessian #

fn (mut o LogReg) hessian(mut d []f64, mut v []f64, mut dm la.Matrix[f64], mut hm la.Matrix[f64]) f64

hessian computes the hessian matrix and other partial derivatives

Input:
  d -- [nSamples] d[i] = g(l[i]) * [ 1 - g(l[i]) ] auxiliary vector
  v -- [nFeatures] v = ∂²C/∂θ∂b second order partial derivative
  dm -- [nSamples][nFeatures] dm[i][j] = d[i]*x[i][j] auxiliary matrix
  hm -- [nFeatures][nFeatures] hm = ∂²C/∂θ² hessian matrix

Output: w -- ∂²C/∂b²

fn (LogReg) train #

fn (mut o LogReg) train(config LogRegTrainConfig)

train finds θ and b using gradient descent or Newton's method

fn (LogReg) calcl #

fn (mut o LogReg) calcl()

calcl calculates the l vector (saved into o.l) (linear model).
Output:
  l = b⋅o + x⋅θ

fn (LogReg) calcsumq #

fn (o &LogReg) calcsumq() f64

calcsumq calculates Σq[i] where q[i] = log(1 + exp(-l[i])).
Input:
  l -- precomputed o.l
Output:
  sq -- sum(q)

fn (LogReg) calchmy #

fn (mut o LogReg) calchmy()

calchmy calculates the h(l) - y vector (saved into o.hmy).
Input:
  l -- precomputed o.l
Output:
  hmy -- computes hmy = h(l) - y

fn (LogReg) str #

fn (o &LogReg) str() string

str is a custom str function for observers to avoid printing data

fn (LogReg) get_plotter #

fn (o &LogReg) get_plotter() &plot.Plot

get_plotter returns a plot.Plot struct for plotting the data and the logistic regression model

struct LogRegTrainConfig #

struct LogRegTrainConfig {
pub:
	epochs        int = 1000 // maximum number of iterations
	learning_rate f64 = 0.01 // learning rate for gradient descent
	tolerance     f64 = 1e-6 // convergence tolerance
	use_newton    bool // use Newton's method instead of gradient descent
}

LogRegTrainConfig holds training configuration for logistic regression
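A sketch of training a logistic regression model with a custom configuration, assuming the module is importable as vsl.ml. The data, the helper name fit_logreg, and the chosen epochs and learning rate are illustrative; fields left out of the struct literal keep the defaults shown above.

```v
import vsl.ml

// fit_logreg is a hypothetical helper; the samples are made up.
fn fit_logreg() !f64 {
	// One feature per row, binary label (0.0 or 1.0) in the last column.
	mut data := ml.Data.from_raw_xy([
		[0.0, 0.0],
		[1.0, 0.0],
		[3.0, 1.0],
		[4.0, 1.0],
	])!
	mut reg := ml.LogReg.new(mut data, 'logreg')
	reg.train(ml.LogRegTrainConfig{
		epochs:        2000
		learning_rate: 0.1
	})
	// predict returns a probability between 0 and 1.
	return reg.predict([4.0])
}

fn main() {
	println(fit_logreg()!)
}
```

Setting use_newton: true in the same struct literal would select Newton's method instead of gradient descent.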

struct ParamsReg #

@[heap]
struct ParamsReg[T] {
pub mut:
	observable util.Observable
	// main
	theta  []T // theta parameter [nb_features]
	bias   T   // bias parameter
	lambda T   // regularization parameter
	degree int // degree of polynomial
	// backup
	bkp_theta  []T // copy of theta
	bkp_bias   T   // copy of b
	bkp_lambda T   // copy of lambda
	bkp_degree int // copy of degree
}

struct PredictConfig #

struct PredictConfig {
pub:
	max_iter int
	k        int
	to_pred  []f64
}

PredictConfig holds the data needed for KNN.predict

struct RandomForest #

@[heap]
struct RandomForest {
mut:
	name         string // name of this "observer"
	data         &Data[f64] = unsafe { nil } // x-y data
	max_features int        = -1             // features per split (-1 for sqrt(n_features))
pub mut:
	n_estimators  int  = 100  // number of trees
	bootstrap     bool = true // bootstrap sampling
	trained       bool
	is_regression bool // Whether this is a regression task
	stat          &Stat[f64] = unsafe { nil } // statistics
	trees         []&DecisionTree // ensemble of decision trees
}

RandomForest implements a Random Forest classifier/regressor (Observer of Data)

fn (RandomForest) name #

fn (o &RandomForest) name() string

name returns the name of this RandomForest object (thus defining the Observer interface)

fn (RandomForest) update #

fn (mut o RandomForest) update()

update performs updates after data has been changed (as an Observer)

fn (RandomForest) set_n_estimators #

fn (mut o RandomForest) set_n_estimators(n int)

set_n_estimators sets the number of trees in the forest

fn (RandomForest) set_max_features #

fn (mut o RandomForest) set_max_features(n int)

set_max_features sets the number of features to consider for each split

fn (RandomForest) set_bootstrap #

fn (mut o RandomForest) set_bootstrap(bootstrap bool)

set_bootstrap sets whether to use bootstrap sampling

fn (RandomForest) train #

fn (mut o RandomForest) train() !

train trains the random forest

fn (RandomForest) predict #

fn (o &RandomForest) predict(x []f64) f64

predict returns the predicted value for a single sample
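A possible use of the RandomForest setters together with train and predict, assuming the module is importable as vsl.ml. The data set, the helper name forest_predict, and the number of trees are illustrative only. Note that train returns a result, so the call is followed by `!`.

```v
import vsl.ml

// forest_predict is a hypothetical helper; the four samples are made up.
fn forest_predict() !f64 {
	// Two features per row, class label (0.0 or 1.0) in the last column.
	mut data := ml.Data.from_raw_xy([
		[0.1, 0.2, 0.0],
		[0.2, 0.1, 0.0],
		[0.9, 0.8, 1.0],
		[0.8, 0.9, 1.0],
	])!
	mut forest := ml.RandomForest.new(mut data, 'forest')
	forest.set_n_estimators(10) // fewer trees than the default of 100
	forest.set_bootstrap(true)
	forest.train()!
	return forest.predict([0.85, 0.9])
}

fn main() {
	println(forest_predict()!)
}
```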

fn (RandomForest) predict_proba #

fn (o &RandomForest) predict_proba(x []f64) f64

predict_proba returns probability estimates for classification

fn (RandomForest) get_feature_importance #

fn (o &RandomForest) get_feature_importance() []f64

get_feature_importance returns feature importance scores

fn (RandomForest) str #

fn (o &RandomForest) str() string

str is a custom str function for observers to avoid printing data

fn (RandomForest) get_plotter #

fn (o &RandomForest) get_plotter() &plot.Plot

get_plotter returns a plot.Plot struct for plotting

struct Ridge #

@[heap]
struct Ridge {
mut:
	name   string
	data   &Data[f64] = unsafe { nil }
	fitted bool
pub mut:
	coef_      []f64
	intercept_ f64
	alpha      f64        = 1.0
	stat       &Stat[f64] = unsafe { nil }
	linreg     &LinReg    = unsafe { nil }
}

Ridge implements linear regression with L2 regularization only. This is a convenience wrapper using the existing LinReg with lambda.

fn (Ridge) name #

fn (o &Ridge) name() string

name returns the model name

fn (Ridge) predict #

fn (o &Ridge) predict(x []f64) f64

predict returns the prediction for a feature vector

fn (Ridge) train #

fn (mut o Ridge) train()

train fits the Ridge model

fn (Ridge) get_plotter #

fn (o &Ridge) get_plotter() &plot.Plot

get_plotter returns a plot for the model

struct SVM #

@[heap]
struct SVM {
mut:
	name string // name of this "observer"
	data &Data[f64] = unsafe { nil } // x-y data
	// SVM-specific
	support_vector_labels []f64 // labels of support vectors
	alpha                 []f64 // Lagrange multipliers [nb_samples]
	bias                  f64   // bias term
	kernel_type           KernelType = .linear
	degree                int        = 3 // polynomial kernel degree
pub mut:
	trained         bool
	stat            &Stat[f64] = unsafe { nil } // statistics
	support_vectors [][]f64 // support vectors
	c               f64 = 1.0 // regularization parameter
	gamma           f64 = 1.0 // RBF kernel parameter
}

SVM implements a Support Vector Machine classifier (Observer of Data)

fn (SVM) name #

fn (o &SVM) name() string

name returns the name of this SVM object (thus defining the Observer interface)

fn (SVM) update #

fn (mut o SVM) update()

update performs updates after data has been changed (as an Observer)

fn (SVM) set_kernel #

fn (mut o SVM) set_kernel(kernel_type KernelType, gamma f64, degree int)

set_kernel sets the kernel type and parameters

fn (SVM) set_c #

fn (mut o SVM) set_c(c f64)

set_c sets the regularization parameter C

fn (SVM) train #

fn (mut o SVM) train(max_iter int, tolerance f64)

train trains the SVM using simplified SMO (Sequential Minimal Optimization)

fn (SVM) predict #

fn (o &SVM) predict(x []f64) f64

predict returns the predicted class (0.0 or 1.0, converted from -1.0/1.0)
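A sketch of configuring, training and querying an SVM, assuming the module is importable as vsl.ml. The samples, the helper name classify, and the hyperparameter values are made up; the training labels use -1.0 and 1.0 as SVM.new requires.

```v
import vsl.ml

// classify is a hypothetical helper; the four samples are made up.
fn classify() !f64 {
	x := [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 0.8]]
	y := [-1.0, -1.0, 1.0, 1.0] // labels must be -1.0 or 1.0 for training
	mut data := ml.Data.from_raw_xy_sep(x, y)!
	mut svm := ml.SVM.new(mut data, 'svm')
	svm.set_kernel(.rbf, 0.5, 3) // kernel type, gamma, polynomial degree
	svm.set_c(1.0) // regularization parameter C
	svm.train(100, 1e-3) // max_iter, tolerance
	// predict reports the class as 0.0 or 1.0.
	return svm.predict([0.95, 0.9])
}

fn main() {
	println(classify()!)
}
```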

fn (SVM) predict_proba #

fn (o &SVM) predict_proba(x []f64) f64

predict_proba returns probability estimate (for soft margin)

fn (SVM) str #

fn (o &SVM) str() string

str is a custom str function for observers to avoid printing data

fn (SVM) get_plotter #

fn (o &SVM) get_plotter() &plot.Plot

get_plotter returns a plot.Plot struct for plotting the data and SVM decision boundary

struct Stat #

@[heap]
struct Stat[T] {
pub mut:
	data   &Data[T] = unsafe { nil } // data
	name   string // name of this object
	min_x  []T    // [n_features] min x values
	max_x  []T    // [n_features] max x values
	sum_x  []T    // [n_features] sum of x values
	mean_x []T    // [n_features] mean of x values
	sig_x  []T    // [n_features] standard deviations of x
	del_x  []T    // [n_features] difference: max(x) - min(x)
	min_y  T      // min of y values
	max_y  T      // max of y values
	sum_y  T      // sum of y values
	mean_y T      // mean of y values
	sig_y  T      // standard deviation of y
	del_y  T      // difference: max(y) - min(y)
}

Stat holds statistics about data

Note: Stat is an Observer of Data; thus, data.notify_update() will recompute stat
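The sketch below shows the observer behaviour described in the note. It assumes the module is importable as vsl.ml and that Stat.from_data registers the new object as an observer of data; the helper name and values are made up. The first update() call is made explicitly because it is not stated here whether Stat.from_data computes the statistics itself.

```v
import vsl.ml

// mean_before_and_after is a hypothetical helper; the samples are made up.
fn mean_before_and_after() !(f64, f64) {
	// One feature per row, y value in the last column.
	mut data := ml.Data.from_raw_xy([
		[1.0, 10.0],
		[2.0, 20.0],
		[3.0, 30.0],
	])!
	mut stat := ml.Stat.from_data(mut data, 'stat')
	stat.update() // compute the statistics once explicitly
	before := stat.mean_y
	// set_y notifies observers, so stat should be recomputed without another call.
	data.set_y([1.0, 2.0, 3.0])!
	return before, stat.mean_y
}

fn main() {
	before, after := mean_before_and_after()!
	println('mean_y before: ${before}, after: ${after}')
}
```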

struct TrainConfig #

struct TrainConfig {
pub:
	epochs          int
	tol_norm_change f64
}

struct TreeNode #

@[heap]
struct TreeNode {
mut:
	feature_index int = -1 // Feature index for split (-1 for leaf)
	threshold     f64 // Threshold value for split
	left          &TreeNode = unsafe { nil } // Left child
	right         &TreeNode = unsafe { nil } // Right child
	value         f64  // Value for leaf nodes (class or regression value)
	is_leaf       bool // Whether this is a leaf node
	samples       int  // Number of samples in this node
}

TreeNode represents a node in the decision tree