ml #
VSL Machine Learning (vsl.ml)
VSL aims to provide a robust set of tools for scientific computing with an emphasis on performance and ease of use. In the vsl.ml module, some machine learning models are designed as observers of data, meaning they re-train automatically when data changes, while others do not require this functionality.
Key Features
- Observers of Data: Some machine learning models in VSL act as observers, re-training automatically when data changes.
- High Performance: Leverages V’s performance optimizations and can integrate with C and Fortran libraries such as OpenBLAS and LAPACK.
- Versatile Algorithms: Supports a variety of machine learning algorithms and models.
Usage
Loading Data
The Data struct in vsl.ml is designed to hold data in matrix format for machine learning tasks. Here's a brief overview of how to use it:
Creating a Data Object
You can create a Data object using the following methods:
- Data.new: Creates a new Data object with specified dimensions.
- Data.from_raw_x: Creates a Data object from raw x values (without y values).
- Data.from_raw_xy: Creates a Data object from raw x and y values combined in a single matrix.
- Data.from_raw_xy_sep: Creates a Data object from separate x and y raw values.
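A minimal sketch of loading data with one of these constructors, assuming the module is imported as vsl.ml (the sample values are arbitrary):

```v
import vsl.ml

fn main() {
	// two samples with two features each; the separate y vector holds the labels
	mut data := ml.Data.from_raw_xy_sep([[0.0, 0.0], [10.0, 10.0]], [0.0, 1.0])!
	println(data.nb_samples) // 2
	println(data.nb_features) // 2
}
```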
Data Methods
The Data struct has several key methods to manage and manipulate data:
- set(x, y): Sets the x matrix and y vector and notifies observers.
- set_y(y): Sets the y vector and notifies observers.
- set_x(x): Sets the x matrix and notifies observers.
- split(ratio): Splits the data into two parts based on the given ratio.
- clone(): Returns a deep copy of the Data object without observers.
- clone_with_same_x(): Returns a deep copy of the Data object but shares the same x reference.
- add_observer(obs): Adds an observer to the data object.
- notify_update(): Notifies observers of data changes.
Stat Observer
The Stat struct is an observer of Data, providing statistical analysis of the data it observes. It automatically updates its statistics when the underlying data changes.
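The observer flow can be sketched as follows, using Stat.from_data and the Data setters documented below (whether from_data registers the Stat as an observer automatically is an assumption here, and the data values are arbitrary):

```v
import vsl.ml

fn main() {
	mut data := ml.Data.from_raw_xy_sep([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])!
	mut stat := ml.Stat.from_data(mut data, 'stat')
	stat.update() // compute statistics for the current data
	println(stat.mean_x)
	// set_y notifies observers, so the Stat recomputes its y statistics
	data.set_y([3.0, 6.0, 9.0])!
	println(stat.mean_y)
}
```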
Observer Models
The following machine learning models in VSL are compatible with the Observer pattern. This means they can observe data changes and automatically update themselves.
K-Means Clustering
K-Means Clustering is used for unsupervised learning to group data points into clusters. As an observer model, it re-trains automatically when the data changes, which is useful for dynamic datasets that require continuous updates.
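A typical K-means session, sketched from the Kmeans.new, set_centroids, and train signatures listed below (the point coordinates, trial centroids, and epoch count are arbitrary):

```v
import vsl.ml

fn main() {
	// eight 2D points forming two visually separated groups
	mut data := ml.Data.from_raw_x([
		[0.1, 0.7], [0.3, 0.7], [0.1, 0.9], [0.3, 0.9],
		[0.7, 0.1], [0.9, 0.1], [0.7, 0.3], [0.9, 0.3],
	])!
	mut model := ml.Kmeans.new(mut data, 2, 'kmeans')
	model.set_centroids([[0.4, 0.6], [0.6, 0.4]]) // trial centroids
	model.find_closest_centroids()
	model.compute_centroids()
	model.train(epochs: 6)
	println(model.classes) // class index assigned to each sample
}
```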
K-Nearest Neighbors (KNN)
K-Nearest Neighbors (KNN) is used for classification tasks where the target variable is categorical. As an observer model, it re-trains automatically when the data changes, which is beneficial for datasets that are frequently updated.
Logistic Regression
Logistic Regression is used for binary classification tasks. As an observer model, it automatically updates when data changes, recalculating internal statistics and preparing for retraining.
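A sketch of training and prediction, based on the LogReg.new, LogRegTrainConfig, and predict signatures listed below (the toy data and hyperparameters are arbitrary):

```v
import vsl.ml

fn main() {
	// toy binary labels: 1.0 once the single feature exceeds 2.5
	mut data := ml.Data.from_raw_xy_sep([[1.0], [2.0], [3.0], [4.0]], [0.0, 0.0, 1.0, 1.0])!
	mut model := ml.LogReg.new(mut data, 'logreg')
	model.train(epochs: 1000, learning_rate: 0.1)
	println(model.predict([3.5])) // probability between 0 and 1
}
```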
Support Vector Machine (SVM)
Support Vector Machine (SVM) is used for binary classification with support for non-linear decision boundaries through kernel functions. As an observer model, it marks itself for retraining when data changes.
Decision Tree
Decision Tree can handle both classification and regression tasks. As an observer model, it marks itself for retraining when data changes, allowing the tree to be rebuilt with new data.
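A sketch of fitting a small classification tree, following the DecisionTree.new, setter, train, and predict signatures listed below (the data and depth limit are arbitrary):

```v
import vsl.ml

fn main() {
	mut data := ml.Data.from_raw_xy_sep([[0.0], [1.0], [2.0], [3.0]], [0.0, 0.0, 1.0, 1.0])!
	mut tree := ml.DecisionTree.new(mut data, 'tree')
	tree.set_max_depth(3)
	tree.set_criterion(.gini)
	tree.train()
	println(tree.predict([2.5])) // predicted class for a single sample
}
```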
Random Forest
Random Forest is an ensemble method combining multiple decision trees. As an observer model, it marks itself for retraining when data changes, allowing the entire forest to be rebuilt.
Non-Observer Models
The following machine learning models in VSL do not require the observer pattern and are trained once on a dataset without continuous updates.
Linear Regression
Linear Regression is used for predicting a continuous target variable based on one or more predictor variables. It is typically trained once on a dataset and used to make predictions without requiring continuous updates. Hence, it is not implemented as an observer model.
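A sketch of the train-once workflow, based on the LinReg.new, train, and predict signatures listed below (the toy data is arbitrary):

```v
import vsl.ml

fn main() {
	// y = 2x + 1, so the closed-form fit should recover theta ≈ [2.0], b ≈ 1.0
	mut data := ml.Data.from_raw_xy_sep([[1.0], [2.0], [3.0]], [3.0, 5.0, 7.0])!
	mut model := ml.LinReg.new(mut data, 'linreg')
	model.train()
	println(model.predict([4.0])) // expected close to 9.0
}
```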
fn safe_log_1p_exp #
fn safe_log_1p_exp(z f64) f64
safe_log_1p_exp computes log(1 + exp(-z)) safely. When z < 0 and |z| is large, exp(-z) would overflow, so the function returns -z directly, since log(1 + exp(-z)) ≈ log(exp(-z)) = -z in that regime.
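The guard can be sketched as follows (this is not VSL's implementation; the cutoff value below is an assumption chosen to stay under the f64 overflow limit of exp):

```v
import math

// sketch of the guarded computation; the -700.0 cutoff is an assumption
fn safe_log_1p_exp_sketch(z f64) f64 {
	if z < -700.0 {
		// exp(-z) would overflow f64, and log(1 + exp(-z)) ≈ -z
		return -z
	}
	return math.log(1.0 + math.exp(-z))
}

fn main() {
	println(safe_log_1p_exp_sketch(-1000.0)) // 1000.0
	println(safe_log_1p_exp_sketch(0.0)) // log(2) ≈ 0.6931
}
```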
fn CriterionType.from #
fn CriterionType.from[W](input W) !CriterionType
fn Data.from_raw_x #
fn Data.from_raw_x[T](xraw [][]T) !&Data[T]
Data.from_raw_x returns a new object with data set from raw x values Input: xraw -- [nb_samples][nb_features] table with x values (NO y values) Output: new object
fn Data.from_raw_xy #
fn Data.from_raw_xy[T](xyraw [][]T) !&Data[T]
Data.from_raw_xy returns a new object with data set from raw x and y values Input: xyraw -- [nb_samples][nb_features+1] table with x and y raw values, where the last column contains y values Output: new object
fn Data.from_raw_xy_sep #
fn Data.from_raw_xy_sep[T](xraw [][]T, yraw []T) !&Data[T]
Data.from_raw_xy_sep accepts two parameters: xraw [][]T and yraw []T. It acts similarly to Data.from_raw_xy, but instead of using the last column of xraw as the y data, it uses yraw instead.
fn Data.new #
fn Data.new[T](nb_samples int, nb_features int, use_y bool, allocate bool) !&Data[T]
Data.new returns a new object to hold ML data Input: nb_samples -- number of data samples (rows in x) nb_features -- number of features (columns in x) use_y -- use y data vector allocate -- allocates x (and y); otherwise, x and y must be set using set() method Output: new object
fn DecisionTree.new #
fn DecisionTree.new(mut data Data[f64], name string) &DecisionTree
DecisionTree.new returns a new DecisionTree object Input: data -- x,y data name -- unique name of this (observer) object Output: new DecisionTree object
fn ElasticNet.new #
fn ElasticNet.new(mut data Data[f64], name string, alpha f64, l1_ratio f64) &ElasticNet
ElasticNet.new creates a new ElasticNet regression model
fn KNN.new #
fn KNN.new(mut data Data[f64], name string) !&KNN
KNN.new accepts a vsl.ml.Data parameter called data, which will be used to predict values with KNN.predict. You can use the following snippet to make your life easier:

mut knn := KNN.new(mut Data.from_raw_xy_sep([[0.0, 0.0], [10.0, 10.0]], [0.0, 1.0])!, 'knn')!

If you predict with knn.predict(k: 1, to_pred: [9.0, 9.0]), it should return 1.0, as [9.0, 9.0] is closest to [10.0, 10.0] (which is class 1.0).
fn KernelType.from #
fn KernelType.from[W](input W) !KernelType
fn Kmeans.new #
fn Kmeans.new(mut data Data[f64], nb_classes int, name string) &Kmeans
Kmeans.new returns a new K-means model
fn Lasso.new #
fn Lasso.new(mut data Data[f64], name string, alpha f64) &Lasso
Lasso.new creates a new Lasso regression model
fn LinReg.new #
fn LinReg.new(mut data Data[f64], name string) &LinReg
LinReg.new returns a new LinReg object Input: data -- x,y data name -- unique name of this (observer) object
fn LogReg.new #
fn LogReg.new(mut data Data[f64], name string) &LogReg
LogReg.new returns a new LogReg object Input: data -- x,y data name -- unique name of this (observer) object Output: new LogReg object
fn ParamsReg.new #
fn ParamsReg.new[T](nb_features int) &ParamsReg[T]
ParamsReg.new returns a new object to hold regression parameters
fn RandomForest.new #
fn RandomForest.new(mut data Data[f64], name string) &RandomForest
RandomForest.new returns a new RandomForest object Input: data -- x,y data name -- unique name of this (observer) object Output: new RandomForest object
fn RegularizationType.from #
fn RegularizationType.from[W](input W) !RegularizationType
fn Ridge.new #
fn Ridge.new(mut data Data[f64], name string, alpha f64) &Ridge
Ridge.new creates a new Ridge regression model
fn SVM.new #
fn SVM.new(mut data Data[f64], name string) &SVM
SVM.new returns a new SVM object Input: data -- x,y data (y should be -1.0 or 1.0 for SVM) name -- unique name of this (observer) object Output: new SVM object
fn Stat.from_data #
fn Stat.from_data[T](mut data Data[T], name string) &Stat[T]
Stat.from_data returns a new Stat object
fn (Data[T]) set #
fn (mut o Data[T]) set(x &la.Matrix[T], y []T) !
set sets the x matrix and y vector [optional] and notifies observers Input: x -- x values y -- y values [optional]
fn (Data[T]) set_y #
fn (mut o Data[T]) set_y(y []T) !
fn (Data[T]) set_x #
fn (mut o Data[T]) set_x(x &la.Matrix[T]) !
fn (Data[T]) clone #
fn (o &Data[T]) clone() !&Data[T]
clone returns a deep copy of this object removing the observers
fn (Data[T]) clone_with_same_x #
fn (o &Data[T]) clone_with_same_x() !&Data[T]
clone_with_same_x returns a deep copy of this object that shares the same reference to x, removing the observers
fn (Data[T]) add_observer #
fn (mut o Data[T]) add_observer(obs util.Observer)
add_observer adds an object to the list of interested observers
fn (Data[T]) notify_update #
fn (mut o Data[T]) notify_update()
notify_update notifies observers of updates
fn (Data[T]) split #
fn (o &Data[T]) split(ratio f64) !(&Data[T], &Data[T])
split returns a new object with data split into two parts Input: ratio -- ratio of samples to be put in the first part Output: new object
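A sketch of splitting a dataset into training and test parts, following the split signature above (how fractional sample counts are rounded is not documented here):

```v
import vsl.ml

fn main() {
	mut data := ml.Data.from_raw_xy_sep([[1.0], [2.0], [3.0], [4.0]], [1.0, 2.0, 3.0, 4.0])!
	// the first part receives roughly 75% of the samples
	train_part, test_part := data.split(0.75)!
	println(train_part.nb_samples)
	println(test_part.nb_samples)
}
```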
fn (ParamsReg[T]) init #
fn (mut o ParamsReg[T]) init(nb_features int)
init initializes ParamsReg with nb_features (number of features)
fn (ParamsReg[T]) backup #
fn (mut o ParamsReg[T]) backup()
backup creates an internal copy of parameters
fn (ParamsReg[T]) restore #
fn (mut o ParamsReg[T]) restore(skip_notification bool)
restore restores an internal copy of parameters and notifies observers
fn (ParamsReg[T]) set_params #
fn (mut o ParamsReg[T]) set_params(theta []T, b T)
set_params sets theta and b and notifies observers
fn (ParamsReg[T]) set_param #
fn (mut o ParamsReg[T]) set_param(i int, value T)
set_param sets either theta or b (use negative indices for b) and notifies observers i -- index of theta, or -1 for bias
fn (ParamsReg[T]) get_param #
fn (o &ParamsReg[T]) get_param(i int) T
get_param returns either theta or b (use negative indices for b) i -- index of theta or -1 for bias
fn (ParamsReg[T]) set_thetas #
fn (mut o ParamsReg[T]) set_thetas(theta []T)
set_thetas sets the whole vector theta and notifies observers
fn (ParamsReg[T]) get_thetas #
fn (o &ParamsReg[T]) get_thetas() []T
get_thetas gets a copy of theta
fn (ParamsReg[T]) access_thetas #
fn (o &ParamsReg[T]) access_thetas() []T
access_thetas returns access (slice) to theta
fn (ParamsReg[T]) access_bias #
fn (o &ParamsReg[T]) access_bias() &T
access_bias returns access (pointer) to b
fn (ParamsReg[T]) set_theta #
fn (mut o ParamsReg[T]) set_theta(i int, thetai T)
set_theta sets one component of theta and notifies observers
fn (ParamsReg[T]) get_theta #
fn (o &ParamsReg[T]) get_theta(i int) T
get_theta returns the value of theta[i]
fn (ParamsReg[T]) set_bias #
fn (mut o ParamsReg[T]) set_bias(b T)
set_bias sets b and notifies observers
fn (ParamsReg[T]) get_bias #
fn (o &ParamsReg[T]) get_bias() T
get_bias gets a copy of b
fn (ParamsReg[T]) set_lambda #
fn (mut o ParamsReg[T]) set_lambda(lambda T)
set_lambda sets lambda and notifies observers
fn (ParamsReg[T]) get_lambda #
fn (o &ParamsReg[T]) get_lambda() T
get_lambda gets a copy of lambda
fn (ParamsReg[T]) set_degree #
fn (mut o ParamsReg[T]) set_degree(p int)
set_degree sets p and notifies observers
fn (ParamsReg[T]) get_degree #
fn (o &ParamsReg[T]) get_degree() int
get_degree gets a copy of p
fn (ParamsReg[T]) add_observer #
fn (mut o ParamsReg[T]) add_observer(obs util.Observer)
add_observer adds an object to the list of interested observers
fn (ParamsReg[T]) notify_update #
fn (mut o ParamsReg[T]) notify_update()
notify_update notifies observers of updates
fn (Stat[T]) name #
fn (o &Stat[T]) name() string
name returns the name of this stat object (thus defining the Observer interface)
fn (Stat[T]) update #
fn (mut o Stat[T]) update()
update computes statistics for the given data (as an Observer of Data)
fn (Stat[T]) sum_vars #
fn (mut o Stat[T]) sum_vars() ([]T, T)
sum_vars computes the sums along the columns of X and y.
Output:
s -- vector s = Xᵀo, sum of columns of the X matrix: s_j = Σ_i^m o_i X_ij [n_features]
t -- scalar t = oᵀy, sum of columns of the y vector: t = Σ_i^m o_i y_i
fn (Stat[T]) copy_into #
fn (o &Stat[T]) copy_into(mut p Stat[T])
copy_into copies stat into p
fn (Stat[T]) str #
fn (o &Stat[T]) str() string
str is a custom str function for observers to avoid printing data
enum CriterionType #
enum CriterionType {
gini // Gini impurity (for classification)
entropy // Information gain / entropy (for classification)
mse // Mean Squared Error (for regression)
}
CriterionType represents the splitting criterion
enum KernelType #
enum KernelType {
linear
polynomial
rbf
}
KernelType represents the type of kernel function
enum RegularizationType #
enum RegularizationType {
none // No regularization
l1 // Lasso (L1)
l2 // Ridge (L2)
elasticnet // ElasticNet (L1 + L2)
}
RegularizationType specifies the type of regularization
struct Data #
struct Data[T] {
pub mut:
observable util.Observable = util.Observable{}
nb_samples int // number of data points (samples). number of rows in x and y
nb_features int // number of features. number of columns in x
x &la.Matrix[T] = unsafe { nil } // [nb_samples][nb_features] x values
y []T // [nb_samples] y values [optional]
}
struct DecisionTree #
struct DecisionTree {
mut:
name string // name of this "observer"
data &Data[f64] = unsafe { nil } // x-y data
stat &Stat[f64] = unsafe { nil } // statistics
root &TreeNode = unsafe { nil } // Root of the tree
max_depth int = -1 // Maximum depth (-1 for unlimited)
min_samples_split int = 2 // Minimum samples to split
min_samples_leaf int = 1 // Minimum samples in leaf
criterion CriterionType = .gini // Splitting criterion
trained bool
is_regression bool // Whether this is a regression task
}
DecisionTree implements a decision tree classifier/regressor (Observer of Data)
fn (DecisionTree) name #
fn (o &DecisionTree) name() string
name returns the name of this DecisionTree object (thus defining the Observer interface)
fn (DecisionTree) update #
fn (mut o DecisionTree) update()
update performs updates after the data has been changed (as an Observer)
fn (DecisionTree) set_max_depth #
fn (mut o DecisionTree) set_max_depth(depth int)
set_max_depth sets the maximum depth of the tree
fn (DecisionTree) set_min_samples_split #
fn (mut o DecisionTree) set_min_samples_split(min_samples int)
set_min_samples_split sets the minimum samples required to split
fn (DecisionTree) set_min_samples_leaf #
fn (mut o DecisionTree) set_min_samples_leaf(min_samples int)
set_min_samples_leaf sets the minimum samples in a leaf node
fn (DecisionTree) set_criterion #
fn (mut o DecisionTree) set_criterion(criterion CriterionType)
set_criterion sets the splitting criterion
fn (DecisionTree) train #
fn (mut o DecisionTree) train()
train builds the decision tree
fn (DecisionTree) predict #
fn (o &DecisionTree) predict(x []f64) f64
predict returns the predicted value for a single sample
fn (DecisionTree) predict_batch #
fn (o &DecisionTree) predict_batch(x [][]f64) []f64
predict_batch returns predictions for multiple samples
fn (DecisionTree) str #
fn (o &DecisionTree) str() string
str is a custom str function for observers to avoid printing data
fn (DecisionTree) get_plotter #
fn (o &DecisionTree) get_plotter() &plot.Plot
get_plotter returns a plot.Plot struct for plotting (2D only)
struct ElasticNet #
struct ElasticNet {
mut:
name string
data &Data[f64] = unsafe { nil }
fitted bool
pub mut:
coef_ []f64
intercept_ f64
alpha f64 = 1.0 // regularization strength
l1_ratio f64 = 0.5 // ratio of L1 vs L2 (1.0 = pure Lasso, 0.0 = pure Ridge)
max_iter int = 1000
tol f64 = 1e-4
stat &Stat[f64] = unsafe { nil }
}
ElasticNet implements linear regression with combined L1 and L2 regularization
fn (ElasticNet) name #
fn (o &ElasticNet) name() string
name returns the model name
fn (ElasticNet) predict #
fn (o &ElasticNet) predict(x []f64) f64
predict returns the prediction for a feature vector
fn (ElasticNet) train #
fn (mut o ElasticNet) train()
train fits the ElasticNet model using coordinate descent
fn (ElasticNet) get_plotter #
fn (o &ElasticNet) get_plotter() &plot.Plot
get_plotter returns a plot for the model
struct KNN #
struct KNN {
mut:
name string // name of this "observer"
data &Data[f64] = unsafe { nil }
weights map[f64]f64 // weights[class] = weight
pub mut:
neighbors []Neighbor
trained bool
}
KNN is the struct defining a K-Nearest Neighbors classifier.
fn (KNN) name #
fn (o &KNN) name() string
name returns the name of this KNN object (thus defining the Observer interface)
fn (KNN) set_weights #
fn (mut knn KNN) set_weights(weights map[f64]f64) !
set_weights will set the weights for the KNN. They default to 1.0 for every class when this function is not called.
fn (KNN) update #
fn (mut knn KNN) update()
update performs updates after the data has been changed (as an Observer)
fn (KNN) train #
fn (mut knn KNN) train()
train computes the neighbors and weights during training
fn (KNN) predict #
fn (mut knn KNN) predict(config PredictConfig) !f64
predict will find the k points nearest to the specified to_pred. If the value of k results in a draw, that is, a tie when determining the most frequent class among those k nearest neighbors (example: class 1 has 10 occurrences, class 2 has 5, and class 3 has 10), k will be decreased until there are no more ties. The worst-case scenario is k ending up as 1. If there is still a tie when k = 1, the first closest neighbor is selected.
fn (KNN) str #
fn (o &KNN) str() string
str is a custom str function for observers to avoid printing data
fn (KNN) get_plotter #
fn (o &KNN) get_plotter() &plot.Plot
get_plotter returns a plot.Plot struct with the data needed to plot the KNN model.
struct Kmeans #
struct Kmeans {
mut:
name string // name of this "observer"
data &Data[f64] = unsafe { nil } // x data
stat &Stat[f64] = unsafe { nil } // statistics about x (data)
nb_classes int // expected number of classes
bins &gm.Bins = unsafe { nil } // "bins" to speed up searching for data points given their coordinates (2D or 3D only at the moment)
nb_iter int // number of iterations
pub mut:
classes []int // [nb_samples] indices of classes of each sample
centroids [][]f64 // [nb_classes][nb_features] coordinates of centroids
nb_members []int // [nb_classes] number of members in each class
}
Kmeans implements the K-means model (Observer of Data)
fn (Kmeans) name #
fn (o &Kmeans) name() string
name returns the name of this Kmeans object (thus defining the Observer interface)
fn (Kmeans) update #
fn (mut o Kmeans) update()
update performs updates after the data has been changed (as an Observer)
fn (Kmeans) nb_classes #
fn (o &Kmeans) nb_classes() int
nb_classes returns the number of classes
fn (Kmeans) set_centroids #
fn (mut o Kmeans) set_centroids(xc [][]f64)
set_centroids sets centroids; e.g. trial centroids xc -- [nb_classes][nb_features]
fn (Kmeans) find_closest_centroids #
fn (mut o Kmeans) find_closest_centroids()
find_closest_centroids finds closest centroids to each sample
fn (Kmeans) compute_centroids #
fn (mut o Kmeans) compute_centroids()
compute_centroids updates centroids based on new class information (from find_closest_centroids)
fn (Kmeans) train #
fn (mut o Kmeans) train(config TrainConfig)
train trains the model
fn (Kmeans) str #
fn (o &Kmeans) str() string
str is a custom str function for observers to avoid printing data
fn (Kmeans) get_plotter #
fn (o &Kmeans) get_plotter() &plot.Plot
get_plotter returns a plot.Plot struct for plotting
struct Lasso #
struct Lasso {
mut:
name string // name of this model
data &Data[f64] = unsafe { nil } // x-y data
fitted bool
pub mut:
coef_ []f64 // learned coefficients (theta)
intercept_ f64 // learned intercept (bias)
alpha f64 = 1.0 // regularization strength
max_iter int = 1000 // maximum iterations
tol f64 = 1e-4 // convergence tolerance
stat &Stat[f64] = unsafe { nil } // statistics
}
Lasso implements a linear regression model with L1 regularization Uses coordinate descent algorithm for optimization
fn (Lasso) name #
fn (o &Lasso) name() string
name returns the model name
fn (Lasso) predict #
fn (o &Lasso) predict(x []f64) f64
predict returns the prediction for a feature vector
fn (Lasso) train #
fn (mut o Lasso) train()
train fits the Lasso model using coordinate descent
fn (Lasso) get_plotter #
fn (o &Lasso) get_plotter() &plot.Plot
get_plotter returns a plot for the model
struct LinReg #
struct LinReg {
mut:
// main
name string // name of this "observer"
data &Data[f64] = unsafe { nil } // x-y data
// workspace
e []f64 // vector e = b⋅o + x⋅theta - y [nb_samples]
pub mut:
stat &Stat[f64] = unsafe { nil } // statistics
params &ParamsReg[f64] = unsafe { nil }
}
LinReg implements a linear regression model
fn (LinReg) name #
fn (o &LinReg) name() string
name returns the name of this LinReg object (thus defining the Observer interface)
fn (LinReg) predict #
fn (o &LinReg) predict(x []f64) f64
predict returns the model evaluation @ {x;theta,b} Input: x -- vector of features Output: y -- model prediction y(x)
fn (LinReg) cost #
fn (mut o LinReg) cost() f64
cost returns the cost c(x;theta,b) Input: data -- x,y data params -- theta and b x -- vector of features Output: c -- total cost (model error)
fn (LinReg) gradients #
fn (mut o LinReg) gradients() ([]f64, f64)
gradients returns ∂C/∂theta and ∂C/∂b Output: dcdtheta -- ∂C/∂theta dcdb -- ∂C/∂b
fn (LinReg) train #
fn (mut o LinReg) train()
train finds theta and b using closed-form solution Input: data -- x,y data Output: params -- theta and b
fn (LinReg) calce #
fn (mut o LinReg) calce()
calce calculates the e vector (saves into o.e) Output: e = b⋅o + x⋅theta - y
fn (LinReg) str #
fn (o &LinReg) str() string
str is a custom str function for observers to avoid printing data
fn (LinReg) get_plotter #
fn (o &LinReg) get_plotter() &plot.Plot
get_plotter returns a plot.Plot struct for plotting the data and the linear regression model
struct LogReg #
struct LogReg {
mut:
// main
name string // name of this "observer"
data &Data[f64] = unsafe { nil } // x-y data
// workspace
ybar []f64 // bar{y}: yb[i] = (1 - y[i]) / m
l []f64 // vector l = b⋅o + x⋅θ [nb_samples]
hmy []f64 // vector hmy = h(l) - y [nb_samples]
pub mut:
params &ParamsReg[f64] = unsafe { nil } // parameters: θ, b, λ
stat &Stat[f64] = unsafe { nil } // statistics
}
LogReg implements a logistic regression model (Observer of Data)
fn (LogReg) name #
fn (o &LogReg) name() string
name returns the name of this LogReg object (thus defining the Observer interface)
fn (LogReg) update #
fn (mut o LogReg) update()
update performs updates after the data has been changed (as an Observer)
fn (LogReg) predict #
fn (o &LogReg) predict(x []f64) f64
predict returns the model evaluation @ {x;θ,b} Input: x -- vector of features Output: y -- model prediction y(x) (probability between 0 and 1)
fn (LogReg) cost #
fn (mut o LogReg) cost() f64
cost returns the cost c(x;θ,b) Output: c -- total cost (model error)
fn (LogReg) allocate_gradient #
fn (o &LogReg) allocate_gradient() []f64
allocate_gradient allocates an object used to compute gradients
fn (LogReg) gradients #
fn (mut o LogReg) gradients() ([]f64, f64)
gradients returns ∂C/∂θ and ∂C/∂b Output: dcdtheta -- ∂C/∂θ dcdb -- ∂C/∂b
fn (LogReg) allocate_hessian #
fn (o &LogReg) allocate_hessian() ([]f64, []f64, &la.Matrix[f64], &la.Matrix[f64])
allocate_hessian allocates objects used to compute the hessian
fn (LogReg) hessian #
fn (mut o LogReg) hessian(mut d []f64, mut v []f64, mut dm la.Matrix[f64], mut hm la.Matrix[f64]) f64
hessian computes the hessian matrix and other partial derivatives
Input:
d -- [nSamples] d[i] = g(l[i]) * [1 - g(l[i])] auxiliary vector
v -- [nFeatures] v = ∂²C/∂θ∂b second-order partial derivative
dm -- [nSamples][nFeatures] dm[i][j] = d[i]*x[i][j] auxiliary matrix
hm -- [nFeatures][nFeatures] hm = ∂²C/∂θ² hessian matrix
Output:
w -- ∂²C/∂b²
fn (LogReg) train #
fn (mut o LogReg) train(config LogRegTrainConfig)
train finds θ and b using gradient descent or Newton's method
fn (LogReg) calcl #
fn (mut o LogReg) calcl()
calcl calculates l vector (saves into o.l) (linear model) Output: l = b⋅o + x⋅θ
fn (LogReg) calcsumq #
fn (o &LogReg) calcsumq() f64
calcsumq calculates Σq[i] where q[i] = log(1 + exp(-l[i])) Input: l -- precomputed o.l Output: sq -- sum(q)
fn (LogReg) calchmy #
fn (mut o LogReg) calchmy()
calchmy calculates h(l) - y vector (saves into o.hmy) Input: l -- precomputed o.l Output: hmy -- computes hmy = h(l) - y
fn (LogReg) str #
fn (o &LogReg) str() string
str is a custom str function for observers to avoid printing data
fn (LogReg) get_plotter #
fn (o &LogReg) get_plotter() &plot.Plot
get_plotter returns a plot.Plot struct for plotting the data and the logistic regression model
struct LogRegTrainConfig #
struct LogRegTrainConfig {
pub:
epochs int = 1000 // maximum number of iterations
learning_rate f64 = 0.01 // learning rate for gradient descent
tolerance f64 = 1e-6 // convergence tolerance
use_newton bool // use Newton's method instead of gradient descent
}
LogRegTrainConfig holds training configuration for logistic regression
struct ParamsReg #
struct ParamsReg[T] {
pub mut:
observable util.Observable
// main
theta []T // theta parameter [nb_features]
bias T // bias parameter
lambda T // regularization parameter
degree int // degree of polynomial
// backup
bkp_theta []T // copy of theta
bkp_bias T // copy of b
bkp_lambda T // copy of lambda
bkp_degree int // copy of degree
}
struct PredictConfig #
struct PredictConfig {
pub:
max_iter int
k int
to_pred []f64
}
PredictConfig holds the data needed for KNN.predict
struct RandomForest #
struct RandomForest {
mut:
name string // name of this "observer"
data &Data[f64] = unsafe { nil } // x-y data
max_features int = -1 // features per split (-1 for sqrt(n_features))
pub mut:
n_estimators int = 100 // number of trees
bootstrap bool = true // bootstrap sampling
trained bool
is_regression bool // Whether this is a regression task
stat &Stat[f64] = unsafe { nil } // statistics
trees []&DecisionTree // ensemble of decision trees
}
RandomForest implements a Random Forest classifier/regressor (Observer of Data)
fn (RandomForest) name #
fn (o &RandomForest) name() string
name returns the name of this RandomForest object (thus defining the Observer interface)
fn (RandomForest) update #
fn (mut o RandomForest) update()
update performs updates after the data has been changed (as an Observer)
fn (RandomForest) set_n_estimators #
fn (mut o RandomForest) set_n_estimators(n int)
set_n_estimators sets the number of trees in the forest
fn (RandomForest) set_max_features #
fn (mut o RandomForest) set_max_features(n int)
set_max_features sets the number of features to consider for each split
fn (RandomForest) set_bootstrap #
fn (mut o RandomForest) set_bootstrap(bootstrap bool)
set_bootstrap sets whether to use bootstrap sampling
fn (RandomForest) train #
fn (mut o RandomForest) train() !
train trains the random forest
fn (RandomForest) predict #
fn (o &RandomForest) predict(x []f64) f64
predict returns the predicted value for a single sample
fn (RandomForest) predict_proba #
fn (o &RandomForest) predict_proba(x []f64) f64
predict_proba returns probability estimates for classification
fn (RandomForest) get_feature_importance #
fn (o &RandomForest) get_feature_importance() []f64
get_feature_importance returns feature importance scores
fn (RandomForest) str #
fn (o &RandomForest) str() string
str is a custom str function for observers to avoid printing data
fn (RandomForest) get_plotter #
fn (o &RandomForest) get_plotter() &plot.Plot
get_plotter returns a plot.Plot struct for plotting
struct Ridge #
struct Ridge {
mut:
name string
data &Data[f64] = unsafe { nil }
fitted bool
pub mut:
coef_ []f64
intercept_ f64
alpha f64 = 1.0
stat &Stat[f64] = unsafe { nil }
linreg &LinReg = unsafe { nil }
}
Ridge implements linear regression with L2 regularization only. This is a convenience wrapper around the existing LinReg with lambda.
fn (Ridge) name #
fn (o &Ridge) name() string
name returns the model name
fn (Ridge) predict #
fn (o &Ridge) predict(x []f64) f64
predict returns the prediction for a feature vector
fn (Ridge) train #
fn (mut o Ridge) train()
train fits the Ridge model
fn (Ridge) get_plotter #
fn (o &Ridge) get_plotter() &plot.Plot
get_plotter returns a plot for the model
struct SVM #
struct SVM {
mut:
name string // name of this "observer"
data &Data[f64] = unsafe { nil } // x-y data
// SVM-specific
support_vector_labels []f64 // labels of support vectors
alpha []f64 // Lagrange multipliers [nb_samples]
bias f64 // bias term
kernel_type KernelType = .linear
degree int = 3 // polynomial kernel degree
pub mut:
trained bool
stat &Stat[f64] = unsafe { nil } // statistics
support_vectors [][]f64 // support vectors
c f64 = 1.0 // regularization parameter
gamma f64 = 1.0 // RBF kernel parameter
}
SVM implements a Support Vector Machine classifier (Observer of Data)
fn (SVM) name #
fn (o &SVM) name() string
name returns the name of this SVM object (thus defining the Observer interface)
fn (SVM) update #
fn (mut o SVM) update()
update performs updates after the data has been changed (as an Observer)
fn (SVM) set_kernel #
fn (mut o SVM) set_kernel(kernel_type KernelType, gamma f64, degree int)
set_kernel sets the kernel type and parameters
fn (SVM) set_c #
fn (mut o SVM) set_c(c f64)
set_c sets the regularization parameter C
fn (SVM) train #
fn (mut o SVM) train(max_iter int, tolerance f64)
train trains the SVM using simplified SMO (Sequential Minimal Optimization)
fn (SVM) predict #
fn (o &SVM) predict(x []f64) f64
predict returns the predicted class (0.0 or 1.0, converted from -1.0/1.0)
fn (SVM) predict_proba #
fn (o &SVM) predict_proba(x []f64) f64
predict_proba returns probability estimate (for soft margin)
fn (SVM) str #
fn (o &SVM) str() string
str is a custom str function for observers to avoid printing data
fn (SVM) get_plotter #
fn (o &SVM) get_plotter() &plot.Plot
get_plotter returns a plot.Plot struct for plotting the data and SVM decision boundary
struct Stat #
struct Stat[T] {
pub mut:
data &Data[T] = unsafe { nil } // data
name string // name of this object
min_x []T // [n_features] min x values
max_x []T // [n_features] max x values
sum_x []T // [n_features] sum of x values
mean_x []T // [n_features] mean of x values
sig_x []T // [n_features] standard deviations of x
del_x []T // [n_features] difference: max(x) - min(x)
min_y T // min of y values
max_y T // max of y values
sum_y T // sum of y values
mean_y T // mean of y values
sig_y T // standard deviation of y
del_y T // difference: max(y) - min(y)
}
Stat holds statistics about data
Note: Stat is an Observer of Data; thus, data.notify_update() will recompute stat
struct TrainConfig #
struct TrainConfig {
pub:
epochs int
tol_norm_change f64
}
struct TreeNode #
struct TreeNode {
mut:
feature_index int = -1 // Feature index for split (-1 for leaf)
threshold f64 // Threshold value for split
left &TreeNode = unsafe { nil } // Left child
right &TreeNode = unsafe { nil } // Right child
value f64 // Value for leaf nodes (class or regression value)
is_leaf bool // Whether this is a leaf node
samples int // Number of samples in this node
}
TreeNode represents a node in the decision tree