ml #
VSL Machine Learning (vsl.ml)
VSL aims to provide a robust set of tools for scientific computing with an emphasis on performance and ease of use. In the `vsl.ml` module, some machine learning models are designed as observers of data, meaning they re-train automatically when data changes, while others do not require this functionality.
Key Features
- Observers of Data: Some machine learning models in VSL act as observers, re-training automatically when data changes.
- High Performance: Leverages V's performance optimizations and can integrate with C and Fortran libraries such as OpenBLAS and LAPACK.
- Versatile Algorithms: Supports a variety of machine learning algorithms and models.
Usage
Loading Data
The `Data` struct in `vsl.ml` is designed to hold data in matrix format for machine learning tasks. Here's a brief overview of how to use it:
Creating a Data Object
You can create a `Data` object using the following methods:

- `Data.new`: Creates a new `Data` object with specified dimensions.
- `Data.from_raw_x`: Creates a `Data` object from raw x values (without y values).
- `Data.from_raw_xy`: Creates a `Data` object from raw x and y values combined in a single matrix.
- `Data.from_raw_xy_sep`: Creates a `Data` object from separate x and y raw values.
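For example, here is a minimal sketch using each constructor (the numeric values are illustrative only):

```v
import vsl.ml

fn main() {
	x := [
		[1.0, 2.0],
		[3.0, 4.0],
		[5.0, 6.0],
	]

	// x only, no y values
	d1 := ml.Data.from_raw_x(x)!

	// x and y combined: the last column of each row holds y
	d2 := ml.Data.from_raw_xy([
		[1.0, 2.0, 0.0],
		[3.0, 4.0, 1.0],
	])!

	// x and y given separately
	d3 := ml.Data.from_raw_xy_sep(x, [0.0, 1.0, 0.0])!

	// pre-allocated container: 3 samples, 2 features, with a y vector
	d4 := ml.Data.new[f64](3, 2, true, true)!

	println('${d1.nb_samples} ${d2.nb_samples} ${d3.nb_samples} ${d4.nb_samples}')
}
```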
Data Methods
The `Data` struct has several key methods to manage and manipulate data:

- `set(x, y)`: Sets the x matrix and y vector and notifies observers.
- `set_y(y)`: Sets the y vector and notifies observers.
- `set_x(x)`: Sets the x matrix and notifies observers.
- `split(ratio)`: Splits the data into two parts based on the given ratio.
- `clone()`: Returns a deep copy of the Data object without observers.
- `clone_with_same_x()`: Returns a deep copy of the Data object but shares the same x reference.
- `add_observer(obs)`: Adds an observer to the data object.
- `notify_update()`: Notifies observers of data changes.
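A short sketch of the most common methods (the data values and variable names are illustrative):

```v
import vsl.ml

fn main() {
	mut data := ml.Data.from_raw_xy([
		[1.0, 0.0],
		[2.0, 0.0],
		[3.0, 1.0],
		[4.0, 1.0],
	])!

	// split the samples into two parts by ratio
	training, testing := data.split(0.5)!
	println('training: ${training.nb_samples} samples, testing: ${testing.nb_samples} samples')

	// deep copy; observers are not carried over
	data_copy := data.clone()!
	println('copy has ${data_copy.nb_features} feature(s)')

	// replace y and notify any registered observers
	data.set_y([0.0, 0.0, 1.0, 1.0])!

	// or notify observers manually after an external change
	data.notify_update()
}
```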
Stat Observer
The `Stat` struct is an observer of `Data`, providing statistical analysis of the data it observes. It automatically updates its statistics when the underlying data changes.
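For instance (a sketch that assumes `Stat.from_data` registers the new `Stat` as an observer of the data, consistent with the note on the `Stat` struct below; call `data.add_observer(stat)` explicitly if it does not):

```v
import vsl.ml

fn main() {
	mut data := ml.Data.from_raw_xy([
		[1.0, 10.0],
		[2.0, 20.0],
		[3.0, 30.0],
	])!

	mut stat := ml.Stat.from_data(mut data, 'stat-example')
	stat.update()
	println(stat.mean_x) // per-feature means of x
	println(stat.mean_y) // mean of y

	// setters on Data notify observers, so the statistics are recomputed
	data.set_y([11.0, 21.0, 31.0])!
	println(stat.mean_y)
}
```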
Observer Models
The following machine learning models in VSL are compatible with the `Observer` pattern, meaning they can observe data changes and automatically update themselves.
K-Means Clustering
K-Means Clustering is used for unsupervised learning to group data points into clusters. As an observer model, it re-trains automatically when the data changes, which is useful for dynamic datasets that require continuous updates.
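A minimal sketch of a typical K-means workflow (the sample points, trial centroids, and epoch count are illustrative):

```v
import vsl.ml

fn main() {
	// raw x data, two features per sample (no y values)
	mut data := ml.Data.from_raw_x([
		[0.1, 0.7],
		[0.3, 0.7],
		[0.1, 0.9],
		[0.7, 0.3],
		[0.9, 0.3],
		[0.8, 0.1],
	])!

	nb_classes := 2
	mut model := ml.Kmeans.new(mut data, nb_classes, 'kmeans-example')

	// trial centroids, one per class
	model.set_centroids([
		[0.4, 0.6],
		[0.8, 0.1],
	])

	// initial assignment, then a few training epochs
	model.find_closest_centroids()
	model.compute_centroids()
	model.train(epochs: 6)

	println(model.classes)   // class index of each sample
	println(model.centroids) // final centroid coordinates
}
```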
K-Nearest Neighbors (KNN)
K-Nearest Neighbors (KNN) is used for classification tasks where the target variable is categorical. As an observer model, it re-trains automatically when the data changes, which is beneficial for datasets that are frequently updated.
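A minimal sketch of a typical KNN workflow (the points, labels, and `k` are illustrative):

```v
import vsl.ml

fn main() {
	// x values with their class labels in a separate y vector
	mut data := ml.Data.from_raw_xy_sep([
		[0.0, 0.0],
		[1.0, 1.0],
		[9.0, 9.0],
		[10.0, 10.0],
	], [0.0, 0.0, 1.0, 1.0])!

	mut knn := ml.KNN.new(mut data, 'knn-example')!
	knn.train()

	// classify a new point from its 2 nearest neighbors
	pred := knn.predict(k: 2, to_pred: [9.5, 9.5])!
	println(pred) // 1.0
}
```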
Non-Observer Models
The following machine learning models in VSL do not require the observer pattern and are trained once on a dataset without continuous updates.
Linear Regression
Linear Regression is used for predicting a continuous target variable based on one or more predictor variables. It is typically trained once on a dataset and used to make predictions without requiring continuous updates. Hence, it is not implemented as an observer model.
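A minimal sketch of a typical linear-regression workflow (the x-y pairs are illustrative):

```v
import vsl.ml

fn main() {
	// the last column of each row is y; the rest are features
	mut data := ml.Data.from_raw_xy([
		[1.0, 3.0],
		[2.0, 5.1],
		[3.0, 6.9],
		[4.0, 9.0],
	])!

	mut reg := ml.LinReg.new(mut data, 'linreg-example')
	reg.train()

	// predict y for a new x
	println(reg.predict([5.0]))
}
```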
fn Data.from_raw_x #
fn Data.from_raw_x[T](xraw [][]T) !&Data[T]
Data.from_raw_x returns a new object with data set from raw x values. Input: xraw -- [nb_samples][nb_features] table with x values (NO y values). Output: new object.
fn Data.from_raw_xy #
fn Data.from_raw_xy[T](xyraw [][]T) !&Data[T]
Data.from_raw_xy returns a new object with data set from raw x and y values. Input: xyraw -- [nb_samples][nb_features+1] table with x and y raw values, where the last column contains the y values. Output: new object.
fn Data.from_raw_xy_sep #
fn Data.from_raw_xy_sep[T](xraw [][]T, yraw []T) !&Data[T]
Data.from_raw_xy_sep accepts two parameters: xraw [][]T and yraw []T. It acts similarly to Data.from_raw_xy, but instead of using the last column of xraw as the y data, it uses yraw instead.
fn Data.new #
fn Data.new[T](nb_samples int, nb_features int, use_y bool, allocate bool) !&Data[T]
Data.new returns a new object to hold ML data. Input: nb_samples -- number of data samples (rows in x), nb_features -- number of features (columns in x), use_y -- use y data vector, allocate -- allocates x (and y); otherwise, x and y must be set using the set() method. Output: new object.
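For instance (a sketch; the dimensions and values are illustrative):

```v
import vsl.ml

fn main() {
	// 3 samples, 2 features, with a y vector, pre-allocated
	mut data := ml.Data.new[f64](3, 2, true, true)!
	println('${data.nb_samples} samples, ${data.nb_features} features')

	// y can be (re)set later; observers are notified automatically
	data.set_y([0.0, 1.0, 0.0])!
}
```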
fn KNN.new #
fn KNN.new(mut data Data[f64], name string) !&KNN
KNN.new accepts a `vsl.ml.Data` parameter called `data`, which will be used to predict values with `KNN.predict`. You can use the following piece of code to make your life easier:

```v
mut data := ml.Data.from_raw_xy_sep([[0.0, 0.0], [10.0, 10.0]], [0.0, 1.0])!
mut knn := ml.KNN.new(mut data, 'knn')!
```

If you predict with `knn.predict(k: 1, to_pred: [9.0, 9.0])!`, it should return 1.0, as it is the closest to [10.0, 10.0] (which is class 1.0).
fn Kmeans.new #
fn Kmeans.new(mut data Data[f64], nb_classes int, name string) &Kmeans
Kmeans.new returns a new K-means model
fn LinReg.new #
fn LinReg.new(mut data Data[f64], name string) &LinReg
LinReg.new returns a new LinReg object. Input: data -- x,y data; name -- unique name of this (observer) object.
fn ParamsReg.new #
fn ParamsReg.new[T](nb_features int) &ParamsReg[T]
ParamsReg.new returns a new object to hold regression parameters
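A sketch of typical usage (the parameter values are illustrative):

```v
import vsl.ml

fn main() {
	mut params := ml.ParamsReg.new[f64](2)

	params.set_params([0.5, -1.0], 0.25) // theta and bias
	params.backup()                      // keep an internal copy

	params.set_theta(0, 9.9) // change one component of theta
	params.restore(true)     // roll back to the backup (skip notification)

	println(params.get_thetas()) // [0.5, -1.0]
	println(params.get_bias())   // 0.25
}
```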
fn Stat.from_data #
fn Stat.from_data[T](mut data Data[T], name string) &Stat[T]
Stat.from_data returns a new Stat object.
fn (Data[T]) set #
fn (mut o Data[T]) set(x &la.Matrix[T], y []T) !
set sets the x matrix and y vector [optional] and notifies observers. Input: x -- x values; y -- y values [optional].
fn (Data[T]) set_y #
fn (mut o Data[T]) set_y(y []T) !
fn (Data[T]) set_x #
fn (mut o Data[T]) set_x(x &la.Matrix[T]) !
fn (Data[T]) clone #
fn (o &Data[T]) clone() !&Data[T]
clone returns a deep copy of this object removing the observers
fn (Data[T]) clone_with_same_x #
fn (o &Data[T]) clone_with_same_x() !&Data[T]
clone_with_same_x returns a deep copy of this object, but with the same reference to x removing the observers
fn (Data[T]) add_observer #
fn (mut o Data[T]) add_observer(obs util.Observer)
add_observer adds an object to the list of interested observers
fn (Data[T]) notify_update #
fn (mut o Data[T]) notify_update()
notify_update notifies observers of updates
fn (Data[T]) split #
fn (o &Data[T]) split(ratio f64) !(&Data[T], &Data[T])
split returns two new objects with the data split into two parts. Input: ratio -- ratio of samples to be put in the first part. Output: two new objects.
fn (ParamsReg[T]) init #
fn (mut o ParamsReg[T]) init(nb_features int)
init initializes ParamsReg with nb_features (number of features)
fn (ParamsReg[T]) backup #
fn (mut o ParamsReg[T]) backup()
backup creates an internal copy of parameters
fn (ParamsReg[T]) restore #
fn (mut o ParamsReg[T]) restore(skip_notification bool)
restore restores an internal copy of parameters and notifies observers
fn (ParamsReg[T]) set_params #
fn (mut o ParamsReg[T]) set_params(theta []T, b T)
set_params sets theta and b and notifies observers
fn (ParamsReg[T]) set_param #
fn (mut o ParamsReg[T]) set_param(i int, value T)
set_param sets either theta or b (use negative indices for b) and notifies observers. i -- index of theta, or -1 for the bias.
fn (ParamsReg[T]) get_param #
fn (o &ParamsReg[T]) get_param(i int) T
get_param returns either theta or b (use negative indices for b) i -- index of theta or -1 for bias
fn (ParamsReg[T]) set_thetas #
fn (mut o ParamsReg[T]) set_thetas(theta []T)
set_thetas sets the whole vector theta and notifies observers
fn (ParamsReg[T]) get_thetas #
fn (o &ParamsReg[T]) get_thetas() []T
get_thetas gets a copy of theta
fn (ParamsReg[T]) access_thetas #
fn (o &ParamsReg[T]) access_thetas() []T
access_thetas returns access (slice) to theta
fn (ParamsReg[T]) access_bias #
fn (o &ParamsReg[T]) access_bias() &T
access_bias returns access (pointer) to b
fn (ParamsReg[T]) set_theta #
fn (mut o ParamsReg[T]) set_theta(i int, thetai T)
set_theta sets one component of theta and notifies observers
fn (ParamsReg[T]) get_theta #
fn (o &ParamsReg[T]) get_theta(i int) T
get_theta returns the value of theta[i]
fn (ParamsReg[T]) set_bias #
fn (mut o ParamsReg[T]) set_bias(b T)
set_bias sets b and notifies observers
fn (ParamsReg[T]) get_bias #
fn (o &ParamsReg[T]) get_bias() T
get_bias gets a copy of b
fn (ParamsReg[T]) set_lambda #
fn (mut o ParamsReg[T]) set_lambda(lambda T)
set_lambda sets lambda and notifies observers
fn (ParamsReg[T]) get_lambda #
fn (o &ParamsReg[T]) get_lambda() T
get_lambda gets a copy of lambda
fn (ParamsReg[T]) set_degree #
fn (mut o ParamsReg[T]) set_degree(p int)
set_degree sets p and notifies observers
fn (ParamsReg[T]) get_degree #
fn (o &ParamsReg[T]) get_degree() int
get_degree gets a copy of p
fn (ParamsReg[T]) add_observer #
fn (mut o ParamsReg[T]) add_observer(obs util.Observer)
add_observer adds an object to the list of interested observers
fn (ParamsReg[T]) notify_update #
fn (mut o ParamsReg[T]) notify_update()
notify_update notifies observers of updates
fn (Stat[T]) name #
fn (o &Stat[T]) name() string
name returns the name of this stat object (thus defining the Observer interface)
fn (Stat[T]) update #
fn (mut o Stat[T]) update()
update computes statistics for the given data (as an Observer of Data)
fn (Stat[T]) sum_vars #
fn (mut o Stat[T]) sum_vars() ([]T, T)
sum_vars computes the sums along the columns of X and y. Output: s -- vector s = Xᵀo, sum of the columns of the X matrix: s_j = Σ_i^m o_i X_ij [n_features]; t -- scalar t = oᵀy, sum of the entries of the y vector: t = Σ_i^m o_i y_i.
fn (Stat[T]) copy_into #
fn (o &Stat[T]) copy_into(mut p Stat[T])
copy_into copies stat into p
fn (Stat[T]) str #
fn (o &Stat[T]) str() string
str is a custom str function for observers to avoid printing data
struct Data #
struct Data[T] {
pub mut:
observable util.Observable = util.Observable{}
nb_samples int // number of data points (samples). number of rows in x and y
nb_features int // number of features. number of columns in x
x &la.Matrix[T] = unsafe { nil } // [nb_samples][nb_features] x values
y []T // [nb_samples] y values [optional]
}
struct KNN #
struct KNN {
mut:
name string // name of this "observer"
data &Data[f64] = unsafe { nil }
weights map[f64]f64 // weights[class] = weight
pub mut:
neighbors []Neighbor
trained bool
}
KNN is the struct defining a K-Nearest Neighbors classifier.
fn (KNN) name #
fn (o &KNN) name() string
name returns the name of this KNN object (thus defining the Observer interface)
fn (KNN) set_weights #
fn (mut knn KNN) set_weights(weights map[f64]f64) !
set_weights will set the weights for the KNN. They default to 1.0 for every class when this function is not called.
fn (KNN) update #
fn (mut knn KNN) update()
update performs updates after the data has been changed (as an Observer)
fn (KNN) train #
fn (mut knn KNN) train()
train computes the neighbors and weights during training
fn (KNN) predict #
fn (mut knn KNN) predict(config PredictConfig) !f64
predict will find the `k` points nearest to the specified `to_pred`. If the value of `k` results in a draw - that is, a tie when determining the most frequent class among those `k` nearest neighbors (example: class 1 has 10 occurrences, class 2 has 5 and class 3 has 10) - `k` will be decreased until there are no more ties. The worst-case scenario is `k` ending up as 1. If there is still a tie when `k` is 1, the first closest neighbor is selected.
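For illustration, a sketch in which a `k` = 2 query necessarily ties (one sample per class), so the fallback to `k` = 1 decides by the nearest point (data and names are illustrative):

```v
import vsl.ml

fn main() {
	// one sample per class: any k = 2 vote is a tie
	mut data := ml.Data.from_raw_xy_sep([
		[0.0, 0.0], // class 0.0
		[2.0, 0.0], // class 1.0
	], [0.0, 1.0])!

	mut knn := ml.KNN.new(mut data, 'tie-breaking')!
	knn.train()

	// [0.9, 0.0] is closer to [0.0, 0.0], so the fallback to k = 1 yields 0.0
	pred := knn.predict(k: 2, to_pred: [0.9, 0.0])!
	println(pred) // 0.0
}
```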
fn (KNN) str #
fn (o &KNN) str() string
str is a custom str function for observers to avoid printing data
fn (KNN) get_plotter #
fn (o &KNN) get_plotter() &plot.Plot
get_plotter returns a plot.Plot struct with the data needed to plot the KNN model.
struct Kmeans #
struct Kmeans {
mut:
name string // name of this "observer"
data &Data[f64] = unsafe { nil } // x data
stat &Stat[f64] = unsafe { nil } // statistics about x (data)
nb_classes int // expected number of classes
bins &gm.Bins = unsafe { nil } // "bins" to speed up searching for data points given their coordinates (2D or 3D only at the moment)
nb_iter int // number of iterations
pub mut:
classes []int // [nb_samples] indices of classes of each sample
centroids [][]f64 // [nb_classes][nb_features] coordinates of centroids
nb_members []int // [nb_classes] number of members in each class
}
Kmeans implements the K-means model (Observer of Data)
fn (Kmeans) name #
fn (o &Kmeans) name() string
name returns the name of this Kmeans object (thus defining the Observer interface)
fn (Kmeans) update #
fn (mut o Kmeans) update()
update performs updates after the data has been changed (as an Observer)
fn (Kmeans) nb_classes #
fn (o &Kmeans) nb_classes() int
nb_classes returns the number of classes
fn (Kmeans) set_centroids #
fn (mut o Kmeans) set_centroids(xc [][]f64)
set_centroids sets the centroids (e.g. trial centroids). xc -- [nb_classes][nb_features]
fn (Kmeans) find_closest_centroids #
fn (mut o Kmeans) find_closest_centroids()
find_closest_centroids finds closest centroids to each sample
fn (Kmeans) compute_centroids #
fn (mut o Kmeans) compute_centroids()
compute_centroids updates the centroids based on the new class information (from find_closest_centroids)
fn (Kmeans) train #
fn (mut o Kmeans) train(config TrainConfig)
train trains model
fn (Kmeans) str #
fn (o &Kmeans) str() string
str is a custom str function for observers to avoid printing data
fn (Kmeans) get_plotter #
fn (o &Kmeans) get_plotter() &plot.Plot
get_plotter returns a plot.Plot struct for plotting
struct LinReg #
struct LinReg {
mut:
// main
name string // name of this "observer"
data &Data[f64] = unsafe { nil } // x-y data
// workspace
e []f64 // vector e = b⋅o + x⋅theta - y [nb_samples]
pub mut:
stat &Stat[f64] = unsafe { nil } // statistics
params &ParamsReg[f64] = unsafe { nil }
}
LinReg implements a linear regression model
fn (LinReg) name #
fn (o &LinReg) name() string
name returns the name of this LinReg object (thus defining the Observer interface)
fn (LinReg) predict #
fn (o &LinReg) predict(x []f64) f64
predict returns the model evaluation @ {x;theta,b}. Input: x -- vector of features. Output: y -- model prediction y(x).
fn (LinReg) cost #
fn (mut o LinReg) cost() f64
cost returns the cost c(x;theta,b). Input: data -- x,y data; params -- theta and b; x -- vector of features. Output: c -- total cost (model error).
fn (LinReg) gradients #
fn (mut o LinReg) gradients() ([]f64, f64)
gradients returns ∂C/∂theta and ∂C/∂b. Output: dcdtheta -- ∂C/∂theta; dcdb -- ∂C/∂b.
fn (LinReg) train #
fn (mut o LinReg) train()
train finds theta and b using the closed-form solution. Input: data -- x,y data. Output: params -- theta and b.
fn (LinReg) calce #
fn (mut o LinReg) calce()
calce calculates the e vector (saved into o.e). Output: e = b⋅o + x⋅theta - y.
fn (LinReg) str #
fn (o &LinReg) str() string
str is a custom str function for observers to avoid printing data
fn (LinReg) get_plotter #
fn (o &LinReg) get_plotter() &plot.Plot
get_plotter returns a plot.Plot struct for plotting the data and the linear regression model
struct ParamsReg #
struct ParamsReg[T] {
pub mut:
observable util.Observable
// main
theta []T // theta parameter [nb_features]
bias T // bias parameter
lambda T // regularization parameter
degree int // degree of polynomial
// backup
bkp_theta []T // copy of theta
bkp_bias T // copy of b
bkp_lambda T // copy of lambda
bkp_degree int // copy of degree
}
struct PredictConfig #
struct PredictConfig {
pub:
max_iter int
k int
to_pred []f64
}
PredictConfig holds the data needed for KNN.predict
struct Stat #
struct Stat[T] {
pub mut:
data &Data[T] = unsafe { nil } // data
name string // name of this object
min_x []T // [n_features] min x values
max_x []T // [n_features] max x values
sum_x []T // [n_features] sum of x values
mean_x []T // [n_features] mean of x values
sig_x []T // [n_features] standard deviations of x
del_x []T // [n_features] difference: max(x) - min(x)
min_y T // min of y values
max_y T // max of y values
sum_y T // sum of y values
mean_y T // mean of y values
sig_y T // standard deviation of y
del_y T // difference: max(y) - min(y)
}
Stat holds statistics about data
Note: Stat is an Observer of Data; thus, data.notify_update() will recompute stat
struct TrainConfig #
struct TrainConfig {
pub:
epochs int
tol_norm_change f64
}