ml #

VSL Machine Learning (vsl.ml)

VSL aims to provide a robust set of tools for scientific computing with an emphasis on performance and ease of use. In the vsl.ml module, some machine learning models are designed as observers of data, meaning they re-train automatically when data changes, while others do not require this functionality.

Key Features

  • Observers of Data: Some machine learning models in VSL act as observers, re-training automatically when data changes.
  • High Performance: Leverages V's performance optimizations and can integrate with C and Fortran libraries like OpenBLAS and LAPACK.
  • Versatile Algorithms: Supports a variety of machine learning algorithms and models.

Usage

Loading Data

The Data struct in vsl.ml is designed to hold data in matrix format for machine learning tasks. Here's a brief overview of how to use it:

Creating a Data Object

You can create a Data object using the following methods:

  • Data.new: Creates a new Data object with specified dimensions.
  • Data.from_raw_x: Creates a Data object from raw x values (without y values).
  • Data.from_raw_xy: Creates a Data object from raw x and y values combined in a single matrix.
  • Data.from_raw_xy_sep: Creates a Data object from separate x and y raw values.
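As a rough illustration of the constructors listed above, the sketch below builds the same three samples twice: once from a combined table whose last column holds y, and once from separate x and y values. It assumes the module is imported as vsl.ml and that the element type can be inferred from the f64 literals; the sample values are made up.

```v
import vsl.ml

fn main() {
	// Each row is one sample; the last column holds the y value.
	xy := [
		[1.0, 2.0, 10.0],
		[3.0, 4.0, 20.0],
		[5.0, 6.0, 30.0],
	]
	combined := ml.Data.from_raw_xy(xy)!

	// The same samples with x and y passed separately.
	x := [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
	y := [10.0, 20.0, 30.0]
	separate := ml.Data.from_raw_xy_sep(x, y)!

	println('combined: ${combined.nb_samples} samples, ${combined.nb_features} features')
	println('separate: ${separate.nb_samples} samples, ${separate.nb_features} features')
}
```

Both objects should report 3 samples and 2 features, since the y column is not counted as a feature.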

Data Methods

The Data struct has several key methods to manage and manipulate data:

  • set(x, y): Sets the x matrix and y vector and notifies observers.
  • set_y(y): Sets the y vector and notifies observers.
  • set_x(x): Sets the x matrix and notifies observers.
  • split(ratio): Splits the data into two parts based on the given ratio.
  • clone(): Returns a deep copy of the Data object without observers.
  • clone_with_same_x(): Returns a deep copy of the Data object but shares the same x reference.
  • add_observer(obs): Adds an observer to the data object.
  • notify_update(): Notifies observers of data changes.
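The following sketch exercises two of the methods above, clone and split, using only the signatures documented on this page. The data values and the 0.8 ratio are arbitrary, and how split rounds the sample count is not specified here, so the example only prints the resulting sizes.

```v
import vsl.ml

fn main() {
	// Five samples with one feature; y is the last column.
	data := ml.Data.from_raw_xy([
		[1.0, 10.0],
		[2.0, 20.0],
		[3.0, 30.0],
		[4.0, 40.0],
		[5.0, 50.0],
	])!

	// Deep copy: changing the duplicate should leave the original untouched.
	mut duplicate := data.clone()!
	duplicate.set_y([0.0, 0.0, 0.0, 0.0, 0.0])!
	println('original y:  ${data.y}')
	println('duplicate y: ${duplicate.y}')

	// Ask for roughly 80% of the samples in the first part.
	first, second := data.split(0.8)!
	println('split sizes: ${first.nb_samples} and ${second.nb_samples}')
}
```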

Stat Observer

The Stat struct is an observer of Data, providing statistical analysis of the data it observes. It automatically updates its statistics when the underlying data changes.
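A minimal sketch of this behaviour, assuming that Stat.from_data registers the new Stat as an observer of the data (if it does not, data.add_observer would be needed as well). The name 'stat' and the sample values are arbitrary.

```v
import vsl.ml

fn main() {
	// Two samples, one feature, with y in the last column.
	mut data := ml.Data.from_raw_xy([
		[1.0, 10.0],
		[3.0, 30.0],
	])!

	mut stat := ml.Stat.from_data(mut data, 'stat')
	stat.update() // compute the initial statistics explicitly
	println('mean_x = ${stat.mean_x}, mean_y = ${stat.mean_y}')

	// set_y notifies the observers of data, so stat should be recomputed.
	data.set_y([20.0, 40.0])!
	println('mean_y after set_y = ${stat.mean_y}')
}
```

If the observer is wired up as assumed, mean_y should move from 20 to 30 without a second explicit call to update.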

Observer Models

The following machine learning models in VSL are compatible with the Observer pattern. This means they can observe data changes and automatically update themselves.

K-Means Clustering

K-Means Clustering is used for unsupervised learning to group data points into clusters. As an observer model, it re-trains automatically when the data changes, which is useful for dynamic datasets that require continuous updates.

K-Nearest Neighbors (KNN)

K-Nearest Neighbors (KNN) is used for classification tasks where the target variable is categorical. As an observer model, it re-trains automatically when the data changes, which is beneficial for datasets that are frequently updated.

Non-Observer Models

The following machine learning models in VSL do not require the observer pattern and are trained once on a dataset without continuous updates.

Linear Regression

Linear Regression is used for predicting a continuous target variable based on one or more predictor variables. It is typically trained once on a dataset and used to make predictions without requiring continuous updates. Hence, it is not implemented as an observer model.

fn Data.from_raw_x #

fn Data.from_raw_x[T](xraw [][]T) !&Data[T]

Data.from_raw_x returns a new object with data set from raw x values.
Input:
  xraw -- [nb_samples][nb_features] table with x values (NO y values)
Output:
  new object

fn Data.from_raw_xy #

fn Data.from_raw_xy[T](xyraw [][]T) !&Data[T]

Data.from_raw_xy returns a new object with data set from raw xy values.
Input:
  xyraw -- [nb_samples][nb_features+1] table with x and y raw values, where the last column contains y-values
Output:
  new object

fn Data.from_raw_xy_sep #

fn Data.from_raw_xy_sep[T](xraw [][]T, yraw []T) !&Data[T]

Data.from_raw_xy_sep accepts two parameters: xraw [][]T and yraw []T. It acts similarly to Data.from_raw_xy, but instead of reading the y data from the last column of a combined table, it takes the y data from yraw.

fn Data.new #

fn Data.new[T](nb_samples int, nb_features int, use_y bool, allocate bool) !&Data[T]

Data.new returns a new object to hold ML data.
Input:
  nb_samples -- number of data samples (rows in x)
  nb_features -- number of features (columns in x)
  use_y -- use y data vector
  allocate -- allocates x (and y); otherwise, x and y must be set using the set() method
Output:
  new object

fn KNN.new #

fn KNN.new(mut data Data[f64], name string) !&KNN

KNN.new accepts a vsl.ml.Data parameter called data, which will be used to predict values with KNN.predict, and a name for the new model. You can use the following piece of code to make your life easier:
mut data := Data.from_raw_xy_sep([[0.0, 0.0], [10.0, 10.0]], [0.0, 1.0])!
mut knn := KNN.new(mut data, 'knn')!
If you then predict with knn.predict(k: 1, to_pred: [9.0, 9.0])!, it should return 1.0, as the closest sample is [10.0, 10.0] (which is class 1.0).

fn Kmeans.new #

fn Kmeans.new(mut data Data[f64], nb_classes int, name string) &Kmeans

Kmeans.new returns a new K-means model

fn LinReg.new #

fn LinReg.new(mut data Data[f64], name string) &LinReg

LinReg.new returns a new LinReg object.
Input:
  data -- x,y data
  name -- unique name of this (observer) object

fn ParamsReg.new #

fn ParamsReg.new[T](nb_features int) &ParamsReg[T]

ParamsReg.new returns a new object to hold regression parameters

fn Stat.from_data #

fn Stat.from_data[T](mut data Data[T], name string) &Stat[T]

Stat.from_data returns a new Stat object for the given data.

fn (Data[T]) set #

fn (mut o Data[T]) set(x &la.Matrix[T], y []T) !

set sets the x matrix and the y vector [optional] and notifies observers.
Input:
  x -- x values
  y -- y values [optional]

fn (Data[T]) set_y #

fn (mut o Data[T]) set_y(y []T) !

fn (Data[T]) set_x #

fn (mut o Data[T]) set_x(x &la.Matrix[T]) !

fn (Data[T]) clone #

fn (o &Data[T]) clone() !&Data[T]

clone returns a deep copy of this object removing the observers

fn (Data[T]) clone_with_same_x #

fn (o &Data[T]) clone_with_same_x() !&Data[T]

clone_with_same_x returns a deep copy of this object, but with the same reference to x, and removes the observers

fn (Data[T]) add_observer #

fn (mut o Data[T]) add_observer(obs util.Observer)

add_observer adds an object to the list of interested observers

fn (Data[T]) notify_update #

fn (mut o Data[T]) notify_update()

notify_update notifies observers of updates

fn (Data[T]) split #

fn (o &Data[T]) split(ratio f64) !(&Data[T], &Data[T])

split returns two new objects with the data split into two parts.
Input:
  ratio -- ratio of samples to be put in the first part
Output:
  two new objects holding the first and second parts

fn (ParamsReg[T]) init #

fn (mut o ParamsReg[T]) init(nb_features int)

init initializes ParamsReg with nb_features (number of features)

fn (ParamsReg[T]) backup #

fn (mut o ParamsReg[T]) backup()

backup creates an internal copy of parameters

fn (ParamsReg[T]) restore #

fn (mut o ParamsReg[T]) restore(skip_notification bool)

restore restores an internal copy of parameters and notifies observers

fn (ParamsReg[T]) set_params #

fn (mut o ParamsReg[T]) set_params(theta []T, b T)

set_params sets theta and b and notifies observers

fn (ParamsReg[T]) set_param #

fn (mut o ParamsReg[T]) set_param(i int, value T)

set_param sets either theta or b (use negative indices for b) and notifies observers.
  i -- index of theta or -1 for bias

fn (ParamsReg[T]) get_param #

fn (o &ParamsReg[T]) get_param(i int) T

get_param returns either theta or b (use negative indices for b).
  i -- index of theta or -1 for bias
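The indexing convention of set_param and get_param, together with backup and restore, can be sketched as follows. This assumes that the generic type can be given explicitly as ml.ParamsReg.new[f64](...) and that passing false for skip_notification leaves notification enabled; the parameter values are arbitrary.

```v
import vsl.ml

fn main() {
	// Two features, so theta has two components.
	mut params := ml.ParamsReg.new[f64](2)
	params.set_params([1.5, -0.5], 0.25)

	// Non-negative indices address theta; -1 addresses the bias.
	println('theta[0] = ${params.get_param(0)}')
	println('bias     = ${params.get_param(-1)}')

	// Save the current values, overwrite the bias, then roll back.
	params.backup()
	params.set_bias(9.0)
	params.restore(false) // assumption: false means observers are still notified
	println('bias after restore = ${params.get_bias()}')
}
```

After restore, the bias should be back at the value stored by backup (0.25 here).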

fn (ParamsReg[T]) set_thetas #

fn (mut o ParamsReg[T]) set_thetas(theta []T)

set_thetas sets the whole vector theta and notifies observers

fn (ParamsReg[T]) get_thetas #

fn (o &ParamsReg[T]) get_thetas() []T

get_thetas gets a copy of theta

fn (ParamsReg[T]) access_thetas #

fn (o &ParamsReg[T]) access_thetas() []T

access_thetas returns access (slice) to theta

fn (ParamsReg[T]) access_bias #

fn (o &ParamsReg[T]) access_bias() &T

access_bias returns access (pointer) to b

fn (ParamsReg[T]) set_theta #

fn (mut o ParamsReg[T]) set_theta(i int, thetai T)

set_theta sets one component of theta and notifies observers

fn (ParamsReg[T]) get_theta #

fn (o &ParamsReg[T]) get_theta(i int) T

get_theta returns the value of theta[i]

fn (ParamsReg[T]) set_bias #

fn (mut o ParamsReg[T]) set_bias(b T)

set_bias sets b and notifies observers

fn (ParamsReg[T]) get_bias #

fn (o &ParamsReg[T]) get_bias() T

get_bias gets a copy of b

fn (ParamsReg[T]) set_lambda #

fn (mut o ParamsReg[T]) set_lambda(lambda T)

set_lambda sets lambda and notifies observers

fn (ParamsReg[T]) get_lambda #

fn (o &ParamsReg[T]) get_lambda() T

get_lambda gets a copy of lambda

fn (ParamsReg[T]) set_degree #

fn (mut o ParamsReg[T]) set_degree(p int)

set_degree sets p and notifies observers

fn (ParamsReg[T]) get_degree #

fn (o &ParamsReg[T]) get_degree() int

get_degree gets a copy of p

fn (ParamsReg[T]) add_observer #

fn (mut o ParamsReg[T]) add_observer(obs util.Observer)

add_observer adds an object to the list of interested observers

fn (ParamsReg[T]) notify_update #

fn (mut o ParamsReg[T]) notify_update()

notify_update notifies observers of updates

fn (Stat[T]) name #

fn (o &Stat[T]) name() string

name returns the name of this stat object (thus defining the Observer interface)

fn (Stat[T]) update #

fn (mut o Stat[T]) update()

update computes statistics for the given data (as an Observer of Data)

fn (Stat[T]) sum_vars #

fn (mut o Stat[T]) sum_vars() ([]T, T)

sum_vars computes the sums along the columns of X and y.
Output:
  t -- scalar t = oᵀy, sum of columns of the y vector: t = Σ_i^m o_i y_i
  s -- vector s = Xᵀo, sum of columns of the X matrix: s_j = Σ_i^m o_i X_ij [n_features]

fn (Stat[T]) copy_into #

fn (o &Stat[T]) copy_into(mut p Stat[T])

copy_into copies stat into p

fn (Stat[T]) str #

fn (o &Stat[T]) str() string

str is a custom str function for observers to avoid printing data

struct Data #

@[heap]
struct Data[T] {
pub mut:
	observable  util.Observable = util.Observable{}
	nb_samples  int // number of data points (samples). number of rows in x and y
	nb_features int // number of features. number of columns in x
	x           &la.Matrix[T] = unsafe { nil } // [nb_samples][nb_features] x values
	y           []T // [nb_samples] y values [optional]
}

struct KNN #

@[heap]
struct KNN {
mut:
	name    string // name of this "observer"
	data    &Data[f64] = unsafe { nil }
	weights map[f64]f64 // weights[class] = weight
pub mut:
	neighbors []Neighbor
	trained   bool
}

KNN is the struct defining a K-Nearest Neighbors classifier.

fn (KNN) name #

fn (o &KNN) name() string

name returns the name of this KNN object (thus defining the Observer interface)

fn (KNN) set_weights #

fn (mut knn KNN) set_weights(weights map[f64]f64) !

set_weights will set the weights for the KNN. They default to 1.0 for every class when this function is not called.

fn (KNN) update #

fn (mut knn KNN) update()

update performs updates after data has been changed (as an Observer)

fn (KNN) train #

fn (mut knn KNN) train()

train computes the neighbors and weights during training

fn (KNN) predict #

fn (mut knn KNN) predict(config PredictConfig) !f64

predict finds the k points nearest to the specified to_pred. If the value of k results in a draw, that is, a tie when determining the most frequent class among those k nearest neighbors (for example, class 1 has 10 occurrences, class 2 has 5 and class 3 has 10), k is decreased until there are no more ties. In the worst case, k ends up as 1. If there is still a tie when k = 1, the first closest neighbor is selected.
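A small sketch of a prediction with k = 3, built only from the signatures on this page. The data set and the name 'knn' are made up, and the explicit train call is a precaution that may be redundant if KNN.new already trains the model.

```v
import vsl.ml

fn main() {
	// Three samples of class 0.0 near the origin, three of class 1.0 near (10, 10).
	x := [
		[0.0, 0.0],
		[1.0, 0.0],
		[0.0, 1.0],
		[10.0, 10.0],
		[9.0, 10.0],
		[10.0, 9.0],
	]
	y := [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
	mut data := ml.Data.from_raw_xy_sep(x, y)!
	mut knn := ml.KNN.new(mut data, 'knn')!
	knn.train() // possibly redundant; kept to make the training step visible

	// PredictConfig fields are passed as named arguments.
	label := knn.predict(k: 3, to_pred: [9.0, 9.0])!
	println('predicted class: ${label}')
}
```

The three nearest neighbors of [9.0, 9.0] all belong to class 1.0, so with the default weights there is no tie and the prediction should be 1.0.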

fn (KNN) str #

fn (o &KNN) str() string

str is a custom str function for observers to avoid printing data

fn (KNN) get_plotter #

fn (o &KNN) get_plotter() &plot.Plot

get_plotter returns a plot.Plot struct with the data needed to plot the KNN model.

struct Kmeans #

@[heap]
struct Kmeans {
mut:
	name       string // name of this "observer"
	data       &Data[f64] = unsafe { nil } // x data
	stat       &Stat[f64] = unsafe { nil } // statistics about x (data)
	nb_classes int // expected number of classes
	bins       &gm.Bins = unsafe { nil } // "bins" to speed up searching for data points given their coordinates (2D or 3D only at the moment)
	nb_iter    int // number of iterations
pub mut:
	classes    []int   // [nb_samples] indices of classes of each sample
	centroids  [][]f64 // [nb_classes][nb_features] coordinates of centroids
	nb_members []int   // [nb_classes] number of members in each class
}

Kmeans implements the K-means model (Observer of Data)

fn (Kmeans) name #

fn (o &Kmeans) name() string

name returns the name of this Kmeans object (thus defining the Observer interface)

fn (Kmeans) update #

fn (mut o Kmeans) update()

update performs updates after data has been changed (as an Observer)

fn (Kmeans) nb_classes #

fn (o &Kmeans) nb_classes() int

nb_classes returns the number of classes

fn (Kmeans) set_centroids #

fn (mut o Kmeans) set_centroids(xc [][]f64)

set_centroids sets centroids; e.g. trial centroids.
  xc -- [nb_classes][nb_features]

fn (Kmeans) find_closest_centroids #

fn (mut o Kmeans) find_closest_centroids()

find_closest_centroids finds closest centroids to each sample

fn (Kmeans) compute_centroids #

fn (mut o Kmeans) compute_centroids()

compute_centroids updates centroids based on new classes information (from find_closest_centroids)

fn (Kmeans) train #

fn (mut o Kmeans) train(config TrainConfig)

train trains the model using the settings in config (see TrainConfig)
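One possible training sequence, pieced together from the methods documented above: set trial centroids, assign samples to them, then train. The points, the trial centroids, the epoch count and the name 'kmeans' are all arbitrary choices for illustration.

```v
import vsl.ml

fn main() {
	// Two visually separate groups of 2D points (x values only, no y).
	mut data := ml.Data.from_raw_x([
		[1.0, 1.0],
		[1.5, 2.0],
		[1.0, 1.5],
		[8.0, 8.0],
		[9.0, 8.5],
		[8.5, 9.0],
	])!

	mut model := ml.Kmeans.new(mut data, 2, 'kmeans')
	// Trial centroids, one row per class.
	model.set_centroids([[0.0, 0.0], [10.0, 10.0]])
	model.find_closest_centroids()
	model.train(epochs: 5)

	println('classes:    ${model.classes}')
	println('centroids:  ${model.centroids}')
	println('nb_members: ${model.nb_members}')
}
```

The 2D input is deliberate: the Kmeans struct notes that its bins currently support 2D or 3D data only.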

fn (Kmeans) str #

fn (o &Kmeans) str() string

str is a custom str function for observers to avoid printing data

fn (Kmeans) get_plotter #

fn (o &Kmeans) get_plotter() &plot.Plot

get_plotter returns a plot.Plot struct for plotting

struct LinReg #

@[heap]
struct LinReg {
mut:
	// main
	name string // name of this "observer"
	data &Data[f64] = unsafe { nil } // x-y data
	// workspace
	e []f64 // vector e = b⋅o + x⋅theta - y [nb_samples]
pub mut:
	stat   &Stat[f64]      = unsafe { nil } // statistics
	params &ParamsReg[f64] = unsafe { nil }
}

LinReg implements a linear regression model

fn (LinReg) name #

fn (o &LinReg) name() string

name returns the name of this LinReg object (thus defining the Observer interface)

fn (LinReg) predict #

fn (o &LinReg) predict(x []f64) f64

predict returns the model evaluation @ {x;theta,b}.
Input:
  x -- vector of features
Output:
  y -- model prediction y(x)

fn (LinReg) cost #

fn (mut o LinReg) cost() f64

cost returns the cost c(x;theta,b).
Input:
  data -- x,y data
  params -- theta and b
  x -- vector of features
Output:
  c -- total cost (model error)

fn (LinReg) gradients #

fn (mut o LinReg) gradients() ([]f64, f64)

gradients returns ∂C/∂theta and ∂C/∂b.
Output:
  dcdtheta -- ∂C/∂theta
  dcdb -- ∂C/∂b

fn (LinReg) train #

fn (mut o LinReg) train()

train finds theta and b using the closed-form solution.
Input:
  data -- x,y data
Output:
  params -- theta and b
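As an illustration of the closed-form training step, the sketch below fits samples that lie exactly on y = 2x + 1 and then evaluates the model at a new point. The data and the name 'linreg' are invented for the example, and the regularization parameter lambda is left at its default.

```v
import vsl.ml

fn main() {
	// Samples of y = 2x + 1, with y in the last column.
	mut data := ml.Data.from_raw_xy([
		[0.0, 1.0],
		[1.0, 3.0],
		[2.0, 5.0],
		[3.0, 7.0],
	])!

	mut reg := ml.LinReg.new(mut data, 'linreg')
	reg.train()

	println('theta = ${reg.params.get_thetas()}, bias = ${reg.params.get_bias()}')
	println('prediction at x = 4: ${reg.predict([4.0])}')
}
```

Because the samples are noise-free, theta should come out close to 2, the bias close to 1, and the prediction close to 9, up to floating-point error.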

fn (LinReg) calce #

fn (mut o LinReg) calce()

calce calculates the e vector (saved into o.e).
Output:
  e = b⋅o + x⋅theta - y

fn (LinReg) str #

fn (o &LinReg) str() string

str is a custom str function for observers to avoid printing data

fn (LinReg) get_plotter #

fn (o &LinReg) get_plotter() &plot.Plot

get_plotter returns a plot.Plot struct for plotting the data and the linear regression model

struct ParamsReg #

@[heap]
struct ParamsReg[T] {
pub mut:
	observable util.Observable
	// main
	theta  []T // theta parameter [nb_features]
	bias   T   // bias parameter
	lambda T   // regularization parameter
	degree int // degree of polynomial
	// backup
	bkp_theta  []T // copy of theta
	bkp_bias   T   // copy of b
	bkp_lambda T   // copy of lambda
	bkp_degree int // copy of degree
}

struct PredictConfig #

struct PredictConfig {
pub:
	max_iter int
	k        int
	to_pred  []f64
}

PredictConfig holds the data needed for KNN.predict

struct Stat #

@[heap]
struct Stat[T] {
pub mut:
	data   &Data[T] = unsafe { nil } // data
	name   string // name of this object
	min_x  []T    // [n_features] min x values
	max_x  []T    // [n_features] max x values
	sum_x  []T    // [n_features] sum of x values
	mean_x []T    // [n_features] mean of x values
	sig_x  []T    // [n_features] standard deviations of x
	del_x  []T    // [n_features] difference: max(x) - min(x)
	min_y  T      // min of y values
	max_y  T      // max of y values
	sum_y  T      // sum of y values
	mean_y T      // mean of y values
	sig_y  T      // standard deviation of y
	del_y  T      // difference: max(y) - min(y)
}

Stat holds statistics about data

Note: Stat is an Observer of Data; thus, data.notify_update() will recompute stat

struct TrainConfig #

struct TrainConfig {
pub:
	epochs          int
	tol_norm_change f64
}