-
-
Notifications
You must be signed in to change notification settings - Fork 256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Adds Linear, Ridge and Lasso regressions #10
Conversation
[WIP] Add Linear ridge and lasso regression
Codecov Report
@@ Coverage Diff @@
## master #10 +/- ##
===========================================
- Coverage 96.68% 83.33% -13.36%
===========================================
Files 7 9 +2
Lines 181 210 +29
===========================================
Hits 175 175
- Misses 6 35 +29
Continue to review full report at Codecov.
|
@@ -13,6 +13,8 @@ keywords = ["machine-learning", "linfa", "ai", "ml"] | |||
categories = ["algorithms", "mathematics", "science"] | |||
|
|||
[dependencies] | |||
|
|||
linfa-supervised = {path = "linfa-supervised"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add a version number, or publishing the linfa
crate to crates.io will fail 😉
linfa-supervised = {path = "linfa-supervised"} | |
linfa-supervised = { path = "linfa-supervised", version = "0.1" } |
|
||
[dependencies] | ||
ndarray = { version = "0.13" , features = ["rayon"] } | ||
ndarray-linalg = { version = "0.12", features = ["openblas"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to force the usage of a certain BLAS implementation over the other ones (e.g. intel-mkl
) - I would suggest to use a features-strategy similar to what I put down here: https://github.com/rust-ndarray/ndarray-examples/blob/d7d43e018cade1f8bf0d92220081ba66963bab07/linear_regression/Cargo.toml#L8
let x_hat = array![[6.0], [7.0]]; | ||
println!("{:#?}", linear_regression.predict(&x_hat)); | ||
|
||
let mut linear_regression2 = LinearRegression::new(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest splitting this example (and the one below) in two separate examples, with a descriptive name (e.g. with/without intercept).
It would be ideal to have them in separate files as well under the examples
folder.
use ndarray::{stack, Array, Array1, ArrayBase, Axis, Data, Ix1, Ix2}; | ||
use ndarray_linalg::Solve; | ||
/* I will probably change the implementation for an enum for more type safety. | ||
I have to make sure, it is a great idea when it comes to pyhton interoperability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is considerable freedom in how to wrap the Rust version for Python consumption - as detailed in #8, we shouldn't let Python move our Rust design in directions which are not idiomatic. The wrapping code can do the bridging when required 😀
So I'd definitely suggest to go with the commented out version, which uses an enum (LinearRegression::new(false)
is much more confusing than LinearRegression::new(Intercept::NoIntercept)
).
use ndarray::{Array, Array1, ArrayBase, Data, Ix1, Ix2}; | ||
use ndarray_linalg::Solve; | ||
/* The difference between a linear regression and a Ridge regression is | ||
that ridge regression has an L2 penalisation term to having some features |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo? avoid having
?
} | ||
|
||
impl RidgeRegression { | ||
pub fn new(alpha: f64) -> RidgeRegression { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we have an intercept
parameter here as well?
@@ -0,0 +1,6 @@ | |||
trait GradientDescent { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used anywhere?
@@ -0,0 +1,16 @@ | |||
[package] | |||
name = "linfa-supervised" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably prefer this to be named linfa-linear
- we will have many more algorithm belonging to the family of supervised learning and we don't want to have all of them in the same sub-crate.
let dummy_column: Array<f64, _> = Array::ones((n_samples, 1)); | ||
let X = stack(Axis(1), &[dummy_column.view(), X.view()]).unwrap(); | ||
match &self.beta { | ||
None => panic!("The linear regression estimator has to be fitted first!"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not ideal - can we refactor the way we build LinearRegression
to make sure that beta
is always there when a user calls predict
?
The easiest way to do this is re-using the same approach I put down in KMeans
: a structure to hold the hyperparameters of the model (built with/without builder pattern, depending on how many hyperparameters we have there) and a fit
method on it that returns a fitted LinearRegression
- basically, using fit
as constructor. In this way we can be sure that beta
is there and we have one less panic
condition 😁
Thanks for working on it @Nimpruda! I see we have some issues in CI due to |
I'll work on it whenever I have time! Thanks for all the feedback |
superseded by #20 |
see #7