
Added GLM #5004

Closed · wants to merge 13 commits into from
Conversation

@Khalifa1997 (Contributor) commented Apr 12, 2020

In response to #5000.
The train_machine function is currently empty; I will fill it up as soon as I get some feedback on a few things.
In my head, I am using a DescendUpdaterWithCorrection/DescendUpdater in order to have access to the update_variable function inside my class. However, which parameters should I pass to it? I know I could use get_name() to find out which specific optimizer this is and downcast it to that optimizer, but I feel there is a better way to go about it than this.

Edit: I just saw that GLM.h has some extra Doxygen comments; I will remove them in my next commit.

@gf712 (Member) left a comment:

I think it's a good start, but I need to see how it will be implemented in train_machine!

(Several review comments on src/shogun/regression/GLM.cpp and src/shogun/regression/GLM.h, marked resolved or outdated.)
NORMAL_DISTRIBUTION,
EXPONENTIAL_DISTRIBUTION,
GAMMA_DISTRIBUTION,
BINOMIAL_DISTRIBUTION,
Member:

As I said before, I am not very sure about representing these things as enums; with that many, this will lead to spaghetti code. BUT you can leave it for now and just focus on the Poisson regression. Once that is done we can think about this again.

Contributor:

We could make the GLM class accept a GLMDistribution object instead. It will contain the log-likelihood and gradients of the given distribution (e.g., GLMDistributionPoisson). Then, the GLM class will only call the methods of the GLMDistribution to train itself.

However, this is obviously out of the scope of this PR.
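A rough sketch of that idea, using hypothetical names (GLMDistribution, GLMDistributionPoisson, log_likelihood, gradient_wrt_eta are illustrative only, not existing Shogun API):

```cpp
#include <cmath>
#include <vector>

// Hypothetical interface: each distribution knows its own log-likelihood
// and gradient, so the GLM class never has to switch on an enum.
struct GLMDistribution
{
    virtual ~GLMDistribution() = default;
    // log-likelihood of labels y given the linear predictor eta = Xw + b
    virtual double log_likelihood(const std::vector<double>& eta,
                                  const std::vector<double>& y) const = 0;
    // derivative of the per-sample log-likelihood w.r.t. eta
    virtual std::vector<double> gradient_wrt_eta(const std::vector<double>& eta,
                                                 const std::vector<double>& y) const = 0;
};

// Poisson regression with log link: lambda_i = exp(eta_i)
struct GLMDistributionPoisson : GLMDistribution
{
    double log_likelihood(const std::vector<double>& eta,
                          const std::vector<double>& y) const override
    {
        double ll = 0.0;
        for (size_t i = 0; i < eta.size(); ++i)
            ll += y[i] * eta[i] - std::exp(eta[i]); // dropping the constant log(y_i!)
        return ll;
    }

    std::vector<double> gradient_wrt_eta(const std::vector<double>& eta,
                                         const std::vector<double>& y) const override
    {
        std::vector<double> g(eta.size());
        for (size_t i = 0; i < eta.size(); ++i)
            g[i] = y[i] - std::exp(eta[i]); // d/d eta_i of (y_i*eta_i - exp(eta_i))
        return g;
    }
};
```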

Member:

That, or have locally defined helper classes that are then instantiated based on an enum.
But the code for the individual cases (e.g. Poisson) should be in a single place (likelihood contribution, gradients, etc.).

@karlnapf (Member) left a comment:

This is good progress from the first version! But the key thing now will be to write a train method. I suggest you add a log_likelihood method first, and then one that computes its gradient. Then, finally, you can call the descent updater in the train method. See Wikipedia for the expressions. As said, I would start by doing this for Poisson regression first, and then once it works we can modularise it.
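For reference, a minimal, self-contained sketch of the Poisson (log link) log-likelihood and its gradient, with the bias kept separate from the weights; the function and variable names below are made up for illustration and do not use Shogun's API:

```cpp
#include <cmath>
#include <vector>

// X: n x d design matrix (row-major), y: n labels, w: d weights, b: bias.
// Log link: lambda_i = exp(w . x_i + b).
// log L(w, b) = sum_i [ y_i * (w . x_i + b) - exp(w . x_i + b) ]  (up to log(y_i!))
double poisson_log_likelihood(const std::vector<std::vector<double>>& X,
                              const std::vector<double>& y,
                              const std::vector<double>& w, double b)
{
    double ll = 0.0;
    for (size_t i = 0; i < X.size(); ++i)
    {
        double eta = b;
        for (size_t j = 0; j < w.size(); ++j)
            eta += w[j] * X[i][j];
        ll += y[i] * eta - std::exp(eta);
    }
    return ll;
}

// Gradient: d logL / d w_j = sum_i (y_i - exp(eta_i)) * x_ij
//           d logL / d b   = sum_i (y_i - exp(eta_i))
void poisson_log_likelihood_gradient(const std::vector<std::vector<double>>& X,
                                     const std::vector<double>& y,
                                     const std::vector<double>& w, double b,
                                     std::vector<double>& grad_w, double& grad_b)
{
    grad_w.assign(w.size(), 0.0);
    grad_b = 0.0;
    for (size_t i = 0; i < X.size(); ++i)
    {
        double eta = b;
        for (size_t j = 0; j < w.size(); ++j)
            eta += w[j] * X[i][j];
        const double residual = y[i] - std::exp(eta);
        grad_b += residual;
        for (size_t j = 0; j < w.size(); ++j)
            grad_w[j] += residual * X[i][j];
    }
}
```

A train_machine implementation would then move w and b along grad_w and grad_b (or, with a DescendUpdater, minimise the negative of this log-likelihood).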

SG_ADD(&m_family, "family", "family used", ParameterProperties::SETTING);
SG_ADD(
&m_linkfn, "linkfn", "Link function used",
ParameterProperties::SETTING);
Member:

You need to initialize all values to defaults here. This brings up an interesting question: what is the default descend updater? I think it would be good to have one set so that users are not forced to pass one (tedious).

Contributor (Author):

I think since these are just linear models at the end of the day, there aren't really a lot of parameters to learn as in neural networks, so SGD would work best here, which I believe is GradientDescendUpdater.

Member:

The loss is convex (for Poisson regression with the log link function), so a second-order method like Newton will be better. But we can change that later; best is to start with something simple, then work it up from there.

@Khalifa1997 (Contributor, Author) commented Apr 13, 2020

I have updated the class; I will try to list all the changes I have made:

  1. Changed the ctor structure such that the default values are inline in the class variable declarations, so the default ctor problem no longer exists.
  2. Removed pointless comments and changed variable naming, plus added some description to values like alpha/lambda.
  3. I tried implementing log_likelihood and I know how, based on this. The only problem is that I was using SGVector and SGMatrix, but since I am going to have to use things like the dot operator, maybe using a Features class would be the better idea here (and a Labels class for the labels). But I don't know which type of Features I should be using; DotFeatures?
  4. I have yet to decide on a default DescendUpdater.

@karlnapf (Member):

I just read through the code, and many of the things we commented on are not addressed yet.

Re: your question about the features, read some other classes, e.g. ridge regression; that will make it clear.

@Khalifa1997 (Contributor, Author):

I have updated the log_likelihood function so that it takes dense features and computes the likelihood. As you can tell, though, the naming of variables is a bit off, so I am open to suggestions in that area. As for the gradient calculation, I am using this implementation; however, I am not sure whether Shogun has a sigmoid function or whether I would have to write a small function for it myself.
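In case no ready-made sigmoid is available, it is only a few lines; a numerically safe helper (hypothetical, not an existing Shogun function) could look like this:

```cpp
#include <cmath>

// Logistic sigmoid, written to avoid overflow of exp() for large |x|.
inline double sigmoid(double x)
{
    if (x >= 0.0)
        return 1.0 / (1.0 + std::exp(-x));
    const double e = std::exp(x);
    return e / (1.0 + e);
}
```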

Comment on lines +19 to +22
SG_ADD(
(std::shared_ptr<SGObject>*)&m_descend_updater, "descend_updater",
"Descend Updater used for updating weights",
ParameterProperties::SETTING);
Contributor (Author):

As for those lines, I think that's what's causing the build to fail.

Contributor (Author):

any thoughts? @karlnapf

Member:

Not sure what you mean

Contributor (Author):

I get an error when I test, as you can see in the CI, in the SGObject test, so I have a feeling this is what's causing it. What do you think?

Member:

When you report such a thing, could you please make sure that you copy here the link to the CI line where you think the error is...

Member:

The issue is probably that m_descend_updater (DescendUpdater) is never initialised, so you are serialising a nullptr, and I think that might be causing issues... @vigsterkr ?

@vigsterkr (Member) left a comment:

initial problems...

(Several review comments on src/shogun/regression/GLM.cpp, marked resolved or outdated.)
SGVector<float64_t> feature_vector = features->get_feature_vector(i);
// Assume beta is the same as the feature vector
SGVector<float64_t> beta = feature_vector.clone();
// Assume beta0 is the same as the first element in the feature vector
Member:

We don't do that; this is a trick to make the math look easier, but we will treat bias and weights separately, please.

Contributor (Author):

you mean use bias and m_w for weights?

Contributor:

yes, for example; the point is, you should not assume the bias is the first element of the feature vector

Contributor (Author):

@lgoetz yes I have changed that in my last commit thanks :)

ASSERT(vector_count > 0 && label->get_num_labels() == vector_count)
// Array of Lambdas
SGVector<float64_t> lambda(vector_count);
for (auto i = 0; i < vector_count; i++)
Member:

You don't need to loop here if using dense features; you can do a matrix-vector product.

Contributor (Author):

There is no log operation in linalg, thus I don't think it can be done without a loop.
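Shogun's linalg wrappers are backed by Eigen, so as a sketch of how both the dot-product loop and the elementwise log could be vectorised, here written directly against Eigen (the names X, w, b and the softplus-style lambda are assumptions taken from the surrounding code, not the final design):

```cpp
#include <Eigen/Dense>

// X: n x d feature matrix, w: d weights, b: bias (hypothetical names).
// One matrix-vector product replaces the per-sample dot-product loop,
// and the array operations (exp, log) are elementwise, mirroring the
// lambda / log_lambda loops in the current code.
Eigen::VectorXd log_lambda_vectorized(const Eigen::MatrixXd& X,
                                      const Eigen::VectorXd& w, double b)
{
    Eigen::ArrayXd eta = (X * w).array() + b;         // all dot products at once
    Eigen::ArrayXd lambda = (1.0 + eta.exp()).log();  // softplus, elementwise
    return lambda.log().matrix();                     // elementwise log of lambda
}
```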

SGVector<float64_t> log_lambda(vector_count);

for (auto i = 0; i < vector_count; i++)
log_lambda.set_element(log(lambda.get_element(i)), i);
Member:

Don't we have elementwise log in linalg? If not, we should have it.

Contributor (Author):

I don't think there is. I checked the documentation and didn't find anything for it.

Contributor:

see src/shogun/distributions/LinearHMM.cpp for example

Member:

If it is not in linalg, add it there. We have something for elementwise ops in there iirc.
I don't think LinearHMM is a good place to look .. It is way too old school ;)

Contributor (Author):

@karlnapf, should I start another PR and add it?

Contributor:

> If it is not in linalg, add it there. We have something for elementwise ops in there iirc.
> I don't think LinearHMM is a good place to look .. It is way too old school ;)

sorry ;)

@karlnapf (Member) left a comment:

Well done on adding the log-likelihood, though you should vectorize it.

The next step is to implement the gradient of that.

@Khalifa1997 (Contributor, Author) commented Apr 16, 2020

Now I am going to briefly sum up what I have added here: the expected result of log_likelihood_derivative should have length len(beta) + 1 (for the beta0 element), so I have made a result array for that.
Now I am facing two problems at the moment:

  1. There is no log function in linalg.
  2. There is no elementwise division in linalg.

Now, on line 70 in the .cpp file, it should have been y*s / q, but since there is no linalg support for this I would have to do it without vectorization, which would make the code much, much uglier (a vectorised sketch is shown below).

So I would love some feedback on what to do now, thanks!

N.B.: implementing grad_beta should be very straightforward once the issues I have mentioned are solved.
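As a sketch of what the vectorised term above could look like once elementwise operations are available, again written against Eigen directly (y, s and q are the hypothetical vectors from the comment):

```cpp
#include <Eigen/Dense>

// Elementwise y * s / q, i.e. the term that is currently awkward to write
// without elementwise multiplication and division.
Eigen::VectorXd weighted_ratio(const Eigen::VectorXd& y,
                               const Eigen::VectorXd& s,
                               const Eigen::VectorXd& q)
{
    return (y.array() * s.array() / q.array()).matrix();
}
```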

I misunderstood likelihood to be an array, but it's supposed to be a number (per beta0, beta), thus I cleaned this up and changed things to likelihood_derivative so that it runs.
float64_t beta0 = LinearMachine::get_bias();
for (auto i = 0; i < vector_count; i++)
{
SGVector<float64_t> feature_vector = features->get_feature_vector(i);
Member:

auto

Contributor (Author):

Done.

for (auto i = 0; i < vector_count; i++)
{
SGVector<float64_t> feature_vector = features->get_feature_vector(i);
float64_t res = linalg::dot(feature_vector, beta);
Member:

Don't do a loop here but a matrix-vector product.

Contributor (Author):

Done.

const std::shared_ptr<DenseFeatures<float64_t>>& features,
const std::shared_ptr<Labels>& label)
{
auto vector_count = features->get_num_vectors();
Member:

It would be good if you could modularise this.
The log-likelihood consists of multiple parts (activation, data terms, etc.); you could put each of those into a helper method to make it more readable.

Contributor (Author):

Okay, I will do that in my next commit, thanks :)

SGVector<float64_t> q(vector_count);
for (auto i = 0; i < vector_count; i++)
{
q.set_element(log(1 + std::exp(z.get_element(0, i))), i);
Member:

q[i] z[i]

Member:

And use linalg?

Contributor (Author):

@gf712 there is no linalg for log :S.
And by your comment, @karlnapf, do you mean this is how I should get the elements?
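One caveat about the expression log(1 + exp(z)) discussed above: exp(z) overflows for large z. A numerically stable softplus (a generic helper, not an existing Shogun or linalg function) is:

```cpp
#include <cmath>

// softplus(z) = log(1 + exp(z)), rewritten so that exp() never overflows:
// for z > 0, log(1 + exp(z)) = z + log(1 + exp(-z)).
inline double softplus(double z)
{
    if (z > 0.0)
        return z + std::log1p(std::exp(-z));
    return std::log1p(std::exp(z));
}
```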

@karlnapf (Member) left a comment:

Cool, great that you have the gradient in place now

You should slowly start thinking about a unit test where you compare likelihood / gradient values with a reference implementation on some fixed simple data

Also made some minor comments
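Such a test might look roughly like the sketch below. The GLM default constructor, the set_w/set_bias and log_likelihood calls, and their visibility are assumptions based on the current state of this PR, and the expected value is a placeholder that would have to be computed by hand or with an independent reference implementation:

```cpp
#include <gtest/gtest.h>

#include <shogun/features/DenseFeatures.h>
#include <shogun/labels/RegressionLabels.h>
#include <shogun/regression/GLM.h>

using namespace shogun;

TEST(GLM, poisson_log_likelihood_matches_reference)
{
    // 2 features x 3 samples of fixed toy data
    SGMatrix<float64_t> data(2, 3);
    data(0, 0) = 1.0; data(1, 0) = 2.0;
    data(0, 1) = 0.5; data(1, 1) = 1.0;
    data(0, 2) = 2.0; data(1, 2) = 0.0;
    auto features = std::make_shared<DenseFeatures<float64_t>>(data);

    SGVector<float64_t> y(3);
    y[0] = 1.0; y[1] = 0.0; y[2] = 2.0;
    auto labels = std::make_shared<RegressionLabels>(y);

    auto glm = std::make_shared<GLM>();
    SGVector<float64_t> w(2);
    w[0] = 0.1; w[1] = -0.2;
    glm->set_w(w);
    glm->set_bias(0.05);

    // Reference value computed independently (by hand or e.g. with numpy);
    // the number below is a placeholder, not a real reference value.
    const float64_t expected = 0.0;
    EXPECT_NEAR(glm->log_likelihood(features, labels), expected, 1e-9);
}
```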


SGVector<float64_t> lambda(vector_count);
SGVector<float64_t> beta = LinearMachine::get_w();
float64_t beta0 = LinearMachine::get_bias();
Member:

Do we override get_w/get_bias? As in is there a particular reason why you specify LinearMachine::?

@Khalifa1997 (Contributor, Author) Apr 16, 2020:

For some reason, I get "get_w not declared in this scope" without me doing so.

Edit: now it works after I fixed some of my includes... I guess :D

Member:

I'm not so sure how includes changed things...

auto feature_count = features->get_num_features();
ASSERT(vector_count > 0 && label->get_num_labels() == vector_count)
SGVector<float64_t> result(vector_count + 1);
SGVector<float64_t> beta = LinearMachine::get_w();
Member:

auto....

Member:

get_w() should just work.

ASSERT(vector_count > 0 && label->get_num_labels() == vector_count)
SGVector<float64_t> result(vector_count + 1);
SGVector<float64_t> beta = LinearMachine::get_w();
float64_t beta0 = LinearMachine::get_bias();
Member:

auto

SGVector<float64_t> result(vector_count + 1);
SGVector<float64_t> beta = LinearMachine::get_w();
float64_t beta0 = LinearMachine::get_bias();
SGMatrix<float64_t> z = linalg::matrix_prod(
Member:

auto

{
q[i] = log(1 + exponent[i]);
}
float64_t beta0_grad =
Member:

auto

m_link_fn = link_fn;
m_descend_updater = descend_updater;
m_family = family;
init();
Member:

Remove it once you've fixed the ctor I've mentioned above.

GLM(const std::shared_ptr<DescendUpdater>& descend_updater,
DistributionFamily family, LinkFunction link_fn, float64_t tau);

virtual ~GLM(){};
Member:

I have explicitly stated that these should be override instead of virtual. You have marked this resolved, but it has not been changed at all...

* @param data training data
* @return whether training was successful
*/
virtual bool train_machine(std::shared_ptr<Features> data = NULL)
Member:

It's still protected... and as said before, if you were using override then you would actually get an error.

};

/** @return object name */
virtual const char* get_name() const
Member:

override...

MACHINE_PROBLEM_TYPE(PT_REGRESSION);

GLM();
float64_t log_likelihood(
Member:

This is nitpicking, but still: ctors should come first and then class methods... although I'm not sure whether these really need to be public?

Added Autos wherever possible + changes to ctor + override
const std::shared_ptr<DenseFeatures<float64_t>>& features,
const std::shared_ptr<Labels>& label);

virtual bool
Member:

you can just drop virtual

};

/** @return object name */
virtual const char* get_name() const override
Member:

you can drop virtual here as well.... but get_name is public.

GLM(const std::shared_ptr<DescendUpdater>& descend_updater,
DistributionFamily family, LinkFunction link_fn, float64_t tau);

virtual ~GLM() override{};
Member:

you can drop virtual...
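Taken together, the override/virtual and visibility comments above amount to a declaration style roughly like this (a sketch only; the member list is hypothetical):

```cpp
#include <shogun/machine/LinearMachine.h>

namespace shogun
{
    class GLM : public LinearMachine
    {
    public:
        // ctors first, then methods
        GLM();
        ~GLM() override = default;

        // get_name stays public; override replaces virtual
        const char* get_name() const override
        {
            return "GLM";
        }

    protected:
        // train_machine stays protected and overrides the base class method
        bool train_machine(std::shared_ptr<Features> data = nullptr) override;
    };
}
```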

auto exponent = linalg::exponent(res);
for (auto i = 0; i < vector_count; i++)
{
lambda[i] = log(1 + exponent[i]);
Member:

std::

@gf712 (Member) commented Aug 4, 2020

Implemented in #5005

@gf712 closed this Aug 4, 2020