
Added GLM #5004

Closed · wants to merge 13 commits into from
Conversation

@Khalifa1997 (Contributor) commented Apr 12, 2020

In response to #5000.
The train_machine function is currently empty; I will fill it up as soon as I get some feedback on a few things.
In my head, I am using a DescendUpdaterWithCorrection/DescendUpdater in order to have access to the update_variable function inside my class. However, which parameters should I pass to it? I know I could use get_name() to find out which specific optimizer this is and downcast it to that optimizer, but I feel there is a better way to go about it than this.

Edit: I just saw that GLM.h has some extra Doxygen comments; I will remove them in my next commit.

@gf712 (Member) left a comment:

I think it's a good start, but I need to see how it will be implemented in train_machine!

(Several review comments on src/shogun/regression/GLM.cpp and src/shogun/regression/GLM.h, marked resolved or outdated.)
NORMAL_DISTRIBUTION,
EXPONENTIAL_DISTRIBUTION,
GAMMA_DISTRIBUTION,
BINOMIAL_DISTRIBUTION,
Member:

As I said before, I am not very sure about representing these things as enums; with that many, this will lead to spaghetti code. BUT you can leave it for now and just focus on the Poisson regression. Once that is done we can think about this again.

Contributor:

We could make the GLM class accept a GLMDistribution object instead. It will contain the log-likelihood and gradients of the given distribution (e.g., GLMDistributionPoisson). Then, the GLM class will only call the methods of the GLMDistribution to train itself.

However, this is obviously out of the scope of this PR.
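A rough sketch of that idea, using hypothetical names (GLMDistribution, GLMDistributionPoisson, log_likelihood, gradient_wrt_eta are illustrative only, not existing Shogun API):

```cpp
#include <cmath>
#include <vector>

// Hypothetical interface: each distribution knows its own log-likelihood
// and gradient, so the GLM class never has to switch on an enum.
struct GLMDistribution
{
    virtual ~GLMDistribution() = default;
    // log-likelihood of labels y given the linear predictor eta = Xw + b
    virtual double log_likelihood(const std::vector<double>& eta,
                                  const std::vector<double>& y) const = 0;
    // derivative of the per-sample log-likelihood w.r.t. eta
    virtual std::vector<double> gradient_wrt_eta(const std::vector<double>& eta,
                                                 const std::vector<double>& y) const = 0;
};

// Poisson regression with log link: lambda_i = exp(eta_i)
struct GLMDistributionPoisson : GLMDistribution
{
    double log_likelihood(const std::vector<double>& eta,
                          const std::vector<double>& y) const override
    {
        double ll = 0.0;
        for (size_t i = 0; i < eta.size(); ++i)
            ll += y[i] * eta[i] - std::exp(eta[i]); // dropping the constant log(y_i!)
        return ll;
    }

    std::vector<double> gradient_wrt_eta(const std::vector<double>& eta,
                                         const std::vector<double>& y) const override
    {
        std::vector<double> g(eta.size());
        for (size_t i = 0; i < eta.size(); ++i)
            g[i] = y[i] - std::exp(eta[i]); // d/d eta_i of (y_i*eta_i - exp(eta_i))
        return g;
    }
};
```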

Member:

That, or have locally defined helper classes that are then instantiated based on an enum.
But the code for the individual cases (e.g. Poisson) should be in a single place (likelihood contribution, gradients, etc.).

@karlnapf (Member) left a comment:

This is good progress from the first version! But the key thing now will be to write a train method. I suggest you add a log_likelihood method first, and then one that computes its gradient. Then, finally, you can call the descent updater in the train method. See Wikipedia for the expressions. As said, I would start by doing this for Poisson regression first, and then once it works we can modularise it.
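For reference, a minimal, self-contained sketch of the Poisson (log link) log-likelihood and its gradient, with the bias kept separate from the weights; the function and variable names below are made up for illustration and do not use Shogun's API:

```cpp
#include <cmath>
#include <vector>

// X: n x d design matrix (row-major), y: n labels, w: d weights, b: bias.
// Log link: lambda_i = exp(w . x_i + b).
// log L(w, b) = sum_i [ y_i * (w . x_i + b) - exp(w . x_i + b) ]  (up to log(y_i!))
double poisson_log_likelihood(const std::vector<std::vector<double>>& X,
                              const std::vector<double>& y,
                              const std::vector<double>& w, double b)
{
    double ll = 0.0;
    for (size_t i = 0; i < X.size(); ++i)
    {
        double eta = b;
        for (size_t j = 0; j < w.size(); ++j)
            eta += w[j] * X[i][j];
        ll += y[i] * eta - std::exp(eta);
    }
    return ll;
}

// Gradient: d logL / d w_j = sum_i (y_i - exp(eta_i)) * x_ij
//           d logL / d b   = sum_i (y_i - exp(eta_i))
void poisson_log_likelihood_gradient(const std::vector<std::vector<double>>& X,
                                     const std::vector<double>& y,
                                     const std::vector<double>& w, double b,
                                     std::vector<double>& grad_w, double& grad_b)
{
    grad_w.assign(w.size(), 0.0);
    grad_b = 0.0;
    for (size_t i = 0; i < X.size(); ++i)
    {
        double eta = b;
        for (size_t j = 0; j < w.size(); ++j)
            eta += w[j] * X[i][j];
        const double residual = y[i] - std::exp(eta);
        grad_b += residual;
        for (size_t j = 0; j < w.size(); ++j)
            grad_w[j] += residual * X[i][j];
    }
}
```

A train_machine implementation would then move w and b along grad_w and grad_b (or, with a DescendUpdater, minimise the negative of this log-likelihood).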

SG_ADD(&m_family, "family", "family used", ParameterProperties::SETTING);
SG_ADD(
&m_linkfn, "linkfn", "Link function used",
ParameterProperties::SETTING);
Member:

You need to initialize all values to defaults here. This brings up an interesting question: what is the default descend updater? I think it would be good to have one set so that users are not forced to pass one (tedious).

Contributor (Author):

I think since these are just linear models at the end of the day, there aren't really a lot of parameters to learn as in neural networks, so SGD would work best here, which I believe is GradientDescendUpdater.

Member:

The loss is convex (for Poisson regression with the log link function), so a second-order method like Newton will be better. But we can change that later; best is to start with something simple, then work it up from there.

@Khalifa1997 (Contributor, Author) commented Apr 13, 2020

I have updated the class; I will try to list all the changes I have made:

  1. Changed the ctor structure such that the default values are inline in the class variable declarations, so the default ctor problem no longer exists.
  2. Removed pointless comments and changed variable naming, plus added some description to values like alpha/lambda.
  3. I tried implementing log_likelihood and I know how, based on this. The only problem is that I was using SGVector and SGMatrix, but since I am going to have to use things like the dot operator, maybe using a Features class would be the better idea here (and a Labels class for the labels). But I don't know which type of Features I should be using; DotFeatures?
  4. I have yet to decide on a default DescendUpdater.

@karlnapf (Member):

I just read through the code, and many of the things we commented on are not addressed yet.

Re: your question about the features, read some other classes, e.g. ridge regression; that will make it clear.

@Khalifa1997 (Contributor, Author):

I have updated the log_likelihood function so that it takes dense features and computes the likelihood. As you can tell, though, the naming of variables is a bit off, so I am open to suggestions in that area. As for the gradient calculation, I am using this implementation; however, I am not sure whether Shogun has a sigmoid function or whether I would have to write a small function for it myself.
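In case no ready-made sigmoid is available, it is only a few lines; a numerically safe helper (hypothetical, not an existing Shogun function) could look like this:

```cpp
#include <cmath>

// Logistic sigmoid, written to avoid overflow of exp() for large |x|.
inline double sigmoid(double x)
{
    if (x >= 0.0)
        return 1.0 / (1.0 + std::exp(-x));
    const double e = std::exp(x);
    return e / (1.0 + e);
}
```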

Comment on lines +19 to +22
SG_ADD(
(std::shared_ptr<SGObject>*)&m_descend_updater, "descend_updater",
"Descend Updater used for updating weights",
ParameterProperties::SETTING);
Contributor (Author):

As for those lines, I think that's what's causing the build to fail.

Contributor (Author):

any thoughts? @karlnapf

Member:

Not sure what you mean

Contributor (Author):

I get an error when I test, as you can see in the CI, in the SGObject test, so I have a feeling this is what's causing it. What do you think?

Member:

When you report such a thing, could you please make sure that you copy here the link to the CI line where you think the error is...

Member:

The issue is probably that m_descend_updater (DescendUpdater) is never initialised, so you are serialising a nullptr, and I think that might be causing issues... @vigsterkr ?

@vigsterkr (Member) left a comment:

initial problems...

(Several review comments on src/shogun/regression/GLM.cpp, marked resolved or outdated.)
SGVector<float64_t> feature_vector = features->get_feature_vector(i);
// Assume beta is the same as the feature vector
SGVector<float64_t> beta = feature_vector.clone();
// Assume beta0 is the same as the first element in the feature vector
Member:

We don't do that; this is a trick to make the math look easier, but we will treat bias and weights separately, please.

Contributor (Author):

you mean use bias and m_w for weights?

Contributor:

yes, for example; the point is, you should not assume the bias is the first element of the feature vector

Contributor (Author):

@lgoetz yes I have changed that in my last commit thanks :)

ASSERT(vector_count > 0 && label->get_num_labels() == vector_count)
// Array of Lambdas
SGVector<float64_t> lambda(vector_count);
for (auto i = 0; i < vector_count; i++)
Member:

You don't need to loop here if using dense features; you can do a matrix-vector product.

Contributor (Author):

There is no log operation in linalg, thus I don't think it can be done without a loop.
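Shogun's linalg wrappers are backed by Eigen, so as a sketch of how both the dot-product loop and the elementwise log could be vectorised, here written directly against Eigen (the names X, w, b and the softplus-style lambda are assumptions taken from the surrounding code, not the final design):

```cpp
#include <Eigen/Dense>

// X: n x d feature matrix, w: d weights, b: bias (hypothetical names).
// One matrix-vector product replaces the per-sample dot-product loop,
// and the array operations (exp, log) are elementwise, mirroring the
// lambda / log_lambda loops in the current code.
Eigen::VectorXd log_lambda_vectorized(const Eigen::MatrixXd& X,
                                      const Eigen::VectorXd& w, double b)
{
    Eigen::ArrayXd eta = (X * w).array() + b;         // all dot products at once
    Eigen::ArrayXd lambda = (1.0 + eta.exp()).log();  // softplus, elementwise
    return lambda.log().matrix();                     // elementwise log of lambda
}
```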

SGVector<float64_t> log_lambda(vector_count);

for (auto i = 0; i < vector_count; i++)
log_lambda.set_element(log(lambda.get_element(i)), i);
Member:

Don't we have elementwise log in linalg? If not, we should have it.

Contributor (Author):

I don't think there is. I checked the documentation and didn't find anything for it.

Contributor:

see src/shogun/distributions/LinearHMM.cpp for example

Member:

If it is not in linalg, add it there. We have something for elementwise ops in there iirc.
I don't think LinearHMM is a good place to look .. It is way too old school ;)

Contributor (Author):

@karlnapf, should I start another PR and add it?

Contributor:

> If it is not in linalg, add it there. We have something for elementwise ops in there iirc.
> I don't think LinearHMM is a good place to look .. It is way too old school ;)

sorry ;)

@karlnapf (Member) left a comment:

Well done on adding the log-likelihood, though you should vectorize it.

The next step is to implement the gradient of that.

@Khalifa1997 (Contributor, Author) commented Apr 16, 2020

Now I am going to briefly sum up what I have added here: the expected result of log_likelihood_derivative should have length len(beta) + 1 (for the beta0 element), so I have made a result array for that.
Now I am facing two problems at the moment:

  1. There is no log function in linalg.
  2. There is no elementwise division in linalg.

Now, on line 70 in the .cpp file, it should have been y*s / q, but since there is no linalg support for this I would have to do it without vectorization, which would make the code much, much uglier (a vectorised sketch is shown below).

So I would love some feedback on what to do now, thanks!

N.B.: implementing grad_beta should be very straightforward once the issues I have mentioned are solved.
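As a sketch of what the vectorised term above could look like once elementwise operations are available, again written against Eigen directly (y, s and q are the hypothetical vectors from the comment):

```cpp
#include <Eigen/Dense>

// Elementwise y * s / q, i.e. the term that is currently awkward to write
// without elementwise multiplication and division.
Eigen::VectorXd weighted_ratio(const Eigen::VectorXd& y,
                               const Eigen::VectorXd& s,
                               const Eigen::VectorXd& q)
{
    return (y.array() * s.array() / q.array()).matrix();
}
```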

I misunderstood likelihood to be an array, but it's supposed to be a number (per beta0, beta), thus I cleaned this up and changed things to likelihood_derivative so that it runs.
float64_t beta0 = LinearMachine::get_bias();
for (auto i = 0; i < vector_count; i++)
{
SGVector<float64_t> feature_vector = features->get_feature_vector(i);
Member:

auto

Contributor (Author):

Done.

for (auto i = 0; i < vector_count; i++)
{
SGVector<float64_t> feature_vector = features->get_feature_vector(i);
float64_t res = linalg::dot(feature_vector, beta);
Member:

Don't do a loop here but a matrix-vector product.

Contributor (Author):

Done.

const std::shared_ptr<DenseFeatures<float64_t>>& features,
const std::shared_ptr<Labels>& label)
{
auto vector_count = features->get_num_vectors();
Member:

It would be good if you could modularise this.
The log-likelihood consists of multiple parts (activation, data terms, etc.); you could put each of those into a helper method to make it more readable.

Contributor (Author):

Okay, I will do that in my next commit, thanks :)

SGVector<float64_t> q(vector_count);
for (auto i = 0; i < vector_count; i++)
{
q.set_element(log(1 + std::exp(z.get_element(0, i))), i);
Member:

q[i] z[i]

Member:

And use linalg?

Contributor (Author):

@gf712 there is no linalg for log :S.
And by your comment, @karlnapf, do you mean this is how I should get the elements?
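One caveat about the expression log(1 + exp(z)) discussed above: exp(z) overflows for large z. A numerically stable softplus (a generic helper, not an existing Shogun or linalg function) is:

```cpp
#include <cmath>

// softplus(z) = log(1 + exp(z)), rewritten so that exp() never overflows:
// for z > 0, log(1 + exp(z)) = z + log(1 + exp(-z)).
inline double softplus(double z)
{
    if (z > 0.0)
        return z + std::log1p(std::exp(-z));
    return std::log1p(std::exp(z));
}
```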

@karlnapf (Member) left a comment:

Cool, great that you have the gradient in place now

You should slowly start thinking about a unit test where you compare likelihood / gradient values with a reference implementation on some fixed simple data

Also made some minor comments
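Such a test might look roughly like the sketch below. The GLM default constructor, the set_w/set_bias and log_likelihood calls, and their visibility are assumptions based on the current state of this PR, and the expected value is a placeholder that would have to be computed by hand or with an independent reference implementation:

```cpp
#include <gtest/gtest.h>

#include <shogun/features/DenseFeatures.h>
#include <shogun/labels/RegressionLabels.h>
#include <shogun/regression/GLM.h>

using namespace shogun;

TEST(GLM, poisson_log_likelihood_matches_reference)
{
    // 2 features x 3 samples of fixed toy data
    SGMatrix<float64_t> data(2, 3);
    data(0, 0) = 1.0; data(1, 0) = 2.0;
    data(0, 1) = 0.5; data(1, 1) = 1.0;
    data(0, 2) = 2.0; data(1, 2) = 0.0;
    auto features = std::make_shared<DenseFeatures<float64_t>>(data);

    SGVector<float64_t> y(3);
    y[0] = 1.0; y[1] = 0.0; y[2] = 2.0;
    auto labels = std::make_shared<RegressionLabels>(y);

    auto glm = std::make_shared<GLM>();
    SGVector<float64_t> w(2);
    w[0] = 0.1; w[1] = -0.2;
    glm->set_w(w);
    glm->set_bias(0.05);

    // Reference value computed independently (by hand or e.g. with numpy);
    // the number below is a placeholder, not a real reference value.
    const float64_t expected = 0.0;
    EXPECT_NEAR(glm->log_likelihood(features, labels), expected, 1e-9);
}
```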


SGVector<float64_t> lambda(vector_count);
SGVector<float64_t> beta = LinearMachine::get_w();
float64_t beta0 = LinearMachine::get_bias();
Member:

Do we override get_w/get_bias? As in is there a particular reason why you specify LinearMachine::?

@Khalifa1997 (Contributor, Author) Apr 16, 2020:

For some reason, I get "get_w not declared in this scope" without me doing so.

Edit: now it works after I fixed some of my includes... I guess :D

Member:

I'm not so sure how includes changed things...

auto feature_count = features->get_num_features();
ASSERT(vector_count > 0 && label->get_num_labels() == vector_count)
SGVector<float64_t> result(vector_count + 1);
SGVector<float64_t> beta = LinearMachine::get_w();
Member:

auto....

Member:

get_w() should just work.

ASSERT(vector_count > 0 && label->get_num_labels() == vector_count)
SGVector<float64_t> result(vector_count + 1);
SGVector<float64_t> beta = LinearMachine::get_w();
float64_t beta0 = LinearMachine::get_bias();
Member:

auto

SGVector<float64_t> result(vector_count + 1);
SGVector<float64_t> beta = LinearMachine::get_w();
float64_t beta0 = LinearMachine::get_bias();
SGMatrix<float64_t> z = linalg::matrix_prod(
Member:

auto

{
q[i] = log(1 + exponent[i]);
}
float64_t beta0_grad =
Member:

auto

m_link_fn = link_fn;
m_descend_updater = descend_updater;
m_family = family;
init();
Member:

Remove it once you've fixed the ctor I've mentioned above.

GLM(const std::shared_ptr<DescendUpdater>& descend_updater,
DistributionFamily family, LinkFunction link_fn, float64_t tau);

virtual ~GLM(){};
Member:

I have explicitly stated that these should be override instead of virtual. You have marked this resolved, but it has not been changed at all...

* @param data training data
* @return whether training was successful
*/
virtual bool train_machine(std::shared_ptr<Features> data = NULL)
Member:

It's still protected... and as said before, if you were using override then you would actually get an error.

};

/** @return object name */
virtual const char* get_name() const
Member:

override...

MACHINE_PROBLEM_TYPE(PT_REGRESSION);

GLM();
float64_t log_likelihood(
Member:

This is nitpicking, but still: ctors should come first and then class methods... although I'm not sure whether these really need to be public?

Added Autos wherever possible + changes to ctor + override
const std::shared_ptr<DenseFeatures<float64_t>>& features,
const std::shared_ptr<Labels>& label);

virtual bool
Member:

you can just drop virtual

};

/** @return object name */
virtual const char* get_name() const override
Member:

you can drop virtual here as well.... but get_name is public.

GLM(const std::shared_ptr<DescendUpdater>& descend_updater,
DistributionFamily family, LinkFunction link_fn, float64_t tau);

virtual ~GLM() override{};
Member:

you can drop virtual...
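Taken together, the override/virtual and visibility comments above amount to a declaration style roughly like this (a sketch only; the member list is hypothetical):

```cpp
#include <shogun/machine/LinearMachine.h>

namespace shogun
{
    class GLM : public LinearMachine
    {
    public:
        // ctors first, then methods
        GLM();
        ~GLM() override = default;

        // get_name stays public; override replaces virtual
        const char* get_name() const override
        {
            return "GLM";
        }

    protected:
        // train_machine stays protected and overrides the base class method
        bool train_machine(std::shared_ptr<Features> data = nullptr) override;
    };
}
```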

auto exponent = linalg::exponent(res);
for (auto i = 0; i < vector_count; i++)
{
lambda[i] = log(1 + exponent[i]);
Member:

std::

@gf712 (Member) commented Aug 4, 2020

Implemented in #5005

@gf712 closed this Aug 4, 2020