Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add more parameters to control fitting, and add data checks #24

Merged
merged 2 commits into from
Feb 22, 2022

Conversation

OkonSamuel
Copy link
Member

This PR add several checks on the dataset. One of such checks is to make sure that the dataset doesn't contain more features than observations (i.e the dataset isn't wide). This PR also throws an informative error message for some hidden errors due to flawed offsetcol logic. For example, users were previously allowed (maybe unintentionally) to choose an offset column that wasn't among the features of the training dataset, leading to confusing errors.

cc. @ablaom , @rikhuijzer

@OkonSamuel
Copy link
Member Author

I feel that using @formula syntax here is unnecessary as currently one has to

  1. materialize a matrix, append the target
  2. convert the appended matrix back to a table
    then pass to the GLM lm/glm call which
  3. Materializes the appended matrix.
  4. effectively splits the target vector from the feature_matrix.

I find it unnecessary to use @formula syntax when we could just have done (1) above then passed it to the GLM lm/glm call which doesn't need to perform (3) and (4) above.
Although the main advantage of the @formula syntax is it's flexibility and the ease of use it provides the user in selecting combination of features to fit on, this doesn't apply here as the @formula syntax isn't currently exposed to the end user. Also for simple selection of features MLJ has several Transformers to do this.
As mentioned in JuliaAI/MLJModels.jl#406 (comment), the @formula syntax could make transformers easier to use.
So the @formula syntax should be dropped from this package as soon as JuliaAI/MLJModels.jl#406 (comment) is resolved.

@ablaom ablaom requested a review from rikhuijzer February 17, 2022 01:34
@rikhuijzer
Copy link
Member

I'm sorry for the late reply. I forgot about this PR.

I feel that using @formula syntax here is unnecessary
[...]
this doesn't apply here as the @formula syntax isn't currently exposed to the end user.

Hmm yes good point I guess.

src/MLJGLMInterface.jl Show resolved Hide resolved
src/MLJGLMInterface.jl Outdated Show resolved Hide resolved
src/MLJGLMInterface.jl Outdated Show resolved Hide resolved
src/MLJGLMInterface.jl Outdated Show resolved Hide resolved
src/MLJGLMInterface.jl Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
test/runtests.jl Show resolved Hide resolved
@OkonSamuel OkonSamuel merged commit 35c2bc0 into master Feb 22, 2022
@ablaom
Copy link
Member

ablaom commented Feb 24, 2022

Thank you @OkonSamuel for this extend sequence of PRs and @rikhuijzer for the informed reviews. ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants