Revive this? #8

Open
danpovey opened this issue Sep 5, 2018 · 23 comments

Comments

@danpovey
Member

danpovey commented Sep 5, 2018

@njoly @bodgerer I'm wondering how much you guys know about GridEngine internals, or if you know anyone else who might? Dave Love seems to have disappeared again, and I'm wondering who else might have a deep knowledge of GridEngine and be willing to maintain a GitHub-based version of the project?

@addylaszlokovacs

addylaszlokovacs commented Sep 5, 2018 via email

@ppoilbarbe

Hello,
I have sent patches in the past, but I don't know the internals well enough to be an efficient maintainer.
If I'm chiming in, it's because I don't understand why maintaining a GitHub-based version of the project would be any different from using another SCM, except that on GitHub it is more visible and easier to clone than the previous one (or previous ones, if I understood correctly).
As for SGE, I am more of a user, with a cluster of about 150 heterogeneous nodes to administer (desktops and rack servers in a datacenter).

@danpovey
Member Author

danpovey commented Sep 5, 2018

It's not really about the specific SCM, it's about who will pick up the torch when Dave stops maintaining it, which may have already happened.

@mightybigcar

mightybigcar commented Sep 5, 2018 via email

@bodgerer
Contributor

bodgerer commented Sep 6, 2018

Hi all,

I've worked on a few of the subsystems within gridengine, fixing a few horrible problems, and am happy to share what I've learned and help out if I can. As for maintaining a GitHub-based fork, I'm not sure my day job would give me the time (SGE admin since version 5.3; I currently have 3 SoGE clusters / 750 nodes).

I might be seeing Dave later this month, but he did write on the list that he'd be happy for someone to take over maintenance, as long as he gets to hand things over properly. He's done a great job with SoGE over the past years.

Mark

@danpovey
Member Author

danpovey commented Sep 6, 2018 via email

@bodgerer
Contributor

I'm not sure the problem is a shortage of people who know the internals well enough; it's that there is a general lack of people interested in reading and editing the source code. I'm not sure how to fix that.

@ppoilbarbe

Some years ago, I read the code (to track down a memory leak in masterd), and there really are many things in it that I don't understand. If I don't understand the code surrounding what I modify, I can't be sure my changes will repair more than they break.

@bodgerer
Contributor

I guess it needs better documentation, at the very least. I'll see if I can find time to start some.

@danpovey
Member Author

danpovey commented Sep 17, 2018 via email

@bodgerer
Contributor

I was thinking code-level, e.g. a getting-started guide for prospective developers.

@danpovey
Member Author

danpovey commented Oct 2, 2018 via email

@bodgerer
Contributor

Hi Dan,

Sorry, forgot to reply. Yes, I saw Dave - but unfortunately we didn't have time to discuss gridengine.

Cheers,

Mark

@mightybigcar

OK, it looks like I'm going to get into maintenance in a serious way soon. I've got requests to get SGE up and running on MacOS X (yikes!), and to add GPU memory allocation support (looks like we can build on gpu_loadsensor.c for that).

Currently we're running Dave Love's Fedora 28 build on our systems, so it would be best to build on that. Unfortunately, I can't seem to download the corresponding SRPMs. Has anyone figured out which commit(s)/tag(s) in this tree, or in one of Dave's, correspond to that build?

Thanks,
Chris

@danpovey
Member Author

danpovey commented Oct 27, 2018 via email

@mightybigcar

mightybigcar commented Oct 27, 2018 via email

@danpovey
Member Author

danpovey commented Oct 27, 2018 via email

@danpovey
Member Author

danpovey commented Oct 27, 2018 via email

@mightybigcar

Hmmm. There is a tag in there that might refer to 8.1.9, but it's not clear if that corresponds directly to the F28 packages. At least it's a starting point.

@mightybigcar

I'm generally with you on the "just use the whole GPU" approach. But the machine learning team doesn't like to take "because I say so" as a reason, and sometimes they're quite justified in that.

After some more poking around, it looks like we can use prolog/epilog scripts to manage this data for the well-behaved programs that will make up the initial use cases. Hopefully we won't need to resort to a lot of messing with the SGE internals.

@prod-feng

prod-feng commented Oct 27, 2018 via email

@mightybigcar

mightybigcar commented Oct 28, 2018 via email

@bodgerer
Contributor

bodgerer commented Oct 29, 2018 via email

Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants