Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the bug that caused MAIN_GAIA_TABLE in astroquery.gaia to not work #2153

Merged
merged 2 commits into from
Oct 18, 2021

Conversation

eerovaher
Copy link
Member

@eerovaher eerovaher commented Sep 22, 2021

Instead of attempting to overhaul the configuration system in astroquery.gaia, this pull request has been scaled down to only fix #2093 and fix #2099. #1760 will be dealt with in a separate pull request.

The original pull request description is preserved below to provide context to the discussions that took place here.

ORIGINAL PULL REQUEST MESSAGE (OUTDATED!):

The cone_search() and query_object() families of functions in astroquery.gaia now properly use the astroquery.gaia.conf items MAIN_GAIA_TABLE, MAIN_GAIA_TABLE_RA, MAIN_GAIA_TABLE_DEC and ROW_LIMIT, meaning that these configuration options can be updated at runtime. Furthermore, those functions also accept optional keyword arguments that overrule the values from conf. Some functions already allowed for some of these arguments, and all new arguments use the same names as the existing ones. However, these names differ from the names used in conf, and renaming them might be desirable in the future.

The GaiaClass attributes that correspond to the previously listed conf items provide no value and have been deprecated.

There is still more that can be done. For example VALID_DATALINK_RETRIEVAL_TYPES should either not be defined in conf at all or it should be defined as an instance of ConfigItem. Another question not addressed here is if the config options should affect the cross_match() function. But this pull request is quite large already and it fixes #2093, #2099 and #1760.

@@ -357,7 +369,9 @@ def get_datalinks(self, ids, verbose=False):
return self.__gaiadata.get_datalinks(ids=ids, verbose=verbose)

def __query_object(self, coordinate, radius=None, width=None, height=None,
async_job=False, verbose=False, columns=[]):
async_job=False, verbose=False, columns=[],
table_name=None, ra_column_name=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't do this with the other modules and I don't think it's the overall preference to bloat up the number of kwargs with configurable members. cc @keflavich

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I see the motivation to have a configurable default table and row limit (:+1: for those), but not for the RA/Dec column name. If there's a specific reason to make that configurable (i.e., if the RA/Dec columns have version numbers in them), then that would be a good enough reason, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main logic was to just use another instance for that, right? (not that gaia_xyz = Gaia() works, it doesn't, unlike other modules). If useful, we could even make it into the __init__ as we do it in e.g. Vizier, to make it very convenient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason I added the *_column_name arguments to the query_object functions was that the cone_search functions already had them, and having a common set of arguments for both seemed like a good idea.

That being said I don't think those config options should exist at all. It seems to me that having the coordinate system as a config option and/or a keyword argument would be beneficial, but the names of the longitude and latitude coordinates should be automatically taken from some built-in dictionary instead of having to rely on the user to supply both. If this is an idea we wish to pursue further in some other pull request then we would have to deprecate the *_column_name config options, and adding more public keyword arguments in this pull request that would have to be deprecated in the future would be counterproductive.

The best thing to do here seems to be to undo the addition of the *_column_name arguments to the query_object functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the idea that coordinate system selection should be by system, not pairs of columns.

regarding configuration of the default table / row limit: those are, in many packages, accessible as both configuration items and per-instance items. The table selection is more like a base URL choice, which is always(?) configurable at the config level.

@bsipocz
Copy link
Member

bsipocz commented Sep 22, 2021

@astropy/coordinators - could someone add @eerovaher to the org, it's way overdue!

@eerovaher
Copy link
Member Author

Before I update this pull request I need to make sure that I understand how the configuration system is supposed to work here.

The __init__.py files define conf items using the same configuration system as astropy, allowing the users to modify the configuration options. But users that might not be familiar with the astropy configuration system would benefit from having an alternative way of changing the settings. Is it correct that the intended way to do that in astroquery is to use class attributes instead of function arguments?

If that is the case then I should remove all the new keyword arguments and undo the deprecation of the class attributes.

Based on the discussion above we might want to actually deprecate some of the attributes, but that would be better done in a separate pull request because its not really related to the issues that are being fixed here.

@keflavich
Copy link
Contributor

These are good questions.

There are some things that are meant to be treated as class attributes or configuration items - things that you want to persist across a whole session and are more directly associated with a feature of the service, such as: the URL of the service, the timeout limit, etc. Other items, anything that should vary from one query to the next, should be left as keyword arguments. The line between these different classes of items is not sharp, though, and there can be arguments both ways for some of attributes/kwargs. If there is ambiguity, it is acceptable to have function-specific kwargs that override the class attribute for that single call, though generally this is probably not needed.

@bsipocz feel free to disagree with / suggest changes to this "policy". I am simply summarizing how it has been done (sometimes), not how it has to be.

@eerovaher
Copy link
Member Author

@keflavich, based on your answer it would seem that the table name should be available as a class attribute and row limit should be available as a function argument.

@keflavich
Copy link
Contributor

Yes, though I would suggest including ROW_LIMIT as both a class attribute and a kwarg; for some services, it is good to have a default global row limit in place to avoid accidentally enabling gigantic requests (at least, I think that was the original reason we did it in Vizier)

@eerovaher
Copy link
Member Author

What I meant was that the table name should be a class attribute in addition to being in conf and likewise row limit would be a function argument and in conf. So if the default row limit is set in conf then why would there be any need for a class attribute?

@bsipocz
Copy link
Member

bsipocz commented Sep 23, 2021

I don't really disagree with any of the above, but prefer to have consistency, or converge toward consistency among the modules.

And if possible not bloat up the number up method kwargs with repeated ones (but am not against in having things as class attributes that are modifiable either by config and class kwargs).

@keflavich
Copy link
Contributor

So if the default row limit is set in conf then why would there be any need for a class attribute?

We always make the config items class attributes. e.g., in Vizier, it's set during __init__ but defaults to the config version:
https://github.com/astropy/astroquery/blob/main/astroquery/vizier/core.py#L50
https://github.com/astropy/astroquery/blob/main/astroquery/vizier/core.py#L87

While in SIMBAD it's a class attribute defined earlier...
https://github.com/astropy/astroquery/blob/main/astroquery/simbad/core.py#L296
because that still allows users to modify Simbad.ROW_LIMIT after initializing the class... but I think the Vizier approach is better.

Yes, we should definitely make these consistent. I prefer the Vizier approach out of the two above, but I would defer to others if there's reason to prefer the direct attribute approach.

@keflavich
Copy link
Contributor

I'll add, though, that the fact that this is the first time this has come up (or close to) suggests that these options are not often used and/or not confusing to users, so while consistency would be better, we do not seem to have a glaring UI problem.

@eerovaher
Copy link
Member Author

For consistency's sake it does seem better to make the row limit available through conf, as a class attribute and also as a function argument.

It looks like the changes needed in the code are much smaller than I had thought, and in that case it becomes more difficult to justify dealing with all three issues here simultaneously. Should I open a separate pull request for addressing #1760 and only deal with the MAIN_TABLE_NAME issues here?

@bsipocz
Copy link
Member

bsipocz commented Sep 23, 2021

What came up before is that the dr release attribute wasn't mutable for Gaia. But that module is not really usable the way we envisioned the rest with modified attributes/configs. So that's a clear bug.

I think I have the slight preference to use class instances, as in the Simbad docs, but don't have a strong preference whether values are set at initialisation or mutable later as attributes. How about systemically checking what we're doing in different modules, but also pulling in more people to ask for their API opinions (e.g. AdrianD).

And am sorry @eerovaher for derailing your bugfix pr.

@keflavich
Copy link
Contributor

Yeah, we should definitely have an issue summarizing where we stand and requesting feedback.

and ditto, sorry @eerovaher for derailing!

@eerovaher
Copy link
Member Author

Opening a dedicated issue for the configuration questions looks like a good idea.

In the meanwhile I will fix #2093 and #2099 here with the minimal required changes to the code and open a separate issue for #1760.

@codecov
Copy link

codecov bot commented Sep 24, 2021

Codecov Report

Merging #2153 (561011f) into main (241d896) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2153      +/-   ##
==========================================
+ Coverage   66.42%   66.44%   +0.01%     
==========================================
  Files         418      418              
  Lines       28026    28041      +15     
==========================================
+ Hits        18616    18631      +15     
  Misses       9410     9410              
Impacted Files Coverage Δ
astroquery/gaia/core.py 61.59% <100.00%> (ø)
astroquery/gaia/tests/test_gaiatap.py 99.65% <100.00%> (+0.01%) ⬆️
astroquery/utils/commons.py 77.72% <0.00%> (+0.11%) ⬆️
astroquery/utils/tests/test_utils.py 93.13% <0.00%> (+0.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 241d896...561011f. Read the comment docs.

@eerovaher eerovaher changed the title Properly use the configuration system in the Gaia module Fix the bug that caused MAIN_GAIA_TABLE in astroquery.gaia to not work Sep 24, 2021
@eerovaher
Copy link
Member Author

This pull request does not have a label yet.

Currently querying the Gaia archive can return results from
`gaiadr2.gaia_source` even if the user has specified that they wish to
query a different table. This commit adds a regression test to see if
the right table actually gets queried.
It is now possible to change the table Gaia queries are made from
through the MAIN_GAIA_TABLE config item or class attribute at runtime.
If the class attribute is specified then it will take precedence over
the config item.
@eerovaher
Copy link
Member Author

I rebased the commits to solve a merge conflict in the change log.

@eerovaher
Copy link
Member Author

@bsipocz, it would be nice to get these bugfixes merged.

@bsipocz
Copy link
Member

bsipocz commented Oct 18, 2021

Thanks @eerovaher!

@eerovaher
Copy link
Member Author

This pull request has two approvals, it should be good to merge now.

@keflavich keflavich merged commit 7de9084 into astropy:main Oct 18, 2021
@bsipocz
Copy link
Member

bsipocz commented Oct 18, 2021

Hah, I meant to merge not just approve last night, sorry about that 😅

@eerovaher eerovaher deleted the gaia-config-use branch November 2, 2021 12:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Setting MAIN_GAIA_TABLE in astroquery.gaia.Gaia has no effect Cannot Query EDR3 data
3 participants