Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: catch errors if JSON parsing encounters JS Object #539

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

naomatheus
Copy link
Collaborator

First pass at addressing:
#538

See issue for details

@naomatheus
Copy link
Collaborator Author

checking that CI tests pass

Copy link
Contributor

@Tammo-Feldmann Tammo-Feldmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check makes sense to me. Thank you for the fix @naomatheus.

@edkeeble
Copy link
Contributor

edkeeble commented Jun 13, 2023

Doesn't this make the keywords object unusable if the JSON parser fails? I feel like allowing the build to pass with invalid data is going to give us the appearance of a successful build, but the actual output will be unpredictable. If we can't get the parser to handle the provided data, then we should fix the problem in the backend and ensure we're always storing valid JSON in that field.

@Tammo-Feldmann
Copy link
Contributor

Tammo-Feldmann commented Jun 13, 2023

I agree with @edkeeble, it seems that something has changed in the backend where we are now storing and serving non-valid json data for doi.keywords. I'm not sure we can or should try to resolve this in the frontend.

@Tammo-Feldmann
Copy link
Contributor

Here is a list of the dois with non-valid JSON for the keyword strings:

  • CPEXCV-DAWN_DC8
  • CPEXCV-HALO_DC8
  • BIGFOOT_FD_868
  • bigfoot_gpp_surfaces_749
  • bigfoot_npp_surfaces_750
  • bigfoot_lai_747
  • BIGFOOT_METEOROLOGICAL_1065
  • bigfoot_landcove_748
  • CPEXCV-Dropsondes
  • CPEXCV_Cloud_AircraftInSitu_DC8_Data
  • CPEXCV_Merge_Data
  • metnavcpexcv
  • hamsrcpexcv
  • CPEXCV-DAWN_DC8
  • CPEXCV-HALO_DC8
  • BIGFOOT_FD_868
  • bigfoot_gpp_surfaces_749
  • bigfoot_npp_surfaces_750
  • bigfoot_lai_747
  • BIGFOOT_METEOROLOGICAL_1065
  • bigfoot_landcove_748
  • CPEXCV-Dropsondes
  • CPEXCV_Cloud_AircraftInSitu_DC8_Data
  • CPEXCV_Merge_Data
  • metnavcpexcv
  • hamsrcpexcv

Valid cmr keyword strings have escape characters and double quotes as in this example:

Screen Shot 2023-06-13 at 9 03 35 AM

Not valid examples look like:

"[{'Term': 'CLOUDS', 'Topic': 'ATMOSPHERE', 'Category': 'EARTH SCIENCE', 'VariableLevel1': 'CLOUD MICROPHYSICS', 'VariableLevel2': 'PARTICLE SIZE DISTRIBUTION'}, {'Term': 'CLOUDS', 'Topic': 'ATMOSPHERE', 'Category': 'EARTH SCIENCE'}]"

I believe the next step would be to figure out where in the backend we started generating these non-valid JSON strings for gcmd_keywords in the past week or so.

cc: @edkeeble

@naomatheus
Copy link
Collaborator Author

naomatheus commented Jun 13, 2023

Yea that makes sense @Tammo-Feldmann . Preferable to just letting invalid JSON stay as well.

Doesn't this make the keywords object unusable if the JSON parser fails?

But @edkeeble my understanding was that the JSON parser fails when the object encountered has already been deserialized. Is that the case with this list of invalid JSON examples above?

@naomatheus
Copy link
Collaborator Author

we should fix the problem in the backend and ensure we're always storing valid JSON in that field.

Any know off of top where we're writing these objects to the DB? Will investigate

@Tammo-Feldmann
Copy link
Contributor

Thank you for looking into it @naomatheus.

@edkeeble
Copy link
Contributor

Yea that makes sense @Tammo-Feldmann . Preferable to just letting invalid JSON stay as well.

Doesn't this make the keywords object unusable if the JSON parser fails?

But @edkeeble my understanding was that the JSON parser fails when the object encountered has already been deserialized. Is that the case with this list of invalid JSON examples above?

JSON.parse is the function that handles deserializing the string and I don't believe we have any other logic to process these fields prior to that step.

@edkeeble
Copy link
Contributor

I believe the next step would be to figure out where in the backend we started generating these non-valid JSON strings for gcmd_keywords in the past week or so.

We just deployed the ADMG backend to production for the first time in about a year 1.5 weeks ago, so the answer is probably somewhere in the 171 files changed in this PR... https://github.com/NASA-IMPACT/admg-backend/pull/414/files

@edkeeble
Copy link
Contributor

From the looks of the bad strings, we're storing the string encoding of the dictionaries for each field instead of serializing them into JSON. For example:

>>> import json
>>> a = {"test": "something"}
>>> str(a)
"{'test': 'something'}"
>>> json.dumps(a)
'{"test": "something"}'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants