Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete rewrite of CBA_fnc_parseJSON #1582

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

initramfs
Copy link
Contributor

As compared to my last PR (#1580) this represents a complete rewrite of the JSON parsing code using the latest features available (as of the time of writing: version 2.12) in the scripting engine (though heavily inspired by the original shift-reduce design).

The parser is now (assuming bug-free) a fully validating parser that supports all the grammar defined at json.org, including previously unsupported features such as hex escape strings.

In addition to simply parsing a JSON document it can also track row/column numbers of the input tokens in order to generate appropriate error messages upon encountering an invalid JSON document. Row/column numbers are tracked at the unicode code point level rather than at the actual grapheme level for simplicity (so the use of things like U+200D ZERO WIDTH JOINER may cause an offset error in the reported column index). Row numbers are defined strictly by the presence of the code point U+000A LINE FEED so they should always be correct (technically there exists documents out there that exclusively use U+000D CARRIAGE RETURN or other exotic characters to denote new lines but catering to this would represent an unreasonable increase complexity that I was not willing to write).

This parser (as it's currently written) does represent an API change for CBA_fnc_parseJSON as instead of returning nil on error it throws an exception with an appropriate string message (though the old behaviour can be easily restored by wrapping the internal parse function call in a try-catch, this will effectively eliminate the purpose of the row/column tracking feature however).

When merged this pull request will:

  • Turn CBA_fnc_parseJSON into a fully validating parsing function to deserialize objects from JSON strings, throwing exceptions at any parsing errors (including row/column numbers and error reasoning).
  • Enable parsing of more JSON documents that could not previously be handled due to missing grammar implementations.
  • Causes some invalid JSON documents that were previously acceptable to be rejected (given the validating nature of the parser).

Side notes:

  1. I wrote this parser predominately for myself (i.e. for my own mod) but since I drew inspiration from the original CBA function I decided to adapt it to fit the CBA API. If this code represents an unreasonable increase in complexity and/or seems unacceptable, feel free to disregard/close this PR.
  2. There is a technical inefficiency in the number tokenizer where we repeatedly run a regex over an increasingly growing string (in order to find the longest token match without requiring the regex to be run over the entire document). This is done to avoid writing a FSM exclusively for number matching and relying on pure regex. There are ways to get around this limitation to make the tokenization single-pass but I considered those solutions to be overly complex or messy. In any case only pathological examples (documents consisting entirely of extremely long numbers [by character count]) would this really be an issue for.
  3. I made up the error messages without really consulting any official documentation as to what, if any, canonical error messages should be reported, so they shouldn't be considered authoritative (i.e. I am willing to change the message text if my existing choices are not acceptable).

@initramfs initramfs force-pushed the rewrite-parse-json branch from 5118207 to 8148521 Compare July 7, 2023 07:04
@commy2
Copy link
Contributor

commy2 commented Jul 11, 2023

Grew by almost 300 lines, more than doubled. Not a fan of try/throw/catch in SQF. Does it still pass test_parseJSON.sqf?

@initramfs initramfs force-pushed the rewrite-parse-json branch from 394eca0 to 2dca726 Compare July 12, 2023 14:45
@initramfs
Copy link
Contributor Author

initramfs commented Jul 12, 2023

@commy2 As stated in my sidenotes I wrote this parser for my own mod, as I am not on the CBA team I cannot gauge if the increase in complexity is worth the functionality/feature improvement. I would like to point out however that the original implementation does not fulfill it's contractual obligations in the preamble:

Returns:
_object - The deserialized JSON object or nil if JSON is invalid.
<LOCATION, ARRAY, STRING, NUMBER, BOOL, HASHMAP, NIL>

There are quite a few cases where the script will error out due to invalid parameters being passed into scripting commands (e.g. into parseNumber) or unacceptable strings being accepted (e.g. no code point check) or valid JSON being rejected/misparsed (e.g. missing hex escape codes), which for my usecase makes it completely unacceptable.

My motivation for writing my own parser came from my frustration when debugging a script that I wrote where I spent way too long trying to pinpoint the issue down, never expecting it to be from a CBA function (see PR #1580).


With respect to the usage of SQF exceptions, I don't really see any other way to pass information back to the caller in a structured way without further API changes (e.g. changing the return value to a n-tuple [array] representing if the parsing was successful + the JSON object [if successful] or error message and line numbers [if unsuccessful]). This is especially true given the output of this function could be almost all data types SQF supports (especially when you consider the additional _objectType parameter).

The exception type being thrown is always a string in this case but I can always change it to something more structured (e.g. array or hashmap) if the caller wants to be able to structurally parse the exception object.

As an additional note: as stated above the current implementation of the parser can cause runtime errors that are uncatchable (given those are errors from invalid parameters to the scripting engine) which is, in my opinion, worse than exceptions.

Of course another solution to this entire issue would be to force in the preconditions of the function that the JSON string must always be valid, though I would argue that represents the largest API change/deviation among all the ones I have mentioned.


With respect to the unit tests:

21:28:47 File x\cba\addons\hashes\test.sqf..., line 18
21:28:47 [CBA] (hashes) Test OK: (CBA_fnc_parseJSON is defined) x\cba\addons\hashes\test_parseJSON.sqf:14
21:28:47 [CBA] (hashes) Test OK: (not (_result)) x\cba\addons\hashes\test_parseJSON.sqf:20
21:28:47 [CBA] (hashes) Test OK: (_result isEqualTo _expected) x\cba\addons\hashes\test_parseJSON.sqf:25
21:28:47 [CBA] (hashes) Test OK: (_result == _expected) x\cba\addons\hashes\test_parseJSON.sqf:29
21:28:47 [CBA] (hashes) Test OK: (_result == _expected) x\cba\addons\hashes\test_parseJSON.sqf:33
21:28:47 [CBA] (hashes) Test OK: (_result == _expected) x\cba\addons\hashes\test_parseJSON.sqf:37
21:28:47 [CBA] (hashes) Test OK: (_result == _expected) x\cba\addons\hashes\test_parseJSON.sqf:41
21:28:47 [CBA] (hashes) Test OK: (_result == _expected) x\cba\addons\hashes\test_parseJSON.sqf:45
21:28:47 [CBA] (hashes) Test OK: (not (_result)) x\cba\addons\hashes\test_parseJSON.sqf:48
21:28:47 [CBA] (hashes) Test OK: (isNull _result) x\cba\addons\hashes\test_parseJSON.sqf:51
21:28:47 [CBA] (hashes) Test OK: (_result) x\cba\addons\hashes\test_parseJSON.sqf:57
21:28:47 [CBA] (hashes) Test OK: (_result isEqualTo _expected) x\cba\addons\hashes\test_parseJSON.sqf:62
21:28:47 [CBA] (hashes) Test OK: (_result == _expected) x\cba\addons\hashes\test_parseJSON.sqf:66
21:28:47 [CBA] (hashes) Test OK: (_result == _expected) x\cba\addons\hashes\test_parseJSON.sqf:70
21:28:47 [CBA] (hashes) Test OK: (_result == _expected) x\cba\addons\hashes\test_parseJSON.sqf:74
21:28:47 [CBA] (hashes) Test OK: (_result == _expected) x\cba\addons\hashes\test_parseJSON.sqf:78
21:28:47 [CBA] (hashes) Test OK: (_result == _expected) x\cba\addons\hashes\test_parseJSON.sqf:82
21:28:47 [CBA] (hashes) Test OK: (not (_result)) x\cba\addons\hashes\test_parseJSON.sqf:85
21:28:47 [CBA] (hashes) Test OK: (isNull _result) x\cba\addons\hashes\test_parseJSON.sqf:88
21:28:47 [CBA] (hashes) Test OK: (CBA_fnc_encodeJSON is defined) x\cba\addons\hashes\test_parseJSON.sqf:95
21:28:47 "null"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "true"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "1.2"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 """Hello, World!"""
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "[]"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "{}"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "[null, true, 1.2, ""Hello, World!""]"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "[{""nested"": [{""nested"": [{""nested"": [{""nested"": [{""nested"": [{""nested"": [{""nested"": []}]}]}]}]}]}]}]"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "null"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "true"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "1.2"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 """Hello, World!"""
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "[]"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "{}"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "[null, true, 1.2, ""Hello, World!""]"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "[{""nested"": [{""nested"": [{""nested"": [{""nested"": [{""nested"": [{""nested"": [{""nested"": []}]}]}]}]}]}]}]"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "null"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "true"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "1.2"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 """Hello, World!"""
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "[]"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "{}"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "[null, true, 1.2, ""Hello, World!""]"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "[{""nested"": [{""nested"": [{""nested"": [{""nested"": [{""nested"": [{""nested"": [{""nested"": []}]}]}]}]}]}]}]"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "null"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "true"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "1.2"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 """Hello, World!"""
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "[]"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "{}"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "[null, true, 1.2, ""Hello, World!""]"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "[{""nested"": [{""nested"": [{""nested"": [{""nested"": [{""nested"": [{""nested"": [{""nested"": []}]}]}]}]}]}]}]"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "null"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "true"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "1.2"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 """Hello, World!"""
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "[]"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "{}"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "[null, true, 1.2, ""Hello, World!""]"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 "[{""nested"": [{""nested"": [{""nested"": [{""nested"": [{""nested"": [{""nested"": [{""nested"": []}]}]}]}]}]}]}]"
21:28:47 [CBA] (hashes) Test OK: (_input == _output) x\cba\addons\hashes\test_parseJSON.sqf:118
21:28:47 [CBA] (hashes) Test OK: (count _properties == 6) x\cba\addons\hashes\test_parseJSON.sqf:126
21:28:47 [CBA] (hashes) Test OK: (typeName _value == _x) x\cba\addons\hashes\test_parseJSON.sqf:129
21:28:47 [CBA] (hashes) Test OK: (typeName _value == _x) x\cba\addons\hashes\test_parseJSON.sqf:129
21:28:47 [CBA] (hashes) Test OK: (typeName _value == _x) x\cba\addons\hashes\test_parseJSON.sqf:129
21:28:47 [CBA] (hashes) Test OK: (typeName _value == _x) x\cba\addons\hashes\test_parseJSON.sqf:129
21:28:47 [CBA] (hashes) Test OK: (typeName _value == _x) x\cba\addons\hashes\test_parseJSON.sqf:129
21:28:47 [CBA] (hashes) Test OK: (typeName _value == _x) x\cba\addons\hashes\test_parseJSON.sqf:129

So it appears it still passes all the unit tests.

@commy2
Copy link
Contributor

commy2 commented Jul 12, 2023

CBA never uses exceptions. I get the possibilities compared to just using the ERROR_N macro and returning nil, but meh.

Probably needs to think about what value to throw, because that would become part of the public API.

The helper function part should be simplified and optimized to:

    private ["_createObject", "_objectSet"];

    switch (_objectType) do {
        case false;
        case 0: {
            _createObject = {[] call CBA_fnc_createNamespace};
            _objectSet = {
                params ["_obj", "_key", "_val"];
                _obj setVariable [_key, _val];
            };
        };

        case true;
        case 1: {
            _createObject = {[] call CBA_fnc_hashCreate};
            _objectSet = {
                params ["_obj", "_key", "_val"];
                [_obj, _key, _val] call CBA_fnc_hashSet;
            };
        };

        case 2: {
            _createObject = {createHashMap};
            _objectSet = {
                params ["_obj", "_key", "_val"];
                _obj set [_key, _val];
            };
        };
    };

No need to look up _objectType repeatedly when it never changes.

forceUnicode 0;

/**
* Helper function returning the code used for object creation based on the "_objectType" parameter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment explains what the code does. That's just duplication and not useful. If you want to keep it as separator, just use // helper function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the small helper functions with the original code so there are no additional function declarations anymore. The original rationale for the comment is merely my codestyle of always having a preamble for any function declaration.

I believe the preambles for _tokenize, _reduce definitely provide additional information (wrt. output format, preconditions, etc). To a lesser extent _shift and _parse do as well, and it seems a little inconsistent to me for some functions to get a full descriptor and some to not.

@initramfs
Copy link
Contributor Author

initramfs commented Jul 12, 2023

CBA never uses exceptions. I get the possibilities compared to just using the ERROR_N macro and returning nil, but meh.

Probably needs to think about what value to throw, because that would become part of the public API.

The helper function part should be simplified and optimized to:

    private ["_createObject", "_objectSet"];

    switch (_objectType) do {
        case false;
        case 0: {
            _createObject = {[] call CBA_fnc_createNamespace};
            _objectSet = {
                params ["_obj", "_key", "_val"];
                _obj setVariable [_key, _val];
            };
        };

        case true;
        case 1: {
            _createObject = {[] call CBA_fnc_hashCreate};
            _objectSet = {
                params ["_obj", "_key", "_val"];
                [_obj, _key, _val] call CBA_fnc_hashSet;
            };
        };

        case 2: {
            _createObject = {createHashMap};
            _objectSet = {
                params ["_obj", "_key", "_val"];
                _obj set [_key, _val];
            };
        };
    };

No need to look up _objectType repeatedly when it never changes.

I've essentially restored what the original code did which does a single check and sets _createObject and _objectSet as function-local variables. The reason for the change was merely legal, as I was doing a cleanroom implementation (but I had already studied what the original implementation did) I didn't want to copy any part of GPL-ed code from the original. Now that I'm submitting the code back I guess that point is moot.


With respect to exceptions, this is the first piece of CBA code that I have touched so I'm not familiar with what kind of public API is expected (for example, I didn't know that there were no CBA functions that threw exceptions). As for what type of object to throw on parse error, I really don't have any input as to what is appropriate, I simply wrote something that was appropriate for myself.

In essence, I really don't have any stake in the matter and I can change the exception (and the exception documentation) to anything you want (or even just try/catch and return nil). The reason I didn't add the exception throwing behaviour to the function documentation was because I didn't know how to format it and was expecting directions there.

I suggest perhaps discussion with other CBA maintainers for input, or if you're in a position to make decisions yourself, just tell me what you want me to do.

@commy2
Copy link
Contributor

commy2 commented Jul 12, 2023

As for what type of object to throw on parse error, I really don't have any input as to what is appropriate, I simply wrote something that was appropriate for myself.

Idk either, because there was never any need to do it. Just saying that we should think about what is useful, because changing it later could break stuff.

I suggest perhaps discussion with other CBA maintainers for input

They may speak below:

@veteran29 veteran29 self-requested a review July 30, 2023 11:36
@jonpas
Copy link
Member

jonpas commented Sep 3, 2023

While I see the goals and pros of this approach, I dislike the use of try/catch/throw in SQF and the significant increase in codebase and complexity with that.

@BaerMitUmlaut Would be interesting to see your take on that.

@BaerMitUmlaut
Copy link
Contributor

If you want to know my opinion, this is way too complex, the function is now twice the size. I don't like full rewrites in general. There's also no tests for the new functionality.

@dedmen
Copy link
Contributor

dedmen commented Sep 5, 2023

(as of the time of writing: version 2.12)

Well we have 2.14 now ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants