Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For enum values starting with a digit, prefix the generated symbol with an underscore. #3961

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

deven
Copy link
Contributor

@deven deven commented Jan 3, 2025

Motivation and Context

If an enum uses values which start with a digit, the generated symbols are invalid, causing both Smithy errors and Rust errors.

Description

Prefixes invalid symbols starting with a digit with underscores to make the symbols valid.

Testing

This commit fixes the Smithy error that I was getting for an enum using a value of "2DBarcode", and also fixes the generated Rust code to use "_2DBarcode" as the enum value instead of "2DBarcode".

Checklist

  • For changes to the smithy-rs codegen or runtime crates, I have created a changelog entry Markdown file in the .changelog directory, specifying "client," "server," or both in the applies_to key.
  • For changes to the AWS SDK, generated SDK code, or SDK runtime crates, I have created a changelog entry Markdown file in the .changelog directory, specifying "aws-sdk-rust" in the applies_to key.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@deven deven requested review from a team as code owners January 3, 2025 14:45
@landonxjames
Copy link
Contributor

Hi @deven, thanks for the contribution! I have kicked off the CI tests and expect them all to pass. Would it be possible to get a test added for this to ensure that it isn't reverted at some later point and maybe a comment on the regex replacements just to indicate their purpose?

@drganjoo
Copy link
Contributor

drganjoo commented Jan 3, 2025

It is unclear which error is this PR trying to resolve. Enum variant names follow the same rules as identifiers in Smithy, and Smithy does not allow identifiers to begin with a digit. For example, this is invalid in Smithy:

enum BarCodes {
    2DBarCode
    3DBarCode
}

However, this restriction does not apply to enum values. For example, the following is perfectly valid and will not raise any errors:

enum BarCodes {
    TwoDBarCode = "2d"
    ThreeDBarCode = "3d"
}

@deven
Copy link
Contributor Author

deven commented Jan 4, 2025

Hi @deven, thanks for the contribution! I have kicked off the CI tests and expect them all to pass. Would it be possible to get a test added for this to ensure that it isn't reverted at some later point and maybe a comment on the regex replacements just to indicate their purpose?

Will do, thanks!

@deven
Copy link
Contributor Author

deven commented Jan 4, 2025

It is unclear which error is this PR trying to resolve. Enum variant names follow the same rules as identifiers in Smithy, and Smithy does not allow identifiers to begin with a digit. For example, this is invalid in Smithy:

enum BarCodes {
    2DBarCode
    3DBarCode
}

However, this restriction does not apply to enum values. For example, the following is perfectly valid and will not raise any errors:

enum BarCodes {
    TwoDBarCode = "2d"
    ThreeDBarCode = "3d"
}

I tried to keep it simple, but I guess I need to walk through the gory details.

I've been working on creating AWS-style Rust SDKs for various sections of Amazon SP-API, using a custom script I wrote to translate the original Swagger IDL files into Smithy IDL files, then building the SDKs using smithy-rs. I've had to resolve many problems in this project, but that's a long story.

In this particular case, I was working on the Fulfillment Inbound API. Here is one of the enums in the original fulfillmentInboundV0.json Swagger IDL file:

    "BoxContentsSource": {
      "type": "string",
      "description": "Where the seller provided box contents information for a shipment.",
      "enum": [
        "NONE",
        "FEED",
        "2D_BARCODE",
        "INTERACTIVE"
      ],
      "x-docgen-enum-table-extension": [
        {
          "value": "NONE",
          "description": "There is no box contents information for this shipment. Amazon will manually process the box contents information. This may incur a fee."
        },
        {
          "value": "FEED",
          "description": "Box contents information is provided through the _POST_FBA_INBOUND_CARTON_CONTENTS_ feed."
        },
        {
          "value": "2D_BARCODE",
          "description": "Box contents information is provided by a barcode on the shipment. For more information, see Using 2D barcodes for box content information on Seller Central."
        },
        {
          "value": "INTERACTIVE",
          "description": "Box contents information is provided by an interactive source, such as a web tool."
        }
      ]
    },

My script translated this Swagger enum into the following Smithy enum in the generated Smithy IDL file:

/// Where the seller provided box contents information for a shipment.
enum BoxContentsSource {
    /// There is no box contents information for this shipment. Amazon will manually process the box contents information. This may incur a fee.
    NONE

    /// Box contents information is provided through the _POST_FBA_INBOUND_CARTON_CONTENTS_ feed.
    FEED

    /// Box contents information is provided by a barcode on the shipment. For more information, see Using 2D barcodes for box content information on Seller Central.
    _2D_BARCODE = "2D_BARCODE"

    /// Box contents information is provided by an interactive source, such as a web tool.
    INTERACTIVE
}

Of course, simply using 2D_BARCODE would be an invalid enum variant, since it doesn't conform to identifier rules because of the leading digit. I ran into that issue earlier, so I modified my script to use _2D_BARCODE = "2D_BARCODE" instead, to make the Smithy IDL valid without changing the wire value. This is analogous to TwoDBarCode = "2d" in your example. In theory, it should have worked, but it didn't. When I ran ./gradlew assemble in the smithy-rs directory to generate the Rust code, it failed with the following exception:

Projection fulfillmentinboundv0 failed: software.amazon.smithy.model.shapes.ShapeIdSyntaxException: Invalid shape ID member: 2DBarcode
software.amazon.smithy.model.shapes.ShapeIdSyntaxException: Invalid shape ID member: 2DBarcode
        at software.amazon.smithy.model.shapes.ShapeId.withMember(ShapeId.java:263)
        at software.amazon.smithy.rust.codegen.core.smithy.generators.EnumMemberModel$Companion.toEnumVariantName(EnumGenerator.kt:108)
        at software.amazon.smithy.rust.codegen.core.smithy.generators.EnumMemberModel.name(EnumGenerator.kt:119)
        at software.amazon.smithy.rust.codegen.client.smithy.generators.InfallibleEnumType.renderForwardCompatibilityNote(ClientEnumGenerator.kt:245)
        at software.amazon.smithy.rust.codegen.client.smithy.generators.InfallibleEnumType.access$renderForwardCompatibilityNote(ClientEnumGenerator.kt:29)
        at software.amazon.smithy.rust.codegen.client.smithy.generators.InfallibleEnumType$additionalDocs$1.invoke(ClientEnumGenerator.kt:165)
        at software.amazon.smithy.rust.codegen.client.smithy.generators.InfallibleEnumType$additionalDocs$1.invoke(ClientEnumGenerator.kt:164)
        at software.amazon.smithy.rust.codegen.core.smithy.generators.EnumGenerator.renderEnum(EnumGenerator.kt:253)
        at software.amazon.smithy.rust.codegen.core.smithy.generators.EnumGenerator.renderNamedEnum(EnumGenerator.kt:204)
        at software.amazon.smithy.rust.codegen.core.smithy.generators.EnumGenerator.render(EnumGenerator.kt:191)
        at software.amazon.smithy.rust.codegen.client.smithy.ClientCodegenVisitor$stringShape$1.invoke(ClientCodegenVisitor.kt:283)
        at software.amazon.smithy.rust.codegen.client.smithy.ClientCodegenVisitor$stringShape$1.invoke(ClientCodegenVisitor.kt:282)
        at software.amazon.smithy.rust.codegen.core.smithy.RustCrate.withModule$lambda$6(CodegenDelegator.kt:211)
        at software.amazon.smithy.codegen.core.WriterDelegator.useFileWriter(WriterDelegator.java:175)
        at software.amazon.smithy.rust.codegen.core.smithy.RustCrate.withModule(CodegenDelegator.kt:210)
        at software.amazon.smithy.rust.codegen.core.smithy.RustCrate.inPrivateModuleWithReexport(CodegenDelegator.kt:252)
        at software.amazon.smithy.rust.codegen.client.smithy.ClientCodegenVisitor.stringShape(ClientCodegenVisitor.kt:282)
        at software.amazon.smithy.rust.codegen.client.smithy.ClientCodegenVisitor.stringShape(ClientCodegenVisitor.kt:61)
        at software.amazon.smithy.model.shapes.ShapeVisitor.enumShape(ShapeVisitor.java:70)
        at software.amazon.smithy.model.shapes.EnumShape.accept(EnumShape.java:115)
        at software.amazon.smithy.rust.codegen.client.smithy.ClientCodegenVisitor.execute(ClientCodegenVisitor.kt:165)
        at software.amazon.smithy.rust.codegen.client.smithy.RustClientCodegenPlugin.executeWithDecorator(RustClientCodegenPlugin.kt:79)
        at software.amazon.smithy.rust.codegen.client.testutil.ClientDecoratableBuildPlugin.execute(ClientCodegenIntegrationTest.kt:61)
        at software.amazon.smithy.build.SmithyBuildImpl.applyPlugin(SmithyBuildImpl.java:432)
        at software.amazon.smithy.build.SmithyBuildImpl.applyProjection(SmithyBuildImpl.java:364)
        at software.amazon.smithy.build.SmithyBuildImpl.executeSerialProjection(SmithyBuildImpl.java:281)
        at software.amazon.smithy.build.SmithyBuildImpl.lambda$applyAllProjections$4(SmithyBuildImpl.java:201)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
        at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754)
        at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
        at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
        at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
        at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
        at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)

This exception is generated because "2DBarcode" is not a valid identifier because of the leading digit. However, the exact string "2DBarcode" was NOT in the Smithy IDL file at all! Some algorithm inside smithy-rs apparently took the wire value of "2D_BARCODE" and applied transformations to turn it into "2DBarcode", which then triggered the exception. If it had started with the "_2D_BARCODE" Smithy enum variant name instead, presumably it would have transformed that into "_2DBarcode", which would have been valid.

My commit uses the following code fragment to fix the invalid generated symbols starting with a digit:

.replace(Regex("^(_*\\d)"), "_$1")

This will prefix the symbol name with an underscore if it begins with a digit, making the generated symbol valid. To guarantee that this modification can't conflict with another enum variant, it also matches any number of leading underscores before that leading digit. Otherwise, prefixing 2DBarcode with an underscore could potentially conflict with a different _2DBarcode variant. Since this regex matches any number of leading underscores along with the leading digit, that means that 2DBarcode becomes _2DBarcode and _2DBarcode becomes __2DBarcode (adding a second leading underscore), preventing a possible collision from this prefixing step.

Originally, I only added this code fragment to EnumGenerator.kt, but it was insufficient to solve the problem. The exception above did go away, and the smithy-rs build was "successful", but the build output did include a new warning:

[WARNING] Failed to run cargo fmt: [com.amazon.sellingpartnerapi.fulfillmentinbound.v0#FulfillmentInbound_v0]
Unexpected exception thrown when executing subprocess:
CommandError(output=Command Error
error[E0585]: found a documentation comment that doesn't document anything
  --> /home/deven/git/smithy-rs/aws/sdk/build/smithyprojections/sdk/fulfillmentinboundv0/rust-client-codegen/src/types/_box_contents_source.rs:45:5
   |
44 | pub enum BoxContentsSource {
   |          ----------------- while parsing this enum
45 |     /// Box contents information is provided by a barcode on the shipment. For more information, see Using 2D barcodes for box content information on Seller Central.
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = help: doc comments must come before what they document, if a comment was intended use `//`
   = help: enum variants can be `Variant`, `Variant = <integer>`, `Variant(Type, ..., TypeN)` or `Variant { fields: Types }`

Error writing files: failed to resolve mod `_box_contents_source`: cannot parse /home/deven/git/smithy-rs/aws/sdk/build/smithyprojections/sdk/fulfillmentinboundv0/rust-client-codegen/src/types/_box_contents_source.rs

)

When I investigated, I found that the generated Rust code had the following enum definition:

pub enum BoxContentsSource {
    /// Box contents information is provided by a barcode on the shipment. For more information, see Using 2D barcodes for box content information on Seller Central.
    2DBarcode,
    /// Box contents information is provided through the _POST_FBA_INBOUND_CARTON_CONTENTS_ feed.
    Feed,
    /// Box contents information is provided by an interactive source, such as a web tool.
    Interactive,
    /// There is no box contents information for this shipment. Amazon will manually process the box contents information. This may incur a fee.
    None,
    /// `Unknown` contains new variants that have been added since this code was generated.
    #[deprecated(note = "Don't directly match on `Unknown`. See the docs on this enum for the correct way to handle unknown variants.")]
    Unknown(crate::primitives::sealed_enum_unknown::UnknownVariantValue)
}

Since 2DBarcode as an enum variant name is just as invalid in Rust as it is in Smithy, this generated Rust code is invalid and cannot be compiled. Investigating further, I concluded that this invalid enum variant name in the Rust code was being returned from the toSymbol() method. Further debugging showed that the fakeMemberShape object WAS being created with the underscore prefix, with an id value like com.amazon.sellingpartnerapi.fulfillmentinbound.v0#BoxContentsSource$_2DBarcode, but the toSymbol() method was nevertheless returning 2DBarcode as the symbol.

I needed to investigate the toSymbol() implementation, but it wasn't clear which concrete class implementation was being used. Using reflection and debugging print statements, I determined that the RustReservedWordSymbolProvider class contained the toSymbol() implementation in question. When I eventually determined the right way to integrate my new code fragment into the toSymbol() method, it finally resolved the problem complete. With this pull request, the generated Rust code now looks like this instead:

pub enum BoxContentsSource {
    /// Box contents information is provided by a barcode on the shipment. For more information, see Using 2D barcodes for box content information on Seller Central.
    ///
    /// _Note: `::2DBarcode` has been renamed to `::_2DBarcode`._
    _2DBarcode,
    /// Box contents information is provided through the _POST_FBA_INBOUND_CARTON_CONTENTS_ feed.
    Feed,
    /// Box contents information is provided by an interactive source, such as a web tool.
    Interactive,
    /// There is no box contents information for this shipment. Amazon will manually process the box contents information. This may incur a fee.
    None,
    /// `Unknown` contains new variants that have been added since this code was generated.
    #[deprecated(note = "Don't directly match on `Unknown`. See the docs on this enum for the correct way to handle unknown variants.")]
    Unknown(crate::primitives::sealed_enum_unknown::UnknownVariantValue),
}

This now contains a valid enum variant name (_2DBarcode), and even includes a note in the documentation comment about the symbol being renamed.

@aajtodd
Copy link
Contributor

aajtodd commented Jan 6, 2025

Have you tried changing your original transformation to not use underscore as a prefix and something like X_ or x2DBarCode or similar? The algorithm for converting the enum name is likely dropping underscores as part of converting along word boundaries. You may have better results with a different prefix and possibly negate the need for this change.

@drganjoo
Copy link
Contributor

drganjoo commented Jan 6, 2025

@deven Thanks for the detailed explanation. @aajtodd I agree that not appending _ should solve this issue, but we need to fix this overall, as the following is a valid Smithy model, and we should generate valid Rust code for it:

enum Dimensions {
    _2D = "2D"
    _3D = "3D"
}

However, I would like to dig a bit deeper to understand why we need to use PascalCase in the EnumGenerator.kt in the first place.

@deven
Copy link
Contributor Author

deven commented Jan 7, 2025

Have you tried changing your original transformation to not use underscore as a prefix and something like X_ or x2DBarCode or similar? The algorithm for converting the enum name is likely dropping underscores as part of converting along word boundaries. You may have better results with a different prefix and possibly negate the need for this change.

For the record, using x2D_BARCODE instead of _2D_BARCODE does work with the original code, generating a Rust enum member X2DBarcode. Not that I want to use an alphabetic prefix like that! As @drganjoo said, using the underscore prefix makes a valid Smithy model that should generate valid Rust code. My patch resolves this issue.

@deven deven force-pushed the prefix_leading_digit branch from e16a2af to c14bf4d Compare January 7, 2025 06:49
@deven deven force-pushed the prefix_leading_digit branch from c14bf4d to fbd70dd Compare January 7, 2025 06:50
@deven
Copy link
Contributor Author

deven commented Jan 7, 2025

Hi @deven, thanks for the contribution! I have kicked off the CI tests and expect them all to pass. Would it be possible to get a test added for this to ensure that it isn't reverted at some later point and maybe a comment on the regex replacements just to indicate their purpose?

Hi @landonxjames, I've replaced the original commit with a newer one which includes comments and test cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants