Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: apple support #89

Draft
wants to merge 34 commits into
base: main
Choose a base branch
from

Conversation

pollend
Copy link
Contributor

@pollend pollend commented Aug 29, 2024

No description provided.

Comment on lines 33 to 54
if (strstr(m_Desc.adapterDesc.name, "Apple"))
{
m_Desc.adapterDesc.vendor = nri::Vendor::APPLE;
} else {
const uint64_t regID = [m_Device registryID];
if (regID)
{
io_registry_entry_t entry = IOServiceGetMatchingService(kIOMasterPortDefault, IORegistryEntryIDMatching(regID));
if (entry)
{
// That returned the IOGraphicsAccelerator nub. Its parent, then, is the actual PCI device.
io_registry_entry_t parent;
if (IORegistryEntryGetParentEntry(entry, kIOServicePlane, &parent) == kIOReturnSuccess)
{
m_Desc.adapterDesc.vendor = GetVendorFromID(GetEntryProperty(parent, CFSTR("vendor-id")));
m_Desc.adapterDesc.deviceId = GetEntryProperty(parent, CFSTR("device-id"));
IOObjectRelease(parent);
}
IOObjectRelease(entry);
}
}
}
Copy link
Contributor Author

@pollend pollend Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something interesting, I guess apple doesn't report the device/vendor.

    "Apple": {
      "id": "0x106b",

      "_comment": [
        "Apple GPUs do not report a DeviceID via the Metal API, and as such the typical device",
        "pattern matching does not work for them. The recommended approach is to find the highest",
        "supported 'common' family supported and report it as the architecture.",
        "Examples: 'common-1', 'common-3'",
        "https://developer.apple.com/documentation/metal/gpu_devices_and_work_submission/detecting_gpu_features_and_metal_software_versions"
      ]
    },

Copy link
Contributor

@vertver vertver Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a problem. Just do what Metal's layer says and just pass the single device.

@pollend pollend mentioned this pull request Aug 29, 2024
@pollend pollend force-pushed the feature/apple-metal-support branch from 20992dc to a5806a5 Compare August 30, 2024 15:08
Comment on lines +1452 to +1295
// defined in apple framework
#undef INTEL
#undef AMD
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like these are #defined in apple framework.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind to add these undefs to the interface.

Comment on lines 58 to 165
constexpr std::array<MTLDataType, (size_t)DescriptorType::MAX_NUM> DESCRIPTOR_TYPES = {
MTLDataTypeSampler, // SAMPLER
MTLDataTypeNone, // CONSTANT_BUFFER
MTLDataTypeTexture, // TEXTURE
MTLDataTypeNone, // STORAGE_TEXTURE
MTLDataTypeStruct, // BUFFER
MTLDataTypeStruct, // STORAGE_BUFFER
MTLDataTypeArray, // STRUCTURED_BUFFER
MTLDataTypeStruct, // STORAGE_STRUCTURED_BUFFER
MTLDataTypePrimitiveAccelerationStructure // ACCELERATION_STRUCTURE
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how this will work in practice.

@pollend pollend force-pushed the feature/apple-metal-support branch from 69abd41 to 51a6f96 Compare September 5, 2024 04:59
@pollend
Copy link
Contributor Author

pollend commented Sep 5, 2024

@dzhdanNV just an observation you nearly have a pure C interface any reason for the C++ code to abstract across that. you can also just union a bunch of stuff together for dx12/dx11/vulkan/metal. You also have all this logic to build tables of functions etc ... avoid having to use a C++ so it should be easier to distribute. I guess you could just access that data based on the api.

struct CmdBuffer {
   union {
      struct {...}dx11;
      struct {...}dx12;
      struct {...}metal;
   };
}

struct NriDevice {
   union {
      struct {...}dx11;
      struct {...}dx12;
      struct {...}metal;
   };
   int (*Method1)(NriDevice*...);
}

indexBufferOffset: m_CurrentIndexCmd.m_Offset];
}
void CommandBufferMTL::DrawIndirect(const Buffer& buffer, uint64_t offset, uint32_t drawNum, uint32_t stride, const Buffer* countBuffer, uint64_t countBufferOffset) {}
void CommandBufferMTL::DrawIndexedIndirect(const Buffer& buffer, uint64_t offset, uint32_t drawNum, uint32_t stride, const Buffer* countBuffer, uint64_t countBufferOffset) {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping me when you're done with the base implementation, I'll try to get samples to work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, argument buffers can be used to emulate root constants/push constants. But yes, you can also use it for count buffers

Copy link
Contributor Author

@pollend pollend Sep 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, argument buffers can be used to emulate root constants/push constants. But yes, you can also use it for count buffers

feel free to jump in if you know what to do, you can drop me a PR and i can merge it. still looking around at the metal docs and trying to roughly get something together, will take quite a bit there is a lot here to cover.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vertver I think i have the core pieces in place still not sure about the overall implementation.

Comment on lines 28 to 68
Result CommandBufferMTL::Begin(const DescriptorPool* descriptorPool) {

}
Result CommandBufferMTL::End() {

}
void CommandBufferMTL::SetPipeline(const Pipeline& pipeline) {
if (m_CurrentPipeline == (PipelineMTL*)&pipeline)
return;
PipelineMTL& pipelineImpl = (PipelineMTL&)pipeline;
m_CurrentPipeline = &pipelineImpl;


}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how these will fit together i only know about encoder once the pipeline is set. the attachment is know only with BeginRendering

https://developer.apple.com/documentation/metal/mtlcommandencoder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have something rough for this CommandBufferMTL. hopefully I can get something rough.

indexBufferOffset: m_CurrentIndexCmd.m_Offset];
}
void CommandBufferMTL::DrawIndirect(const Buffer& buffer, uint64_t offset, uint32_t drawNum, uint32_t stride, const Buffer* countBuffer, uint64_t countBufferOffset) {}
void CommandBufferMTL::DrawIndexedIndirect(const Buffer& buffer, uint64_t offset, uint32_t drawNum, uint32_t stride, const Buffer* countBuffer, uint64_t countBufferOffset) {}
Copy link
Contributor Author

@pollend pollend Sep 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, argument buffers can be used to emulate root constants/push constants. But yes, you can also use it for count buffers

feel free to jump in if you know what to do, you can drop me a PR and i can merge it. still looking around at the metal docs and trying to roughly get something together, will take quite a bit there is a lot here to cover.

@pollend pollend force-pushed the feature/apple-metal-support branch from b2f4953 to 56f38db Compare October 1, 2024 20:04
@pollend
Copy link
Contributor Author

pollend commented Oct 22, 2024

been busy with something else, I'll have to catch up to this once i finish up what i want to do with this game project im working on.

@dzhdanNV
Copy link
Collaborator

Sure, no prob. And tons of thanks for working on native Apple support!

@pollend pollend force-pushed the feature/apple-metal-support branch from b727121 to b027a75 Compare November 14, 2024 04:03
@dzhdanNV
Copy link
Collaborator

Thanks for the progress. Very appreciated.

@pollend
Copy link
Contributor Author

pollend commented Nov 29, 2024

so i guess you encode the data to an argument buffer. I guess to reproduce descriptor sets the pool would allocate an argument buffer for n number of descriptor sets?

fragmentShader(       RasterizerData            in                 [[ stage_in ]],
               device FragmentShaderArguments & fragmentShaderArgs [[ buffer(AAPLFragmentBufferIndexArguments) ]])

@dzhdanNV
Copy link
Collaborator

To be honest I don't follow. Please, explain the problem and a solution. Is the question about indirect drawing or something else?

@vertver
Copy link
Contributor

vertver commented Nov 30, 2024

I guess to reproduce descriptor sets the pool would allocate an argument buffer for n number of descriptor sets?

Correct. Metal requires to fill argument buffers even for non-indirect draw calls, so you need to store descriptors in it.

@pollend
Copy link
Contributor Author

pollend commented Dec 1, 2024

any clue how these properties line up with DescriptorRangeDesc?
https://developer.apple.com/documentation/metal/mtlargumentdescriptor?language=objc

for(size_t r = 0; r < descriptorSetDesc.rangeNum; r++) {
            MTLArgumentDescriptor* argDescriptor = [MTLArgumentDescriptor argumentDescriptor];
            const DescriptorRangeDesc* range = &descriptorSetDesc.ranges[r];
            argDescriptor.arrayLength = range->descriptorNum;
            argDescriptor.access = MTLBindingAccessReadWrite;
            argDescriptor.index = range->baseRegisterIndex;
            switch(range->descriptorType) {
                case DescriptorType::TEXTURE:
                    argDescriptor.dataType = MTLDataTypeTexture;
                    argDescriptor.textureType = MTLTextureType2D; // descriptor type does not have this
                    break;
                case DescriptorType::SAMPLER:
                    argDescriptor.dataType = MTLDataTypeSampler;
                    break;
                case DescriptorType::CONSTANT_BUFFER:
                case DescriptorType::STORAGE_TEXTURE:
                case DescriptorType::BUFFER:
                case DescriptorType::STORAGE_BUFFER:
                case DescriptorType::STRUCTURED_BUFFER:
                case DescriptorType::STORAGE_STRUCTURED_BUFFER:
                    argDescriptor.dataType = MTLDataTypeStruct;
                    break;
                case DescriptorType::ACCELERATION_STRUCTURE:
                    argDescriptor.dataType = MTLDataTypePrimitiveAccelerationStructure;
                    break;
            }
        }

@dzhdanNV
Copy link
Collaborator

dzhdanNV commented Dec 2, 2024

An obvious suggestion: you can look at MoltenVK sources and work this way:

NRI => open NRI/VK => open corresponding place in MoltenVK => analyze, rearrange, copy & paste to MTL implementation

(looking at MTL docs...)

@pollend
Copy link
Contributor Author

pollend commented Dec 2, 2024

An obvious suggestion: you can look at MoltenVK sources and work this way:

NRI => open NRI/VK => open corresponding place in MoltenVK => analyze, rearrange, copy & paste to MTL implementation

(looking at MTL docs...)

I've been looking at the MoltenVK source, thanks for the suggestion. umm, ill have to do a bit of research i don't want to have to extend DescriptorType. maybe there is a more ideal solution?

@dzhdanNV
Copy link
Collaborator

dzhdanNV commented Dec 2, 2024

Why do you need to potentially extend DescriptorType? Because of this:

                case DescriptorType::TEXTURE:
                    argDescriptor.dataType = MTLDataTypeTexture;
                    argDescriptor.textureType = MTLTextureType2D; // descriptor type does not have this
                    break;

?

@pollend
Copy link
Contributor Author

pollend commented Dec 2, 2024

Why do you need to potentially extend DescriptorType? Because of this:

                case DescriptorType::TEXTURE:
                    argDescriptor.dataType = MTLDataTypeTexture;
                    argDescriptor.textureType = MTLTextureType2D; // descriptor type does not have this
                    break;

?

https://developer.apple.com/documentation/metal/mtltexturetype

defines 1D/1DArray/2D/2dArray/Cube/CubeArray etc ... ill have to see NRISamples when I get there. just something to note later on. Ummm, I still need to finish metal cmd buffers.

@vertver
Copy link
Contributor

vertver commented Dec 2, 2024

You can do the same trick as here -

switch (textureViewDesc.viewType) {
Just identify your texture type by texture parameters, that's all you need.

@pollend
Copy link
Contributor Author

pollend commented Dec 2, 2024

You can do the same trick as here -

switch (textureViewDesc.viewType) {

Just identify your texture type by texture parameters, that's all you need.

in what way?

@vertver
Copy link
Contributor

vertver commented Dec 2, 2024

AFAIK, MTLTextureDescriptor is a direct analogue of NriDescriptor, so it shouldn't be hard to identify the texture type when creating the descriptor. If I'm right, you can use TextureViewType to identify the MTLTextureType.

@dzhdanNV
Copy link
Collaborator

dzhdanNV commented Dec 2, 2024

The problem is that there are no descriptors in DescriptorRangeDesc. In other places - yes. In this and hopefully the only one place - no.

@vertver
Copy link
Contributor

vertver commented Dec 2, 2024

@pollend pollend force-pushed the feature/apple-metal-support branch from b4079f3 to a84371e Compare December 24, 2024 17:48
Comment on lines 220 to 334
void CommandBufferMTL::ClearAttachments(const ClearDesc* clearDescs, uint32_t clearDescNum, const Rect* rects, uint32_t rectNum) {


Restore();
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there isn't a similar implementation in metal for this. umm I guess it would have to cache out a pipeline and render rectangles to the various layers then restore some of the command buffer state?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably you could look at MoltenVK (VK to Metal) for vkCmdClearAttachments.

P.S. (since I'm here) Really appreciate your work!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did see that umm, that does make this complicated. i guess i have to cache all the pipeline state and submit it in one batch. I think you can only have one encoder at any one time for a cmd buffer. does make the data structure a bit fat.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can try suggesting API changes simplifying Metal implementation. Very undesired to be honest :)

Copy link
Contributor Author

@pollend pollend Dec 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at the moment im tracking any state for the vieport and only start a rendering encoder when a pipeline is set. some of the other state is reliant on knowing what the pipeline is before being able to set descriptor sets. for vertex and index buffers I have a bitset to track which vertex stream are used. the whole thing is cleared when EndRendering is called. Any thoughts on this approach? vulkan lets you persist state outside of begin and end. for metal I have to restore it when you start a rendering encoder. I think from the documentation you can only ever have one active rendering encoder at any one moment for a command buffer.

Comment on lines +420 to +429
MTLSize threadgroupCount = MTLSizeMake(dispatchDesc.x, dispatchDesc.y, dispatchDesc.z);
// MTLSize threadPerThreadGroup = MTLSizeMake(dispatchDesc.x, dispatchDesc.y, dispatchDesc.z);
// [m_ComputeEncoder
// dispatchThreadgroups: threadgroupCount
// threadsPerThreadgroup: threadPerThreadGroup];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can pass the number of dispatchThreadgroups and threadsPerThreadgroup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any clue how this would be managed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MoltenVK pulls threadsPerThreadgroup from shaders and stores in pipelines. This value becomes available when a pipeline gets bound.

Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
@pollend pollend force-pushed the feature/apple-metal-support branch from 926b2b3 to 89637d6 Compare December 29, 2024 19:45
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
Signed-off-by: Michael Pollind <[email protected]>
if(m_RendererEncoder) {
if (m_barrierFlags & BarrierBits::BARRIER_FLAG_BUFFERS)
{
[m_RendererEncoder memoryBarrierWithScope:MTLBarrierScopeBuffers
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is sufficient at all.

Metal memory barriers only impact the pass they are in: https://developer.apple.com/videos/play/wwdc2022/10101/?time=1495
So it could still overlap with a different pass in the same command buffer or with another command buffer.

Meanwhile Vulkan barriers (and I assume the NRI barriers that look very similar) go beyond that and impact the whole queue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants