[Architecture Pattern Feature] Connect & Exchange Dataspace Usage Pattern #5
This pattern is still a draft; I am still detailing it. It has the potential to harmonize our data exchange and our usage of the EDC as a dataspace.
Ok, now I have refactored the "introduction" of the pattern. The actual content of it will come in a PR, so I will not add more detail to this issue.
More patterns or architecture usage recommendations will also come in the future.
I added a disclaimer so it is clearer what was not included in the scope.
I want to merge a first draft into main so we already have something: the updates to the documentation can be done incrementally.
The PR is pending merge, waiting for approval from @ndr-brt and @lgblaumeiser
Connect & Exchange Dataspace Usage Pattern
1. Context
This data space usage pattern comes from the idea of:
The same applies to:
Nowadays there is a big overhead at the "Application" layer of Tractus-X. Every application uses the Eclipse Dataspace Connector (EDC) in its own way; in the case of Tractus-X we use the Tractus-X EDC. The problem is that there is no harmonization between consumer and provider applications on how to retrieve data from the network in the most optimized, and especially the fastest, way.
They first need to understand how the complete protocol and the EDC work before they can finally adopt the network.
Therefore the "EDR" (Endpoint Data Reference) interface was built, to simplify the negotiation and transfer process for the application. It has been stable since the Jupiter release of Catena-X (TX-EDC > v0.7.5) and is available out there; however, there is no usage pattern provided yet for the application side.
Another point is that the EDC still needs to be optimized and several breaking changes are planned for the future, so all applications will need to change the way they use the EDC.
For example: if in the future we want to move to the "bring your own identity" approach, where everyone brings their own DID, we must then expect the applications to take care of this as well.
And it is a huge amount of work for applications to adapt to the latest "base" dataspace components with every release.
Therefore this pattern aims to reduce the complexity on the Application layer, easing the adoption of the dataspace for any application using the EDC.
In the end, from the application side there are only two things that matter when consuming information:
The rest can be abstracted, and that abstraction is what is defined in this usage pattern, which could be realized in a reference implementation or in any application that follows it correctly.
Right now connections are being opened and then not used. The Eclipse Dataspace Connector was not designed to keep so many connections open: for every asset we want to retrieve, a new connection is opened, just one data exchange is done, and then the connection stays open. And the applications are not aware of that.
2. The pattern
Architecture Design Patterns Used: Adapter + Proxy
2.1 Name
Connect & Exchange
2.2 Description
Often people don't really understand the purpose of the Eclipse Dataspace Connector. In the end it is an "Infrastructure Enabler": it enables the infrastructure at your company so information can be accessed, and what the EDC does is set a "security layer" over the data endpoints.
But in the end you will ALWAYS have applications or services behind the EDC which respond with information or confirm that an operation was executed.
The only thing it creates is a "PIPE" between the companies, and the connection remains open.
Using the EDR interface, applications are able to negotiate assets from the catalog without needing to execute a transfer to retrieve the authorization. That was often a problem, because applications always required a "DNS resolvable domain" for the EDR token to be called back into the application. With the EDR interface, the application can now be deployed on the consumer's local machine or in a private infrastructure and still interact with the EDC, retrieving data and communicating with the other (provider) EDC. Using the EDR interface, the EDC keeps the channel open until any of the conditions change, allowing data to be exchanged in a very efficient way.
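As a rough illustration, a consumer application could start such an EDR negotiation with a single call to its own EDC management API. The sketch below is a minimal Python example; the endpoint path, headers, and request body shape are assumptions based on the TX-EDC 0.7.x-style management API and may differ between versions.

```python
# Minimal sketch: start an EDR negotiation through the consumer EDC.
# All URLs, keys, paths and the body/response shape are assumptions.
import requests

MANAGEMENT_URL = "https://consumer-edc.example.com/management"  # hypothetical
HEADERS = {"X-Api-Key": "<management-api-key>", "Content-Type": "application/json"}


def start_edr_negotiation(provider_dsp_url: str, offer: dict) -> str:
    """Ask the consumer EDC to negotiate an EDR for an offer picked from the
    provider catalog. The application does not start a transfer itself and
    does not need a callback domain."""
    body = {
        "@context": {"edc": "https://w3id.org/edc/v0.0.1/ns/"},
        "counterPartyAddress": provider_dsp_url,
        "protocol": "dataspace-protocol-http",
        "policy": offer,  # the offer (policy + target asset) from the catalog
    }
    response = requests.post(f"{MANAGEMENT_URL}/v2/edrs", json=body,
                             headers=HEADERS, timeout=30)
    response.raise_for_status()
    # Assumed response shape: an id identifying the negotiation; the EDR
    # itself will appear in the EDC cache once the negotiation completes.
    return response.json()["@id"]
```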
Several applications right now are redoing the negotiation over and over again for every asset that needs to be retrieved, slowing the EDC down and affecting its performance.
I have already done a proof of concept using this pattern and it works really well! The point is that the negotiation does not need to be redone every time: if the data "pipe" is open, the data exchange can flow, at least until one of the following conditions changes:
And once this connection is open, I only need the "transferProcessId" to get the authorization from the EDC, because the EDR is already cached by the EDC and the token is kept up to date.
So the connection stays open and is really, really FAST (it depends only on the content size and the HTTP overhead).
In the end the level of abstraction is to say, as a consumer: I want to be able to call a "proxy", stating my conditions (policies, asset to connect to), and then I can do "do_post" or "do_get" while all the connection details are abstracted away.
With this approach the consumer can shift from one application to another, for example from the "Digital Twin Registry" to the "Submodel Service", or to any other app for which I have an open connection.
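A sketch of what such a "proxy" abstraction could look like on the consumer side is shown below. It assumes an EDR has already been negotiated (for example with the earlier sketch) and that the consumer EDC exposes a cached data address per transferProcessId; only the method names do_get and do_post come from the description above, everything else (paths, field names) is a hypothetical illustration.

```python
# Sketch of the "Connect & Exchange" proxy facade: connect once, then
# exchange repeatedly. Endpoint paths and field names are assumptions.
import requests


class ConnectAndExchangeProxy:
    def __init__(self, management_url: str, api_key: str):
        self.management_url = management_url
        self.headers = {"X-Api-Key": api_key, "Content-Type": "application/json"}
        self.transfer_process_id = None

    def connect(self, transfer_process_id: str) -> None:
        """Connect phase: remember the id of an already negotiated EDR."""
        self.transfer_process_id = transfer_process_id

    def _data_address(self) -> dict:
        """Fetch the cached (and refreshed) EDR token from the consumer EDC."""
        response = requests.get(
            f"{self.management_url}/v2/edrs/{self.transfer_process_id}/dataaddress",
            headers=self.headers, timeout=30)
        response.raise_for_status()
        # Assumed to contain the provider data plane endpoint and the token.
        return response.json()

    def do_get(self, api_path: str) -> requests.Response:
        """Exchange phase: forward a GET through the open "pipe"."""
        edr = self._data_address()
        return requests.get(edr["endpoint"] + api_path,
                            headers={"Authorization": edr["authorization"]},
                            timeout=30)

    def do_post(self, api_path: str, payload: dict) -> requests.Response:
        """Exchange phase: forward a POST through the open "pipe"."""
        edr = self._data_address()
        return requests.post(edr["endpoint"] + api_path, json=payload,
                             headers={"Authorization": edr["authorization"]},
                             timeout=30)
```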
2.3 Story line
2.3.1 Prerequisites
The following prerequisites are needed; how they are obtained is out of scope and is up to the use case to define.
2.3.2 Connect Phase
Important
The application MUST store this information under:
If all these conditions stay the same, the connection will remain open until the EDC cuts it (because the contract agreement is no longer valid).
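To make the connect phase concrete, the sketch below polls the consumer EDC until the negotiated EDR shows up in the cache and returns the transferProcessId the application has to store. TX-EDC exposes an EDR query API, but the exact path, filter, and field names shown here are assumptions and may differ between versions.

```python
# Sketch: poll the consumer EDC until the EDR for an asset is cached,
# then return the transferProcessId. Paths, filters and field names are
# assumptions for illustration only.
import time
import requests

MANAGEMENT_URL = "https://consumer-edc.example.com/management"  # hypothetical
HEADERS = {"X-Api-Key": "<management-api-key>", "Content-Type": "application/json"}


def wait_for_edr(asset_id: str, timeout_seconds: int = 60) -> str:
    """Repeatedly query the EDR cache until an entry for the asset appears."""
    query = {
        "@context": {"edc": "https://w3id.org/edc/v0.0.1/ns/"},
        "@type": "QuerySpec",
        "filterExpression": [
            {"operandLeft": "assetId", "operator": "=", "operandRight": asset_id}
        ],
    }
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        response = requests.post(f"{MANAGEMENT_URL}/v2/edrs/request",
                                 json=query, headers=HEADERS, timeout=30)
        response.raise_for_status()
        entries = response.json()
        if entries:
            # The transferProcessId is the only thing the application needs
            # to keep in order to reuse the open connection later.
            return entries[0]["transferProcessId"]
        time.sleep(1)
    raise TimeoutError(f"No EDR cached for asset {asset_id} within {timeout_seconds}s")
```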
2.3.3 Exchange Phase
Now when the application wants to retrieve data again, the "proxy" application will reuse the transferProcessId to query the token (step 6) and will concatenate any other specified "api path" to the data plane URL (step 8) until the open connection expires, at which point step 2 will be repeated. In this way a connection can be opened once a year (taking about 10 seconds) between two companies for a specific application and then be accessed over and over again, reusing this "tunnel" or "pipe" created between the two companies, retrieving data in less than "0.6253 s" (tested by me) each time. And you can even open a second, third, fourth connection and shift between applications and conditions.
3. Potential Users
Applications like the:
4. Advantages
5. Downsides
- If it breaks, is discontinued, or is not maintained, this pattern will no longer be valid and the EDR logic will need to be reimplemented.
- The first EDR negotiation takes 10 seconds and requires iterative attempts to check when the "EDR" is available in the cache; maybe a callback could still be implemented to notify that the data is ready, but then a "domain" would be needed on the application side.
6. Tasks
7. Disclaimers