Adding support for postgres-xl table distribution #1697
@Jmorjsm thanks, that looks like a good start!
Here's a first round of review comments.
- I wouldn't worry too much about PostgreSQL-XL vs. Greenplum - the two syntaxes definitely look different enough to warrant separate APIs, even if one partially subsumes the other.
- Isn't there a missing closing curly brace in the PostgreSQL-XL spec, i.e. the one that's opened after DISTRIBUTED? I have a hard time understanding exactly which options are mutually exclusive here.
- Somewhat related: the PostgreSQL-XL syntax (which I know nothing about!) seems to distinguish between DISTRIBUTE BY and DISTRIBUTED BY; the former for REPLICATION/ROUNDROBIN/<column> with/without function, the latter for RANDOMLY/<column>. Does that mean there are really two ways to define things with a column (one with DISTRIBUTE BY, one with DISTRIBUTED BY)? I don't know if any of this is significant, but ideally the builder's fluent API would match the DDL syntax (so separate DistributeBy and DistributedBy).
- In theory, we should validate that there aren't two entities mapped to the same table ("table splitting") with different DISTRIBUTE BY configurations. This is pretty far-fetched, but if you want to do it, take a look at what I did in SqlServerModelValidator in https://github.com/dotnet/efcore/pull/23904/files.
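To illustrate the table-splitting check from the last point, here is a minimal sketch. It is not the EF Core validator API: `EntityMapping` and `DistributionValidator` are hypothetical stand-ins invented for this example, and the distribution setting is modelled as a plain string.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an entity type's table mapping (not EF Core metadata).
public sealed record EntityMapping(string EntityName, string TableName, string? DistributeBy);

public static class DistributionValidator
{
    // Throws if two entities mapped to the same table disagree on their
    // distribution configuration.
    public static void Validate(IEnumerable<EntityMapping> mappings)
    {
        foreach (var table in mappings.GroupBy(m => m.TableName))
        {
            var first = table.First();
            var conflict = table.FirstOrDefault(m => m.DistributeBy != first.DistributeBy);
            if (conflict is not null)
            {
                throw new InvalidOperationException(
                    $"Entities '{first.EntityName}' and '{conflict.EntityName}' are mapped to " +
                    $"table '{table.Key}' but have different DISTRIBUTE BY configurations.");
            }
        }
    }
}
```

A real implementation would presumably live in the provider's model validator and read the configuration from annotations rather than from a record like this.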
src/EFCore.PG/Metadata/Internal/PostgresXlDistributeByAnnotationNames.cs
{
    var distributeBy = new PostgresXlDistributeBy(operation);

    var strategy = distributeBy.DistributionStrategy;
nit: can define a Deconstruct method on PostgresXlDistributeBy and use it in one line
I was referring to the standard ValueTuple Deconstruct method as described here.
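For readers unfamiliar with the pattern, the following is a minimal sketch of a `Deconstruct` method. `DistributeBySketch` and its `ColumnName` property are hypothetical simplifications for illustration; the actual `PostgresXlDistributeBy` type in the PR is constructed from an operation and may expose different members.

```csharp
#nullable enable

public enum DistributionStrategy { None, Replication, RoundRobin, Hash, Modulo }

// Hypothetical simplified shape, not the type from the PR.
public sealed class DistributeBySketch
{
    public DistributeBySketch(DistributionStrategy strategy, string? columnName)
    {
        DistributionStrategy = strategy;
        ColumnName = columnName;
    }

    public DistributionStrategy DistributionStrategy { get; }
    public string? ColumnName { get; }

    // The compiler looks for a method with this name and out parameters
    // when an instance appears on the right-hand side of a deconstruction.
    public void Deconstruct(out DistributionStrategy strategy, out string? columnName)
    {
        strategy = DistributionStrategy;
        columnName = ColumnName;
    }
}
```

With this in place, the two reads collapse into one line, for example `var (strategy, column) = new DistributeBySketch(DistributionStrategy.Hash, "id");`.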
Thanks for your feedback here!
CREATE TABLE distribute_test.test_distribute_by_replication (id int, first_name varchar(100), last_name varchar(100), age int) DISTRIBUTE BY REPLICATION ;
CREATE TABLE distribute_test.test_distribute_by_roundrobin (id int, first_name varchar(100), last_name varchar(100), age int) DISTRIBUTE BY ROUNDROBIN ;
CREATE TABLE distribute_test.test_distribute_by_hash_id (id int, first_name varchar(100), last_name varchar(100), age int) DISTRIBUTE BY HASH (id);
CREATE TABLE distribute_test.test_distribute_by_modulo_id (id int, first_name varchar(100), last_name varchar(100), age int) DISTRIBUTE BY MODULO (id);
-- DOES NOT WORK: CREATE TABLE distribute_test.test_distribute_by_id (id int, first_name varchar(100), last_name varchar(100), age int) DISTRIBUTE BY (id);
CREATE TABLE distribute_test.test_distributed_by_id (id int, first_name varchar(100), last_name varchar(100), age int) DISTRIBUTED BY (id);
CREATE TABLE distribute_test.test_distributed_randomly (id int, first_name varchar(100), last_name varchar(100), age int) DISTRIBUTED RANDOMLY;
CREATE TABLE distribute_test.test_diststyle_even (id int, first_name varchar(100), last_name varchar(100), age int) DISTSTYLE EVEN;
CREATE TABLE distribute_test.test_diststyle_all (id int, first_name varchar(100), last_name varchar(100), age int) DISTSTYLE ALL;
CREATE TABLE distribute_test.test_diststyle_key_distkey_id (id int, first_name varchar(100), last_name varchar(100), age int) DISTSTYLE KEY DISTKEY (id);
Looks like
@@ -1821,6 +1884,118 @@ public IndexColumn(string name, string @operator, string collation, SortOrder so
    public NullSortOrder NullSortOrder { get; }
}

private static void ValidateTableDistributionProperties(
This can be a local method at the end of Generate(CreateTableOperation), as it's not called by anyone else.
Also, distributedByColumnName doesn't actually seem to be checked anywhere.
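A rough sketch of what the local-function arrangement could look like, with the column name actually checked. Everything here is an assumption for illustration: `CreateTableSketch` and its string parameters are hypothetical, the real `Generate(CreateTableOperation)` works on a migration operation and a command builder, and the rule that only HASH and MODULO take a column is inferred from the test statements posted earlier in this thread.

```csharp
#nullable enable
using System;
using System.Text;

public static class CreateTableSketch
{
    // Hypothetical simplified signature, not the provider's Generate method.
    public static string Generate(string table, string? strategy, string? column)
    {
        ValidateTableDistributionProperties();

        var sql = new StringBuilder($"CREATE TABLE {table} (id int)");
        if (strategy is not null)
        {
            sql.Append(" DISTRIBUTE BY ").Append(strategy);
            if (column is not null)
            {
                sql.Append($" ({column})");
            }
        }

        return sql.Append(';').ToString();

        // Local function: only visible inside Generate, and it captures the
        // method's parameters, so nothing needs to be passed in.
        void ValidateTableDistributionProperties()
        {
            var needsColumn = strategy is "HASH" or "MODULO";
            if (needsColumn && column is null)
            {
                throw new InvalidOperationException($"DISTRIBUTE BY {strategy} requires a column.");
            }

            if (!needsColumn && column is not null)
            {
                throw new InvalidOperationException("A column was given, but the strategy does not take one.");
            }
        }
    }
}
```

Declaring the local function after the `return` keeps the main flow of the method readable while scoping the validation to its only caller.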
I've answered some questions; let me know if you're blocked on anything from my side (note that there are also still my questions in #1697 (review), which may need addressing).
Yeah, it seems... complicated :) But I think the fluent APIs we expose should correspond to what is actually supported on the database side.
…Builder extension methods
@Jmorjsm just to say that when this is ready for another look, please re-request a review (preferably after this passes the build too).
This adds support for specifying a distribution strategy (docs) when creating tables against postgres-xl.