Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor cursor_context calls #68

Merged
merged 6 commits into from
Jan 31, 2023
Merged

Refactor cursor_context calls #68

merged 6 commits into from
Jan 31, 2023

Conversation

leo-schick
Copy link
Member

Implement two methods to directly connect to the DB API (PEP-249):

class DB:

    # [...]

    def connect(self) -> object:
        """
        Constructor for creating a connection to the database.
        The returned connection object is PIP-249 compatible (DB-API).

        See also: https://peps.python.org/pep-0249/#connection-objects
        """
        raise NotImplementedError(f'Please implement connect for type "{self.__class__.__name__}"')

    def cursor_context(self) -> object:
        """
        A single iteration with a cursor context. When the iteration is
        closed, a commit is executed on the cursor.

        Example usage:
           with db.cursor_context() as cursor:
             cursor.execute('UPDATE table SET table.c1 = 1 WHERE table.id = 5')
        """
        # [...]

With that, we mark the individual database calls deprecated.

@jankatins
Copy link
Member

Just putting this out: an alternative solutions would be to add a

@singledispatch
def mara_db_connect(db DB) -> ... :
   raise NotImplementedError(f'Please implement mara_db_connect for type "{db.__class__.__name__}"')

This would be more in line with the rest of the usage of DB, where DB is used as "dumb" data structure without and functionality. Functionality is usually added via functional API and singledispatch. I'm unsure how "connect" usage outside of mara_* package actually looks like and what would be easier for these kind of users. I'm happy with both :-)

@leo-schick
Copy link
Member Author

The purpose is internal indeed: I am planning to refactor some of the mara code to support multiple database systems for the main mara db.

The second possible use case I have in mind is that sometimes I think it would be nice to have the possiblity to execute SQL queries through python and not through a cli tool of the database provider. E.g. the databricks cli tool is already a python tool. Executing a python tool from a python script multiple times makes the whole execution slow. The psql is much better here in performance

The handling of the connections could be done with a custom execution context (see mara/mara-pipelines#68). I currently do not plan to do this but just an idea which jumped into my head...

@leo-schick leo-schick force-pushed the refactor-cursor-context branch from 97428a5 to 184c181 Compare January 31, 2023 13:33
@leo-schick leo-schick added this to the Version 4.9.0 milestone Jan 31, 2023
@leo-schick
Copy link
Member Author

I will merge this now so that I can get forward with improving the other libraries. Even tho we could discuss how far the functional api paradigm of mara should be used; see e.g. the run method in the subclasses of mara_pipelines.pipelines.Command

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants