Skip to content

Commit

Permalink
issue #39 - SQLite support (#63)
Browse files Browse the repository at this point in the history
  • Loading branch information
pounard authored Nov 24, 2023
1 parent 7d5e492 commit b274696
Show file tree
Hide file tree
Showing 15 changed files with 281 additions and 61 deletions.
19 changes: 17 additions & 2 deletions dev.sh
Original file line number Diff line number Diff line change
Expand Up @@ -154,16 +154,30 @@ do_test_sqlsrv() {
do_test_sqlsrv2019
}

# SQLite version depends upon the PHP embeded version or linked
# library, we cannot target X or Y version.
do_test_sqlite() {
section_title "Running tests with SQLite"
docker compose -p db_tools_bundle_test exec \
-e DBAL_DRIVER=pdo_sqlite \
-e DBAL_DBNAME=test_db \
-e DBAL_HOST=127.0.0.1 \
-e DATABASE_URL="pdo-sqlite:///:memory:" \
phpunit vendor/bin/phpunit $@
}

# Run PHPunit tests for all database vendors
do_test_all() {
do_composer_install

do_test_mysql57
# @todo Temporary deactivated MySQL 5.7 due to a bug.
# do_test_mysql57
do_test_mysql80
do_test_mariadb11
do_test_postgresql10
do_test_postgresql16
do_test_sqlsrv2019
do_test_sqlite
}

do_test_notice() {
Expand All @@ -181,6 +195,7 @@ do_test_notice() {
printf "\n - ${GREEN}postgresql16${NC}: Launch test for PostgreSQL 16"
printf "\n - ${GREEN}sqlsrv${NC}: Launch test for SQL Server 2019"
printf "\n - ${GREEN}sqlsrv2019${NC}: Launch test for SQL Server 2019"
printf "\n - ${GREEN}sqlite${NC}: Launch test for SQLite"
printf "\n\nYou can then use PHPUnit option as usual:"
printf "\n${GREEN}./dev.sh test mysql --filter AnonymizatorFactoryTest${NC}"
printf "\n\n"
Expand All @@ -192,7 +207,7 @@ do_test() {
if [[ -n $@ ]];then shift;fi

case $suit in
mysql57|mysql80|mariadb11|mysql|postgresql10|postgresql16|postgresql|sqlsrv2019|sqlsrv) do_composer_install && do_test_$suit "$@";;
mysql57|mysql80|mariadb11|mysql|postgresql10|postgresql16|postgresql|sqlsrv2019|sqlsrv|sqlite) do_composer_install && do_test_$suit "$@";;
*) do_test_notice;;
esac
}
Expand Down
14 changes: 14 additions & 0 deletions docs/content/anonymization/core-anonymizers.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,15 @@ This page list all *Anonymizers* provided by *DbToolsBundle*.

## EmailAnonymizer

EmailAnonymizer uses a hash function on the original value to make each unique email
anonymization reproducible accross tables.

:::warning
SQLite does implement `MD5()` function, neither any hashing function: in order to get
around this, the `rowid` value is used instead which prevent email values anonymization
from being reproducible across tables.
:::

This *Anonymizer* will fill configured column with value looking like `[username]@[domain.tld]`
where:
* `[username]` is a md5 hash of the pre-anonymization value
Expand Down Expand Up @@ -213,6 +222,11 @@ user:

## Md5Anonymizer

:::warning
SQLite does implement `MD5()` function, neither any hashing function, this anonymizer
cannot be used with SQLite.
:::

This *Anonymizer* will fill configured column with a md5 hash of the pre-anonymization value.

::: code-group
Expand Down
34 changes: 34 additions & 0 deletions docs/content/anonymization/internals.md
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,40 @@ MySQL has many limitations:
In order to workaround this, we chose to create an index over the anonymizer
identifier sequence, which wouldn't be necessary otherwise.

### SQLite specifics

SQLite does not permit DDL statements that alter tables, it forces you to
create a new table with the new schema, copy data, then remove the previous
version and rename the new one.

In order to avoid data copy, we are going to use the `rowid` magic column
instead for joining, which allows us to work without adding the unique
identifier on tables.

Hence the following SQL variant:

```sql
UPDATE
"client"
SET
"nom" = "sample_1"."value",
"civilite" = "sample_2"."value"

FROM (
SELECT
*,
"rowid" AS "_db_tools_id"
FROM "client"
) AS "_target_table"

-- ...

WHERE
"client"."rowid" = "_target_table"."_db_tools_id"
```

Which then allows all anonymizers working consistently with SQLite.

### SQL Server specifics

When using the same table as the updated one in the `FROM` clause, SQL Server
Expand Down
3 changes: 1 addition & 2 deletions docs/content/database-vendors.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,6 @@ Here is a matrix of the current state of support:
| MariaDB 11 or higher | <Badge type="tip" text="" title="Working"/> | <Badge type="tip" text="" title="Working"/> | <Badge type="tip" text="" title="Working"/> |
| MySQL 5.7 or higher | <Badge type="tip" text="" title="Working"/> | <Badge type="tip" text="" title="Working"/> | <Badge type="tip" text="" title="Working"/> |
| SQLite | <Badge type="danger" text="" title="Unsupported"/> | <Badge type="warning" text="~" title="Only unit-tested"/> | <Badge type="danger" text="" title="Unsupported"/> |
| Oracle | <Badge type="danger" text="" title="Unsupported"/> | <Badge type="warning" text="~" title="Only unit-tested"/> | <Badge type="danger" text="" title="Unsupported"/> |
| SQL Server | <Badge type="danger" text="" title="Unsupported"/> | <Badge type="warning" text="~" title="Only unit-tested"/> | <Badge type="danger" text="" title="Unsupported"/> |
| SQL Server | <Badge type="danger" text="" title="Unsupported"/> | <Badge type="warning" text="~" title="Only unit-tested"/> | <Badge type="danger" text="" title="Unsupported"/> |

<Badge type="tip" text="" /> Working - <Badge type="warning" text="~" /> Only unit-tested - <Badge type="danger" text="" /> Unsupported
1 change: 1 addition & 0 deletions docs/content/getting-started/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ Currently supported database vendors:
- PostgreSQL 10 or higher (previous versions from 9.5 are untested but should work)
- MariaDB 11 or higher
- MySQL 5.7 or higher
- SQLite >= 3
- SQL Server 2019 or higher (previous versions from 2015 are untested but should work)

::: info
Expand Down
84 changes: 55 additions & 29 deletions src/Anonymization/Anonymizator.php
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
use Doctrine\DBAL\Connection;
use Doctrine\DBAL\Platforms\AbstractMySQLPlatform;
use Doctrine\DBAL\Platforms\SQLServerPlatform;
use Doctrine\DBAL\Platforms\SqlitePlatform;
use Doctrine\DBAL\Schema\Column;
use Doctrine\DBAL\Schema\SchemaDiff;
use Doctrine\DBAL\Schema\TableDiff;
Expand Down Expand Up @@ -240,40 +241,37 @@ protected function createUpdateQuery(string $table): DoctrineUpdate

$expr = $update->expression();

//
// Add target table a second time into the FROM statement of the
// UPDATE query, in order for anonymizers to be able to JOIN over
// it. Otherwise, JOIN would not be possible for RDBMS that speak
// standard SQL.
//
// This is the only and single hack regarding the UPDATE clause
// syntax, all RDBMS accept the following query:
//
// UPDATE foo
// SET val = bar.val
// FROM foo AS foo_2
// JOIN bar ON bar.id = foo_2.id
// WHERE foo.id = foo_2.id
//
// Except for SQL Server, which cannot deambiguate the foo table
// reference in the WHERE clause, so we have to write it this
// way:
//
// UPDATE foo
// SET val = bar.val
// FROM (
// SELECT *
// FROM foo
// ) AS foo_2
// JOIN bar ON bar.id = foo_2.id
// WHERE foo.id = foo_2.id
//
// Which by the way also works with other RDBMS, but is an
// optimization fence for some, because the nested SELECT becomes
// a temporary table (especially for MySQL...). For those we need
// to keep the original query, even if semantically identical.
//
if (AbstractBridge::SERVER_SQLSERVER === $builder->getServerFlavor()) {
// This is the only and single hack regarding the UPDATE clause
// syntax, all RDBMS accept the following query:
//
// UPDATE foo
// SET val = bar.val
// FROM foo AS foo_2
// JOIN bar ON bar.id = foo_2.id
// WHERE foo.id = foo_2.id
//
// Except for SQL Server, which cannot deambiguate the foo table
// reference in the WHERE clause, so we have to write it this
// way:
//
// UPDATE foo
// SET val = bar.val
// FROM (
// SELECT *
// FROM foo
// ) AS foo_2
// JOIN bar ON bar.id = foo_2.id
// WHERE foo.id = foo_2.id
//
// Which by the way also works with other RDBMS, but is an
// optimization fence for some, because the nested SELECT becomes
// a temporary table (especially for MySQL...). For those we need
// to keep the original query, even if semantically identical.
$update->join(
$builder->select($table),
$expr->where()->isEqual(
Expand All @@ -282,6 +280,23 @@ protected function createUpdateQuery(string $table): DoctrineUpdate
),
AbstractAnonymizer::JOIN_TABLE
);
} else if (AbstractBridge::SERVER_SQLITE === $builder->getServerFlavor()) {
// SQLite doesn't support DDL statements on tables, we cannot add
// the join column with an int identifier. But, fortunately, it does
// have a special ROWID column which is a unique int identifier for
// each row we can use instead.
// @see https://www.sqlite.org/lang_createtable.html#rowid
$update->join(
$builder
->select($table)
->column('*')
->column('rowid', AbstractAnonymizer::JOIN_ID),
$expr->where()->isEqual(
$expr->column("rowid", $table),
$expr->column(AbstractAnonymizer::JOIN_ID, AbstractAnonymizer::JOIN_TABLE),
),
AbstractAnonymizer::JOIN_TABLE
);
} else {
$update->join(
$table,
Expand Down Expand Up @@ -337,6 +352,17 @@ public function addAnonymizerIdColumn(string $table): void

$platform = $this->connection->getDatabasePlatform();

if ($platform instanceof SqlitePlatform) {
// Do nothing, SQLite doesn't support DDL statements, you need to
// recreate a new table with the new schema, then copy all data.
// That's not what we want.
// SQLite has a ROWID special column that does exactly what we need
// i.e. having a unique int identifier for each row on which we can
// join on.
// @see https://www.sqlite.org/lang_createtable.html#rowid
return;
}

if ($platform instanceof AbstractMySQLPlatform) {
$this->addAnonymizerIdColumnMySql($table);

Expand Down
22 changes: 22 additions & 0 deletions src/Anonymization/Anonymizer/AbstractAnonymizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,28 @@ final public function getColumnName(): string
return $this->columnName;
}

/**
* Get join column name.
*
* @internal
* Public for unit tests only, otherwise protected.
*/
final public function getJoinId(): string
{
return self::JOIN_ID;
}

/**
* Get join column expression.
*
* @internal
* Public for unit tests only, otherwise protected.
*/
protected function getJoinColumn(): Expression
{
return ExpressionFactory::column($this->getJoinId(), self::JOIN_TABLE);
}

/**
* Initialize your anonymizer.
*
Expand Down
4 changes: 2 additions & 2 deletions src/Anonymization/Anonymizer/AbstractEnumAnonymizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -60,14 +60,14 @@ public function anonymize(Update $update): void
->range($targetCount) // Avoid duplicate rows.
;

$update->join(
$update->leftJoin(
$join,
$expr
->where()
->raw(
'? + 1 = ?',
[
$expr->mod($expr->column(self::JOIN_ID, self::JOIN_TABLE), $sampleCount),
$expr->mod($this->getJoinColumn(), $sampleCount),
$expr->column('rownum', $joinAlias),
]
)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -85,12 +85,12 @@ public function anonymize(Update $update): void
->range($targetCount) // Avoid duplicate rows.
;

$update->join(
$update->leftJoin(
$join,
$expr->where()->raw(
'? + 1 = ?',
[
$expr->mod($expr->column(self::JOIN_ID, self::JOIN_TABLE), $sampleCount),
$expr->mod($this->getJoinColumn(), $sampleCount),
$expr->column('rownum', $joinAlias),
]
),
Expand Down
9 changes: 8 additions & 1 deletion src/Anonymization/Anonymizer/Core/EmailAnonymizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@

namespace MakinaCorpus\DbToolsBundle\Anonymization\Anonymizer\Core;

use Doctrine\DBAL\Platforms\SqlitePlatform;
use MakinaCorpus\DbToolsBundle\Anonymization\Anonymizer\AbstractAnonymizer;
use MakinaCorpus\DbToolsBundle\Attribute\AsAnonymizer;
use MakinaCorpus\QueryBuilder\Query\Update;
Expand All @@ -24,12 +25,18 @@ public function anonymize(Update $update): void
{
$expr = $update->expression();

if ($this->connection->getDatabasePlatform() instanceof SqlitePlatform) {
$emailHashExpr = $this->getJoinColumn();
} else {
$emailHashExpr = $expr->md5($expr->column($this->columnName, $this->tableName));
}

$update->set(
$this->columnName,
$this->getSetIfNotNullExpression(
$expr->concat(
'anon-',
$expr->md5($expr->column($this->columnName, $this->tableName)),
$emailHashExpr,
'@',
$this->options->get('domain', 'example.com'),
),
Expand Down
9 changes: 3 additions & 6 deletions src/Anonymization/Anonymizer/Core/FloatAnonymizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -37,13 +37,10 @@ public function anonymize(Update $update): void
$this->columnName,
$this->getSetIfNotNullExpression(
$expr->raw(
'FLOOR(? * (? - ? + 1) + ?) / ?',
'cast(floor(?) / ? as float)',
[
$this->getRandomExpression(),
$expr->cast($max * $precision, 'int'),
$min * $precision,
$min * $precision,
$precision
$this->getRandomIntExpression($max * $precision, $min * $precision),
$expr->cast($precision, 'float'),
]
)
),
Expand Down
Loading

0 comments on commit b274696

Please sign in to comment.