Anonymization matching strategy #18

maks-operlejn-ds · 2023-09-11T09:46:02Z

No description provided.

mateusz-wosinski-ds

I'm not sure about the current implementation. Shouldn't this method be used for anonymization rather than deanonymization? I guess we want to catch inaccuracies before anonymization happens, so that instance matching is more precise 🤔

For example, the input may contain both John Kennedy and John F. Kenedy, and we want to anonymize them to the same value. Am I missing something?
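The anonymization-side alternative raised in this comment could look roughly like the minimal sketch below. It is not the PR's implementation: `group_similar_entities`, the `<PERSON_n>` placeholder format and the 0.8 threshold are hypothetical, and similarity is measured with the standard library's `difflib.SequenceMatcher` ratio as a stand-in for whatever fuzzy metric the actual strategy uses.

```python
from difflib import SequenceMatcher


def group_similar_entities(entities: list[str], threshold: float = 0.8) -> dict[str, str]:
    """Assign one (hypothetical) placeholder per cluster of near-duplicate entities.

    Each entity is compared with the first-seen representative of every
    existing cluster; if the similarity ratio reaches `threshold`, it reuses
    that cluster's placeholder, otherwise it starts a new cluster.
    """
    representatives: list[tuple[str, str]] = []  # (representative text, placeholder)
    mapping: dict[str, str] = {}
    for entity in entities:
        for rep, placeholder in representatives:
            if SequenceMatcher(None, entity, rep).ratio() >= threshold:
                mapping[entity] = placeholder
                break
        else:
            # No close enough cluster: open a new one with a fresh placeholder.
            placeholder = f"<PERSON_{len(representatives) + 1}>"
            representatives.append((entity, placeholder))
            mapping[entity] = placeholder
    return mapping


print(group_similar_entities(["John Kennedy", "John F. Kenedy", "Jane Doe"]))
# {'John Kennedy': '<PERSON_1>', 'John F. Kenedy': '<PERSON_1>', 'Jane Doe': '<PERSON_2>'}
```

Because each entity is only compared with a cluster's first-seen representative, the grouping can depend on input order; a sturdier variant might compare against every cluster member.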

libs/experimental/langchain_experimental/data_anonymizer/deanonymizer_matching_strategies.py

docs/extras/guides/privacy/presidio_data_anonymization/reversible.ipynb

maks-operlejn-ds · 2023-09-14T14:01:07Z

@mateusz-wosinski-ds I would say we should focus on deanonymization: in general, we pass an anonymized string to the LLM and then match entities in its response, which may be slightly different.

But maybe we should also think about doing it in the anonymization phase. That would require rebuilding the anonymizing part from scratch; I guess I will do it at some point, but for now I would leave it as is. We can discuss it during the daily 😄
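To illustrate the deanonymization-side matching described in this reply, here is a minimal sketch assuming a hypothetical `mapping` from anonymized values to original values. `fuzzy_deanonymize` and its 0.85 threshold are illustrative, not the library's API: it applies exact replacements first, then replaces word windows in the LLM response whose `difflib.SequenceMatcher` ratio against an anonymized value reaches the threshold.

```python
import re
from difflib import SequenceMatcher

# Words, allowing inner punctuation such as "O'Neil" or "Jean-Luc";
# trailing punctuation (e.g. a sentence-final ".") stays outside the span.
WORD_RE = re.compile(r"\w+(?:[.'-]\w+)*")


def fuzzy_deanonymize(text: str, mapping: dict[str, str], threshold: float = 0.85) -> str:
    """Replace anonymized values in `text` with originals, tolerating small typos.

    `mapping` is a hypothetical dict of anonymized value -> original value.
    """
    mapping = {k: v for k, v in mapping.items() if k.strip()}  # ignore empty keys

    # Pass 1: exact matches are cheap and unambiguous.
    for fake, original in mapping.items():
        text = text.replace(fake, original)

    # Pass 2: score every window of N consecutive words (N = word count of the
    # anonymized value) and collect non-overlapping near-matches.
    for fake, original in mapping.items():
        n_words = len(fake.split())
        spans = [m.span() for m in WORD_RE.finditer(text)]
        matches: list[tuple[int, int]] = []
        i = 0
        while i + n_words <= len(spans):
            start, end = spans[i][0], spans[i + n_words - 1][1]
            if SequenceMatcher(None, text[start:end], fake).ratio() >= threshold:
                matches.append((start, end))
                i += n_words  # skip the matched words so windows never overlap
            else:
                i += 1
        # Splice from right to left so earlier character offsets stay valid.
        for start, end in reversed(matches):
            text = text[:start] + original + text[end:]
    return text


print(fuzzy_deanonymize("I met Ana Smith.", {"Anna Smith": "John Kennedy"}))
# I met John Kennedy.
```

Every window is scored, so the cost grows with response length times the number of anonymized values; that is fine for short LLM replies but worth keeping in mind.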

libs/experimental/langchain_experimental/data_anonymizer/deanonymizer_matching_strategies.py

maks-operlejn-ds added 5 commits September 11, 2023 09:45

WIP fuzzy matching strategy · b7cb028
Add more strategies · 83f996f
Adjust anonymizer to strategies · 2d7f426
Docs update · d59b5e6
Lint · 0dece6f

maks-operlejn-ds changed the title from ~~WIP: fuzzy matching strategy~~ to Anonymization matching strategy on Sep 13, 2023

maks-operlejn-ds marked this pull request as ready for review September 13, 2023 14:47

Remove temporary file · 509c674

mateusz-wosinski-ds reviewed Sep 14, 2023

libs/experimental/langchain_experimental/data_anonymizer/deanonymizer_matching_strategies.py

maks-operlejn-ds added 8 commits September 14, 2023 19:56

Add better fuzzy matching method · 15e1ef6
Add tests · 0a43239
Lint · 575efc5
Lint 2 · 319c1ed
Merge branch 'master' into better-deanonymizer-matching-strategy · b6e020b
Exact match strategy for anonymization · 8adc212
Remove unnecessary fixtures from tests · 20c29b7
Revert "Remove unnecessary fixtures from tests" · 33e827c (This reverts commit 20c29b7.)
