Releases: WorksApplications/elasticsearch-sudachi
v3.3.0
Highlights
- Offset correction of SudachiSplitFilter now works properly with CharFilter (#149)
- SPI is changed to implement #149
  - New methods are added to MorphemeAttribute
  - New methods are added to
- Add allow_empty_morpheme setting to the tokenizer (#151)
  - If false (default), when a char is split into multiple morphemes (e.g. ㍿), all morphemes will contain the char in their span.
  - If true, only the first morpheme will contain the char, and the spans of the other morphemes may be empty.
  - Previously, the behavior corresponding to true was the default.
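To make the difference between the two allow_empty_morpheme values concrete, here is a minimal Java sketch. It is not the plugin's implementation. It assumes that one input character occupying [charBegin, charBegin + 1) yields `count` morphemes (for example, ㍿ analyzed as two morphemes), and that the empty spans produced when the setting is true sit at the end of that character. The class, record and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assigning character spans to morphemes that all come
// from a single input character (e.g. U+337F). Not the plugin's code.
final class EmptyMorphemeSpans {

    /** A half-open span [begin, end) in the original input. */
    record Span(int begin, int end) {
        boolean isEmpty() { return begin == end; }
    }

    /**
     * @param charBegin  offset of the single source character
     * @param count      number of morphemes produced from that character
     * @param allowEmpty mirrors the meaning of allow_empty_morpheme
     */
    static List<Span> spans(int charBegin, int count, boolean allowEmpty) {
        int charEnd = charBegin + 1;
        List<Span> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            if (!allowEmpty || i == 0) {
                // false (default): every morpheme covers the whole character.
                // true: only the first morpheme covers it.
                result.add(new Span(charBegin, charEnd));
            } else {
                // Assumption: remaining morphemes get an empty span placed
                // at the end of the character.
                result.add(new Span(charEnd, charEnd));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assume the character at offset 0 is analyzed as two morphemes.
        System.out.println(spans(0, 2, false)); // [Span[begin=0, end=1], Span[begin=0, end=1]]
        System.out.println(spans(0, 2, true));  // [Span[begin=0, end=1], Span[begin=1, end=1]]
    }
}
```

With the default (false), every morpheme can be highlighted against the original character; with true, only the first one can, which is why the other spans may be empty.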
v3.2.3
Release v3.2.2
Highlights
- Use lazyTokenizeSentences for the analysis to fix the problem of input chunking (#137).
Breaking Changes
- The chunking behavior introduced in v3.2.1 is fixed.
- Analysis works the same as in v3.2.0.
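The idea behind sentence-wise lazy tokenization can be sketched in Java as an iterator that pulls characters from a Reader and hands out one sentence at a time, so analysis units end at sentence boundaries instead of at arbitrary offsets. This is a hypothetical illustration, not Sudachi's lazyTokenizeSentences: the class name is invented, and the terminator set (。, ！, ？ and newline) is an assumption; Sudachi's real sentence detection is more elaborate.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of lazy, sentence-by-sentence reading.
final class LazySentences implements Iterator<String> {
    // Assumed terminators: ideographic full stop, fullwidth ! and ?, newline.
    private static final String TERMINATORS = "\u3002\uFF01\uFF1F\n";
    private final Reader reader;
    private String pending; // next sentence to hand out, or null at end of input

    LazySentences(Reader reader) {
        this.reader = reader;
        this.pending = readSentence();
    }

    // Reads only as far as the next terminator, never the whole input.
    private String readSentence() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
                if (TERMINATORS.indexOf(c) >= 0) {
                    break; // sentence complete: stop reading for now
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.length() == 0 ? null : sb.toString();
    }

    @Override public boolean hasNext() { return pending != null; }

    @Override public String next() {
        if (pending == null) throw new NoSuchElementException();
        String current = pending;
        pending = readSentence();
        return current;
    }

    static List<String> collect(String text) {
        List<String> out = new ArrayList<>();
        new LazySentences(new StringReader(text)).forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect("First sentence\u3002Second one\u3002"));
    }
}
```

Because a boundary is only placed after a terminator, no word inside a sentence is ever cut in two, which is the failure mode of fixed-size chunking.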
Release v3.2.1
Highlights
- Fix OOM issue with a huge input by @kenmasumitsu in #132
  - Huge input is now split into relatively small (1M char) chunks.
  - Analysis may be broken around the edges of chunks (open issue, see #131).
- Add documentation about Sudachi synonym dict by @sorami in #65
Breaking Changes
- Huge (>1M char) input is now split into chunks to avoid OOM, and the analysis may be broken around the edges of chunks (open issue, see #131).
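Why fixed-size chunking can break analysis at chunk edges is easy to see in a small Java sketch. It is a hypothetical illustration, not the plugin's code: it cuts the input every `maxChars` characters, and a tiny limit stands in for the real 1M-char limit so the effect is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of fixed-size chunking. Each chunk would be analyzed
// independently, so a word straddling a boundary is seen as two fragments.
final class FixedChunks {

    static List<String> chunk(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < text.length(); start += maxChars) {
            int end = Math.min(start + maxChars, text.length());
            // Note: a naive cut like this can also split a surrogate pair.
            chunks.add(text.substring(start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // "engine" is cut into "e" and "ngine" by the chunk boundary.
        System.out.println(chunk("search engine", 8)); // [search e, ngine]
    }
}
```

No characters are lost, but the tokenizer never sees "engine" as a whole, which matches the "broken around the edges of chunks" caveat.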
Release v3.2.0
Release v3.1.1
Highlights
- Support Elasticsearch up to 8.13.4 and OpenSearch up to 2.14.0
- Fix dictionary caching problem
Release v3.1.0
Highlights
- OpenSearch support
- Fix trimming problems
- Extensibility support
OpenSearch support
We now support OpenSearch in addition to Elasticsearch. Plugins should work the same way as with Elasticsearch.
For now, we test only on 2.6.0 and later. There are currently no plans to support the 1.* branch.
Because of OpenSearch support, we changed the naming scheme of the distribution zip to <engine kind>-<engine version>-analysis-sudachi-<plugin-version>.zip
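The naming scheme <engine kind>-<engine version>-analysis-sudachi-<plugin-version>.zip can be produced and parsed with a short Java sketch, for example in a hypothetical deployment script that picks the right zip. The lowercase engine kinds ("elasticsearch", "opensearch") and the version numbers below are assumptions used as example values; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the distribution zip naming scheme:
// <engine kind>-<engine version>-analysis-sudachi-<plugin-version>.zip
final class DistributionName {

    record Parts(String engineKind, String engineVersion, String pluginVersion) {}

    // Engine kind: lowercase letters (assumption); the engine version is
    // everything up to the first "-analysis-sudachi-" marker.
    private static final Pattern NAME =
            Pattern.compile("^([a-z]+)-(.+?)-analysis-sudachi-(.+)\\.zip$");

    static String format(Parts p) {
        return p.engineKind() + "-" + p.engineVersion()
                + "-analysis-sudachi-" + p.pluginVersion() + ".zip";
    }

    static Parts parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected file name: " + fileName);
        }
        return new Parts(m.group(1), m.group(2), m.group(3));
    }

    public static void main(String[] args) {
        String name = format(new Parts("opensearch", "2.14.0", "3.1.1"));
        System.out.println(name); // opensearch-2.14.0-analysis-sudachi-3.1.1.zip
        System.out.println(parse(name));
    }
}
```

Putting the engine kind first keeps Elasticsearch and OpenSearch builds of the same plugin version from colliding in a single download directory.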
Extensibility support
The analysis-sudachi plugin now supports being extended by other plugins, in both OpenSearch and Elasticsearch.
When extending analysis-sudachi, please use the sudachi-search-spi artifact as a provided dependency. We plan to keep the SPI stable, but the internal implementation of analysis-sudachi will not be stable.
In Elasticsearch 8.3.* and later, we use SPI-aware packaging, and the internal implementation will not be available to extending plugins.
Internal & Testing Improvements
To improve the quality of releases, we have greatly improved testing. Previously there were only unit tests for the analysis logic; since 3.0.0 there are additionally (tier-2) integration tests, which spawn a full Elasticsearch instance and execute a workload that performs parallel document indexing followed by validation of the results.
In 3.1.0 we added tier-1 integration tests, which perform relatively simple validation; however, these tests are executed inside JVMs with a SecurityManager present, simulating a fully-fledged Elasticsearch instance. We hope that this procedure will increase the quality of releases and help us catch issues faster.
v3.0.1
v3.0.0
Highlights
- Sudachi is updated to 0.7.0
- Analysis results are cached within a single index
- All versions of Elasticsearch are supported by a single branch with some conditional compilation Gradle magic
- The implementation now uses Kotlin internally
Analysis cache
- Previous versions of the ES Sudachi plugin analyzed the input multiple times when multiple analyzer chains (e.g. mode A, mode B, mode C, readings, dictionary form) were configured for the same field. From this version, the underlying analysis is done only once, yielding up to an n-times speedup of the analysis, where n is the number of configured analysis chains that stem from Sudachi.
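The caching idea can be sketched in Java: several chains ask a shared cache for the analysis of the same input, and only the first request runs the analyzer. This is a hypothetical illustration, not the plugin's implementation. A space-splitting function stands in for Sudachi, the three chains are stand-in post-processing steps, and the unbounded, non-thread-safe HashMap keyed by input text is a simplification; how the plugin actually scopes and keys its cache within an index is not described here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: several analysis chains share one underlying analysis.
final class SharedAnalysisCache {
    private final Function<String, List<String>> analyzer;
    private final Map<String, List<String>> cache = new HashMap<>();
    private int analyzerCalls = 0;

    SharedAnalysisCache(Function<String, List<String>> analyzer) {
        this.analyzer = analyzer;
    }

    // Runs the analyzer only on the first request for a given input.
    List<String> analyze(String input) {
        return cache.computeIfAbsent(input, text -> {
            analyzerCalls++;
            return analyzer.apply(text);
        });
    }

    int analyzerCalls() { return analyzerCalls; }

    public static void main(String[] args) {
        // Stand-in for morphological analysis: split on spaces.
        SharedAnalysisCache shared = new SharedAnalysisCache(s -> List.of(s.split(" ")));
        String field = "tokyo tower";
        // Three hypothetical chains post-process the same cached result.
        List<String> surface = shared.analyze(field);
        List<String> upper = shared.analyze(field).stream().map(String::toUpperCase).toList();
        List<String> joined = List.of(String.join("", shared.analyze(field)));
        System.out.println(surface + " " + upper + " " + joined);
        System.out.println("analyzer calls: " + shared.analyzerCalls()); // analyzer calls: 1
    }
}
```

With n chains over the same field, the expensive step runs once instead of n times, while each chain keeps only its cheap post-processing.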
v2.1.0 for Elasticsearch 5.6
- Added a new property additional_settings to write Sudachi settings directly in the config
- Added support for specifying the Elasticsearch version at build time