
Releases: WorksApplications/elasticsearch-sudachi

v3.3.0

13 Nov 08:22

Highlights

  • Offset correction of SudachiSplitFilter now works properly with CharFilter (#149)
  • The SPI has changed in order to implement #149
    • New methods have been added to MorphemeAttribute
  • Add allow_empty_morpheme setting to the tokenizer (#151)
    • If false (default), when a char is split into multiple morphemes (e.g. ㍿), all morphemes will contain the char in their span.
    • If true, only the first morpheme will contain the char and the span of other morphemes may be empty.
      • Previously, this behavior (true) was the default.
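
As a rough illustration of the two behaviors, the sketch below models the spans of morphemes that all come from a single input character, and builds a settings body that carries the new option. Only `allow_empty_morpheme` comes from the notes above. The helpers `morpheme_spans` and `tokenizer_settings`, the tokenizer name `my_sudachi`, and the position chosen for the empty spans are assumptions for illustration, and `sudachi_tokenizer` is assumed to be the tokenizer type registered by the plugin.

```python
import json


def morpheme_spans(char_offset, morpheme_count, allow_empty_morpheme=False):
    """Toy model of spans for morphemes split out of ONE input character.

    Hypothetical helper, not the plugin's code: it only mirrors the
    behavior described in the release notes.
    """
    if morpheme_count < 1:
        raise ValueError("morpheme_count must be at least 1")
    full = (char_offset, char_offset + 1)
    if not allow_empty_morpheme:
        # Default: every morpheme covers the original character.
        return [full] * morpheme_count
    # Only the first morpheme covers the character. Placing the empty
    # spans at the end of the character is an assumption of this sketch.
    empty = (char_offset + 1, char_offset + 1)
    return [full] + [empty] * (morpheme_count - 1)


def tokenizer_settings(allow_empty_morpheme=False):
    """Build an index settings body (tokenizer name is illustrative)."""
    return {
        "settings": {
            "index": {
                "analysis": {
                    "tokenizer": {
                        "my_sudachi": {
                            "type": "sudachi_tokenizer",
                            "allow_empty_morpheme": allow_empty_morpheme,
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    print(morpheme_spans(0, 2))
    print(morpheme_spans(0, 2, allow_empty_morpheme=True))
    print(json.dumps(tokenizer_settings(True), indent=2))
```

Setting the option to true would restore the earlier default described above.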

v3.2.3

16 Oct 07:22

Highlights

  • Support the latest Elasticsearch and OpenSearch versions (#144)
    • es: 8.14.3, 8.15.2, 7.17.24
    • os: 2.15.0, 2.16.0, 2.17.1

Release v3.2.2

04 Jul 06:29

Highlights

  • Use lazyTokenizeSentences for the analysis to fix the problem of input chunking (#137).

Breaking Changes

  • The chunking behavior introduced in v3.2.1 is fixed.
    • Analysis works the same as in v3.2.0.

Release v3.2.1

14 Jun 05:53
6b73b1a

Highlights

  • Fix OOM issue with a huge input by @kenmasumitsu in #132
    • Huge input is now split into relatively small (1M char) chunks.
    • Analysis may be broken around the edges of chunks (open issue, see #131).
  • Add documentation about Sudachi synonym dict by @sorami in #65

Breaking Changes

  • Huge (>1M char) input is now split into chunks to avoid OOM, and the analysis may be broken around the edges of chunks (open issue, see #131).
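
To see why analysis can break at chunk edges, the sketch below splits text into fixed-size chunks without looking at the content. The function `fixed_size_chunks` is a hypothetical stand-in, not the plugin's implementation, and the tiny chunk size is chosen only to make the effect visible.

```python
def fixed_size_chunks(text, chunk_size=1_000_000):
    """Split text into consecutive chunks of at most chunk_size characters.

    Hypothetical helper for illustration; boundaries ignore word and
    sentence structure, which is what makes the chunk edges fragile.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


if __name__ == "__main__":
    # With a tiny chunk size, "東京都" is cut after its second character,
    # so an analyzer that sees each chunk separately never sees the word whole.
    print(fixed_size_chunks("東京都に住む", chunk_size=2))
    # ['東京', '都に', '住む']
```

Any word that straddles a boundary is analyzed as two unrelated fragments, which matches the kind of breakage the note above warns about.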

Release v3.2.0

30 May 08:12
7d9f7da

Highlights

  • Explain with morpheme attribute (#121)
  • Synonym filter and Sudachi filters can be used in any order (#122)
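
A minimal sketch of what "any order" could look like in an analysis configuration: two custom analyzers that differ only in whether the synonym filter comes before or after a Sudachi filter. The names `my_sudachi`, `my_synonym`, `synonym_first`, and `sudachi_first`, and the synonym rule itself, are made up for illustration. `sudachi_tokenizer` and `sudachi_baseform` are assumed to be the type names registered by the plugin.

```python
import json


def analysis_settings():
    """Build an analysis section with both filter orders (illustrative names)."""
    return {
        "analysis": {
            "tokenizer": {"my_sudachi": {"type": "sudachi_tokenizer"}},
            "filter": {
                # Hypothetical synonym rule, used only as an example.
                "my_synonym": {"type": "synonym", "synonyms": ["寿司, すし"]}
            },
            "analyzer": {
                "synonym_first": {
                    "type": "custom",
                    "tokenizer": "my_sudachi",
                    "filter": ["my_synonym", "sudachi_baseform"],
                },
                "sudachi_first": {
                    "type": "custom",
                    "tokenizer": "my_sudachi",
                    "filter": ["sudachi_baseform", "my_synonym"],
                },
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(analysis_settings(), ensure_ascii=False, indent=2))
```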

Breaking Change

  • MorphemeConsumerAttribute is removed from SPI.
    • You can just remove related code to migrate.

Release v3.1.1

17 May 04:49
932df4e

Highlights

  • Support Elasticsearch up to 8.13.4 and OpenSearch up to 2.14.0
  • Fix dictionary caching problem

Release v3.1.0

26 Jun 01:49

Highlights

  • OpenSearch support
  • Fix trimming problems
  • Extensibility support

OpenSearch support

We now support OpenSearch in addition to Elasticsearch. Plugins should work the same way as with Elasticsearch.
For the time being, we test only on 2.6.0 and later. There are no plans to support the 1.* branch at this time.

Because of OpenSearch support we changed the naming scheme of distribution zip to <engine kind>-<engine version>-analysis-sudachi-<plugin-version>.zip
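
The naming scheme can be sketched as a small formatting helper. The function name `distribution_zip_name` is hypothetical, and the engine kind strings `elasticsearch` and `opensearch` are assumptions about what fills the `<engine kind>` slot.

```python
def distribution_zip_name(engine_kind, engine_version, plugin_version):
    """Format a distribution file name following the scheme above.

    The accepted engine kinds are an assumption of this sketch.
    """
    if engine_kind not in ("elasticsearch", "opensearch"):
        raise ValueError(f"unexpected engine kind: {engine_kind!r}")
    return f"{engine_kind}-{engine_version}-analysis-sudachi-{plugin_version}.zip"


if __name__ == "__main__":
    print(distribution_zip_name("opensearch", "2.6.0", "3.1.0"))
    # opensearch-2.6.0-analysis-sudachi-3.1.0.zip
```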

Extensibility support

The analysis-sudachi plugin now supports being extended by other plugins, both in OpenSearch and Elasticsearch.
When extending analysis-sudachi, please use the sudachi-search-spi artifact as a provided dependency. We plan to keep the SPI stable, but the internal implementation of analysis-sudachi will not be stable.

In Elasticsearch 8.3.*+ we use SPI-aware packaging, and the internal implementation will not be available to extending plugins.

Internal & Testing Improvements

To improve release quality, we have greatly expanded testing. Previously there were only unit tests for the analysis logic; from 3.0.0 there are also (tier-2) integration tests that spawn a full Elasticsearch instance and execute a workload that performs parallel document indexing followed by validation of the results.

From 3.1.0 we have added tier-1 integration tests that perform relatively simple validation. These tests are executed inside JVMs with a SecurityManager present, simulating a fully-fledged Elasticsearch instance. We hope that this procedure will increase the quality of releases and help us catch issues faster.

v3.0.1

10 Mar 00:46

Highlights

  • Upgrade Sudachi to 0.7.1 which contains serious fixes for streaming analysis

v3.0.0

19 Jan 09:05
Compare
Choose a tag to compare

Highlights

  • Sudachi is updated to 0.7.0
  • Analysis results are cached within a single index
  • All versions of Elasticsearch are supported by a single branch, using some conditional-compilation Gradle magic
  • Implementation now uses Kotlin inside

Analysis cache

  • Previous versions of the ES Sudachi plugin analyzed the input multiple times when multiple analyzer chains (e.g. mode A, mode B, mode C, readings, dictionary form) were used for the same field. From this version, the underlying analysis is done only once, yielding an n-fold speedup, where n is the number of configured analysis chains that stem from Sudachi.
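
The idea can be modeled conceptually as memoizing one expensive analysis and letting each chain project a different view from it. This is not the plugin's implementation. `CachedAnalysis`, `fake_analyze`, and the chain names are hypothetical, and whitespace splitting stands in for real morphological analysis.

```python
class CachedAnalysis:
    """Memoize an expensive analysis function per input text (toy model)."""

    def __init__(self, analyze):
        self._analyze = analyze
        self._cache = {}
        self.calls = 0  # how many times the expensive step actually ran

    def get(self, text):
        if text not in self._cache:
            self.calls += 1
            self._cache[text] = self._analyze(text)
        return self._cache[text]


def fake_analyze(text):
    # Stand-in for the expensive analysis step: one record per word.
    return [
        {"surface": w, "reading": w.upper(), "base": w.lower()}
        for w in text.split()
    ]


# Each "chain" only projects a different view of the same morphemes.
CHAINS = {
    "surface": lambda m: m["surface"],
    "reading": lambda m: m["reading"],
    "base": lambda m: m["base"],
}


def run_chains(cache, text):
    """Run every chain over the shared, cached analysis result."""
    return {
        name: [project(m) for m in cache.get(text)]
        for name, project in CHAINS.items()
    }


if __name__ == "__main__":
    cache = CachedAnalysis(fake_analyze)
    print(run_chains(cache, "Foo Bar"))
    print("analysis runs:", cache.calls)
```

With three chains the expensive step runs once instead of three times, which is the n-fold saving described above.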

v2.1.0 for Elasticsearch 5.6

29 Dec 05:00
  • Added a new property additional_settings to write Sudachi settings directly in config
  • Added support for specifying Elasticsearch version at build time
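
A minimal sketch of how `additional_settings` might be filled in, assuming the property takes the Sudachi settings serialized as a JSON string. The helper name and the dictionary file name are illustrative, and `sudachi_tokenizer` is assumed to be the tokenizer type registered by the plugin.

```python
import json


def tokenizer_with_additional_settings(sudachi_settings):
    """Embed Sudachi settings into a tokenizer definition as a JSON string.

    Assumption: additional_settings expects a string, so the nested
    settings are serialized instead of being passed as an object.
    """
    return {
        "type": "sudachi_tokenizer",
        "additional_settings": json.dumps(sudachi_settings, ensure_ascii=False),
    }


if __name__ == "__main__":
    # "system_core.dic" is only an example dictionary file name.
    tokenizer = tokenizer_with_additional_settings({"systemDict": "system_core.dic"})
    print(json.dumps({"tokenizer": {"my_sudachi": tokenizer}}, indent=2))
```

Serializing with `json.dumps` avoids hand-escaping the inner quotes when the settings body is itself sent as JSON.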