Adapt to TiKV's raft grpc message observer #413
base: raftstore-proxy
ref tikv#16141 rearrange parts of metrics panel Signed-off-by: SpadeA-Tang <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#16141 Add test to simulate insertion of 200MB (logical size) of TiDB unique index and secondary index records and measure SkiplistEngine memory usage. Test results:
* For secondary index
  * The key-value encoding amplification is approximately 3.10
  * SkiplistEngine amplification is approximately 7.66
* For unique index
  * The key-value encoding amplification is approximately 3.38
  * SkiplistEngine amplification is approximately 8.19

Signed-off-by: Neil Shen <[email protected]>
…v#17629) ref tikv#17459 Track the number of locks of large txns in resolver Signed-off-by: ekexium <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
) close tikv#12587, fix tikv#16001 To fix the issue where slow region destruction can block snapshot generation, this PR moves the snapshot generation logic out of the region worker. A new worker is added to handle snap gen requests but it reuses the existing snap generator pool, so the change doesn't introduce any new threads. This is a simpler approach than the earlier attempt because it doesn't deal with the interactions between snapshot apply and destroy. Since snapshot generation has always been an independent task handled by its own thread pool, this change does not add significant complexity. Signed-off-by: Bisheng Huang <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#15990 Add fsm schedule related metrics Signed-off-by: Connor <[email protected]> Signed-off-by: Connor1996 <[email protected]> Co-authored-by: Bisheng Huang <[email protected]>
close tikv#12371 * switch kms to aws_sdk lib * switch s3 to aws_sdk lib Signed-off-by: Andrey Koshchiy <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…kv#17747) ref tikv#16141 use stop-load-threshold for loading new regions Signed-off-by: SpadeA-Tang <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17711 Deprecate write_global_seq, since it is by default false. Signed-off-by: Yang Zhang <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#16141 Signed-off-by: Neil Shen <[email protected]>
…17730) close tikv#17728 Use min_lock_ts-1 as the candidate of resolved-ts, to ensure resolved_ts < lock.min_commit_ts( <= commit_ts). Signed-off-by: ekexium <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Co-authored-by: you06 <[email protected]>
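The rule in the commit above can be sketched as a small function. This is a minimal illustration, not TiKV's resolver code: the function name, the `fallback_ts` parameter, and the choice to cap the candidate at `fallback_ts` are assumptions made for the example.

```rust
/// Hypothetical helper (not TiKV's actual API): choose a resolved-ts candidate.
///
/// `min_lock_ts` is the smallest timestamp among tracked locks, if any.
/// `fallback_ts` is an assumed upper bound used when no lock is tracked.
pub fn resolved_ts_candidate(min_lock_ts: Option<u64>, fallback_ts: u64) -> u64 {
    match min_lock_ts {
        // Stay strictly below the oldest lock: min_lock_ts - 1.
        // `saturating_sub` avoids underflow if the timestamp is 0.
        Some(ts) => ts.saturating_sub(1).min(fallback_ts),
        // No locks: nothing constrains the candidate except the fallback.
        None => fallback_ts,
    }
}
```

For example, with a tracked lock at 100 and a fallback of 200, the candidate is 99, which is strictly below the lock timestamp.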
ref tikv#16141, close tikv#17762 Derive the default values of in_memory_engine's `evict-threshold` and `stop-load-threshold` configs from `capacity`. Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…v#17771) close tikv#17767 IME observes all peer destroy events to timely evict regions. By adding a new peer, the old and uninitialized peer will be destroyed and IME must not panic in this situation. Signed-off-by: Neil Shen <[email protected]>
close tikv#17572 Signed-off-by: RidRisR <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#15990 Raft waterfall metrics track the duration of individual requests, all beginning from the same starting point (when the async write request is scheduled) but ending at various stages of the write process. Previous descriptions did not make that clear and may confuse the readers. This commit improves the grafana descriptions for clarity. Signed-off-by: Bisheng Huang <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
) close tikv#17696 * take cdc tasks into memory quota to prevent the TiKV OOM caused by too many pending tasks Signed-off-by: Neil Shen <[email protected]> Signed-off-by: 3AceShowHand <[email protected]> Co-authored-by: Neil Shen <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…tikv#17643) close tikv#17363 Allow leader transfer if conf change applied on transferee. Signed-off-by: hhwyt <[email protected]> Co-authored-by: Bisheng Huang <[email protected]>
…ikv#17765) close tikv#17383, close tikv#17760 Address the corner case where a read thread panics because it reads with a stale index from the `Memtable` in raft-engine, after a background thread has updated the `Memtable` and purged the stale logs. Signed-off-by: lucasliang <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17788 Avoid calling `on_gc_finished` when a new GC task is not run because there is another unfinished task. Signed-off-by: glorv <[email protected]>
…ikv#17515) ref tikv#16141 This commit adjusts the following in-memory-engine defaults:
* `capacity`: Now IME uses 10% of the block cache and takes an equal amount of memory from the system. This is based on tests showing that the IME rarely fills its full capacity.
* `mvcc_amplification_threshold`: Changed from 100 to 10, which benefits common workloads like TPC-C (50 warehouses), saving approximately 20% of unified read pool CPU usage.

It also addresses two security issues:
* Remove the ignore of RUSTSEC-2024-0006, as the vulnerable shlex 0.1.1 is removed by tikv#13814
* Upgrade hashbrown from the yanked 0.15.0 to 0.15.1

Signed-off-by: Neil Shen <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…d twice (tikv#17798) close tikv#17797 If the last call to `prepare_for_region` returns `NotInCache`, `clear_written_regions` can be called twice, in both `write_impl` and `clear`, which causes a panic. This PR changes `clear_written_regions` to consume `self.written_regions` to avoid this kind of duplicate clear. Signed-off-by: glorv <[email protected]>
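The "consume the list" idea can be sketched with `std::mem::take`. This is a simplified stand-in, not the real write batch: the struct, its fields, and the `released` log are hypothetical, and only the method names `clear_written_regions`, `write_impl`, and `clear` come from the commit message.

```rust
use std::mem;

/// Hypothetical stand-in for a write batch that tracks written regions.
#[derive(Default)]
pub struct WriteBatchSketch {
    written_regions: Vec<u64>,
    /// Records each region released by a clear, so duplicates would be visible.
    pub released: Vec<u64>,
}

impl WriteBatchSketch {
    pub fn record_write(&mut self, region_id: u64) {
        self.written_regions.push(region_id);
    }

    /// Consumes `written_regions`: `mem::take` leaves an empty Vec behind,
    /// so a second call finds nothing left to release.
    fn clear_written_regions(&mut self) {
        for region_id in mem::take(&mut self.written_regions) {
            self.released.push(region_id);
        }
    }

    pub fn write_impl(&mut self) {
        self.clear_written_regions();
    }

    pub fn clear(&mut self) {
        self.clear_written_regions();
    }
}
```

Calling `write_impl` and then `clear` releases each region exactly once, because the second call iterates over an empty list.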
ref tikv#16141 handle error when getting regions info Signed-off-by: SpadeA-Tang <[email protected]>
close tikv#17701 add write batch limit for raft command batch Signed-off-by: SpadeA-Tang <[email protected]> Signed-off-by: SpadeA-Tang <[email protected]>
close tikv#17631 Added a new crate named `compact-log-backup`. Now it can merge some log files generated by log backup and make them become SSTs. Signed-off-by: hillium <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17836 This commit adds metrics to track Raft snapshots that are dropped during sending or receiving due to concurrency limits. These metrics help identify bottlenecks during scaling. Signed-off-by: Bisheng Huang <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…kv#17625) close tikv#12410 This PR makes sure the `campaign` of newly split regions is triggered in time, once the leadership of the parent region is stable after `on_role_changed`. Signed-off-by: lucasliang <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…17841) close tikv#17840 Skip handling the remaining raft messages after the peer fsm is stopped. This avoids a potential panic if a raft message needs to read the raft log from raft engine. Signed-off-by: glorv <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17852 expr: fix panic when using radians and degree Signed-off-by: gengliqi <[email protected]>
) close tikv#17830 Signed-off-by: joccau <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…ax_batch_size. (tikv#17821) close tikv#17101 Increase the default raft_client_queue_size and raft_msg_max_batch_size. This PR addresses an issue where too many Raft messages can delay sending, increasing the commit log duration and the heartbeat latency. The delayed heartbeats can lead to leader drops, especially during PD restarts that trigger a surge of hibernated regions. For more details on this scenario, see tikv#17101. We increased raft_client_queue_size to prevent Raft messages from being dropped when the RaftClient queue becomes full under heavy message load. Additionally, we increased raft_msg_max_batch_size to improve the efficiency of Raft message sending. Signed-off-by: hhwyt <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…sts (tikv#17500) (tikv#17870) close tikv#17394 lock_manager: Skip updating lock wait info for non-fair-locking requests This is a simpler and lower-risk fix of the OOM issue tikv#17394 for released branches, as an alternative solution to tikv#17451. With this change, for acquire_pessimistic_lock requests without fair locking enabled, update_wait_for is a no-op, so if fair locking is globally disabled, the behavior is equivalent to versions before 7.0. Signed-off-by: MyonKeminta <[email protected]>
close tikv#17916 concurrency_manager: add safety boundary for max_ts updates Add `max_ts_limit` to prevent unreasonable timestamp updates. The limit is synchronized with PD timestamp periodically. Configure via max_ts_allowance_secs and max_ts_sync_interval_secs. Updates from PD bypass this limit. Signed-off-by: ekexium <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
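The `max_ts_limit` safety boundary described above can be sketched as follows. This is an illustration under simplifying assumptions, not the concurrency manager's real code: the type and method names are hypothetical, timestamps and the allowance are treated as plain integers in the same unit (real TSO values encode physical and logical parts), and the limit is assumed to be unbounded until the first sync.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical guard around `max_ts` (names are illustrative only).
pub struct MaxTsGuard {
    max_ts: AtomicU64,
    max_ts_limit: AtomicU64,
}

impl MaxTsGuard {
    pub fn new() -> Self {
        MaxTsGuard {
            max_ts: AtomicU64::new(0),
            // Assumption: no limit is enforced before the first PD sync.
            max_ts_limit: AtomicU64::new(u64::MAX),
        }
    }

    /// Called periodically with a fresh PD timestamp; the limit becomes
    /// `pd_ts + allowance` (same unit for both, as a simplification).
    pub fn sync_limit(&self, pd_ts: u64, allowance: u64) {
        self.max_ts_limit
            .store(pd_ts.saturating_add(allowance), Ordering::SeqCst);
    }

    /// Updates from PD bypass the limit; other updates beyond it are rejected.
    pub fn update_max_ts(&self, new_ts: u64, from_pd: bool) -> Result<(), String> {
        if !from_pd {
            let limit = self.max_ts_limit.load(Ordering::SeqCst);
            if new_ts > limit {
                return Err(format!("max_ts {} exceeds limit {}", new_ts, limit));
            }
        }
        // `fetch_max` keeps max_ts monotonically non-decreasing.
        self.max_ts.fetch_max(new_ts, Ordering::SeqCst);
        Ok(())
    }

    pub fn max_ts(&self) -> u64 {
        self.max_ts.load(Ordering::SeqCst)
    }
}
```

In this sketch a rejected update leaves `max_ts` untouched, while the same value supplied with `from_pd = true` is accepted.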
…ead (tikv#18022) ref tikv#16141 This commit rolls back PR tikv#17927, as it did not address the root cause. This rollback allows IME to be used for the following read scenario. NOTE: We decided not to cherry-pick it back to v8.5, as there may be other potential issues. Signed-off-by: glorv <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#18046 Avoid loading a region into IME when it is uninitialized, to prevent a panic when encoding the region end key. This can happen because `MsgPreLoadRegionRequest` is sent before the leader issues a transfer-leader request. Signed-off-by: Neil Shen <[email protected]>
ref tikv#15990 build: bump tikv pkg version Signed-off-by: ti-chi-bot <[email protected]>
…by (tikv#18061) close tikv#18060 Use a regular expression in panel seriesOverrides to make it compatible with the optional "additional_groupby" alias. Signed-off-by: glorv <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…f invalid max-ts update (tikv#18057) close tikv#18055 concurrency_manager: double check via PD TSO before reporting error of invalid max-ts update Signed-off-by: ekexium <[email protected]>
close tikv#17618 Fix a bug that wrongly truncates the string when the charset is gbk/gb18030 Signed-off-by: cbcwestwolf <[email protected]>
…tered (tikv#18066) close tikv#18065 Print more information in logs when a "default not found" error is encountered. Signed-off-by: cfzjywxk <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#15990 Export the number of currently running background jobs to help diagnose potential compaction bottlenecks. Signed-off-by: Neil Shen <[email protected]> Co-authored-by: Bisheng Huang <[email protected]>
…ction (tikv#18085) close tikv#18084 `min_input_ts` and `max_input_ts` will be present in a log file compaction. Signed-off-by: hillium <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#15990 Fixed a typo: `Migartion` -> `Migration`. Signed-off-by: hillium <[email protected]>
ref tikv#18055 When validating max-ts updates, do not report error or panic unless confirmed by PD TSO. This reduces both false positive and false negative cases. Signed-off-by: ekexium <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
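The "confirm with PD TSO before reporting" flow can be sketched like this. It is a hedged illustration only: the enum, the function, the `allowance` parameter, and the decision to return an unconfirmed result when the TSO fetch fails are assumptions for the example, not TiKV's actual behavior or API.

```rust
/// Hypothetical outcome of validating a max-ts update.
#[derive(Debug, PartialEq, Eq)]
pub enum MaxTsCheck {
    /// The update is within bounds.
    Accepted,
    /// A fresh PD TSO confirmed the update is unreasonable.
    ConfirmedInvalid,
    /// PD could not be reached, so nothing is reported (assumption).
    Unconfirmed,
}

/// Validate `new_ts` against a cached limit, and only treat it as invalid
/// after a fresh timestamp from PD (via `fetch_pd_tso`) confirms it.
pub fn check_max_ts_update<F>(
    new_ts: u64,
    cached_limit: u64,
    allowance: u64,
    fetch_pd_tso: F,
) -> MaxTsCheck
where
    F: FnOnce() -> Result<u64, String>,
{
    // Fast path: the cached limit already admits the update.
    if new_ts <= cached_limit {
        return MaxTsCheck::Accepted;
    }
    // Slow path: the cached limit may be stale, so double check with PD.
    match fetch_pd_tso() {
        Ok(pd_ts) if new_ts > pd_ts.saturating_add(allowance) => MaxTsCheck::ConfirmedInvalid,
        Ok(_) => MaxTsCheck::Accepted,
        Err(_) => MaxTsCheck::Unconfirmed,
    }
}
```

The fresh fetch is what removes false positives from a stale cached limit: an update that exceeds the cached limit but is within the allowance of the latest PD timestamp is accepted.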
close tikv#17894 build: update Dockerfile for build and test Signed-off-by: wuhuizuo <[email protected]> Co-authored-by: Ti Chi Robot <[email protected]>
close tikv#18026 Added a new RPC endpoint `flush_now` for the service `LogBackup`. Signed-off-by: 山岚 <[email protected]> Signed-off-by: hillium <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#18105, ref pingcap/tidb#58238 Adapt the ignore rules so that the download can skip keys larger than the specified timestamp Signed-off-by: 3pointer <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…tial risk to affect data correctness (tikv#18092) close tikv#18091 gc_worker: Do not do delete_files_in_range on lock cf which has potential risk to affect data correctness Signed-off-by: MyonKeminta <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17989 If tso fetch fails, skip updating last_pd_tso. Signed-off-by: ekexium <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#16818 Fix duplicated keys returned when scanning locks. Signed-off-by: cfzjywxk <[email protected]>
…ikv#18095) close tikv#18117 Introduce a new field `use_one_pc` to the `Lock` struct to indicate whether the txn uses 1pc, and use it to prevent locks from being skipped when reading with max-ts. Signed-off-by: zyguan <[email protected]>
…8099) ref tikv#15990 * Increase task wait metrics upper limit from 2.5s to 42s to capture long task wait records that are crucial for investigating high latency issues * Add description for end-point-memory-quota configuration Signed-off-by: Neil Shen <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#14474 Fix the request source check logic for external or internal Signed-off-by: cfzjywxk <[email protected]>
…ikv#18102) close tikv#17995 Address clock-skew issues. Signed-off-by: lucasliang <[email protected]>
close tikv#18125 Fix incorrect mapped allocation per thread metric. Not all thread builders are hooked by `thread_allocate_exclusive_arena`, so some threads use the shared arena, causing incorrect per-thread allocation figures. Signed-off-by: Connor1996 <[email protected]>
close tikv#18111 Support scalar function from_unixtime in tikv Signed-off-by: wshwsh12 <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#18113 Support customized raft message rejection logic Signed-off-by: Calvin Neo <[email protected]> Signed-off-by: Calvin Neo <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Co-authored-by: glorv <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
@CalvinNeo: The following test failed, say
What is changed and how it works?
Issue Number: Close #xxx
What's Changed:
Related changes
pingcap/docs
pingcap/docs-cn
Check List
Tests
Side effects
Release note