[Bug] Doris be restarts every day from 0 am to 1 am #46432
You can check the be.out log:
*** Query id: 77d107fac80347d9-b07bc57eb6d9ea79 *** StdoutLogger 2025-01-06 00:01:09,215 Start time: Mon Jan 6 00:01:09 CST 2025
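To confirm when the BE restarted, the "Start time" lines that StdoutLogger writes to be.out can be collected, since each one marks a process (re)start. A minimal sketch, assuming the be.out format quoted above; `restart_times` and the sample text are illustrative, not part of Doris:

```python
import re

# Each "StdoutLogger <date> <time> Start time: ..." line in be.out marks a
# BE (re)start. The sample below mimics the log format quoted above; in
# practice you would pass the contents of the real be.out file.
START_RE = re.compile(r"StdoutLogger \S+ \S+ Start time: (.+)")

def restart_times(be_out_text):
    """Return every 'Start time' value recorded in be.out text."""
    return [m.group(1) for m in START_RE.finditer(be_out_text)]

sample = (
    "*** Query id: 77d107fac80347d9-b07bc57eb6d9ea79 ***\n"
    "StdoutLogger 2025-01-06 00:01:09,215 Start time: Mon Jan 6 00:01:09 CST 2025\n"
)
print(restart_times(sample))  # one entry per restart
```

A run over several days of be.out would show whether the restarts really cluster between 0 am and 1 am.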
I found a SQL exec error: (1105, 'errCode = 2, detailMessage = Backend process epoch changed, previous 1736012512920 now 1736092876420, means this be has already restarted, should cancel this coordinator,query id 77d107fac80347d9-b07bc57eb6d9ea79'). A retry of this SQL succeeded after 30 seconds.
The same SQL and scheduled tasks did not raise this exception in version 2.1.6; it has only appeared since the upgrade to 2.1.7.
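Since the retry succeeded after about 30 seconds, scheduled tasks can be made resilient to this specific error while the root cause is investigated. A minimal sketch: `run_with_retry`, `execute`, and the fake executor are hypothetical names standing in for whatever client call runs the SQL (e.g. a pymysql cursor), not a Doris API:

```python
import time

# The error 1105 message that signals the coordinator BE restarted.
EPOCH_CHANGED = "Backend process epoch changed"

def run_with_retry(execute, sql, retries=3, delay=30):
    """Retry a query only when the BE reports an epoch change (error 1105)."""
    for attempt in range(retries):
        try:
            return execute(sql)
        except Exception as e:
            # Re-raise anything that is not the epoch-changed error,
            # and also give up after the final attempt.
            if EPOCH_CHANGED not in str(e) or attempt == retries - 1:
                raise
            time.sleep(delay)

# Demo with a fake executor that fails once with the observed message,
# then succeeds, mimicking the behaviour described above.
calls = {"n": 0}
def fake_execute(sql):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError(
            "errCode = 2, detailMessage = Backend process epoch changed, "
            "means this be has already restarted"
        )
    return "ok"

print(run_with_retry(fake_execute, "SELECT 1", delay=0))
```

This only papers over the symptom; the daily restart itself still needs to be diagnosed.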
Search before asking
Version
2.1.7
What's Wrong?
I have three BEs, but every day at 0 am one of them restarts.
The Doris BE restarts every day between 0 am and 1 am. The log is as follows; no abnormality or out-of-memory information was found in the monitoring:
The restart occurs around the log line stamped I20250106 00:01:03.237107.
Log:
I20250106 00:01:02.727793 8210 query_context.cpp:188] Query bfc4d656ef82409a-b590bdb78c6f7de3 deconstructed, , deregister query/load memory tracker, queryId=bfc4d656ef82409a-b590bdb78c6f7de3, Limit=2.00 GB, CurrUsed=35.25 KB, PeakUsed=8.94 MB
I20250106 00:01:02.735662 7699 fragment_mgr.cpp:778] query_id: bbec01208ed45d5-af4e282d731740d7, coord_addr: TNetworkAddress(hostname=10.16.10.184, port=9020), total fragment num on current host: 4, fe process uuid: 1735867264753, query type: SELECT, report audit fe:TNetworkAddress(hostname=10.16.10.184, port=9020)
I20250106 00:01:02.735759 7699 fragment_mgr.cpp:819] Query/load id: bbec01208ed45d5-af4e282d731740d7, use workload group: TG[id = 1, name = normal, cpu_share = 1024, memory_limit = 8.44 GB, enable_memory_overcommit = true, version = 0, cpu_hard_limit = -1, scan_thread_num = 48, max_remote_scan_thread_num = 512, min_remote_scan_thread_num = 8, spill_low_watermark=50, spill_high_watermark=80, is_shutdown=false, query_num=3, read_bytes_per_second=-1, remote_read_bytes_per_second=-1], is pipeline: 1
I20250106 00:01:02.735778 7699 fragment_mgr.cpp:830] Register query/load memory tracker, query/load id: bbec01208ed45d5-af4e282d731740d7 limit: 0
I20250106 00:01:02.735824 7699 pipeline_x_fragment_context.cpp:207] PipelineXFragmentContext::prepare|query_id=bbec01208ed45d5-af4e282d731740d7|fragment_id=0|pthread_id=140121769277184
I20250106 00:01:02.742755 8209 fragment_mgr.cpp:730] Removing query bbec01208ed45d5-af4e282d731740d7 instance bbec01208ed45d5-af4e282d731740d8, all done? true
I20250106 00:01:02.742787 8209 fragment_mgr.cpp:730] Removing query bbec01208ed45d5-af4e282d731740d7 instance bbec01208ed45d5-af4e282d731740d9, all done? true
I20250106 00:01:02.742794 8209 fragment_mgr.cpp:730] Removing query bbec01208ed45d5-af4e282d731740d7 instance bbec01208ed45d5-af4e282d731740da, all done? true
I20250106 00:01:02.742800 8209 fragment_mgr.cpp:730] Removing query bbec01208ed45d5-af4e282d731740d7 instance bbec01208ed45d5-af4e282d731740db, all done? true
I20250106 00:01:02.742810 8209 fragment_mgr.cpp:736] Query bbec01208ed45d5-af4e282d731740d7 finished
I20250106 00:01:02.743234 8209 query_context.cpp:156] Query bbec01208ed45d5-af4e282d731740d7 deconstructed, , deregister query/load memory tracker, queryId=bbec01208ed45d5-af4e282d731740d7, Limit=2.00 GB, CurrUsed=453.50 KB, PeakUsed=9.72 MB
I20250106 00:01:02.743286 8209 query_context.cpp:188] Query bbec01208ed45d5-af4e282d731740d7 deconstructed, , deregister query/load memory tracker, queryId=bbec01208ed45d5-af4e282d731740d7, Limit=2.00 GB, CurrUsed=453.50 KB, PeakUsed=9.72 MB
I20250106 00:01:02.746565 7678 fragment_mgr.cpp:778] query_id: 20017784664247c2-855d4c4d960c8abc, coord_addr: TNetworkAddress(hostname=10.16.10.184, port=9020), total fragment num on current host: 4, fe process uuid: 1735867264753, query type: SELECT, report audit fe:TNetworkAddress(hostname=10.16.10.184, port=9020)
I20250106 00:01:02.746655 7678 fragment_mgr.cpp:819] Query/load id: 20017784664247c2-855d4c4d960c8abc, use workload group: TG[id = 1, name = normal, cpu_share = 1024, memory_limit = 8.44 GB, enable_memory_overcommit = true, version = 0, cpu_hard_limit = -1, scan_thread_num = 48, max_remote_scan_thread_num = 512, min_remote_scan_thread_num = 8, spill_low_watermark=50, spill_high_watermark=80, is_shutdown=false, query_num=3, read_bytes_per_second=-1, remote_read_bytes_per_second=-1], is pipeline: 1
I20250106 00:01:02.746672 7678 fragment_mgr.cpp:830] Register query/load memory tracker, query/load id: 20017784664247c2-855d4c4d960c8abc limit: 0
I20250106 00:01:02.746686 7678 pipeline_x_fragment_context.cpp:207] PipelineXFragmentContext::prepare|query_id=20017784664247c2-855d4c4d960c8abc|fragment_id=0|pthread_id=140121945523968
I20250106 00:01:02.761847 8207 fragment_mgr.cpp:730] Removing query 20017784664247c2-855d4c4d960c8abc instance 20017784664247c2-855d4c4d960c8abd, all done? true
I20250106 00:01:02.761895 8207 fragment_mgr.cpp:730] Removing query 20017784664247c2-855d4c4d960c8abc instance 20017784664247c2-855d4c4d960c8abe, all done? true
I20250106 00:01:02.761906 8207 fragment_mgr.cpp:730] Removing query 20017784664247c2-855d4c4d960c8abc instance 20017784664247c2-855d4c4d960c8abf, all done? true
I20250106 00:01:02.761912 8207 fragment_mgr.cpp:730] Removing query 20017784664247c2-855d4c4d960c8abc instance 20017784664247c2-855d4c4d960c8ac0, all done? true
I20250106 00:01:02.761919 8207 fragment_mgr.cpp:736] Query 20017784664247c2-855d4c4d960c8abc finished
I20250106 00:01:02.762681 8207 query_context.cpp:156] Query 20017784664247c2-855d4c4d960c8abc deconstructed, , deregister query/load memory tracker, queryId=20017784664247c2-855d4c4d960c8abc, Limit=2.00 GB, CurrUsed=81.38 KB, PeakUsed=8.32 MB
I20250106 00:01:02.762732 8207 query_context.cpp:188] Query 20017784664247c2-855d4c4d960c8abc deconstructed, , deregister query/load memory tracker, queryId=20017784664247c2-855d4c4d960c8abc, Limit=2.00 GB, CurrUsed=81.38 KB, PeakUsed=8.32 MB
I20250106 00:01:03.237107 8165 daemon.cpp:221] os physical memory 31.26 GB. process memory used 3.28 GB(= 3.69 GB[vm/rss] - 415.25 MB[tc/jemalloc_cache] + 0[reserved] + 0B[waiting_refresh]), limit 28.13 GB, soft limit 25.32 GB. sys available memory 22.96 GB(= 22.96 GB[proc/available] - 0[reserved] - 0B[waiting_refresh]), low water mark 1.56 GB, warning water mark 3.13 GB.
I20250106 00:01:09.373319 29186 doris_main.cpp:382] version doris-2.1.7-rc03(AVX2) RELEASE (build git://vm-36@443e87e)
Built on Wed, 06 Nov 2024 15:34:46 CST by vm-36
I20250106 00:01:11.235632 29186 doris_main.cpp:490] Doris backend JNI is initialized.
I20250106 00:01:11.236301 29186 mem_info.cpp:361] Physical Memory: 33565720576, BE Available Physical Memory(consider cgroup): 33565720576, Mem Limit: 28.13 GB, origin config value: 90%, System Mem Available Min Reserve: 1.56 GB, Vm Min Free KBytes: 66.00 MB, Vm Overcommit Memory: 0
I20250106 00:01:11.236337 29186 doris_main.cpp:508] Cpu Info:
Model: Intel(R) Xeon(R) Gold 6266C CPU @ 3.00GHz
Cores: 8
Max Possible Cores: 8
L1 Cache: 32.00 KB (Line: 64.00 B)
L2 Cache: 1.00 MB (Line: 64.00 B)
L3 Cache: 30.25 MB (Line: 64.00 B)
Hardware Supports:
ssse3
sse4_1
sse4_2
popcnt
avx
avx2
Numa Nodes: 1
Numa Nodes of Cores: 0->0 | 1->0 | 2->0 | 3->0 | 4->0 | 5->0 | 6->0 | 7->0 |
What You Expected?
Help locating the cause of the anomaly.
How to Reproduce?
No response
Anything Else?
No response
Are you willing to submit PR?
Code of Conduct