Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](func) Fix precision loss in ST_GeometryFromWKB coordinate parsing #46661

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

felixwluo
Copy link
Contributor

What problem does this PR solve?

Issue Number: close #46619

Root Cause:

1. Unnecessary floating-point number conversions in coordinate handling:
   1.1 converting double to string using absl::StrFormat
   1.2 converting string back to double using std::stod
each conversion caused precision loss.

2. Byte order handling issue in WKB parsing:
    2.1 using machine endian before properly reading WKB byte order flag
    2.2 this caused incorrect interpretation of coordinate values

Solution:

1. Remove unnecessary coordinate value conversions:
    1.1 directly use S2LatLng's degrees() value without string formatting
    1.2 increase output precision in print_s2point to 15 digits

2. Fix WKB byte order handling:
    2.1 read byte order flag first
    2.2 set correct byte order before parsing coordinates

Result:
before:

POINT (1.461652102e-231 3.34424828009e-59)

after:

POINT(117.194767000297 36.46326301008)

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@felixwluo
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 32784 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3b33354e29e883dd01e2a2679d33ccee3eda9282, data reload: false

------ Round 1 ----------------------------------
q1	17611	6179	6083	6083
q2	2050	312	167	167
q3	10583	1223	729	729
q4	10265	862	443	443
q5	7546	2283	2002	2002
q6	208	186	153	153
q7	911	749	644	644
q8	9244	1398	1226	1226
q9	5890	4903	4979	4903
q10	6783	2305	1843	1843
q11	487	277	252	252
q12	351	369	230	230
q13	17759	3674	3117	3117
q14	232	233	209	209
q15	570	512	489	489
q16	637	619	587	587
q17	570	838	336	336
q18	6820	6555	6335	6335
q19	3202	997	576	576
q20	305	310	183	183
q21	2802	2136	1980	1980
q22	365	326	297	297
Total cold run time: 105191 ms
Total hot run time: 32784 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6298	6223	6238	6223
q2	234	330	234	234
q3	2259	2611	2290	2290
q4	1413	1807	1312	1312
q5	4361	4767	4883	4767
q6	189	178	144	144
q7	2070	2008	1860	1860
q8	2662	2795	2721	2721
q9	7341	7273	7286	7273
q10	3045	3356	2726	2726
q11	594	501	490	490
q12	693	775	688	688
q13	3540	3869	3212	3212
q14	282	313	297	297
q15	565	510	518	510
q16	653	666	663	663
q17	1220	1736	1274	1274
q18	7737	7512	7485	7485
q19	871	1175	1094	1094
q20	1970	2071	1873	1873
q21	5771	5284	4959	4959
q22	625	652	592	592
Total cold run time: 54393 ms
Total hot run time: 52687 ms

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 38.87% (10130/26061)
Line Coverage: 29.93% (85731/286485)
Region Coverage: 29.02% (43736/150711)
Branch Coverage: 25.56% (22323/87344)
Coverage Report: http://coverage.selectdb-in.cc/coverage/3b33354e29e883dd01e2a2679d33ccee3eda9282_3b33354e29e883dd01e2a2679d33ccee3eda9282/report/index.html

@doris-robot
Copy link

TPC-DS: Total hot run time: 195086 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3b33354e29e883dd01e2a2679d33ccee3eda9282, data reload: false

query1	1308	906	944	906
query2	6423	2388	2303	2303
query3	10961	4526	4792	4526
query4	33016	23700	23671	23671
query5	4844	627	450	450
query6	296	205	197	197
query7	3991	492	312	312
query8	311	254	258	254
query9	9269	2630	2634	2630
query10	487	326	245	245
query11	18142	15302	15009	15009
query12	159	107	109	107
query13	1592	530	416	416
query14	10305	7096	7154	7096
query15	260	203	179	179
query16	8116	616	448	448
query17	1589	759	589	589
query18	2139	417	304	304
query19	205	181	147	147
query20	125	117	115	115
query21	204	124	107	107
query22	4560	4690	4306	4306
query23	34641	33477	33344	33344
query24	6570	2271	2301	2271
query25	462	448	395	395
query26	784	270	161	161
query27	2168	464	338	338
query28	5376	2500	2464	2464
query29	572	597	419	419
query30	208	181	152	152
query31	932	914	809	809
query32	90	57	54	54
query33	473	349	293	293
query34	779	840	506	506
query35	799	833	759	759
query36	1032	1067	954	954
query37	125	101	75	75
query38	4238	4264	4051	4051
query39	1520	1478	1448	1448
query40	204	114	97	97
query41	44	41	44	41
query42	120	102	104	102
query43	526	526	498	498
query44	1352	813	820	813
query45	181	173	164	164
query46	889	1066	652	652
query47	1917	1887	1833	1833
query48	398	418	325	325
query49	705	479	392	392
query50	661	675	392	392
query51	7098	6989	7073	6989
query52	106	102	90	90
query53	223	262	178	178
query54	505	492	413	413
query55	84	87	77	77
query56	256	255	233	233
query57	1183	1204	1140	1140
query58	244	238	217	217
query59	3225	3494	3287	3287
query60	271	263	256	256
query61	112	110	112	110
query62	833	800	721	721
query63	233	186	190	186
query64	3698	1048	658	658
query65	3305	3188	3212	3188
query66	782	409	309	309
query67	16399	15702	15450	15450
query68	8493	691	516	516
query69	484	289	261	261
query70	1206	1160	1138	1138
query71	429	276	260	260
query72	6318	3902	3897	3897
query73	660	736	353	353
query74	9794	8932	8964	8932
query75	3864	3142	2633	2633
query76	3698	1180	768	768
query77	774	370	273	273
query78	10808	10036	9444	9444
query79	3772	783	590	590
query80	712	557	433	433
query81	494	265	223	223
query82	604	152	213	152
query83	193	164	141	141
query84	288	94	75	75
query85	746	383	304	304
query86	353	329	298	298
query87	4295	4267	4232	4232
query88	4274	2139	2108	2108
query89	401	300	292	292
query90	1876	184	180	180
query91	136	134	103	103
query92	69	56	49	49
query93	1042	738	537	537
query94	639	401	286	286
query95	337	259	249	249
query96	486	593	275	275
query97	2826	2939	2872	2872
query98	221	200	192	192
query99	1572	1508	1399	1399
Total cold run time: 296489 ms
Total hot run time: 195086 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 31.63 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3b33354e29e883dd01e2a2679d33ccee3eda9282, data reload: false

query1	0.03	0.03	0.04
query2	0.07	0.03	0.04
query3	0.23	0.07	0.06
query4	1.62	0.11	0.11
query5	0.42	0.42	0.41
query6	1.15	0.66	0.66
query7	0.03	0.02	0.02
query8	0.05	0.03	0.04
query9	0.58	0.51	0.50
query10	0.55	0.57	0.55
query11	0.15	0.11	0.11
query12	0.14	0.12	0.11
query13	0.61	0.59	0.59
query14	2.72	2.83	2.75
query15	0.89	0.81	0.82
query16	0.40	0.39	0.39
query17	1.06	1.08	1.08
query18	0.24	0.22	0.20
query19	1.86	1.82	2.03
query20	0.02	0.00	0.01
query21	15.38	0.92	0.58
query22	0.76	0.79	0.77
query23	15.16	1.36	0.58
query24	2.53	1.00	1.17
query25	0.17	0.19	0.13
query26	0.22	0.14	0.13
query27	0.06	0.06	0.05
query28	14.12	1.57	1.05
query29	12.57	4.04	3.32
query30	0.26	0.09	0.06
query31	2.84	0.60	0.39
query32	3.23	0.55	0.46
query33	3.14	3.20	3.11
query34	16.79	5.10	4.45
query35	4.50	4.47	4.52
query36	0.76	0.50	0.51
query37	0.09	0.06	0.06
query38	0.04	0.03	0.03
query39	0.04	0.03	0.03
query40	0.17	0.14	0.13
query41	0.08	0.02	0.02
query42	0.03	0.03	0.02
query43	0.03	0.03	0.03
Total cold run time: 105.79 s
Total hot run time: 31.63 s

@felixwluo
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 32508 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b3e6a7f2b6d3d1b9923e8f0fe57a508572440f2c, data reload: false

------ Round 1 ----------------------------------
q1	17573	6138	6020	6020
q2	2046	296	163	163
q3	10632	1194	760	760
q4	10198	852	428	428
q5	7537	2137	1996	1996
q6	203	182	149	149
q7	908	746	606	606
q8	9224	1344	1136	1136
q9	5312	4843	4907	4843
q10	6778	2301	1848	1848
q11	477	275	248	248
q12	355	371	216	216
q13	17772	3661	3007	3007
q14	239	228	217	217
q15	571	505	522	505
q16	637	610	599	599
q17	587	855	329	329
q18	7074	6561	6400	6400
q19	2002	950	561	561
q20	306	310	180	180
q21	2791	2245	1992	1992
q22	358	338	305	305
Total cold run time: 103580 ms
Total hot run time: 32508 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6249	6152	6200	6152
q2	239	318	238	238
q3	2216	2631	2336	2336
q4	1391	1823	1364	1364
q5	4312	4743	4705	4705
q6	189	175	142	142
q7	2057	1970	1788	1788
q8	2608	2823	2652	2652
q9	7263	7199	7251	7199
q10	3058	3318	2875	2875
q11	572	505	494	494
q12	638	701	575	575
q13	3508	3799	3264	3264
q14	302	304	286	286
q15	576	513	504	504
q16	673	689	650	650
q17	1254	1720	1265	1265
q18	7626	7542	7301	7301
q19	834	1058	1117	1058
q20	2000	2069	1898	1898
q21	5816	5228	4845	4845
q22	604	650	591	591
Total cold run time: 53985 ms
Total hot run time: 52182 ms

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 38.87% (10129/26061)
Line Coverage: 29.92% (85718/286503)
Region Coverage: 29.02% (43745/150721)
Branch Coverage: 25.57% (22334/87352)
Coverage Report: http://coverage.selectdb-in.cc/coverage/b3e6a7f2b6d3d1b9923e8f0fe57a508572440f2c_b3e6a7f2b6d3d1b9923e8f0fe57a508572440f2c/report/index.html

@doris-robot
Copy link

TPC-DS: Total hot run time: 195277 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b3e6a7f2b6d3d1b9923e8f0fe57a508572440f2c, data reload: false

query1	1279	950	919	919
query2	6154	2324	2341	2324
query3	11004	4637	4628	4628
query4	33344	23468	23753	23468
query5	5738	600	450	450
query6	293	191	176	176
query7	4190	481	297	297
query8	295	230	217	217
query9	9381	2646	2645	2645
query10	542	307	255	255
query11	18108	15134	15275	15134
query12	152	104	104	104
query13	1648	527	399	399
query14	10852	7425	7462	7425
query15	254	202	185	185
query16	7862	619	472	472
query17	1476	751	579	579
query18	2121	402	335	335
query19	210	185	156	156
query20	123	113	109	109
query21	207	125	115	115
query22	4381	4566	4346	4346
query23	34933	33187	33227	33187
query24	6119	2301	2273	2273
query25	454	453	391	391
query26	754	246	154	154
query27	2218	469	338	338
query28	5345	2488	2463	2463
query29	546	534	412	412
query30	203	186	147	147
query31	944	893	834	834
query32	84	61	64	61
query33	458	340	294	294
query34	748	865	510	510
query35	814	821	753	753
query36	974	1048	972	972
query37	113	101	76	76
query38	4279	4214	4369	4214
query39	1511	1478	1431	1431
query40	199	117	111	111
query41	46	42	42	42
query42	125	105	105	105
query43	540	563	502	502
query44	1316	816	847	816
query45	184	166	166	166
query46	881	1045	653	653
query47	1932	1897	1844	1844
query48	386	402	323	323
query49	700	510	389	389
query50	659	666	403	403
query51	7028	6956	7028	6956
query52	104	97	90	90
query53	225	249	179	179
query54	474	487	404	404
query55	82	79	79	79
query56	256	276	246	246
query57	1182	1187	1152	1152
query58	234	231	229	229
query59	3082	3247	2986	2986
query60	285	283	265	265
query61	137	132	132	132
query62	850	807	759	759
query63	240	193	206	193
query64	3298	1126	724	724
query65	3320	3252	3221	3221
query66	747	420	304	304
query67	16468	15803	15512	15512
query68	9590	690	526	526
query69	511	290	251	251
query70	1221	1151	1094	1094
query71	456	280	250	250
query72	6484	3926	3798	3798
query73	645	740	365	365
query74	10200	8896	8699	8699
query75	4746	3141	2654	2654
query76	4672	1166	753	753
query77	821	358	277	277
query78	10010	10037	9421	9421
query79	3436	791	570	570
query80	712	513	439	439
query81	487	282	227	227
query82	622	153	122	122
query83	180	157	145	145
query84	275	95	84	84
query85	799	359	310	310
query86	352	311	298	298
query87	4433	4393	4380	4380
query88	3729	2155	2159	2155
query89	415	327	296	296
query90	1904	182	184	182
query91	128	130	105	105
query92	61	58	52	52
query93	1325	865	539	539
query94	625	397	283	283
query95	326	265	247	247
query96	482	618	284	284
query97	2877	2948	2851	2851
query98	216	206	193	193
query99	1694	1470	1394	1394
Total cold run time: 299525 ms
Total hot run time: 195277 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 31.44 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b3e6a7f2b6d3d1b9923e8f0fe57a508572440f2c, data reload: false

query1	0.03	0.04	0.03
query2	0.07	0.03	0.03
query3	0.24	0.07	0.06
query4	1.61	0.10	0.11
query5	0.42	0.42	0.41
query6	1.14	0.65	0.66
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.60	0.50	0.51
query10	0.56	0.55	0.55
query11	0.14	0.10	0.10
query12	0.14	0.10	0.11
query13	0.60	0.61	0.60
query14	2.74	2.83	2.72
query15	0.90	0.81	0.82
query16	0.39	0.38	0.40
query17	1.05	1.02	1.00
query18	0.23	0.21	0.21
query19	1.96	1.75	2.02
query20	0.01	0.01	0.01
query21	15.36	0.92	0.60
query22	0.75	0.89	0.72
query23	15.11	1.47	0.53
query24	3.55	1.30	1.17
query25	0.21	0.17	0.11
query26	0.21	0.14	0.13
query27	0.05	0.05	0.06
query28	14.00	1.51	1.05
query29	12.55	3.99	3.28
query30	0.25	0.09	0.06
query31	2.83	0.59	0.39
query32	3.23	0.53	0.46
query33	3.08	3.10	3.08
query34	16.81	5.12	4.51
query35	4.49	4.47	4.44
query36	0.65	0.50	0.49
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.02	0.02
query40	0.17	0.13	0.12
query41	0.08	0.03	0.02
query42	0.03	0.02	0.03
query43	0.04	0.03	0.03
Total cold run time: 106.52 s
Total hot run time: 31.44 s

@felixwluo felixwluo requested a review from LemonLiTree January 9, 2025 04:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] function ST_GeometryFromWKB result diff between doris and postgis
3 participants