Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Possible whitespace Issue with OnlineHash calculation for stats #756

Open
1 task done
bonimy opened this issue Oct 27, 2024 · 0 comments
Open
1 task done

Comments

@bonimy
Copy link

bonimy commented Oct 27, 2024

Is there an existing issue for this?

  • I have searched the existing issues and none matched.

Operating System

Windows 10

CPU

Intel

GPU

gefore 3080 ti

Storage

2TB SSD

Game Version

4.19.1

Game Mode

dance

Theme

default

Preabmle

As far as I can tell, OnlineHash is calculated by Steps::GetHash(). This function gets the ASCII string representation of the SM note data, and then calculates its crc32 checksums. The result is then assigned to OnlineHash.

I assume that the purpose of OnlineHash is to identify songs and charts so that score submissions are assigned to the correct versions of charts. Therefore, it is a requirement that two identical charts return the same OnlineHash value

Issue

It seems that m_sNoteDataCompressed can be populated in other ways than NoteDataUtil::GetSMNoteDataString. The result is that some different arrangements of whitespace characters occur in otherwise equivalents step patterns. For example a particular line could have either 0000\r\n or 0000\n. I've seen trailing whitespace show itself on some random lines also.

Because the OnlineHash is just the crc32 of this note data string, we're going to end up getting different hashes for equivalent charts.

Assuming this were fixed, there are other problems. OnlineHash uses only the step data, not the timing data.

Proposed Solution

Change all \r\n and \r to \n and strip all whitespace and comments for the note data string that gets hashed.

If we care a lot about reducing as many collisions as possible, then OnlineHash should also be updated to digest the timing data. In my opinion, the one exception would be Beat0Offset, since it does not degrade the fundamental nature of a chart.

Another minor concern is that crc32 hashes are not secure. However, there will never truly be a way to stop fake scores from being submitted, so this may be a moot point. But a high-bit hash algorithm will further reduce the chances of collisions. And it doesn't hurt to use a secure hash algorithm. Git commits weren't originally designed to be securely hashed, but it proved to be very useful. An ounce of prevention now could be worth a pound of cure in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant