optimise: stores char data as bytes instead of their code representation #107

Samy-33 · 2024-10-12T08:58:04Z

No description provided.

compiler+runtime/src/cpp/jank/runtime/obj/character.cpp

jeaye · 2024-10-12T18:57:46Z

compiler+runtime/test/cpp/jank/read/parse.cpp

+        // First two lex tokens are invalid characters i.e. \ne and \apple
+        for(int _ignored = 1; _ignored <= 2; _ignored++)


Two stylistic points.

We always use /* */ comments in jank. We also use correct grammar and punctuation, since comments are documentation. So we'll want a period at the end of this.

This for loop is more idiomatically written like this:

for(size_t i{}; i < 2; ++i)

In particular, a few things change:

We use size_t because we're counting up from 0.

We use brace initialization, as we do in modern C++; size_t i{} value-initializes i to 0.

We zero-base our indices/loop counters.

We name the counter i, even if it's ignored.

We use ++i instead of i++. There's no practical difference here, but postfix increment can have serious performance implications in other scenarios, so we only use it when we actually want postfix behavior. Here, we want prefix increment.
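To make the points above concrete, here is a minimal sketch comparing the loop from the diff with the suggested form. The function names (legacy_loop_runs, idiomatic_loop_runs, verify_loops_match) are hypothetical and exist only for illustration; they are not part of jank.

```cpp
#include <cassert>
#include <cstddef>

/* Hypothetical helper: the loop as originally written in the diff. */
std::size_t legacy_loop_runs()
{
  std::size_t runs{};
  for(int _ignored = 1; _ignored <= 2; _ignored++)
  {
    ++runs;
  }
  return runs;
}

/* Hypothetical helper: the same loop with a zero-based, brace-initialized
 * size_t counter named i and a prefix increment. */
std::size_t idiomatic_loop_runs()
{
  std::size_t runs{};
  for(std::size_t i{}; i < 2; ++i)
  {
    ++runs;
  }
  return runs;
}

/* Both forms execute the body the same number of times. */
void verify_loops_match()
{
  assert(legacy_loop_runs() == idiomatic_loop_runs());
}
```

Both loops run the body twice, so the change is purely about style and habit, not behavior.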

Could you point me to a resource for why i++ would have perf implications?

The ++ operator in C++ is overloadable. We do this regularly for custom iterator types. For more complex iterator types, copying the iterator may be very slow. i++ must copy i so that the old value can be returned, then increment the actual i. So, imagine an iterator which holds a vector and an index into it. By doing i++, you need to copy the iterator, which means deep copying the entire vector. If you use ++i, the iterator is updated in place.

As I said, in this situation, the compiler will generate the same code for i++ and ++i, since i is a plain integer and the result of the expression is unused. But C++ is all about building good habits so that you don't get bitten. By using ++i as a default, and only i++ when you actually want a copy of the value before it was incremented, you won't be surprised down the road.
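The scenario described above can be sketched as follows. This is an illustration under stated assumptions, not code from jank: heavy_iterator, copy_count, and the two helper functions are hypothetical names, and the iterator owns its vector by value purely to make the copy cost visible.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

/* Hypothetical global counter of heavy_iterator copy constructions. */
std::size_t copy_count{};

/* Hypothetical iterator which holds a vector and an index into it. */
struct heavy_iterator
{
  explicit heavy_iterator(std::vector<int> values)
    : data(std::move(values))
  {
  }

  /* Copying deep copies the whole vector; we count each occurrence. */
  heavy_iterator(heavy_iterator const &other)
    : data(other.data)
    , index(other.index)
  {
    ++copy_count;
  }

  /* Moves are cheap and are not counted. */
  heavy_iterator(heavy_iterator &&) noexcept = default;

  /* Prefix: update in place and return a reference. No copy is made. */
  heavy_iterator &operator++()
  {
    ++index;
    return *this;
  }

  /* Postfix: copy the old state, increment, then return the copy. */
  heavy_iterator operator++(int)
  {
    heavy_iterator old(*this);
    ++index;
    return old;
  }

  std::vector<int> data;
  std::size_t index{};
};

/* Returns how many copies were made while advancing with ++it. */
std::size_t copies_made_by_prefix(std::size_t const steps)
{
  heavy_iterator it{ std::vector<int>(1000, 0) };
  copy_count = 0;
  for(std::size_t i{}; i < steps; ++i)
  {
    ++it;
  }
  return copy_count;
}

/* Returns how many copies were made while advancing with it++. */
std::size_t copies_made_by_postfix(std::size_t const steps)
{
  heavy_iterator it{ std::vector<int>(1000, 0) };
  copy_count = 0;
  for(std::size_t i{}; i < steps; ++i)
  {
    it++;
  }
  return copy_count;
}
```

With this sketch, advancing by prefix increment makes no copies, while each postfix increment copies the iterator, and therefore its 1000-element vector, once.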

Since you asked for a reference:

https://stackoverflow.com/questions/5223950/stl-iterators-prefix-increment-faster

https://stackoverflow.com/questions/41765835/why-use-the-prefix-increment-form-for-iterators

jeaye · 2024-10-12T20:17:13Z

Nice work, Saket!

optimise: stores char data as bytes instead of their code representation

ca45c72

Samy-33 force-pushed the perf/optimize-char-runtime-data branch from 47b3350 to ca45c72 Compare October 12, 2024 19:44

Samy-33 requested a review from jeaye October 12, 2024 19:47

jeaye merged commit a45f306 into jank-lang:main Oct 12, 2024
2 of 4 checks passed

Samy-33 deleted the perf/optimize-char-runtime-data branch October 12, 2024 20:17
