-
What can be done?

1. Work to create datasets that do not have this problem.
2. Adjust the max acceptable cost to trim away most disjuncts, so they are never created.
3. Perform random sampling of disjuncts, when there are too many (a sketch follows below).
4. Optimize the `pack_sentence_for_pruning()` code.
5. Something else.

Option 1 is the "right way to do it", but it's hard to manage. A combo of options 2 & 3 seems reasonable, at the risk of introducing yet another tunable parameter into the system (I already have several dozen I work with). I'm going to look at option 4 real quick, right now, to see how hard it is.
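For concreteness, a minimal sketch of what option 3 could look like: single-pass reservoir sampling over a linked disjunct list. The `Disjunct` type and `sample_disjuncts()` here are hypothetical stand-ins, not the library's actual structures.

```c
/* Sketch only: keep a uniform random sample of at most max_keep disjuncts
 * in one pass over the list (reservoir sampling, Algorithm R). */
#include <stdlib.h>

typedef struct disjunct_s { struct disjunct_s *next; } Disjunct;

static size_t sample_disjuncts(Disjunct *list, Disjunct **reservoir,
                               size_t max_keep)
{
	size_t seen = 0;
	for (Disjunct *d = list; d != NULL; d = d->next, seen++)
	{
		if (seen < max_keep)
		{
			/* The first max_keep disjuncts fill the reservoir. */
			reservoir[seen] = d;
		}
		else
		{
			/* Replace a random slot with probability max_keep/(seen+1),
			 * keeping every disjunct equally likely to survive.
			 * (rand() % n has slight modulo bias; fine for a sketch.) */
			size_t j = (size_t)rand() % (seen + 1);
			if (j < max_keep)
				reservoir[j] = d;
		}
	}
	return (seen < max_keep) ? seen : max_keep;
}
```

Unlike cost-threshold trimming (option 2), the cap here is absolute, so it directly bounds the downstream work no matter how pathological the dataset is.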
-
Regarding option 4, I have some ideas for how to optimise it, among them:
- Exploit the existing partial connector memory sharing (that is already done in `build_disjuncts()`) for tracon caching and comparison.
- Implement hash caching in `connector_struct` (I found how to fit it into its existing 32 bytes) and reuse it in `tracon_set()`. The hashing is to be done in `build_disjuncts()`. See the sketch after this list.
- Maintain the disjunct and connector counts, to eliminate the need of counting them (already partially implemented).
- Some more ideas that I have to recall...
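For concreteness, a minimal sketch of the hash-caching idea, combined with the tail-sharing from the first bullet. The struct layout and names below are hypothetical stand-ins; the real `connector_struct` differs, and the premise is only that a 32-bit cache field can fit in its existing 32 bytes.

```c
/* Sketch only: a connector that memoizes the hash of its tracon (the
 * connector sequence starting at it). Computed once, e.g. during
 * build_disjuncts(), then reused by tracon-set lookups. Because tracons
 * share tails, a shared tail is hashed only once. */
#include <stdbool.h>

typedef struct hconnector_s
{
	struct hconnector_s *next;
	unsigned int uc_num;      /* upper-case part of the connector name */
	unsigned int lc_letters;  /* encoded lower-case letters */
	bool multi;
	unsigned int hash_cache;  /* 0 means "not computed yet" */
} HConnector;

static unsigned int tracon_hash(HConnector *c)
{
	if (c->hash_cache != 0) return c->hash_cache;

	/* Reuse the tail's cached hash, then mix in this connector. */
	unsigned int h = (c->next != NULL) ? tracon_hash(c->next) : 0x9e3779b9u;
	h = (h ^ c->uc_num) * 0x85ebca6bu;
	h = (h ^ c->lc_letters ^ (unsigned int)c->multi) * 0xc2b2ae35u;
	if (h == 0) h = 1;        /* reserve 0 for "uncached" */
	c->hash_cache = h;
	return h;
}
```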
-
Can you provide me with the dict?
I would like to use it to identify the problematic code.
-
EDIT: More debugging shows the problem here is not the hash function. See an additional post below.
It seems so, as `tracon_set::find_place` is the heaviest library function in the profiling (besides the atomese code functions).

```c
/* Hash the tracon: the connector sequence starting at c. */
static unsigned int hash_connectors(const Connector *c, unsigned int shallow)
{
	unsigned int accum = shallow && c->shallow;

	for (; c != NULL; c = c->next)
	{
		accum = (19 * accum) +
		        ((c->desc->uc_num)<<18) +
		        (((unsigned int)c->multi)<<31) +
		        (unsigned int)c->desc->lc_letters;
	}
	return accum;
}
```

The hash is 32 bits, so […]. Apparently, the number of different connectors is high enough for this hash function to be extremely bad (I don't know how many there are). To see the hashing problem, in […]. For […].
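For comparison, a sketch of what a stronger per-step mix could look like, using the well-known 32-bit murmur3 finalizer constants. It assumes the same `Connector` fields as the snippet above and is meant only to illustrate the direction, not as a tested fix (and, per the EDIT above, the hash turned out not to be the real problem here).

```c
/* Sketch only: same traversal as hash_connectors(), but with an avalanche
 * mix between steps, so no field is shifted off the top of the word and
 * similar connector ids land far apart. */
static unsigned int mix32(unsigned int h)
{
	h ^= h >> 16; h *= 0x85ebca6bu;
	h ^= h >> 13; h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

static unsigned int hash_connectors_v2(const Connector *c, unsigned int shallow)
{
	unsigned int accum = shallow && c->shallow;

	for (; c != NULL; c = c->next)
	{
		accum = mix32(accum ^ c->desc->uc_num);
		accum = mix32(accum ^ ((unsigned int)c->multi << 31)
		                    ^ (unsigned int)c->desc->lc_letters);
	}
	return accum;
}
```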
-
Compiling under the […]. It seems the disjunct list […].
-
Regarding the hashing collision problem, gdb shows another major bug: the table size is only […]. Note that the whole table-sizing heuristic in `tracon_set_reset()` is compiled out with `#if 0`:

```c
void tracon_set_reset(Tracon_set *ss)
{
#if 0
	size_t ncount = MAX(ss->count, ss->ocount);

	/* Table sizing heuristic: The number of tracons as a function of
	 * word number is usually first increasing and then decreasing.
	 * Continue the trend of the last 2 words. */
	if (ss->count > ss->ocount)
		ncount = ncount * 3 / 4;
	else
		ncount = ncount * 4 / 3;

	unsigned int prime_idx = find_prime_for(ncount);
	if (prime_idx < ss->prime_idx) ss->prime_idx = prime_idx;

	ss->size = s_prime[ss->prime_idx];
	ss->mod_func = prime_mod_func[ss->prime_idx];
#endif
	memset(ss->table, 0, ss->size * sizeof(clist_slot));
#if 0
	ss->ocount = ss->count;
	ss->count = 0;
#endif
	ss->available_count = MAX_TRACON_SET_TABLE_SIZE(ss->size);
}
```

```
(gdb) p *ss
$1 = {size = 916361, count = 484797, available_count = 341020, ocount = 0, table = 0x559b3a090190, prime_idx = 7, mod_func = 0x7fe2978239b9 <fprime7>, shallow = true}
...
^D
...
Bye.
Trace: prt_stat: tracon_set: 7769605 accesses, chain 33.3394
```

The table size is now bigger, but I don't know if it is big enough. More debugging is needed. As a first step, I will improve the tracon-set stats and debugging code.
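As a sanity check on those numbers, a standalone calculation, under the assumption that the "chain" stat means average probes per access (that reading is an assumption):

```c
/* Arithmetic on the gdb/trace numbers above, nothing more. */
#include <stdio.h>

int main(void)
{
	double size     = 916361.0;   /* table slots (from gdb)    */
	double count    = 484797.0;   /* stored entries (from gdb) */
	double accesses = 7769605.0;  /* lookups (from the trace)  */
	double chain    = 33.3394;    /* reported average chain    */

	double load = count / size;   /* ~0.53: table is not overfull */
	/* Uniform hashing with chaining: a successful search costs about
	 * 1 + load/2 probes, i.e. ~1.26 here. */
	double ideal = 1.0 + load / 2.0;

	printf("load factor: %.3f\n", load);
	printf("probes, ideal vs observed: %.2f vs %.4f (%.0fx)\n",
	       ideal, chain, chain / ideal);
	printf("total probe work: %.0f\n", accesses * chain);
	return 0;
}
```

If that reading is right, a roughly half-full table averaging 33 probes per access points at clustering rather than raw table size, which fits the "more debugging is needed" conclusion.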
-
Closing this discussion. Pull request #1487 seems to contain the last of the fixes under discussion.
-
I'm having a problem with `pack_sentence_for_pruning()` performance. The dataset I'm working with is pathological, and the correct answer might be "well, just don't use pathological datasets". That's easy to say, but hard to do. So here's the problem report. Here are some snippets of debug output:

[…]

Doing some math: […] So that's a dramatic slowdown in processing rate, coupled to an explosion in the number of disjuncts to process.

How about connectors/sec? […] So a dramatic slowdown in connectors per second.

Performance goes as $N^2$ for $N$ the number of connectors or disjuncts.
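To spell out the scaling arithmetic symbolically: if the pruning work is $T(N) \propto N^2$, the observed processing rate is $N / T(N) \propto 1/N$. A $k\times$ explosion in the number of disjuncts therefore predicts roughly a $k\times$ drop in disjuncts/sec and a $k^2\times$ longer total run, which is the shape of the slowdown described above.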