I think this is a great thing to support! This is a duplicate of #1416. There is a non-ideal workaround: you can make a word list that looks like this:
Adding
-
I've thought about `flagWords`, but it relies on the suggestion system, so my one-to-one replacement is sometimes not the first suggestion, or is not suggested at all, especially when the misspelled word has a different number of characters than the intended word. For example:
I used to implement one-to-one replacements with a map: I would look the word up in the map first and only pass the remaining problems to the suggestion system. Now everything goes through the suggestion system, and there is no option to check a one-to-one mapping first. Could we consider providing a one-to-one map that is consulted first, with the suggestion system handling the rest?
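The "map first, then the suggestion system" flow described above can be sketched as follows. This is a minimal illustration, not cspell's API: `Suggester`, `makeSuggester`, and the lowercase-key convention are hypothetical names and assumptions, and the fallback is a toy stand-in for the real suggestion engine.

```typescript
// Stand-in for the existing suggestion engine (hypothetical signature):
// takes a word and returns ranked suggestions.
type Suggester = (word: string) => string[];

// Consult an explicit one-to-one map first; only words that are not in the
// map are passed on to the suggestion engine.
function makeSuggester(
  oneToOne: ReadonlyMap<string, string>,
  fallback: Suggester,
): Suggester {
  return (word: string): string[] => {
    // Keys are assumed to be stored in lowercase, so "Whitelist" and
    // "whitelist" hit the same entry.
    const direct = oneToOne.get(word.toLowerCase());
    return direct !== undefined ? [direct] : fallback(word);
  };
}

// Toy engine that knows nothing about the mapping.
const engine: Suggester = (w) => (w === "helo" ? ["hello", "help"] : []);
const suggest = makeSuggester(new Map([["whitelist", "allowlist"]]), engine);

console.log(suggest("Whitelist")); // → [ 'allowlist' ]
console.log(suggest("helo")); // → [ 'hello', 'help' ]
```

Because the mapped word bypasses the engine entirely, it no longer matters whether it is "similar" enough (edit distance, length) to be found by the normal suggestion search.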
-
I agree with you. Sometimes the words are not even similar, so they would not be suggested.
-
Do you care more about the dictionary file format or about supporting suggestions?
-
@Jason3S Both.
-
As a first pass, I was thinking of a configuration-based approach before changing the dictionary format. I was assuming the number of suggestion sets would be much smaller than the number of words, so a simple human-readable format would be just fine.
-
My reason for wanting the Trie is less an algorithmic consideration and more a security one. For personal local use that is not available to the general public, or for a small number of one-to-one suggestions, I would consider configuration. That is why I said both, but I prefer to provide this through dictionaries.
-
So you have a lot of "sensitive" words to store. I would consider the … In any case, I had been thinking about an efficient way to attach attributes to words. Suggestions are one kind of attribute; word-usage frequency (which can be used to sort suggestions) is another. The idea was a separate section in the … For suggestions, it might look like this:
A
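Using word-usage frequency to sort suggestions, as mentioned above, only needs a lookup into a per-word attribute table kept next to the word list. A minimal sketch, assuming a hypothetical `WordAttributes` record with a `frequency` field (not an existing cspell type):

```typescript
// Hypothetical attribute record kept alongside (not inside) the word list.
interface WordAttributes {
  frequency?: number; // relative usage count; higher means more common
}

// Order candidate suggestions by their frequency attribute, most common first.
// Words with no recorded frequency count as 0; ties keep their input order
// because Array.prototype.sort is stable (ES2019 and later).
function rankByFrequency(
  candidates: readonly string[],
  attrs: ReadonlyMap<string, WordAttributes>,
): string[] {
  const freq = (w: string): number => attrs.get(w)?.frequency ?? 0;
  return [...candidates].sort((a, b) => freq(b) - freq(a));
}

const attrs = new Map<string, WordAttributes>([
  ["the", { frequency: 1000 }],
  ["then", { frequency: 120 }],
  ["thee", { frequency: 3 }],
]);

console.log(rankByFrequency(["thee", "then", "the", "them"], attrs));
// → [ 'the', 'then', 'thee', 'them' ]
```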
-
I think your idea is brilliant. Attributes are not like words, so I think we should use more direct storage, such as a hash table. But given the Trie's O(k) lookup, it might be interesting to use a Trie that keeps words and attributes separate, which could unify the two.
-
Example Word Id: Trie from cspell-trie README.md
-
I was talking about the file format. In memory, attributes would be expanded into a hash table for speed. Suggestions and word frequency are just attributes on a word. They should not be in the Trie structure, but exist in parallel with it.
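The split described above (a compact file format, expanded into a hash table in memory, parallel to the Trie) can be sketched like this. The record layout is an assumption for illustration only: each `AttributeRecord` refers to a word by its index in the dictionary's word list, and keys such as `"suggest"` and `"freq"` are hypothetical names.

```typescript
// Hypothetical compact record: refers to a word by its index in the
// dictionary's word list rather than repeating the word text.
interface AttributeRecord {
  wordIndex: number;
  key: string; // illustrative keys such as "suggest" or "freq"
  value: string;
}

type AttributeTable = Map<string, Map<string, string[]>>;

// Expand the records into a hash table keyed by word, so attribute lookups
// never touch the Trie. A key may carry several values.
function expandAttributes(
  words: readonly string[],
  records: readonly AttributeRecord[],
): AttributeTable {
  const table: AttributeTable = new Map();
  for (const { wordIndex, key, value } of records) {
    if (wordIndex < 0 || wordIndex >= words.length) {
      throw new RangeError(`no word at index ${wordIndex}`);
    }
    const word = words[wordIndex];
    let attrs = table.get(word);
    if (attrs === undefined) {
      attrs = new Map<string, string[]>();
      table.set(word, attrs);
    }
    const values = attrs.get(key) ?? [];
    values.push(value);
    attrs.set(key, values);
  }
  return table;
}

const words = ["accension", "the", "then"];
const table = expandAttributes(words, [
  { wordIndex: 0, key: "suggest", value: "accession" },
  { wordIndex: 0, key: "suggest", value: "ascension" },
  { wordIndex: 1, key: "freq", value: "1000" },
]);
console.log(table.get("accension")?.get("suggest")); // → [ 'accession', 'ascension' ]
```

Words without records (here `"then"`) simply have no entry, so the attribute table stays proportional to the number of annotated words rather than the size of the dictionary.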
-
I thought about this yesterday while working on one-to-one suggestions, but it never felt very clean. When we have multiple types of attributes, the relationship becomes many-to-many; with this design, won't we end up parsing multiple special separator symbols? A poor implementation could even affect which suggestions are chosen. I don't have a good understanding of the project's design, and I couldn't find any developer documentation, so I read the code to get a general idea. My only worry now is that your good suggestions might be led astray by my incomplete understanding of the project's current structure. Could we put together a rough draft and try it out together to see if it works the way we want?
-
Would it be cleaner to set up two tries, write the attributes in a file section like the current Trie header, read it into memory, and then convert it to another structure, rather than cramming everything into one trie? Something like this:
-
If you do use header text to describe attribute information, I recommend using the TOML specification.
-
I think we understand "attributes" a bit differently. I don't think one-to-one suggestions and word-usage frequency should be treated as attributes; they need special treatment and should be bundled with the source word. By attributes I mean a small number of different types of tags on a word:
etc. The Trie should point to the IDs of these attributes rather than store them; that's what I mean.
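"Point to the ID of the attribute, not store it" amounts to interning: each distinct tag string is stored once, and words refer to it by a small integer, which also handles the many-to-many relationship raised earlier. A minimal sketch; `TagTable`, `tagWord`, and the tag names are hypothetical, and a plain `Map` stands in for the word side of the dictionary.

```typescript
// Hypothetical tag table: each distinct tag string is stored once and
// referred to by a small integer ID.
class TagTable {
  private readonly ids = new Map<string, number>();
  private readonly names: string[] = [];

  // Return the existing ID for a tag, or assign the next free one.
  intern(tag: string): number {
    let id = this.ids.get(tag);
    if (id === undefined) {
      id = this.names.length;
      this.names.push(tag);
      this.ids.set(tag, id);
    }
    return id;
  }

  name(id: number): string {
    if (id < 0 || id >= this.names.length) throw new RangeError(`unknown tag id ${id}`);
    return this.names[id];
  }
}

// Words point at tag IDs; the tag strings themselves live only in the table.
const tags = new TagTable();
const wordTags = new Map<string, number[]>();

function tagWord(word: string, ...tagNames: string[]): void {
  const ids = wordTags.get(word) ?? [];
  for (const t of tagNames) {
    const id = tags.intern(t);
    if (ids.indexOf(id) === -1) ids.push(id);
  }
  wordTags.set(word, ids);
}

tagWord("whitelist", "deprecated", "inclusive-language");
tagWord("blacklist", "deprecated", "inclusive-language");
tagWord("colour", "en-GB");

console.log(wordTags.get("blacklist")); // → [ 0, 1 ]
console.log((wordTags.get("colour") ?? []).map((id) => tags.name(id))); // → [ 'en-GB' ]
```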
-
Could the project provide developer documentation? It would help us learn, and help us implement features when we can. I could also implement this in other languages, such as Python or Go.
-
Word attributes cannot be stored in the Trie because they would explode its size. All attributes need to be in a separate data structure. This is because the Trie is not a pure Trie: from the outside it looks like a standard Trie, but to save space, all words that share the same set of suffixes use the same trie node. In the Trie, the words
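The suffix sharing described above is what a DAWG (minimized trie) does, and it shows why a per-node attribute cannot work: in the sketch below, "talk" and "walk" end at the same node. The sketch uses textbook DAWG minimization and a counting-based word index; it is an illustration under those assumptions, not cspell-trie's actual implementation, and all function names are hypothetical. The index is unique per word even with shared nodes, so it can key a separate attribute table.

```typescript
// Minimal trie node: children keyed by character plus an end-of-word flag.
interface TrieNode {
  end: boolean;
  children: Map<string, TrieNode>;
}

function buildTrie(words: readonly string[]): TrieNode {
  const root: TrieNode = { end: false, children: new Map() };
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      let next = node.children.get(ch);
      if (next === undefined) {
        next = { end: false, children: new Map() };
        node.children.set(ch, next);
      }
      node = next;
    }
    node.end = true;
  }
  return root;
}

// Merge identical subtrees bottom-up, so every prefix with the same set of
// suffixes ends at one shared node (DAWG minimization). Signatures use ":"
// and "|" as separators, so this sketch assumes words contain neither.
function minimize(node: TrieNode, registry: Map<string, TrieNode>, ids: Map<TrieNode, number>): TrieNode {
  const parts = [node.end ? "1" : "0"];
  for (const key of Array.from(node.children.keys()).sort()) {
    const child = minimize(node.children.get(key)!, registry, ids);
    node.children.set(key, child);
    parts.push(`${key}:${ids.get(child)}`);
  }
  const signature = parts.join("|");
  const existing = registry.get(signature);
  if (existing !== undefined) return existing;
  registry.set(signature, node);
  ids.set(node, ids.size);
  return node;
}

// Count distinct nodes; shared nodes are counted once.
function countNodes(root: TrieNode): number {
  const seen = new Set<TrieNode>();
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (seen.has(node)) continue;
    seen.add(node);
    node.children.forEach((child) => stack.push(child));
  }
  return seen.size;
}

// Number of words reachable from a node (memoized: shared nodes repeat).
function wordCount(node: TrieNode, memo: Map<TrieNode, number>): number {
  const cached = memo.get(node);
  if (cached !== undefined) return cached;
  let n = node.end ? 1 : 0;
  node.children.forEach((child) => {
    n += wordCount(child, memo);
  });
  memo.set(node, n);
  return n;
}

// Position of `word` in sorted order, or -1 if absent.
function wordIndex(root: TrieNode, word: string, memo: Map<TrieNode, number>): number {
  let node = root;
  let index = 0;
  for (const ch of word) {
    if (node.end) index += 1; // the prefix itself is a smaller word
    node.children.forEach((child, key) => {
      if (key < ch) index += wordCount(child, memo);
    });
    const next = node.children.get(ch);
    if (next === undefined) return -1;
    node = next;
  }
  return node.end ? index : -1;
}

const words = ["talk", "talked", "walk", "walked"];
const trie = buildTrie(words);
console.log(countNodes(trie)); // → 13
const dawg = minimize(trie, new Map(), new Map());
console.log(countNodes(dawg)); // → 7
console.log(dawg.children.get("t") === dawg.children.get("w")); // → true
const memo = new Map<TrieNode, number>();
console.log(words.map((w) => wordIndex(dawg, w, memo))); // → [ 0, 1, 2, 3 ]
```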
-
I think sharing substrings is a great optimization, but other Trie features may not work as well with it. In the end, I'm a little hesitant about whether it is wise for us to tinker with the Trie structure. After all, the Trie structure exists to serve checking and suggestions; if we force it to do other things, we only make things harder for ourselves. I think our RFC is more powerful, and it's probably smarter for us to focus on that.
-
Your diagram is an excellent example of the indexing.
-
Meaning to add words like:
-
For suggestions, I was planning on using a word list like this: Wikipedia:Lists of common misspellings/For machines - Wikipedia. Notice that there can be multiple suggestions for a word. It would be possible to take:
and turn it into:
This is logically the same as:
Which looks like:
There are a few challenges:
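Reading a list in the style of the Wikipedia page mentioned above, one `misspelling->suggestion` entry per line with multiple suggestions separated by commas, into a lookup table could look like the following. This is a hedged sketch: `parseMisspellings` is a hypothetical helper, the sample lines are illustrative rather than quoted from the page, and malformed lines are silently skipped.

```typescript
// Parse lines of the form "misspelling->suggestion[, suggestion...]".
function parseMisspellings(text: string): Map<string, string[]> {
  const result = new Map<string, string[]>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    const arrow = line.indexOf("->");
    if (arrow <= 0) continue; // blank or malformed line: no word before "->"
    const word = line.slice(0, arrow).trim();
    const suggestions = line
      .slice(arrow + 2)
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s !== "");
    // Merge repeated entries for the same misspelling instead of overwriting.
    const existing = result.get(word) ?? [];
    for (const s of suggestions) {
      if (existing.indexOf(s) === -1) existing.push(s);
    }
    result.set(word, existing);
  }
  return result;
}

const sample = ["abandonned->abandoned", "accension->accession, ascension"].join("\n");
const table = parseMisspellings(sample);
console.log(table.get("accension")); // → [ 'accession', 'ascension' ]
```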
-
Use:

{
  "flagWords": [
    ["whitelist", "allowlist"],
    ["blacklist", "denylist"],
    // other words
  ]
}

I do not plan on supporting RegExp replacements. They would make spell checking very, very slow. That is more of a "find-and-replace" type of operation, and for large word lists it would be very expensive. In the longer term, I would rather have suggestion lists act like dictionaries: dictionaries can be turned on/off or even replaced by defining a new dictionary with the same name. My plan was to go with simple text-based suggestion lists that can be
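If `flagWords` accepted the mixed shape in the config above (a plain flagged word, or a `[flaggedWord, ...replacements]` pair), the loader would need to normalize both forms into one lookup table. A minimal sketch under that assumption; `FlagWordEntry`, `FlagInfo`, and `normalizeFlagWords` are hypothetical names, not part of cspell's configuration API.

```typescript
// Hypothetical entry shape: either a plain flagged word, or a tuple of
// [flaggedWord, ...preferredReplacements].
type FlagWordEntry = string | [string, ...string[]];

interface FlagInfo {
  suggestions: string[]; // empty when the word is flagged without a replacement
}

// Normalize mixed entries into one table keyed by the lowercase word.
function normalizeFlagWords(entries: readonly FlagWordEntry[]): Map<string, FlagInfo> {
  const table = new Map<string, FlagInfo>();
  for (const entry of entries) {
    const [word, ...suggestions] = typeof entry === "string" ? [entry] : entry;
    table.set(word.toLowerCase(), { suggestions });
  }
  return table;
}

const flags = normalizeFlagWords([
  ["whitelist", "allowlist"],
  ["blacklist", "denylist"],
  "hte",
]);
console.log(flags.get("blacklist")); // → { suggestions: [ 'denylist' ] }
console.log(flags.get("hte")); // → { suggestions: [] }
```

Plain string lookups like this keep checking fast, which is the point made above about avoiding RegExp replacements.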
-
See #2247.
-
I have my own collection of misspelled words. When I'm 100% sure a word is misspelled and I know exactly which word it should be, how do I explicitly designate that in a Trie dictionary, rather than relying on a search through the tree?
For example:
If there were a syntax like this:
[adam,adem]
When the word "Adam" is encountered, it would be flagged as an error with the suggestion "adem".