
'Word2VecKeyedVectors' object has no attribute 'get_mean_vector' #19

Open
shiv425 opened this issue Nov 3, 2022 · 4 comments

Comments

shiv425 commented Nov 3, 2022

While converting tokens to a vector for the complete sentence in the preprocess_and_vectorize method, I got the error "'Word2VecKeyedVectors' object has no attribute 'get_mean_vector'".
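
For context: `get_mean_vector()` is a newer `KeyedVectors` method (added in the gensim 4.x line, around 4.2), so the usual fix is simply to upgrade gensim. A minimal sketch of the call once it is available; the pretrained model name below is only an illustrative download:

```python
# Minimal sketch, assuming a gensim version that ships get_mean_vector
# (the 4.x line, around 4.2). The model name is just an example download.
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")  # any KeyedVectors works the same way

tokens = ["machine", "learning", "is", "fun"]
# ignore_missing=True (the default) skips out-of-vocabulary tokens
vec = wv.get_mean_vector(tokens)
print(vec.shape)  # (50,) -- one averaged vector for the whole token list
```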

shiv425 (Author) commented Nov 3, 2022

I tried converting each token to a vector and then taking the mean with np.mean, but while converting df['Text'] to vector form I get errors like "Key 'u.s.-based' not present", "Key ' ' not present", "Key '2018' not present", etc. Please help.
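
One way to avoid those "Key ... not present" errors is to drop out-of-vocabulary tokens before the lookup; gensim's `KeyedVectors` supports the `in` operator for exactly this. A hedged sketch, assuming `wv` is the loaded `KeyedVectors` from this thread and `sentence_vector` is a hypothetical helper name:

```python
import numpy as np

def sentence_vector(tokens, wv):
    # keep only tokens the model actually has a vector for
    known = [t for t in tokens if t in wv]
    if not known:
        # fall back to a zero vector when nothing is in-vocabulary
        return np.zeros(wv.vector_size, dtype=np.float32)
    # wv[known] is an (n_tokens, vector_size) array; average over tokens
    return np.mean(wv[known], axis=0)
```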


elandil2 commented Nov 3, 2022

I think he used an old version of the gensim library; a lot of attributes changed from 3.8 to 4.0. I'm also facing the same issues and have tried a couple of things, but it didn't help at all. Poorly documented library, to be honest; I've been searching for hours and couldn't find anything useful.
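
For anyone stuck on the same migration: the gensim 4.0 release notes document the renames. A short sketch of the ones most relevant here (the toy corpus is purely illustrative):

```python
from gensim.models import Word2Vec

# toy corpus purely for illustration
model = Word2Vec(sentences=[["hello", "world", "hello"]],
                 vector_size=10, min_count=1)  # 3.x: size=, 4.x: vector_size=

wv = model.wv
# 3.x: wv.vocab (dict of Vocab objects) -> 4.x: wv.key_to_index (word -> int)
# 3.x: wv.index2word                    -> 4.x: wv.index_to_key (list of words)
print(wv.key_to_index)      # {'hello': 0, 'world': 1}
print(wv.index_to_key[:2])  # ['hello', 'world']
```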

shiv425 (Author) commented Nov 4, 2022

```python
import numpy as np

def preprocess_and_vectorize(text):
    # remove stop words and punctuation, and lemmatize the rest
    doc = nlp(text)
    filtered_tokens = []
    arr = []
    for token in doc:
        if token.is_stop or token.is_punct:
            continue
        filtered_tokens.append(token.lemma_)
    for token in filtered_tokens:
        try:
            arr.append(wv[token])
        except KeyError:
            # many lemmas have no vector in wv, so skip them
            continue

    return np.mean(arr, axis=0)
```

I used this code. I used try/except because many words have no vector in wv.
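
A hypothetical call for completeness, assuming `nlp` is a loaded spaCy pipeline and `wv` a gensim `KeyedVectors` (the model names below are only examples); note that `np.mean` over an empty list yields `nan`, so sentences that are entirely out-of-vocabulary still need a guard:

```python
import spacy
import gensim.downloader as api

nlp = spacy.load("en_core_web_lg")         # example spaCy pipeline
wv = api.load("word2vec-google-news-300")  # example pretrained KeyedVectors

vec = preprocess_and_vectorize("U.S.-based firms grew in 2018.")
print(vec.shape)  # (300,) for these 300-dimensional vectors
```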


meet5398 commented May 5, 2023

Solution to the problem

This is the alternative I found for this problem, and it's working:

```python
import spacy
import numpy as np

nlp = spacy.load("en_core_web_lg")

def preprocess_and_vectorize(text):
    doc = nlp(text)
    filtered_tokens = []
    for token in doc:
        if token.is_punct or token.is_stop:
            continue
        filtered_tokens.append(token.lemma_)
    # wv[filtered_tokens] is an (n_tokens, dim) array; average over axis 0
    # (without axis=0, np.mean collapses everything to a single scalar)
    return np.mean(wv[filtered_tokens], axis=0)
```
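
One caveat on this variant, sketched under the assumption that `wv` is a gensim `KeyedVectors`: indexing with a list raises `KeyError` on any out-of-vocabulary token, whereas `get_mean_vector` (where available) skips missing keys by default:

```python
import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")  # illustrative pretrained vectors
tokens = ["hello", "definitely_not_in_vocab_xyz"]

try:
    vec = wv[tokens].mean(axis=0)  # list lookup raises KeyError on OOV tokens
except KeyError as err:
    print("list lookup failed:", err)

# get_mean_vector (newer gensim) skips missing keys by default
vec = wv.get_mean_vector(tokens)
print(vec.shape)  # (50,)
```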
