Counting tokens with repeated words causes MemoryError #694

Open
saiprabhakar opened this issue Jan 14, 2025 · 1 comment

Comments

@saiprabhakar

Version: 2.0.10
Python: 3.11.11

Counting tokens with repeated words causes MemoryError.

Code:

from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(
    project=project,
    location=location,
    model_name=model_name,
)

test_m1 = """Philosophy"""
test_m1 = " ".join([test_m1] * 2000)
ntokens = llm.get_num_tokens(test_m1)

Error:

	"name": "MemoryError",
	"message": "",
	"stack": "---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
Cell In[47], line 1
----> 1 ntokens = llm.get_num_tokens(test_m1)
      2 # output = llm.invoke([HumanMessage(content=message)]).__dict__
      3 ntokens#, output

File ~/git/genai-patient-timeline/.venv/lib/python3.11/site-packages/langchain_google_vertexai/chat_models.py:1421, in ChatVertexAI.get_num_tokens(self, text)
   1418 """Get the number of tokens present in the text."""
   1419 if self._is_gemini_model:
   1420     # https://cloud.google.com/vertex-ai/docs/reference/rpc/google.cloud.aiplatform.v1beta1#counttokensrequest
-> 1421     _, contents = _parse_chat_history_gemini([HumanMessage(content=text)])
   1422     response = self.prediction_client.count_tokens(
   1423         {
   1424             "endpoint": self.full_model_name,
   (...)
   1427         }
   1428     )
   1429     return response.total_tokens

File ~/git/genai-patient-timeline/.venv/lib/python3.11/site-packages/langchain_google_vertexai/chat_models.py:307, in _parse_chat_history_gemini(history, project, convert_system_message_to_human)
    305 prev_ai_message = None
    306 role = "user"
--> 307 parts = _convert_to_parts(message)
    308 if system_parts is not None:
    309     parts = system_parts + parts

File ~/git/genai-patient-timeline/.venv/lib/python3.11/site-packages/langchain_google_vertexai/chat_models.py:261, in _parse_chat_history_gemini.<locals>._convert_to_parts(message)
    259 if isinstance(raw_content, str):
    260     try:
--> 261         raw_content = ast.literal_eval(raw_content)
    262     except SyntaxError:
    263         pass

File /opt/conda/envs/python311-env/lib/python3.11/ast.py:64, in literal_eval(node_or_string)
     55 """
     56 Evaluate an expression node or a string containing only a Python
     57 expression.  The string or node provided may only consist of the following
   (...)
     61 Caution: A complex expression can overflow the C stack and cause a crash.
     62 """
     63 if isinstance(node_or_string, str):
---> 64     node_or_string = parse(node_or_string.lstrip(" \\t"), mode='eval')
     65 if isinstance(node_or_string, Expression):
     66     node_or_string = node_or_string.body

File /opt/conda/envs/python311-env/lib/python3.11/ast.py:50, in parse(source, filename, mode, type_comments, feature_version)
     48     feature_version = -1
     49 # Else it should be an int giving the minor version for 3.x.
---> 50 return compile(source, filename, mode, flags,
     51                _feature_version=feature_version)

MemoryError: "
}

These two inputs work, though:

test_m1 = """Philosophy is"""
test_m1 = " ".join([test_m1]*2000)

test_m1 = """Philosophy."""
test_m1 = " ".join([test_m1]*2000)
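
For anyone digging in, the failing path in the traceback can be reduced to a few lines of standalone Python (the wrapper function name below is mine, not the library's; the try/except body matches `_convert_to_parts` as shown at chat_models.py:261):

```python
import ast

def parse_like_convert_to_parts(raw_content: str):
    # Mirrors the snippet shown in the traceback (function name here is
    # illustrative): the library speculatively literal_eval's every string
    # message and only catches SyntaxError.
    try:
        raw_content = ast.literal_eval(raw_content)
    except SyntaxError:
        pass
    return raw_content

# A short repeated-word string merely raises SyntaxError internally and
# passes through unchanged:
print(parse_like_convert_to_parts("Philosophy Philosophy Philosophy"))
# With thousands of repeats (as in the repro above), the parse inside
# compile() can exhaust memory before a SyntaxError is ever raised, and
# MemoryError propagates uncaught.
```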
The langcarl bot added the investigate label on Jan 14, 2025.
@lkuligin (Collaborator) commented:

It looks like it's failing because of the ast parser (and this is needed for agentic workflows):

# If a user sends a multimodal request with agents, then the full input

We could add a flag like "evaluate_expression=True", but I'm not sure it's worth it.

@baskaryan @efriis any thoughts on that? Have you run into issues like this with other integrations?
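
One possible guard, sketched here with a hypothetical helper name (not the library's API): broaden the except clause so that both ValueError (which `literal_eval` raises for non-literal nodes such as a bare name) and MemoryError fall back to treating the content as plain text.

```python
import ast

def maybe_literal_eval(raw_content: str):
    # Hypothetical defensive variant of the call at chat_models.py:261.
    # ValueError covers malformed/non-literal nodes; MemoryError covers
    # pathological inputs like the repeated-word string in this issue.
    try:
        return ast.literal_eval(raw_content)
    except (SyntaxError, ValueError, MemoryError):
        return raw_content

print(maybe_literal_eval("['a', 1]"))    # a genuine literal is still parsed
print(maybe_literal_eval("Philosophy"))  # a bare name passes through as text
```

This keeps the agentic multimodal path working (real literals still parse) without needing a new flag, at the cost of silently swallowing MemoryError for this one call.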
