Refactor Text2Speech: Keeping speech in memory #11
base: master
Conversation
I left some comments. I would have a different design in mind, with more breaking changes, but this is mostly fine: removing the temporary file is a gain, so I am okay with these changes if they need to stay this way. Please apply fixes for the other comments, though.
from pathlib import Path

def save_audio(audio: bytes) -> str:
It seems bizarre to me to have this function and use NamedTemporaryFile; why can't it take a user-provided path?
On the other hand, I generally think langchain probably does not need save/load audio functionality at all; it is not core and adds maintenance burden. Can we drop it?
Also, it is in the wrong place; it should be moved to utilities.
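For illustration, a minimal sketch of the user-provided-path variant; the path parameter and the Path-based body are hypothetical, not code from this PR:

from pathlib import Path


def save_audio(audio: bytes, path: str) -> str:
    # Write the raw bytes to a caller-chosen location instead of a NamedTemporaryFile.
    target = Path(path)
    target.write_bytes(audio)
    return str(target)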
For example, it only works with WAV files; anyone serious about playing audio should use a dedicated library that can handle many different formats.
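For illustration, a rough sketch of what a dedicated audio library could look like here, assuming the optional pydub package (plus a playback backend such as ffmpeg or simpleaudio) is available; this is not code from the PR:

from io import BytesIO

from pydub import AudioSegment
from pydub.playback import play


def play_audio(audio: bytes, audio_format: str = "wav") -> None:
    # pydub decodes many formats (wav, mp3, ogg, ...) rather than wav only.
    segment = AudioSegment.from_file(BytesIO(audio), format=audio_format)
    play(segment)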
I would leave those functions in the docs, to show how to do it, rather than necessarily making them langchain functions. But maybe they are used by some agents, so they may need to stay...
- f.write(speech)
- return f.name
+ self.play(speech)
+ return "Speech has been generated"
I think it should just return None, or perhaps the speech as bytes?
After reading the code, I see it was created as a tool, meaning it was crafted to be used directly within agents etc., so I removed my other comments.
I think the best design would be a single tool that can be given different implementations, similar to PythonREPLTool.
Then you can have utilities with a more useful API for working with Eleven Labs/Azure. Just an idea.
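A hypothetical sketch of that single-tool shape; the class name, fields, and injected synthesize callable are illustrative, not an existing langchain API:

from typing import Callable

from langchain.tools import BaseTool


class Text2SpeechTool(BaseTool):
    name: str = "text2speech"
    description: str = "Synthesize speech from text using the configured backend."
    # Injected backend, e.g. a thin Eleven Labs or Azure utility wrapper.
    synthesize: Callable[[str], bytes]

    def _run(self, query: str) -> bytes:
        # Delegate to the injected implementation, the way PythonREPLTool wraps PythonREPL.
        return self.synthesize(query)

    async def _arun(self, query: str) -> bytes:
        raise NotImplementedError("Text2SpeechTool does not support async yet")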
- else:
-     return f"Speech synthesis failed: {result.reason}"
+ raise RuntimeError(f"Speech synthesis failed: {result.reason}")
Aren't you breaking agents with these changes?
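To make the concern concrete, a self-contained sketch contrasting the two behaviours; SynthesisResult and run_tool are illustrative stand-ins, not the PR's actual classes:

class SynthesisResult:
    # Illustrative stand-in for the speech SDK's result object.
    def __init__(self, ok: bool, reason: str):
        self.ok = ok
        self.reason = reason


def run_tool(result: SynthesisResult, raise_on_error: bool = False) -> str:
    if result.ok:
        return "Speech has been generated"
    if raise_on_error:
        # The PR's behaviour: clearer for library callers, but the exception escapes
        # a plain agent loop unless the caller handles tool errors explicitly.
        raise RuntimeError(f"Speech synthesis failed: {result.reason}")
    # The old behaviour: the failure text becomes an observation the agent can react to.
    return f"Speech synthesis failed: {result.reason}"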
add random policy and notebook example
Description:
Keeps speech in memory instead of saving it to a temporary file.
Twitter handle:
@deepsense_ai, @matt_wosinski