Refactor Text2Speech: Keeping speech in memory #11

mateusz-wosinski-ds · 2023-09-07T07:16:08Z

Description:

Keeps generated speech in memory instead of saving it to a temporary file.

Otherwise the temporary files would probably stay on disk forever.
Demonstration notebooks have been updated to match the new implementation.
Speech can still be generated and saved to a temporary file using an auxiliary method, and loaded from a path and played using another.

Twitter handle:

@deepsense_ai, @matt_wosinski

libs/langchain/langchain/tools/audio_utils.py

piotr-grodek-dsai

I left some comments. I had a different design in mind, with more breaking changes, but this is mostly fine: removing the temporary file is a gain, so I am okay with these changes if they need to stay this way. Please apply fixes for the other comments, though.

piotr-grodek-dsai · 2023-09-14T12:02:14Z

libs/langchain/langchain/tools/audio_utils.py

+from pathlib import Path
+
+
+def save_audio(audio: bytes) -> str:


It seems bizarre to me to have this function use NamedTemporaryFile. Why can't it take a user-provided path?
On the other hand, I think langchain likely does not need save/load audio functionality at all: it is not core and it is a maintenance burden. Can we drop it?
Also, it is in the wrong place; it should be moved to utilities.

For example, it only works with WAV files. Anyone serious about playing sounds should use a dedicated library that can handle many different formats.

I would leave these functions in the docs to show how to do it, but would not necessarily make them langchain functions. Maybe they are used by some agents, though, in which case they need to stay...
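
A minimal sketch of the user-provided-path variant suggested in this comment, using only the standard library. The optional `path` parameter and the `load_audio` helper are hypothetical illustrations, not the code in this PR; the temporary-file fallback assumes the caller takes responsibility for deleting the file.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def save_audio(audio: bytes, path: Optional[Union[str, Path]] = None) -> str:
    """Write audio bytes to `path`, or to a new temporary .wav file if no path is given.

    Hypothetical variant: the PR's function takes only `audio`.
    """
    if path is None:
        # delete=False keeps the file after it is closed, so the caller
        # must remove it; otherwise it stays on disk.
        with tempfile.NamedTemporaryFile(mode="wb", suffix=".wav", delete=False) as f:
            f.write(audio)
            return f.name
    target = Path(path)
    target.write_bytes(audio)
    return str(target)


def load_audio(path: Union[str, Path]) -> bytes:
    """Read audio bytes back from a file (hypothetical helper name)."""
    return Path(path).read_bytes()
```

With an explicit path the caller controls where the file lives and when it is removed, which is the concern behind the question about NamedTemporaryFile.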

piotr-grodek-dsai · 2023-09-14T12:03:26Z

libs/langchain/langchain/tools/eleven_labs/text2speech.py

-                f.write(speech)
-            return f.name
+            self.play(speech)
+            return "Speech has been generated"


~~I think it should just return None or just speech as bytes?~~
After reading the code, I see it was created as a tool, meaning it was crafted to be used directly within agents etc. I removed my other comments.

I think the best approach would be a single tool that can be given different implementations, similar to PythonREPLTool.
Then you could have utilities that expose a more useful API for working with ElevenLabs/Azure. Just an idea.
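
One way to picture the "single tool, different implementations" idea is the sketch below. It is plain Python with no langchain dependency, and every name in it (`SpeechBackend`, `FakeBackend`, `TextToSpeechTool`) is hypothetical; a real backend would call ElevenLabs or Azure instead of encoding the text.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechBackend(Protocol):
    """Anything that can turn text into audio bytes (hypothetical interface)."""

    def synthesize(self, text: str) -> bytes: ...


class FakeBackend:
    """Stand-in backend for illustration; returns the UTF-8 bytes of the text."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


@dataclass
class TextToSpeechTool:
    """A single tool whose provider-specific logic lives in the injected backend."""

    backend: SpeechBackend
    name: str = "text_to_speech"

    def run(self, text: str) -> bytes:
        # The tool itself stays provider-agnostic and only delegates.
        return self.backend.synthesize(text)


tool = TextToSpeechTool(backend=FakeBackend())
speech = tool.run("hello")
```

Swapping providers then means passing a different backend object, while agents keep seeing the same tool name and interface.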

piotr-grodek-dsai · 2023-09-14T12:16:28Z

libs/langchain/langchain/tools/azure_cognitive_services/text2speech.py


        else:
-            return f"Speech synthesis failed: {result.reason}"
+            raise RuntimeError(f"Speech synthesis failed: {result.reason}")


Aren't you breaking agents with these changes?
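
The concern can be illustrated with a small plain-Python sketch: an agent that previously received an error string now has to cope with an exception. The functions `synthesize` and `run_for_agent` are hypothetical stand-ins, not the PR's code, and the wrapper is only one possible way to keep the string-based contract for agents.

```python
from typing import Callable


def synthesize(text: str) -> bytes:
    """Hypothetical synthesis call that raises on failure, like the new code path."""
    if not text:
        raise RuntimeError("Speech synthesis failed: empty input")
    return text.encode("utf-8")


def run_for_agent(func: Callable[[str], bytes], text: str) -> str:
    """Convert a raised error back into a message the agent can read."""
    try:
        func(text)
    except RuntimeError as exc:
        # The agent loop receives text instead of an unhandled exception.
        return str(exc)
    return "Speech has been generated"
```

Callers that use the function directly still get the exception, while an agent-facing wrapper of this kind preserves the earlier behavior of returning a string.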

mateusz-wosinski-ds added 11 commits September 6, 2023 09:00

Initial commit for in-memory tts

d5af48d

Update docstring

ffc6e8d

Saving and loading utils

ed8e86a

Fix linters and update notebook

dcb2a20

Fix unit test

eb24bfa

Working version for AzureCogServices

dc44100

Add docstring

2bc6e38

Modify docstring in tts class

c79e399

Merge changes

eb9ebc7

Some changes after rebasing

34e1b7b

Fix linter error

18bd6e9

mateusz-wosinski-ds marked this pull request as ready for review September 13, 2023 10:53

mateusz-wosinski-ds added 2 commits September 13, 2023 12:54

Self-review

069f4c2

Fix linter

6ec0689

eryk-dsai approved these changes Sep 14, 2023

libs/langchain/langchain/tools/audio_utils.py (outdated)

CR comment

befb02d

piotr-grodek-dsai approved these changes Sep 14, 2023

mateusz-wosinski-ds added 2 commits September 14, 2023 14:55

CR comments

8bfcb84

Fix import

b525844

eryk-dsai pushed a commit that referenced this pull request Oct 6, 2023

Merge pull request #11 from VowpalWabbit/add_notebook

3a4c895

add random policy and notebook example
