Getting Started
- Preparation
- Tutorial
  - Hello World!
  - Official Solution
  - Official Solution (GPU)
- 🚧 Other Topics
  - Side Packet
  - OutputStream API
This plugin requires native libraries (e.g. libmediapipe_c.so, mediapipe_c.dll, mediapipe_android.aar, etc.) to work, but they are not included in the repository.
If you haven't built them yet, go to https://github.com/homuler/MediaPipeUnityPlugin/wiki/Installation-Guide first.
Before using the plugin in your project, it's strongly recommended that you check that it works in this project.
First, open Assets/MediaPipeUnity/Samples/Scenes/Start Scene.unity and play the scene.
If you've built the plugin successfully, the Face Detection sample will start after a while.
Once you've built the plugin, you can import it into your project. Choose your favorite method from the following options.

Using a .unitypackage:
- Open this project.
- Click Tools > Export Unitypackage.
  - A MediaPipeUnity.[version].unitypackage file will be created at the project root.
- Open your project and import the generated .unitypackage.

Using a local tarball:
- Install the npm command.
- Build a tarball file:
    cd Packages/com.github.homuler.mediapipe
    npm pack
    # com.github.homuler.mediapipe-[version].tgz will be created
    mv com.github.homuler.mediapipe-[version].tgz your/favorite/path
- Install the package from the tarball file.

Using a Git submodule:
⚠️ Development with Git submodules tends to be a bit more complicated.
- Add a submodule:
    mkdir Submodules
    cd Submodules
    git submodule add https://github.com/homuler/MediaPipeUnityPlugin
- Build the plugin:
    cd MediaPipeUnityPlugin
    python build.py build ...
- Install the package from Submodules/MediaPipeUnityPlugin/Packages/com.github.homuler.mediapipe.
🔔 If you are not familiar with MediaPipe, you may want to read the Framework Concepts article first.
☠️ On Windows, some of the following code may crash your UnityEditor. See https://github.com/homuler/MediaPipeUnityPlugin/wiki/FAQ#unityeditor-crashes for more details.
Let's write our first program!
🔔 The following code is based on mediapipe/examples/desktop/examples/hello_world/hello_world.cc.
🔔 You can use Tutorial/Hello World as a template.
To run the Calculators provided by MediaPipe, we usually need to initialize a CalculatorGraph, so let's do that first!
🔔 Each CalculatorGraph has its own config (CalculatorGraphConfig).
var configText = @"
input_stream: ""in""
output_stream: ""out""
node {
  calculator: ""PassThroughCalculator""
  input_stream: ""in""
  output_stream: ""out1""
}
node {
  calculator: ""PassThroughCalculator""
  input_stream: ""out1""
  output_stream: ""out""
}
";
var graph = new CalculatorGraph(configText);
To run a CalculatorGraph, call the StartRun method.
graph.StartRun();
Note that the StartRun method throws an exception if the returned status is not OK.
After starting, of course we want to give inputs to the CalculatorGraph, right?
Let's say we want to give a sequence of 10 strings ("Hello World!") as input.
for (var i = 0; i < 10; i++)
{
  // Send input to running graph
}
In MediaPipe, input is passed through a class called Packet.
var input = Packet.CreateString("Hello World!");
To pass an input Packet to the CalculatorGraph, we can use CalculatorGraph#AddPacketToInputStream.
Note that this CalculatorGraph has only one input stream, named in.
🔔 This depends on the CalculatorGraphConfig. A CalculatorGraph can have multiple input streams.
for (var i = 0; i < 10; i++)
{
  var input = Packet.CreateString("Hello World!");
  graph.AddPacketToInputStream("in", input);
}
CalculatorGraph#AddPacketToInputStream may also throw an exception, for instance, when the input type is invalid.
After everything is done, we should
- close the input streams,
- wait until the graph is done (WaitUntilDone), and
- dispose of the CalculatorGraph,
so let's do that.
graph.CloseInputStream("in");
graph.WaitUntilDone();
graph.Dispose();
For now, let's just run the code we've written so far.
Save the following code as HelloWorld.cs, attach it to an empty GameObject, and play the scene.
using UnityEngine;

namespace Mediapipe.Unity.Tutorial
{
  public class HelloWorld : MonoBehaviour
  {
    private void Start()
    {
      var configText = @"
input_stream: ""in""
output_stream: ""out""
node {
  calculator: ""PassThroughCalculator""
  input_stream: ""in""
  output_stream: ""out1""
}
node {
  calculator: ""PassThroughCalculator""
  input_stream: ""out1""
  output_stream: ""out""
}
";
      var graph = new CalculatorGraph(configText);
      graph.StartRun();

      for (var i = 0; i < 10; i++)
      {
        var input = Packet.CreateString("Hello World!");
        graph.AddPacketToInputStream("in", input);
      }

      graph.CloseInputStream("in");
      graph.WaitUntilDone();
      graph.Dispose();

      Debug.Log("Done");
    }
  }
}
Oops, I see an error.
MediaPipeException: INVALID_ARGUMENT: Graph has errors:
; In stream "in", timestamp not specified or set to illegal value: Timestamp::Unset()
at Mediapipe.Status.AssertOk () [0x00014] in /home/homuler/Development/unity/MediaPipeUnityPlugin/Packages/com.github.homuler.mediapipe/Runtime/Scripts/Framework/Port/Status.cs:50
at Mediapipe.Unity.Tutorial.HelloWorld.Start () [0x00025] in /home/homuler/Development/unity/MediaPipeUnityPlugin/Assets/MediaPipeUnity/Tutorial/Hello World/HelloWorld.cs:35
Each input packet should have a timestamp, but it does not appear to be set.
Let's fix the code that initializes a Packet as follows.
// var input = Packet.CreateString("Hello World!");
var input = Packet.CreateStringAt("Hello World!", i);
This time it seems to work.
But wait, we are not receiving the CalculatorGraph output!
To get output, we need to do more work before running the CalculatorGraph.
Note that this CalculatorGraph has only one output stream, named out.
🔔 This depends on the CalculatorGraphConfig. A CalculatorGraph can have multiple output streams.
var graph = new CalculatorGraph(configText);
// Initialize an `OutputStreamPoller`. Note that the output type is string.
var poller = graph.AddOutputStreamPoller<string>("out");
graph.StartRun();
Then, we can get output using OutputStreamPoller#Next.
Like inputs, outputs must be received through packets.
graph.CloseInputStream("in");

// Initialize an empty packet
var output = new Packet<string>();
while (poller.Next(output))
{
  Debug.Log(output.Get());
}

graph.WaitUntilDone();
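The loop above ends because, once the input stream has been closed and all pending outputs have been read, Next returns false. As a rough analogy only, the sketch below reproduces the same send/close/drain control flow in plain .NET with BlockingCollection<T>. PollerAnalogy is a hypothetical name, and none of this is the plugin's OutputStreamPoller.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Plain-.NET analogy of "send inputs, close the stream, drain the outputs".
// This is NOT the plugin's OutputStreamPoller; it only mirrors the control flow.
public static class PollerAnalogy
{
  public static List<string> Run(int count)
  {
    using var stream = new BlockingCollection<string>();

    // Producer: stands in for the graph emitting packets on "out".
    var producer = Task.Run(() =>
    {
      for (var i = 0; i < count; i++)
      {
        stream.Add("Hello World!");
      }
      // Stands in for closing the stream: no more items will arrive.
      stream.CompleteAdding();
    });

    var received = new List<string>();
    // Like poller.Next(output): blocks until an item is available and returns
    // false once adding has completed and the collection is empty.
    while (stream.TryTake(out var item, Timeout.Infinite))
    {
      received.Add(item);
    }

    producer.Wait();
    return received;
  }
}
```

Without the CompleteAdding call, the consumer loop would block forever, which mirrors why the tutorial closes the input stream before polling.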
Now, our code would look like this.
Note that OutputStreamPoller and Packet also need to be disposed.
using UnityEngine;

namespace Mediapipe.Unity.Tutorial
{
  public class HelloWorld : MonoBehaviour
  {
    private void Start()
    {
      var configText = @"
input_stream: ""in""
output_stream: ""out""
node {
  calculator: ""PassThroughCalculator""
  input_stream: ""in""
  output_stream: ""out1""
}
node {
  calculator: ""PassThroughCalculator""
  input_stream: ""out1""
  output_stream: ""out""
}
";
      var graph = new CalculatorGraph(configText);
      var poller = graph.AddOutputStreamPoller<string>("out");
      graph.StartRun();

      for (var i = 0; i < 10; i++)
      {
        var input = Packet.CreateStringAt("Hello World!", i);
        graph.AddPacketToInputStream("in", input);
      }

      graph.CloseInputStream("in");

      var output = new Packet<string>();
      while (poller.Next(output))
      {
        Debug.Log(output.Get());
      }

      graph.WaitUntilDone();
      poller.Dispose();
      graph.Dispose();
      output.Dispose();

      Debug.Log("Done");
    }
  }
}
What happens if the config format is invalid?
var graph = new CalculatorGraph("invalid format");
Hmm, the constructor fails, which is the expected behavior.
Let's check Editor.log.
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/text_format.cc:335] Error parsing text-format mediapipe.CalculatorGraphConfig: 1:9: Message type "mediapipe.CalculatorGraphConfig" has no field named "invalid".
MediaPipeException: Failed to parse config text. See error logs for more details
at Mediapipe.CalculatorGraphConfigExtension.ParseFromTextFormat (Google.Protobuf.MessageParser`1[T] _, System.String configText) [0x0001e] in /home/homuler/Development/unity/MediaPipeUnityPlugin/Packages/com.github.homuler.mediapipe/Runtime/Scripts/Framework/CalculatorGraphConfigExtension.cs:21
at Mediapipe.CalculatorGraph..ctor (System.String textFormatConfig) [0x00000] in /home/homuler/Development/unity/MediaPipeUnityPlugin/Packages/com.github.homuler.mediapipe/Runtime/Scripts/Framework/CalculatorGraph.cs:33
at Mediapipe.Unity.Tutorial.HelloWorld.Start () [0x00000] in /home/homuler/Development/unity/MediaPipeUnityPlugin/Assets/MediaPipeUnity/Tutorial/Hello World/HelloWorld.cs:31
Not too bad, but it's inconvenient to check Editor.log every time.
Let's fix it so that the logs are visible in the Console Window.
Protobuf.SetLogHandler(Protobuf.DefaultLogHandler);
var graph = new CalculatorGraph("invalid format");
Great!
But there's one small yet serious pitfall that can cause a SIGSEGV: don't forget to restore the default LogHandler when the application exits.
private void OnApplicationQuit()
{
  Protobuf.ResetLogHandler();
}
In this section, let's try running the Face Mesh Solution.
🔔 You can use Tutorial/Official Solution as a template.
First, let's display the Web Camera image on the screen.
🔔 The code in this section is already saved to Tutorial/Official Solution/FaceMesh.cs.
using System.Collections;
using UnityEngine;
using UnityEngine.UI;

namespace Mediapipe.Unity.Tutorial
{
  public class FaceMesh : MonoBehaviour
  {
    [SerializeField] private TextAsset _configAsset;
    [SerializeField] private RawImage _screen;
    [SerializeField] private int _width;
    [SerializeField] private int _height;
    [SerializeField] private int _fps;

    private WebCamTexture _webCamTexture;

    private IEnumerator Start()
    {
      if (WebCamTexture.devices.Length == 0)
      {
        throw new System.Exception("Web Camera devices are not found");
      }
      var webCamDevice = WebCamTexture.devices[0];
      _webCamTexture = new WebCamTexture(webCamDevice.name, _width, _height, _fps);
      _webCamTexture.Play();

      yield return new WaitUntil(() => _webCamTexture.width > 16);

      _screen.rectTransform.sizeDelta = new Vector2(_width, _height);
      _screen.texture = _webCamTexture;
    }

    private void OnDestroy()
    {
      if (_webCamTexture != null)
      {
        _webCamTexture.Stop();
      }
    }
  }
}
If everything is fine, your screen will look like this.
Now let's try face_mesh_desktop_live.pbtxt, the official Face Mesh sample!
⚠️ To run this graph, you must build the native libraries with GPU disabled.
🔔 face_mesh_desktop_live.pbtxt is saved as Tutorial/Official Solution/face_mesh_desktop_live.txt.
First, initialize a CalculatorGraph as in the Hello World example.
var graph = new CalculatorGraph(_configAsset.text);
graph.StartRun();
In MediaPipe, image data on the CPU is stored in a class called ImageFrame.
Let's initialize an ImageFrame instance from the WebCamTexture image.
🔔 On the other hand, image data on the GPU is stored in a class called GpuBuffer.
We can initialize an ImageFrame instance using a NativeArray<byte>.
Here, although it's not the best approach in terms of performance, we will copy the WebCamTexture data to a Texture2D to obtain a NativeArray<byte>.
Texture2D inputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
Color32[] pixelData = new Color32[_width * _height];

while (true)
{
  inputTexture.SetPixels32(_webCamTexture.GetPixels32(pixelData));
  yield return new WaitForEndOfFrame();
}
Now we can initialize an ImageFrame instance using inputTexture.
⚠️ In theory, you can build ImageFrame instances using various formats, but not all Calculators necessarily support all of them. The official solutions often work only with the RGBA32 format.
var imageFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, _width, _height, _width * 4, inputTexture.GetRawTextureData<byte>());
The 4th argument, widthStep, may require some explanation.
It's the byte offset between a pixel value and the same pixel and channel in the next row, i.e. the number of bytes per row.
In most cases, this is equal to the product of the width, the number of channels, and the bytes per channel; for 8-bit SRGBA, that is _width * 4.
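To make the definition concrete, the sketch below computes widthStep for a tightly packed image (no row padding) and the byte offset of a given pixel channel. FrameLayout, WidthStep, and Offset are hypothetical helpers for illustration, not part of the plugin, and Offset assumes one byte per channel as in SRGBA.

```csharp
// Hypothetical helpers illustrating widthStep (row stride) arithmetic.
// Not part of the plugin's API.
public static class FrameLayout
{
  // Number of bytes from the start of one row to the start of the next,
  // for a tightly packed image (no row padding).
  public static int WidthStep(int width, int channels, int bytesPerChannel = 1)
  {
    return width * channels * bytesPerChannel;
  }

  // Byte offset of channel c of the pixel at (x, y), where y counts rows in
  // memory order. Assumes one byte per channel, as in SRGBA.
  public static int Offset(int x, int y, int c, int widthStep, int channels)
  {
    return y * widthStep + x * channels + c;
  }
}
```

For a 640-pixel-wide SRGBA frame, WidthStep(640, 4) gives 2560, the same value as the _width * 4 passed to ImageFrame, and moving from any pixel channel to the same channel one row down adds exactly that many bytes.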
As usual, initialize a Packet and send it to the CalculatorGraph.
Note that this time the input stream name is "input_video" and the input type is ImageFrame.
graph.AddPacketToInputStream("input_video", Packet.CreateImageFrame(imageFrame));
We should stop the CalculatorGraph in the OnDestroy event.
With a little refactoring, the code now looks like this.
using System.Collections;
using UnityEngine;
using UnityEngine.UI;

namespace Mediapipe.Unity.Tutorial
{
  public class FaceMesh : MonoBehaviour
  {
    [SerializeField] private TextAsset _configAsset;
    [SerializeField] private RawImage _screen;
    [SerializeField] private int _width;
    [SerializeField] private int _height;
    [SerializeField] private int _fps;

    private CalculatorGraph _graph;

    private WebCamTexture _webCamTexture;
    private Texture2D _inputTexture;
    private Color32[] _pixelData;

    private IEnumerator Start()
    {
      if (WebCamTexture.devices.Length == 0)
      {
        throw new System.Exception("Web Camera devices are not found");
      }
      var webCamDevice = WebCamTexture.devices[0];
      _webCamTexture = new WebCamTexture(webCamDevice.name, _width, _height, _fps);
      _webCamTexture.Play();

      yield return new WaitUntil(() => _webCamTexture.width > 16);

      _screen.rectTransform.sizeDelta = new Vector2(_width, _height);
      _screen.texture = _webCamTexture;

      _inputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
      _pixelData = new Color32[_width * _height];

      _graph = new CalculatorGraph(_configAsset.text);
      _graph.StartRun();

      while (true)
      {
        _inputTexture.SetPixels32(_webCamTexture.GetPixels32(_pixelData));
        var imageFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, _width, _height, _width * 4, _inputTexture.GetRawTextureData<byte>());
        _graph.AddPacketToInputStream("input_video", Packet.CreateImageFrame(imageFrame));

        yield return new WaitForEndOfFrame();
      }
    }

    private void OnDestroy()
    {
      if (_webCamTexture != null)
      {
        _webCamTexture.Stop();
      }

      if (_graph != null)
      {
        try
        {
          _graph.CloseInputStream("input_video");
          _graph.WaitUntilDone();
        }
        finally
        {
          _graph.Dispose();
          _graph = null;
        }
      }
    }
  }
}
Let's play the scene!
Well, it's not so easy, is it?
MediaPipeException: INVALID_ARGUMENT: Graph has errors: Calculator::Open() for node "facelandmarkfrontcpu__facelandmarkcpu__facelandmarksmodelloader__LocalFileContentsCalculator" failed: ; Can't find file: mediapipe/modules/face_landmark/face_landmark_with_attention.tflite
It looks like LocalFileContentsCalculator failed to load face_landmark_with_attention.tflite.
In the next section, we will resolve this error.
⚠️ If you get error messages like the following, go to Official Solution (GPU).
F20220418 11:58:05.626176 230087 calculator_graph.cc:126] Non-OK-status: Initialize(config) status: NOT_FOUND: ValidatedGraphConfig Initialization failed. No registered object with name: FaceLandmarkFrontCpu; Unable to find Calculator "FaceLandmarkFrontCpu" No registered object with name: FaceRendererCpu; Unable to find Calculator "FaceRendererCpu"
To load model files in Unity, we need to resolve their paths, because they are hardcoded (e.g. mediapipe/modules/face_landmark/face_landmark_with_attention.tflite).
Moreover, we need to save the files to a specific path, because some calculators read the required resources from the file system.
💡 The save path is not fixed, since each model path can be translated into an arbitrary path.
But don't worry. In most cases, all you need to do is initialize a ResourceManager class and call the PrepareAssetAsync method in advance.
💡 The PrepareAssetAsync method will save the specified file under Application.persistentDataPath.
For testing purposes, the LocalResourceManager class is sufficient.
var resourceManager = new LocalResourceManager();
yield return resourceManager.PrepareAssetAsync("dependent_asset_name");
In development / production, you can choose either StreamingAssetsResourceManager or AssetBundleResourceManager.
For example, StreamingAssetsResourceManager will load model files from Application.streamingAssetsPath.
⚠️ To use StreamingAssetsResourceManager, you need to place dependent assets under Assets/StreamingAssets. You can usually copy those assets from Packages/com.github.homuler.mediapipe/Runtime/Resources.
var resourceManager = new StreamingAssetsResourceManager();
yield return resourceManager.PrepareAssetAsync("dependent_asset_name");
⚠️ The ResourceManager class can be initialized only once. As a consequence, you cannot use both StreamingAssetsResourceManager and AssetBundleResourceManager in one application.
Now, let's get back to the code.
After some trial and error, we find that we need to prepare the files face_detection_short_range.tflite and face_landmark_with_attention.tflite.
Unity does not support the .tflite extension, so this plugin adopts the .bytes extension instead.
🔔 A .bytes file has the same contents as the corresponding .tflite file, just with a different extension.
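Conceptually, the resource manager's job is to map each hardcoded model path to a local copy of the file. Below is a minimal sketch of such a mapping, assuming the file is stored under a cache directory (for example Application.persistentDataPath) by its base name, with .tflite replaced by .bytes. ModelPathSketch and ResolveModelPath are hypothetical, and the plugin's actual resolution logic may differ.

```csharp
using System.IO;

// Hypothetical path translation; not the plugin's ResourceManager implementation.
public static class ModelPathSketch
{
  // e.g. "mediapipe/modules/face_landmark/face_landmark_with_attention.tflite"
  //   -> "<cacheDir>/face_landmark_with_attention.bytes"
  public static string ResolveModelPath(string hardcodedPath, string cacheDir)
  {
    var fileName = Path.GetFileName(hardcodedPath);
    // Unity does not support the .tflite extension, so assets use .bytes.
    if (Path.GetExtension(fileName) == ".tflite")
    {
      fileName = Path.ChangeExtension(fileName, ".bytes");
    }
    return Path.Combine(cacheDir, fileName);
  }
}
```

Because the mapping is just a function of the original path, the cache directory can be anything, which is why the save path is not fixed.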
Now the entire code will look like this.
using System.Collections;
using UnityEngine;
using UnityEngine.UI;

namespace Mediapipe.Unity.Tutorial
{
  public class FaceMesh : MonoBehaviour
  {
    [SerializeField] private TextAsset _configAsset;
    [SerializeField] private RawImage _screen;
    [SerializeField] private int _width;
    [SerializeField] private int _height;
    [SerializeField] private int _fps;

    private CalculatorGraph _graph;
    private ResourceManager _resourceManager;

    private WebCamTexture _webCamTexture;
    private Texture2D _inputTexture;
    private Color32[] _pixelData;

    private IEnumerator Start()
    {
      if (WebCamTexture.devices.Length == 0)
      {
        throw new System.Exception("Web Camera devices are not found");
      }
      var webCamDevice = WebCamTexture.devices[0];
      _webCamTexture = new WebCamTexture(webCamDevice.name, _width, _height, _fps);
      _webCamTexture.Play();

      yield return new WaitUntil(() => _webCamTexture.width > 16);

      _screen.rectTransform.sizeDelta = new Vector2(_width, _height);
      _screen.texture = _webCamTexture;

      _inputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
      _pixelData = new Color32[_width * _height];

      _resourceManager = new LocalResourceManager();
      yield return _resourceManager.PrepareAssetAsync("face_detection_short_range.bytes");
      yield return _resourceManager.PrepareAssetAsync("face_landmark_with_attention.bytes");

      _graph = new CalculatorGraph(_configAsset.text);
      _graph.StartRun();

      while (true)
      {
        _inputTexture.SetPixels32(_webCamTexture.GetPixels32(_pixelData));
        var imageFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, _width, _height, _width * 4, _inputTexture.GetRawTextureData<byte>());
        _graph.AddPacketToInputStream("input_video", Packet.CreateImageFrame(imageFrame));

        yield return new WaitForEndOfFrame();
      }
    }

    private void OnDestroy()
    {
      if (_webCamTexture != null)
      {
        _webCamTexture.Stop();
      }

      if (_graph != null)
      {
        try
        {
          _graph.CloseInputStream("input_video");
          _graph.WaitUntilDone();
        }
        finally
        {
          _graph.Dispose();
          _graph = null;
        }
      }
    }
  }
}
What will be the result this time...?
Oops, once again I forgot to set the timestamp.
But what value should I set for the timestamp this time?
In the Hello World example, the loop variable i was used as the timestamp value.
In practice, however, MediaPipe assumes that timestamp values are in microseconds (cf. mediapipe/framework/timestamp.h).
🔔 Some calculators care about the absolute value of the timestamp. Such calculators may behave unexpectedly if the timestamp value is not in microseconds.
Let's use the elapsed time in microseconds since startup as a timestamp.
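using Stopwatch = System.Diagnostics.Stopwatch; // at the top of the file

var stopwatch = new Stopwatch();
stopwatch.Start();
// Elapsed.Ticks is always in 100 ns units, so dividing by 10 gives microseconds.
var currentTimestamp = stopwatch.Elapsed.Ticks / (System.TimeSpan.TicksPerMillisecond / 1000);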
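A note on units: Stopwatch.ElapsedTicks is measured in units of 1 / Stopwatch.Frequency seconds, which is not guaranteed to equal a TimeSpan tick (100 ns), whereas Stopwatch.Elapsed.Ticks always is. The sketch below converts raw Stopwatch ticks to microseconds through Stopwatch.Frequency, and also keeps timestamps strictly increasing, since MediaPipe expects the timestamps on each input stream to increase. TimestampSketch and its methods are hypothetical helpers, not plugin APIs.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helpers for MediaPipe-style microsecond timestamps.
public static class TimestampSketch
{
  // Converts raw Stopwatch ticks (units of 1 / frequency seconds) to microseconds.
  public static long TicksToMicroseconds(long stopwatchTicks, long frequency)
  {
    // Split into whole seconds and a remainder to avoid overflow on long runs.
    var seconds = stopwatchTicks / frequency;
    var remainder = stopwatchTicks % frequency;
    return seconds * 1_000_000 + remainder * 1_000_000 / frequency;
  }

  public static long ElapsedMicroseconds(Stopwatch stopwatch)
  {
    return TicksToMicroseconds(stopwatch.ElapsedTicks, Stopwatch.Frequency);
  }

  // Returns a timestamp strictly greater than `previous`, so two frames captured
  // within the same microsecond do not produce duplicate timestamps.
  public static long NextTimestamp(long candidate, long previous)
  {
    return Math.Max(candidate, previous + 1);
  }
}
```

In a capture loop, one could keep the last timestamp in a field and pass NextTimestamp(ElapsedMicroseconds(stopwatch), last) to Packet.CreateImageFrameAt.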
And the entire code:
using System.Collections;
using UnityEngine;
using UnityEngine.UI;

using Stopwatch = System.Diagnostics.Stopwatch;

namespace Mediapipe.Unity.Tutorial
{
  public class FaceMesh : MonoBehaviour
  {
    [SerializeField] private TextAsset _configAsset;
    [SerializeField] private RawImage _screen;
    [SerializeField] private int _width;
    [SerializeField] private int _height;
    [SerializeField] private int _fps;

    private CalculatorGraph _graph;
    private ResourceManager _resourceManager;

    private WebCamTexture _webCamTexture;
    private Texture2D _inputTexture;
    private Color32[] _pixelData;

    private IEnumerator Start()
    {
      if (WebCamTexture.devices.Length == 0)
      {
        throw new System.Exception("Web Camera devices are not found");
      }
      var webCamDevice = WebCamTexture.devices[0];
      _webCamTexture = new WebCamTexture(webCamDevice.name, _width, _height, _fps);
      _webCamTexture.Play();

      yield return new WaitUntil(() => _webCamTexture.width > 16);

      _screen.rectTransform.sizeDelta = new Vector2(_width, _height);
      _screen.texture = _webCamTexture;

      _inputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
      _pixelData = new Color32[_width * _height];

      _resourceManager = new LocalResourceManager();
      yield return _resourceManager.PrepareAssetAsync("face_detection_short_range.bytes");
      yield return _resourceManager.PrepareAssetAsync("face_landmark_with_attention.bytes");

      var stopwatch = new Stopwatch();

      _graph = new CalculatorGraph(_configAsset.text);
      _graph.StartRun();
      stopwatch.Start();

      while (true)
      {
        _inputTexture.SetPixels32(_webCamTexture.GetPixels32(_pixelData));
        var imageFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, _width, _height, _width * 4, _inputTexture.GetRawTextureData<byte>());
        // Elapsed.Ticks is in 100 ns units, so dividing by 10 gives microseconds.
        var currentTimestamp = stopwatch.Elapsed.Ticks / (System.TimeSpan.TicksPerMillisecond / 1000);
        _graph.AddPacketToInputStream("input_video", Packet.CreateImageFrameAt(imageFrame, currentTimestamp));

        yield return new WaitForEndOfFrame();
      }
    }

    private void OnDestroy()
    {
      if (_webCamTexture != null)
      {
        _webCamTexture.Stop();
      }

      if (_graph != null)
      {
        try
        {
          _graph.CloseInputStream("input_video");
          _graph.WaitUntilDone();
        }
        finally
        {
          _graph.Dispose();
          _graph = null;
        }
      }
    }
  }
}
Now, it seems to be working.
But of course, we want to receive output next.
In the Hello World example, we initialized an OutputStreamPoller using CalculatorGraph#AddOutputStreamPoller.
This time, to handle output more easily, let's use the OutputStream API provided by the plugin instead!
_graph = new CalculatorGraph(_configAsset.text);
var outputVideoStream = new OutputStream<ImageFrame>(_graph, "output_video");
Before running the CalculatorGraph, call StartPolling.
outputVideoStream.StartPolling();
_graph.StartRun();
To get the next output, call WaitNextAsync.
The returned task completes when the next output is retrieved or an error occurs.
var task = outputVideoStream.WaitNextAsync();
yield return new WaitUntil(() => task.IsCompleted);

if (task.Result.ok)
{
  // ...
}
This time, let's display the output image directly on the screen.
We can read the pixel data using ImageFrame#TryReadPixelData.
// NOTE: TryReadPixelData is implemented in `Mediapipe.Unity.ImageFrameExtension`.
// using Mediapipe.Unity;
var outputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
var outputPixelData = new Color32[_width * _height];
_screen.texture = outputTexture;

var task = outputVideoStream.WaitNextAsync();
yield return new WaitUntil(() => task.IsCompleted);

if (!task.Result.ok)
{
  throw new System.Exception("Something went wrong");
}

var outputPacket = task.Result.packet;
if (outputPacket != null)
{
  var outputVideo = outputPacket.Get();
  if (outputVideo.TryReadPixelData(outputPixelData))
  {
    outputTexture.SetPixels32(outputPixelData);
    outputTexture.Apply();
  }
}
Now our code should look something like this.
using System.Collections;
using UnityEngine;
using UnityEngine.UI;

using Stopwatch = System.Diagnostics.Stopwatch;

namespace Mediapipe.Unity.Tutorial
{
  public class FaceMesh : MonoBehaviour
  {
    [SerializeField] private TextAsset _configAsset;
    [SerializeField] private RawImage _screen;
    [SerializeField] private int _width;
    [SerializeField] private int _height;
    [SerializeField] private int _fps;

    private CalculatorGraph _graph;
    private OutputStream<ImageFrame> _outputVideoStream;
    private ResourceManager _resourceManager;

    private WebCamTexture _webCamTexture;
    private Texture2D _inputTexture;
    private Color32[] _inputPixelData;
    private Texture2D _outputTexture;
    private Color32[] _outputPixelData;

    private IEnumerator Start()
    {
      if (WebCamTexture.devices.Length == 0)
      {
        throw new System.Exception("Web Camera devices are not found");
      }
      var webCamDevice = WebCamTexture.devices[0];
      _webCamTexture = new WebCamTexture(webCamDevice.name, _width, _height, _fps);
      _webCamTexture.Play();

      yield return new WaitUntil(() => _webCamTexture.width > 16);

      _screen.rectTransform.sizeDelta = new Vector2(_width, _height);

      _inputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
      _inputPixelData = new Color32[_width * _height];
      _outputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
      _outputPixelData = new Color32[_width * _height];

      _screen.texture = _outputTexture;

      _resourceManager = new LocalResourceManager();
      yield return _resourceManager.PrepareAssetAsync("face_detection_short_range.bytes");
      yield return _resourceManager.PrepareAssetAsync("face_landmark_with_attention.bytes");

      var stopwatch = new Stopwatch();

      _graph = new CalculatorGraph(_configAsset.text);
      _outputVideoStream = new OutputStream<ImageFrame>(_graph, "output_video");
      _outputVideoStream.StartPolling();
      _graph.StartRun();
      stopwatch.Start();

      while (true)
      {
        _inputTexture.SetPixels32(_webCamTexture.GetPixels32(_inputPixelData));
        var imageFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, _width, _height, _width * 4, _inputTexture.GetRawTextureData<byte>());
        // Elapsed.Ticks is in 100 ns units, so dividing by 10 gives microseconds.
        var currentTimestamp = stopwatch.Elapsed.Ticks / (System.TimeSpan.TicksPerMillisecond / 1000);
        _graph.AddPacketToInputStream("input_video", Packet.CreateImageFrameAt(imageFrame, currentTimestamp));

        var task = _outputVideoStream.WaitNextAsync();
        yield return new WaitUntil(() => task.IsCompleted);

        if (!task.Result.ok)
        {
          throw new System.Exception("Something went wrong");
        }

        var outputPacket = task.Result.packet;
        if (outputPacket != null)
        {
          var outputVideo = outputPacket.Get();
          if (outputVideo.TryReadPixelData(_outputPixelData))
          {
            _outputTexture.SetPixels32(_outputPixelData);
            _outputTexture.Apply();
          }
        }
      }
    }

    private void OnDestroy()
    {
      if (_webCamTexture != null)
      {
        _webCamTexture.Stop();
      }

      _outputVideoStream?.Dispose();
      _outputVideoStream = null;

      if (_graph != null)
      {
        try
        {
          _graph.CloseInputStream("input_video");
          _graph.WaitUntilDone();
        }
        finally
        {
          _graph.Dispose();
          _graph = null;
        }
      }
    }
  }
}
Let's try running!
Hmm, it seems to be working, but the top and bottom appear to be reversed.
In Unity, the pixel data is stored from bottom-left to top-right, whereas MediaPipe assumes the pixel data is stored from top-left to bottom-right.
Therefore, if you send the pixel data to MediaPipe as is, MediaPipe will receive an upside-down image.
🔔 ImageFrame#TryReadPixelData automatically reads pixels upside down, so the output image is received correctly.
You can flip the input image vertically yourself, but here we will use ImageTransformationCalculator.
node: {
  calculator: "ImageTransformationCalculator"
  input_stream: "IMAGE:throttled_input_video"
  output_stream: "IMAGE:transformed_input_video"
  node_options: {
    [type.googleapis.com/mediapipe.ImageTransformationCalculatorOptions] {
      flip_vertically: true
    }
  }
}
Don't forget to replace throttled_input_video with transformed_input_video in the downstream nodes.
 # Subgraph that detects faces and corresponding landmarks.
 node {
   calculator: "FaceLandmarkFrontCpu"
-  input_stream: "IMAGE:throttled_input_video"
+  input_stream: "IMAGE:transformed_input_video"
   input_side_packet: "NUM_FACES:num_faces"
   input_side_packet: "WITH_ATTENTION:with_attention"
   output_stream: "LANDMARKS:multi_face_landmarks"

 # Subgraph that renders face-landmark annotation onto the input image.
 node {
   calculator: "FaceRendererCpu"
-  input_stream: "IMAGE:throttled_input_video"
+  input_stream: "IMAGE:transformed_input_video"
   input_stream: "LANDMARKS:multi_face_landmarks"
   input_stream: "NORM_RECTS:face_rects_from_landmarks"
   input_stream: "DETECTIONS:face_detections"
This time it should work correctly.
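If you would rather flip the input image yourself, as mentioned earlier, the core operation is a row swap. Below is a minimal sketch over a raw byte buffer (such as pixel data copied from a texture), assuming rows are widthStep bytes apart; FlipSketch and FlipVertically are hypothetical names, not plugin APIs. Doing this in C# adds a per-frame copy on the CPU, which the graph-side ImageTransformationCalculator approach keeps out of your script.

```csharp
using System;

// Hypothetical helper; not part of the plugin.
public static class FlipSketch
{
  // Flips an image buffer vertically in place by swapping whole rows.
  // Rows are assumed to be `widthStep` bytes apart.
  public static void FlipVertically(byte[] pixels, int height, int widthStep)
  {
    if (pixels.Length < height * widthStep)
    {
      throw new ArgumentException("buffer is smaller than height * widthStep");
    }

    var row = new byte[widthStep];
    for (int top = 0, bottom = height - 1; top < bottom; top++, bottom--)
    {
      var topOffset = top * widthStep;
      var bottomOffset = bottom * widthStep;
      Buffer.BlockCopy(pixels, topOffset, row, 0, widthStep);               // save the top row
      Buffer.BlockCopy(pixels, bottomOffset, pixels, topOffset, widthStep); // bottom row -> top
      Buffer.BlockCopy(row, 0, pixels, bottomOffset, widthStep);            // saved top row -> bottom
    }
  }
}
```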
Next, let's try to get landmark positions from the "multi_face_landmarks" stream.
The output type is List<NormalizedLandmarkList> (std::vector<NormalizedLandmarkList> in C++), so we can initialize the OutputStream like this.
var multiFaceLandmarksStream = new OutputStream<List<NormalizedLandmarkList>>(_graph, "multi_face_landmarks");
multiFaceLandmarksStream.StartPolling();
As with the "output_video" stream, we can receive the result using OutputStream#WaitNextAsync.
var task = multiFaceLandmarksStream.WaitNextAsync();
yield return new WaitUntil(() => task.IsCompleted);
Note that the output values are in the Image Coordinate System.
To convert them to Unity local coordinates, you can use the methods defined in ImageCoordinateSystem.
🔔 Which coordinate system the output is based on depends on the solution. For example, the Objectron graph uses the Camera Coordinate System instead of the Image Coordinate System.
// using Mediapipe.Unity.CoordinateSystem;
var screenRect = _screen.GetComponent<RectTransform>().rect;
var position = screenRect.GetPoint(normalizedLandmark);
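For intuition, here is the arithmetic behind such a conversion in the simplest case. In the Image Coordinate System, normalized x grows to the right and y grows downward from the top-left corner, while a RectTransform's local space has y pointing up. Ignoring rotation and mirroring, a normalized point maps into a rect as below. CoordinateSketch, LocalRect, and ToLocalPoint are hypothetical, and this is not the plugin's ImageCoordinateSystem implementation.

```csharp
// Hypothetical types; not the plugin's ImageCoordinateSystem.
public readonly struct LocalRect
{
  public readonly float XMin;
  public readonly float YMin;
  public readonly float Width;
  public readonly float Height;

  public LocalRect(float xMin, float yMin, float width, float height)
  {
    XMin = xMin;
    YMin = yMin;
    Width = width;
    Height = height;
  }
}

public static class CoordinateSketch
{
  // Maps normalized image coordinates (origin at top-left, y pointing down)
  // to a local point in a rect whose y axis points up. Ignores rotation and mirroring.
  public static (float X, float Y) ToLocalPoint(LocalRect rect, float normalizedX, float normalizedY)
  {
    var x = rect.XMin + normalizedX * rect.Width;
    var y = rect.YMin + (1f - normalizedY) * rect.Height;
    return (x, y);
  }
}
```

With a 640×480 rect centered on its pivot (xMin = -320, yMin = -240), (0.5, 0.5) maps to (0, 0) and the image's top-left corner (0, 0) maps to (-320, 240).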
Here is the sample code and the result.
using System.Collections;
using System.Collections.Generic;
using System.Threading.Tasks;
using UnityEngine;
using UnityEngine.UI;
using Mediapipe.Unity.CoordinateSystem;
using Stopwatch = System.Diagnostics.Stopwatch;
namespace Mediapipe.Unity.Tutorial
{
public class FaceMesh : MonoBehaviour
{
[SerializeField] private TextAsset _configAsset;
[SerializeField] private RawImage _screen;
[SerializeField] private int _width;
[SerializeField] private int _height;
[SerializeField] private int _fps;
private CalculatorGraph _graph;
private OutputStream<ImageFrame> _outputVideoStream;
private OutputStream<List<NormalizedLandmarkList>> _multiFaceLandmarksStream;
private ResourceManager _resourceManager;
private WebCamTexture _webCamTexture;
private Texture2D _inputTexture;
private Color32[] _inputPixelData;
private Texture2D _outputTexture;
private Color32[] _outputPixelData;
private IEnumerator Start()
{
if (WebCamTexture.devices.Length == 0)
{
throw new System.Exception("Web Camera devices are not found");
}
var webCamDevice = WebCamTexture.devices[0];
_webCamTexture = new WebCamTexture(webCamDevice.name, _width, _height, _fps);
_webCamTexture.Play();
yield return new WaitUntil(() => _webCamTexture.width > 16);
_screen.rectTransform.sizeDelta = new Vector2(_width, _height);
_inputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
_inputPixelData = new Color32[_width * _height];
_outputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
_outputPixelData = new Color32[_width * _height];
_screen.texture = _outputTexture;
_resourceManager = new LocalResourceManager();
yield return _resourceManager.PrepareAssetAsync("face_detection_short_range.bytes");
yield return _resourceManager.PrepareAssetAsync("face_landmark_with_attention.bytes");
var stopwatch = new Stopwatch();
_graph = new CalculatorGraph(_configAsset.text);
_outputVideoStream = new OutputStream<ImageFrame>(_graph, "output_video");
_multiFaceLandmarksStream = new OutputStream<List<NormalizedLandmarkList>>(_graph, "multi_face_landmarks");
_outputVideoStream.StartPolling();
_multiFaceLandmarksStream.StartPolling();
_graph.StartRun();
stopwatch.Start();
var screenRect = _screen.GetComponent<RectTransform>().rect;
while (true)
{
_inputTexture.SetPixels32(_webCamTexture.GetPixels32(_inputPixelData));
var imageFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, _width, _height, _width * 4, _inputTexture.GetRawTextureData<byte>());
var currentTimestamp = stopwatch.ElapsedTicks / (System.TimeSpan.TicksPerMillisecond / 1000);
_graph.AddPacketToInputStream("input_video", Packet.CreateImageFrameAt(imageFrame, currentTimestamp));
var task1 = _outputVideoStream.WaitNextAsync();
var task2 = _multiFaceLandmarksStream.WaitNextAsync();
var task = Task.WhenAll(task1, task2);
yield return new WaitUntil(() => task.IsCompleted);
if (!task1.Result.ok || !task2.Result.ok)
{
throw new System.Exception("Something went wrong");
}
var outputVideoPacket = task1.Result.packet;
if (outputVideoPacket != null)
{
var outputVideo = outputVideoPacket.Get();
if (outputVideo.TryReadPixelData(_outputPixelData))
{
_outputTexture.SetPixels32(_outputPixelData);
_outputTexture.Apply();
}
}
var multiFaceLandmarksPacket = task2.Result.packet;
if (multiFaceLandmarksPacket != null)
{
var multiFaceLandmarks = multiFaceLandmarksPacket.Get(NormalizedLandmarkList.Parser);
if (multiFaceLandmarks != null && multiFaceLandmarks.Count > 0)
{
foreach (var landmarks in multiFaceLandmarks)
{
// top of the head
var topOfHead = landmarks.Landmark[10];
Debug.Log($"Unity Local Coordinates: {screenRect.GetPoint(topOfHead)}, Image Coordinates: {topOfHead}");
}
}
}
}
}
private void OnDestroy()
{
if (_webCamTexture != null)
{
_webCamTexture.Stop();
}
_outputVideoStream?.Dispose();
_outputVideoStream = null;
_multiFaceLandmarksStream?.Dispose();
_multiFaceLandmarksStream = null;
if (_graph != null)
{
try
{
_graph.CloseInputStream("input_video");
_graph.WaitUntilDone();
}
finally
{
_graph.Dispose();
_graph = null;
}
}
}
}
}
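The timestamp passed to Packet.CreateImageFrameAt is in microseconds, and MediaPipe expects the timestamps within an input stream to be strictly increasing. The expression stopwatch.ElapsedTicks / (System.TimeSpan.TicksPerMillisecond / 1000) assumes that one stopwatch tick is 100 ns. Strictly speaking, Stopwatch.ElapsedTicks is measured in units of 1 / Stopwatch.Frequency seconds, so the expression is only exact on runtimes where Stopwatch.Frequency is 10,000,000. Below is a minimal sketch of a more defensive clock. It is not part of the plugin, and MicrosecondClock is a hypothetical name. It reads Stopwatch.Elapsed instead and bumps repeated values forward so that every timestamp is greater than the previous one.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of MediaPipeUnityPlugin.
// Produces timestamps in microseconds that are strictly increasing.
public sealed class MicrosecondClock
{
  private readonly Stopwatch _stopwatch = Stopwatch.StartNew();
  private long _last = long.MinValue;

  // A TimeSpan tick is always 100 ns, so 10 ticks make one microsecond.
  public static long ToMicroseconds(TimeSpan elapsed)
  {
    return elapsed.Ticks / (TimeSpan.TicksPerMillisecond / 1000);
  }

  // Returns `candidate`, or the previous value + 1 if `candidate` would not
  // be strictly greater than the previously returned timestamp.
  public long Next(long candidate)
  {
    _last = candidate > _last ? candidate : _last + 1;
    return _last;
  }

  // Uses Stopwatch.Elapsed (a TimeSpan) rather than Stopwatch.ElapsedTicks,
  // whose unit is 1 / Stopwatch.Frequency seconds and varies by runtime.
  public long Now()
  {
    return Next(ToMicroseconds(_stopwatch.Elapsed));
  }
}
```

With such a helper, you would create one clock before the loop and compute var currentTimestamp = clock.Now(); instead of reading the stopwatch directly.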
You may want to render annotations on the screen to verify that the graph is working correctly.
This section describes how to do that using the plugin.
First, create an empty object (let's call it Annotation Layer) below the Screen object.
🔔 If you want to render multiple annotations, this Annotation Layer object will be the parent of all of them.
Remove the RectTransform component from Annotation Layer, since it will be configured automatically later.
Second, drag and drop the Multi FaceLandmarkList Annotation object below Annotation Layer.
🔔 Various annotation objects are placed under Packages/com.github.homuler.mediapipe/Runtime/Objects. Choose the appropriate one depending on the output type.
Finally, attach MultiFaceLandmarkListAnnotationController to Annotation Layer and set the child Multi FaceLandmarkList Annotation object to its Annotation attribute.
Now you can render annotations using MultiFaceLandmarkListAnnotationController.
// [SerializeField] private MultiFaceLandmarkListAnnotationController _multiFaceLandmarksAnnotationController;
_multiFaceLandmarksAnnotationController.DrawNow(multiFaceLandmarks);
Since the annotations are drawn over the screen, we no longer need to display the MediaPipe output image; the screen can show the WebCamTexture directly.
After refactoring, the code should look like this.
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.UI;
using Stopwatch = System.Diagnostics.Stopwatch;
namespace Mediapipe.Unity.Tutorial
{
public class FaceMesh : MonoBehaviour
{
[SerializeField] private TextAsset _configAsset;
[SerializeField] private RawImage _screen;
[SerializeField] private int _width;
[SerializeField] private int _height;
[SerializeField] private int _fps;
[SerializeField] private MultiFaceLandmarkListAnnotationController _multiFaceLandmarksAnnotationController;
private CalculatorGraph _graph;
private OutputStream<List<NormalizedLandmarkList>> _multiFaceLandmarksStream;
private ResourceManager _resourceManager;
private WebCamTexture _webCamTexture;
private Texture2D _inputTexture;
private Color32[] _inputPixelData;
private IEnumerator Start()
{
if (WebCamTexture.devices.Length == 0)
{
throw new System.Exception("Web Camera devices are not found");
}
var webCamDevice = WebCamTexture.devices[0];
_webCamTexture = new WebCamTexture(webCamDevice.name, _width, _height, _fps);
_webCamTexture.Play();
yield return new WaitUntil(() => _webCamTexture.width > 16);
_screen.rectTransform.sizeDelta = new Vector2(_width, _height);
_inputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
_inputPixelData = new Color32[_width * _height];
_screen.texture = _webCamTexture;
_resourceManager = new LocalResourceManager();
yield return _resourceManager.PrepareAssetAsync("face_detection_short_range.bytes");
yield return _resourceManager.PrepareAssetAsync("face_landmark_with_attention.bytes");
var stopwatch = new Stopwatch();
_graph = new CalculatorGraph(_configAsset.text);
_multiFaceLandmarksStream = new OutputStream<List<NormalizedLandmarkList>>(_graph, "multi_face_landmarks");
_multiFaceLandmarksStream.StartPolling();
_graph.StartRun();
stopwatch.Start();
var screenRect = _screen.GetComponent<RectTransform>().rect;
while (true)
{
_inputTexture.SetPixels32(_webCamTexture.GetPixels32(_inputPixelData));
var imageFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, _width, _height, _width * 4, _inputTexture.GetRawTextureData<byte>());
var currentTimestamp = stopwatch.ElapsedTicks / (System.TimeSpan.TicksPerMillisecond / 1000);
_graph.AddPacketToInputStream("input_video", Packet.CreateImageFrameAt(imageFrame, currentTimestamp));
var task = _multiFaceLandmarksStream.WaitNextAsync();
yield return new WaitUntil(() => task.IsCompleted);
var result = task.Result;
if (!result.ok)
{
throw new System.Exception("Something went wrong");
}
var multiFaceLandmarksPacket = result.packet;
if (multiFaceLandmarksPacket != null)
{
var multiFaceLandmarks = multiFaceLandmarksPacket.Get(NormalizedLandmarkList.Parser);
_multiFaceLandmarksAnnotationController.DrawNow(multiFaceLandmarks);
}
else
{
_multiFaceLandmarksAnnotationController.DrawNow(null);
}
}
}
private void OnDestroy()
{
if (_webCamTexture != null)
{
_webCamTexture.Stop();
}
_multiFaceLandmarksStream?.Dispose();
_multiFaceLandmarksStream = null;
if (_graph != null)
{
try
{
_graph.CloseInputStream("input_video");
_graph.WaitUntilDone();
}
finally
{
_graph.Dispose();
_graph = null;
}
}
}
}
}
By default, the plugin outputs INFO level logs.
You can change the log level at runtime.
Mediapipe.Unity.Logger.MinLogLevel = Mediapipe.Unity.Logger.LogLevel.Debug;
MediaPipe uses the Google Logging Library (glog) internally.
You can configure it by setting flags.
Glog.Logtostderr = true; // when true, logs will be output to `Editor.log` / `Player.log`
Glog.Minloglevel = 0; // output logs of INFO level (0) and above
Glog.V = 3; // also output verbose logs (VLOG(n) with n <= 3)
To enable those flags, call Glog.Initialize.
Glog.Initialize("MediaPipeUnityPlugin");
☠️ Glog.Initialize can be called only once. A second call will crash your application or the UnityEditor.
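Because a second call is fatal, you may want to route the call through a single guard. The sketch below is a hypothetical helper, not part of the plugin, that uses Interlocked.CompareExchange so that only the first caller runs the initializer. Keep in mind that a managed static flag is reset whenever Unity reloads the scripting domain (for example, when entering Play Mode with domain reload enabled), while the native library usually stays loaded in the Editor process, so this only guards against duplicate calls within one domain.

```csharp
using System;
using System.Threading;

// Hypothetical helper, not part of MediaPipeUnityPlugin.
// Runs an initializer at most once per scripting domain.
// The flag is shared by all callers, so use it for a single initializer
// (e.g. Glog.Initialize) only.
public static class RunOnce
{
  // 0 = not run yet, 1 = already run (or currently running).
  private static int _state;

  // Returns true if this call ran `init`, false if it had already been run.
  // If `init` throws, it is not retried.
  public static bool TryRun(Action init)
  {
    if (init == null)
    {
      throw new ArgumentNullException(nameof(init));
    }
    // Atomically switch _state from 0 to 1; only the first caller sees 0.
    if (Interlocked.CompareExchange(ref _state, 1, 0) != 0)
    {
      return false;
    }
    init();
    return true;
  }
}
```

For example: RunOnce.TryRun(() => Glog.Initialize("MediaPipeUnityPlugin"));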
💡 If you look closely at Editor.log, you will notice the following warning:
WARNING: Logging before InitGoogleLogging() is written to STDERR
To suppress it, you need to call Glog.Initialize.
However, unless you set Glog.Logtostderr to true, glog won't output to Editor.log / Player.log.
In this section, we used ImageFrame#TryReadPixelData and Texture2D#SetPixels32 to render the output image.
Using Texture2D#LoadRawTextureData, we can do it a little faster.
_outputTexture.LoadRawTextureData(outputVideo.MutablePixelData(), outputVideo.PixelDataSize());
_outputTexture.Apply();
In this case, the orientation of the image must be adjusted as well.
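If the texture loaded this way appears upside down, one CPU-side option is to reverse the row order before uploading. The following is a minimal sketch, assuming a tightly packed RGBA32 buffer (4 bytes per pixel, no padding at the end of each row); PixelBuffer.FlipVertically is a hypothetical helper, not part of the plugin. To use it, the pixel data would first have to be copied into a managed byte[] (for example with System.Runtime.InteropServices.Marshal.Copy from outputVideo.MutablePixelData()). That extra copy partly offsets the speedup, so correcting the orientation on the display side may be preferable.

```csharp
using System;

// Hypothetical helper, not part of MediaPipeUnityPlugin.
// Flips a tightly packed RGBA32 buffer (4 bytes per pixel, no row padding)
// upside down, in place.
public static class PixelBuffer
{
  public static void FlipVertically(byte[] pixels, int width, int height)
  {
    if (pixels == null)
    {
      throw new ArgumentNullException(nameof(pixels));
    }
    int rowSize = width * 4;
    if (pixels.Length < rowSize * height)
    {
      throw new ArgumentException("Buffer is smaller than width * height * 4 bytes");
    }
    var tmp = new byte[rowSize];
    for (int top = 0, bottom = height - 1; top < bottom; top++, bottom--)
    {
      // Swap row `top` with row `bottom` through a temporary row buffer.
      Buffer.BlockCopy(pixels, top * rowSize, tmp, 0, rowSize);
      Buffer.BlockCopy(pixels, bottom * rowSize, pixels, top * rowSize, rowSize);
      Buffer.BlockCopy(tmp, 0, pixels, bottom * rowSize, rowSize);
    }
  }
}
```

The flipped array could then be uploaded with Texture2D#LoadRawTextureData(byte[]) followed by Apply().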
⚠️ To test the code in this section, you need to build native libraries with GPU enabled.
🔔 The code in this section is based on Official Solution (CPU) - Get ImageFrame.
💡 See GPU Compute for more details.
If you built the native libraries with GPU enabled, you need to initialize GPU resources before running the CalculatorGraph.
yield return GpuManager.Initialize();
if (!GpuManager.IsInitialized)
{
throw new System.Exception("Failed to initialize GPU resources");
}
_graph = new CalculatorGraph(_configAsset.text);
_graph.SetGpuResources(GpuManager.GpuResources);
Note that you need to dispose of GPU resources when the program exits.
private void OnDestroy()
{
GpuManager.Shutdown();
}
The rest is the same as for the CPU.
Here is the sample code.
Before running it, don't forget to set face_mesh_desktop_live_gpu.txt to _configAsset.
🔔 face_mesh_desktop_live_gpu.pbtxt is saved as Tutorial/Official Solution/face_mesh_desktop_live_gpu.txt.
using System.Collections;
using UnityEngine;
using UnityEngine.UI;
using Stopwatch = System.Diagnostics.Stopwatch;
namespace Mediapipe.Unity.Tutorial
{
public class FaceMesh : MonoBehaviour
{
[SerializeField] private TextAsset _configAsset;
[SerializeField] private RawImage _screen;
[SerializeField] private int _width;
[SerializeField] private int _height;
[SerializeField] private int _fps;
private CalculatorGraph _graph;
private OutputStream<ImageFrame> _outputVideoStream;
private ResourceManager _resourceManager;
private WebCamTexture _webCamTexture;
private Texture2D _inputTexture;
private Color32[] _inputPixelData;
private Texture2D _outputTexture;
private Color32[] _outputPixelData;
private IEnumerator Start()
{
if (WebCamTexture.devices.Length == 0)
{
throw new System.Exception("Web Camera devices are not found");
}
var webCamDevice = WebCamTexture.devices[0];
_webCamTexture = new WebCamTexture(webCamDevice.name, _width, _height, _fps);
_webCamTexture.Play();
yield return new WaitUntil(() => _webCamTexture.width > 16);
yield return GpuManager.Initialize();
if (!GpuManager.IsInitialized)
{
throw new System.Exception("Failed to initialize GPU resources");
}
_screen.rectTransform.sizeDelta = new Vector2(_width, _height);
_inputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
_inputPixelData = new Color32[_width * _height];
_outputTexture = new Texture2D(_width, _height, TextureFormat.RGBA32, false);
_outputPixelData = new Color32[_width * _height];
_screen.texture = _outputTexture;
_resourceManager = new LocalResourceManager();
yield return _resourceManager.PrepareAssetAsync("face_detection_short_range.bytes");
yield return _resourceManager.PrepareAssetAsync("face_landmark_with_attention.bytes");
var stopwatch = new Stopwatch();
_graph = new CalculatorGraph(_configAsset.text);
_graph.SetGpuResources(GpuManager.GpuResources);
_outputVideoStream = new OutputStream<ImageFrame>(_graph, "output_video");
_outputVideoStream.StartPolling();
_graph.StartRun();
stopwatch.Start();
while (true)
{
_inputTexture.SetPixels32(_webCamTexture.GetPixels32(_inputPixelData));
var imageFrame = new ImageFrame(ImageFormat.Types.Format.Srgba, _width, _height, _width * 4, _inputTexture.GetRawTextureData<byte>());
var currentTimestamp = stopwatch.ElapsedTicks / (System.TimeSpan.TicksPerMillisecond / 1000);
_graph.AddPacketToInputStream("input_video", Packet.CreateImageFrameAt(imageFrame, currentTimestamp));
var task = _outputVideoStream.WaitNextAsync();
yield return new WaitUntil(() => task.IsCompleted);
if (!task.Result.ok)
{
throw new System.Exception("Something went wrong");
}
var outputPacket = task.Result.packet;
if (outputPacket != null)
{
var outputVideo = outputPacket.Get();
if (outputVideo.TryReadPixelData(_outputPixelData))
{
_outputTexture.SetPixels32(_outputPixelData);
_outputTexture.Apply();
}
}
}
}
private void OnDestroy()
{
if (_webCamTexture != null)
{
_webCamTexture.Stop();
}
_outputVideoStream?.Dispose();
_outputVideoStream = null;
if (_graph != null)
{
try
{
_graph.CloseInputStream("input_video");
_graph.WaitUntilDone();
}
finally
{
_graph.Dispose();
_graph = null;
}
}
GpuManager.Shutdown();
}
}
}
Let's try to run the sample scene.
MediaPipeException: INVALID_ARGUMENT: Graph has errors:
Packet type mismatch on calculator outputting to stream "input_video": The Packet stores "mediapipe::ImageFrame", but "mediapipe::GpuBuffer" was requested.
at Mediapipe.Status.AssertOk () [0x00014] in /home/homuler/Development/unity/MediaPipeUnityPlugin/Packages/com.github.homuler.mediapipe/Runtime/Scripts/Framework/Port/Status.cs:149
at Mediapipe.Unity.Tutorial.FaceMesh+<Start>d__12.MoveNext () [0x00281] in /home/homuler/Development/unity/MediaPipeUnityPlugin/Assets/MediaPipeUnity/Tutorial/Official Solution/FaceMesh.cs:67
at UnityEngine.SetupCoroutine.InvokeMoveNext (System.Collections.IEnumerator enumerator, System.IntPtr returnValueAddress) [0x00020] in /home/bokken/buildslave/unity/build/Runtime/Export/Scripting/Coroutines.cs:17
It seems that the official solution graph expects mediapipe::GpuBuffer, but we're putting mediapipe::ImageFrame into the input stream.
So let's convert the input using ImageFrameToGpuBufferCalculator.
node: {
calculator: "ImageFrameToGpuBufferCalculator"
input_stream: "throttled_input_video"
output_stream: "throttled_input_video_gpu"
}
We also need to convert the output from mediapipe::GpuBuffer to mediapipe::ImageFrame.
node: {
calculator: "GpuBufferToImageFrameCalculator"
input_stream: "output_video_gpu"
output_stream: "output_video"
}
🔔 You also need to change some Calculator inputs and outputs, but this is left as an exercise.
If everything works, the result should look like this.
See Coordinate System for information on how to correct the orientation of the image.