diff --git a/_posts/2024-09-30-aardvark.md b/_posts/2024-09-30-aardvark.md
new file mode 100644
index 0000000..baec666
--- /dev/null
+++ b/_posts/2024-09-30-aardvark.md
@@ -0,0 +1,156 @@
+---
+layout: post
+title: Lessons Learned from Visualizing Multimodal Data... with Aardvarks...
+date: 2024-09-30 14:00:00
+categories: blog
+type: blog
+# Authors in the "database" can be used with just their person "key"
+authors:
+  - lange
+publication_key: 2024_vis_aardvark
+# redirect_from:
+#   - "/event/2020/07/20/state-dashboards/"
+# Use the abstract to provide a high-level overview of the blog post and main takeaways.
+abstract: How do you analyze data with multiple modalities – say, images, trees, and time-series? If you ask a visualization researcher, we will tell you visualizations are the solution, specifically composite visualizations! This blog post shares some lessons we learned designing and developing composite visualizations to help understand cancer cell development. Regarding the aardvarks... this is clickbait, mostly. But we did get a best paper award for this project at IEEE VIS.
+# Create a lead image that is <500k so that it shows up on twitter link preview
+lead-image: /assets/images/posts/2024_aardvark_fight.jpeg
+lead-image-alt-text: A cartoon anthropomorphized aardvark with boxing gear faces off against a gross cancer cell. The aardvark is determined to defeat the cancer cell. The cancer cell is afraid because it knows it is about to be whomped by the aardvark.
+---
+
+# Understanding cancer is the key to fighting it.
+
+Cancer is a terrible disease caused by your cells growing out of control. If we can understand exactly how cancer cells grow, move, and divide, we can develop strategies to prevent, diagnose, and treat cancer.
+
+A fundamental way to further our understanding is to collect data that represents how cancer cells grow. It turns out that this is pretty hard because...
+
+# Cancer is complex — we need complex ✨ multimodal ✨ data to represent it.
+
+What is ✨ multimodal ✨ data? For us, it is data in different formats that represent different aspects of the same phenomenon. Specifically, we work with 🌌 **images**, 🌳 **trees**, and 📈 **time-series** data.
+
+## How is this data collected?
+
+The datasets we worked with come from live-cell microscopy imaging; in other words, 🌌 **images** of cancer cells are recorded over time as those cells grow and divide.
+
+![Animation showing a sequence of cells growing, moving, and dividing over time.]({{site.base_url}}/assets/images/posts/2024_aardvark_imaging.gif){:#aardvark_imaging}
+
+Algorithms can track individual cells over time based on their position and other characteristics, such as size.
+
+![Animation showing how a single cell is tracked across the sequence of images and grows larger.]({{site.base_url}}/assets/images/posts/2024_aardvark_tracking.gif){:#aardvark_tracking}
+
+Then, derived attributes can be computed from these images, such as the size of a cell or the amount of a specific protein in it. These attributes are calculated over time, resulting in 📈 **time-series** data.
+
+![Animation showing a single attribute increasing over time for a single tracked cell.]({{site.base_url}}/assets/images/posts/2024_aardvark_time-series.gif){:#aardvark_time-series}
+
+During these experiments, cells might divide into two daughter cells. If we record these divisions, we can construct a 🌳 **tree** of cell relationships, or cell lineage.
+
+![Animation that shows one cell dividing into two, and those two dividing into four, in a tree of divisions.]({{site.base_url}}/assets/images/posts/2024_aardvark_tree.gif){:#aardvark_tree}
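+To make these three modalities a bit more concrete before we get to visualization, here is a minimal sketch, in TypeScript, of how one tracked cell might tie them together. This is only an illustration with hypothetical names (`TrackedCell`, `collectAttribute`, and the attribute fields are ours for this post, not the project's actual data schema).
+
+```typescript
+// A minimal sketch (hypothetical names, not the project's actual schema) of how
+// one tracked cell can carry all three modalities at once.
+interface TrackedCell {
+  id: string;
+  parentId?: string;      // undefined for founder cells
+  frames: number[];       // frame indices where this cell was tracked
+  imageCrops: string[];   // 🌌 paths to per-frame image patches of the cell
+  attributes: {           // 📈 derived time-series, one value per tracked frame
+    size: number[];
+    proteinAmount: number[];
+  };
+  childrenIds: string[];  // 🌳 lineage: typically two ids after a division
+}
+
+// Walk one lineage 🌳 and collect a single attribute's 📈 time-series per cell.
+function collectAttribute(
+  cells: Map<string, TrackedCell>,
+  rootId: string,
+  attribute: "size" | "proteinAmount"
+): { cellId: string; values: number[] }[] {
+  const series: { cellId: string; values: number[] }[] = [];
+  const stack: string[] = [rootId];
+  while (stack.length > 0) {
+    const cell = cells.get(stack.pop()!);
+    if (!cell) continue;
+    series.push({ cellId: cell.id, values: cell.attributes[attribute] });
+    stack.push(...cell.childrenIds);
+  }
+  return series;
+}
+```
+
+Even in this toy form, you can see the problem brewing: the lineage, the per-cell time-series, and the raw images all live in differently shaped structures, which is exactly why relating them by eye gets tedious.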
+Now that we have the data, we need to try to understand it. It turns out that...
+
+# Understanding multimodal data requires us to think about all the modalities together... and this is hard.
+
+Each of the data modalities (🌳🌌📈) is necessary because each captures a different aspect of how cancer cells develop:
+
+* The 🌌 **images** show the spatial relationships between cells.
+* The 📈 **time-series** data shows how cells grow and change over time.
+* The 🌳 **tree** captures how cell attributes propagate across generations.
+
+But one thing quickly became clear: to fully understand the phenomenon of interest (the spread of cancer cells), we needed to consider all three of these modalities together.
+
+Right now, researchers are synthesizing or combining data modalities manually. In other words, they look at an image, then at time-series data, then back at an image, then at a tree, and mentally link data elements together. In the best case, this is tedious and taxing. In the worst case, it is impossible to relate elements from one modality to another.
+
+So now what? Well, now is the part where the *visualization nerds* get to cheer and applaud as the heroes of this story, 🦸‍♀️ **visualizations** 🦸‍♂️, swoop in and save the day! This is because...
+
+# 🍢 Composite visualizations 🍢 can show different data modalities together!
+
+What are 🍢 composite visualizations 🍢? In short, composite visualizations combine multiple visualizations into the same view. *(If you want a longer/better answer, Javed and Elmqvist do a great job “[Exploring the design space of composite visualization](https://www.doi.org/10.1109/PacificVis.2012.6183556).”)*
+
+This is great for our purposes because we have three different data modalities that are each best represented by a different visualization, and we are trying to link elements across these modalities.
+
+What’s the catch? Even though they are powerful...
+
+# Designing composite visualizations is... (you guessed it) ...hard. Here’s our approach in **three simple steps**.
+
+1. **Select a primary data modality**. Even though all data modalities are needed, specific tasks prioritize certain modalities.
+2. **Choose the best visual encoding** for it. This visualization serves as the host representation.
+3. **Embed secondary data modalities** as client visualizations.
+
+Of course, it’s a bit more complicated than this, but we’ll get to that. First, let’s give an example.
+
+# Here’s one of the composite visualizations we designed (this one is my 🤗 favorite 🤗).
+
+1. First, we select the **tree** as the **primary data modality** and, by extension, name this composite visualization the tree-first visualization.
+2. We **encode** this **tree** as a **node-link diagram**.
+3. Then, we **embed the time-series** data by nesting it within the nodes of the tree and **superimpose the images** of cells above the nodes, either automatically or on demand.
+
+![An animation that shows a schematic of the tree-first diagram, first with the primary data modality shown and then with the secondary data modalities added.]({{site.base_url}}/assets/images/posts/2024_aardvark_tree-first.gif){:#aardvark_tree-first}
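+To make the host/client idea a bit more concrete, here is a simplified sketch of the tree-first composition in TypeScript with D3. It illustrates the pattern under our own assumptions (the `CellNode` type and `drawTreeFirst` function are made up for this post), and it is not the tool's actual source code: the lineage tree is laid out as the host node-link diagram, and a small time-series sparkline is nested inside each node as a client visualization.
+
+```typescript
+import * as d3 from "d3";
+
+// Minimal node type for this sketch, a slice of the TrackedCell idea above.
+interface CellNode {
+  id: string;
+  childrenIds: string[];
+  attributes: { size: number[] };
+}
+
+function drawTreeFirst(
+  svg: d3.Selection<SVGGElement, unknown, null, undefined>,
+  cells: Map<string, CellNode>,
+  rootId: string
+): void {
+  // Steps 1 & 2: host representation. Lay the 🌳 lineage out as a node-link diagram.
+  const hierarchy = d3.hierarchy(cells.get(rootId)!, (c) =>
+    c.childrenIds.map((id) => cells.get(id)!)
+  );
+  const root = d3.tree<CellNode>().nodeSize([120, 90])(hierarchy);
+
+  // Links between parent cells and their daughter cells.
+  svg.append("g")
+    .selectAll("path")
+    .data(root.links())
+    .join("path")
+    .attr("fill", "none")
+    .attr("stroke", "#999")
+    .attr("d", (d) =>
+      d3.linkVertical()({
+        source: [d.source.x, d.source.y],
+        target: [d.target.x, d.target.y],
+      })
+    );
+
+  // Step 3: client representation. Nest a 📈 sparkline of one attribute inside
+  // every tree node. (🌌 Images could be superimposed above selected nodes in a
+  // similar way; omitted here for brevity.)
+  const node = svg.append("g")
+    .selectAll("g")
+    .data(root.descendants())
+    .join("g")
+    .attr("transform", (d) => `translate(${d.x - 50}, ${d.y - 15})`);
+
+  node.append("rect")
+    .attr("width", 100)
+    .attr("height", 30)
+    .attr("fill", "#f2f2f2")
+    .attr("stroke", "#666");
+
+  node.append("path")
+    .attr("fill", "none")
+    .attr("stroke", "steelblue")
+    .attr("d", (d) => {
+      const values = d.data.attributes.size;
+      const x = d3.scaleLinear().domain([0, values.length - 1]).range([2, 98]);
+      const y = d3.scaleLinear()
+        .domain(d3.extent(values) as [number, number])
+        .range([28, 2]);
+      return d3.line<number>().x((_, i) => x(i)).y((v) => y(v))(values);
+    });
+}
+```
+
+The real visualization does much more than this (interaction, image overlays, layout tuning), but the basic composition pattern is the same: pick the host, lay it out well, and then give each of its elements a small region where a client visualization of another modality can live.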
+# Here are the other two composite visualizations!
+
+We call these the time-series-first visualization and the image-first visualization.
+
+# What we learned.
+
+Ok, I have a confession. These **three simple steps** for constructing composite visualizations are hiding a lot of complexity. Specifically, step three is carrying a lot of weight here. In reality, this single step is a much more **iterative process**.
+
+The design space of composite visualizations can be intimidatingly large. You have to choose a visual encoding for each data modality AND how those encodings are combined. The choice of one affects the other, and there will be some back and forth while exploring this space.
+
+That said, we found that the first two steps help anchor this exploration and reduce the design space. Selecting a primary data modality and prioritizing a good encoding for it helps ensure that tasks centered on that modality can be performed effectively. With that settled, figuring out how the other pieces fit into the puzzle becomes much more manageable.
+
+# We also made a tool!
+
+We didn’t just design these visualizations; we implemented them in a tool! It’s an excellent tool! Our collaborators like it! I like it! You can [try it out yourself](https://aardvark.sci.utah.edu/)! You can watch a video that demonstrates it!