Clarifications about Cell Painting data #32
Feature values can indeed be negative. In fact they can have very different distributions at the single-cell level |
Thanks @shntnu! |
@sasgari Note that this is, of course, for single-cell level data. Aggregated or "pseudo-bulk" profiles will have a different distribution (they follow the sampling distributions of the corresponding statistics, e.g. mean or median). |
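To make the single-cell versus aggregated distinction concrete, here is a minimal Python sketch (the thread's own snippets are in R). The well names and feature values are made up for illustration, and `aggregate` is a hypothetical helper, not part of any Cell Painting tool.

```python
from statistics import mean, median

# Hypothetical single-cell measurements: (well, feature value).
# The values are invented; note that some of them are negative.
single_cells = [
    ("A01", -1.2), ("A01", 0.4), ("A01", 3.1),
    ("A02", -0.3), ("A02", 0.1), ("A02", 0.2), ("A02", 8.0),
]


def aggregate(rows, statistic):
    """Collapse single-cell values into one value per well."""
    by_well = {}
    for well, value in rows:
        by_well.setdefault(well, []).append(value)
    return {well: statistic(values) for well, values in by_well.items()}


# One "pseudo-bulk" value per well, using two different statistics.
mean_profile = aggregate(single_cells, mean)
median_profile = aggregate(single_cells, median)
```

The per-well means and medians can differ noticeably from each other (the outlier in well A02 pulls the mean up but barely moves the median), and neither looks like the raw single-cell distribution.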
@jatinarora-upmc asked - what are Costes features? These are features used to measure the correlation between channels (in Cell Painting, each channel corresponds to one stain, except for the AGP channel, which corresponds to two stains). There are many methods to measure correlation between channels. Costes' method evaluates the correlation in pixels below each threshold in the data, and then selects either the threshold with the minimum correlation or the highest threshold with a non-positive correlation (from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5200903/). |
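The threshold selection described above can be sketched roughly as follows. This is a simplified Python illustration, not CellProfiler's implementation: it assumes a single threshold shared by both channels (the published method ties the two channel thresholds together through a linear fit), it treats a pixel as "below threshold" when either channel is below it, and the pixel values and the `pick_threshold` name are made up.

```python
def pearson(xs, ys):
    """Pearson correlation, or None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def pick_threshold(ch1, ch2, candidates):
    """Scan thresholds from high to low (assumption: one shared threshold).

    Returns (threshold, correlation) for the highest threshold whose
    below-threshold pixels are non-positively correlated; if there is none,
    returns the threshold with the minimum correlation.
    """
    scored = []
    for t in sorted(candidates, reverse=True):
        below = [(a, b) for a, b in zip(ch1, ch2) if a < t or b < t]
        r = pearson([a for a, _ in below], [b for _, b in below])
        if r is None:
            continue  # too few pixels, or no variation in one channel
        if r <= 0:
            return t, r
        scored.append((r, t))
    if not scored:
        return None
    r, t = min(scored)
    return t, r


# Made-up pixels: anti-correlated background, correlated foreground.
channel_1 = [1, 2, 3, 4, 10, 20, 30, 40]
channel_2 = [4, 3, 2, 1, 10, 20, 30, 40]
threshold, correlation = pick_threshold(channel_1, channel_2, [5, 15, 25, 35, 45])
```

In this toy data the scan stops at the lowest candidate, because only the background pixels are non-positively correlated; everything above that threshold would then be treated as genuinely colocalized signal.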
@jatinarora-upmc asked - Nucleus is identified using DNA channel, but cell is identified using nucleus and cytoplasmic RNA channel. I wonder why cells are not identified using plasma membrane channel? From: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5223290/:
We did use AGP in the past but switched to RNA later. |
What do Cell Painting features mean? Learn more here. |
I assume you are looking at single cells? Because that won't hold at the aggregate level. For single-cell data, I looked at the sample of 4994 cells in this repo and found that this anomaly is observed only once. Do you see that more often? If so, I can probe further.
sampled_cells %>%
select(
Cells_Neighbors_NumberOfNeighbors_Adjacent,
Cells_Neighbors_PercentTouching_Adjacent
) %>%
filter(
xor(
Cells_Neighbors_NumberOfNeighbors_Adjacent == 0,
Cells_Neighbors_PercentTouching_Adjacent == 0
)
) %>%
pivot_longer(everything())
|
Actually, I averaged the data from single cells to donor level for each plate individually, and Cells_Neighbors_PercentTouching_Adjacent is non-0 for isolated cells on all plates. |
That's definitely odd, but I wonder if it might be something in your code? As you see below, that anomaly occurs only once in the 287 isolated cells (I can't explain that without more digging, but it certainly is a rare event; < 0.5% in this sample).
sampled_cells %>% tally()
|
You are right. I checked one plate, cmqtlpl261-2019, and it has ~22k isolated cells (Cells_Neighbors_NumberOfNeighbors_Adjacent == 0), of which 145 show the anomaly (Cells_Neighbors_PercentTouching_Adjacent != 0); these are present in all donors/cell lines. When I average the single-cell level features to donor level, Cells_Neighbors_PercentTouching_Adjacent becomes non-0. So, all set for now. BTW, what is the reason for this anomaly? |
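A small worked example of why averaging makes the feature non-0: a handful of anomalous cells is enough to pull the mean above zero even when almost every isolated cell has a value of exactly 0. This is a Python sketch with made-up numbers (a hypothetical donor with 1000 isolated cells, 7 of them anomalous), not data from the plate mentioned above.

```python
from statistics import mean, median

# Hypothetical donor: 993 isolated cells with PercentTouching == 0,
# plus 7 anomalous cells with invented non-zero values.
percent_touching = [0.0] * 993 + [12.5, 8.0, 30.0, 5.5, 17.0, 9.0, 22.0]

donor_mean = mean(percent_touching)      # pulled above zero by 7 cells
donor_median = median(percent_touching)  # stays at zero
anomaly_fraction = sum(v != 0 for v in percent_touching) / len(percent_touching)
```

Here the mean is small but non-zero while the median remains 0, so a mean-aggregated profile can show a non-0 value even when the anomaly affects well under 1% of cells.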
I think it's to do with their position. This one cell seems to lie on the edge of the image and something funky must be happening to the calculation of the percentage. You can safely ignore this case (i.e. consider
sampled_cells %>%
filter(Cells_Neighbors_NumberOfNeighbors_Adjacent == 0) %>%
ggplot(aes(Cells_Neighbors_PercentTouching_Adjacent == 0,
Cells_Location_Center_X)) +
geom_boxplot() |
So based on this conversation, this is my guess-
Those cells DO have neighbors, but those neighbors are cells that are
ultimately excluded for touching the edge of the image, so the cell does
indeed have 1) some % of its border touching another cell but also 2) 0
"accepted" neighbors.
|
Thanks @bethac07. Here's the MeasureObjectNeighbors documentation for our reference.
@jatinarora-upmc Given that this is an edge case (literally as well!), it doesn't really matter how we handle it. But if you wanted to be really rigorous, you'd modify the definition of isolated to be |
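The exact definition is not shown in this comment. One possible stricter reading, which is an assumption here and not confirmed by the thread, is to call a cell isolated only when both neighbor features are zero. A minimal Python sketch with made-up cells (`is_isolated_strict` and `is_anomalous` are hypothetical helper names):

```python
N_ADJ = "Cells_Neighbors_NumberOfNeighbors_Adjacent"
PCT_ADJ = "Cells_Neighbors_PercentTouching_Adjacent"

# Invented example cells: truly isolated, anomalous, and with neighbors.
cells = [
    {N_ADJ: 0, PCT_ADJ: 0.0},
    {N_ADJ: 0, PCT_ADJ: 14.2},
    {N_ADJ: 3, PCT_ADJ: 41.0},
]


def is_isolated_strict(cell):
    """Assumed definition: no counted neighbors AND no touching border."""
    return cell[N_ADJ] == 0 and cell[PCT_ADJ] == 0


def is_anomalous(cell):
    """Exactly one of the two features is zero (same xor as the R filter)."""
    return (cell[N_ADJ] == 0) != (cell[PCT_ADJ] == 0)


strict = [is_isolated_strict(c) for c in cells]
anomalous = [is_anomalous(c) for c in cells]
```

Under this assumed definition the anomalous cells simply drop out of the isolated group, so they no longer contribute a non-0 PercentTouching value when the group is averaged.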
Soumya had asked what Zernike features mean. Here are my quick notes that I sent via email. Briefly, these features represent subtle properties of shape, and the higher the index, the more nuanced the shape (e.g. Zernike 9 is more nuanced than Zernike 8). Less briefly: Take any cell below, and
Cells_AreaShape_Zernike_9_1 is a shape feature, so you have a shape – a binary (0 or 1) 2D function – that needs to be decomposed into its components using the Zernike basis. CellProfiler does that for you and gives you the coefficients as shape features.
Another helpful intuition is the regular notion of moments in stats: you can use higher-order moments to describe more nuanced aspects of a distribution; same thing with shape.
Yet another (precise) intuition is that you are doing a power series expansion of a 2D function.
@AnneCarpenter's explanation from #63 (comment) Q2: Here is a guide to the Zernikes: https://en.wikipedia.org/wiki/File:Zernike_polynomials2.png |
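To illustrate the decomposition idea, here is a rough Python sketch that projects a binary mask onto one Zernike basis function and returns the magnitude of the coefficient. It is not CellProfiler's implementation: the mapping of the square mask onto the unit disk, the (n + 1) / pi normalization, and the `zernike_magnitude` name are all assumptions made for the example.

```python
import cmath
import math


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_n^|m|(rho); needs n >= |m|, n - |m| even."""
    m = abs(m)
    if n < m or (n - m) % 2:
        raise ValueError("need n >= |m| and n - |m| even")
    total = 0.0
    for k in range((n - m) // 2 + 1):
        num = (-1) ** k * math.factorial(n - k)
        den = (math.factorial(k)
               * math.factorial((n + m) // 2 - k)
               * math.factorial((n - m) // 2 - k))
        total += num / den * rho ** (n - 2 * k)
    return total


def zernike_magnitude(mask, n, m):
    """|Z_nm| of a square binary mask whose pixel grid spans [-1, 1] x [-1, 1]."""
    size = len(mask)
    pixel_area = (2.0 / size) ** 2
    acc = 0j
    for i, row in enumerate(mask):
        y = (2 * i + 1) / size - 1.0  # pixel-center coordinates
        for j, value in enumerate(row):
            x = (2 * j + 1) / size - 1.0
            rho = math.hypot(x, y)
            if value and rho <= 1.0:
                theta = math.atan2(y, x)
                acc += radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    return abs(acc) * (n + 1) / math.pi * pixel_area


def make_mask(size, keep):
    """Binary mask of pixels inside the unit disk for which keep(x, y) is true."""
    coords = [(2 * k + 1) / size - 1.0 for k in range(size)]
    return [[1 if math.hypot(x, y) <= 1.0 and keep(x, y) else 0 for x in coords]
            for y in coords]


full_disk = make_mask(101, lambda x, y: True)
half_disk = make_mask(101, lambda x, y: x > 0)

disk_z00 = zernike_magnitude(full_disk, 0, 0)  # overall "size" term
disk_z11 = zernike_magnitude(full_disk, 1, 1)  # vanishes for a symmetric disk
half_z11 = zernike_magnitude(half_disk, 1, 1)  # picks up the lopsidedness
half_z91 = zernike_magnitude(half_disk, 9, 1)  # a higher-order term
```

A perfectly round, centered shape puts essentially all of its weight on the lowest-order term, while a lopsided shape spreads weight onto other terms; the higher-index terms capture progressively finer departures from a disk.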
Hi @shntnu, I was wondering if I could skip the RadialDistribution features (all or some, such as FracAtD), as they show the distribution of total intensity, but I cannot decide since I don't have much functional interpretation of these features. What would be your recommendation? |
|
This thread is to address general questions about Cell Painting data. Discuss dataset-specific and analysis-specific issues in a separate thread.
@sasgari asked:
cc @jatinarora-upmc