Add LUAD dataset as BBBC043 #52

Open
shntnu opened this issue Sep 7, 2018 · 19 comments

@shntnu
Contributor

shntnu commented Sep 7, 2018

@jccaicedo Creating this issue as a placeholder so that we have an id for the LUAD dataset

@N3llz
Contributor

N3llz commented Sep 7, 2018

BBBC041 is already taken by the malaria dataset! https://data.broadinstitute.org/bbbc/BBBC041/

so is 042 by another I'm working on.

Can you go with 043?

@shntnu changed the title from "Add LUAD dataset as BBBC041" to "Add LUAD dataset as BBBC043" on Sep 7, 2018
@bledford87
Contributor

@shntnu Did this dataset ever get finalized? It was never added to BBBC.

@shntnu
Contributor Author

shntnu commented Aug 11, 2020

Not yet AFAIK. This is at least a few months out (it's on @jccaicedo's plate)

@shntnu
Contributor Author

shntnu commented May 27, 2022

@AnneCarpenter said:

I'm checking that the LUAD paper, now published, has its info up to date in our Imaging Platform Project Profiler and Image-Based Profiling Datasets

In the paper, we say image data and profiles are available here:
https://registry.opendata.aws/cell-painting-image-collection/ with the path: cytodata/datasets/LUAD-BBBC043-Caicedo/

So I added that to cell M21 "data_location_other" and I added the paper to the publications tab.

The fact that it has a BBBC number in that URL is interesting tho, because that number has been skipped at the BBBC site - do either of you know what's up?

@AnneCarpenter @jccaicedo

We set aside a BBBC identifier for it back in 2018 but didn't proceed to create a page at the time. The good news is that we've never said anywhere that the data is available on BBBC, but I know it's odd that we have an id for it but not a page.

Going forward, we will follow a different process for profiling datasets (below)

@ErinWeisbart and I are making steady progress here https://github.com/orgs/broadinstitute/projects/27/views/3 and once we reach this awslabs/open-data-registry#1003, we will have settled on a process that will lead to a BBBC entry getting created for LUAD


---------- Forwarded message ---------
From: Shantanu Singh
Date: Fri, Nov 19, 2021 at 12:41 PM
Subject: Re: Question regarding potential BBBC contribution

Here is now the adapted version of the 4 steps (from my email to AWS); Beth will be looped in at step 3

1. C-S lab will add a row to https://broad.io/profiling_dataset
2. C-S lab will upload all components of the dataset to s3://cellpainting-gallery and update the RODA landing page with a RODA identifier (and possibly doi)
3. C-S lab will let C lab know that a new dataset is up on RODA, so they can update BBBC as appropriate, and refer to it by its RODA identifier
4. C-S lab will submit the dataset to IDR once we have a manuscript draft (that's their requirement), which will include pointers to the dataset in RODA

We will refer to the dataset on IDR if it exists, otherwise RODA

  • using the IDR identifier (e.g. idr0080) in the text and DOI (e.g. https://doi.org/10.17867/10000153) in the references
  • using the RODA identifier (tbd) and its DOI (also tbd) in the references

@shntnu
Contributor Author

shntnu commented Jul 16, 2022

Our plan for managing Cell Painting Gallery has been settled! https://new.ipwiki.app/project_profiler_and_datasets

Profiling datasets will be listed only on the Cell Painting Gallery, and not on BBBC.

However, for a few datasets for which

  1. a BBBC entry already exists, or
  2. a BBBC identifier was created AND we referred to that identifier in a paper (like BBBC043 and BBBC047),

we should indeed create an entry in BBBC
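
The rule above can be written as a small predicate. This is only a sketch of the decision as stated in this comment; the function and parameter names are hypothetical and are not part of any BBBC or Cell Painting Gallery tooling.

```python
def should_create_bbbc_entry(
    has_bbbc_entry: bool,
    identifier_reserved: bool,
    cited_in_paper: bool,
) -> bool:
    """Return True if a profiling dataset should have a BBBC page.

    Hypothetical helper: encodes the two conditions listed above.
    1. A BBBC entry already exists, or
    2. a BBBC identifier was created AND a paper referred to it.
    """
    return has_bbbc_entry or (identifier_reserved and cited_in_paper)


# BBBC043: identifier reserved in 2018 and referenced in the paper, no page yet.
print(should_create_bbbc_entry(False, True, True))    # True
# A new Gallery-only dataset with no BBBC identifier at all.
print(should_create_bbbc_entry(False, False, False))  # False
```

Under this reading, an identifier that was reserved but never cited in a paper would not, by itself, call for a BBBC page.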

@AnneCarpenter

Great, I added the following to the https://new.ipwiki.app/project_profiler_and_datasets wiki page:

"Relationship to BBBC
BBBC has pages for some historical profiling datasets but will not be used for them in the future."

and then added your comment above to the page linked as "BBBC". Hope this was a good approach.

@bethac07
Contributor

I thought we decided it was fine for BBBC to continue to point to Gallery data sets going forward, such that BBBC still has an ongoing record of good benchmark data sets

@shntnu
Contributor Author

shntnu commented Jul 16, 2022

That was the initial plan

#52 (comment)

C-S lab will let C lab know that a new dataset is up on RODA, so they can update BBBC as appropriate, and refer to it by its RODA identifier

Erin might recollect better, but I think we concluded it is wisest to avoid creating yet another identifier for a dataset and instead just point to RODA as a whole. I like that idea because it avoids redundancy.

@bethac07
Contributor

From email thread "Re: Question regarding potential BBBC contribution"

Sounds good!
Here is now the adapted version of the 4 steps (from my email to AWS); Beth will be looped in at step 3
1. C-S lab will add a row to https://broad.io/profiling_dataset
2. C-S lab will upload all components of the dataset to s3://cellpainting-gallery and update the RODA landing page with a RODA identifier (and possibly doi)
3. C-S lab will let C lab know that a new dataset is up on RODA, so they can update BBBC as appropriate, and refer to it by its RODA identifier
4. C-S lab will submit the dataset to IDR once we have a manuscript draft (that's their requirement), which will include pointers to the dataset in RODA
We will refer to the dataset on IDR if it exists, otherwise RODA

  • using the IDR identifier (e.g. idr0080) in the text and DOI (e.g. https://doi.org/10.17867/10000153) in the references
  • using the RODA identifier (tbd) and its DOI (also tbd) in the references

@bethac07
Contributor

As long as we aren't giving it a new identifier, and refer to it on the BBBC page as "cpg-whatever" instead of "BBBC-whatever", I don't see why we WOULDN'T more broadly advertise that these projects exist :)

@shntnu
Contributor Author

shntnu commented Jul 16, 2022

As long as we aren't giving it a new identifier, and refer to it on the BBBC page as "cpg-whatever" instead of "BBBC-whatever", I don't see why we WOULDN'T more broadly advertise that these projects exist :)

You're right – I think I had implicitly assumed that such a plan would require creating a new BBBC identifier for each new CPG dataset, but (you're right) there is no need to

So you're saying you'd not only point to https://registry.opendata.aws/cellpainting-gallery/ (similar to the way we point to BBBC here, screenshot below) but also (selectively) list CPG datasets in the Profiling section of the BBBC index page?

Sounds good to me; worth getting Erin to sign off on it because she has thought through everything

image

@AnneCarpenter

AnneCarpenter commented Jul 18, 2022 via email

@AnneCarpenter

AnneCarpenter commented Oct 11, 2022 via email

@bethac07
Contributor

bethac07 commented Oct 11, 2022

I like doing that because then we need not list everything in the Gallery on BBBC. But for data sets we care a lot about, I also like listing them as a "real" thing on BBBC (a CPG identifier rather than a BBBC one is fine), because CPG is not super "browsable", especially for a biologist vs. a CS person.

@shntnu
Contributor Author

shntnu commented Jan 30, 2024

We should add a new entry to "Image-based Profiling" table

| Accession | Description | Mode | Fields per sample | Total Fields | Total Images | Ground truth |
|---|---|---|---|---|---|---|
| BBBC043 | Cell Painting overexpression profiles in human A549 cells of 325 lung adenocarcinoma-associated variants across 50 genes | Fluorescent | 9 | 55296 | 276480 | B |

@ErinWeisbart
Member

I'm afraid I'm being dense in understanding what those columns are. What is the difference between total fields and total images?
This dataset is 16 plates of 384-well plates with 9 sites acquired per well.

@shntnu
Contributor Author

shntnu commented Mar 8, 2024

I'm afraid I'm being dense in understanding what those columns are. What is the difference between total fields and total images? This dataset is 16 plates of 384-well plates with 9 sites acquired per well.

https://bbbc.broadinstitute.org/image_sets

total fields = ~ 384 * 9 * no. of plates
total images = ~ total fields * 5 (for cell painting, unless we have brightfield)
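
As a quick check of the arithmetic above, here is a minimal sketch using the figures quoted in this thread (16 plates, 384 wells per plate, 9 sites per well) and assuming the standard 5 Cell Painting channels with no brightfield. The function and constant names are illustrative only.

```python
# Figures quoted in the thread; names are illustrative, not from any BBBC tooling.
PLATES = 16
WELLS_PER_PLATE = 384
SITES_PER_WELL = 9   # "fields per sample"
CHANNELS = 5         # assumption: Cell Painting without brightfield


def total_fields(plates: int, wells_per_plate: int, sites_per_well: int) -> int:
    """One field per imaged site, across every well of every plate."""
    return plates * wells_per_plate * sites_per_well


def total_images(fields: int, channels: int) -> int:
    """One image per channel for each field."""
    return fields * channels


fields = total_fields(PLATES, WELLS_PER_PLATE, SITES_PER_WELL)
images = total_images(fields, CHANNELS)

print(fields)  # 55296
print(images)  # 276480
```

These are upper bounds if any wells or sites are missing, which is presumably why the formulas above are marked approximate.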

@ErinWeisbart
Member

ha! right, number of channels. time for another cup of coffee...

for this dataset
fields per sample = 9
total fields = 55296
total images = 276480

@shntnu
Contributor Author

shntnu commented Mar 8, 2024

for this dataset
fields per sample = 9
total fields = 55296
total images = 276480

Great

I've updated my previous comment. Now it's over to @bethac07 who can tag someone to update BBBC
