[WIP] Create JSON files for frontend consumption #29

Open: 1 commit to merge into base: master


dhimmel
Member

@dhimmel dhimmel commented Oct 5, 2016

Work in progress (WIP).

@dhimmel
Member Author

dhimmel commented Oct 5, 2016

This pull request creates a mapping from gene (as Entrez GeneIDs) to the list of mutated samples (as TCGA sample IDs). This dictionary/JSON object is called gene_to_mutated_samples. As a JSON text file, it is 20.68 MB uncompressed and 2.97 MB when gzip-compressed.

This pull request also creates disease_to_samples, a dictionary mapping each disease acronym to its sample IDs. This file is small (0.17 MB) and thus not a concern.
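For anyone wanting to reproduce this kind of raw versus gzip comparison, here is a minimal sketch. The toy mapping and the helper name `json_sizes` are assumptions for illustration, not code from this pull request, and the sample IDs are made up.

```python
import gzip
import json

# Toy stand-in for gene_to_mutated_samples (made-up sample IDs).
gene_to_mutated_samples = {
    "2641": ["TCGA-XX-0001-01", "TCGA-XX-0002-01"],
    "340024": ["TCGA-XX-0002-01", "TCGA-YY-0003-01"],
}


def json_sizes(obj):
    """Return (raw_bytes, gzipped_bytes) for obj serialized as JSON."""
    raw = json.dumps(obj, sort_keys=True).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


raw_size, gz_size = json_sizes(gene_to_mutated_samples)
print(f"raw: {raw_size} bytes, gzip: {gz_size} bytes")
```

On a payload this tiny, gzip overhead can outweigh the savings; the ratio only becomes meaningful on the full file, where the same ID strings repeat many times.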

The goal of disease_to_samples and gene_to_mutated_samples was to allow the frontend to load these entire objects and then perform efficient set operations to get sample/positive/negative counts. For example, the user may have selected diseases = {'GBM', 'COAD', 'LUNG'} and mutations = {2641, 340024}.

The frontend would do the equivalent of this Python in JavaScript:

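# Union of samples mutated in any selected gene
mutated_samples = set()
for mutation in mutations:
    mutated_samples |= set(gene_to_mutated_samples[mutation])

# Union of samples belonging to any selected disease
selected_samples = set()
for disease in diseases:
    selected_samples |= set(disease_to_samples[disease])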

# counts
n_samples = len(selected_samples)
n_positives = len(selected_samples & mutated_samples)
n_negatives = n_samples - n_positives
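A self-contained version of the same counting logic, as a rough sketch on toy data. The sample IDs (S1 to S6) and the function name `count_samples` are hypothetical. It also accounts for two things that apply once the objects are actually loaded from JSON: object keys arrive as strings, and values arrive as lists rather than sets.

```python
import json

# Toy JSON payloads; the sample IDs here are placeholders, not real data.
gene_to_mutated_samples = json.loads(
    '{"2641": ["S1", "S2"], "340024": ["S2", "S5"]}'
)
disease_to_samples = json.loads(
    '{"GBM": ["S1", "S2", "S3"], "COAD": ["S4"], "LUNG": ["S5", "S6"]}'
)


def count_samples(diseases, mutations):
    """Return (n_samples, n_positives, n_negatives) for the selection."""
    mutated = set()
    for gene in mutations:
        # JSON keys are strings, so convert integer GeneIDs before lookup.
        mutated.update(gene_to_mutated_samples.get(str(gene), []))

    selected = set()
    for disease in diseases:
        selected.update(disease_to_samples.get(disease, []))

    n_samples = len(selected)
    n_positives = len(selected & mutated)
    return n_samples, n_positives, n_samples - n_positives


print(count_samples({"GBM", "COAD", "LUNG"}, {2641, 340024}))  # (6, 3, 3)
```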

Alerting @bdolly, @awm33, @cgreene for discussion on how to proceed.

My questions are:

  • Is 20.68 MB too big to pass to a browser?
  • Will the payload be compressed in transit?
  • Will the payload be cached?
  • Will this consume too much browser memory (RAM)?
  • Should we switch to an int ID for samples to cut down this size?
  • Or should we just have the frontend query the backend for these stats?

@awm33
Member

awm33 commented Oct 6, 2016

Is 20.68 MB too big to pass to a browser?

It depends. I assume most people will be using this from a desktop with Wi-Fi or a wired connection, so from a pure byte-transfer standpoint, no.

Will the payload be compressed in transit?

We can, and should, set up gzip compression on the server.

Will the payload be cached?

If the correct headers are set by the server, yes. Other methods could be used as well, beyond HTTP caching, like localStorage.

Will this consume too much browser memory (RAM)?

Maybe. I'd be more worried about the access time. JavaScript is single-threaded, so if we were to calculate something like this client-side, I would use a web worker.

Should we switch to an int ID for samples to cut down this size?

What matters is the access performance, and these objects should be hashmaps in JS, so I don't think that would buy you much, if anything.
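For reference, the integer-ID idea could look roughly like the sketch below, where each sample ID string is replaced by its index in a shared lookup list. The function name `encode_with_int_ids`, the output layout, and the sample IDs are all assumptions for illustration. Whether it shrinks the payload in practice would need to be measured on the real file, especially after gzip.

```python
import json

# Toy stand-in for gene_to_mutated_samples (made-up sample IDs).
gene_to_mutated_samples = {
    "2641": ["TCGA-XX-0001-01", "TCGA-XX-0002-01"],
    "340024": ["TCGA-XX-0002-01", "TCGA-YY-0003-01"],
}


def encode_with_int_ids(mapping):
    """Replace each sample ID string with an index into a shared list."""
    samples = sorted({s for ids in mapping.values() for s in ids})
    index = {sample: i for i, sample in enumerate(samples)}
    encoded = {gene: [index[s] for s in ids] for gene, ids in mapping.items()}
    return samples, encoded


samples, encoded = encode_with_int_ids(gene_to_mutated_samples)

# Compare serialized sizes; the lookup list must be shipped too.
before = len(json.dumps(gene_to_mutated_samples))
after = len(json.dumps({"samples": samples, "genes": encoded}))
print(f"string IDs: {before} bytes, int IDs: {after} bytes")
```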

Or should we just have the frontend query the backend for these stats?

I would lean towards this for performance and API reasons. If we are also thinking of others using our API, this would make it easier for them. We're already using the django filter plugin which allows for querying on related model fields. This would be added to the /samples endpoint. We may want to use the field selection plugin to limit how much data is returned, assuming you just need the ids.

@bdolly
Member

bdolly commented Oct 7, 2016

@awm33 I like the idea of using the field selection plugin for this rather than loading a large JSON file on app load. I think firing off a small request on each user keystroke during search will be efficient, as the plugin will return smaller, faster responses.

@awm33
Member

awm33 commented Oct 9, 2016

@bdolly Cool

I created an issue/task for this: cognoma/core-service#33

3 participants