-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unable to read remote bed.bgz #76
Comments
It seems that the issue only occurs when the file is remote. Downloading the entire file first does technically circumvent the issue, though this approach isn't ideal: URL <- "https://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/E099_15_coreMarks_dense.bed.bgz"
tmp <- file.path(tempfile(),basename(URL))
dir.create(dirname(tmp),showWarnings = FALSE, recursive = TRUE)
utils::download.file(url = URL, destfile = tmp)
res <- rtracklayer::import(tmp) |
@lawremi @sanchit-saini, this error is still persisting. Do you know what might be happening here? |
@hpages I know you're not the primary maintainer for this repo, but I was wondering if you nevertheless have any ideas about this. Thank you! |
Hi @bschilder , I didn't investigate much but this is how
Major problem is that
Minor problem is that the last row has an unexpected
Compare this to what
So it looks to me that |
Hi @hpages thanks for taking a look at this, this definitely gives me some clues. I actually found the below approach does a decent job. Of course, this is just a temporary solution as it still has to download the whole file first, which is less efficient than making a query. But at least this approach consistently reads in all 265,577 rows , and can handle situations where there's some NAs. path <- "https://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/E099_15_coreMarks_dense.bed.bgz"
tmp <- file.path(tempdir(), basename(path))
## Only download it if you have to
if(!file.exists(tmp)){
utils::download.file(url = path,
destfile = tmp)
}
dat <- data.table::fread(text=readLines(con = tmp))
gr <- GenomicRanges::makeGRangesFromDataFrame(df = dat, seqnames.field = "V1", start.field = "V2", end.field = "V3", keep.extra.columns = TRUE)
|
That doesn't solve the real problem which is that the remote file doesn't contain any NA so there shouldn't be one in the data.frame obtained by importing the file. One workaround is to not try to import the remote file directly, but to download it to a temporary file first, like you did manually, except that this would need to be baked in FWIW I adopted that approach in |
Ah ok, I see what you mean. Very strange that the NA gets introduced somehow.
Nice! Thanks for sharing this. @sanchit-saini @lawremi something along these lines would be a great feature for |
Hello,
I'm trying to query subsets of Roadmap bed.gz files. However there seems to be some issues here:
Reprex
Code
Console output
Traceback
Session info
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] utf8_1.2.2 reticulate_1.25 R.utils_2.12.0
[4] tidyselect_1.1.2 RSQLite_2.2.16 AnnotationDbi_1.58.0
[7] htmlwidgets_1.5.4 grid_4.2.1 BiocParallel_1.30.3
[10] XGR_1.1.8 munsell_0.5.0 codetools_0.2-18
[13] interp_1.1-3 DT_0.24 colorspace_2.0-3
[16] OrganismDbi_1.38.1 Biobase_2.56.0 filelock_1.0.2
[19] knitr_1.40 supraHex_1.34.0 rstudioapi_0.14
[22] stats4_4.2.1 DescTools_0.99.45 MatrixGenerics_1.8.1
[25] GenomeInfoDbData_1.2.8 bit64_4.0.5 echoconda_0.99.7
[28] basilisk_1.8.1 vctrs_0.4.1 generics_0.1.3
[31] xfun_0.32 biovizBase_1.44.0 BiocFileCache_2.4.0
[34] R6_2.5.1 GenomeInfoDb_1.32.3 AnnotationFilter_1.20.0
[37] bitops_1.0-7 cachem_1.0.6 reshape_0.8.9
[40] DelayedArray_0.22.0 assertthat_0.2.1 BiocIO_1.6.0
[43] scales_1.2.1 nnet_7.3-17 rootSolve_1.8.2.3
[46] gtable_0.3.0 lmom_2.9 ggbio_1.44.1
[49] ensembldb_2.20.2 rlang_1.0.4 echodata_0.99.12
[52] splines_4.2.1 lazyeval_0.2.2 rtracklayer_1.57.0
[55] dichromat_2.0-0.1 hexbin_1.28.2 checkmate_2.1.0
[58] BiocManager_1.30.18 yaml_2.3.5 reshape2_1.4.4
[61] GenomicFeatures_1.48.3 ggnetwork_0.5.10 backports_1.4.1
[64] Hmisc_4.7-1 RBGL_1.72.0 tools_4.2.1
[67] ggplot2_3.3.6 ellipsis_0.3.2 RColorBrewer_1.1-3
[70] proxy_0.4-27 BiocGenerics_0.42.0 Rcpp_1.0.9
[73] plyr_1.8.7 base64enc_0.1-3 progress_1.2.2
[76] zlibbioc_1.42.0 purrr_0.3.4 RCurl_1.98-1.8
[79] basilisk.utils_1.8.0 prettyunits_1.1.1 rpart_4.1.16
[82] deldir_1.0-6 S4Vectors_0.34.0 SummarizedExperiment_1.26.1
[85] ggrepel_0.9.1 cluster_2.1.4 fs_1.5.2
[88] crul_1.2.0 magrittr_2.0.3 data.table_1.14.2
[91] echotabix_0.99.7 dnet_1.1.7 openxlsx_4.2.5
[94] mvtnorm_1.1-3 ProtGenerics_1.28.0 matrixStats_0.62.0
[97] patchwork_1.1.2 hms_1.1.2 evaluate_0.16
[100] XML_3.99-0.10 jpeg_0.1-9 readxl_1.4.1
[103] IRanges_2.30.1 gridExtra_2.3 compiler_4.2.1
[106] biomaRt_2.52.0 tibble_3.1.8 crayon_1.5.1
[109] R.oo_1.25.0 htmltools_0.5.3 echoannot_0.99.7
[112] tzdb_0.3.0 Formula_1.2-4 tidyr_1.2.0
[115] expm_0.999-6 Exact_3.1 DBI_1.1.3
[118] dbplyr_2.2.1 MASS_7.3-58.1 rappdirs_0.3.3
[121] boot_1.3-28 Matrix_1.4-1 readr_2.1.2
[124] piggyback_0.1.4 cli_3.3.0 R.methodsS3_1.8.2
[127] parallel_4.2.1 igraph_1.3.4 GenomicRanges_1.48.0
[130] pkgconfig_2.0.3 GenomicAlignments_1.32.1 dir.expiry_1.4.0
[133] RCircos_1.2.2 foreign_0.8-82 osfr_0.2.8
[136] xml2_1.3.3 XVector_0.36.0 stringr_1.4.1
[139] VariantAnnotation_1.42.1 digest_0.6.29 graph_1.74.0
[142] httpcode_0.3.0 Biostrings_2.64.1 rmarkdown_2.16
[145] cellranger_1.1.0 htmlTable_2.4.1 gld_2.6.5
[148] restfulr_0.0.15 curl_4.3.2 Rsamtools_2.12.0
[151] rjson_0.2.21 lifecycle_1.0.1 nlme_3.1-159
[154] jsonlite_1.8.0 BSgenome_1.64.0 fansi_1.0.3
[157] pillar_1.8.1 lattice_0.20-45 GGally_2.1.2
[160] KEGGREST_1.36.3 fastmap_1.1.0 httr_1.4.4
[163] survival_3.4-0 glue_1.6.2 zip_2.2.0
[166] png_0.1-7 bit_4.0.4 Rgraphviz_2.40.0
[169] class_7.3-20 stringi_1.7.8 blob_1.2.3
[172] latticeExtra_0.6-30 memoise_2.0.1 dplyr_1.0.9
[175] e1071_1.7-11 ape_5.6-2
The text was updated successfully, but these errors were encountered: