Issues with BED Files #12

Open
poddarharsh15 opened this issue Sep 26, 2024 · 7 comments

Comments

@poddarharsh15

Hi @Karenxzr,

I'm running into errors when passing a BED file to PhenoSV, as shown in the traceback below. I suspect the format of the BED file is the cause, but I haven't been able to resolve it.

Could you please take a look and suggest possible solutions?

Thank you for your assistance!

python3 phenosv/model/phenosv.py --sv_file ~/structural_varinats/merged_vcfs/output.bed --target_folder test1/ --target_file_name Final_out

Traceback (most recent call last):
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 177, in <module>
    main()
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 150, in main
    pred = of.phenosv(None, None, None, None, sv_df, annotation_path, model, elements_path, feature_files, scaler_file,
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/../model/operation_function.py", line 552, in phenosv
    if sv.shape[1]==5:
AttributeError: 'NoneType' object has no attribute 'shape'

output.zip

@poddarharsh15
Author

poddarharsh15 commented Sep 26, 2024

Hi @Karenxzr,
I have also tried several times with the input in .csv format, but I am still getting the same error. Please have a look:
output.csv

python3 phenosv/model/phenosv.py --sv_file ~/structural_varinats/merged_vcfs/output.csv --target_folder test1/ --target_file_name Final_out

Traceback (most recent call last):
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 177, in <module>
    main()
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 150, in main
    pred = of.phenosv(None, None, None, None, sv_df, annotation_path, model, elements_path, feature_files, scaler_file,
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/../model/operation_function.py", line 552, in phenosv
    if sv.shape[1]==5:
AttributeError: 'NoneType' object has no attribute 'shape'

@Karenxzr
Collaborator

Hi, I tested the first 20 lines of your output.csv file and they ran fine. Please use an absolute path for --sv_file; it seems PhenoSV did not read your input data correctly.

python3 phenosv/model/phenosv.py --sv_file /Users/zhuoranx/Documents/ResearchProject/PhenoSV/PhenoSV/data/test2.csv --target_folder /Users/zhuoranx/Documents/ResearchProject/PhenoSV/PhenoSV/data --target_file_name test_out

@poddarharsh15
Author

Hi @Karenxzr, thank you for your fast response. I have tried several runs using an absolute path, but I still get the same error. Please have a look :(( Do I need to run "pip install ." first?
python3 phenosv/model/phenosv.py --sv_file /home/tigem/h.poddar/structural_varinats/PhenoSV/data/output.csv --target_folder /home/tigem/h.poddar/structural_varinats/PhenoSV/data/ --target_file_name test1

Traceback (most recent call last):
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 177, in <module>
    main()
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 150, in main
    pred = of.phenosv(None, None, None, None, sv_df, annotation_path, model, elements_path, feature_files, scaler_file,
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/../model/operation_function.py", line 552, in phenosv
    if sv.shape[1]==5:
AttributeError: 'NoneType' object has no attribute 'shape'

@poddarharsh15
Author

UPDATE: I found that the issue only occurs when processing the entire output.csv file, which contains almost 13,000 structural variants (SVs). When working with a subset of 30 lines from the same file, everything functions correctly without any errors. It seems the problem arises when handling a larger dataset. Could you please advise on possible solutions to address this?

Test run results:
test_out.csv

@Karenxzr
Collaborator

Hi, as mentioned in the tutorial, you can split the input CSV file and run the smaller files in parallel. An example is below; you can increase the thread count from 4 to 32 or so.

bash phenosv/model/phenosv.sh 'path/to/sv/data.csv' 'folder/path/to/store/results' 4 'HP:0000707,HP:0007598'

In addition, the source code is here: https://github.com/WGLab/PhenoSV/blob/main/phenosv/model/phenosv.sh. If you use SLURM, you can split the input file as in the shell script and submit a job array.
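
As a rough illustration of the splitting step (not the logic of phenosv.sh itself), here is a minimal Python sketch that cuts a headerless CSV into near-equal chunks, one per job-array task. The function names, the chunk_NNN.csv naming, and the assumption that the input has no header row are all hypothetical choices made for this example.

```python
from pathlib import Path


def split_lines(lines, n_chunks):
    """Split a list of lines into contiguous, near-equal parts (at most n_chunks)."""
    n_chunks = max(1, min(n_chunks, len(lines)))
    size, extra = divmod(len(lines), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional line each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks


def write_chunks(src, out_dir, n_chunks):
    """Write each chunk to out_dir/chunk_NNN.csv and return the created paths."""
    # Assumes the input has no header row; blank lines are dropped.
    lines = [ln for ln in Path(src).read_text().splitlines() if ln.strip()]
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(split_lines(lines, n_chunks)):
        path = out_dir / f"chunk_{i:03d}.csv"
        path.write_text("\n".join(chunk) + "\n")
        paths.append(path)
    return paths
```

The returned paths could then be written one per line into a list file that each array task indexes by its task ID.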

Another possibility is that some abnormal rows in your data are causing this error. If you split the file, you will likely be able to identify them.
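
One way to look for abnormal rows without running the model is a quick pre-check of the CSV. The sketch below assumes the five-column layout named in the traceback (CHR, START, END, ID, SVTYPE); the set of accepted SVTYPE values is an assumption and should be checked against the PhenoSV README. find_bad_rows is a hypothetical helper, not part of PhenoSV.

```python
import csv
import io

# Assumed set of accepted SV types; verify against the PhenoSV documentation.
KNOWN_SVTYPES = {"deletion", "duplication", "insertion", "inversion", "translocation"}


def find_bad_rows(handle, known_types=KNOWN_SVTYPES):
    """Return (row_number, reason) for rows that do not look like CHR,START,END,ID,SVTYPE."""
    problems = []
    for rowno, row in enumerate(csv.reader(handle), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 5:
            problems.append((rowno, f"expected 5 columns, got {len(row)}"))
            continue
        chrom, start, end, sv_id, svtype = (field.strip() for field in row)
        if not (start.isdigit() and end.isdigit()):
            problems.append((rowno, "START/END is not a non-negative integer"))
        elif svtype not in known_types:
            # Truncate long values such as sequences that ended up in the SVTYPE column.
            problems.append((rowno, f"unrecognized SVTYPE: {svtype[:20]}"))
    return problems


if __name__ == "__main__":
    sample = "chr1,100,200,sv1,deletion\nchr2,300,400,sv2,ACGGGGCAGG\n"
    for rowno, reason in find_bad_rows(io.StringIO(sample)):
        print(rowno, reason)
```

For a real file, pass an open file handle (opened with newline="") instead of the StringIO sample.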

@poddarharsh15
Author

Hi @Karenxzr,
Do you have any suggestions for converting VCF files to CSV or BED formats? Currently, I am using vcf2bed to convert VCF files to BED format and then manipulating the data to create a CSV file, as shown in the sample data. Any advice or alternative approaches would be greatly appreciated.
Thank you!
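
For reference, one possible approach is to read the SV type and end position directly from the VCF INFO field (the standard SVTYPE and END keys) rather than going through BED. The sketch below uses only the standard library. The mapping from VCF type codes to PhenoSV type names is an assumption, records with other types (such as BND) are skipped, and coordinates and chromosome names are passed through unchanged, so they may need adjusting to whatever PhenoSV expects.

```python
import csv
import io

# Assumed mapping from VCF SVTYPE codes to the type names used in the PhenoSV input.
SVTYPE_NAMES = {"DEL": "deletion", "DUP": "duplication", "INS": "insertion", "INV": "inversion"}


def parse_info(info):
    """Turn 'SVTYPE=DEL;END=200;IMPRECISE' into a dict (flags map to True)."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def vcf_to_rows(handle):
    """Yield [CHR, START, END, ID, SVTYPE] for each record whose SVTYPE is in SVTYPE_NAMES."""
    for lineno, line in enumerate(handle, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, vid = fields[0], int(fields[1]), fields[2]
        info = parse_info(fields[7])
        svtype = SVTYPE_NAMES.get(str(info.get("SVTYPE", "")).split(":")[0])
        if svtype is None:
            continue  # unmapped or missing SVTYPE
        end_raw = info.get("END")
        end = int(end_raw) if isinstance(end_raw, str) and end_raw.isdigit() else pos
        sv_id = vid if vid != "." else f"sv{lineno}"  # fallback ID from the line number
        yield [chrom, pos, end, sv_id, svtype]


def vcf_to_csv(vcf_path, csv_path):
    """Write the converted rows to a headerless CSV file."""
    with open(vcf_path) as src, open(csv_path, "w", newline="") as dst:
        csv.writer(dst).writerows(vcf_to_rows(src))
```

Because the type comes from INFO rather than the ALT column, explicit ALT sequences do not end up in the SVTYPE field.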

@poddarharsh15
Author

I have identified one issue with my input.csv file: it contained some unrecognized SVTYPE entries (e.g., ACGGGGCAGGGAGGGCCCCTCTAGAAGCCACCTGTGCAGAC). After removing those so that the CSV file only includes known SVTYPE values, I am still encountering an error. Could you please suggest some ideas or solutions for this issue?
PS: PhenoSV keeps running after emitting this error and generates a CSV output with results. Please see the CSV file for reference.

combined.csv.out.csv

Thank you in advance for your help!

Command run with SLURM:

# Make conda available in the batch shell and activate the PhenoSV environment
eval "$(conda shell.bash hook)"
conda activate phenosv

# input_files.txt lists one SV file path per line
CONFIG_FILE="/home/tigem/h.poddar/structural_varinats/PhenoSV/input_files.txt"
TARGET_FOLDER="/home/tigem/h.poddar/structural_varinats/PhenoSV/final_test"
phenosvsh="/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.sh"
THREADS=64

# Pick the input file for this array task (array indices start at 0)
mapfile -t INPUT_FILES < "$CONFIG_FILE"
SV_FILE="${INPUT_FILES[$SLURM_ARRAY_TASK_ID]}"

echo "Processing SV file: ${SV_FILE}"
echo "Target folder: ${TARGET_FOLDER}"

bash "${phenosvsh}" "${SV_FILE}" "${TARGET_FOLDER}" "${THREADS}" 'HP:0000707,HP:0007598'

echo "PhenoSV processing completed for ${SV_FILE}!"

Traceback (most recent call last):
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 177, in <module>
    main()
  File "/net/192.168.120.240/home/tigem/h.poddar/structural_varinats/PhenoSV/phenosv/model/phenosv.py", line 122, in main
    sv_df.columns = ['CHR', 'START', 'END', 'ID', 'SVTYPE']
  File "/home/tigem/h.poddar/miniconda3/envs/phenosv/lib/python3.10/site-packages/pandas/core/generic.py", line 5588, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__
  File "/home/tigem/h.poddar/miniconda3/envs/phenosv/lib/python3.10/site-packages/pandas/core/generic.py", line 769, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/home/tigem/h.poddar/miniconda3/envs/phenosv/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 214, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/home/tigem/h.poddar/miniconda3/envs/phenosv/lib/python3.10/site-packages/pandas/core/internals/base.py", line 69, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 1 elements, new values have 5 elements
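
The ValueError above means the table that was read has a single column while five names are being assigned. A common cause, though not confirmed for this case, is that the delimiter in the file differs from the one the reader expects, so each line is parsed as one field. The sketch below is a hypothetical stand-alone check (not part of PhenoSV) that reports which candidate delimiter yields exactly five fields on every line.

```python
import io


def column_counts(handle, delimiters=(",", "\t", " ")):
    """For each candidate delimiter, collect the set of field counts seen across non-empty lines."""
    counts = {d: set() for d in delimiters}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        for d in delimiters:
            counts[d].add(len(line.split(d)))
    return counts


def guess_delimiter(handle, expected_columns=5):
    """Return the delimiter under which every line has expected_columns fields, else None."""
    for delim, seen in column_counts(handle).items():
        if seen == {expected_columns}:
            return delim
    return None


if __name__ == "__main__":
    sample = "chr1\t100\t200\tsv1\tdeletion\nchr2\t300\t400\tsv2\tduplication\n"
    print(repr(guess_delimiter(io.StringIO(sample))))
```

A result of None suggests that lines are inconsistent or that none of the candidate delimiters splits them into five fields, which would be worth fixing before passing the file to PhenoSV.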
