-
Notifications
You must be signed in to change notification settings - Fork 666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PDB writer does not reorder atoms as required by PDB standard #3452
Comments
If I understand the problem properly here, as much as I like to aim for as much standard compliance as possible - I'm tempted to treat this as wontfix. There is no way for us to know what the sequence of a protein is at write time, nor that what you have actually counts as a standard protein (e.g. cyclic peptides). In my opinion, the best thing to do here is assuming that user supplied ordering is the right ordering? One potential thing we could do is validate for non-sequential resid? But then we'd have to capture chain changes etc... I'm not sure, is that the source of the error here @btodac ? |
I agree with @IAlibay , that’s out of scope. We should document it, though. |
It is not the protein sequence that is the issue but the order of the atom records within each residue. ATOM 1 N SER X 1 150.950 201.230 48.160 1.00 0.00 seg_ N I am currently making a fork and will implement a method for ensuring the correct order as the atom group should contain all the information required to produce the PDB standard format. |
We're happy to have a look at proposed code. By default we will not write out PDB files with changed atom order as this will destroy most people's workflows. We could consider a keyword arg that reorders on writing. That's something we can discuss on a PR. |
It might be better to write a function that does the juggling and returns
an AtomGroup. Then the writer can just write that juggled AG. More
transparent this way.
…On Thu, Nov 4, 2021 at 21:41, Oliver Beckstein ***@***.***> wrote:
We're happy to have a look at proposed code.
By default we will not write out PDB files with changed atom order as this
will destroy most people's workflows.
We could consider a keyword arg that reorders on writing. That's something
we can discuss on a PR.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#3452 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACGSGB7IAKSHCCUN2HXMDXLUKL4ZXANCNFSM5HMA72RA>
.
|
Just to add:
Note that the atom ordering will always be determined by GROMACS (or CHARMM/NAMD, ...) depending on the topology (force field) building blocks. I wanted to make clear that MDAnalysis's primary purpose is to act as a layer between simulation formats and numpy arrays. It's not our job to question the ordering of atoms in the topology. We will adhere to existing format specs as well as possible. However, "the" PDB format in particular has been so badly abused by the simulation community that we cannot walk the high ground and insist on all aspects of the format specs. Instead we have to support what the majority of users use. MDAnalysis would not be the right tool to prepare a PDB for the Protein Databank (in any case, they don't want anyone to use the PDB format anymore anyway, and rightly so). We are a bit slow on supporting the recommended new format PDBx/mmcif but see #2367 for using converters. All of this is just to give you an idea why we are not super-enthusiastic about adding code to MDAnalysis to change the order of atoms on output. |
This was is my intention, but it could be provided as an option to the pdb writer. The reason this has come to my attention is that I need to process gromacs outputs into inputs for another package (HADDock). I was unaware of this atom ordering until my HADDock job failed. It is rather finicky and quite specialised but this issue may occur elsewhere too. I can't imagine implementing the correct order would create problems with other software as either they don't care about the ordering or they expect the PDB standard. |
To get this issue to an actionable conclusion:
Atom order is really important when the files are used as topologies for trajectories. Changing the order will break so many things. Therefore, we will not by default change the output order for PDB files. I labeled the issue as "documentation" so we will add a note to the PDB writer docs. If anyone is interested in providing code to robustly reorder atoms then we can consider it as a utility function and perhaps as an option for the PDBWriter. However, I'd consider this issue closed once the behavior is documented. |
@orbeckst Hi! I am having the same issue here. Was this ever solved? Is there a way to fix it using a Python script? I found an indirect way of reordering atoms in PDB files to the canonical non-GROMACS order of N, CA, C, O..., which is to load the file in Pymol, and then selecting "Export Molecule" This brings atoms back from N, CA, ..., C, O but it is rather tedious. |
Hi @hector-s-m. Given the discussion above and that there is no open PR against this issue, I'd assume it has not been "fixed" (see the discussion above for why I used quotes). However, since you mentioned that PyMol does the re-ordering you want, does PyMol's Python API work too? It will still be a bit annoying to load/save the file with PyMol, but having it within a script might make it a bit less tedious (it it works, I haven't tried...). Something like
|
Yeah... Thanks for looking into this. I had this idea earlier but for some reason, despite installing Pymol in my conda environment, I can't get it to work. :( |
@hector-s-m no immediate code changes are planned to address this issue and as you see from the discussion, we probably won't implement anything directly in the PDBWriter. We will document the behavior. We will consider a utility function for reordering atoms in a Universe. That could be generally useful and can then be used with other formats, too. Something like u = mda.Universe("gromacs.pdb")
p = u.select_atoms("protein")
p_reordered = reorder(p) # reorder() needs to be written by someone!
not_protein = u.atoms - p
u_reordered = mda.Merge(p_reordered, not_protein)
u_reordered.atoms.write("reordered.gro") # can write as a GRO or PDB or ... |
Expected behavior
The PDB format states:
Actual behavior
The writer just outputs in the order given in the atom group (only tested using proteins, but the writer is the same)
Code to reproduce the behavior
I imported a GROMACS trajectory, which I assume records the atoms in a different order
Current version of MDAnalysis
2.0.0
Python 3.9.7
Ubuntu
The text was updated successfully, but these errors were encountered: