PDBx/mmCIF Reader/Topology Reader #2367
Comments
Hi @JoaoRodrigues , new formats are a great addition. Can you just create a PR and then we can comment directly on the code, run tests, etc? |
Sounds good. Will do. Thanks! |
Pure python implementation: https://github.com/Electrostatics/mmcif_pdbx |
Also, chemfiles reads mmCIF and can be used inside MDAnalysis, see Reading trajectories with chemfiles. |
Note on trying to use chemfiles: With the 2.0.0b, u = mda.Universe("1ake.cif.gz", topology_format="chemfiles") fails with
I can't just read it with
either as it gives the same ValueError. Just trying something stupid with the topology as a PDB also fails u = mda.Universe("1ake.pdb.gz", "1ake.cif.gz", format="chemfiles") with
Conclusion: I couldn't get it to work with chemfiles. Maybe @Luthaf has some ideas, but a key problem seems to be that our (EDIT) chemfiles converter does not work for topologies. |
I think that topology reading is not registered with MDA, since the chemfiles adapter is implemented as a coordinate reader/writer. Although the conversion from chemfiles to MDA topologies is already implemented, so it should be mostly a question of adding a new
This is a bit strange and probably a bug. Is this 1ake from the wwPDB? |
Ok, I understand this part. The core of the issue is that two relatively different formats want to use the same extension: mmCIF and crystallography CIF. While they both use the same STAR format, they specify data in different ways. Unfortunately, chemfiles is associating the There is a simple workaround though, since you can specify the format to use manually, with something like u = mda.Universe("1ake.pdb.gz", "1ake.cif.gz", format="chemfiles", chemfiles_format="mmCIF / GZ") Unfortunately, this still fails in the case of 1AKE (but should work for other files). I'll fix the 1AKE issue; it should be working in the next patch release. I would also like to introduce better format guessing, to decide between mmCIF and crystallography CIF on the fly instead of having the user specify it manually. |
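To illustrate the format-guessing idea mentioned above, here is a minimal sketch that uses only the Python standard library. It is not how chemfiles does or will do it. It relies on the convention that mmCIF data names have the form _category.item (for example _atom_site.Cartn_x), whereas crystallography CIF names use underscores throughout (for example _atom_site_fract_x). The function names guess_cif_flavor and guess_chemfiles_format are hypothetical, and the "CIF / GZ" string is an assumption by analogy with the "mmCIF / GZ" string quoted above.

```python
import gzip
import re

# Heuristic only: mmCIF data names look like "_category.item",
# core (crystallography) CIF names use underscores throughout.
MMCIF_ATOM_TAG = re.compile(r"^\s*_atom_site\.\w+", re.MULTILINE)
CORE_CIF_ATOM_TAG = re.compile(r"^\s*_atom_site_\w+", re.MULTILINE)


def guess_cif_flavor(text):
    """Return 'mmCIF', 'CIF', or None if neither naming pattern is found."""
    if MMCIF_ATOM_TAG.search(text):
        return "mmCIF"
    if CORE_CIF_ATOM_TAG.search(text):
        return "CIF"
    return None


def guess_chemfiles_format(path):
    """Hypothetical helper: pick a format string for a (possibly gzipped) file."""
    compressed = path.endswith(".gz")
    opener = gzip.open if compressed else open
    with opener(path, "rt") as handle:
        flavor = guess_cif_flavor(handle.read())
    if flavor is None:
        raise ValueError(f"cannot tell mmCIF from CIF for {path!r}")
    # "CIF / GZ" is assumed by analogy with "mmCIF / GZ" from the comment above.
    return f"{flavor} / GZ" if compressed else flavor
```

The result could then be passed as chemfiles_format=... in the mda.Universe call shown in the comment above, so the user does not have to choose the flavor by hand.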
Yes, I downloaded the file directly from the PDB: sorry I didn’t try any others.
|
Hi everyone, Following the suggestions above, I've successfully created a Universe with the following lines, in the conda env 'mda-workshop2021' with version 2.0.0b.
With these lines I can see atoms and chains and other attributes. But I've just realised that the attribute dimensions had not been created. When I try to run the command
I'm getting this AttributeError: Glad if you can help! |
@flautodipan does your mmCIF file contain a box? If I do the following with 1ake.cif
    import MDAnalysis as mda
    from simtk.openmm.app import pdbxfile
    structure = pdbxfile.PDBxFile("1ake.cif")
    u = mda.Universe(structure)
(EDIT: Note that you normally do not need to explicitly import converters/parsers, i.e., you don't need
I can do a selection (note that I added a name for resname)
    >>> u.select_atoms("around 2.2 resname ARG")
    <AtomGroup with 52 atoms>
without issues. Check that you have the dimensions with
    >>> u.dimensions
    array([73.200005, 79.8 , 85. , 90. , 90. , 90. ], dtype=float32)
and verify that you have the corresponding elements in the mmCIF file
|
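One way to check for the corresponding unit cell elements in the mmCIF file without any third-party library is sketched below. It assumes the cell is stored as plain key-value items in the _cell category (length_a, length_b, length_c, angle_alpha, angle_beta, angle_gamma), which is the usual layout in PDB-distributed mmCIF files. The SAMPLE text is an illustrative fragment whose values mirror the u.dimensions output above; it is not copied from the real 1ake.cif. The function name read_cell is hypothetical.

```python
import re

# Illustrative mmCIF fragment; values mirror the u.dimensions output above.
SAMPLE = """\
data_1AKE
_cell.entry_id           1AKE
_cell.length_a           73.200
_cell.length_b           79.800
_cell.length_c           85.000
_cell.angle_alpha        90.00
_cell.angle_beta         90.00
_cell.angle_gamma        90.00
"""

CELL_ITEMS = ("length_a", "length_b", "length_c",
              "angle_alpha", "angle_beta", "angle_gamma")


def read_cell(text):
    """Return [a, b, c, alpha, beta, gamma], or None if any _cell item is missing."""
    values = []
    for item in CELL_ITEMS:
        match = re.search(rf"^_cell\.{item}\s+(\S+)", text, re.MULTILINE)
        # "?" and "." are the CIF placeholders for unknown / not applicable.
        if match is None or match.group(1) in {"?", "."}:
            return None
        values.append(float(match.group(1)))
    return values


print(read_cell(SAMPLE))
```

If read_cell returns None for your file, the box information is missing or unset, which would be consistent with a Universe that has no dimensions.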
OK, now it is indeed working, and it is much simpler. So to summarize: use the OpenMM topology generated by the PDBxFile class in simtk.openmm.app.pdbxfile to create an MDA universe from mmCIF files. |
We have to improve our documentation to make it clearer that users almost never need to do anything directly with the Readers and Converters. @lilyminium @IAlibay @fiona-naughton this might be something to keep in mind going forward. Maybe start with an Interoperability in practice blog post (collecting some of @jbarnoud 's examples from the workshop, too) and then see how we can make a User Guide entry from that? |
The solution proposed by @orbeckst using |
The mmCIF/PDBx format would also be needed for alphafold #3377 . |
does anyone have a solution for writing an MDAnalysis universe as PDBx/mmCIF? |
you could use |
I just tried
|
Sorry to cross-post on a different project, but could you share an input file? |
I wonder if it'd be ok to use IMO it's one of the best supported crystallography-related libraries, is maintained by ccp4 and globalphasing, and allows very detailed cif parsing and writing. For example, reading all atom IDs and coordinates could be done as simply as:
    import gemmi
    import numpy as np

    # read only ATOM records of chain A from the first model
    model = gemmi.read_structure('6rz6.cif')[0]
    residues = [res for res in model['A'] if res.het_flag == 'A']  # non-heteroatom residues only, for simplicity
    arr = np.array([[at.serial, *at.pos.tolist()] for res in residues for at in res])
    ids, xyzs = arr[:, 0].astype(int), arr[:, 1:]
the whole discussion with devs here |
@marinegor I think @richardjgowers has some ideas in this area |
@richardjgowers could you share them (here, if it's appropriate place, or somewhere on discord)? |
@marinegor I'd done this at a hackathon: #4303 The problem with this approach is that it doesn't do the "table join" on conect records that mmcif relies on, so you won't get some data (bonds). I've since done this: https://github.com/OpenFreeEnergy/pdbinf which does do the "table join" to get bonds. It's into rdkit format, but conversion is trivial... |
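To make the "table join" idea concrete, here is a toy sketch in plain Python. It is not the pdbinf implementation and not what #4303 does. It assumes a simplified atom table and per-component bond templates in the spirit of the _chem_comp_bond category, and joins them on component id and atom name within each residue. All rows and the name join_bonds are hypothetical. Only intra-residue bonds are covered; inter-residue links (for example peptide bonds) would need additional data.

```python
from collections import defaultdict

# Hypothetical, simplified rows: (atom index, residue number, component id, atom name)
atoms = [
    (0, 1, "GLY", "N"),
    (1, 1, "GLY", "CA"),
    (2, 1, "GLY", "C"),
    (3, 1, "GLY", "O"),
]

# Hypothetical bond templates: (component id, first atom name, second atom name)
templates = [
    ("GLY", "N", "CA"),
    ("GLY", "CA", "C"),
    ("GLY", "C", "O"),
]


def join_bonds(atoms, templates):
    """Join atoms to templates on (component id, atom name), residue by residue."""
    by_residue = defaultdict(dict)
    for index, resnum, comp_id, name in atoms:
        by_residue[(resnum, comp_id)][name] = index

    bonds = []
    for (resnum, comp_id), names in by_residue.items():
        for template_comp, name_1, name_2 in templates:
            # Skip templates for other components or for atoms absent from the model.
            if template_comp == comp_id and name_1 in names and name_2 in names:
                bonds.append((names[name_1], names[name_2]))
    return bonds


print(join_bonds(atoms, templates))
```

The returned index pairs are the kind of data a topology parser would need in order to populate bonds; a parser that reads only the atom table never sees them.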
Hi all,
I've been using OpenMM to run simulations for the past couple of years. Unlike other tools, OpenMM does not write a 'topology' file (e.g. .gro, .prmtop) but instead creates a Topology object on the fly when loading structures. Because simulation systems tend to be quite large, I've taken to using PDBx/mmCIF files as my default file format for writing structures that I use as topologies.
I'd like to start using MDAnalysis, but right now this involves jumping through a bunch of hoops to get my topologies into a format that is parseable. It'd be much easier if I could just load a PDBx/mmCIF file as a topology, especially since it's now the default file format for structures in the PDB.
To this end, I've started working on writing a simple PDBxParser class. I wouldn't mind extending it to a PDBxReader/Writer class, but that is not necessarily a priority (especially the writer). Would this be an interesting feature to add in your opinion? See the code here, I modeled the class after PDBParser.
Thanks for the great work with the library so far!
EDIT: Since I cannot label issues, I'm just editing the title of the issue for now to make it clear!
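For readers unfamiliar with the format, the core of what a topology parser needs from a PDBx/mmCIF file is the _atom_site loop. The sketch below is not the PDBxParser linked above. It is a minimal standard-library illustration that assumes the atom table is written as a loop_ block with one row per line and no multi-line (semicolon-delimited) values. The SAMPLE text and the name parse_atom_site are made up for the example.

```python
import shlex

SAMPLE = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N MET 26.981 53.977 40.085
ATOM 2 CA MET 26.091 52.849 39.889
#
"""


def parse_atom_site(text):
    """Return the _atom_site loop as a dict mapping item name to a list of strings."""
    columns = {}
    names = []
    in_loop = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("_atom_site."):
            # Header line: remember the column name, e.g. "Cartn_x".
            names.append(stripped.split(".", 1)[1])
            columns[names[-1]] = []
            in_loop = True
        elif in_loop:
            if not stripped or stripped[0] in "#_" or stripped == "loop_":
                break  # end of the _atom_site table
            fields = shlex.split(stripped)  # honours quoted values such as "O5'"
            if len(fields) != len(names):
                raise ValueError(f"expected {len(names)} fields, got {len(fields)}")
            for name, value in zip(names, fields):
                columns[name].append(value)
    return columns
```

Calling parse_atom_site(SAMPLE)["label_atom_id"] gives the atom names as strings; converting the Cartn_x, Cartn_y and Cartn_z columns with float yields coordinates. A real parser would also need to handle multiple models, alternate locations and the full STAR quoting rules.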