Proposed fix for issue #47 #48
Add support for SEC (selenocysteine) and UNK (unknown) residue encodings
import numpy as np
protein_letters_3to1['UNK'] = 'X'
@alkorolyov-selvita Thanks for the fix! This does the job but has the side-effect of modifying the IUPACData dictionary, which "belongs to" BioPython (so that if this dictionary is used elsewhere in a user's code, the modification will also be visible). I would propose having an explicit fix: e.g. check for residue.resname == 'UNK' on the line below, and then insert X or the appropriate 1-letter code in the sequence:
Line 119 in 10dde1d
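A minimal sketch of the explicit check suggested in the comment above. It is not the code from the repository: `one_letter_code` is a hypothetical helper, and the small dictionary is a stand-in excerpt for Biopython's `protein_letters_3to1` (whose keys are title case, e.g. `"Ala"`), so that the example runs on its own. The point is only that `UNK` is handled before the lookup, so the shared dictionary is never modified.

```python
# Stand-in excerpt for Biopython's shared mapping (assumption: the real
# dictionary comes from Bio.Data.IUPACData and has title-case keys).
protein_letters_3to1 = {"Ala": "A", "Gly": "G", "Cys": "C"}


def one_letter_code(resname, unknown="X"):
    """Return the 1-letter code for a 3-letter residue name (hypothetical helper).

    UNK is handled explicitly, so the shared dictionary is left untouched.
    """
    if resname == "UNK":
        return unknown
    # PDB residue names are uppercase ("ALA"); the keys are title case ("Ala").
    return protein_letters_3to1[resname.capitalize()]


# Build a sequence from a list of residue names, including an unknown one.
sequence = "".join(one_letter_code(name) for name in ["ALA", "UNK", "GLY"])
```

With this approach, other code that imports the same Biopython dictionary does not see an extra `UNK` entry.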
jvkersch left a comment
@alkorolyov-selvita Thanks for the fix. I've left one comment; could you also add a test to exercise the fix?
Hello @jvkersch, thanks for your comments. Sure, I will update the code and try to add some basic tests for the new changes.
@alkorolyov-selvita Did you have time to look into this? I think there are two issues, which are easy to address. The first is that (apparently) PDB uses uppercase for the residues, whereas the keys in [...] The second is that the [...]
_PROTEIN_LETTERS_3TO1_INTERNAL = {k.upper(): v for (k, v) in protein_letters_3to1.items()}
_PROTEIN_LETTERS_3TO1_INTERNAL["UNK"] = "X"
Hello @jvkersch, sorry for the late reply. I was struggling a bit to properly configure the dev environment. For the moment I have just reimplemented get_residue_data() locally to include the necessary residue codes. I will try to get back to this issue later.
@alkorolyov-selvita Let me know if you have any questions about setting up a dev environment. Personally, I use an empty Conda environment and then I install the package in editable mode ([...]
Added support for residues from the extended IUPAC set and for UNK (unknown) residues when converting 3-letter codes to a 1-letter sequence.
Fixes #47