Proposed fix for issue #47 by alex-selvita · Pull Request #48 · jvkersch/tmtools

alex-selvita · 2024-10-27T05:15:39Z

Added support for residues from the extended IUPAC set and for UNK (unknown) residues when converting 3-letter residue codes to a 1-letter sequence.

Fixes #47

Add support for SEC (Selenocysteine) and UNK (unknown) residue encodings

jvkersch · 2024-10-27T05:29:23Z


 import numpy as np

+protein_letters_3to1['UNK'] = 'X'


@alkorolyov-selvita Thanks for the fix! This does the job but has the side-effect of modifying the IUPACData dictionary, which "belongs to" BioPython (so that if this dictionary is used elsewhere in a user's code, the modification will also be visible). I would propose having an explicit fix: e.g. check for residue.resname == 'UNK' on the line below, and then insert X or the appropriate 1-letter code in the sequence:

tmtools/tmtools/io.py

Line 119 in 10dde1d

seq.append(protein_letters_3to1[residue.resname])
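A minimal sketch of the explicit check suggested in the comment above. The dictionary is a small stand-in for Biopython's `protein_letters_3to1`, and `residues_to_sequence` and the `SimpleNamespace` residues are hypothetical names used only for illustration; the actual loop in `tmtools/io.py` may look different.

```python
from types import SimpleNamespace

# Stand-in for a few entries of Biopython's protein_letters_3to1.
# Assumption: the real code imports this table instead of redefining it.
protein_letters_3to1 = {"ALA": "A", "CYS": "C", "GLY": "G"}


def residues_to_sequence(residues):
    """Build a 1-letter sequence, mapping UNK to X without editing the table."""
    seq = []
    for residue in residues:
        if residue.resname == "UNK":
            # Handle the unknown residue explicitly rather than adding a key
            # to the shared dictionary.
            seq.append("X")
        else:
            seq.append(protein_letters_3to1[residue.resname])
    return "".join(seq)


# Hypothetical residues: only the resname attribute matters for this sketch.
residues = [SimpleNamespace(resname=name) for name in ("ALA", "UNK", "GLY")]
sequence = residues_to_sequence(residues)
print(sequence)
```

With this shape the shared table is only read, so other code that imports it sees no change.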


jvkersch

@alkorolyov-selvita Thanks for the fix. I've left one comment; could you also add a test to exercise the fix?

alex-selvita · 2024-10-27T08:13:24Z

Hello @jvkersch, thanks for your comments. Sure, I will update the code and try to add some basic tests for the new changes.

jvkersch · 2024-12-15T21:15:07Z

@alkorolyov-selvita Did you have time to look into this? I think there are two issues, which are easy to address.

The first is that PDB (apparently) uses all-uppercase residue names, whereas the keys in protein_letters_3to1_extended are capitalized (only the first letter is uppercase, e.g. 'Ala'), so a direct lookup fails.

>>> from Bio.PDB.Polypeptide import protein_letters_3to1
>>> protein_letters_3to1
{'ALA': 'A', 'CYS': 'C', 'ASP': 'D', 'GLU': 'E', 'PHE': 'F', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I', 'LYS': 'K', 'LEU': 'L', 'MET': 'M', 'ASN': 'N', 'PRO': 'P', 'GLN': 'Q', 'ARG': 'R', 'SER': 'S', 'THR': 'T', 'VAL': 'V', 'TRP': 'W', 'TYR': 'Y'}
>>> from Bio.Data.IUPACData import protein_letters_3to1_extended
>>> protein_letters_3to1_extended
{'Ala': 'A', 'Cys': 'C', 'Asp': 'D', 'Glu': 'E', 'Phe': 'F', 'Gly': 'G', 'His': 'H', 'Ile': 'I', 'Lys': 'K', 'Leu': 'L', 'Met': 'M', 'Asn': 'N', 'Pro': 'P', 'Gln': 'Q', 'Arg': 'R', 'Ser': 'S', 'Thr': 'T', 'Val': 'V', 'Trp': 'W', 'Tyr': 'Y', 'Asx': 'B', 'Xaa': 'X', 'Glx': 'Z', 'Xle': 'J', 'Sec': 'U', 'Pyl': 'O'}
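To make the mismatch concrete, here is a small sketch using two entries copied from the extended table printed above. The name `extended_subset` is hypothetical, and normalizing with `str.capitalize()` is just one possible way to line the keys up, not necessarily what the library should do.

```python
# Two entries copied from the extended table shown above (a stand-in, so the
# sketch runs without Biopython installed).
extended_subset = {"Sec": "U", "Pyl": "O"}

resname = "SEC"  # PDB-style residue name, all uppercase

# A direct lookup misses because the key is stored as 'Sec'.
found_directly = resname in extended_subset

# Normalizing the case of the residue name makes the key match.
found_capitalized = resname.capitalize() in extended_subset

print(found_directly, found_capitalized)  # False True
```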

The second is that the protein_letters_3to1_extended dictionary should not be modified by this code. There are a few workarounds: one is to create a copy of protein_letters_3to1_extended internal to this library and modify that. That would also allow you to do something about the capitalization:

_PROTEIN_LETTERS_3TO1_INTERNAL = {k.upper(): v for (k, v) in protein_letters_3to1_extended.items()}
_PROTEIN_LETTERS_3TO1_INTERNAL["UNK"] = "X"
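Since the review also asks for a test, the following sketch shows what such a check could assert for the internal-copy approach. `source_table` is a stand-in for a few entries of Biopython's extended table, and `three_to_one` is a hypothetical helper, not an existing tmtools function.

```python
# Stand-in for a few entries of Biopython's protein_letters_3to1_extended.
source_table = {"Ala": "A", "Sec": "U", "Xaa": "X"}

# Library-internal copy with upper-cased keys; the comprehension builds a new
# dict, so the source table is never modified.
_PROTEIN_LETTERS_3TO1_INTERNAL = {k.upper(): v for k, v in source_table.items()}
_PROTEIN_LETTERS_3TO1_INTERNAL["UNK"] = "X"


def three_to_one(resname):
    """Hypothetical helper: map an uppercase 3-letter code to its 1-letter code."""
    return _PROTEIN_LETTERS_3TO1_INTERNAL[resname]


def test_unknown_and_extended_residues():
    # Extended and unknown residues resolve through the internal copy.
    assert three_to_one("SEC") == "U"
    assert three_to_one("UNK") == "X"
    # The source table keeps its original keys and gains no UNK entry.
    assert "UNK" not in source_table
    assert set(source_table) == {"Ala", "Sec", "Xaa"}


test_unknown_and_extended_residues()
```

Written as a plain function with `assert` statements, the check can be picked up by pytest or called directly as done on the last line.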

alex-selvita · 2024-12-18T08:34:44Z

Hello @jvkersch ,

Sorry for the late reply; I was struggling a bit to configure the dev environment properly. For the moment I have just reimplemented get_residue_data() locally to include the necessary residue codes. I will try to get back to this issue later.

jvkersch · 2024-12-18T10:04:59Z

@alkorolyov-selvita Let me know if you have any questions about setting up a dev environment. Personally, I use an empty Conda environment and then I install the package in editable mode (pip install -e . -v). My editor picks up the environment, and the only thing I have to do manually is recompile the C++ code when it changes (but this should not affect you).

Update io.py

10dde1d

Add support for SEC (Selenocysteine) and UNK (unknown) residue encodings

jvkersch reviewed Oct 27, 2024


jvkersch requested changes Oct 27, 2024

alex-selvita wants to merge 1 commit into jvkersch:main from alex-selvita:patch-1
