antibody.linuxgccrelease - output models don't have the same sequence as the input fasta


hi,

I am using the antibody.linuxgccrelease application to model camelid heavy-chain-only antibodies.

when using the command:

antibody.linuxgccrelease -exclude_homologs true -vhh_only -fasta my_fasta.fa | tee grafting.log

 

for example with my_fasta.fa:

>heavy
QVQLQESGPSLVKPSQTLSLTCTVSGLSLSDHNVGWIRQAPGKALEWLGVIYKEGDKDY
NPALKSRLSITKDNSKSQVSLSLSSVTTEDTATYYCATLGCYFVEGVGYDCTYGLQHTTF
HDAWGQGLLVTVSS

I often get models that don't have the same sequence as the input fasta, mainly missing the last serine (as with the my_fasta.fa example), but sometimes with additional amino acids at the beginning or some missing amino acids at the end.

the weirdest result I got is when running with the following fasta file:

>heavy
QVQLVQSGAEVVKPGASVKVSCKASGYAFSSSWMNWVRQAPGQGLEWIGRIYPGDGDTN
YAQKFQGKATLTADKSTSTAYMELSSLRSEDTAVYFCAREYDEAYWGQGTLVTVSS

I got models with the following sequence:

QLVQQSGAEVVKPGASVKVSCKASGYAFSSSWMNWVRQAPGQGLEWIGRIYPGDGDTN
YAQKFQGKATLTADKSTSTAYMELSSLRSEDTAVYFCAREYDEAYWGQGTLVTVS

(the first two amino acids and the last amino acid are missing, and an extra Q was added).
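
A comparison like this can be scripted rather than done by eye. The following is only a minimal sketch using Python's standard difflib module; the function name classify_differences and the terminal/internal labels are made up for illustration and are not part of Rosetta. It lists every edit needed to turn the input sequence into the model sequence and marks whether the edit touches an end of the input.

```python
from difflib import SequenceMatcher


def classify_differences(input_seq, model_seq):
    """Return (kind, location, input_piece, model_piece) for every edit
    needed to turn input_seq into model_seq."""
    matcher = SequenceMatcher(None, input_seq, model_seq, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # An edit that touches either end of the input counts as terminal;
        # anything else changes the interior of the sequence.
        at_terminus = i1 == 0 or i2 == len(input_seq)
        location = "terminal" if at_terminus else "internal"
        differences.append((tag, location, input_seq[i1:i2], model_seq[j1:j2]))
    return differences


if __name__ == "__main__":
    # Second FASTA from this post and the sequence of the models it produced.
    input_seq = (
        "QVQLVQSGAEVVKPGASVKVSCKASGYAFSSSWMNWVRQAPGQGLEWIGRIYPGDGDTN"
        "YAQKFQGKATLTADKSTSTAYMELSSLRSEDTAVYFCAREYDEAYWGQGTLVTVSS"
    )
    model_seq = (
        "QLVQQSGAEVVKPGASVKVSCKASGYAFSSSWMNWVRQAPGQGLEWIGRIYPGDGDTN"
        "YAQKFQGKATLTADKSTSTAYMELSSLRSEDTAVYFCAREYDEAYWGQGTLVTVS"
    )
    for difference in classify_differences(input_seq, model_seq):
        print(difference)
```

Edits reported as "terminal" are additions or deletions at the ends; an "internal" edit means the sequence itself was changed somewhere in the middle.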

this problem causes errors when trying to compute the RMSD, for example in the CDR3 modeling stage with -in:file:native (with antibody_H3.linuxgccrelease I get an error because the sequences are different).

 

Is there any way to solve this issue?

Really appreciate the help!

Thanks.

Thu, 2021-01-21 15:35
agctomer

Hi,

Thanks for reaching out. A change in sequence should definitely not happen.

We just looked into this and found it is due to the template structure selected for your heavy framework region (PDB 1RHH). This structure has an additional residue (glutamine) at the very beginning, numbered as residue 6A, which does not get replaced with your input sequence because our code cannot handle these insertions. They are very rare, and we hadn't noticed this until now.
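
To check whether a given template file carries residues numbered like 6A, the insertion-code column of the PDB format can be scanned. This is a rough sketch, not Rosetta code: the function name is hypothetical, and it relies only on the fixed-column ATOM record layout (atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26, insertion code in 27). Antibody-numbered structures commonly carry insertion codes in the CDR loops as well, so the list still has to be inspected by hand for unexpected positions.

```python
def residues_with_insertion_codes(pdb_lines):
    """List (chain, residue_number, insertion_code, residue_name) for every
    residue whose PDB record carries an insertion code."""
    found = []
    for line in pdb_lines:
        # Look only at CA atoms so that each residue is considered once.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        insertion_code = line[26:27].strip()  # column 27 of the PDB format
        if not insertion_code:
            continue
        entry = (
            line[21],           # chain ID, column 22
            int(line[22:26]),   # residue number, columns 23-26
            insertion_code,
            line[17:20],        # residue name, columns 18-20
        )
        # Alternate locations can repeat a CA atom; report each residue once.
        if entry not in found:
            found.append(entry)
    return found
```

The function takes any iterable of lines, so an open file handle for the template PDB can be passed in directly.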

We are working on updating the database to prevent this behavior in the future, but for now your easiest solution would be to exclude that specific PDB using the -antibody:exclude_pdb flag. 
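
For a batch of sequences, the workaround could be scripted along these lines. This is a sketch under assumptions: the helper build_antibody_command is hypothetical, the other flags are copied from the command in the original post, and both the space-separated argument form and the lower-case spelling "1rhh" for -antibody:exclude_pdb are guesses that should be checked against the option's help text.

```python
import shlex


def build_antibody_command(fasta_path, excluded_pdbs):
    """Assemble the antibody.linuxgccrelease call from the original post,
    adding -antibody:exclude_pdb when there are templates to skip."""
    command = [
        "antibody.linuxgccrelease",
        "-exclude_homologs", "true",
        "-vhh_only",
        "-fasta", fasta_path,
    ]
    if excluded_pdbs:
        # Assumption: the flag accepts one or more space-separated PDB codes.
        command += ["-antibody:exclude_pdb", *excluded_pdbs]
    return command


if __name__ == "__main__":
    # Print the command line instead of running it, so it can be reviewed first.
    print(shlex.join(build_antibody_command("my_fasta.fa", ["1rhh"])))
```

The returned list can be handed to subprocess.run once the printed command looks right.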

It will also work fine if you turn the exclude_homologs flag off, because a different template will be selected, but I am assuming you need that flag for your purposes.

Hope this helps.

Tue, 2021-01-26 08:27
rahelf

hi,

Thanks for the quick response!

I will mention that this happens for me every time I try to model only the heavy chain of a traditional antibody (it happens for me in all of the 47 sequences I tried).

as I mentioned before, the output model is usually missing the last amino acid and adds/removes some amino acids at the beginning of the sequence.

when modeling real nanobodies this problem doesn't seem to happen.

I thought that with the -vhh_only flag the program could also handle modeling only the heavy chain of traditional antibodies... so maybe this is the problem.

Thanks again for the help!

Wed, 2021-01-27 01:26
agctomer

Hi,

For your 47 sequences, do you get an insertion every time? I am not concerned about additions or deletions at the termini: Rosetta just adjusts your sequence to correspond to an Fv region. If you give Rosetta an entire IgG sequence (including the constant domains), it will also trim away everything that is not the Fv. The insertion you got, on the other hand, should not happen, as it alters the sequence you provided.

Wed, 2021-01-27 11:13
rahelf

yes, insertions or deletions (or both) at the termini.

the only one with an insertion in the middle of the sequence is the one I sent above.

 

Sun, 2021-01-31 01:06
agctomer