I am using the antibody.linuxgccrelease application to model camelid heavy-chain-only antibodies.
When using the command:
antibody.linuxgccrelease -exclude_homologs true -vhh_only -fasta my_fasta.fa | tee grafting.log
for example with my_fasta.fa:
>heavy
QVQLQESGPSLVKPSQTLSLTCTVSGLSLSDHNVGWIRQAPGKALEWLGVIYKEGDKDY
NPALKSRLSITKDNSKSQVSLSLSSVTTEDTATYYCATLGCYFVEGVGYDCTYGLQHTTF
HDAWGQGLLVTVSS
I often get models that don't have the same sequence as the input FASTA. Mostly the last serine is missing (as with the my_fasta.fa example), but sometimes there are additional amino acids at the beginning or some missing amino acids at the end.
The weirdest result I got was when running with the following FASTA file:
>heavy
QVQLVQSGAEVVKPGASVKVSCKASGYAFSSSWMNWVRQAPGQGLEWIGRIYPGDGDTN
YAQKFQGKATLTADKSTSTAYMELSSLRSEDTAVYFCAREYDEAYWGQGTLVTVSS
I got models with the following sequence:
(the first two amino acids and the last amino acids are missing and an extra Q was added).
This problem causes errors when trying to compute the RMSD, for example in the CDR3 modeling stage with in:file:native (with antibody_H3.linuxgccrelease I get an error because the sequences are different).
Is there any way to solve this issue?
Really appreciate the help!
Thanks for reaching out. A change in sequence should definitely not happen.
We just looked into this and found it is due to the template structure selected for your heavy framework region (PDB 1RHH). This structure has an additional residue (glutamine) at the very beginning, numbered as residue 6A, which does not get replaced with your input sequence because our code cannot handle such insertions. They are very rare, and we had not noticed this until now.
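To check a template or an output model for residues like 6A, the PDB file can be scanned for non-blank insertion codes. The following is only a minimal sketch, not part of Rosetta: the function name is made up for illustration, and it assumes standard fixed-width PDB ATOM records, where the chain ID is in column 22, the residue number in columns 23-26, and the insertion code in column 27.

```python
def residues_with_insertion_codes(pdb_lines):
    """Return (chain, resseq, icode, resname) for every residue whose
    insertion-code column (PDB column 27) is not blank.

    Hypothetical helper for illustration; only ATOM records are inspected.
    """
    found = []
    seen = set()
    for line in pdb_lines:
        # Skip non-ATOM records and lines too short to hold column 27.
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        resname = line[17:20]        # columns 18-20
        chain = line[21]             # column 22
        resseq = int(line[22:26])    # columns 23-26
        icode = line[26]             # column 27
        key = (chain, resseq, icode)
        # Report each residue once, even though it has many atoms.
        if icode != " " and key not in seen:
            seen.add(key)
            found.append((chain, resseq, icode, resname))
    return found
```

It can be called on an open file handle, for example `residues_with_insertion_codes(open("template.pdb"))`, where `template.pdb` is a placeholder file name.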
We are working on updating the database to prevent this behavior in the future, but for now your easiest solution would be to exclude that specific PDB using the -antibody:exclude_pdb flag.
It will also work fine if you turn the exclude_homologs flag off, because a different template will be selected, but I am assuming you need that flag for your purposes.
Hope this helps.
Thanks for the quick response!
I should mention that this happens every time I try to model only the heavy chain of a traditional antibody (it happened with all 47 sequences I tried).
As I mentioned before, the output model is usually missing the last amino acid and has some amino acids added or removed at the beginning of the sequence.
When modeling real nanobodies, this problem doesn't seem to happen.
I thought that when using the -vhh flag, the program could also handle modeling only the heavy chain of traditional antibodies, so maybe this is the problem.
Thanks again for the help!
For your 47 sequences, do you get an insertion every time? I am not concerned about additions or deletions at the termini. Rosetta just adjusts your sequence to correspond to an Fv region. If you give Rosetta an entire IgG sequence (including the constant domains), it will also trim away everything that is not the Fv. The insertion you got, on the other hand, should not happen, as it alters the sequence you provided.
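The distinction above, between harmless changes at the termini and an unexpected change inside the sequence, can be checked automatically across many models. This is a rough sketch under stated assumptions: it uses Python's generic difflib matcher rather than a real sequence alignment, the function name is invented for illustration, and the model sequence is assumed to have been extracted from the output PDB beforehand.

```python
from difflib import SequenceMatcher


def classify_differences(input_seq, model_seq):
    """Split differences between two sequences into terminal and internal.

    Returns (terminal, internal), each a list of
    (tag, input_fragment, model_fragment) tuples, where tag is one of
    'delete', 'insert' or 'replace' as reported by difflib.
    """
    matcher = SequenceMatcher(None, input_seq, model_seq, autojunk=False)
    terminal, internal = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        change = (tag, input_seq[i1:i2], model_seq[j1:j2])
        # A change touches a terminus if it starts at position 0 of both
        # sequences or runs to the end of both sequences.
        at_start = i1 == 0 and j1 == 0
        at_end = i2 == len(input_seq) and j2 == len(model_seq)
        if at_start or at_end:
            terminal.append(change)
        else:
            internal.append(change)
    return terminal, internal
```

For a toy input `"ABCDEFGHIJKL"` and model `"CDEFQGHIJK"`, the trimmed `AB` and `L` are reported as terminal, while the extra `Q` is reported as internal. Only a non-empty internal list would point to the kind of insertion discussed here.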
Yes, insertions or deletions (or both) at the termini.
The only one with an insertion in the middle of the sequence is the one I sent above.