how to cluster loop?

I've built loop models with the Rosetta loop modeling protocol, and I want to cluster those loops with the cluster protocol. However, the cluster protocol appears to be designed for whole proteins, and I found no option for clustering only part of a protein.
So, is it possible to cluster only part of the protein?


Wed, 2011-01-05 19:48

Did you try using the cluster program anyway? Large identical sections shouldn't cause it to not function...

Thu, 2011-01-06 06:41

Yes, it runs, but the results are poor: structures within the same cluster differ widely (for example, with a 3 Å cutoff). So it seems wiser to cluster on the loop region only. Do you have any advice?

Thank you very much

Thu, 2011-01-06 17:34

I guess you could try extracting JUST the loop residues into PDBs with a script and clustering those.

If you have constant sequences, then the loop residues ought to be on constant line numbers, so you can use something like:

for each PDB
head -n <last_loop_line> $PDB | tail -n <number_of_loop_lines> > looponly_$PDB

Then you can try clustering those. The new files will have the old file names embedded so you can translate them back...

If you have nonconstant sequences then you can do something slightly fancier with sed/awk/grep where you search for the right residue number in the right column - either way it should be a one-liner to extract the needed residues into their own PDB file.
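
The residue-number approach might look roughly like the sketch below. It assumes standard fixed-column PDB records, where the residue number sits in columns 23-26. The function name extract_loop and the residue range 45-56 are made up for illustration, and the function wrapper is bash/sh syntax (the awk command inside it can be run from any shell).

```shell
# Sketch: print only the ATOM/HETATM records whose residue number
# (PDB columns 23-26) lies between a first and last loop residue.
extract_loop() {
    # $1 = input PDB file, $2 = first loop residue, $3 = last loop residue
    awk -v first="$2" -v last="$3" '
        /^(ATOM  |HETATM)/ {
            resnum = substr($0, 23, 4) + 0   # fixed-width residue number field
            if (resnum >= first && resnum <= last) print
        }' "$1"
}

# Hypothetical usage: residues 45-56 of every decoy in the current directory
# for pdb in *.pdb; do extract_loop "$pdb" 45 56 > "looponly_$pdb"; done
```

Reading the residue number by column position rather than by whitespace-separated field avoids trouble when adjacent PDB fields run together without a space.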

Fri, 2011-01-07 06:58

Thank you very much for your kind advice. I am still a little confused about this command:

for each PDB
head -1 $PDB | tail -100 > looponly_$PDB

I stored the lines above in a file and ran it in the terminal, which reported:

./00: line 1: syntax error near unexpected token `PDB'
./00: line 1: `for each PDB'

I also tried the following command in terminal directly:

head -1 $PDB | tail -10 > looponly_$PDB

and it reported:

PDB: Undefined variable.

How can I solve this?

Sat, 2011-01-08 00:56

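smlewis's script is pseudocode, so the PDB variable is never defined.
You can change your script like this (csh/tcsh syntax):

#for example, my pdb files are in /home/SunH/my_decoys/
#LAST = line number of the last loop atom, COUNT = number of loop lines
#(1200 and 100 are placeholders; use the values for your own files)
set LAST = 1200
set COUNT = 100
foreach PDB ( /home/SunH/my_decoys/* )
echo "Extract loop residues from $PDB"
#read loop residues from each pdb file and then write to a new pdb file
head -n $LAST $PDB | tail -n $COUNT > "$PDB"_loop_only
end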

Sat, 2011-01-08 04:15

Yeah...I use tcsh, but most of my lab uses bash, so I just left the loop in pseudocode so albumns could pick his/her favorite shell. Sorry for the confusion.
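
For bash users, the same idea could be sketched as follows. This is only an illustration under stated assumptions: the function name extract_loop_lines, the directory ./my_decoys, and the line range 301-400 are all hypothetical, and the decoys are assumed to end in .pdb.

```shell
# Bash/sh sketch: copy lines FIRST..LAST of every *.pdb file in a directory
# into a new file named looponly_<original name> in that same directory.
extract_loop_lines() {
    # $1 = directory of decoys, $2 = first loop line, $3 = last loop line
    count=$(( $3 - $2 + 1 ))
    for pdb in "$1"/*.pdb; do
        [ -e "$pdb" ] || continue                  # nothing matched the glob
        name=$(basename "$pdb")
        case $name in looponly_*) continue ;; esac # skip output of earlier runs
        head -n "$3" "$pdb" | tail -n "$count" > "$1/looponly_$name"
    done
}

# Hypothetical call: lines 301-400 of each decoy in ./my_decoys
# extract_loop_lines ./my_decoys 301 400
```

Here head keeps everything up to the last loop line and tail then keeps only the final lines of that, which is why the count is computed as last minus first plus one.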

Sat, 2011-01-08 08:16