clustering_decoy

#1

Hi,

How do I identify the best decoy from the clustering tree?
If my protein is 176 aa long, how many decoys should I generate?

Thu, 2008-05-08 18:34
icb_bio

Ad 1) I usually do the following:

^find /full/path/to/decoys/folder/ -maxdepth 1 -name '*.pdb' -size +1 > list^
^/home/kosa/bin/compose_score_silent.py t228.out list^
^/home/kosa/bin/cluster_info_silent.out ../t228.out - t228 5,25,50,100 2,5^

and, if needed, re-run the last step with different options.
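For anyone who prefers to build the decoy list from a script, here is a minimal Python sketch of what the `find` step above does. The function names `list_decoys` and `write_list` are made up for illustration; the 512-byte cutoff reflects my reading of `-size +1` (more than one 512-byte block), which is what filters out empty or truncated pdb files.

```python
from pathlib import Path


def list_decoys(folder, min_bytes=512):
    """Return sorted paths of *.pdb files directly inside `folder`
    (no recursion, like -maxdepth 1) that are larger than `min_bytes`."""
    return sorted(
        str(p)
        for p in Path(folder).glob("*.pdb")
        if p.is_file() and p.stat().st_size > min_bytes
    )


def write_list(folder, list_file="list"):
    """Write one decoy path per line to `list_file`; return how many were written."""
    paths = list_decoys(folder)
    Path(list_file).write_text("".join(p + "\n" for p in paths))
    return len(paths)


if __name__ == "__main__":
    # Same placeholder path as in the find command above.
    print(write_list("/full/path/to/decoys/folder/"), "decoys listed")
```

The resulting `list` file can then be passed to the next step exactly as before.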

Ad 2) I think nobody knows the answer to this question ;-) You might try clustering after generating, let's say, 10 000 decoys, analyze the results, and then generate more decoys if you do not see big clusters... The results also depend on the particular protein: some of them "fold more easily" with Rosetta (e.g. those with simple topologies).


Fri, 2008-05-09 02:20
kosa

Hi kosa,
How do I identify a big cluster from the tree figure?
Do you have an example of a big cluster?

Sun, 2008-05-11 20:04
icb_bio

Can you please have a look at this blog:
http://icbbio.blogspot.com/

What is your comment on this clustering tree?

Is there any good decoy to start protein modelling from?

Mon, 2008-05-12 00:38
icb_bio

Well, it does not look very good... Look at the rmsd values: they are pretty high (above 8 Å) and at the same time the clusters are quite small. In the prefix.info file you should be able to find the exact rmsd cutoff that was used for the clustering. I would check the superposition of the decoys in the five top clusters to see what 8 Å actually means for your protein. You should have the CA superpositions in files with names such as cluster00.015.pdb. If the decoys are clearly dissimilar, you need to lower the rmsd threshold (by changing the cluster_info_silent.out options). But then of course you would get smaller clusters (and I would not be sure whether clusters with fewer than 10 members are meaningful at all...).

Maybe you do indeed need to generate more decoys (how many have you got so far?) or run Rosetta with different options. Maybe it is possible to split your protein into two domains?

Frankly, I have never succeeded in folding a protein as long as yours with Rosetta. But then I certainly do not use computational resources as big as some Rosetta power users do ;-)


Mon, 2008-05-12 01:47
kosa

How do you know my decoys are above 8 Å?
I have generated 9999 decoys
and ran cluster_info_silent.out with 5,15,45,75 3,4

What is your comment?


Mon, 2008-05-12 18:57
icb_bio

> how do u know my decoy above 8A.

The X axis of your clustering tree shows rmsd values... You should be able to find the exact rmsd cutoff used for clustering in the *.info file in the clustering directory. It should be a line like:

TARGET: 100 THRESHOLD: 1.271307

just after the AC lines.

> i am generate 9999 decoy

For such a long protein that might be far too few, especially if you get poor clustering.

> run cluster_info_silent.out with 5,15,45,75 3,4

With these options the program did not find any cluster of the requested size (15) when using an rmsd threshold between 3 and 4 Å. So it increased the rmsd threshold until the top cluster reached size 15.

Please post the AC lines from your *.info file so I can advise you on how to modify the clustering options.

And if you have the computational power, generate another 20000 decoys ;-) You can also try different options/protocols or try to fold a homolog.
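To make the threshold behaviour described above more concrete, here is a toy Python sketch. It is only an illustration of the idea (raise the rmsd threshold until some decoy has enough neighbours), not the actual algorithm of cluster_info_silent.out, which I have not checked. The function names, the `step` parameter and the pairwise distance matrix input are all assumptions.

```python
def top_cluster_size(dist, threshold):
    """Largest number of decoys within `threshold` of any single decoy
    (each decoy counts itself, since its distance to itself is 0)."""
    return max(sum(1 for d in row if d <= threshold) for row in dist)


def find_threshold(dist, target, start, step=0.1):
    """Raise the threshold from `start` in increments of `step` until the
    top cluster has at least `target` members. Returns None if even a
    threshold covering all pairwise distances is not enough."""
    max_d = max(max(row) for row in dist)
    k = 0
    while True:
        t = start + k * step
        if top_cluster_size(dist, t) >= target:
            return t
        if t >= max_d:
            return None
        k += 1


if __name__ == "__main__":
    # Four hypothetical decoys placed on a line at 0, 1, 2 and 10.
    pos = [0.0, 1.0, 2.0, 10.0]
    dist = [[abs(a - b) for b in pos] for a in pos]
    print(find_threshold(dist, target=3, start=0.5, step=0.5))
```

In this picture, a high final threshold simply means that no decoy had enough close neighbours at the lower values, which is the situation discussed in this thread.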

Tue, 2008-05-13 02:57
kosa

Are these the AC lines?

COMMAND: ../../C/cluster_info_silent.out bbphad.out _ cluster1/tmp 5,15,45,75 3,4
AC: target 15 threshold 8.21 clusters 69 coverage 466
AC: target 16 threshold 8.21 clusters 69 coverage 466
AC: target 17 threshold 8.21 clusters 69 coverage 466
AC: target 18 threshold 8.37 clusters 88 coverage 616
AC: target 19 threshold 8.37 clusters 88 coverage 616
AC: target 20 threshold 8.37 clusters 88 coverage 616
AC: target 21 threshold 8.39 clusters 89 coverage 628
AC: target 22 threshold 8.40 clusters 91 coverage 639
AC: target 23 threshold 8.40 clusters 91 coverage 639
AC: target 24 threshold 8.40 clusters 91 coverage 639
AC: target 25 threshold 8.55 clusters 115 coverage 819
AC: target 27 threshold 8.55 clusters 115 coverage 819
AC: target 29 threshold 8.61 clusters 120 coverage 870
AC: target 31 threshold 8.64 clusters 126 coverage 916
AC: target 33 threshold 8.64 clusters 126 coverage 916
AC: target 35 threshold 8.65 clusters 128 coverage 931
AC: target 38 threshold 8.80 clusters 156 coverage 1192
AC: target 41 threshold 8.87 clusters 165 coverage 1295
AC: target 44 threshold 8.87 clusters 166 coverage 1305
AC: target 47 threshold 8.88 clusters 167 coverage 1311
AC: target 51 threshold 8.97 clusters 181 coverage 1453
AC: target 55 threshold 8.99 clusters 185 coverage 1493
AC: target 60 threshold 9.10 clusters 201 coverage 1700
AC: target 65 threshold 9.16 clusters 220 coverage 1867
AC: target 71 threshold 9.22 clusters 227 coverage 1994
TARGET: 15 THRESHOLD: 8.209858

What's your comment?


Tue, 2008-05-13 23:27
icb_bio

I have generated 3 x 9999 decoys. Do I need to merge them before the clustering analysis? If yes, how?

Tue, 2008-05-13 23:31
icb_bio

Hi,

Yes, these are the AC lines.
So you can see that the program clustered at an rmsd threshold of 8.209858 to get a top cluster of size 15. The AC lines can show you what would happen with a lower threshold, but for that you need to specify a lower bound for the minimum size of the top cluster (15 members at present).

Please do the clustering with the following options:
5,5,15,20 3,4

However, you need to generate more decoys anyway. And you should really open the files with names such as "cluster00.015.pdb" in any molecular viewer that can read CA traces (pymol, rasmol) and see whether the structures from a single cluster have the same fold at the given rmsd threshold.
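If you want to compare the AC lines across several clustering runs, a small script can pull the numbers out of the *.info file. This is only a sketch: it assumes the AC lines look exactly like the ones posted in this thread (target, threshold, clusters, coverage), and the function names are made up for illustration.

```python
import re

# Matches lines such as:
# AC: target 15 threshold 8.21 clusters 69 coverage 466
AC_LINE = re.compile(
    r"^AC:\s+target\s+(\d+)\s+threshold\s+([\d.]+)"
    r"\s+clusters\s+(\d+)\s+coverage\s+(\d+)"
)


def parse_ac_lines(text):
    """Return one dict per AC line found in `text`, in file order."""
    rows = []
    for line in text.splitlines():
        m = AC_LINE.match(line.strip())
        if m:
            rows.append({
                "target": int(m.group(1)),
                "threshold": float(m.group(2)),
                "clusters": int(m.group(3)),
                "coverage": int(m.group(4)),
            })
    return rows


def lowest_threshold(rows):
    """Return the row with the smallest rmsd threshold (first one on ties)."""
    return min(rows, key=lambda r: r["threshold"])


if __name__ == "__main__":
    sample = """\
COMMAND: ../../C/cluster_info_silent.out bbphad.out _ cluster1/tmp 5,15,45,75 3,4
AC: target 15 threshold 8.21 clusters 69 coverage 466
AC: target 18 threshold 8.37 clusters 88 coverage 616
AC: target 71 threshold 9.22 clusters 227 coverage 1994
TARGET: 15 THRESHOLD: 8.209858
"""
    for row in parse_ac_lines(sample):
        print(row)
    print("lowest:", lowest_threshold(parse_ac_lines(sample)))
```

Running it on the output of each clustering attempt makes it easy to see how far the threshold drops when the minimum top-cluster size is lowered.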


Wed, 2008-05-14 01:19
kosa

Are you generating them in pdb or silent format?


Wed, 2008-05-14 01:20
kosa

I am generating them in silent format.


Wed, 2008-05-14 18:04
icb_bio

Is it correct to use just one domain, the conserved domain according to the domain prediction server? I just removed the N- and C-terminal regions to make my protein smaller, from 295 aa to 176 aa.
Or do you have any other suggestion?

thanks

Hasni

Wed, 2008-05-14 19:59
icb_bio

I think you can just concatenate the files, but I am not sure. I always generate decoys in pdb format because i) I have a script that later extracts the decoys in clusters from the original set of pdb decoys, ii) I run Rosetta on local clusters so I do not have to care about the size of the output data, and iii) I often process decoys by other means.

So I do not have much experience working with the silent format. I will start a separate post on this forum to ask how to work with it.
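In the meantime, here is a rough Python sketch of the concatenation idea. It rests on an assumption I have not verified for your files: that each silent file starts with the same two header lines (a SEQUENCE: line and a SCORE: column header), which should appear only once in the merged file. The function name and the `header_lines` parameter are made up for illustration. The sketch also does nothing about decoy labels that repeat across runs, so check whether the clustering program accepts the result.

```python
def merge_silent_files(paths, out_path, header_lines=2):
    """Concatenate silent files into `out_path`, keeping the leading header
    block of the first file and dropping it from later files only when it
    repeats the first file's header exactly."""
    header = None
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as fh:
                lines = fh.readlines()
            # Guard against a missing final newline gluing two records together.
            if lines and not lines[-1].endswith("\n"):
                lines[-1] += "\n"
            if header is None:
                header = lines[:header_lines]
            elif lines[:header_lines] == header:
                lines = lines[header_lines:]
            out.writelines(lines)


if __name__ == "__main__":
    # Hypothetical file names for three separate runs.
    merge_silent_files(["run1.out", "run2.out", "run3.out"], "merged.out")
```

If the headers of your files differ (for example, different score columns), the sketch leaves the later files untouched, so the mismatch stays visible in the output instead of being hidden.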


Thu, 2008-05-15 02:51
kosa

Yes, it is correct if the conserved domain is an independent folding unit (which is very likely the case if the conserved domain can be found in different domain contexts in unrelated proteins). However, if it interacts closely with other domains through a big hydrophobic patch you might run into problems (Rosetta might try to push the should-be-surface hydrophobic residues into the core).

In your case I think the problems are due to the protein length, and many, many more decoys as well as more sophisticated Rosetta approaches might be needed (with no guarantee of success).


Thu, 2008-05-15 03:34
kosa