While going through the Rosetta tutorials, I noticed that the vall database is old, and I want to build a new one.
Could you please let me know how to efficiently build a new vall database from PDB structures? When I run pdb2vall, it gives me this:
Usage: pdb2vall.py [options]
-h, --help show this help message and exit
-p PDB_FN five-letter code + chain pdb id eg. 2oxgZ
don't get structure_profile check point, use sequence
-d debug: print out some information
-n dry run: don't run the system( cmd )
I am finding it difficult to understand what options to choose for efficient vall database creation. Thanks!
VALL generation is a bit of a black art. It's not something most people within the Rosetta community have ever tried or gotten to work.
But if your only concern is that the VALL isn't up-to-date with the most recent PDB, I honestly wouldn't bother with regenerating it. Tests that have been done have shown that updating the VALL sometimes gives *worse* structure prediction results than an older database does.
The rationalization for this goes something like the following: The concept behind fragment-based structure prediction is that local fragment structures (e.g. 3-mers and 9-mers) are exhaustively sampled in the PDB. That is, while the full structure of interest might not be in the PDB, if you chopped up the structure into fragments, each individual fragment would be represented in the PDB. This theory actually seems to hold, and (for 3-mers and 9-mers) has been true for over a decade now. So the 2011 VALL should contain all the same 3-mer and 9-mer structures that a hypothetical 2019 one would.
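To make the "chopping up" idea concrete, here is a minimal Python sketch that enumerates every overlapping 3-mer and 9-mer window of a sequence. The function name `fragments` and the example sequence are made up for illustration; this is not how the Rosetta fragment picker is implemented, just the sliding-window concept.

```python
def fragments(sequence, size):
    """Return (start_index, window) for every overlapping window of `size` residues."""
    # A sequence of length L has L - size + 1 windows; none if it is shorter than `size`.
    return [(i, sequence[i:i + size]) for i in range(len(sequence) - size + 1)]


# Hypothetical 16-residue sequence, used only for illustration.
seq = "MKTAYIAKQRQISFVK"

three_mers = fragments(seq, 3)
nine_mers = fragments(seq, 9)

print(len(three_mers), "3-mers, first:", three_mers[0])
print(len(nine_mers), "9-mers, first:", nine_mers[0])
```

The claim in the paragraph above is that each such window, taken as a local backbone conformation, already has a close match somewhere in the set of known structures.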
The reason why a new database can be *worse* than the older one probably has to do with oversampling of structures. In order to get the VALL to be representative, you have to de-duplicate it to remove the biases of crystallographers (where they crystallize hundreds of different mutants of lysozyme, for example). The PDB has grown a lot in the past decade (with much more oversampling of "easy" structures), so the deduplication is much harder to do these days than it was in the past, which potentially lowers the quality of the fragment prevalence statistics.
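As a rough illustration of what de-duplication means here, the following Python sketch greedily keeps one representative per group of near-identical sequences. Everything in it is an assumption for illustration: the helper names, the made-up sequences, the 0.8 cutoff, and especially the position-by-position identity measure, which is a toy stand-in for a real sequence alignment. It does not describe how the VALL was actually filtered.

```python
def identity(a, b):
    """Fraction of identical positions; toy stand-in for a real alignment score."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_dedupe(chains, cutoff=0.8):
    """Keep a chain only if it is below `cutoff` identity to every chain kept so far."""
    representatives = []
    for name, seq in chains:
        if all(identity(seq, rep_seq) < cutoff for _, rep_seq in representatives):
            representatives.append((name, seq))
    return representatives


# Hypothetical entries: a "wild type", a point mutant of it, and an unrelated chain.
chains = [
    ("wild_type", "KVFGRCELAA"),
    ("point_mutant", "KVFGRCELAM"),
    ("unrelated", "MSTNPKPQRK"),
]

kept = greedy_dedupe(chains)
print([name for name, _ in kept])
```

Without a step like this, the point mutant would contribute a second copy of nearly every fragment, inflating how common those local conformations appear to be.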
Now, there are certainly potential rationales for regenerating the VALL. If you're looking for things larger than 9-mers, for instance, the 2011 version might not give exhaustive coverage yet, and updating may fill things out further. The same goes if you're looking at alternate ways of generating the VALL or of calculating the statistics, or if you want to play around with using various experimental data during fragment prediction and need an expanded/different structure set. But if you just want to do standard 3-mer and 9-mer sampling using the standard fragment picking protocols, there's really no need to regenerate the VALL. The existing ones are sufficient for that purpose, even if they're not the newest and shiniest.