You know how, when you import PDB structures into Rosetta and relax them, Rosetta adds the missing hydrogens? What is the algorithm that does that?
I want to know the mathematical approach, not the code: does it use rotation vectors and a rotation matrix to find the orientation and position the hydrogens, or does it use another approach?
I believe this same algorithm is used to add or replace sidechains. Does it find the orientation and angles of the side chains and replace them?
I have a backbone without any oxygens and I want to use this same algorithm to write a script that would add oxygens to a backbone.
You are correct that the same approach used to add missing hydrogens is the one used to add missing sidechains. The geometry of these missing atoms is encoded in the database via the params files (see Rosetta/main/database/chemical/residue_type_sets/fa_standard/residue_types/l-caa/ for the standard amino acids). The parts of the params files of interest here are the ICOOR lines. These encode the position of each atom as internal coordinates (length, angle, dihedral) with respect to other atoms in the protein.
Take the C beta of alanine:
If a protein were missing the alanine sidechain, the ICOOR line for CB would tell Rosetta to rebuild the C beta atom at 1.521736 Å from CA, with a CB-CA-N angle of 180 - 69.625412 = 110.374588 degrees (as a quirk, the stored angle is the deviation from "straight ahead", that is, from the continuation of the N-to-CA direction, rather than the usual bond angle), and with a CB-CA-N-C dihedral of -122.8 degrees.
The actual algorithm for calculating the coordinates can be found in Rosetta/source/src/core/kinematics/Stub.hh, in Stub::spherical(). Briefly, the position of the atom is first computed from the distance/angle/dihedral in a standard reference frame centered on the origin and aligned with the axes. This coordinate is then transformed, via a rotation matrix and a displacement vector, into the appropriate location with respect to the reference atoms. The reference frame is set up so that the rotation matrix can be found with simple vector algebra on the atom coordinates (see Stub::from_four_points() for details).
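To make the description above concrete, here is a minimal pure-Python sketch of the same idea. It is not the Rosetta source: the function name build_atom, the helper names, and the N/CA/C coordinates are made up for illustration, and the frame construction only follows the description above (first axis along angle atom to parent, second axis in the plane of the three reference atoms, third axis perpendicular to that plane). Only the three CB numbers come from the alanine example.

```python
import math

# Small 3-vector helpers on plain tuples (no external dependencies).
def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def add(a, b): return (a[0] + b[0], a[1] + b[1], a[2] + b[2])
def scale(a, s): return (a[0] * s, a[1] * s, a[2] * s)
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])
def unit(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))

def build_atom(parent, angle_atom, torsion_atom, phi_deg, theta_deg, dist):
    """Place a child atom from (dihedral, angle, distance) internal coordinates.

    theta_deg is the deviation from "straight ahead", so the bond angle
    child-parent-angle_atom comes out as 180 - theta_deg.
    """
    phi, theta = math.radians(phi_deg), math.radians(theta_deg)
    # Orthonormal frame from the three reference atoms; e1, e2, e3 are the
    # columns of the rotation matrix and `parent` is the displacement vector.
    e1 = unit(sub(parent, angle_atom))                    # straight ahead
    e3 = unit(cross(e1, sub(torsion_atom, angle_atom)))   # normal to the plane
    e2 = cross(e3, e1)                                    # in-plane, torsion-atom side
    # Position in the standard frame (origin-centered, axis-aligned).
    x = dist * math.cos(theta)
    y = dist * math.sin(theta) * math.cos(phi)
    z = dist * math.sin(theta) * math.sin(phi)
    # Rotate into the molecular frame, then translate to the parent atom.
    return add(parent, add(scale(e1, x), add(scale(e2, y), scale(e3, z))))

# Hypothetical, roughly idealized backbone coordinates in Angstroms.
CA = (0.0, 0.0, 0.0)
N = (1.458, 0.0, 0.0)
C = (-0.546, 1.422, 0.0)
# Alanine CB values quoted above: dihedral, angle deviation, distance.
CB = build_atom(CA, N, C, -122.8, 69.625412, 1.521736)
print(CB)
```

Measuring the CB-CA distance, the CB-CA-N angle and the CB-CA-N-C dihedral on the result should give back the three input values, which is a handy sanity check for any reimplementation.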
Building things like the backbone oxygen is just slightly different:
It's basically the same thing as before, except for that UPPER bit. That just specifies that the reference atom should be, not an atom in this residue, but the atom in the next residue that is connected to the "upper" connection point of this residue. (For amino acids, this would be the N of the following residue, the one bonded to this residue's carbonyl C.) There's also an ICOOR line for the UPPER connection point itself, which is used when the residue isn't connected to another. (Though if it's a true terminus, the terminus patches remove the UPPER connection and redo the ICOOR for the oxygen to reference different atoms.)