dnaMD.DNA class
Summary
|
To get the parameters over all frame for the given range of base pair/steps |
|
To read and store basepairs parameters (shear, stretch, stagger, buckle, propeller and opening) from an input file. |
|
To read and set local helical radius of both strand |
|
To read and store base-step (Shift, Slide, Rise, Tilt, Roll and Twist) and helical base-step (X-disp, Y-disp, h-Rise, Inclination, Tip and h-Twist) parameters from an input file |
|
To calculate average and error of the given parameter for the given set of base-pairs/steps |
|
To read and set local helical-axis positions from an input file. |
|
To get the parameter of either a specfic base-pair/step or a DNA segment as a function of time. |
|
To get the parameter distribution of either a specific base-pair/step or a DNA segment over the MD simulation. |
|
To read and store Major and Minor grooves from an input file. |
|
To read and store backbone dihedrals (alpha, beta, gamma, delta, epsilon and zeta) and chi dihedral of both strands from an input file. |
|
To determine the global helical axis by smoothing local axis using spline interpolation. |
To calculate curvatures and tangent vectors along the helical axis. |
|
|
To calculate angle (Radian) between two tangent vectors of global helical axis. |
|
To calculate angles (Radian) in 2D plane between two tangent vectors of global helical axis. |
|
To write trajectory of helcial-axis as a PDB format file. |
Documentation
- class DNA(num_bp, filename=None, startBP=1)[source]
DNA class stores all data obtained from the input files.
To initialize this class:
dna = DNA(60) # 60 is the number of basepairs
This class also contains several methods (functions) that are discussed in following sections.
The data inside the class is organized as a dictionary in following architecture:
|------- bp ---- BP number ---- Parameter array (1D/2D) data-----| |------- bps ---- BP Step Number ---- Parameter array (1D/2D)
- Parameters:
num_bp (int) – Number of base-pairs in the DNA.
filename (str) – Name of HDF5 file
startBP (int) – Number ID of first basepair.
- num_bp
Number of base-pairs in the DNA.
- Type:
int
- num_step
Number of base-steps in the DNA.
- Type:
int
- filename
Name of HDF5 file
- Type:
str
- startBP
Number ID of first basepair.
- Type:
int
- smooth_axis
If axis is smoothed and global axis is already determined.
- Type:
bool
- data
Dictionary of data. All data are stored in this dictionary.
- Type:
dictionary
- time
Trajectory time.
- Type:
ndarray
- mask
Boolean array indicating which frame of trajectory should be masked.
- Type:
ndarray
- h5
h5py.File object to read the data directly from file.
- Type:
object
- _openFile()[source]
Open the HDF5 file for reading/writing.
This methad is called during the initialization of this class.
- calculate_2D_angles_bw_tangents(paxis, base_step, masked=True)[source]
To calculate angles (Radian) in 2D plane between two tangent vectors of global helical axis.
Principal Axis
2D planes
X-axis
XY and XZ planes
Y-axis
XY and YZ planes
Z-axis
XZ and YZ planes
- Parameters:
paxis (str) – Principal axis parallel to the DNA axis. For example, IF DNA helical axis is parallel or aligned with Z-axis, paxis should be
'Z'
.base_step (1D list) – List of two base-steps for which angle will be calculated. For example: base_step =
[5, 50]
angle between tangent vector 5th and 50th base-steps will be calculated.masked (bool) –
Default=False
. To skip specific frames/snapshots.DNA.mask
array should be set to use this functionality. This array contains boolean (eitherTrue
orFalse
) value for each frame to mask the frames. Presently, mask array is automatically generated duringDNA.generate_smooth_axis()
to skip those frames where 3D fitting curve was not successful within the given criteria.
- Returns:
angle-one (1D array) – Array of calculated angle in first plane of length is equal to number of frames. When
masked
is applied, length of this array can be smaller than total number of frames.angle-two (1D array) – Array of calculated angle in second plane of length is equal to number of frames. When
masked
is applied, length of this array can be smaller than total number of frames.
- calculate_angle_bw_tangents(base_step, cumulative=False, masked=False)[source]
To calculate angle (Radian) between two tangent vectors of global helical axis.
- Parameters:
base_step (1D list) –
List of two base-steps for which angle will be calculated. For example: base_step =
[5, 50]
either of following can be calculated.angle between tangent vector 5th and 50th base-steps.
summation over 44 angles that are formed between adjacent tangent vectors of 5-50 bp DNA segment.
See below two choose between these two types.
cumulative (bool)) –
Default: False
: If it is false, first type of angle is calculated otherwise second type of angle is calculated as explained in the above example of optionbase_step
.masked (bool) –
Default=False
. To skip specific frames/snapshots.DNA.mask
array should be set to use this functionality. This array contains boolean (eitherTrue
orFalse
) value for each frame to mask the frames. Presently, mask array is automatically generated duringDNA.generate_smooth_axis()
to skip those frames where 3D fitting curve was not successful within the given criteria.
- Returns:
angle – Array of calculated angle of length is equal to number of frames. When
masked
is applied, length of this array can be smaller than total number of frames.- Return type:
1D array
- calculate_curvature_tangent(step_range=False, step=None, store_tangent=False)[source]
To calculate curvatures and tangent vectors along the helical axis.
The curvature and tangent vectors are calculated using Frenet-Serret formula. The calculated values are stored in
DNA.data
dictionary and also in HDF5 file.- Parameters:
step_range (bool) –
step_range = True
: Calculate curvature and tangent vectors for the given range of base-stepsstep_range = False
: Calculate curvature and tangent vectors for entire DNA. If smoothed helical-axis of any base-step will be found to be not available, error will be raised.
step (list) –
- List containing lower and higher limit of base-steps range.
This option only works with
step_range=True
.This list should not contain more than two number.
First number should be less than second number.
- Example for base-step 4 to 15:
step = [4,15] # step_range = True
store_tangent (bool) –
store_tangent = True
: The calculated tangent vectors will be stored for later use.store_tangent = False
: The calculated tangent vectors will be discarded.
In case of HDF5 file, calculated tangents will be stored in this file and it will not add cost to memory. However, without HDF5 file, storing tangents in
DNA.data
will be expansive for memory.
- generate_smooth_axis(step_range=False, step=None, smooth=500.0, spline=3, fill_point=6, cut_off_angle=20)[source]
To determine the global helical axis by smoothing local axis using spline interpolation.
Note
A 3D curve is fitted on local helical axis that are calculated using
do_x3dna
tool. Sometimes in few frames, fitting may not be accurate and produces artifact. To record these frames,DNA.mask
array containing boolean values are generated. If value isTrue
, fitting might not be correct and vice versa. This array could be used in later analysis to skip/mask the frames containing inaccurate axis.Warning
This function requires SciPy package.
- Parameters:
step_range (bool) –
step_range = True
: Make axis smooth for the given range of base-stepsstep_range = False
: Make axis smooth for for entire DNA. If original helical-axis of any base-stepwill be found to be not available, error will be raised.
step (list) –
- List containing lower and higher limit of base-steps range.
This option only works with
step_range=True
.This list should not contain more than two number.
First number should be less than second number.
- Example for base-step 4 to 15:
step = [4,15] # step_range = True
smooth (float) –
A smoothing condition. For more details, see about
s = None
, which is passed into scipy.interpolate.splprep() method.Warning
Lower value may lead to an artifact of local sharp kink in the smoothed axis.
Higher value may lead to the calculation of wrong helical axis.
spline (int) –
Degree of spline. For more details, see about
k = 3
, which is passed into scipy.interpolate.splprep() method.fill_point (int) – Number of intrapolated points between two adjacent helical-axis coordinates.
cut_off_angle (float) –
Cut-off bending angle to define sharp kink in fitted curve. If angle in fitted curve is larger than this cut-off, refitting will be performed after deleting few of the original helical axis positions. If after this deletions, bending angle will not reduce below cut-off angle, value of
smooth
will be increased by 100 and entire cycle of fitting-refitting will be performed. When, value ofsmooth
increases to more than 10000 during this fitting-refitting cycles, fitting process will be stopped with a warning message.To record the frames with bad fitting,
True
value will be stored inDNA.mask
array for respective frame.
- get_mean_error(bp, parameter, err_type='std', bp_range=True, merge_bp=1, merge_method='mean', masked=False, tool='gmx analyze')[source]
To calculate average and error of the given parameter for the given set of base-pairs/steps
Warning
To calculate errors by using
error = 'acf'
orerror = 'block'
, GROMACS toolg_analyze
orgmx analyze
should be present in$PATH
.- Parameters:
bp (1D list or array) –
base-pairs to analyze Example:
bp = [6] # bp_range = False bp = [4,15] # bp_range = True bp = range(4,15) # bp_range = False bp = np.arange(4,15) # bp_range = False bp = [2,5,6,7,9,12,18] # bp_range = False
parameter (str) – Name of a parameter. For details about accepted keywords, see
parameter
in the methodDNA.get_parameters()
.error (str) –
Method of error estimation. Currently accepted method as follows:
error = 'std'
: Standard Deviationerror = 'acf'
: Standard error using autocorrelation time (requires:g_analyze
orgmx analyze
)error = 'block'
: Standard error using block averaging method (requires:g_analyze
orgmx analyze
)
bp_range (bool) –
Default=True
: As shown above, ifTrue
, bp is taken as a range otherwise list or numpy array.merge_bp (int) – Number of base-pairs or steps to merge for creating the small DNA segments
merge_method (str) –
Method to calculate the parameter of a DNA segment from local parameters of all base-pairs/steps that are between the range given through
bp
. Currently accepted keywords are as follows:merge_method = mean
: Average of local parametersmerge_method = sum
: Sum of local parameters
masked (bool) –
Default=False
. To skip specific frames/snapshots.DNA.mask
array should be set to use this functionality. This array contains boolean (eitherTrue
orFalse
) value for each frame to mask the frames. Presently, mask array is automatically generated duringDNA.generate_smooth_axis()
to skip those frames where 3D fitting curve was not successful within the given criteriatool (str) – Gromacs tool
g_analyze
orgmx analyze
orgmx_mpi analyze
etc. It will be used to calculate autocorrelation time or block averaging error. It should be present in$PATH
- Returns:
basepairs/steps (1D array) – Number of base pair-steps. If
merge_bp>1
, middle number will be returned.values (1D array) – Average values of the parameter
errors (1D array) – Error values for corresponding average values
- get_parameters(parameter, bp, bp_range=True, masked=False)[source]
To get the parameters over all frame for the given range of base pair/steps
- Parameters:
parameter (str) –
Input parameter name. Parameter from following list are accepted:
shear
stretch
stagger
buckle
propeller
opening
shift
slide
rise
tilt
roll
twist
x-disp
y-disp
h-Rise
inclination
tip
h-Twist
helical X-axis
helical Y-axis
helical z-axis
helical X-axis smooth
helical Y-axis smooth
helical z-axis smooth
curvature
tangent
radius S-1
radius S-2
major Groove
major Groove Refined
minor Groove
minor Groove Refined
alpha S1
beta S1
gamma S1
delta S1
epsilon S1
zeta S1
chi S1
alpha S2
beta S2
gamma S2
delta S2
epsilon S2
zeta S2
chi S2
bp (1D list) –
List of base-pairs to analyze Example:
bp = [6] # bp_range = False bp = [4,15] # bp_range = True bp = range(4,15) # bp_range = False bp = np.arange(4,15) # bp_range = False bp = [2,5,6,7,9,12,18] # bp_range = False
bp_range (bool) –
Default=True
: As shown above, ifTrue
, bp is taken as a range otherwise list or numpy array.masked (bool) –
Default=False
: To skip specific frames/snapshots. dnaMD.DNA.mask array should be set to use this functionality. This array contains boolean (eitherTrue
orFalse
) value for each frame to mask the frames. Presently, mask array is automatically generated duringDNA.generate_smooth_axis()
method to skip those frames where 3D fitting curve was not successful within the given critera.
- Returns:
parameters –
parameters[bp][nframe] (2D list)
: where bp is number of base pairs/steps and nframe is total number of frames in the trajectory.- Return type:
2D list
- parameter_distribution(parameter, bp, bins=30, merge=False, merge_method='mean', masked=False)[source]
To get the parameter distribution of either a specific base-pair/step or a DNA segment over the MD simulation.
- Parameters:
parameter (str) – Name of a base-pair or base-step or helical parameter For details about accepted keywords, see
parameter
in the methodDNA.get_parameters()
.bp (1D list or array) –
base-pairs to analyze Example:
bp = [6] # merge = False bp = [4,15] # merge = True
int (bins) – Number of bins to calculate histogram
merge (bool) –
Default=False
: As shown above, ifTrue
, bp should a list of range otherwise a list of single value. Ifbp = True
, the parameter for the respective DNA segment could be merged or calculated bymerge_method
.merge_method (str) –
Method to calculate the parameter of a DNA segment from local parameters of all base-pairs/steps that are between the range given through
bp
. Currently accepted keywords are as follows:merge_method = mean
: Average of local parametersmerge_method = sum
: Sum of local parameters
masked (bool) –
Default=False
. To skip specific frames/snapshots.DNA.mask
array should be set to use this functionality. This array contains boolean (eitherTrue
orFalse
) value for each frame to mask the frames. Presently, mask array is automatically generated duringDNA.generate_smooth_axis()
to skip those frames where 3D fitting curve was not successful within the given criteria.
- Returns:
values (1D array) – Array containing parameter values
density (1D array) – Array containing density for respective parameter values
- set_backbone_dihedrals(filename, bp, parameters='all', bp_range=True)[source]
To read and store backbone dihedrals (alpha, beta, gamma, delta, epsilon and zeta) and chi dihedral of both strands from an input file.
Note
alpha: O3’(i-1)-P-O5’-C5’
beta: P-O5’-C5’-C4’
gamma: O5’-C5’-C4’-C3’
delta: C5’-C4’-C3’-O3’
epsilon: C4’-C3’-O3’-P(i+1)
zeta: C3’-O3’-P(i+1)-O5’(i+1)
chi for pyrimidines(Y): O4’-C1’-N1-C2
chi for purines(R): O4’-C1’-N9-C4
- Parameters:
filename (str) – Input file, which is generated from do_x3dna. e.g. BackBoneCHiDihedrals_g.dat
bp (1D list or array) –
base-pairs to analyze Example:
bp = [6] # bp_range = False bp = [4,15] # bp_range = True bp = range(4,15) # bp_range = False bp = np.arange(4,15) # bp_range = False bp = [2,5,6,7,9,12,18] # bp_range = False
parameters (str or 1D list) –
Either
All
to extract all angles or list of angles as follows:alpha S1
beta S1
gamma S1
delta S1
epsilon S1
zeta S1
chi S1
alpha S2
beta S2
gamma S2
delta S2
epsilon S2
zeta S2
chi S2
bp_range (bool) –
Default=True
: As shown above, ifTrue
, bp is taken as a range otherwise list or numpy array.
- set_base_pair_parameters(filename, bp, parameters='all', bp_range=True)[source]
To read and store basepairs parameters (shear, stretch, stagger, buckle, propeller and opening) from an input file.
- Parameters:
filename (str) – Input file, which is generated from do_x3dna. e.g. L-BP_g.dat
bp (1D list or array) –
base-pairs to analyze Example:
bp = [6] # bp_range = False bp = [4,15] # bp_range = True bp = range(4,15) # bp_range = False bp = np.arange(4,15) # bp_range = False bp = [2,5,6,7,9,12,18] # bp_range = False
parameters (1D list) –
List of base-pairs parameters as follows:
shear
stretch
stagger
buckle4
propeller
opening6
By default all six parameters will be extracted from the file.
bp_range (bool) –
Default=True
: As shown above, ifTrue
, bp is taken as a range otherwise list or numpy array.
- set_base_step_parameters(filename, bp_step, parameters='all', step_range=True, helical=False)[source]
To read and store base-step (Shift, Slide, Rise, Tilt, Roll and Twist) and helical base-step (X-disp, Y-disp, h-Rise, Inclination, Tip and h-Twist) parameters from an input file
- Parameters:
filename (str) – Input file, which is generated from do_x3dna. e.g. L-BPS_g.dat or L-BPH_g.dat.
bp_step (1D list or array) –
base-steps to analyze Example:
bp_step = [6] # step_range = False bp_step = [4,15] # step_range = True bp_step = range(4,15) # step_range = False bp_step = np.arange(4,15) # step_range = False bp_step = [2,5,6,7,9,12,18] # step_range = False
parameters (str or 1D list) –
Either
All
to extract all parameter available in input file or list of either base-step or helical base-step parameter as follows:If helical =
False
:shift
slide
rise
tilt
roll
twist
If helical =
True
:x-disp
y-disp
h-rise
inclination
tip
h-twist
step_range (bool) –
Dfault=True
: As shown above, ifTrue
, bp_step is taken as a range otherwise list or numpy array.helical (bool) – If
True
, parameters in input file will be considered as helical base-steps parameters IfFalse
, parameters will be considered as base-steps parameters.
- set_helical_axis(filename, step_range=False, step=None)[source]
To read and set local helical-axis positions from an input file.
- Parameters:
filename (str) – Input file, which is generated from do_x3dna. e.g. HelAxis_g.dat
step_range (bool) –
step_range = True
: read axis coordinates of base-steps for the given range of base-stepsstep_range = False
: read axis coordinates of all base-steps
step (list) –
- List containing lower and higher limit of base-steps range.
This option only works with
step_range=True
.This list should not contain more than two number.
First number should be less than second number.
- Example for base-step 4 to 15:
step = [4,15] # step_range = True
- set_helical_radius(filename, bp, atomname='P', full=False, bp_range=True)[source]
To read and set local helical radius of both strand
- Parameters:
filename (str) – Input file, which is generated from do_x3dna. e.g. HelixRad_g.dat
bp (1D list or array) –
base-pairs to analyze Example:
bp = [6] # bp_range = False bp = [4,15] # bp_range = True bp = range(4,15) # bp_range = False bp = np.arange(4,15) # bp_range = False bp = [2,5,6,7,9,12,18] # bp_range = False
atomname (str) – Atom name to consider for the DNA helix (accepted keywords:
P
,O4*
,O4'
,C1*
andC1
)full (bool) – To calculate full helical radius. Overrides atomname option and uses atom
P
, and 1 A is added to the radius calculated by 3DNA packagebp_range (bool) –
Default=True
: As shown above, ifTrue
, bp is taken as a range otherwise list or numpy array.
- set_major_minor_groove(filename, bp_step, parameters='all', step_range=True)[source]
To read and store Major and Minor grooves from an input file.
Minor groove : direct P-P distance
Minor Groove Refined : refined P-P distance which take into account the directions of the sugar-phosphate backbones
Major groove : direct P-P distance
Major Groove Refined : refined P-P distance which take into account the directions of the sugar-phosphate backbones
Warning
The major and minor grooves (direct P-P) cannot be calculated for first and last two base-steps
The major and minor grooves (refined P-P) cannot be calculated for first and last three base-steps
- Parameters:
filename (str) – Input file, which is generated from do_x3dna. e.g. MGroove_g.dat
bp_step (1D list or array) –
base-steps to analyze Example:
bp_step = [6] # step_range = False bp_step = [4,15] # step_range = True bp_step = range(4,15) # step_range = False bp_step = np.arange(4,15) # step_range = False bp_step = [2,5,6,7,9,12,18] # step_range = False
parameters (1D list) –
- List of groove parameter names as follows:
minor groove
minor groove refined
major groove
major groove refined
By default all four groove parameters will be extracted from the file.
step_range (bool) –
Default=True
: As shown above, ifTrue
, bp_step is taken as a range otherwise list or numpy array.
- time_vs_parameter(parameter, bp, merge=False, merge_method='mean', masked=False)[source]
To get the parameter of either a specfic base-pair/step or a DNA segment as a function of time.
- Parameters:
parameter (str) – Name of a base-pair or base-step or helical parameter. For details about accepted keywords, see
parameter
in the methodDNA.get_parameters()
.bp (1D list or array) –
base-pairs to analyze Example:
bp = [6] # merge = False bp = [4,15] # merge = True
merge (bool) –
Default=False
. As shown above, ifTrue
, bp should a list of range otherwise a list of single value. Ifbp = True
, the parameter for the respective DNA segment could be merged or calculated bymerge_method
.merge_method (str) –
Method to calculate the parameter of a DNA segment from local parameters of all base-pairs/steps that are between the range given through
bp
. Currently accepted keywords are as follows:merge_method = mean
: Average of local parametersmerge_method = sum
: Sum of local parameters
masked (bool) –
Default=False
. To skip specific frames/snapshots.DNA.mask
array should be set to use this functionality. This array contains boolean (eitherTrue
orFalse
) value for each frame to mask the frames. Presently, mask array is automatically generated duringDNA.generate_smooth_axis()
to skip those frames where 3D fitting curve was not successful within the given criteria.
- Returns:
time (1D array) – Array containing time of each frame from trajectory
value (1D array) – Array containing parameter values for each frame from trajectory
- write_haxis_pdb(filename='helical_axis.pdb', step_range=False, step=None, write_smooth_axis=True, write_orig_axis=False, write_curv=False, scale_curv=1)[source]
To write trajectory of helcial-axis as a PDB format file.
Both local helical axis and global (smoothed) axis can be written to PDB file. For global axis, curvature could be written in B-factor field of PDB file.
- Parameters:
filename (str) – Name of the output PDB format file.
step_range (bool) –
step_range = True
: Make axis smooth for the given range of base-stepsstep_range = False
: Make axis smooth for entire DNA. If original helical-axis of any base-step willbe found to be not available, error will be raised.
step (list) –
- List containing lower and higher limit of base-steps range.
This option only works with
step_range=True
.This list should not contain more than two number.
First number should be less than second number.
- Example for base-step 4 to 15:
step = [4,15] # step_range = True
write_smooth_axis (bool) – Write coordinates of smoothed helical axis as chain A.
write_orig_axis (bool) – Write coordinates of original helical axis (output from do_x3dna) as chain B.
write_curv (bool) – Write curvature of smoothed helical axis in B-factor column of PDB file.
scale_curv (int) – Scaling of curvature.
curvature * scale_curv
is written in B-factor column of PDB file.