hichipper parameters
Running
hichipper --help
shows the parameters that can be used in this software package as reproduced below.
Usage: hichipper [OPTIONS] MANIFEST

  A preprocessing and QC pipeline for HiChIP data.

Options:
  --out TEXT                    Output directory name; must not be an already
                                existing directory [Required]
  --min-dist TEXT               Minimum distance; default = 5000
  --max-dist TEXT               Peak padding width (applied on both left and
                                right); default = 2000000
  --macs2-string TEXT           String of arguments to pass to MACS2; only is
                                called when peaks are set to be called;
                                default = "-q 0.01 --extsize 147 --nomodel"
  --macs2-genome TEXT           Argument to pass to the -g variable in MACS2
                                (mm for mouse genome; hs for human genome);
                                default = "hs"
  --peak-pad TEXT               Peak padding width (applied on both left and
                                right); default = 500
  --merge-gap TEXT              Merge nearby peaks (after all padding is
                                complete); default = 500
  --keep-temp-files             Keep temporary files?
  --skip-background-correction  Skip restriction fragment aware background
                                correction?
  --skip-resfrag-pad            Skip restriction fragment aware padding
  --skip-qc                     Skip QC report generation?
  --skip-diffloop               Skip analyses in diffloop (e.g. Mango loop
                                calling; .rds generation)
  --make-ucsc                   Make additional output files that can support
                                viewing in UCSC genome browser; requires tabix
                                and htslib tools.
  --keep-samples TEXT           Comma separated list of sample names to keep;
                                ALL (special string) by default
  --ignore-samples TEXT         Comma separated list of sample names to
                                ignore; NONE (special string) by default
  --read-length TEXT            Length of reads from sequencing runs;
                                default = 75
  --version                     Show the version and exit.
  --help                        Show this message and exit.
Running
hichipper --version
will show the version of this package currently installed.
hichipper, version 0.6.0
Check the badge up top to see if a newer version is available, or try upgrading directly through pip:
pip install hichipper --upgrade
Unless the --version or --help flag is supplied, the pipeline will attempt to run. Minimally sufficient parameters include the --out flag and a .yaml file, as shown in the example executions. Below are some explanations of the additional parameters that can be configured when executing the pipeline.
Restriction-fragment aware padding
Parameter explanations
Most of the parameter options are fairly straightforward. Running hichipper --version or hichipper --help doesn’t run the tool but supplies the information noted above. Otherwise, the default run mode requires a .yaml file supplied in addition to the --out parameter, which specifies the output directory of the run.
Users can customize the final output by using boolean flags or by supplying text input. The following cartoon shows a graphical overview of important parameters to consider when running hichipper.
As noted in orange, defined peaks are automatically padded by an integer width set with the --peak-pad flag. By default, this pad extends 500 base pairs in either direction. Padding the peaks boosts the number of PETs that can be mapped to loops. For example, PET II would not be considered part of a loop without padding, since the left end of the read does not overlap the called peak (black). However, it does overlap the padded peak, so it is retained with padding. When two peaks are close to one another, they may be merged using the --merge-gap flag. As suggested in the figure, the padded peaks B and C may be sufficiently close to be merged into a single anchor. Note that this can lead to some PETs becoming self-ligations (e.g. I-III). The --merge-gap flag is equivalent to running bedtools merge -d on the padded anchors.
By default, the merge-gap is 500 base pairs. Setting this to 0 can cause issues, particularly when the width of a PET spans multiple peaks.
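The padding and merging steps described above can be sketched in a few lines of Python. This is an illustration only, not hichipper's own code: the function names `pad_peaks` and `merge_anchors` and the peak coordinates are hypothetical, and the merge rule is written to resemble `bedtools merge -d` on sorted, single-chromosome intervals.

```python
def pad_peaks(peaks, pad=500):
    """Extend each (start, end) peak by `pad` bp on both sides, clamping at 0."""
    return [(max(0, start - pad), end + pad) for start, end in peaks]


def merge_anchors(anchors, gap=500):
    """Merge intervals separated by at most `gap` bp (assumed analogue of bedtools merge -d)."""
    merged = []
    for start, end in sorted(anchors):
        if merged and start - merged[-1][1] <= gap:
            # Close enough to the previous anchor: extend it instead of starting a new one
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical peaks A, B, and C on one chromosome (coordinates are made up)
peaks = [(10_000, 10_400), (50_000, 50_300), (51_600, 52_000)]
padded = pad_peaks(peaks, pad=500)        # mimics --peak-pad 500
anchors = merge_anchors(padded, gap=500)  # mimics --merge-gap 500
print(anchors)
```

In this made-up example the padded B and C end up 300 bp apart, so they collapse into a single anchor with a gap of 500 but stay separate with a gap of 0.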
The dist, or distance between two peaks, is noted in black and is measured between the centers of the two peaks. The --min-dist flag sets the smallest, and --max-dist the largest, value that this distance may take for a pair of peaks to be considered in a loop. The defaults are 5 kb and 2 Mb, since PETs spanning shorter distances are likely self-ligations, whereas those spanning longer distances are unlikely to be biologically real loops.
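As a rough illustration of the distance filter, the following Python sketch computes the center-to-center distance between two anchors and checks it against the two thresholds. The helper names are hypothetical, and treating both bounds as inclusive is an assumption rather than documented hichipper behavior.

```python
def anchor_center(anchor):
    """Midpoint of a (start, end) anchor, in base pairs."""
    start, end = anchor
    return (start + end) // 2


def within_loop_distance(left, right, min_dist=5_000, max_dist=2_000_000):
    """True if the center-to-center distance lies in [min_dist, max_dist].

    Inclusive bounds are an assumption made for this sketch.
    """
    dist = abs(anchor_center(right) - anchor_center(left))
    return min_dist <= dist <= max_dist


# Hypothetical anchors on the same chromosome
print(within_loop_distance((9_500, 10_900), (49_500, 52_500)))    # centers 40,800 bp apart
print(within_loop_distance((0, 1_000), (2_000, 3_000)))           # centers 2,000 bp apart
print(within_loop_distance((0, 1_000), (3_000_000, 3_001_000)))   # centers 3 Mb apart
```

Only the first pair falls inside the default 5 kb to 2 Mb window.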
From our inspection of HiChIP data, we determined that reads for putative loops localize to the edges of restriction fragments and that extending the padding to the edges of fragments can prove beneficial for maximizing the read density associated with loops. By default, hichipper adds additional padding to the edges of restriction fragments unless the --skip-resfrag-pad flag is thrown.
To give an example, consider peak C. Under the default options, the anchor corresponding to this peak would span restriction motifs 7 and 9 (assuming that the merge-gap is small enough that B and C are not merged). This is because the additional restriction fragment padding occurs after the original padding (orange), and hichipper pads to the edges of whatever fragments overlap the padded peak. However, if the user specified --peak-pad 0, the anchor corresponding to C would only span restriction motifs 8 and 9, since the peak sits on only that single fragment. While we have found the default options to be sensible, the user can adjust these padding parameters to potentially increase the precision of anchor calls, possibly at the expense of PET density, or vice versa.
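The interaction between --peak-pad and restriction fragment padding can be mimicked with a small Python sketch. It assumes that cut sites are given as a sorted list of positions and that an anchor is extended outward to the nearest cut site on each side. The function name and all coordinates are hypothetical and are not taken from the figure or from hichipper's implementation.

```python
import bisect


def pad_to_fragment_edges(anchor, cut_sites):
    """Extend a (start, end) anchor outward to the nearest restriction cut sites.

    `cut_sites` must be sorted; if there is no cut site on one side, that
    edge of the anchor is left unchanged.
    """
    start, end = anchor
    i = bisect.bisect_right(cut_sites, start) - 1  # rightmost cut site <= start
    j = bisect.bisect_left(cut_sites, end)         # leftmost cut site >= end
    new_start = cut_sites[i] if i >= 0 else start
    new_end = cut_sites[j] if j < len(cut_sites) else end
    return (new_start, new_end)


# Hypothetical cut sites; think of 7000, 8000, and 9000 as motifs 7, 8, and 9
cut_sites = [6_000, 7_000, 8_000, 9_000, 10_000]
peak_c = (8_300, 8_450)

# Default behavior: pad by 500 bp first, then extend to fragment edges
padded_c = (peak_c[0] - 500, peak_c[1] + 500)
print(pad_to_fragment_edges(padded_c, cut_sites))

# With --peak-pad 0, only the fragment containing the peak is covered
print(pad_to_fragment_edges(peak_c, cut_sites))
```

With these made-up numbers, the padded peak crosses the cut site at 8000 and so grows to cover two fragments, while the unpadded peak stays within one.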
Finally, we note the --macs2-genome and --macs2-string flags, whose default values we felt were well suited to processing HiChIP data. However, users can modify these when performing peak calling directly from HiChIP data.
User parameter recommendations
- If R is not installed on the system, or if the R package dependencies could not be installed, the following flags should be added:
--skip-resfrag-pad --skip-diffloop --skip-qc --skip-background-correction
- In the current version of hichipper, the novel background correction implementation is quite memory intensive. Thus, users running hichipper on a laptop or other low-RAM machine should likely skip the adaptive background correction by adding the following flag:
--skip-background-correction
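The two recommendations above could be encoded in a small wrapper that assembles the command line. The sketch below is one hypothetical way to do so in Python: the function name, the `low_memory` switch, and the example output directory and manifest names are all assumptions, and the check only looks for an `R` executable on the PATH, not for the required R packages.

```python
import shutil

# Flags recommended above when R or its package dependencies are unavailable
R_DEPENDENT_SKIPS = [
    "--skip-resfrag-pad",
    "--skip-diffloop",
    "--skip-qc",
    "--skip-background-correction",
]


def build_hichipper_args(out_dir, manifest, r_available=None, low_memory=False):
    """Assemble a hichipper argument list following the recommendations above."""
    if r_available is None:
        # Only detects the R executable; R package dependencies are not checked
        r_available = shutil.which("R") is not None
    args = ["hichipper", "--out", out_dir]
    if not r_available:
        args += R_DEPENDENT_SKIPS
    elif low_memory:
        args.append("--skip-background-correction")
    args.append(manifest)
    return args


# Hypothetical output directory and manifest names
print(" ".join(build_hichipper_args("output1", "example.yaml", r_available=False)))
print(" ".join(build_hichipper_args("output1", "example.yaml", r_available=True, low_memory=True)))
```

The resulting list can be passed to `subprocess.run` once the output directory and manifest are replaced with real values.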