Using SeqWord Motif Mapper
BactEpiGenPro uses the "SeqWord Motif Mapper" program to seamlessly map methylation motifs/sites in a bacterial genome. This program and all mentioned scripts can be found on our GitHub page.
1. Dataset
Download example files
- Annotated reference genome in GenBank Flat File Format (.gbk).
- Modified nucleotides in General Feature Format (.gff).
- Sequence regions to ignore in .txt format (optional).
There are a few ways to obtain these files.
GBK files:
- Find GenBank files in a public database such as the NCBI.
- Use a genome annotation tool such as PROKKA and specify .gbk as your preferred output format.
- Already have an annotated genome in FASTA format? See our GitHub page or convert it to .gbk here.
GFF files:
- We highly recommend using PacBio SMRT sequencing for methylation analysis. The SMRT Link software can be used to identify modified bases. An example script is available on our GitHub page.
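Before uploading a .gff file, it can help to check that it parses cleanly. The following is a minimal sketch, not part of SeqWord Motif Mapper itself, that reads the nine tab-separated GFF columns and optionally drops records below a minimum score. The function name read_modified_bases, the attribute keys (coverage, context, IPDRatio), and all sample values are illustrative assumptions; your own file may carry different attributes.

```python
import io
from typing import Dict, Iterator, NamedTuple, Optional


class ModifiedBase(NamedTuple):
    seqid: str
    mod_type: str            # e.g. m6A or m4C (GFF column 3)
    position: int            # 1-based coordinate (GFF column 4)
    score: Optional[float]   # GFF column 6; None when given as "."
    strand: str
    attributes: Dict[str, str]


def read_modified_bases(handle, min_score: Optional[float] = None) -> Iterator[ModifiedBase]:
    """Yield one record per feature line of a GFF file (hypothetical helper)."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF feature line
        seqid, _source, mod_type, start, _end, score, strand, _phase, attrs = fields
        score_value = None if score == "." else float(score)
        if min_score is not None and (score_value is None or score_value < min_score):
            continue
        # Column 9 holds semicolon-separated key=value pairs.
        attributes = dict(pair.split("=", 1) for pair in attrs.split(";") if "=" in pair)
        yield ModifiedBase(seqid, mod_type, int(start), score_value, strand, attributes)


# Made-up sample lines, for illustration only.
SAMPLE = (
    "##gff-version 3\n"
    "chr1\tkinModCall\tm6A\t1042\t1042\t35\t+\t.\tcoverage=88;context=ACGTGATCAGT;IPDRatio=4.12\n"
    "chr1\tkinModCall\tm4C\t2310\t2310\t18\t-\t.\tcoverage=51;context=TTGACCGGTAA;IPDRatio=2.05\n"
)

records = list(read_modified_bases(io.StringIO(SAMPLE)))
strict = list(read_modified_bases(io.StringIO(SAMPLE), min_score=21))
print(len(records), len(strict))  # 2 1
```

For a real file, replace the io.StringIO(SAMPLE) handle with open("your_file.gff").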
Below is an example workflow that can guide your analysis.
2. Parameters
- Basic
- Motif: A specific nucleotide sequence recognised by methyltransferases.
- Modified base locations: Comma-separated list of integers giving the positions of the modified bases within the motif. These positions can be specified for both the forward and reverse strands.
- Show methylated motifs: Show or hide methylated sites in the output.
- Searching mode: Defines how methylated sites are searched within a motif. When searching for sites, a hit is reported if one or more of the specified nucleotides are methylated; when searching for motifs, a hit is reported only if all specified nucleotide positions are methylated.
- Email: Send results to an email address (optional). We recommend using this option on larger datasets that may take longer to run.
- Advanced
- Filter regions: Sequence regions to ignore in .txt format.
- Allow context mismatches: Toggle context mismatches on or off (default = on).
- Context mismatches: The maximum number of context sequence mismatches allowed per methylation site (default = 2).
- Cut-off score: Sets the strictness of the search. The score reflects the likelihood that a given nucleotide is methylated (default = unset). For example, a cut-off of 21 corresponds to approximately p = 0.01.
- Promoter sequence length: The length of the upstream promoter sequence (default = 75 bp).
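To make the motif, modified base locations, searching mode, and cut-off score parameters concrete, here is a minimal sketch of how they could interact. It is not the program's actual implementation: the function names, the forward-strand-only search, and the mode labels "sites" and "motifs" are assumptions made for illustration. The score conversion further assumes a Phred-style scale (score = -10 * log10(p)).

```python
from typing import List, Set, Tuple

# IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def motif_starts(sequence: str, motif: str) -> List[int]:
    """Return 0-based start indices where the motif matches the sequence."""
    sequence, motif = sequence.upper(), motif.upper()
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        if all(base in IUPAC[code] for base, code in zip(window, motif)):
            hits.append(start)
    return hits


def methylated_hits(sequence: str, motif: str, modified_positions: List[int],
                    methylated: Set[int], mode: str = "sites") -> List[Tuple[int, List[int]]]:
    """Report motif occurrences that carry methylated bases (forward strand only).

    modified_positions: 1-based positions within the motif.
    methylated: 1-based genome coordinates of modified bases.
    mode: "sites" keeps a hit if any expected position is methylated;
          "motifs" keeps it only if all expected positions are methylated.
    Returns (1-based motif start, methylated coordinates) pairs.
    """
    if mode not in ("sites", "motifs"):
        raise ValueError("mode must be 'sites' or 'motifs'")
    results = []
    for start in motif_starts(sequence, motif):
        # 0-based start plus 1-based offset gives a 1-based genome coordinate.
        expected = [start + pos for pos in modified_positions]
        found = [coord for coord in expected if coord in methylated]
        keep = bool(found) if mode == "sites" else len(found) == len(expected)
        if keep:
            results.append((start + 1, found))
    return results


def score_to_p(score: float) -> float:
    """Convert a Phred-style score to a p-value (assumed scale)."""
    return 10 ** (-score / 10)


genome = "TTGATCAAGATCGG"
modified = {4, 10, 12}  # made-up modified base coordinates
print(methylated_hits(genome, "GATC", [2, 4], modified, mode="sites"))
# [(3, [4]), (9, [10, 12])]
print(methylated_hits(genome, "GATC", [2, 4], modified, mode="motifs"))
# [(9, [10, 12])]
print(round(score_to_p(21), 4))  # 0.0079
```

In this toy genome the first GATC occurrence has only one of its two specified positions methylated, so it appears in the "sites" result but not in the "motifs" result. Under the assumed Phred-style scale, a score of 20 gives exactly p = 0.01 and a score of 21 gives p of about 0.008.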
3. Outputs
- Overview of genomic methylation sites found for a given motif.
- Text output including methylation motifs, locations, and gene annotations: