SeqWord Motif Mapper (SWMM) is a tool for visual and statistical analysis of bacterial methylation patterns. Its main purpose is to detect the distribution of canonical and non-canonical methylation motifs.
Users provide PacBio genomic inputs and select the tasks for the program to run. After submitting these tasks, they can view the results online or locally once the analysis is complete.
The online version of SWMM is slightly limited in its allowed parameters, but full usage is available on GitHub.
Users are required to upload a reference genome (genomic, assembled, or plasmid) in GenBank (GBK) format and the modified nucleotides that were detected by the ipdSummary program in General Feature Format (GFF). An optional mask file can also be provided if certain regions need to be skipped from the analysis. The maximum file size per upload is currently 100 MB.
General Settings
Set global options for all tasks where applicable. These include the output format and the settings for the sequence context around methylation sites.
--Exclude Strand
Exclude leading or lagging strands from analysis.
--Promoter Length
Set the length of nearby promoter regions in base pairs. This is especially important for the statistics panel.
--Maximum BLAST Mismatches
SWMM uses BLASTN alignments of the GFF file to find consensus matches in the GBK file. Use this parameter to limit the number of mismatches allowed. The motif mismatch is a related parameter that specifies full or partial motif matching.
--Cut-off Score
Set the lower limit for NucMod scores per modified base. These NucMod values represent the likelihood that a base is modified and should be present in the GFF file.
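As a rough illustration of what a cut-off score does, the sketch below keeps only GFF records whose score is at or above a threshold. It is not SWMM's own code. It assumes the score sits in the sixth tab-separated column, as in standard GFF, and the function name `filter_by_score` and the example records are hypothetical.

```python
def filter_by_score(gff_lines, cutoff):
    """Yield (seqid, type, position, strand, score) for records scoring >= cutoff."""
    for line in gff_lines:
        # Skip blank lines and GFF header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GFF record has nine tab-separated columns; "." means no score.
        if len(fields) < 9 or fields[5] == ".":
            continue
        score = float(fields[5])
        if score >= cutoff:
            yield fields[0], fields[2], int(fields[3]), fields[6], score


# Made-up records for illustration only.
example = [
    "##gff-version 3",
    "chr1\tkinModCall\tm6A\t100\t100\t45\t+\t.\tcoverage=60",
    "chr1\tkinModCall\tmodified_base\t250\t250\t18\t-\t.\tcoverage=25",
]
kept = list(filter_by_score(example, cutoff=21))
```

With a cut-off of 21, only the first example record is kept.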
Circular Map
Generates a Circos-like plot representing the genomic locations of methylated sites that map to a motif on either DNA strand.
--Motif
Specific nucleotide sequence recognized by a methyltransferase. Enter a combination of characters from the table below with modified locations. Example: GATC,2,-2.
A | T | G | C | U | R | Y | S | W | K | M | B | V | D | H | N |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | T | G | C | U | [A, G] | [C, T] | [G, C] | [A, T] | [G, T] | [A, C] | [C, G, T] | [A, C, G] | [A, G, T] | [A, C, T] | [A, C, G, T] |
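The code table above can be read as a lookup from each character to the set of bases it matches. The following sketch, which is only an illustration and not SWMM's implementation, turns a motif into a regular expression and reports 1-based match positions on one strand. The names `IUPAC`, `motif_to_regex`, and `find_motif` are hypothetical, and the modified-position part of the input (such as `,2,-2`) is not handled here.

```python
import re

# Same mapping as the table above: each code expands to the bases it matches.
IUPAC = {
    "A": "A", "T": "T", "G": "G", "C": "C", "U": "U",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "V": "ACG", "D": "AGT", "H": "ACT", "N": "ACGT",
}


def motif_to_regex(motif):
    """Expand a motif written with the codes above into a regex string."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_motif(sequence, motif):
    """Return 1-based start positions of all matches, including overlapping ones."""
    # A lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]


hits = find_motif("AAGATCGATCTT", "GATC")
```

Searching the reverse strand would require running the same search on the reverse complement of the sequence, which is left out to keep the sketch short.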
--Search Mode
Allow partial (sites) or strict (motifs) matches.
--Show Modified
Affects the output by showing sites that include or exclude the motif.
--Window Length
Refers to the base pair size of the sliding window used for the search. Window step is a related parameter that affects the increment by which the window advances.
--Cut-off Score
Refer to General Settings
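To make the relationship between Window Length and window step concrete, here is a minimal sketch that counts sites per window along a linear sequence. It is an assumption about how such a sliding window could work, not a description of SWMM's internals; the function `window_counts` is hypothetical and does not wrap around the origin of a circular genome.

```python
def window_counts(positions, genome_length, window, step):
    """Return (window_start, site_count) pairs for 1-based site positions."""
    counts = []
    start = 1
    while start <= genome_length:
        # The last windows are truncated at the end of the sequence.
        end = min(start + window - 1, genome_length)
        n = sum(1 for p in positions if start <= p <= end)
        counts.append((start, n))
        # The step controls how far the window advances each time.
        start += step
    return counts


# Made-up site positions on a 100 bp sequence, window of 20 bp, step of 10 bp.
result = window_counts([5, 12, 18, 95], genome_length=100, window=20, step=10)
```

A step smaller than the window length gives overlapping windows and a smoother profile; a step equal to the window length gives non-overlapping bins.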
Dot Plot
Generates a dot plot showing NucMod scores of modified or strictly m6A/m4C methylated nucleotides versus the coverage estimated by ipdSummary.
--Modified Nucleotides
Filter data points by modified nucleotides (A,C,T,G).
--Methylation Type
Filter data points by methylation type (m6A/m4C).
--Dotplot Motif
Filter data points by a motif (see general settings).
--Cut-off Score
Adjusts y-axis based on upper or lower limit of cut-off score (refer to general settings).
--Maximum Coverage
Adjusts x-axis by an upper coverage limit.
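The dot plot filters can be pictured as selecting (coverage, score) pairs from the GFF file. The sketch below is one possible way to do this and is not taken from SWMM. It assumes that the methylation type is in the third GFF column, the score in the sixth, and that the ninth column carries a `coverage=` attribute; the function name `dotplot_points` and the example records are hypothetical.

```python
def dotplot_points(gff_lines, types=("m6A", "m4C"), max_coverage=None):
    """Collect (coverage, score) pairs for records of the selected types."""
    points = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] not in types:
            continue
        # Attributes are assumed to be "key=value" pairs separated by ";".
        attrs = dict(item.split("=", 1) for item in fields[8].split(";") if "=" in item)
        if "coverage" not in attrs or fields[5] == ".":
            continue
        coverage = int(attrs["coverage"])
        # Drop points beyond the upper coverage limit (the x-axis limit).
        if max_coverage is not None and coverage > max_coverage:
            continue
        points.append((coverage, float(fields[5])))
    return points


# Made-up records for illustration only.
records = [
    "chr1\tkinModCall\tm6A\t100\t100\t45\t+\t.\tcoverage=60;context=ACGT",
    "chr1\tkinModCall\tmodified_base\t250\t250\t18\t-\t.\tcoverage=25",
    "chr1\tkinModCall\tm4C\t300\t300\t30\t+\t.\tcoverage=120",
]
points = dotplot_points(records, max_coverage=100)
```

The resulting pairs could then be passed to any plotting library, with coverage on the x-axis and score on the y-axis.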
Statistics Panel
Adds a panel of nucleotide-based statistics to your tasks. The panel shows information on methylated sites and their potential downstream effects. Note that this task can be expensive in both computation and time, especially when many sites are found. To mitigate this, enter a more specific motif or increase the strictness of the search.
--Tasks
Additional statistics to calculate. Select from GC-content (GC), GC-Skew (GCS), and mobile genetic elements (mge).
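For readers unfamiliar with these statistics, the sketch below uses the common definitions: GC-content is (G + C) divided by the sequence length, and GC-skew is (G - C) / (G + C). Whether SWMM computes them exactly this way, or per window, is an assumption; the function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g + c) / len(seq) if seq else 0.0


def gc_skew(seq):
    """(G - C) / (G + C); positive values mean an excess of G over C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


content = gc_content("GGCA")  # 3 of 4 bases are G or C
skew = gc_skew("GGCA")        # two G, one C
```

Both functions return 0.0 for inputs where the ratio is undefined (an empty sequence, or a sequence without G or C), which is a design choice of this sketch.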
This section provides a tutorial on the program's usage.
We will be working with the annotated genome of a Bacillus velezensis strain and have obtained information on the base modifications. You can download the files used here.
First, I need to upload the GBK and GFF files. If I want to filter out features, I can do so by uploading a filter file. For this analysis, I have opted not to exclude any regions, but an example filter file is included in the tutorial folder.
Next, I must add tasks for SWMM to run. I am happy with the general settings so I will leave them as is.
Currently, I do not have a good gauge of how much methylation is present in my genome. Therefore, I will add a dotplot to identify what we are working with before proceeding with the analysis. To add the dotplot, click the Dotplot button in the Tasks tab, modify the parameters from the generated menu, and click “Add Task”.
To submit the task, click the submit button. You can also enter your email address should you wish to access the results at a later stage.
Your browser will then redirect you to the outputs page. If you entered an email address, the link to this page is also emailed to you.
My data looks good since most of the detected nucleotides that pass the cut-off score of 21 can be found under 30X coverage. You can download the graph in a specified format as shown in the picture.
I will now submit another job to the server. Since we have already run the dotplot, you may remove it from the list of tasks by clicking the “Reset task” button. This time, I want to add a circular map that includes a statistics panel. A study by Zhao et al. (2023) shows that certain L. paracasei strains have several motifs, such as CYYANNNNNNGTG, that are involved in the regulation of carbohydrate metabolism. I want to see if the B. velezensis strain has a similar methylation profile. Enter CYYANNNNNNGTG,1,4 as the input motif, which is methylated at the 1st and 4th positions. Keep defaults for the other settings.
You should see a circular diagram showing the distribution of methylated sites and several statistics below it. The returned results show that my genome does indeed contain methylation for the entered motif.
These results can be used to characterize the methylation pattern. For example, we can see that methylation of the entered motif has a clear preference for the reverse strand and that the frequency of sites is significantly reduced across promoter and coding regions. However, if we look at the detailed text output, one of these sites has putative downstream effects on hexuronate utilization and may be involved in carbohydrate metabolism. I can now use these results to guide more focused studies on the organism.
References used in this example: Zhang, H., & Zhang, W. (2023). Roles of adenine methylation in the physiology of Lacticaseibacillus paracasei. Nature Communications, 14(1). https://doi.org/10.1038/s41467-023-38291-1