Frequently Asked Questions (FAQs)

  1. When should I use ResistoXplorer?
  2. What types of input does ResistoXplorer accept?
  3. Does ResistoXplorer accept raw sequencing data as an input?
  4. How can I format my data into an acceptable input for ResistoXplorer?
  5. Why should I use the data filtering option?
  6. Which category should I choose to perform data filtering?
  7. Why should I normalize my data?
  8. What are the various normalization methods available in ResistoXplorer?
  9. Which normalization method should I choose?
  10. When should I opt to rarefy my data?
  11. What is the Unassigned category?
  12. How can I detect and deal with outliers?
  1. When should I use ResistoXplorer?

    ResistoXplorer is a user-friendly, comprehensive web-based tool for analyzing data sets generated from AMR metagenomics studies. The focus of this tool is to perform visual, statistical and exploratory analysis of resistome data. It is designed to support five kinds of analysis:

    • Compositional profiling: provides various analyses commonly used in community ecology, such as alpha diversity, rarefaction curves or ordination analysis, coupled with intuitive visualizations (interactive Sankey diagram, zoomable sunburst, treemap, stacked bar or area plots) to help users gain a comprehensive view of the composition of their uploaded resistome profiles.
    • Functional profiling: supports analysis and profiling of the resistome at various functional categories or levels (e.g. Drug class, Mechanism) to help users gain more biologically actionable insights, together with a better understanding of their data.
    • Comparative analysis: provides multiple standard as well as more advanced (CoDA-based) normalization techniques, coupled with differential analysis approaches (edgeR, DESeq2, LEfSe, metagenomeSeq, ANCOM and ALDEx2), to detect features (ARGs) that differ significantly between the conditions under investigation.
    • Integrative analysis: enables users to integrate their paired resistome and microbiome abundance profiles to understand the complex interplay and to explore potential associations between microbial ecology and AMR, using several univariate and multivariate omics integration statistical methods.
    • ARGs-microbial host network exploration: allows users to intuitively explore the microbial hosts and functional associations of their uploaded list of antimicrobial resistance genes (ARGs) using a powerful, fully featured network visual analytics system, providing better interpretation of AMR mechanisms and information on possible dissemination routes of ARGs.

  2. What types of input does ResistoXplorer accept?

    ResistoXplorer supports the most common formats generated in metagenomic-based AMR studies, i.e., count abundance tables (ARG/OTU/taxa), together with metadata and annotation files. Users can find detailed descriptions, example datasets and screenshots on the Data Format page.

  3. Does ResistoXplorer accept raw sequencing data as an input?

    ResistoXplorer is explicitly designed for downstream analysis and currently does not accept raw sequencing data. The annotated ARG count or abundance table obtained after preprocessing and upstream analysis of metagenomic sequencing data is the standard input for ResistoXplorer.
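The expected input shape can be sketched as follows. This is an illustrative example only (the ARG and sample names are made up, not a prescribed format); the Data Format page has the authoritative layout, but the general idea is features (ARGs) as rows and samples as columns:

```python
import pandas as pd

# Hypothetical annotated ARG count table: features (ARGs) as rows,
# samples as columns. All names here are illustrative.
counts = pd.DataFrame(
    {"Sample1": [120, 0, 35], "Sample2": [98, 4, 0], "Sample3": [210, 1, 12]},
    index=["tetA", "blaCTX-M", "ermB"],
)
counts.index.name = "ARG"
counts.to_csv("arg_counts.csv")  # comma-separated, one header row
print(counts.shape)  # (3, 3): three features across three samples
```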

  4. How can I format my data into an acceptable input for ResistoXplorer?

    For shotgun metagenomics data, there are several bioinformatics pipelines for processing the raw sequence data into an ARG count table. For instance,

    • AMRPlusPlus: a Galaxy-based metagenomics pipeline that uses current and new tools to help identify and quantify ARGs within metagenomic sequence data and generate a count table in text format.
    • ARGs-OAP: utilizes the SARG database and a hybrid functional gene annotation pipeline for fast annotation and classification of ARG-like sequences from metagenomic data.
    • DeepARG: a fully automated data analysis pipeline for antibiotic resistance annotation of metagenomic data using the deepARG algorithm and the deepARG-DB database.

    For more information on available methods and databases for resistome annotation, please refer to the review articles by Boolchandani, M et al. and Gupta, CL et al.

  5. Why should I use the data filtering option?

    The purpose of data filtering is to allow users to remove low quality and/or uninformative features to improve downstream statistical analysis. ResistoXplorer contains three data filtering procedures:

    1. Minimal data filtering - this procedure removes features containing all zeros, or appearing in only one sample with an extremely low count (considered artifacts). Such extremely rare features should be removed from analysis for both biological and technical reasons. It applies to all analyses except alpha diversity and rarefaction analysis;
    2. Low abundance features - features that are present in only a few samples with very low read counts may be due to sequencing errors or low-level contamination;
    3. Low variance features - features with near-constant abundance across all samples are unlikely to be associated with the conditions under study;
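The three filters above can be sketched on a toy count table. This is a minimal illustration with made-up cutoffs, not ResistoXplorer's exact defaults:

```python
import pandas as pd

# Toy ARG count table (features as rows, samples as columns); names illustrative.
counts = pd.DataFrame(
    {"S1": [0, 1, 50, 30], "S2": [0, 0, 60, 31], "S3": [0, 0, 55, 29]},
    index=["arg_zero", "arg_rare", "arg_ok", "arg_flat"],
)

# 1. Minimal filtering: drop features that are zero everywhere, or that
#    appear in only one sample with an extremely low count.
nonzero = (counts > 0).sum(axis=1)
minimal = counts[(nonzero > 1) | ((nonzero == 1) & (counts.max(axis=1) > 2))]

# 2. Low-abundance filter: require a minimum count in a minimum number
#    of samples (here >= 4 reads in >= 2 samples; cutoffs are illustrative).
abundant = minimal[(minimal >= 4).sum(axis=1) >= 2]

# 3. Low-variance filter: drop the fraction of features with the smallest
#    interquartile range across samples (here, the bottom 25%).
iqr = abundant.quantile(0.75, axis=1) - abundant.quantile(0.25, axis=1)
filtered = abundant[iqr > iqr.quantile(0.25)]
print(list(filtered.index))  # only the informative feature survives
```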

  6. Which category should I choose to perform data filtering?

    Users can choose from the low count (abundance) filter options to remove features with very small counts in very few samples (i.e., low prevalence). If the primary purpose is comparative analysis, users should also filter features that exhibit low variance, based on the interquartile range, standard deviation or coefficient of variation, as such features are very unlikely to be significant in the comparative analysis.

    The Data Inspection page provides a text summary and statistics of the uploaded files (Text Summary tab), along with a graphical overview of the read counts for all uploaded samples (Library Size Overview tab), both of which are informative for choosing an appropriate filtering method and cutoff for the data.

  7. Why should I normalize my data?

    Metagenomic data possess some unique characteristics, such as vast differences in sequencing depth, sparsity, skewed distributions, over-dispersion and compositionality. Hence, it is critical to normalize the data to achieve comparable and meaningful results. To deal with these issues, ResistoXplorer supports three kinds of normalization approaches:

    • Rarefaction: this method deals with uneven sequencing depths by randomly removing reads in the different samples until the sequencing depth is equal in all samples.
    • Scaling-based methods: these methods account for uneven sequencing depths by deriving a sample-specific scaling factor for bringing samples to the same scale for comparison.
    • Transformation-based methods: these include approaches that deal with sparsity, compositionality, and large variations within the count data.

  8. What are the various normalization methods available in ResistoXplorer?

    ResistoXplorer supports a variety of methods for data normalization. A brief summary is provided below:

    • Rarefaction: this method deals with uneven sequencing depths by randomly removing reads in the different samples until the library size of every sample equals that of the sample with the lowest sequencing depth. Whenever the library sizes of the samples vary too much (i.e., >10X), it is recommended to perform rarefaction before normalizing your data. It also seems to perform better for ordination or clustering analysis. For more details, please refer to the paper by Weiss, S et al.
    • Total Sum Scaling (TSS) normalization: this method removes technical bias related to different sequencing depths across libraries by simply dividing each feature count by the total library size to yield the relative proportion of counts for that feature. For easier interpretation, we can multiply it by 1,000,000 to get the number of reads corresponding to that feature per million reads. It is also known as Counts per Million (CPM) normalization. LEfSe uses this kind of approach to normalize data before conducting statistical testing.
    • Log Counts per Million (CPM) normalization: this method performs a log transformation on counts-per-million normalized data in order to deal with large variance in count distributions in addition to library size differences. This kind of approach is used by R packages such as edgeR and voom, which are designed for identifying differentially abundant genes in RNA-Seq count data.
    • Cumulative Sum Scaling (CSS) normalization: this method corrects for differences in library size by calculating the scaling factors as the cumulative sum of gene abundances up to a data-derived threshold to remove the variability in data caused by highly abundant genes. By default, metagenomeSeq utilizes this approach for differential analysis.
    • Relative proportion: this approach computes the relative proportion of a feature by dividing each feature count by the total number of counts (library size) per sample.
    • Relative log expression (RLE) normalization: this method estimates the median library from the geometric mean of the gene-specific abundances over all samples. The median ratio of each sample to the median library is used as the scaling factor. By default, DESeq2 utilizes this approach for identifying differentially abundant genes. This method was proposed by Anders & Huber.
    • Trimmed mean of M-values (TMM) normalization: this method, proposed by Robinson & Oshlack, derives the scaling factor using a weighted trimmed mean of the log-transformed gene-count fold-changes (relative abundances) between samples. By default, edgeR utilizes this approach for differential analysis in ResistoXplorer.
    • Upper Quartile normalization: this approach calculates the scaling factors from the 75th percentile of the gene count distribution for each library, after removing genes which are zero in all libraries. This method, proposed by Bullard et al., is implemented in the edgeR package.
    • Hellinger Transformation: this method computes the relative proportion of a feature by dividing each feature count by the total number of counts (library size) per sample, and then taking the square root of it.
    • Log-Ratio (CLR and ALR) Transformation: these methods are specifically designed to normalize compositional data. They transform the relative abundances (or the values in the table of counts) of each element into ratios between all parts, using either the geometric mean of the sample or a single element as the reference. Taking the logarithm of these ratios then brings the data into Euclidean (real) space, so that standard statistical methods can be applied. CoDA-based methods such as ALDEx2 and ANCOM use this approach to normalize data before conducting statistical testing.
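A few of the formulas above (TSS/CPM, log-CPM, Hellinger and CLR) can be sketched on a toy count matrix. This is an illustrative implementation of the general definitions, not ResistoXplorer's exact code, and the pseudocount of 1 is an assumption:

```python
import numpy as np

# Toy count matrix: features as rows, samples as columns (values illustrative).
counts = np.array([[100., 400.],
                   [300., 100.],
                   [600., 500.]])

lib_size = counts.sum(axis=0)      # per-sample library size
tss = counts / lib_size            # Total Sum Scaling: relative proportions
cpm = tss * 1e6                    # Counts per Million
log_cpm = np.log2(cpm + 1)         # log-CPM, with a pseudocount of 1
hellinger = np.sqrt(tss)           # Hellinger: square root of proportions

# CLR: log of each (pseudocounted) value over the geometric mean of its sample.
pseudo = counts + 1
gmean = np.exp(np.log(pseudo).mean(axis=0))
clr = np.log(pseudo / gmean)

print(tss[:, 0])  # proportions within sample 1 sum to 1
```

A handy sanity check on CLR output: the transformed values within each sample sum to zero, which is what places the data in Euclidean space.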

  9. Which normalization method should I choose?

    There is no consensus guideline regarding which normalization performs best and should be used for all types of datasets. The choice of method depends on the type of analysis to be performed. Users can explore different approaches and visually investigate the clustering patterns (i.e., ordination plots, dendrogram and heatmap) to determine the effects of different normalization methods with regard to the experimental factor of interest. For a detailed discussion of these methods and their performance for different types of analyses, users are referred to the comparative studies by Pereira, MB et al., Jonsson, V et al., Weiss, S et al. and McMurdie, PJ et al.

    The normalized data are used for exploratory data analysis, including ordination, clustering and integrative analysis. Differential abundance testing is also performed on normalized data; however, each of these methods uses its own specific normalization procedure. For example, relative log expression (RLE) normalization is used for DESeq2, and the centered log-ratio transformation is applied for ALDEx2.

  10. When should I opt to rarefy my data?

    Whenever the library size or sequencing depth of your samples differs by more than 10-fold, it is recommended to perform rarefaction before normalizing your data; such an issue cannot be addressed directly by normalization methods. Note that users should also consider removing shallowly sequenced samples, as such gross differences could be due to experimental failure. For more details, please refer to the paper by Weiss, S et al. In ResistoXplorer, users can directly visualize the library sizes (available on the Data Inspection page) or perform rarefaction curve analysis to examine the sequencing depth with respect to the number of ARGs (features) detected per sample, in order to check whether their data need rarefaction.
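The 10-fold check and the rarefaction step itself can be sketched as follows. This is a minimal illustration (toy counts, subsampling without replacement to the smallest library), not ResistoXplorer's internal procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-feature counts for two samples; rarefaction is advised when the
# ratio of the deepest to the shallowest library exceeds ~10x.
counts = {"A": np.array([900, 8000, 1100]), "B": np.array([50, 30, 20])}
lib_sizes = {s: c.sum() for s, c in counts.items()}
depth = min(lib_sizes.values())
print(max(lib_sizes.values()) / depth)  # here 100x, well above the 10x rule

def rarefy(sample_counts, depth, rng):
    """Randomly subsample reads without replacement down to `depth` reads."""
    reads = np.repeat(np.arange(sample_counts.size), sample_counts)
    picked = rng.choice(reads, size=depth, replace=False)
    return np.bincount(picked, minlength=sample_counts.size)

rarefied = {s: rarefy(c, depth, rng) for s, c in counts.items()}
```

After rarefying, every sample has exactly `depth` reads; the shallowest sample is returned unchanged, since all of its reads are kept.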

  11. What is the Unassigned category?

    The Unassigned category contains features with missing (NA) annotations at a given functional level. For instance, if you are looking at the Class level and 20% of the features do not have a Class-level annotation, these are merged together as Unassigned.
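The merging described above can be sketched with a toy annotation table. The feature names and counts are made up for illustration:

```python
import pandas as pd

# Hypothetical Class-level annotation; two features lack an annotation (NA).
annot = pd.DataFrame(
    {"Class": ["Tetracyclines", None, "Beta-lactams", None]},
    index=["tetA", "orf1", "blaTEM", "orf2"],
)
counts = pd.Series([10, 5, 7, 3], index=annot.index)

# Features with a missing annotation are merged into "Unassigned",
# then counts are aggregated per Class-level category.
level = annot["Class"].fillna("Unassigned")
class_counts = counts.groupby(level).sum()
print(class_counts["Unassigned"])  # 8: the two unannotated features combined
```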

  12. How can I detect and deal with outliers?

    Outliers refer to samples that are significantly different from the "majority". A potential outlier will distinguish itself as a sample located far away from the major clusters or trends formed by the remaining data. To discover potential outliers, users can use a variety of summary plots to visualize their data. For instance, a sample with extreme diversity (alpha diversity or ordination) or very low sequencing depth (rarefaction curve analysis or Library Size Overview) may be considered a potential outlier. Outliers may arise for either biological or technical reasons. To deal with outliers, the first step is to check whether those samples were measured properly. In many cases, outliers are the result of operational errors during the analytical process. If those values cannot be corrected (e.g., by normalization procedures), the samples should be removed from your analysis.
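One simple programmatic screen for the low-sequencing-depth case is to flag samples whose library size sits far from the median. This is a generic robust-statistics sketch (the 3-MAD cutoff is an illustrative convention, not a ResistoXplorer setting):

```python
import numpy as np

# Toy library sizes; the last sample is drastically under-sequenced.
lib_sizes = np.array([50000, 52000, 48000, 51000, 900])

# Robust z-score based on the median and the median absolute deviation (MAD);
# 0.6745 rescales the MAD to be comparable to a standard deviation.
median = np.median(lib_sizes)
mad = np.median(np.abs(lib_sizes - median))
robust_z = 0.6745 * (lib_sizes - median) / mad
outliers = np.where(np.abs(robust_z) > 3)[0]
print(outliers)  # index of the shallow sample
```

Such a screen only surfaces candidates; as the text above notes, whether to correct or remove a flagged sample is a judgment call about its biological or technical origin.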
