- Based on the plots you see in demux.qzv, what values would you choose for --p-trunc-len and --p-trim-left in this case?
- View the table.qzv QIIME 2 artifact, and in particular the Interactive Sample Detail tab in that visualization. What value would you choose to pass for --p-sampling-depth? How many samples will be excluded from your analysis based on this choice? How many total sequences will you be analyzing in the core-metrics-phylogenetic command?
- Which categorical sample metadata columns are most strongly associated with the differences in microbial community richness? Are these differences statistically significant?
- Which categorical sample metadata columns are most strongly associated with the differences in microbial community evenness? Are these differences statistically significant?
- Are the associations between subjects and differences in microbial composition statistically significant? How about body sites? What specific pairs of body sites are significantly different from each other?
- Do the Emperor plots support the other beta diversity analyses we’ve performed here? (Hint: Experiment with coloring points by different metadata.)
- What differences do you observe between the unweighted UniFrac and Bray-Curtis PCoA plots?
- When grouping samples by “body-site” and viewing the alpha rarefaction plot for the “observed_features” metric, which body sites (if any) appear to exhibit sufficient diversity coverage (i.e., their rarefaction curves level off)? How many sequence variants appear to be present in those body sites?
- When grouping samples by “body-site” and viewing the alpha rarefaction plot for the “observed_features” metric, the line for the “right palm” samples appears to level out at about 40, but then jumps to about 140. What do you think is happening here? (Hint: be sure to look at both the top and bottom plots.)
- Recall that our rep-seqs.qzv visualization allows you to easily BLAST the sequence associated with each feature against the NCBI nt database. Using that visualization and the taxonomy.qzv visualization created here, compare the taxonomic assignments with the taxonomy of the best BLAST hit for a few features. How similar are the assignments? If they’re dissimilar, at what taxonomic level do they begin to differ (e.g., species, genus, family, …)?
- Visualize the samples at Level 2 (which corresponds to the phylum level in this analysis), and then sort the samples by body-site, then by subject, and then by days-since-experiment-start. What are the dominant phyla in each body-site? Do you observe any consistent change across the two subjects between days-since-experiment-start 0 and the later timepoints?
- Which sequence variants differ in abundance across subject? In which subject is each sequence variant more abundant? What are the taxonomies of some of these sequence variants? (To answer the last question you’ll need to refer to another visualization that was generated in this tutorial.)
- Which genera differ in abundance across subject? In which subject is each genus more abundant?
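
For the sampling-depth question above, the arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not part of QIIME 2: the function name `rarefaction_summary` and the per-sample counts are hypothetical, and in practice you would read the real counts from the Interactive Sample Detail tab of table.qzv. It assumes that samples with fewer sequences than the chosen depth are dropped and that every retained sample is subsampled to exactly that depth.

```python
def rarefaction_summary(sample_counts, sampling_depth):
    """Summarize the effect of a sampling depth on a set of samples.

    sample_counts: dict mapping sample ID -> number of sequences (hypothetical data).
    sampling_depth: the value you would pass for --p-sampling-depth.
    """
    # Samples with at least `sampling_depth` sequences are kept.
    retained = [s for s, n in sample_counts.items() if n >= sampling_depth]
    # Everything else is excluded from the analysis.
    excluded = sorted(s for s in sample_counts if s not in retained)
    return {
        "retained_samples": len(retained),
        "excluded_samples": excluded,
        # Each retained sample contributes exactly `sampling_depth` sequences.
        "total_sequences": sampling_depth * len(retained),
    }


# Made-up counts for illustration only; they are not from the tutorial data.
counts = {"sample-a": 1500, "sample-b": 900, "sample-c": 4200, "sample-d": 1103}
summary = rarefaction_summary(counts, sampling_depth=1103)
print(summary)
```

Trying a few candidate depths this way makes the trade-off explicit: a higher depth keeps more sequences per sample but excludes more samples.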