Align Tab
Under the Align tab, you can align RNA and protein evidence against the genome you are annotating. For eukaryote projects, nucleotide BLAST, BLAT, HISAT2, PASA, and TopHat are available for transcript (nucleotide) alignments, and protein BLAST and DIAMOND are available for protein alignments (Fig. 26). For prokaryote projects, nucleotide BLAST, protein BLAST, and BLAT are available. The page displays the "Transcript Alignments" tool section by default; to open the "Protein Alignments" tool section, click on the section name (Fig. 26A). To set up a job for a tool, click on the gray box that contains the tool name (Fig. 26B). Some tools have numerous adjustable settings, while others have few or none. You can run a tool multiple times with different settings, as long as each job has a unique Job Name (Fig. 27A). Please see the "Available Tools" table for more details about the alignment tools.
Figure 26. Align Tab in GenSAS.
For nucleotide BLAST, BLAT, and PASA, global data files are available to use. GenSAS provides the NCBI RefSeq mRNA datasets for all the major RefSeq categories (Fig. 27B). Any transcript and EST evidence files that you uploaded will also be available for use with these tools. For PASA, you must also select the "Vector FASTA file" for the tool to run (Fig. 28A), because PASA checks sequences for vector contamination. As with the repeat tools, please look at the results of these tools before using the data in downstream tools.
Figure 27. BLAST interface.
You can align RNA-seq reads to the genome using HISAT2 or TopHat. The resulting alignment can then be used to train Augustus during the "Structural" step. To align RNA-seq reads, open the HISAT2 or TopHat job creation form and select either the single read file or the paired read files (Fig. 28B) that you uploaded during the "Evidence" step of GenSAS. Please be aware that results from HISAT2 and TopHat jobs can take a while to load in JBrowse once the job has completed.
Figure 28. PASA and HISAT2 interfaces.
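GenSAS runs HISAT2 for you, so no command line is needed. For readers who want to see what the single read versus paired read choice corresponds to in the underlying tool, the following is a minimal sketch of how an equivalent standalone HISAT2 call could be assembled. It only builds the argument list and does not run anything. The function name, index prefix, and file names are hypothetical, and this guide does not state which additional HISAT2 options GenSAS actually passes.

```python
from typing import List, Optional, Tuple


def build_hisat2_command(
    index_prefix: str,
    output_sam: str,
    single_reads: Optional[str] = None,
    paired_reads: Optional[Tuple[str, str]] = None,
) -> List[str]:
    """Return the argument list for a standalone hisat2 run (not executed).

    Hypothetical helper for illustration only; GenSAS builds its own jobs.
    """
    # Exactly one read layout must be chosen, mirroring the job form.
    if (single_reads is None) == (paired_reads is None):
        raise ValueError("Provide either single_reads or paired_reads, not both")

    # -x names the genome index prefix created beforehand with hisat2-build.
    cmd = ["hisat2", "-x", index_prefix]

    if paired_reads is not None:
        # -1 and -2 take the first and second mate files of a paired run.
        cmd += ["-1", paired_reads[0], "-2", paired_reads[1]]
    else:
        # -U takes a file of unpaired (single) reads.
        cmd += ["-U", single_reads]

    # -S names the SAM file that receives the alignments.
    cmd += ["-S", output_sam]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the two layouts.
    print(" ".join(build_hisat2_command(
        "genome_idx", "paired.sam", paired_reads=("reads_1.fq", "reads_2.fq"))))
    print(" ".join(build_hisat2_command(
        "genome_idx", "single.sam", single_reads="reads.fq")))
```

Returning a list rather than a single string keeps each file name as its own argument, so the result could be handed to `subprocess.run` without shell quoting issues.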
To set up a protein BLAST or DIAMOND job, click on "Protein Alignments" and then the tool name to access the job creation interface (Fig. 29A). During this phase of GenSAS, the RefSeq, SwissProt, and TrEMBL global databases are available only for DIAMOND alignments, because DIAMOND runs much faster than BLAST. Under the "Functional" step of GenSAS, you will be able to run protein BLAST with these global databases, since aligning gene models with these databases is less computationally intensive. Under the "Align" step, you can still perform protein BLAST with the species-specific evidence (Fig. 29B) that you loaded into GenSAS during the "Evidence" step.
Figure 29. Protein BLAST under Align step.
Once you have finished setting up alignment tool jobs, click "Proceed to the next step" near the top of the tab to move to the "Structural" annotation step of GenSAS.