GBS Pipeline Tutorial - Step 3 - Alignment
Alignment
Introduction:
We are now halfway through the GBS pipeline. If you have gotten this far you have done very well! The next couple of steps are mostly straightforward and require just one or two files. If you have any questions about the output from either of the steps you have done so far, or if you are having difficulties, you are welcome to come to the bioinformatics group for assistance.
Materials:
For this step we need just one file: a FASTA of the reference genome for your species of interest. If you have your own local copy of the genome, you can transfer it to the server using WinSCP again. Genome files can be quite large, so this transfer may take a while. If you do not have your own copy of the genome, ask Larissa or Carolyn from the bioinformatics group for the path to the file, if it is located on the server. We will also need to edit the "GBS.conf" file again to specify where the genome file is located.
Where should the reference genome go?
If you are using a genome that the AGILE team has assembled (e.g. lentil), we may have it on the server already. You can use the command ls /storage/bin/Applications/GBSpipeline-master/Genomes to see if the genome you want is already available. If it is not, you will need to transfer it to the server yourself. When you transfer your genome file, place it in the same directory as all of the other GBS files we have generated so far. This will make editing the "GBS.conf" file very simple. You can use the cp (copy) command if you are copying the genome from the server to your own storage location. If you have misplaced the genome file, try to find it with the ls and cd commands. It is good practice to look for and deal with misplaced files on the server! Once you have found it, use the mv command to put it in the right place.
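The checks described above can be sketched as a pair of small shell functions. This is only an illustration: the function names list_genomes and stage_genome are made up for this example, the genome directory is the one given in this tutorial, and any file or directory names you pass in are your own.

```shell
#!/bin/sh
# Directory this tutorial lists as holding server-provided genomes.
GENOME_DIR="/storage/bin/Applications/GBSpipeline-master/Genomes"

# List the genomes in a directory, or say so if the directory is not visible.
list_genomes() {
    if [ -d "$1" ]; then
        ls -lh "$1"
    else
        echo "No genome directory at $1"
    fi
}

# Copy a genome FASTA into a working directory unless it is already there.
# Both arguments are placeholders for your own paths.
stage_genome() {
    src="$1"
    dest_dir="$2"
    if [ ! -f "$src" ]; then
        echo "Genome file not found: $src" >&2
        return 1
    fi
    if [ -f "$dest_dir/$(basename "$src")" ]; then
        echo "Already present: $dest_dir/$(basename "$src")"
        return 0
    fi
    cp "$src" "$dest_dir/"
}

list_genomes "$GENOME_DIR"
```

For example, stage_genome /path/to/my_genome.fasta . would copy a genome you uploaded elsewhere into the current directory. If the file is simply in the wrong place and you do not need a second copy, mv does the same job without doubling the disk usage.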
Editing the GBS.conf file:
Once you have moved your genome of choice into the appropriate place, we can specify the filename in the "GBS.conf" file. Once again, on your local machine, open the "GBS.conf" file that you have hopefully saved somewhere. Note that if you have had to recopy the GBS.conf file, you will most likely have to re-enter some of the required fields. Otherwise, when you open the "GBS.conf" file, the field we need to fill in should be close to the top. It is the reference genome field in the section labelled "REQUIRED FOR MULTIPLE STEPS." Beside the field that says REFERENCE, enter the name of your genome FASTA file. If you are using a genome that is provided on the server, you will need to use the full path name. In this case it would be: /storage/bin/Applications/GBSpipeline-master/Genomes/TheGenomeYouWant
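Before typing a path into the REFERENCE field, it can save a failed run to confirm that the file is readable and looks like a FASTA. The sketch below is one way to do that, not part of the GBS pipeline itself. The function name check_fasta is made up for this example, and it assumes an uncompressed, plain-text FASTA, in which every sequence header line starts with ">".

```shell
#!/bin/sh
# Check that a file is readable and starts with a FASTA header line,
# then report how many sequences (header lines) it contains.
check_fasta() {
    fasta="$1"
    if [ ! -r "$fasta" ]; then
        echo "Cannot read: $fasta" >&2
        return 1
    fi
    # Read only the first line; an empty file leaves first_line empty.
    first_line=""
    IFS= read -r first_line < "$fasta" || true
    case "$first_line" in
        ">"*) ;;
        *)
            echo "Does not look like FASTA (first line should start with '>'): $fasta" >&2
            return 1
            ;;
    esac
    echo "Sequences in $fasta: $(grep -c '^>' "$fasta")"
}

# Example call, using the placeholder path from this tutorial:
# check_fasta /storage/bin/Applications/GBSpipeline-master/Genomes/TheGenomeYouWant
```

If the check passes, the path you gave to check_fasta is the value to enter beside REFERENCE. Counting the header lines on a large genome can take a little while, since grep has to read the whole file.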
The other fields that you have filled in so far should still be there as well. If you do not see them, re-enter them with the same values you entered earlier. If they are not labelled the same way as before, output might be overwritten or files might be misplaced, so it is best to keep things simple and consistent. Once you have edited your "GBS.conf" file appropriately you should be ready to start the next function!
Calling the Align program:
As described in the Friendly Server Usage tutorial, if you know the servers are clear and you are ready to run the program you should set up a screen session to run it in. Once you have a screen open and you are in the correct directory with ALL of your GBS files, it is time to run the command. Enter the following command into your terminal:
/storage/bin/Applications/GBSpipeline/GBS_pipeline.pl align
This should get the program running on the server. Aligning reads to a genome can take an incredibly long time (upwards of 3-4 days) depending on the size of your genome, the number of reads and the number of samples you have. Be patient! There is only one more step after this so hold on tight! When you run the program you should see a progress bar and some text pop up, just like the programs before this one. Feel free to take a peek at the progress bar every once in a while if you so desire. Otherwise you can exit the screen session and continue on with other work while you wait.
Don't forget to take a quick peek at the summary file from the align step as well! This file can give excellent insight into the alignment of reads, the uniqueness of reads and much more that can be useful to you. If you are interested, you can also monitor your own server usage while your program is running. Using the same htop command as before, you should be able to see 20 processes running on 20 different processors. This is one way to check that your command has actually worked. That is all for this stage of the pipeline, so settle in for the wait.
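Because the run can last for days, it may help to keep a copy of everything the program prints. The sketch below is one possible way to do that and is not something the pipeline requires. The function name run_logged, the log file name align.log and the session name gbs_align are all made up for this example, and it is an untested assumption that the pipeline's progress bar displays normally when its output is piped through tee. If it does not, run the command directly as shown above.

```shell
#!/bin/sh
# Run a command, showing its output in the terminal and appending a copy,
# with start and finish timestamps, to a log file.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log_file="$1"
    shift
    echo "Started: $(date)" >> "$log_file"
    "$@" 2>&1 | tee -a "$log_file"
    echo "Finished: $(date)" >> "$log_file"
}

# Typical sequence (typed by hand, shown here as comments):
# 1. Open a named screen session:
#      screen -S gbs_align
# 2. Inside it, paste the function above, move to the directory
#    holding your GBS files, and start the alignment:
#      run_logged align.log /storage/bin/Applications/GBSpipeline/GBS_pipeline.pl align
# 3. Detach with Ctrl-a followed by d; the job keeps running.
# 4. Reattach later to check on it:
#      screen -r gbs_align
# 5. Show only your own processes in htop:
#      htop -u "$USER"
```

The timestamps in align.log give a rough record of how long the alignment took, which is handy when planning the next run.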
The tutorial for SNP calling can be found here: http://knowpulse.usask.ca/portal/node/1772193