Follow-up on the Nov 2015 Webinar: Variant Effect Prediction using the standalone Perl scripts

In the November 2015 Gramene webinar, Matt Geniza (a graduate student in the Jaiswal Lab at Oregon State University) presented how to use the standalone Perl script and command-line protocols for Variant Effect Predictor (VEP) analysis. This option is for advanced users who are familiar with the Unix/Linux command line. Users are also expected to have a SNP data file before starting this exercise. For more information, please visit the VEP homepage.

For this tutorial, we used SNP data generated in the Jaiswal laboratory at Oregon State University from two cultivars of diploid wheat Triticum monococcum: i) DV92, a cultivated spring wheat, and ii) G3116, a wild winter wheat. For details, see Fox SE, et al. 2014. To generate the SNP data file in VCF format, transcript reads were aligned to the “A” chromosomes of the Triticum aestivum genome (IWGSC1.0), and the VarScan 2 program (Koboldt et al., 2012) was used to call SNPs and indels (output).
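Before running VEP, it can be useful to check what the VCF file actually contains. The function below is a minimal sketch, not part of VEP or VarScan: `summarize_vcf` is a hypothetical helper name, and it assumes a standard tab-delimited VCF with REF in column 4 and ALT in column 5. It counts any record with a multi-base REF or ALT allele as an indel, so multi-nucleotide substitutions would also land in that bucket.

```shell
# Hypothetical helper: count SNP and indel records in a VCF file.
# Assumes standard VCF columns (REF = column 4, ALT = column 5).
summarize_vcf() {
    awk -F'\t' '
        /^#/ { next }                       # skip meta and header lines
        {
            kind = "snp"
            if (length($4) != 1) kind = "indel"
            n = split($5, alts, ",")        # ALT may list several alleles
            for (i = 1; i <= n; i++)
                if (length(alts[i]) != 1) kind = "indel"
            count[kind]++
        }
        END { printf "snp\t%d\nindel\t%d\n", count["snp"], count["indel"] }
    ' "$1"
}

# Example call (file name taken from the VEP command later in this post):
# summarize_vcf Chinese_spring_variants.vcf
```

The output is two tab-separated lines, one for each variant class, which gives a quick sanity check before the longer VEP run.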

The full webinar recording is available on Gramene’s YouTube channel.

For the benefit of all users, we are providing here the answer to a question raised by one of the participants:

How to Use Gramene BioMart to obtain functional information on genes with predicted consequences

Change to the variant_effect_predictor directory:

~]$ cd ensembl-tools-release-82/scripts/variant_effect_predictor/

~/variant_effect_predictor]$ ls

convert_cache.pl*  example_GRCh37.vcf  example_GRCh38.vcf  filter_vep.pl*  gtf2vep.pl*  INSTALL.pl*  README.txt  t/  variant_effect_predictor.pl*

To create a VEP cache from your own annotations in GTF format, use gtf2vep.pl:

$ perl gtf2vep.pl -i <input.gtf> -f <ref_genome_that_gtf_was_aligned_to.FASTA> -s <cache_name> -d <database_number>

The cache will be installed in ~/.vep.

Example:

$ perl gtf2vep.pl -i Chinese_spring_stress_experiment.gtf -f Triticum_aestivum_genome.fa -s chinese_spring_stress -d 1

2015-11-17 17:00:50 - Checking/creating FASTA index

2015-11-17 17:01:02 - Processing chromosome 1A

2015-11-17 17:01:02 - Processing chromosome 7A

2015-11-17 17:01:02 - Processing chromosome 2A

2015-11-17 17:01:02 - Processing chromosome 5A

2015-11-17 17:01:02 - Processing chromosome 4A

2015-11-17 17:01:02 - Processing chromosome 3A

2015-11-17 17:01:02 - Processing chromosome 6A

2015-11-17 17:13:44 - All done!
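Because gtf2vep.pl processes the annotation chromosome by chromosome against the FASTA file, it can help to confirm beforehand that every sequence name in the GTF also appears as a FASTA header. The following is a rough sketch of such a check. `missing_seqnames` is a hypothetical helper, not part of the VEP scripts, and it assumes the FASTA sequence name is the text between `>` and the first whitespace.

```shell
# Hypothetical helper: print GTF sequence names that have no FASTA header.
# Usage: missing_seqnames <annotations.gtf> <genome.fa>
missing_seqnames() {
    gtf=$1
    fasta=$2
    fa_names=$(mktemp)
    gtf_names=$(mktemp)
    # FASTA names: header text after '>' up to the first whitespace
    grep '^>' "$fasta" | sed 's/^>//' | awk '{ print $1 }' | sort -u > "$fa_names"
    # GTF names: column 1, ignoring comment lines
    grep -v '^#' "$gtf" | cut -f1 | sort -u > "$gtf_names"
    # comm -23 keeps lines that occur only in the first (GTF) list
    comm -23 "$gtf_names" "$fa_names"
    rm -f "$fa_names" "$gtf_names"
}

# Example call with the files from the command above:
# missing_seqnames Chinese_spring_stress_experiment.gtf Triticum_aestivum_genome.fa
```

If the function prints nothing, all GTF sequence names were found in the FASTA file. Any names it prints are worth resolving before building the cache.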

Once the custom cache is created, it is installed in ~/.vep.

Example:

~/.vep]$ ls

chinese_spring_stress/

The cache directory contains a subdirectory named after the database number you gave to the script with -d:

~/.vep]$ ls chinese_spring_stress/

1/
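The two-level layout shown above (cache name, then database number) can be listed in one step. This is a small sketch under that assumption. `list_vep_caches` is a hypothetical helper name, and it reports every two-level directory under the cache root, so unrelated directories stored there would be listed as well.

```shell
# Hypothetical helper: print "<cache_name><TAB><version>" for each
# <cache_root>/<cache_name>/<version>/ directory (default root: ~/.vep).
list_vep_caches() {
    cache_root=${1:-"$HOME/.vep"}
    for dir in "$cache_root"/*/*/; do
        [ -d "$dir" ] || continue          # skip when the glob matches nothing
        version=$(basename "$dir")
        species=$(basename "$(dirname "$dir")")
        printf '%s\t%s\n' "$species" "$version"
    done
}

# Example call:
# list_vep_caches ~/.vep
```

The two values printed for a cache are the ones to pass to --species and --cache_version when running VEP in offline mode.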

The custom cache can be called by using offline mode and specifying the cache name and version number:

$ perl variant_effect_predictor.pl --offline --dir_cache ~/.vep \
    --species chinese_spring_stress --cache_version 1 \
    -i Chinese_spring_variants.vcf -o Chinese_spring_variant_effect_output
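Once the run finishes, a quick tally of the predicted consequences gives a first overview of the results. The sketch below assumes VEP's default tab-delimited output, where header lines start with `#` and the Consequence field is column 7. `count_consequences` is a hypothetical helper, not a VEP script. If you selected a different output format, the column number would need adjusting.

```shell
# Hypothetical helper: count consequence terms in a default-format VEP output file.
# Assumes header lines start with '#' and Consequence is the 7th tab-separated column.
count_consequences() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        {
            n = split($7, terms, ",")          # a variant may carry several terms
            for (i = 1; i <= n; i++) count[terms[i]]++
        }
        END { for (t in count) printf "%s\t%d\n", t, count[t] }
    ' "$1" | sort
}

# Example call with the output file named in the command above:
# count_consequences Chinese_spring_variant_effect_output
```

Each output line holds a consequence term and the number of rows in which it occurs, sorted alphabetically by term.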