The Genomic Diversity and Phenotype Data Model (GDPDM) Database Schema Version 4.0

Genome sequencing continues to become faster and less expensive, which is great news for researchers; however, storing and manipulating vast amounts of sequence data has become a challenge. Version 4.0 of the Genomic Diversity and Phenotype Data Model (GDPDM) database schema addresses this problem. Instead of storing individual SNP values, the 4.x versions store long haplotypes as Binary Large OBjects (BLOBs), in which each SNP value requires only four bits of storage. BLOBs are also used to store physical positions and indel lengths. Version 4.1 added another BLOB to store SNP id values, and version 4.2 adds tables for defining association studies and storing calculated values (i.e., p-values). For more details, diagrams, and SQL scripts, please see the GDPDM homepage.
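The four-bits-per-SNP idea can be illustrated with a minimal Python sketch that packs two SNP values into each byte of a BLOB. The allele-to-code table, the padding rule, and the function names pack_haplotype and unpack_haplotype are assumptions made for illustration only; the actual GDPDM encoding is defined on the GDPDM homepage and may differ.

```python
# Hypothetical 4-bit allele codes; the real GDPDM encoding may differ.
ALLELE_CODES = {"A": 0x0, "C": 0x1, "G": 0x2, "T": 0x3, "-": 0x4, "N": 0xF}
CODE_ALLELES = {code: allele for allele, code in ALLELE_CODES.items()}


def pack_haplotype(alleles: str) -> bytes:
    """Pack one allele per 4-bit nibble, two alleles per byte."""
    codes = [ALLELE_CODES[a] for a in alleles.upper()]
    if len(codes) % 2:
        # Assumed padding rule: fill the unused low nibble with "N".
        codes.append(ALLELE_CODES["N"])
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def unpack_haplotype(blob: bytes, n_sites: int) -> str:
    """Recover the first n_sites alleles from a packed BLOB."""
    alleles = []
    for byte in blob:
        alleles.append(CODE_ALLELES[byte >> 4])    # high nibble: even site
        alleles.append(CODE_ALLELES[byte & 0x0F])  # low nibble: odd site
    return "".join(alleles[:n_sites])


blob = pack_haplotype("ACGTN")
restored = unpack_haplotype(blob, 5)
```

In this sketch a haplotype of n sites occupies (n + 1) // 2 bytes, so the five sites above fit in three bytes. The site count has to be kept alongside the BLOB so that the padding nibble can be dropped on decoding.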