gpmap¶
The Pandas DataFrame for genotype-phenotype (GP) map data.

The GenotypePhenotypeMap is a core object for a suite of packages written
in the Harms Lab. It organizes and standardizes genotype-phenotype map data.
Basic Example¶
# Import the GenotypePhenotypeMap
from gpmap import GenotypePhenotypeMap
# The data
wildtype = 'AA'
genotypes = ['AA', 'AT', 'TA', 'TT']
phenotypes = [0.1, 0.5, 0.2, 0.8]
stdeviations = [0.05, 0.05, 0.05, 0.05]
# Initialize a GenotypePhenotypeMap object
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes,
                           stdeviations=stdeviations)
# Show the DataFrame
gpm.data

Documentation¶
Simulating genotype-phenotype maps¶
The GPMap package comes with a suite of objects to simulate genotype-phenotype
maps following models in the literature. They are found in the gpmap.simulate
module.
All simulation objects inherit from the GenotypePhenotypeMap object as their
base class. Thus, anything you can do with a GenotypePhenotypeMap, you can do
with the simulation objects.
NK landscape¶
Construct a genotype-phenotype map using Kauffman’s NK Model. [1] The NK fitness landscape is created using a table with binary, length-K, sub-sequences mapped to random values. All genotypes are binary with length N. The fitness of a genotype is constructed by summing the values of all sub-sequences that make up the genotype using a sliding window across the full genotypes.
For example, imagine an NK simulation with \(N=5\) and \(K=2\). To construct the fitness for the 01011 genotype, select the following sub-sequences from an NK table: “01”, “10”, “01”, “11”, and “10” (the last window wraps around from the final site back to the first). Sum their values together.
# import the NKSimulation class
from gpmap.simulate import NKSimulation
# Create an instance of the model. Using `from_length` makes this easy.
gpm = NKSimulation.from_length(6, K=3)
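The sliding-window sum described above can be sketched in plain Python. This is a minimal illustration, not the code NKSimulation runs: the helper names `nk_windows` and `nk_fitness`, the uniform random table, and the wrap-around handling are all assumptions made for this example.

```python
import itertools
import random


def nk_windows(genotype, K):
    """Return every length-K window of the genotype, wrapping past the last site."""
    N = len(genotype)
    doubled = genotype + genotype  # lets a window run off the end and wrap around
    return [doubled[i:i + K] for i in range(N)]


def nk_fitness(genotype, table, K):
    """Sum the table values of all length-K windows of the genotype."""
    return sum(table[window] for window in nk_windows(genotype, K))


# Hypothetical NK table: each binary sub-sequence of length K gets a random value.
rng = random.Random(0)
K = 2
table = {"".join(bits): rng.random() for bits in itertools.product("01", repeat=K)}

windows = nk_windows("01011", K)
fitness = nk_fitness("01011", table, K)
```

For the genotype 01011 with K=2, `windows` holds the five sub-sequences listed in the example above, and `fitness` is the sum of their table values.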
House of Cards landscape¶
Construct a ‘House of Cards’ fitness landscape. This is a limit of the NK model where \(K=N\). It represents a fitness landscape with maximum roughness.
# import the HouseOfCardsSimulation class
from gpmap.simulate import HouseOfCardsSimulation
# Create an instance of the model. Using `from_length` makes this easy.
gpm = HouseOfCardsSimulation.from_length(6)
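To illustrate what maximum roughness means, the sketch below assigns every binary genotype an independent random value, so neighboring genotypes share no fitness information. The function name `house_of_cards` and the uniform distribution are assumptions for this example and are not taken from the package.

```python
import itertools
import random


def house_of_cards(length, seed=None):
    """Map every binary genotype of the given length to an independent random value."""
    rng = random.Random(seed)
    genotypes = ["".join(bits) for bits in itertools.product("01", repeat=length)]
    return {genotype: rng.random() for genotype in genotypes}


# Hypothetical 3-site landscape with 2**3 = 8 genotypes.
landscape = house_of_cards(3, seed=1)
```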
Mount Fuji landscape¶
Construct a genotype-phenotype map from a Mount Fuji model. [2]
A Mount Fuji model sets a “global” fitness peak (max) on a single genotype in the space. The fitness goes down as a function of hamming distance away from this genotype, called a “fitness field”. The strength (or scale) of this field is linear and depends on the parameter field_strength.
Roughness can be added to the Mount Fuji model using a random roughness parameter. This assigns a random roughness value to each genotype, so the fitness of genotype \(g\) becomes
\[f(g) = \nu(g) + c \cdot d(g_0, g)\]
where \(\nu\) is the roughness parameter, \(c\) is the field strength, and \(d\) is the hamming distance between genotype \(g\) and the reference genotype \(g_0\).
# import the MountFujiSimulation class
from gpmap.simulate import MountFujiSimulation
# Create an instance of the model. Using `from_length` makes this easy.
gpm = MountFujiSimulation.from_length(6,
    roughness_width=0.5,
    roughness_dist='normal'
)
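One way to read the description above is that fitness equals a per-genotype roughness term plus the field strength times the hamming distance to the peak genotype. The sketch below follows that reading in plain Python. The function names, the Gaussian roughness, and the choice of a negative field strength (so that fitness falls away from the peak) are assumptions for illustration, not the behavior of MountFujiSimulation.

```python
import itertools
import random


def hamming(a, b):
    """Count the sites at which two equal-length genotypes differ."""
    if len(a) != len(b):
        raise ValueError("genotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mount_fuji(reference, field_strength, roughness_width=0.0, seed=None):
    """Fitness = roughness + field_strength * hamming distance to the reference."""
    rng = random.Random(seed)
    landscape = {}
    for bits in itertools.product("01", repeat=len(reference)):
        genotype = "".join(bits)
        # Roughness: a random offset per genotype (zero when the width is zero).
        nu = rng.gauss(0.0, roughness_width) if roughness_width > 0 else 0.0
        landscape[genotype] = nu + field_strength * hamming(reference, genotype)
    return landscape


# Smooth landscape: a negative field strength puts the peak on the reference.
smooth = mount_fuji("000", field_strength=-1.0)
# Rough landscape: same field, plus Gaussian noise on every genotype.
rough = mount_fuji("000", field_strength=-1.0, roughness_width=0.5, seed=0)
```

With no roughness, the reference genotype is the unique maximum and each additional mutation lowers fitness by the same amount.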
References¶
[1] Kauffman, Stuart A., and Edward D. Weinberger. “The NK model of rugged fitness landscapes and its application to maturation of the immune response.” Journal of Theoretical Biology 141.2 (1989): 211-245.
[2] Szendro, Ivan G., et al. “Quantitative analyses of empirical fitness landscapes.” Journal of Statistical Mechanics: Theory and Experiment 2013.01 (2013): P01005.
Reading/Writing¶
The GenotypePhenotypeMap object is a Pandas DataFrame at its core. Most
tabular formats (e.g. Excel files, csv, tsv, …) can be read/written.
Excel Spreadsheets¶
Excel files are supported through the read_excel method. This method requires
genotypes and phenotypes columns, and can include n_replicates and
stdeviations as optional columns. All other columns are ignored.
Example: Excel spreadsheet file (“data.xlsx”)
 | genotypes | phenotypes | stdeviations | n_replicates |
---|---|---|---|---|
0 | PTEE | 0.243937 | 0.013269 | 1 |
1 | PTEY | 0.657831 | 0.055803 | 1 |
2 | PTFE | 0.104741 | 0.013471 | 1 |
3 | PTFY | 0.683304 | 0.081887 | 1 |
4 | PIEE | 0.774680 | 0.069631 | 1 |
5 | PIEY | 0.975995 | 0.059985 | 1 |
6 | PIFE | 0.500215 | 0.098893 | 1 |
7 | PIFY | 0.501697 | 0.025082 | 1 |
8 | RTEE | 0.233230 | 0.052265 | 1 |
9 | RTEY | 0.057961 | 0.036845 | 1 |
10 | RTFE | 0.365238 | 0.050948 | 1 |
11 | RTFY | 0.891505 | 0.033239 | 1 |
12 | RIEE | 0.156193 | 0.085638 | 1 |
13 | RIEY | 0.837269 | 0.070373 | 1 |
14 | RIFE | 0.599639 | 0.050125 | 1 |
15 | RIFY | 0.277137 | 0.072571 | 1 |
Read the spreadsheet directly into the GenotypePhenotypeMap.
from gpmap import GenotypePhenotypeMap
gpm = GenotypePhenotypeMap.read_excel(wildtype="PTEE", filename="data.xlsx")
CSV File¶
CSV files are supported through the read_csv method. This method requires
genotypes and phenotypes columns, and can include n_replicates and
stdeviations as optional columns. All other columns are ignored.
Example: CSV File
 | genotypes | phenotypes | stdeviations | n_replicates |
---|---|---|---|---|
0 | PTEE | 0.243937 | 0.013269 | 1 |
1 | PTEY | 0.657831 | 0.055803 | 1 |
2 | PTFE | 0.104741 | 0.013471 | 1 |
3 | PTFY | 0.683304 | 0.081887 | 1 |
4 | PIEE | 0.774680 | 0.069631 | 1 |
5 | PIEY | 0.975995 | 0.059985 | 1 |
6 | PIFE | 0.500215 | 0.098893 | 1 |
7 | PIFY | 0.501697 | 0.025082 | 1 |
8 | RTEE | 0.233230 | 0.052265 | 1 |
9 | RTEY | 0.057961 | 0.036845 | 1 |
10 | RTFE | 0.365238 | 0.050948 | 1 |
11 | RTFY | 0.891505 | 0.033239 | 1 |
12 | RIEE | 0.156193 | 0.085638 | 1 |
13 | RIEY | 0.837269 | 0.070373 | 1 |
14 | RIFE | 0.599639 | 0.050125 | 1 |
15 | RIFY | 0.277137 | 0.072571 | 1 |
Read the csv directly into the GenotypePhenotypeMap.
from gpmap import GenotypePhenotypeMap
gpm = GenotypePhenotypeMap.read_csv(wildtype="PTEE", filename="data.csv")
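The column rules above (two required columns, two optional ones, everything else ignored) can be illustrated with the standard library alone. This sketch does not use gpmap or reproduce how read_csv is implemented. The helper `load_gp_columns` and the extra `notes` column are hypothetical, and the two data rows are copied from the example table.

```python
import csv
import io

# Column names taken from the text above, each with a converter for its values.
REQUIRED = {"genotypes": str, "phenotypes": float}
OPTIONAL = {"stdeviations": float, "n_replicates": int}


def load_gp_columns(handle):
    """Read a CSV file object, keeping only the recognized columns."""
    reader = csv.DictReader(handle)
    fields = reader.fieldnames or []
    missing = [name for name in REQUIRED if name not in fields]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    converters = {**REQUIRED, **OPTIONAL}
    keep = [name for name in converters if name in fields]
    columns = {name: [] for name in keep}
    for row in reader:
        for name in keep:
            columns[name].append(converters[name](row[name]))
    return columns


# The unnamed first column is the row index; "notes" is a hypothetical extra column.
text = """,genotypes,phenotypes,stdeviations,n_replicates,notes
0,PTEE,0.243937,0.013269,1,first
1,PTEY,0.657831,0.055803,1,second
"""
columns = load_gp_columns(io.StringIO(text))
```

Here the index column and `notes` are dropped, while the four recognized columns are kept with numeric values converted from text.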
JSON Format¶
The only keys recognized by the JSON reader are:
- genotypes
- phenotypes
- stdeviations
- mutations
- n_replicates
All other keys are ignored in the epistasis models. You can keep other metadata stored in the JSON, but it won’t be appended to the epistasis model object.
{
    "genotypes" : [
        "000",
        "001",
        "010",
        "011",
        "100",
        "101",
        "110",
        "111"
    ],
    "phenotypes" : [
        0.62344582,
        0.87943151,
        -0.11075798,
        -0.59754471,
        1.4314798,
        1.12551439,
        1.04859722,
        -0.27145593
    ],
    "stdeviations" : [
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01
    ],
    "mutations" : {
        "0" : ["0", "1"],
        "1" : ["0", "1"],
        "2" : ["0", "1"]
    },
    "n_replicates" : 12,
    "title" : "my data",
    "description" : "a really hard experiment"
}
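As a quick check before handing a file to a reader, the standard json module can confirm that the text is valid JSON and show which keys fall outside the recognized list. The helper `split_recognized` is hypothetical and is not part of gpmap. The data is a shortened two-site version of the example above. Note that strict JSON requires double-quoted strings, string keys, and no trailing commas.

```python
import json

# Keys listed above as recognized by the JSON reader.
RECOGNIZED = ("genotypes", "phenotypes", "stdeviations", "mutations", "n_replicates")


def split_recognized(text):
    """Parse JSON text and separate recognized keys from ignored ones."""
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    recognized = {key: data[key] for key in RECOGNIZED if key in data}
    ignored = sorted(set(data) - set(RECOGNIZED))
    return recognized, ignored


# Shortened, illustrative two-site version of the example file.
text = '''{
    "genotypes": ["00", "01", "10", "11"],
    "phenotypes": [0.62344582, 0.87943151, -0.11075798, -0.59754471],
    "stdeviations": [0.01, 0.01, 0.01, 0.01],
    "mutations": {"0": ["0", "1"], "1": ["0", "1"]},
    "n_replicates": 12,
    "title": "my data",
    "description": "a really hard experiment"
}'''
recognized, ignored = split_recognized(text)
```

In this sketch `ignored` contains the metadata keys `description` and `title`, which stay in the file but are not part of the recognized data.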