Introduction¶
The gpmap package standardizes a data structure for genotype-phenotype (GP) maps. It lets you subset, manipulate, and extend genotype-phenotype maps, calculate statistics, model evolutionary trajectories, and predict phenotypes. Data are stored in Pandas DataFrame and Series objects for efficient memory use and manipulation.
This package includes modules for simulating computational genotype-phenotype maps using methods described in the literature. See the Simulating page.
The GenotypePhenotypeMap object can be easily ported to network graphs (via NetworkX and GPGraph).

GenotypePhenotypeMap¶
The GenotypePhenotypeMap class is the main entry point to the gpmap package. It offers intuitive methods and attributes for analyzing genotype-phenotype data. We’ve also created a number of other packages that interact easily with the GenotypePhenotypeMap.
Example¶
from gpmap import GenotypePhenotypeMap
# Create list of genotypes and phenotypes
wildtype = "AA"
genotypes = ["AA", "AV", "AM", "VA", "VV", "VM"]
phenotypes = [1.0, 1.1, 1.4, 1.5, 2.0, 3.0]
# Create GenotypePhenotypeMap object
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes)
BinaryMap¶
All GenotypePhenotypeMap objects attach a BinaryMap instance to the binary attribute. The BinaryMap class creates a binary representation of all genotypes and maps them to a genotype-phenotype map. Most attributes of the GenotypePhenotypeMap also exist in the binary object, updated with the binary genotype representations.
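To make the idea of a binary representation concrete, here is a minimal sketch in plain Python. It is an illustration only and not the BinaryMap implementation: the helper name `encode_binary` is hypothetical, and the encoding scheme (the wildtype state at each site is all zeros, with one bit per alternative state) is an assumption that may differ from what the package does.

```python
def encode_binary(wildtype, genotypes):
    """Encode genotypes as bit strings relative to a wildtype (hypothetical helper)."""
    n_sites = len(wildtype)
    # Collect the non-wildtype states observed at each site, in sorted order.
    alternatives = []
    for i in range(n_sites):
        states = sorted({g[i] for g in genotypes} - {wildtype[i]})
        alternatives.append(states)

    encoded = {}
    for g in genotypes:
        bits = []
        for i, states in enumerate(alternatives):
            # One bit per alternative state; the wildtype state is all zeros.
            bits.extend("1" if g[i] == s else "0" for s in states)
        encoded[g] = "".join(bits)
    return encoded


# Same wildtype and genotypes as the GenotypePhenotypeMap example above.
binary = encode_binary("AA", ["AA", "AV", "AM", "VA", "VV", "VM"])
```

Under this assumed scheme the first site needs one bit (A or V) and the second site needs two (A, M, or V), so every genotype becomes a three-character bit string with the wildtype as "000".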
Simulating¶
The gpmap package comes with a suite of objects to simulate genotype-phenotype maps following models in the literature. They are found in the gpmap.simulate module.
All simulation objects inherit from GenotypePhenotypeMap as their base class. Thus, anything you can do with a GenotypePhenotypeMap, you can do with the simulation objects.
NK model¶
Construct a genotype-phenotype map using Kauffman’s NK Model. [1] The NK fitness landscape is created using a table with binary, length-K, sub-sequences mapped to random values. All genotypes are binary with length N. The fitness of a genotype is constructed by summing the values of all sub-sequences that make up the genotype using a sliding window across the full genotypes.
For example, imagine an NK simulation with \(N=5\) and \(K=2\). To construct the fitness for the 01011 genotype, select the following sub-sequences from the NK table: “01”, “10”, “01”, “11”, “10”, and sum their values. The final “10” arises because the window wraps around from the last site to the first.
# import the NKSimulation class
from gpmap.simulate import NKSimulation
# Create an instance of the model. Using `from_length` makes this easy.
gpm = NKSimulation.from_length(6, K=3)
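The sliding-window construction described above can be sketched in plain Python without gpmap. This is an illustration under stated assumptions, not the NKSimulation implementation: the function names are hypothetical, the table values are drawn uniformly at random, and the wrap-around window is inferred from the worked example for 01011.

```python
import itertools
import random


def nk_subsequences(genotype, K):
    """Return the length-K windows of a genotype, wrapping around the end."""
    n = len(genotype)
    doubled = genotype + genotype  # makes wrap-around slicing simple
    return [doubled[i:i + K] for i in range(n)]


def nk_fitness(genotype, table):
    """Sum the table values of every length-K window in the genotype."""
    K = len(next(iter(table)))
    return sum(table[s] for s in nk_subsequences(genotype, K))


rng = random.Random(0)
K = 2
# NK table: every binary string of length K mapped to a random value.
table = {"".join(bits): rng.random() for bits in itertools.product("01", repeat=K)}

windows = nk_subsequences("01011", K)
fitness = nk_fitness("01011", table)
```

Here `windows` reproduces the five sub-sequences listed in the example, and `fitness` is the sum of their table values.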
House of Cards model¶
Construct a ‘House of Cards’ fitness landscape. This is a limit of the NK model where \(K=N\). It represents a fitness landscape with maximum roughness.
# import the HouseOfCardsSimulation class
from gpmap.simulate import HouseOfCardsSimulation
# Create an instance of the model. Using `from_length` makes this easy.
gpm = HouseOfCardsSimulation.from_length(6)
Mount Fuji model¶
Construct a genotype-phenotype map from a Mount Fuji model. [2]
A Mount Fuji model sets a “global” fitness peak (maximum) on a single genotype in the space. Fitness decreases as a function of Hamming distance from this genotype, which is called a “fitness field”. The strength (or scale) of this field is linear and depends on the parameter field_strength.
Roughness can be added to the Mount Fuji model using a random roughness parameter, which assigns a random roughness value to each genotype. The fitness of genotype \(g\) is then
\[f(g) = \nu(g) + c \cdot d(g_0, g)\]
where \(\nu\) is the roughness parameter, \(c\) is the field strength, and \(d\) is the Hamming distance between genotype \(g\) and the reference genotype \(g_0\).
# import the MountFujiSimulation class
from gpmap.simulate import MountFujiSimulation
# Create an instance of the model. Using `from_length` makes this easy.
gpm = MountFujiSimulation.from_length(6)
# add roughness, sampling from a range of values.
gpm.set_roughness(range=(-1,1))
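As a rough illustration of the model described above, the sketch below computes Mount Fuji fitness values in plain Python. It does not use gpmap and is not the MountFujiSimulation implementation: the function names are hypothetical, and both the additive form (roughness plus field strength times Hamming distance) and the choice of a negative field strength to make fitness fall away from the peak are assumptions.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def mount_fuji_fitness(genotype, reference, field_strength, roughness=0.0):
    """Linear fitness field plus a per-genotype roughness term (illustrative)."""
    return roughness + field_strength * hamming(reference, genotype)


rng = random.Random(1)
reference = "000000"
genotypes = ["000000", "000001", "000011", "111111"]

# Smooth landscape: a negative field strength makes fitness fall with distance.
smooth = {g: mount_fuji_fitness(g, reference, field_strength=-1.0) for g in genotypes}

# Rough landscape: add a random value drawn from (-1, 1) to each genotype.
rough = {
    g: mount_fuji_fitness(g, reference, -1.0, roughness=rng.uniform(-1, 1))
    for g in genotypes
}
```

With zero roughness the reference genotype sits at the peak and fitness drops by one unit per mutation. With roughness, each value is perturbed independently, so nearby genotypes no longer decline monotonically.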
References¶
[1] Kauffman, Stuart A., and Edward D. Weinberger. “The NK model of rugged fitness landscapes and its application to maturation of the immune response.” Journal of Theoretical Biology 141.2 (1989): 211-245.
[2] Szendro, Ivan G., et al. “Quantitative analyses of empirical fitness landscapes.” Journal of Statistical Mechanics: Theory and Experiment 2013.01 (2013): P01005.
Read/Write¶
The GenotypePhenotypeMap object is essentially a container of Pandas Series that can be read and written as a DataFrame. Tabular formats (e.g. Excel files, csv, tsv, ...) can be loaded directly into the object. The input requires two columns, genotypes and phenotypes, and optionally accepts stdeviations and n_replicates.
read_excel¶
Excel files are supported through the read_excel method. This method requires genotypes and phenotypes columns, and can include n_replicates and stdeviations as optional columns. All other columns are ignored.
Example: Excel spreadsheet file (“data.xlsx”)
| | genotypes | phenotypes | stdeviations | n_replicates |
---|---|---|---|---|
0 | PTEE | 0.243937 | 0.013269 | 1 |
1 | PTEY | 0.657831 | 0.055803 | 1 |
2 | PTFE | 0.104741 | 0.013471 | 1 |
3 | PTFY | 0.683304 | 0.081887 | 1 |
4 | PIEE | 0.774680 | 0.069631 | 1 |
5 | PIEY | 0.975995 | 0.059985 | 1 |
6 | PIFE | 0.500215 | 0.098893 | 1 |
7 | PIFY | 0.501697 | 0.025082 | 1 |
8 | RTEE | 0.233230 | 0.052265 | 1 |
9 | RTEY | 0.057961 | 0.036845 | 1 |
10 | RTFE | 0.365238 | 0.050948 | 1 |
11 | RTFY | 0.891505 | 0.033239 | 1 |
12 | RIEE | 0.156193 | 0.085638 | 1 |
13 | RIEY | 0.837269 | 0.070373 | 1 |
14 | RIFE | 0.599639 | 0.050125 | 1 |
15 | RIFY | 0.277137 | 0.072571 | 1 |
Read the spreadsheet directly into the GenotypePhenotypeMap.
from gpmap import GenotypePhenotypeMap
gpm = GenotypePhenotypeMap.read_excel(wildtype="PTEE", filename="data.xlsx")
read_csv¶
CSV files are supported through the read_csv method. This method requires genotypes and phenotypes columns, and can include n_replicates and stdeviations as optional columns. All other columns are ignored.
Example: CSV File
| | genotypes | phenotypes | stdeviations | n_replicates |
---|---|---|---|---|
0 | PTEE | 0.243937 | 0.013269 | 1 |
1 | PTEY | 0.657831 | 0.055803 | 1 |
2 | PTFE | 0.104741 | 0.013471 | 1 |
3 | PTFY | 0.683304 | 0.081887 | 1 |
4 | PIEE | 0.774680 | 0.069631 | 1 |
5 | PIEY | 0.975995 | 0.059985 | 1 |
6 | PIFE | 0.500215 | 0.098893 | 1 |
7 | PIFY | 0.501697 | 0.025082 | 1 |
8 | RTEE | 0.233230 | 0.052265 | 1 |
9 | RTEY | 0.057961 | 0.036845 | 1 |
10 | RTFE | 0.365238 | 0.050948 | 1 |
11 | RTFY | 0.891505 | 0.033239 | 1 |
12 | RIEE | 0.156193 | 0.085638 | 1 |
13 | RIEY | 0.837269 | 0.070373 | 1 |
14 | RIFE | 0.599639 | 0.050125 | 1 |
15 | RIFY | 0.277137 | 0.072571 | 1 |
Read the csv directly into the GenotypePhenotypeMap.
from gpmap import GenotypePhenotypeMap
gpm = GenotypePhenotypeMap.read_csv(wildtype="PTEE", filename="data.csv")
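The column rules above (two required columns, two optional ones, everything else ignored) can be sketched with the standard library alone. This is a hypothetical illustration, not what read_csv does internally: the function `load_columns` and the extra `notes` column are made up for the example.

```python
import csv
import io

REQUIRED = ("genotypes", "phenotypes")
OPTIONAL = ("stdeviations", "n_replicates")


def load_columns(handle):
    """Read recognized columns from a CSV file object; ignore all others."""
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    missing = [c for c in REQUIRED if c not in fieldnames]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    keep = [c for c in REQUIRED + OPTIONAL if c in fieldnames]
    columns = {c: [] for c in keep}
    for row in reader:
        for c in keep:
            columns[c].append(row[c])
    # Convert numeric columns from text.
    for c in keep:
        if c == "n_replicates":
            columns[c] = [int(v) for v in columns[c]]
        elif c != "genotypes":
            columns[c] = [float(v) for v in columns[c]]
    return columns


# Two rows from the table above plus a hypothetical extra column.
text = """genotypes,phenotypes,stdeviations,n_replicates,notes
PTEE,0.243937,0.013269,1,first
PTEY,0.657831,0.055803,1,second
"""
data = load_columns(io.StringIO(text))
```

The `notes` column is silently dropped, while a file without a genotypes or phenotypes column raises a `ValueError`.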
read_json¶
The only keys recognized by the json reader are:
- genotypes
- phenotypes
- stdeviations
- mutations
- n_replicates
- log_transform
All other keys are ignored. You can keep other metadata in the JSON file, but it won’t be attached to the GenotypePhenotypeMap object.
{
"genotypes" : [
"000",
"001",
"010",
"011",
"100",
"101",
"110",
"111"
],
"phenotypes" : [
0.62344582,
0.87943151,
-0.11075798,
-0.59754471,
1.4314798,
1.12551439,
1.04859722,
-0.27145593
],
"stdeviations" : [
0.01,
0.01,
0.01,
0.01,
0.01,
0.01,
0.01,
0.01
],
"mutations" : {
"0" : ["0", "1"],
"1" : ["0", "1"],
"2" : ["0", "1"]
},
"n_replicates" : 12,
"log_transform" : false,
"title" : "my data",
"description" : "a really hard experiment"
}
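To show how the recognized keys might be separated from other metadata, here is a small sketch using only Python's json module. It is not the read_json implementation: the phenotype values are illustrative, and converting the mutations keys back to integers is an assumption motivated by the fact that JSON object keys are always strings.

```python
import json

RECOGNIZED = (
    "genotypes", "phenotypes", "stdeviations",
    "mutations", "n_replicates", "log_transform",
)

# Illustrative data only; "title" stands in for arbitrary extra metadata.
raw = """
{
    "genotypes": ["00", "01", "10", "11"],
    "phenotypes": [0.1, 0.5, 0.4, 1.0],
    "mutations": {"0": ["0", "1"], "1": ["0", "1"]},
    "n_replicates": 12,
    "log_transform": false,
    "title": "my data"
}
"""

loaded = json.loads(raw)
# Keep only the recognized keys; "title" is dropped.
recognized = {k: v for k, v in loaded.items() if k in RECOGNIZED}
# JSON object keys are always strings, so convert site indices back to int.
recognized["mutations"] = {int(k): v for k, v in recognized["mutations"].items()}
```

Note that `json.loads` rejects single-quoted strings and trailing commas, so a file must be strictly valid JSON before it can be read.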