Introduction

The gpmap package standardizes a data structure for genotype-phenotype (GP) maps. It lets you subset, manipulate, and extend genotype-phenotype maps, calculate statistics, model evolutionary trajectories, and predict phenotypes. Data are stored in pandas DataFrame and Series objects for efficient memory use and manipulation.

This package includes modules for simulating computational genotype-phenotype maps using methods described in the literature. See the Simulating page.

The GenotypePhenotypeMap object can be easily ported to network graphs (via NetworkX and GPGraph).


GenotypePhenotypeMap

The GenotypePhenotypeMap class is the main entry point to the gpmap package. It offers methods and attributes for analyzing genotype-phenotype data. We’ve also created a number of other packages that interact easily with the GenotypePhenotypeMap.

Example

from gpmap import GenotypePhenotypeMap

# Create list of genotypes and phenotypes
wildtype = "AA"
genotypes = ["AA", "AV", "AM", "VA", "VV", "VM"]
phenotypes = [1.0, 1.1, 1.4, 1.5, 2.0, 3.0]

# Create GenotypePhenotypeMap object
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes)

BinaryMap

All GenotypePhenotypeMap objects attach a BinaryMap instance to the binary attribute. The BinaryMap class creates a binary representation of every genotype and maps it back to the genotype-phenotype map. Most attributes of the GenotypePhenotypeMap also exist on the binary object, updated to use the binary genotype representations.
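To make the idea of a binary representation concrete, here is a minimal sketch that does not use the package. It assumes a simple scheme in which each site gets one bit per non-wildtype state (so a site with three states uses two bits). The helper name `encode_binary` and the `mutations` dictionary are hypothetical, and the encoding that BinaryMap actually uses may differ.

```python
def encode_binary(genotype, wildtype, mutations):
    """Encode a genotype as a bit string relative to the wildtype.

    Assumed scheme (illustrative only): at each site, every non-wildtype
    state gets its own bit; the wildtype state is all zeros.
    """
    bits = []
    for site, (state, wt_state) in enumerate(zip(genotype, wildtype)):
        # Non-wildtype states at this site, in a fixed order.
        alternatives = [s for s in mutations[site] if s != wt_state]
        bits.extend("1" if state == alt else "0" for alt in alternatives)
    return "".join(bits)


wildtype = "AA"
genotypes = ["AA", "AV", "AM", "VA", "VV", "VM"]
# States observed at each site (hypothetical helper structure).
mutations = {0: ["A", "V"], 1: ["A", "V", "M"]}

binary = {g: encode_binary(g, wildtype, mutations) for g in genotypes}
```

Under this assumed scheme the wildtype maps to all zeros, and each mutation away from it switches on exactly one bit.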

Simulating

The gpmap package comes with a suite of objects for simulating genotype-phenotype maps following models in the literature. They are found in the gpmap.simulate module.

All simulation objects inherit from GenotypePhenotypeMap as their base class. Thus, anything you can do with a GenotypePhenotypeMap, you can do with the simulation objects.

NK model

Construct a genotype-phenotype map using Kauffman’s NK model. [1] The NK fitness landscape is created from a table that maps every binary sub-sequence of length K to a random value. All genotypes are binary with length N. The fitness of a genotype is the sum of the table values of all sub-sequences obtained by sliding a window of length K across the full genotype.

For example, imagine an NK simulation with \(N=5\) and \(K=2\). To construct the fitness for the 01011 genotype, select the following sub-sequences from an NK table: “01”, “10”, “01”, “11”, “10”. The last window wraps around from the final site to the first. Sum their values to get the fitness.

# import the NKSimulation class
from gpmap.simulate import NKSimulation

# Create an instance of the model. Using `from_length` makes this easy.
gpm = NKSimulation.from_length(6, K=3)
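The sliding-window sum described above can also be sketched without the package. This is an illustration only: the function names `nk_windows` and `nk_fitness` are hypothetical, the table values are drawn uniformly at random as an assumption, and the window is taken to wrap around the end of the genotype, as the last sub-sequence in the worked example suggests.

```python
import itertools
import random


def nk_windows(genotype, K):
    """Return the length-K sub-sequences of a genotype, wrapping around the end."""
    N = len(genotype)
    return ["".join(genotype[(i + j) % N] for j in range(K)) for i in range(N)]


def nk_fitness(genotype, K, table):
    """Sum the table values of every sliding-window sub-sequence."""
    return sum(table[sub] for sub in nk_windows(genotype, K))


K = 2
rng = random.Random(0)  # seeded so the table is reproducible
# NK table: every binary sub-sequence of length K mapped to a random value.
table = {"".join(bits): rng.random() for bits in itertools.product("01", repeat=K)}

windows = nk_windows("01011", K)
fitness = nk_fitness("01011", K, table)
```

Here `windows` reproduces the five sub-sequences listed in the example for the 01011 genotype, and `fitness` is the sum of their table values.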

House of Cards model

Construct a ‘House of Cards’ fitness landscape. This is a limit of the NK model where \(K=N\). It represents a fitness landscape with maximum roughness.

# import the HouseOfCardsSimulation class
from gpmap.simulate import HouseOfCardsSimulation

# Create an instance of the model. Using `from_length` makes this easy.
gpm = HouseOfCardsSimulation.from_length(6)

Mount Fuji model

Construct a genotype-phenotype map from a Mount Fuji model. [2]

A Mount Fuji model sets a “global” fitness peak (max) on a single genotype in the space. The fitness decreases as a function of Hamming distance away from this genotype, an effect called a “fitness field”. The strength (or scale) of this field is linear and is set by the field_strength parameter.

Roughness can be added to the Mount Fuji model using a random roughness parameter. This assigns a random roughness value to each genotype.

\[f(g) = \nu (g) - c \cdot d(g_0, g)\]

where \(\nu(g)\) is the roughness value assigned to genotype \(g\), \(c\) is the field strength, and \(d(g_0, g)\) is the Hamming distance between genotype \(g\) and the reference genotype \(g_0\).

# import the MountFujiSimulation class
from gpmap.simulate import MountFujiSimulation

# Create an instance of the model. Using `from_length` makes this easy.
gpm = MountFujiSimulation.from_length(6)

# add roughness, sampling from a range of values.
gpm.set_roughness(range=(-1,1))
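The formula can be checked by hand with a few lines of plain Python. This is a sketch independent of the package: the function names are hypothetical, and drawing the roughness uniformly from a range is an assumption made for illustration.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length genotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def mount_fuji_fitness(genotype, reference, field_strength, roughness=0.0):
    """f(g) = nu(g) - c * d(g_0, g)."""
    return roughness - field_strength * hamming(reference, genotype)


reference = "000000"
genotypes = ["000000", "000011", "111111"]

# Smooth landscape: no roughness, fitness falls linearly with distance.
smooth = [mount_fuji_fitness(g, reference, field_strength=1.0) for g in genotypes]

# Rough landscape: nu(g) drawn uniformly from (-1, 1) for each genotype (assumed).
rng = random.Random(0)
rough = [
    mount_fuji_fitness(g, reference, field_strength=1.0, roughness=rng.uniform(-1, 1))
    for g in genotypes
]
```

With no roughness the reference genotype sits at the peak, and every mutation away from it lowers the fitness by the field strength.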

References

[1] Kauffman, Stuart A., and Edward D. Weinberger. “The NK model of rugged fitness landscapes and its application to maturation of the immune response.” Journal of Theoretical Biology 141.2 (1989): 211-245.
[2] Szendro, Ivan G., et al. “Quantitative analyses of empirical fitness landscapes.” Journal of Statistical Mechanics: Theory and Experiment 2013.01 (2013): P01005.

Read/Write

The GenotypePhenotypeMap object is really just a container of pandas Series that can be easily read and written as a DataFrame. Any tabular format (e.g. Excel files, csv, tsv, ...) can be loaded directly into the object. The input requires two columns, genotypes and phenotypes, and optionally takes stdeviations and n_replicates columns.

read_excel

Excel files are supported through the read_excel method. This method requires genotypes and phenotypes columns, and can include n_replicates and stdeviations as optional columns. All other columns are ignored.

Example: Excel spreadsheet file (“data.xlsx”)

    genotypes  phenotypes  stdeviations  n_replicates
0   PTEE       0.243937    0.013269      1
1   PTEY       0.657831    0.055803      1
2   PTFE       0.104741    0.013471      1
3   PTFY       0.683304    0.081887      1
4   PIEE       0.774680    0.069631      1
5   PIEY       0.975995    0.059985      1
6   PIFE       0.500215    0.098893      1
7   PIFY       0.501697    0.025082      1
8   RTEE       0.233230    0.052265      1
9   RTEY       0.057961    0.036845      1
10  RTFE       0.365238    0.050948      1
11  RTFY       0.891505    0.033239      1
12  RIEE       0.156193    0.085638      1
13  RIEY       0.837269    0.070373      1
14  RIFE       0.599639    0.050125      1
15  RIFY       0.277137    0.072571      1

Read the spreadsheet directly into the GenotypePhenotypeMap.

from gpmap import GenotypePhenotypeMap

gpm = GenotypePhenotypeMap.read_excel(wildtype="PTEE", filename="data.xlsx")

read_csv

CSV files are supported through the read_csv method. This method requires genotypes and phenotypes columns, and can include n_replicates and stdeviations as optional columns. All other columns are ignored.

Example: CSV File

    genotypes  phenotypes  stdeviations  n_replicates
0   PTEE       0.243937    0.013269      1
1   PTEY       0.657831    0.055803      1
2   PTFE       0.104741    0.013471      1
3   PTFY       0.683304    0.081887      1
4   PIEE       0.774680    0.069631      1
5   PIEY       0.975995    0.059985      1
6   PIFE       0.500215    0.098893      1
7   PIFY       0.501697    0.025082      1
8   RTEE       0.233230    0.052265      1
9   RTEY       0.057961    0.036845      1
10  RTFE       0.365238    0.050948      1
11  RTFY       0.891505    0.033239      1
12  RIEE       0.156193    0.085638      1
13  RIEY       0.837269    0.070373      1
14  RIFE       0.599639    0.050125      1
15  RIFY       0.277137    0.072571      1

Read the csv directly into the GenotypePhenotypeMap.

from gpmap import GenotypePhenotypeMap

gpm = GenotypePhenotypeMap.read_csv(wildtype="PTEE", filename="data.csv")
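The table in this section is shown as it would print; on disk, a CSV file holds the same columns separated by commas. The sketch below writes the first three rows with Python's standard csv module and reads them back. It assumes the leading index column is for display only and leaves it out; whether read_csv expects that column is not stated here.

```python
import csv
import io

fieldnames = ["genotypes", "phenotypes", "stdeviations", "n_replicates"]
rows = [
    {"genotypes": "PTEE", "phenotypes": 0.243937, "stdeviations": 0.013269, "n_replicates": 1},
    {"genotypes": "PTEY", "phenotypes": 0.657831, "stdeviations": 0.055803, "n_replicates": 1},
    {"genotypes": "PTFE", "phenotypes": 0.104741, "stdeviations": 0.013471, "n_replicates": 1},
]

# Write to an in-memory buffer; open("data.csv", "w", newline="") works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read the text back and confirm the two required columns are present.
reader = csv.DictReader(io.StringIO(csv_text))
loaded = list(reader)
has_required = {"genotypes", "phenotypes"} <= set(reader.fieldnames)
```

Checking for the required columns before loading gives a clearer error than a failure deep inside the reader.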

read_json

The only keys recognized by the json reader are:

  1. genotypes
  2. phenotypes
  3. stdeviations
  4. mutations
  5. n_replicates
  6. log_transform

All other keys are ignored. You can keep other metadata stored in the JSON, but it won’t be attached to the GenotypePhenotypeMap object.

{
    "genotypes" : [
        "000",
        "001",
        "010",
        "011",
        "100",
        "101",
        "110",
        "111"
    ],
    "phenotypes" : [
        0.62344582,
        0.87943151,
        -0.11075798,
        -0.59754471,
        1.4314798,
        1.12551439,
        1.04859722,
        -0.27145593
    ],
    "stdeviations" : [
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01
    ],
    "mutations" : {
        "0" : ["0", "1"],
        "1" : ["0", "1"],
        "2" : ["0", "1"]
    },
    "n_replicates" : 12,
    "log_transform" : false,
    "title" : "my data",
    "description" : "a really hard experiment"
}
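Before handing a file to the reader, it can help to see which keys would be recognized and which would be ignored. The sketch below uses only Python's standard json module on a shortened version of the example. The `RECOGNIZED_KEYS` set mirrors the list in this section, and the variable names are hypothetical. Note that JSON object keys are always strings, so the site indices under mutations are converted to integers here, which is an assumption about what downstream code wants.

```python
import json

# Keys listed in this section as recognized by the JSON reader.
RECOGNIZED_KEYS = {
    "genotypes", "phenotypes", "stdeviations",
    "mutations", "n_replicates", "log_transform",
}

text = """
{
    "genotypes": ["000", "001"],
    "phenotypes": [0.62344582, 0.87943151],
    "stdeviations": [0.01, 0.01],
    "mutations": {"0": ["0", "1"], "1": ["0", "1"], "2": ["0", "1"]},
    "n_replicates": 12,
    "log_transform": false,
    "title": "my data",
    "description": "a really hard experiment"
}
"""

data = json.loads(text)

# Split the document into recognized fields and ignored metadata.
recognized = {k: v for k, v in data.items() if k in RECOGNIZED_KEYS}
ignored = sorted(set(data) - RECOGNIZED_KEYS)

# JSON keys are strings; convert site indices to integers if needed.
mutations = {int(site): states for site, states in recognized["mutations"].items()}
```

If json.loads raises an error, the file is not valid JSON: common causes are single-quoted strings, trailing commas, and unquoted keys.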

API Documentation

The GenotypePhenotypeMap is the main entry point to the gpmap package. Load in your data using the read methods attached to this object. The following subpackages include various objects to analyze this object.

Subpackages

gpmap.binary module

gpmap.errors module

gpmap.mapping module

gpmap.sample module

gpmap.stats module

gpmap.utils module

gpmap.simulate

gpmap.simulate.base module
gpmap.simulate.fuji module
gpmap.simulate.hoc module
gpmap.simulate.nk module

GenotypePhenotypeMap
