gpmap

The Pandas DataFrame for genotype-phenotype (GP) map data.


The GenotypePhenotypeMap is a core object for a suite of packages written in the Harms Lab. It organizes and standardizes genotype-phenotype map data.

Basic Example

# Import the GenotypePhenotypeMap
from gpmap import GenotypePhenotypeMap

# The data
wildtype = 'AA'
genotypes = ['AA', 'AT', 'TA', 'TT']
phenotypes = [0.1, 0.5, 0.2, 0.8]
stdeviations = [0.05, 0.05, 0.05, 0.05]

# Initialize a GenotypePhenotypeMap object
gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes,
                           stdeviations=stdeviations)

# Show the underlying DataFrame
gpm.data

Documentation

Simulating genotype-phenotype maps

The GPMap package comes with a suite of objects to simulate genotype-phenotype maps following models in the literature. They are found in the gpmap.simulate module.

All simulation objects inherit from GenotypePhenotypeMap, so anything you can do with a GenotypePhenotypeMap you can also do with a simulation object.

NK landscape

Construct a genotype-phenotype map using Kauffman’s NK model. [1] The NK fitness landscape is built from a table that maps every binary sub-sequence of length K to a random value. All genotypes are binary with length N. The fitness of a genotype is the sum of the table values for all length-K sub-sequences found by sliding a window across the full genotype.

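For example, imagine an NK simulation with \(N=5\) and \(K=2\). To construct the fitness for the 01011 genotype, select the following sub-sequences from the NK table: “01”, “10”, “01”, “11”, “10”. The window wraps around the end of the genotype, so the last sub-sequence joins the final site to the first. Sum the five values together.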

# import the NKSimulation class
from gpmap.simulate import NKSimulation

# Create an instance of the model. Using `from_length` makes this easy.
gpm = NKSimulation.from_length(6, K=3)
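
The sliding-window sum described above can be sketched in plain Python. This is an illustration of the idea only, not the code used inside gpmap.simulate; the helper names (nk_subsequences, build_nk_table, nk_fitness) are hypothetical, and the wrap-around window is inferred from the worked example with \(N=5\) and \(K=2\).

```python
import itertools
import random


def nk_subsequences(genotype, K):
    """Return the N length-K windows of a genotype, wrapping around the end."""
    N = len(genotype)
    doubled = genotype + genotype  # lets a window run past the last site
    return [doubled[i:i + K] for i in range(N)]


def build_nk_table(K, seed=0):
    """Map every binary string of length K to a random value in [0, 1)."""
    rng = random.Random(seed)
    return {"".join(bits): rng.random()
            for bits in itertools.product("01", repeat=K)}


def nk_fitness(genotype, table, K):
    """Sum the table values of all windows in the genotype."""
    return sum(table[sub] for sub in nk_subsequences(genotype, K))


table = build_nk_table(K=2)
print(nk_subsequences("01011", 2))
print(nk_fitness("01011", table, 2))
```

The first print statement lists the same five sub-sequences as the worked example. The sketch assumes K is no larger than N.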

House of Cards landscape

Construct a ‘House of Cards’ fitness landscape. This is a limit of the NK model where \(K=N\). It represents a fitness landscape with maximum roughness.

# import the HouseOfCardsSimulation class
from gpmap.simulate import HouseOfCardsSimulation

# Create an instance of the model. Using `from_length` makes this easy.
gpm = HouseOfCardsSimulation.from_length(6)
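
As a rough illustration of maximum roughness, the sketch below gives every binary genotype its own independent random value, which is the usual reading of a House of Cards landscape. It does not reproduce how HouseOfCardsSimulation is implemented; the function name house_of_cards and the uniform distribution are assumptions made for this example.

```python
import itertools
import random


def house_of_cards(length, seed=0):
    """Assign an independent uniform random value to every binary genotype."""
    rng = random.Random(seed)
    genotypes = ["".join(bits)
                 for bits in itertools.product("01", repeat=length)]
    return {g: rng.random() for g in genotypes}


landscape = house_of_cards(3)
for genotype, fitness in landscape.items():
    print(genotype, round(fitness, 3))
```

Because no value depends on any other, neighboring genotypes carry no information about each other's fitness.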

Mount Fuji landscape

Construct a genotype-phenotype map from a Mount Fuji model. [2]

A Mount Fuji model sets a “global” fitness peak (maximum) on a single genotype in the space. Fitness decreases with Hamming distance from this genotype; this decline is called a “fitness field”. The field is linear, and its strength (or scale) is set by the field_strength parameter.

Roughness can be added to the Mount Fuji model using a random roughness parameter. This assigns a random roughness value to each genotype.

\[f(g) = \nu (g) + c \cdot d(g_0, g)\]

where \(\nu(g)\) is the random roughness value assigned to genotype \(g\), \(c\) is the field strength, and \(d(g_0, g)\) is the Hamming distance between genotype \(g\) and the reference genotype \(g_0\).

# import the MountFujiSimulation class
from gpmap.simulate import MountFujiSimulation

# Create an instance of the model. Using `from_length` makes this easy.
gpm = MountFujiSimulation.from_length(6,
    roughness_width=0.5,
    roughness_dist='normal'
)
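
The formula above can be evaluated by hand with a few lines of plain Python. This is a sketch under stated assumptions rather than the library's implementation: hamming_distance and mount_fuji_fitness are hypothetical helpers, and drawing the roughness from a normal distribution with standard deviation 0.5 is only one possible reading of the roughness parameters.

```python
import random


def hamming_distance(a, b):
    """Count the sites at which two equal-length genotypes differ."""
    if len(a) != len(b):
        raise ValueError("genotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mount_fuji_fitness(genotype, reference, field_strength, roughness=0.0):
    """Evaluate f(g) = nu(g) + c * d(g_0, g) for one genotype."""
    return roughness + field_strength * hamming_distance(reference, genotype)


rng = random.Random(0)
reference = "000"
for g in ["000", "001", "011", "111"]:
    nu = rng.gauss(0.0, 0.5)  # assumed: normally distributed roughness
    print(g, mount_fuji_fitness(g, reference, field_strength=-1.0, roughness=nu))
```

With the sign convention of the formula, a negative field strength is what makes fitness fall as the distance from the reference genotype grows.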

References

[1] Kauffman, Stuart A., and Edward D. Weinberger. “The NK model of rugged fitness landscapes and its application to maturation of the immune response.” Journal of Theoretical Biology 141.2 (1989): 211-245.
[2] Szendro, Ivan G., et al. “Quantitative analyses of empirical fitness landscapes.” Journal of Statistical Mechanics: Theory and Experiment 2013.01 (2013): P01005.

Reading/Writing

The GenotypePhenotypeMap object is a Pandas DataFrame at its core. Most tabular formats (e.g. Excel files, CSV, TSV) can be read and written.

Excel Spreadsheets

Excel files are supported through the read_excel method. This method requires genotypes and phenotypes columns, and can include n_replicates and stdeviations as optional columns. All other columns are ignored.

Example: Excel spreadsheet file (“data.xlsx”)

    genotypes  phenotypes  stdeviations  n_replicates
0   PTEE       0.243937    0.013269      1
1   PTEY       0.657831    0.055803      1
2   PTFE       0.104741    0.013471      1
3   PTFY       0.683304    0.081887      1
4   PIEE       0.774680    0.069631      1
5   PIEY       0.975995    0.059985      1
6   PIFE       0.500215    0.098893      1
7   PIFY       0.501697    0.025082      1
8   RTEE       0.233230    0.052265      1
9   RTEY       0.057961    0.036845      1
10  RTFE       0.365238    0.050948      1
11  RTFY       0.891505    0.033239      1
12  RIEE       0.156193    0.085638      1
13  RIEY       0.837269    0.070373      1
14  RIFE       0.599639    0.050125      1
15  RIFY       0.277137    0.072571      1

Read the spreadsheet directly into the GenotypePhenotypeMap.

from gpmap import GenotypePhenotypeMap

gpm = GenotypePhenotypeMap.read_excel(wildtype="PTEE", filename="data.xlsx")

CSV File

CSV files are supported through the read_csv method. This method requires genotypes and phenotypes columns, and can include n_replicates and stdeviations as optional columns. All other columns are ignored.

Example: CSV File

    genotypes  phenotypes  stdeviations  n_replicates
0   PTEE       0.243937    0.013269      1
1   PTEY       0.657831    0.055803      1
2   PTFE       0.104741    0.013471      1
3   PTFY       0.683304    0.081887      1
4   PIEE       0.774680    0.069631      1
5   PIEY       0.975995    0.059985      1
6   PIFE       0.500215    0.098893      1
7   PIFY       0.501697    0.025082      1
8   RTEE       0.233230    0.052265      1
9   RTEY       0.057961    0.036845      1
10  RTFE       0.365238    0.050948      1
11  RTFY       0.891505    0.033239      1
12  RIEE       0.156193    0.085638      1
13  RIEY       0.837269    0.070373      1
14  RIFE       0.599639    0.050125      1
15  RIFY       0.277137    0.072571      1

Read the CSV file directly into the GenotypePhenotypeMap.

from gpmap import GenotypePhenotypeMap

gpm = GenotypePhenotypeMap.read_csv(wildtype="PTEE", filename="data.csv")
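
To make the column rules concrete, here is a minimal standard-library sketch that checks for the required columns and drops unrecognized ones before any data reaches the map. It is not how read_csv is implemented; load_gp_rows, the REQUIRED and OPTIONAL sets, and the extra "notes" column are hypothetical names used only for illustration.

```python
import csv
import io

REQUIRED = {"genotypes", "phenotypes"}
OPTIONAL = {"stdeviations", "n_replicates"}


def load_gp_rows(handle):
    """Read CSV rows, keeping only the recognized columns."""
    reader = csv.DictReader(handle)
    columns = set(reader.fieldnames or [])
    missing = REQUIRED - columns
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    keep = (REQUIRED | OPTIONAL) & columns
    # Values stay as strings here; numeric conversion is left to the caller.
    return [{name: row[name] for name in sorted(keep)} for row in reader]


text = """genotypes,phenotypes,stdeviations,n_replicates,notes
PTEE,0.243937,0.013269,1,first
PTEY,0.657831,0.055803,1,second
"""
rows = load_gp_rows(io.StringIO(text))
print(rows)
```

A file without a genotypes or phenotypes column raises a ValueError in this sketch, while the extra notes column is silently dropped.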

JSON Format

The only keys recognized by the JSON reader are:

  1. genotypes
  2. phenotypes
  3. stdeviations
  4. mutations
  5. n_replicates

All other keys are ignored when the data are loaded. You can keep other metadata in the JSON file, but it won’t be attached to the loaded object.

{
    "genotypes" : [
        "000",
        "001",
        "010",
        "011",
        "100",
        "101",
        "110",
        "111"
    ],
    "phenotypes" : [
        0.62344582,
        0.87943151,
        -0.11075798,
        -0.59754471,
        1.4314798,
        1.12551439,
        1.04859722,
        -0.27145593
    ],
    "stdeviations" : [
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01,
        0.01
    ],
    "mutations" : {
        "0" : ["0", "1"],
        "1" : ["0", "1"],
        "2" : ["0", "1"]
    },
    "n_replicates" : 12,
    "title" : "my data",
    "description" : "a really hard experiment"
}
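
The key filtering described above can be sketched with the standard json module. This is an illustration, not the reader shipped with gpmap: the RECOGNIZED tuple simply restates the list of keys, and converting the mutations keys back to integers is an assumption about what a caller might want, since JSON object keys are always strings.

```python
import json

RECOGNIZED = ("genotypes", "phenotypes", "stdeviations",
              "mutations", "n_replicates")

raw = """
{
    "genotypes": ["00", "01", "10", "11"],
    "phenotypes": [0.1, 0.5, 0.2, 0.8],
    "mutations": {"0": ["0", "1"], "1": ["0", "1"]},
    "title": "my data"
}
"""

data = json.loads(raw)

# Keep only the recognized keys; "title" is dropped.
recognized = {key: data[key] for key in RECOGNIZED if key in data}

# JSON object keys are strings, so site indices are converted back to int.
mutations = {int(site): alphabet
             for site, alphabet in recognized["mutations"].items()}

print(sorted(recognized))
print(mutations)
```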

API Documentation

The GenotypePhenotypeMap is the main entry point to the gpmap package. Load in your data using the read methods attached to this object. The following subpackages include various objects to analyze this object.

Subpackages

gpmap.errors module
gpmap.sample module
gpmap.stats module
gpmap.utils module
gpmap.simulate
gpmap.simulate.base module
gpmap.simulate.fuji module
gpmap.simulate.hoc module
gpmap.simulate.nk module
Module contents

GenotypePhenotypeMap
