Quantitative Modeling of Signaling Interactions in the Breast Tumor Microenvironment
Specific Aims
Aim 1: Identify sets of stromal-epithelial interactions that suggest candidate hypotheses for the causal mechanisms through which stroma expression levels can affect tumor functions.
Aim 2: Identify modules in the stromal-epithelial signaling network that are conserved between mouse and human, supporting the assignment of degrees of confidence to predictions regarding human disease made based on mouse models.
Aim 3: Integrate candidate causal mechanisms through which stroma expression levels can affect tumor functions into one or more stochastic models and use these to develop statistical measures that can distinguish or reject these alternative hypotheses based on experimental data that can be feasibly obtained.
Data I was given
I was given mouse mammary transcriptome data. Breast tissue had been collected from 8-week-old virgin FVB/N mice under four conditions: either wild-type ErbB2 or mutant ErbB2, combined with either normal PTEN or conditional knockout of PTEN in the fibroblasts only. This tissue had been separated into four cell types: epithelial cells, fibroblasts, macrophages, and endothelial cells. Microarrays were then used to probe the expression levels of 17548 genes. Where several probes targeted the same gene, the probe with the maximum value had been chosen as the representative entry in the data that I have. If different probes reflected different isoforms of the same gene, this information has been lost in the file I have.
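The probe-collapsing step described above can be illustrated with a minimal sketch. It assumes a simplified input of (gene, value) pairs for a single sample; the function name collapse_probes and the toy values are hypothetical and are not taken from the data files. The "mean" option corresponds to averaging probes instead of taking the maximum.

```python
from collections import defaultdict


def collapse_probes(rows, how="max"):
    """Collapse several probe values per gene into one value per gene.

    rows: iterable of (gene_symbol, expression_value) pairs, one per probe.
    how:  "max" keeps the largest probe value, "mean" averages the probes.
    """
    by_gene = defaultdict(list)
    for gene, value in rows:
        by_gene[gene].append(value)

    if how == "max":
        return {gene: max(values) for gene, values in by_gene.items()}
    if how == "mean":
        return {gene: sum(values) / len(values) for gene, values in by_gene.items()}
    raise ValueError("how must be 'max' or 'mean'")


# Toy probe-level values (hypothetical, for illustration only).
toy_rows = [("Pten", 5.0), ("Pten", 7.5), ("Erbb2", 3.0)]
collapsed_max = collapse_probes(toy_rows, how="max")
collapsed_mean = collapse_probes(toy_rows, how="mean")
```

Either choice discards which probe contributed the value, which is why any isoform-level information cannot be recovered from the collapsed table.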
I was given mouse mammary fibroblast secretome data. Mouse mammary fibroblasts, with either normal PTEN or PTEN knockout, were cultured. The medium was collected and proteomics was run to determine which proteins were present. I was given the names of 67 genes whose protein products were secreted in both the normal and PTEN knockout conditions, and 54 other genes whose protein products were detected only in the PTEN knockout condition.
These are the two datasets to be used for Aim 1.
I was given gene expression data from tumor epithelial cells and from matched adjacent stromal cells from 123 human breast cancer patients. In this case, where a gene had multiple microarray probes, the expression level was obtained by averaging them (as I interpret the note: “Technical replicates averaged. There are multiple probes for some genes.”).
I was given exome data from 21 of the human breast cancer patients, of whom 12 had matched tumor and normal samples and 9 had only tumor samples.
These are the two datasets to be used, in conjunction with the previous two, for Aim 2. The results of Aims 1 and 2 are to be integrated for Aim 3.
Most of the transcripts in the mouse transcriptome data encode proteins; some are noncoding RNAs. The same is true of the patient transcriptome data.
What I have done
I have mapped the gene names used in the mouse transcriptome data to currently used gene names. I resolved most of these by querying MGI. I resolved others by following up on deprecated Ensembl transcript ids, by using the chromosomal location in the data file to choose between two possible current symbols, by following the history of changed Entrez gene ids, by looking on Vesiclepedia, and by using the Ensembl transcript ids. I mapped most of the genes to Ensembl gene ids and UniProt accessions. The gene Hist2h2aa1 appeared twice in the transcriptome data files, on the + and – strands of chr3. I edited my copy of the data files to reflect that Hist2h2aa1 is on the + strand and Hist2h2aa2 is on the – strand.
I need to similarly update the nomenclature for the human genes in the McGill transcriptome data.
I extracted the human orthologs of the mouse genes from the Ensembl ortholog files, as Ensembl protein ids for proteins and Ensembl transcript ids for genes.
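As an illustration of the ortholog extraction, the following is a minimal sketch that reads a tab-separated ortholog table into a mouse-to-human lookup. The column names mouse_gene_id and human_gene_id, the function name load_orthologs, and the toy identifiers are all assumptions made for this example; the actual Ensembl ortholog files may be laid out differently.

```python
import csv
import io
from collections import defaultdict


def load_orthologs(handle, mouse_col="mouse_gene_id", human_col="human_gene_id"):
    """Read a tab-separated ortholog table into {mouse_id: [human_id, ...]}.

    Column names are hypothetical defaults; pass the real ones as arguments.
    Rows with a missing mouse or human identifier are skipped.
    """
    mapping = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        mouse = (row.get(mouse_col) or "").strip()
        human = (row.get(human_col) or "").strip()
        if mouse and human:
            mapping[mouse].add(human)
    # One mouse gene may have several human orthologs, so keep a sorted list.
    return {mouse: sorted(humans) for mouse, humans in mapping.items()}


# Toy table with made-up identifiers (illustration only).
toy_table = io.StringIO(
    "mouse_gene_id\thuman_gene_id\n"
    "mouseA\thumanA2\n"
    "mouseA\thumanA1\n"
    "mouseB\t\n"
)
orthologs = load_orthologs(toy_table)
```

Keeping every human ortholog per mouse gene, instead of picking one, leaves the handling of one-to-many relationships as an explicit later decision.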
I ran the first step of the exome analysis pipeline I was given, FastQC, on the exome data. I looked at the results of FastQC (although this did not seem to be part of the protocol) and found that unidentified adapter sequences had been left in the data. Upon reviewing the literature, I chose PEAT (Paired End Adapter Trimming) for trimming adapters, as it requires no a priori adapter sequence. FastQC also indicated a quality dropoff at the ends of the reads. I used Sickle to trim the low-quality bases.
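To make the idea of trimming a low-quality read end concrete, here is a simplified sliding-window sketch. It is an illustration of the general technique only and does not reproduce Sickle's exact algorithm or defaults; the function names, the Phred+33 encoding, the threshold of 20, and the window of 5 bases are assumptions chosen for the example.

```python
def phred33(qual):
    """Decode a Phred+33 quality string into a list of integer scores."""
    return [ord(ch) - 33 for ch in qual]


def trim_3prime(seq, qual, threshold=20, window=5):
    """Trim the 3' end of a read where windowed mean quality falls below threshold.

    The window slides from the 5' end and shrinks near the end of the read.
    At the first window whose mean quality is below the threshold, the read
    is cut at the first base in that window that is itself below threshold.
    """
    scores = phred33(qual)
    n = len(scores)
    for start in range(n):
        win = scores[start:start + window]
        if sum(win) / len(win) < threshold:
            cut = start
            # A failing window always contains at least one low base.
            while cut < n and scores[cut] >= threshold:
                cut += 1
            return seq[:cut], qual[:cut]
    return seq, qual


# Toy read: eight high-quality bases ('I' = Q40) then two poor ones ('#' = Q2).
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGTAC", "IIIIIIII##")
```

In paired-end data the two mates would also need to be kept in sync after trimming, which this sketch does not attempt.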
I looked over the steps in the exome analysis pipeline I was given and noticed that some of them seemed to use outdated methods. For instance, bwa aln was included in the pipeline, whereas bwa mem has been recommended for read lengths longer than 70 base pairs for several years. Furthermore, disk space was at a premium on the OSC cluster. Modern next-generation sequencing pipelines pipe intermediate results in memory from one process to the next for several steps, which avoids writing intermediate files to disk. I switched to the Blue Collar Bioinformatics suite, developed at the Harvard School of Public Health and now widely used for many bioinformatics tasks, and began running it. This suite includes a standard pipeline for variant calling of matched tumor-normal samples. As given, this requires running one job for each sample that pools all lanes of the tumor sequence and all lanes of the normal sequence. This seems to require tinkering with the resource allocation when submitting the job to OSC, so that the job is not killed in the middle of alignment, yet the allocation is not so large that the scheduler never schedules it. An alternative is to downsample each tumor and normal exome sample. I have asked Xing Tang for help with all this when she has time. The Blue Collar Bioinformatics blog also describes a suggested analysis for tumor-only samples.
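The in-memory piping mentioned above can be sketched as follows. This is a minimal illustration of streaming an aligner's output directly into a sorter without an intermediate SAM file; it is not how the Blue Collar Bioinformatics suite is implemented internally. The helper names and the file names in the usage note are hypothetical, and the sketch assumes bwa and samtools are installed if the alignment commands are actually run.

```python
import subprocess


def build_alignment_commands(ref, fq1, fq2, out_bam, threads=4):
    """Return (aligner_cmd, sorter_cmd) for a paired-end alignment.

    bwa mem writes SAM to stdout; samtools sort reads it from stdin ("-")
    and writes a sorted BAM, so no intermediate SAM file touches the disk.
    """
    aligner = ["bwa", "mem", "-t", str(threads), ref, fq1, fq2]
    sorter = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    return aligner, sorter


def run_piped(producer_cmd, consumer_cmd):
    """Run producer | consumer and return both exit codes."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Close our copy of the pipe so the producer sees a broken pipe
    # if the consumer exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc, consumer_rc
```

A call such as run_piped(*build_alignment_commands("ref.fa", "tumor_R1.fq.gz", "tumor_R2.fq.gz", "tumor.sorted.bam")) would then run the two tools as one stream, with placeholder file names standing in for real paths.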
Tuncbag et al. developed a method, “Simultaneous Reconstruction of Signaling Pathways Using Prize-Collecting Steiner Forests,” that, given a set of proteins thought to be involved in a signaling process within a cell, uses biological network data to hypothesize what other proteins might be involved in this process. They use a message-passing algorithm from Bayesian networks. I developed a novel plate model (plate models are used in probabilistic graphical models) to extend this from cell-autonomous processes to intercellular processes, so that the reconstructed network can span cells of different types (e.g., epithelial cells and fibroblasts) as well as the extracellular space.
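For orientation, the quantity that a prize-collecting Steiner forest method trades off can be sketched as below. As I understand the formulation, a candidate forest is scored by the prizes of the nodes it leaves out (weighted by beta), plus the costs of the edges it uses, plus a penalty omega per tree. The function name, the toy network, and the parameter values are assumptions for illustration; this only evaluates a given forest and does not perform the message-passing optimization itself.

```python
def pcsf_objective(prizes, edge_costs, forest_edges, n_trees, beta=1.0, omega=1.0):
    """Score a candidate forest; lower is better.

    prizes:       {node: prize} for nodes with experimental evidence.
    edge_costs:   {frozenset({u, v}): cost} for edges in the interactome.
    forest_edges: iterable of (u, v) edges selected for the forest.
    n_trees:      number of trees in the forest.
    Note: a tree consisting of a single isolated node cannot be expressed
    through forest_edges alone in this simplified sketch.
    """
    forest_edges = list(forest_edges)
    included = {node for edge in forest_edges for node in edge}
    missed_prize = sum(p for node, p in prizes.items() if node not in included)
    used_cost = sum(edge_costs[frozenset(edge)] for edge in forest_edges)
    return beta * missed_prize + used_cost + omega * n_trees


# Toy network (hypothetical values).
toy_prizes = {"A": 2.0, "B": 1.0, "C": 3.0}
toy_costs = {frozenset({"A", "B"}): 0.5, frozenset({"B", "C"}): 2.5}

score_small = pcsf_objective(toy_prizes, toy_costs, [("A", "B")], n_trees=1)
score_full = pcsf_objective(toy_prizes, toy_costs, [("A", "B"), ("B", "C")], n_trees=1)
```

In this toy case, paying 2.5 for the B-C edge is worthwhile because it recovers C's prize of 3.0, so the larger forest scores lower.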
I have downloaded the msgsteiner and OmicsIntegrator software from the Tuncbag et al. paper. I am modifying it to work intercellularly using my plate model. It comes with generic protein-protein interaction data. I will also incorporate the ligand-receptor database used by Fuhai Li in his CCCExplorer, and the dbPTM database of post-translational modifications of proteins, such as phosphorylation. The Tuncbag et al. method uses epigenome data to constrain the possible transcription factors that might be hypothesized. I have found epigenome data from mouse mammary luminal cells of virgin FVB/N mice, and mouse dermal fibroblast cells of BALB/c mice.
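One simple way to picture how ligand-receptor pairs could join cell-type-specific networks is sketched below. This is only an illustration of the general idea of an intercellular interactome and does not reproduce the plate model; the function name, the node representation as (cell type, protein) pairs, and the toy interactions are all assumptions.

```python
def build_intercellular_edges(ppi_edges, ligand_receptor_pairs, cell_types):
    """Build edges over (cell_type, protein) nodes.

    Intracellular protein-protein interactions are copied once per cell type.
    Each ligand-receptor pair adds a directed edge from the ligand in a
    sending cell type to the receptor in a receiving cell type, including
    the autocrine case where sender and receiver are the same type.
    """
    edges = set()
    for cell in cell_types:
        for a, b in ppi_edges:
            edges.add(((cell, a), (cell, b)))
    for sender in cell_types:
        for receiver in cell_types:
            for ligand, receptor in ligand_receptor_pairs:
                edges.add(((sender, ligand), (receiver, receptor)))
    return edges


# Toy inputs (illustration only).
toy_ppi = [("Egfr", "Grb2")]
toy_lr = [("Egf", "Egfr")]
toy_cells = ["epithelial", "fibroblast"]
toy_edges = build_intercellular_edges(toy_ppi, toy_lr, toy_cells)
```

In practice the ligand side could be restricted to proteins observed in the secretome data, and the per-cell-type copies could be filtered by the expression measured in each cell type.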
Once I have incorporated the dbPTM database and the Li ligand-receptor database, I will try the OmicsIntegrator method without the plate model on the fibroblast secretome data alone.
I attended, as a guest, several sessions of the Pathology of Inflammation class taught by Traci Wilgus. One key thing I learned is that during wound healing, neutrophils that traffic to the wound don’t necessarily all undergo apoptosis there, as had previously been thought. In zebrafish, they have been visualized trafficking back and forth repeatedly, to the wound and back out to the blood vessel. Neutrophils have also been imaged tethering one to another, i.e., a neutrophil grabs onto another neutrophil. The class described the several steps required for inflammatory cells to traffic from a blood vessel to the wounded tissue: rolling with weak binding to the vessel wall via selectins, activation with stronger binding to the vessel wall by integrins, firm adhesion and crawling along the vessel wall, and transmigration through the tissue along a chemokine gradient. These steps require multiple specializations. I formulated a hypothesis, which Wilgus thought novel, that the reverse trafficking of neutrophils back out from the wound to the blood vessel may serve to fetch stem cells from elsewhere. In this way the stem cells would not require all the specializations that neutrophils have in order to extravasate; the neutrophils could carry them. If this were correct, and if macrophages do the same thing, then the Tumor Microenvironment of Metastasis (TMEM) might be a co-option of a similar process in macrophages. In this case, understanding the gradients that lead the inflammatory cells back to the wound carrying the stem cells could help illuminate the reasons why metastasis occurs to particular sites, and what local and systemic factors might affect when it occurs.
I presented a poster at the T2C Wound Healing conference at OSU, constituting a literature review of wound healing processes that might be related to metastatic and post-metastatic cancer. I found one novel connection.
I learned from Lisa Christian that the Experience Sampling Method is used in many studies to track measurements of cytokine levels in the bloodstream of individual patients, longitudinally or under different conditions, together with clinical variables. Data from such studies might be integrated into the model to be developed in Aim 3, for instance to select the most useful molecules or biomarkers to include in order to incorporate systemic factors affecting the tumor microenvironment.