Monday, 19 May 2014

Week 4 (14-18 May 2014)

At the end of the week I used one of the output tables produced by R for the analysis in PathVisio. It lists the Ensembl ID of every gene that was found, together with its logFC, fold change, average expression, t value (the outcome of the statistical t-test), P-value, adjusted P-value and B value. This file is the input for PathVisio's expression data import.
In PathVisio I selected the human gene database HS_Derby_20130701.bridge. I also imported the WikiPathways Homo sapiens Curation-Tutorial (a GPML file).
In PathVisio I could then create a visualization. I wanted to show the up- and down-regulated gene expression and the p-value.
Finally, I ran a statistical test on all the pathways. I wanted the pathways to be ranked by the criterion ([logFC]>0.585 OR [logFC]<-0.585) AND [P-value]<0.05.
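
To make the criterion concrete, here is a minimal Python sketch of the same test applied gene by gene. It is not how PathVisio evaluates the expression internally; the function name, gene names and values are made up for illustration only.

```python
def meets_criterion(log_fc, p_value, fc_cutoff=0.585, p_cutoff=0.05):
    """([logFC] > cutoff OR [logFC] < -cutoff) AND [P-value] < p_cutoff."""
    return (log_fc > fc_cutoff or log_fc < -fc_cutoff) and p_value < p_cutoff


# Hypothetical rows (gene, logFC, P-value); not taken from the real data set.
genes = [
    ("gene_a", 1.20, 0.001),   # strongly up, significant
    ("gene_b", -0.70, 0.030),  # down, significant
    ("gene_c", 0.90, 0.200),   # up, but not significant
    ("gene_d", 0.10, 0.010),   # significant, but change too small
]

# Keep only the genes that pass both parts of the criterion.
selected = [gene for gene, log_fc, p in genes if meets_criterion(log_fc, p)]
print(selected)
```

Both inequalities are strict, so a gene with a logFC of exactly 0.585 does not pass.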

For the comparison of the treated group with the group with acute malaria, the top-ranked pathways that meet the criterion are shown.

| Pathway | positive (r) | measured (n) | total | % | Z Score | p-value (permuted) |
|---|---|---|---|---|---|---|
| RB in Cancer | 9 | 92 | 104 | 9.78% | 6.02 | 0.001 |
| Neurotransmitter uptake and Metabolism In Glial Cells | 1 | 2 | 13 | 50.00% | 5.28 | 0 |
| Transport of Glycerol from Adipocytes to the Liver by Aquaporins | 1 | 2 | 7 | 50.00% | 5.28 | 0.003 |
| Activation of Chaperone Genes by ATF6-alpha | 2 | 8 | 16 | 25.00% | 5.1 | 0.005 |
| Signal amplification | 2 | 11 | 56 | 18.18% | 4.23 | 0.003 |
| Thrombin signalling through proteinase activated receptors (PARs) | 2 | 13 | 53 | 15.38% | 3.82 | 0.005 |
| Activation of Matrix Metalloproteinases | 2 | 15 | 66 | 13.33% | 3.48 | 0.009 |
| FAS pathway and Stress induction of HSP regulation | 3 | 35 | 43 | 8.57% | 3.15 | 0.022 |
| miR-targeted genes in leukocytes - TarBase | 6 | 108 | 128 | 5.56% | 3.11 | 0.006 |


For the comparison of the experimentally infected group with the baseline group, the top-ranked pathways that meet the criterion are shown.

| Pathway | positive (r) | measured (n) | total | % | Z Score | p-value (permuted) |
|---|---|---|---|---|---|---|
| Type II interferon signaling (IFNG) | 5 | 35 | 38 | 14.29% | 10.4 | 0 |
| RIG-I/MDA5 mediated induction of IFN-alpha/beta pathways | 4 | 48 | 181 | 8.33% | 6.88 | 0 |
| Heme Biosynthesis | 1 | 8 | 28 | 12.50% | 4.31 | 0.013 |
| NOD pathway | 2 | 30 | 43 | 6.67% | 4.26 | 0.004 |
| Serotonin Transporter Activity | 1 | 9 | 15 | 11.11% | 4.04 | 0.027 |
| Interferon alpha/beta signaling | 2 | 34 | 96 | 5.88% | 3.95 | 0.017 |
| Regulation of toll-like receptor signaling pathway | 4 | 120 | 152 | 3.33% | 3.85 | 0.004 |
| Quercetin and Nf-kB/ AP-1 induced cell apoptosis | 1 | 11 | 25 | 9.09% | 3.61 | 0.03 |
| TAK1 activates NFkB by phosphorylation and activation of IKKs complex | 1 | 11 | 30 | 9.09% | 3.61 | 0.035 |


positive (r) -- the number of genes on the pathway that fulfill the criterion
measured (n) -- the number of genes on the pathway that have been measured in the data set
total -- the total number of genes on the pathway
% -- the percentage of measured genes that fulfill the criterion (r divided by n)
z-score -- the z-score for overrepresentation of genes that fulfill the criterion on the pathway (derived from the hypergeometric distribution, the same distribution that underlies Fisher's exact test)
p-value (permuted) -- the chance of obtaining a z-score at least this high when the data are randomly permuted
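
As a rough illustration of how these columns relate, the sketch below computes the percentage and a hypergeometric z-score in Python. The formula is the standard z-score under the hypergeometric distribution; that PathVisio uses exactly this form is an assumption here. `N` (all measured genes) and `R` (all measured genes that fulfill the criterion) are not reported in this post, so the values used below are hypothetical.

```python
import math


def percent_positive(r, n):
    """Percentage of measured pathway genes that fulfill the criterion."""
    return 100.0 * r / n


def pathway_z_score(r, n, R, N):
    """Z-score of r positives among n measured pathway genes, given
    R positives among N measured genes in the whole data set."""
    expected = n * R / N
    # Hypergeometric variance, including the finite-population correction.
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)


# r and n for "RB in Cancer" come from the first table.
print(round(percent_positive(9, 92), 2))

# N and R are placeholders (assumptions), so this z-score will not
# reproduce the value reported by PathVisio.
print(pathway_z_score(r=9, n=92, R=300, N=20000))
```

A pathway with only two measured genes can reach a high z-score from a single hit, which is one reason to look at n alongside the z-score.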

Next week I plan to take a closer look at these pathways and compare the two groups (differences and similarities), to try to link the results to biological explanations.
I also want to look at genes with a high fold change and a significant p-value that were not found by PathVisio.



2 comments:

  1. Please do put in links to the actual pathways with the data loaded. We say "never publish a list like this" during courses on pathway analysis for a reason. I understand that in blog posts you can post the list as part of an ongoing process, but the next step really is to look at the pathways themselves. Pathway statistics often do not make a lot of sense on their own. I can give you some reasons for that. Adding a bunch of non-regulated genes to a pathway, for instance, lowers its ranking without changing the biology. Some pathways overlap heavily, and they may all show up for that reason while pointing to only one subprocess. Also, individual steps in metabolism or regulation can often be carried out by different genes. Having one active, differently regulated gene in a step where 10 could be active is not necessarily less relevant than having 1 out of 1 in the next step. Having 2 out of 10 could be less meaningful if the next step is not regulated at all. These lists are thus *just* a link to further interpretation of the outcome.

  2. I was also wondering why you describe the desired minimal fold change as |logFC| > 0.585. What you really mean is a real change of 50% up or down, right? Did you try lower values? In large processes, many small changes can actually make sense.
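
For reference, the arithmetic behind the 0.585 cutoff can be checked with a short Python sketch. It assumes that logFC is a base-2 logarithm, which the post does not state explicitly. The 20% threshold at the end is a hypothetical example of a "lower value", not something that was tried in the post.

```python
import math


def log_fc_to_fold_change(log_fc):
    """Convert a log2 fold change back to a plain ratio."""
    return 2.0 ** log_fc


def fold_change_to_log_fc(fold_change):
    """Convert a plain ratio to a log2 fold change."""
    return math.log2(fold_change)


up = log_fc_to_fold_change(0.585)     # about 1.5, i.e. 50% higher
down = log_fc_to_fold_change(-0.585)  # about 0.667, i.e. 1.5-fold lower
print(round(up, 2), round(down, 3))

# Hypothetical lower threshold: a 20% increase as a logFC cutoff.
lower_cutoff = fold_change_to_log_fc(1.2)
print(round(lower_cutoff, 3))
```

Under that assumption the cutoff is symmetric on the log scale: 1.5-fold up corresponds to +50%, while 1.5-fold down corresponds to a decrease of roughly one third.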