Internship Systems Biology solutions to Malaria: May 2014

Thursday 29 May 2014

Week 5&6

In these weeks I looked at the pathways and did not find anything special to mention. Also, not enough genes met the criterion ([logFC] < -0.585 OR [logFC] > 0.585) AND [P.Value] < 0.05, so I changed it to ([logFC] < -0.263 OR [logFC] > 0.263) AND [P.Value] < 0.05.
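To put the two cut-offs in perspective: assuming the logFC values are base-2 logarithms (the usual convention for limma output, but an assumption here), a logFC of 0.585 is roughly a 1.5-fold change and 0.263 roughly a 1.2-fold change. A minimal Python sketch of that conversion:

```python
def fold_change(log_fc):
    """Convert a log2 fold change into a plain fold-change ratio.

    Assumes the logFC column is on a log2 scale.
    """
    return 2.0 ** log_fc


for cutoff in (0.585, 0.263):
    up = fold_change(cutoff)      # ratio for upregulated genes
    down = fold_change(-cutoff)   # ratio for downregulated genes
    print(f"|logFC| > {cutoff}: up > {up:.2f}x, down < {down:.2f}x")
```

So the relaxed criterion also accepts genes whose expression changes by only about 20%, which is why more genes pass it.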

The following tables show the pathway ranking under the new criterion.

Z-score Experimental VS baseline

Z-score Acute VS treated

The number of positive genes that meet the criterion rose, and the ranking also differs from the one under the previous criterion (those tables are shown in the previous blog post).

More inflammation pathways now show up, such as NF-kB, IL pathways and toll-like receptor pathways.

Quercetin and Nf-kB/AP-1 induced cell apoptosis is the highest ranked pathway in the acute vs treated group.

It is expected that after treatment the genes in inflammation pathways will be downregulated. So I wanted to look at genes with a high (or low) FC and a significant p-value that were not found in PathVisio.

--> TREATED VS ACUTE

Table group acute VS treated_genes with the lowest logFC

Table group acute VS treated_genes with the highest logFC

Table group acute VS treated_genes with the lowest p-value

--> EXPERIMENTAL VS BASELINE

Table group experimental VS baseline_genes with the lowest logFC

Table group experimental VS baseline_genes with the highest logFC

Table group experimental VS baseline_genes with the most significant p-value

Monday 19 May 2014

Week 4 (14-18 May 2014)

At the end of the week I took one of the output tables produced by R for the analysis in PathVisio. It lists the Ensembl ID of every gene that was found, together with its logFC, fold change, average expression, t value (the outcome of the statistical t-test), P-value, adjusted P-value and B value. This file is the input for the expression data import.
In PathVisio I selected the human gene database HS_Derby_20130701.bridge. I also imported the WikiPathways Homo sapiens Curation-Tutorial (a GPML file).
In PathVisio I could then create a visualization. I wanted to show the up- and downregulated gene expression and the p-value.
Finally I ran a statistical test on all the pathways. I wanted the pathways to be ranked according to the criterion ([logFC]>0.585 OR [logFC]<-0.585) AND [P-value]<0.05.
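As an illustration of what this criterion selects, here is a minimal Python sketch. The gene names and values are hypothetical placeholders, not rows from my dataset, and the function is only my reading of the expression, not PathVisio's own code.

```python
def meets_criterion(log_fc, p_value, fc_cutoff=0.585, p_cutoff=0.05):
    """Return True when a gene counts as 'positive':
    (logFC > cutoff OR logFC < -cutoff) AND p-value < p_cutoff."""
    return (log_fc > fc_cutoff or log_fc < -fc_cutoff) and p_value < p_cutoff


# Hypothetical (name, logFC, P-value) rows, for illustration only.
genes = [
    ("GENE_A", 1.20, 0.001),   # strongly up, significant
    ("GENE_B", -0.70, 0.030),  # down, significant
    ("GENE_C", 0.40, 0.010),   # significant, but the change is too small
    ("GENE_D", 0.90, 0.200),   # large change, but not significant
]

positive = [name for name, log_fc, p in genes if meets_criterion(log_fc, p)]
print(positive)
```

Only GENE_A and GENE_B pass here. Calling `meets_criterion(0.40, 0.010, fc_cutoff=0.263)` shows how a lower logFC cut-off lets GENE_C through as well.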

For the comparison of the treated group with the acute malaria group, the first ten pathways that meet the criterion are shown.

Pathway	positive (r)	measured (n)	total	%	Z Score	p-value (permuted)
RB in Cancer	9	92	104	9.78%	6.02	0.001
Neurotransmitter uptake and Metabolism In Glial Cells	1	2	13	50.00%	5.28	0
Transport of Glycerol from Adipocytes to the Liver by Aquaporins	1	2	7	50.00%	5.28	0.003
Activation of Chaperone Genes by ATF6-alpha	2	8	16	25.00%	5.1	0.005
Signal amplification	2	11	56	18.18%	4.23	0.003
Thrombin signalling through proteinase activated receptors (PARs)	2	13	53	15.38%	3.82	0.005
Activation of Matrix Metalloproteinases	2	15	66	13.33%	3.48	0.009
Adipogenesis	7	122	132	5.74%	3.47	0.008
FAS pathway and Stress induction of HSP regulation	3	35	43	8.57%	3.15	0.022
miR-targeted genes in leukocytes - TarBase	6	108	128	5.56%	3.11	0.006

For the comparison of the experimentally infected group with the baseline group, the first ten pathways that meet the criterion are shown.

Pathway	positive (r)	measured (n)	total	%	Z Score	p-value (permuted)
Type II interferon signaling (IFNG)	5	35	38	14.29%	10.4	0
RIG-I/MDA5 mediated induction of IFN-alpha/beta pathways	4	48	181	8.33%	6.88	0
Heme Biosynthesis	1	8	28	12.50%	4.31	0.013
NOD pathway	2	30	43	6.67%	4.26	0.004
Serotonin Transporter Activity	1	9	15	11.11%	4.04	0.027
Interferon alpha/beta signaling	2	34	96	5.88%	3.95	0.017
Regulation of toll-like receptor signaling pathway	4	120	152	3.33%	3.85	0.004
Apoptosis	3	80	85	3.75%	3.62	0.009
Quercetin and Nf-kB/ AP-1 induced cell apoptosis	1	11	25	9.09%	3.61	0.03
TAK1 activates NFkB by phosphorylation and activation of IKKs complex	1	11	30	9.09%	3.61	0.035

***
positive (r) -- the number of genes on the pathway that fulfill the criterion
measured (n) -- the number of genes on the pathway that have been measured in the data set
total -- the total number of genes on the pathway
% -- the percentage of measured genes that fulfill the criterion
Z Score -- the z-score as computed by a Fisher exact test on overrepresentation
p-value (permuted) -- the change
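
To make the % and Z Score columns more concrete, here is a minimal Python sketch. The percentage is simply r / n. For the z-score I assume the MAPPFinder-style formula under a hypergeometric model, which is my understanding of what PathVisio's statistics use, not something confirmed here. R (all genes meeting the criterion) and N (all genes measured) are not listed in the tables above, so the values used for them below are hypothetical.

```python
import math


def percentage(r, n):
    """Share of measured pathway genes that meet the criterion, in percent."""
    return 100.0 * r / n


def z_score(r, n, R, N):
    """Over-representation z-score (assumed MAPPFinder-style formula).

    r: positive genes on the pathway, n: measured genes on the pathway,
    R: positive genes in the whole data set, N: measured genes in total.
    """
    expected = n * R / N
    variance = n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)


# r and n for "RB in Cancer" come from the first table above.
print(f"{percentage(9, 92):.2f}%")

# R=300 and N=20000 are hypothetical, so this z-score is illustrative only.
print(round(z_score(r=9, n=92, R=300, N=20000), 2))
```

A pathway gets a high z-score when it contains clearly more positive genes than expected from the overall fraction R / N. This also explains why small pathways with a single positive gene can rank high.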

Next week I plan to take a closer look at these pathways, compare the two groups (differences and similarities), and try to link the results to biological reasons.
I also want to look at genes with a high FC and a significant p-value that are not found by PathVisio.

Week 4 (12-13 May 2014)

In the first part of this week Lars helped me write the script for the statistical analysis of the data, and we checked whether the figures were OK to use for further statistical processing. This was done with the R project for statistical computing. The script made use of limma, bioDist, gplots and some other functions to produce the output plots for the statistical modeling.

In the first experiment I wanted to compare treated malaria patients with untreated patients with acute malaria, to see the effect of the treatment. The groups were unpaired.

P-value frequency histogram. This is not what we wanted: we want many genes with a significant change (low p-values) and few genes in the rest of the p-value range.

Adapted fold change diagram

In the second experiment I wanted to test how gene expression in PBMCs changes in people who were experimentally infected with malaria (presymptomatic effects) compared with the baseline group. The groups were paired.
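
Why the paired or unpaired design matters can be shown with a small Python sketch. It uses plain t statistics from the standard library, not limma's moderated t statistics, and the expression values are hypothetical. It only illustrates the difference between the two designs.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Unpaired (Welch) t statistic for two independent groups."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def paired_t(before, after):
    """Paired t statistic, computed on the per-subject differences."""
    diffs = [y - x for x, y in zip(before, after)]
    return mean(diffs) / math.sqrt(variance(diffs) / len(diffs))


# Hypothetical log2 expression of one gene in the same five subjects.
baseline = [7.1, 8.4, 6.9, 9.0, 7.7]
infected = [7.6, 8.8, 7.5, 9.4, 8.3]

print(round(welch_t(infected, baseline), 2))   # subject-to-subject spread hides the shift
print(round(paired_t(baseline, infected), 2))  # within-subject shift stands out
```

In this example every subject goes up by about the same amount, so the paired statistic is much larger than the unpaired one. This is the advantage of having a baseline measurement for the same people.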

P-value frequency histogram. This one looks much more like what we want in a good experiment (in comparison with experiment 1).
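
What a "good" p-value histogram looks like can be sketched in Python. The p-values below are simulated and hypothetical, not my data: 900 genes without any effect (uniform p-values) plus 100 genes with a real effect (p-values below 0.05).

```python
import random


def p_value_histogram(p_values, n_bins=20):
    """Count p-values per bin of equal width on the interval [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        counts[index] += 1
    return counts


random.seed(0)
null_genes = [random.random() for _ in range(900)]            # no effect
changed_genes = [random.random() * 0.05 for _ in range(100)]  # real effect

counts = p_value_histogram(null_genes + changed_genes)
print("first bin (p < 0.05):", counts[0])
print("average of other bins:", round(sum(counts[1:]) / (len(counts) - 1), 1))
```

A flat histogram means there is little beyond chance. A clear peak in the first bin on top of a flat background is the shape we hope for.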

Adapted fold change diagram

Monday 12 May 2014

Week 3 (5 -11 May 2014)

At the start of this week I ran the array analysis for the two different groups separately. This already gave a much better quality report, which could be used for further analysis.

The first dataset (baseline VS experimental group) was not yet completely good. Some outliers should still be removed from the dataset.

This NUSE plot shows the data of the baseline and the experimental group. There are still many arrays that lie far from the median.

Despite the NUSE plot, there is a clear separation between the groups in PCA1/2 (first PCA plot). A separation can also be seen in PCA2/3 (second PCA plot). All of this is after normalization. There are still many outliers.

In this cluster dendrogram the different groups cluster together after normalization. There are still some outliers which we might consider excluding.

In the second dataset (acute malaria VS treated malaria) only one subject, treated 9, was an outlier. So I had to rerun this dataset as well, without this subject. The only disadvantage is that there is no baseline for this group.

The hybridization control intensities and cells were OK.

In the NUSE plot many outliers can still be seen.

No clear separation can be seen in the PCA plot; the outlier might have affected this.

At the end of this week I met with Lars and his students and we discussed our datasets, reports and the statistical analysis. Lars would help everyone write a script for the statistical analyses. He wrote it over the weekend, and at the start of next week we will discuss the scripts.

This week I also joined the first science café, where a lecture about statistical analysis was given.

Week 2 (28 April-4 May 2014)

At the beginning of this week I started writing a good summary of the publication behind my chosen dataset, for which I had run the quality report with array analysis.

That week I also showed my quality report to Egon Willighagen and Bart Smeets and discussed the results of the array analysis with them. After the discussion we concluded that it would be better to delete one of the baseline subjects, because it was an outlier in all the tables and graphs. This can also affect the results for the other subjects. So I ran the analysis again without subject B11. The quality report was better, but still not what we wanted.

The green dot at the bottom is Baseline 11

The green one at the top is Baseline 11

These MA plots show that the expression on the Baseline 11 array clearly differs from the other baseline arrays. Baseline 10, 12 and 13 are shown as examples, but the rest were comparable.
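
For readers who have not seen an MA plot before: each point is one probe, with M (the log ratio) on the vertical axis and A (the average log intensity) on the horizontal axis. Below is a minimal Python sketch with hypothetical intensities. I assume each array is compared against a reference array (for example a median array of the group); which reference arrayanalysis.org actually uses is not stated here.

```python
import math


def ma_values(intensities, reference):
    """Return (M, A) per probe: M = log2 ratio, A = mean log2 intensity."""
    points = []
    for x, ref in zip(intensities, reference):
        m = math.log2(x) - math.log2(ref)
        a = 0.5 * (math.log2(x) + math.log2(ref))
        points.append((m, a))
    return points


# Hypothetical raw intensities for three probes.
array = [256, 1024, 64]
reference = [128, 1024, 256]

for m, a in ma_values(array, reference):
    print(f"M = {m:+.1f}, A = {a:.1f}")
```

For a good array the points scatter around M = 0 over the whole range of A. An array like Baseline 11 shows a cloud that is clearly shifted or bent away from that line.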

Also in the correlation data, Baseline 11 was clearly different.

Later that week Lars Eijssen took a look at the report and discussed it further with Egon Willighagen, Bart Smeets and some other members of the department of bioinformatics at Maastricht University. Lars advised me to split my dataset into two different groups, because the subjects of this study came from different countries (the US and Africa) and because they belonged to two different projects.

In this case the first dataset would contain a group from the US that first served as the baseline group, then was infected with malaria and became the experimental group, in which the presymptomatic effects of malaria were compared against the baseline.

The second group would consist only of subjects from Africa, in which acute malaria was compared against a group with treated malaria.

At the end of this week Lars Eijssen gave a statistics lecture which all interns (and others) from the department of bioinformatics could join.

Monday 5 May 2014

Week 1 (21-27 April 2014)

In the first week (21-27 April) I started my internship at the department of bioinformatics at Maastricht University. The project is from Egon Willighagen and is about searching for a new solution to cure malaria by using systems biology. I will try to use a systems biology approach to analyse the data and to find a target which plays a role in the development of malaria.

In ArrayExpress I searched for datasets in which malaria patients were compared with healthy persons. I made a table of all the datasets I found. Finally I chose a known dataset, which I will use further, in which the changes in gene expression were tested in people with and without malaria. In the subjects with malaria, the differences between presymptomatic and symptomatic changes in the disease were also tested. After that I will read some articles by people who did research on the effects of malaria and on a cure for malaria, and try to find some links between them. And, if possible, even find a new cure to fight malaria.

In this first week I have read some articles which I could use for my project.

In one article the authors compared four different groups (a baseline group, a group that was experimentally infected with malaria, a group with acute malaria, and a group that was treated for malaria). Because I thought this article might be useful for my project, I ran the dataset in arrayanalysis.org at the end of this week and planned to look at the results next week.

Publication: Christian F. Ockenhouse, Wan-Chung Hu, Kent E. Kester, James F. Cummings, Ann Stewart, D. Gray Heppner, Anne E. Jedlicka, Alan L. Scott, Nathan D. Wolfe, Maryanne Vahey and Donald S. Burke -- Common and Divergent Immune Response Signaling Pathways Discovered in Peripheral Blood Mononuclear Cell Gene Expression Patterns in Presymptomatic and Clinically Apparent Malaria -- Infect. Immun. 2006, 74(10): 5561. DOI: 10.1128/IAI.00408-06.