CDF plots and mislabeled samples

Cumulative distribution function (CDF) plots, not to be confused with probability density plots, are a common way to show the effect of a miRNA on gene expression. The reason is that although any single predicted target of a miRNA may or may not change upon addition or depletion of the miRNA, the group of predicted targets as a whole will show a shift. I did a lot of work on small RNAs, so I wanted to be able to replicate the CDF plots in different publications and generate them for my own work. Matplotlib did not have a CDF function, so I wrote my own. Here is an example of an image my code will produce:

The code is available here.

An interesting application of CDF plots that I have not seen people use is in RNA interference experiments with microarrays. These experiments have off-target effects from the interfering RNA, but those effects can be predicted from the seed region of the RNA. The current way to account for this is to use two different targeting RNAs and to treat the genes whose expression is consistently altered by both as regulated by the targeted gene. This works fine, although I do wonder whether we could come up with a more advanced algorithm that takes the off-target effects into account.

Another interesting use of the off-target effects of RNA interference is to identify whether your core facility mislabeled your samples. You might say "but I can just check whether the targeted gene decreased in the correct samples", which is true, but what if your gene isn't on the chip? Also, if you are using multiple targeting RNAs, you would expect the targeted gene to decrease in all of them, so that check cannot tell the knockdown samples apart from one another. I suppose you don't mind much if those samples are swapped among themselves, since you are mostly concerned with the control experiment. I had a situation where we targeted a lncRNA that was not on the chip, so there was no simple way to check whether the samples were mislabeled.
I'm very careful when it comes to data analysis, so the first thing I wanted to be sure of was that the samples were what I thought they were. Using the sequences of the targeting RNAs, I identified the predicted targets of each one and observed large shifts in the CDF plots, confirming that the samples were correctly labeled. I'm not an expert in RNA interference experiments, but this should probably be step number one.
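
For anyone who wants to try this check, here is a rough sketch of the idea in Python. It is not the code linked above. The function names are made up for illustration, the seed is taken as positions 2-8 of the targeting (guide) strand, which is a common convention but still an assumption here, and a plain 7-mer match in the 3' UTR stands in for a real target prediction tool. The fold changes and UTR sequences at the bottom are toy values, not data from my experiment.

```python
from bisect import bisect_right


def ecdf(values):
    """Return sorted values and the cumulative fraction at each one."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def reverse_complement(seq):
    """Reverse complement of an RNA (or DNA) sequence, returned as RNA."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq.upper().replace("T", "U")))


def seed_site(guide):
    """mRNA site complementary to guide positions 2-8 (assumed seed definition)."""
    return reverse_complement(guide[1:8])


def has_seed_match(utr, guide):
    """True if the 3' UTR contains a 7-mer match to the guide's seed."""
    return seed_site(guide) in utr.upper().replace("T", "U")


def max_cdf_gap(a, b):
    """Largest vertical gap between two empirical CDFs (the two-sample KS statistic)."""
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(fa - fb))
    return gap


def plot_cdfs(groups):
    """Draw one step-style CDF per group; `groups` maps label -> list of log2 fold changes."""
    import matplotlib.pyplot as plt  # imported here so the rest works without it

    fig, ax = plt.subplots()
    for label, values in groups.items():
        xs, ys = ecdf(values)
        ax.step(xs, ys, where="post", label=f"{label} (n={len(values)})")
    ax.set_xlabel("log2 fold change (knockdown vs. control)")
    ax.set_ylabel("Cumulative fraction of genes")
    ax.legend()
    return fig


# Toy example: made-up guide, UTRs, and fold changes, for illustration only.
guide = "UGAGGUAGUAGGUUGUAUAGUU"
utrs = {"geneA": "AAACTACCTCAAA", "geneB": "GGGGGGGGGGGG", "geneC": "TTCTACCTCTT"}
log2fc = {"geneA": -0.8, "geneB": 0.1, "geneC": -0.5}

targets = [log2fc[g] for g in utrs if has_seed_match(utrs[g], guide)]
others = [log2fc[g] for g in utrs if not has_seed_match(utrs[g], guide)]
gap = max_cdf_gap(targets, others)
```

If the samples are labeled correctly, the predicted seed targets of a given targeting RNA should be shifted toward lower expression in the samples that received that RNA, and not in the others. Calling `plot_cdfs({"seed targets": targets, "other genes": others})` draws the two curves, and `max_cdf_gap` puts a number on how far apart they are.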