Yet another TCGA data portal?

There are already TCGA data portals built by dozens of developers with untold grant money, so why is another one necessary, especially one made by someone without funding or web development experience? Well, I just wasn't happy with what was available.

One of the main things that bothered me was the prevalence of p-hacking¹ with cBioPortal (see the post by Jeff Leek on the topic). Yes, it's great that cBioPortal allows researchers to perform survival analyses, but I would constantly see people alter the expression cutoff until they got as low a p-value as possible. Does that really mean their gene is correlated with survival? And how does that p-value compare to other genes for that cancer?

Yes, you can also p-hack with OncoLnc, and even more efficiently, since OncoLnc allows a better splitting of patients than cBioPortal's Onco Query Language! But the great thing about OncoLnc is that I don't only give p-values, I also give the rank of the correlation in the search results. If you think someone might be exaggerating the significance of a correlation with OncoLnc, you can always go check the rank of the Cox analysis for that gene. With cBioPortal there was no way of knowing the strength of the correlation relative to other genes.

And yes, I am a contributor on papers where other authors used p-hacking for figures. What can I say? It just can't be helped; people love to make beautiful Kaplan-Meier plots for their publications. I am concerned that the primary use of OncoLnc will be for researchers to make beautiful Kaplan-Meier plots to throw into their publications and suggest their gene is involved in cancer. I don't really mind if it's a minor part of a larger argument for the gene's role in cancer, but if a paper claims a role in cancer simply because of a random Kaplan-Meier plot, that is concerning.
Hopefully cancer researchers will check the rank with OncoLnc before doing studies on genes that only have Kaplan-Meier plots implicating them in cancer.

Another thing about cBioPortal that bothers me is that OncoPrint uses z-scores instead of the raw expression values. So when you perform a survival analysis, you have no idea how different the low-expressing group is from the high-expressing group. They could have nearly identical expression and you would never know. Or even worse, the gene you are looking at could have an expression of 0 in almost every sample, and you will be happily sitting there making survival curves for a publication (hopefully just a PLOS ONE publication).

I feel that if an online data portal is going to offer survival analyses, it should offer Cox regression analyses, which are the standard method for survival correlations, and make it simple to download the data used in the analysis. As a bonus, it would be nice if the gene names were up to date, including miRNA names that allow for analyses of the 5p and 3p arms instead of just the stem-loop, and as a super bonus, it would include some lncRNA data. And if the tool also made publication-quality plots for users, that would be the cherry on top.

I come from the background of a medical student researcher studying noncoding RNAs, so this is my perspective. If you think OncoLnc is missing something essential for your work, let me know, and maybe I'll be able to incorporate it into OncoLnc V2: bigger, faster, and prettier.

And yes, I've thought about making OncoLnc open source. I don't have experience working on open source software, but I have been in discussion with The Hyve about that possibility, and also about including some of OncoLnc's data in cBioPortal.

1. I may be using the term p-hacking a little loosely. Think of going into a survival analysis with a defined hypothesis, for example, that the top 25% of patients will show different survival than the bottom 25% of patients. If you then alter this hypothesis to improve the p-value without correcting for multiple comparisons, you can be considered to be p-hacking.
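
To make the cutoff problem concrete, here is a minimal sketch in plain Python. It is not OncoLnc's or cBioPortal's code: the data are simulated, the gene is unrelated to survival by construction, and the helper names (`logrank_p`, `scan_cutoffs`) are made up for illustration. It runs a two-group log-rank test at a series of expression cutoffs, keeps the smallest p-value the way a p-hacker would, and then applies a Bonferroni correction for the number of cutoffs tried. Bonferroni is just one simple choice here, and a conservative one, since the tests at neighboring cutoffs are highly correlated.

```python
import math
import random


def logrank_p(times, events, groups):
    """Two-group log-rank test; returns the p-value (chi-square, 1 df).

    times  : follow-up time per patient
    events : 1 if the patient died, 0 if censored
    groups : 1 for the high-expression group, 0 for the low-expression group
    """
    data = sorted(zip(times, events, groups))
    n = len(data)
    n1 = sum(g for _, _, g in data)  # group-1 patients still at risk
    obs_minus_exp = 0.0
    variance = 0.0
    i = 0
    while i < n:
        t = data[i][0]
        deaths = deaths1 = leaving1 = 0
        j = i
        # Collect every patient whose follow-up ends at time t.
        while j < n and data[j][0] == t:
            _, e, g = data[j]
            deaths += e
            deaths1 += e * g
            leaving1 += g
            j += 1
        at_risk = n - i
        if deaths > 0 and at_risk > 1:
            frac1 = n1 / at_risk
            obs_minus_exp += deaths1 - deaths * frac1
            variance += (deaths * frac1 * (1 - frac1)
                         * (at_risk - deaths) / (at_risk - 1))
        n1 -= leaving1
        i = j
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def scan_cutoffs(expression, times, events, percentiles):
    """Log-rank p-value for a high/low split at each percentile cutoff."""
    ranked = sorted(expression)
    p_values = {}
    for pct in percentiles:
        threshold = ranked[int(len(ranked) * pct / 100)]
        groups = [1 if x >= threshold else 0 for x in expression]
        p_values[pct] = logrank_p(times, events, groups)
    return p_values


# Simulated cohort (an assumption for illustration, not TCGA data):
# expression and survival are drawn independently, so there is no true effect.
rng = random.Random(0)
n_patients = 200
expression = [rng.lognormvariate(0, 1) for _ in range(n_patients)]
times = [rng.expovariate(1 / 1000) for _ in range(n_patients)]
events = [1 if rng.random() < 0.6 else 0 for _ in range(n_patients)]

percentiles = list(range(10, 91, 5))  # cutoffs at 10%, 15%, ..., 90%
p_values = scan_cutoffs(expression, times, events, percentiles)

best_pct = min(p_values, key=p_values.get)
best_p = p_values[best_pct]
corrected_p = min(1.0, best_p * len(percentiles))  # Bonferroni correction

print(f"p-value at the median split:   {p_values[50]:.3f}")
print(f"best p-value (cutoff {best_pct}%):    {best_p:.3f}")
print(f"Bonferroni-corrected best p:   {corrected_p:.3f}")
```

The best p-value over all cutoffs can never be larger than the p-value at the single, pre-specified median split, which is exactly why reporting only the best one overstates the evidence. Comparing the uncorrected and corrected values shows how much of the apparent significance comes from the search itself.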