Two Master of Science in Biomedical Informatics graduates discuss how developing a pipeline to study cancer genes for their capstone project generated results that laid the groundwork for important future work in a little-understood area of cancer.
Abhi Kakuturu and Emma Clark, both recent graduates of the Master of Science in Biomedical Informatics (MScBMI) program, focused their capstone project on a little-understood area of cancer involving overly active genes on the X chromosome. While both had previous experience working in cancer labs, the entirely data-based foundation of their project, along with the coding skills required to arrive at their solution, presented a new and exciting challenge. What’s more, both had pursued the degree precisely to learn more about these methods.
“I knew that I wanted to incorporate my biology background moving forward,” said Kakuturu, who is currently working as a data scientist before applying to medical school. “But so much of biology these days involves using and manipulating data that it seemed vital that I develop some of the important data science skill sets I didn’t get a chance to learn as an undergrad. That’s what brought me to the MScBMI program and the capstone project was an excellent way to bring everything I learned while there together.”
Studying Atypical Results in Gene Production
Their project focused on a mechanism called X-chromosome inactivation (XCI), which regulates gene expression from the X chromosome. XCI is easier to study in females, whose cells carry two X chromosomes (as opposed to males, whose cells carry an X and a Y chromosome). The mechanism is at times disrupted, giving rise to what are known as “escape genes” that evade inactivation and continue to be expressed. While a certain percentage of genes are known to escape in healthy individuals, there are also known instances where escape genes lead to abnormal gene dosage and atypical results in gene production.
“It’s known that individuals with cancer are prone to exhibit increased levels of escape genes,” said Clark, who is a student at the College of Medicine at Ohio State University. “But there isn’t a complete understanding yet of how particular cancer types affect escape genes. Both the type and quantity of escape genes may vary by a given cancer type. Our research looked specifically at the expression of escape genes in various cancers to see if specific genes are prevalent in certain cancers and whether our findings were corroborated in the literature.”
Valuable Collaboration with Experts
Drawing on data from The Cancer Genome Atlas (TCGA) that they processed using the UChicago high-performance computing cluster, Kakuturu and Clark set out to create a bioinformatics pipeline able to process tumor files and determine the escape status of X-chromosome genes. They worked closely with Dr. Lixing Yang, assistant professor at the University of Chicago’s Ben May Department for Cancer Research, as well as with other members of his lab, organizing and setting up the data to generate outputs using the R script they wrote.
“Working with Dr. Yang and other members of his lab was a huge help and a learning experience in itself,” said Kakuturu. “Setting up the data and processing it, and then processing it further to get it in a state where we could generate graphs and charts that would give us a visual representation of what we were looking for was all very new terrain for me and Clark. Having them available to answer questions was pivotal to the success of our project.”
They also worked with biomedical informatics instructor Larry Helseth, whose current role at NorthShore University HealthSystem as a translational bioinformatician involves developing software to analyze genomic data for integration into the patient’s medical record.
“In the beginning, we were meeting with Larry every week,” Clark said. “Although he doesn’t work in the specific area we were investigating, his familiarity with the various coding languages we were using was extremely helpful. He was able to point us in the right direction and give us a sense for the bigger picture.”
Laying the Foundation for the Future
In the end, they were able to show using TCGA data that cancer can complicate XCI and that the escape genes liable to be overexpressed vary by cancer type. They analyzed and plotted 190 samples, visualizing approximately 690 genes at the chromosomal level. Of the many genes they found with escape statuses deviating from normal, they highlighted 15 with potential roles in complicating cancer.
More importantly, however, their project built the infrastructure for future research in the area.
“While we ran data through our pipeline to get some initial results, that was in large part just as a proof of concept,” said Clark. “Our study indicated that there may be differences in XCI and escape gene status for various cancers, but future studies using our pipeline should be able to go further and indicate patterns in XCI expression in cases of non-typical genotypes. That could lead to a better understanding of the role of XCI escape genes in cancer morphology.”