Insights: Medical Experts on How Big Data is Being Used to Treat Cancer

November 02, 2016 00:00 AM UTC
- Updated May 06, 2020 14:58 PM UTC

They say one problem is that there still isn't enough data. (November 2016)

[MUSIC PLAYING]

CLIFTON LEAF: So curing cancer seems like something that would happen in the hospitals or the clinics in your doctor's office, not in computer labs. In this space here, we saw Watson for Oncology, but applying analytics to human DNA and the DNA of cancer cells is a promising frontier of cancer research, as we've heard and seen this week. It can help patients get the best treatment for the type of cancer they have, minimize the negative impact of that treatment on them, and ultimately, save lives. The huge size, though, of the data sets and the complexity of sharing those data sets among researchers threaten to put the brakes on progress, but the experts who are committed to the cure are racing to the finish line.

GREG SIMON: Big data is right now the big yellow object everybody is in love with. But we still live in an information-scarce medical world. Compared to in finance, in the 1920s, they created-- I mean, going back to the 1920s at the University of Chicago, you can find the-- every stock transaction ever done between then and now, and yet, you can't get your medical records out of your hospital. So big data is an idea, and it's an idea that's happening in a few places, but genomics is a very small part of it. Proteomics takes you from 30% knowledge about what could happen to 60% or 70%. It's very expensive to do. It's very personal to do. When the vice president cut the ribbon, so to speak, on the Genomic Data Commons at the University of Chicago, that's just raw genomic data from The Cancer Genome Atlas. There were 14,000 people. Now there are 32,000 people, soon to be several thousand people. It's been accessed 5 million times since June. That's the good news. The bad news is when they started The Cancer Genome Atlas, they spent $100 million to get data from 50,000 people-- 24,000 people-- half of it was not usable.
So big data is right in the middle of bad data being created by lack of pathology standards and lack of medical standards, and lack of transmission of the data ubiquitously, instantly, the way we do with financial data, the way we do with weather data, you name it. So can it make a difference? Yes, but we have to change the culture of sharing information, because our ability to share has far outpaced our attitude about sharing.

DR. KEN ROBERT SMITH: We have a lot of genealogical data, and we have been able to bring millions and millions and millions of records of genealogies into electronic format, link that to cancer records, and as much as we want to and we need to be focusing on genomic information, genetic testing, a very important component to big data in cancer is risk stratification. So I want to know whether I'm in a high-risk family or am I in a normal-risk family. And if I can do that, in many ways, that can be as valuable-- perhaps even more valuable-- than providing my DNA to a sequencing platform to determine whether I carry this variant or that variant. They can both be used in conjunction, of course, but it's an aspect of big data that I think we need to be talking more about.

CLIFTON LEAF: Has there been more of an effort into categorizing all of this information that doesn't come out of the lab?

GREG SIMON: Well, we've been part of creating a lot of collaborations around data. There's a proteogenomics project with the DOD and the DOE, and the NCI. With the Genomic Data Commons I mentioned earlier, we have a very simple program on Facebook called Cancer Base. All of our cancer statistical data is five years looking backwards.
So if you go on Facebook, it says, do you have cancer, what kind do you have, what stage are you in, where do you live, and it creates a global map of people with your cancer, where they live, and the second and third phase of that program will tell you how those other people are being treated so that you can compare what your situation is with them. Because some of the most important big data is what happened to your parents and where do you live. Those two pieces of information are more important than all your genome, unless you have a single-gene disease, which you probably got from your parents who had the same thing. If you don't know what happened to your parents and we don't know where you live, then we're not going to intervene in the right way at the right time.

CLIFTON LEAF: It's funny, if you follow some of the election coverage, there have been reports of companies like Catalyst, which have down to the doorbell-level analysis of every voter, whether they voted, what party registration, but also where they voted in previous-- what magazines they subscribe to. We have that for voters, but we don't have that for cancer history. I mean, Ken, you've been working with the most, not homogeneous, but a group of people that have consented to a lot of data information and personal information about themselves. How unusual is that? How do we export that model to other public research?

DR. KEN ROBERT SMITH: So what we've tried to do in Utah, both for the purposes of addressing cancer, but really for most of the serious diseases, is to try to marshal all of the data that we can legitimately have access to. Interestingly, you mentioned voting, we have agreements with our voter registration office in Utah to get all of the voter registration data from the population.
We also get it for all the driver's license holders in the state of Utah, and we have a legal system for doing this, and I mention this only because the insights about where the risks are for cancer or any other disease can really come from almost any type of data. So you mentioned about knowing who your parent-- what your parents' risk was and where you live, well, we need to be able to look wherever we can look to get that information. And so our strategy has been to develop these relationships with our departments of health and other organizations so that we are-- they are able to share the data with us. We create this amazing connection of all the data, and you can create that big data sphere that you can query, and there are peculiar stream-- I'll call it peculiar streams of data that you may not have thought about that would go into it for the purposes of doing cancer surveillance, but they all contribute to it. So if I know where you vote, I know what district you're in, I know where you live, and from that, I can attribute a great deal of information about you, and that's another piece of information. So we're trying to go from the genome to the menome, and learn as much about you as we can, and just to be clear, we are very sensitive-- you mentioned consent earlier, so we have all sorts of protections about how we do that in a way that makes everybody comfortable and provides the benefit.

[MUSIC PLAYING]