Project Title: Breast Cancer Analysis And Prediction With R Programming
BASIS Advisor: Dr. Dornhoffer
Internship Location: Intel
Onsite Mentor: Mr. Debashis Chatterjee
Breast cancer is one of the most prevalent diseases around the world. It is often characterized by abnormal structural and mechanical properties of the cell nucleus, such as nuclear size, nuclear shape, and mitosis dynamics. Because the diagnosis and characterization of tumor cells rely heavily upon imaging studies, scientists and health care providers need sophisticated algorithms that offer faster and more accurate ways to predict or diagnose a patient’s likelihood of getting breast cancer. Thus, the goal of this project is to use a set of sample data to predict the probability of getting breast cancer. To complete this project, I will use the R programming language to analyze a medical database, the Breast Cancer Wisconsin Dataset. I will first compute summary statistics for all the attributes and create visualizations for a better understanding of the data. Based on that information, I will use a set of training data to determine the risk factors that are most likely to lead to breast cancer. Then, I will code three different prediction algorithms and determine the most accurate one through a statistical analysis. Finally, I will test the selected algorithm on a set of test data to predict the final outcomes. Overall, I believe this project will help doctors and medical researchers expedite the process of diagnosis and find new ways to characterize breast cancer cells in order to prevent and treat the disease more effectively.
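As a rough illustration of the workflow described above (train several candidate models, compare their accuracy, then test the winner on held-out data), here is a minimal Python sketch. It is not the project's actual code. The data is synthetic rather than taken from the Breast Cancer Wisconsin Dataset, the two numeric features are hypothetical stand-ins for nuclear measurements, and the three simple classifiers (nearest centroid, one nearest neighbor, and a majority-class baseline) are placeholders for whichever algorithms end up being compared.

```python
import math
import random


def make_synthetic_samples(n_per_class, seed=0):
    """Generate hypothetical (features, label) pairs. NOT real patient data."""
    rng = random.Random(seed)
    samples = []
    # Assumption: two well-separated clusters, one per class.
    for label, center in (("benign", 2.0), ("malignant", 8.0)):
        for _ in range(n_per_class):
            features = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
            samples.append((features, label))
    rng.shuffle(samples)
    return samples


def fit_nearest_centroid(train):
    """Predict the class whose mean feature vector is closest."""
    sums = {}
    for (x, y), label in train:
        (sx, sy), count = sums.get(label, ((0.0, 0.0), 0))
        sums[label] = ((sx + x, sy + y), count + 1)
    centroids = {lab: (sx / c, sy / c) for lab, ((sx, sy), c) in sums.items()}
    return lambda f: min(centroids, key=lambda lab: math.dist(f, centroids[lab]))


def fit_one_nearest_neighbor(train):
    """Predict the label of the single closest training sample."""
    return lambda f: min(train, key=lambda s: math.dist(f, s[0]))[1]


def fit_majority(train):
    """Baseline: always predict the most common training label."""
    counts = {}
    for _, label in train:
        counts[label] = counts.get(label, 0) + 1
    majority = max(counts, key=counts.get)
    return lambda f: majority


def accuracy(predict, data):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(f) == label for f, label in data) / len(data)


def select_and_test(samples):
    """Train on 60%, pick the best model on 20%, report it on the last 20%."""
    n = len(samples)
    train = samples[: int(0.6 * n)]
    validation = samples[int(0.6 * n): int(0.8 * n)]
    test = samples[int(0.8 * n):]
    candidates = {
        "nearest_centroid": fit_nearest_centroid,
        "one_nearest_neighbor": fit_one_nearest_neighbor,
        "majority_baseline": fit_majority,
    }
    models = {name: fit(train) for name, fit in candidates.items()}
    scores = {name: accuracy(m, validation) for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best, scores, accuracy(models[best], test)


best_name, validation_scores, test_accuracy = select_and_test(
    make_synthetic_samples(50)
)
print("validation accuracy:", validation_scores)
print("selected:", best_name, "| test accuracy:", test_accuracy)
```

Choosing the model on a separate validation split, and only then scoring it on the test split, keeps the final accuracy from being inflated by the selection step itself.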
My Posts
Week 11: Wrapping it up
Hello guys! This week, I’m finally wrapping it up. I confirmed all the results and got ready for the presentation, even though it was a long, busy week for me. I just want to reflect on this project. Overall, I think I did a good job of analyzing the large […]
Week 10: Almost Done
Okay, so now I have all the results that I need from the analyses that I conducted. I have finished creating the presentation and am now practicing it. Basically, I collected the results of several statistical tests, including crosstables, to help justify the different conclusions that I will be making. I will also show […]
Week 9: Almost Done
What’s up guys? I can’t believe I’m almost done with my Senior Project. This week, I managed to get most of my analysis done in the Jupyter Notebook, and I will wrap it up next week. To summarize what I did: after working around several bugs, I first implemented the databases as file objects […]
Week 8: A small break
This week was different for me. I was out of town for a good portion of the week and thus wasn’t able to work on my project. So, instead of talking about my current progress, I decided to explain more about the technical side of my analysis in the Jupyter Notebook. The Jupyter Notebook version […]
Week 7: More debugging
Well, this week was less stressful than last week, so I will get into it. First of all, I decided to change my schedule and analyze my second dataset in R (which I will explain later). Doing this will help me to get my R analysis complete for the […]
Week 6: Coding and Testing
Week 6 was a busy week. I first finished developing the entire decision tree algorithm in Python that I described in my previous blog post. It was a combination of helper methods (methods that are complementary to the overall algorithm) and classes arranged to analyze each set of data like that in a decision […]
Week 5: Modeling in Python
Wow! I finished setting up the decision tree algorithm in R and am now on to building the same model, but this time in Python. Specifically, I am using the Jupyter Notebook that is supported by the Anaconda environment to build this model, which enables me to test any part of the model I want at […]
Week 4: Continuing to Build the Models
In week 4, I continued to build the algorithms that I started in week 3. Instead of focusing on building the Support Vector Machine, I shifted my focus to two other types of algorithms: K-Nearest Neighbors and Decision Trees. Unlike Support Vector Machines, K-Nearest Neighbors and Decision Trees involve more analysis and often […]
Week 3: Analysis and Building First Model
Finally! I am off to work! This week, I did some analysis of the data that I am going to use for this project. In order to complete a thorough analysis, I first had to load the data into RStudio, which took some time due to the massive data set that had to be […]
Week 2: Setting Up The Project
It was a great week settling into the project. I first set up two datasets by extracting the data from two massive files and importing them into a spreadsheet for clarity. I also made sure that the files were saved as CSV files. After that, I decided to learn more about the algorithms that […]
Week 1: Introduction
Week 1 was about getting ready for the project by understanding more about the biology of breast cancer, as well as getting some experience in machine learning using the R language. After reading several articles and publications about breast cancer, I have a better understanding of how breast cancer can develop and spread throughout the […]