• Project Title: Breast Cancer Analysis And Prediction With R Programming

  • BASIS Advisor: Dr. Dornhoffer

  • Internship Location: Intel

  • Onsite Mentor: Mr. Debashis Chatterjee

Breast cancer is one of the most prevalent diseases in the world. It is often characterized by abnormal structural and mechanical properties of the cell nucleus, such as nuclear size, nuclear shape, and mitosis dynamics. Because the diagnosis and characterization of tumor cells rely heavily on imaging studies, scientists and health care providers need sophisticated algorithms that predict or diagnose breast cancer faster and more accurately. The goal of this project is therefore to use a set of sample data to predict a patient’s probability of getting breast cancer. To complete this project, I will use the R programming language to analyze a medical database, the Breast Cancer Wisconsin Dataset. I will first compute summary statistics for all the attributes and create visualizations to understand the data better. Based on that information, I will use a set of training data to determine the risk factors that are most likely to lead to breast cancer. Then, I will code three different prediction algorithms and use a statistical analysis to determine which one is the most accurate. Finally, I will test the selected algorithm on a set of test data to predict the final outcomes. Overall, I believe this project will help doctors and medical researchers expedite diagnosis and find new ways to characterize breast cancer cells in order to prevent and treat the disease more effectively.
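The workflow described above (summary statistics, a training/test split, a prediction model, and an accuracy check) might look roughly like the following minimal sketch in base R. Several things here are assumptions made only for illustration: the data frame `cells` is a small synthetic stand-in rather than the Breast Cancer Wisconsin Dataset, the column names `nuclear_size`, `nuclear_shape`, and `mitoses` are hypothetical, the 70/30 split is arbitrary, and logistic regression is just one possible candidate, since the three algorithms to be compared are not named here.

```r
# Synthetic stand-in data (an assumption for illustration only):
# this is NOT the Breast Cancer Wisconsin Dataset.
set.seed(42)
n <- 400
malignant <- rbinom(n, 1, 0.4)  # 1 = malignant, 0 = benign
cells <- data.frame(
  nuclear_size  = rnorm(n, mean = 3 + 3 * malignant, sd = 1.5),
  nuclear_shape = rnorm(n, mean = 3 + 2 * malignant, sd = 1.5),
  mitoses       = rnorm(n, mean = 1 + 1 * malignant, sd = 1.0),
  malignant     = malignant
)

# Step 1: summary statistics for every attribute.
print(summary(cells))

# Step 2: hold out 30% of the rows as test data (split ratio is an assumption).
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- cells[train_idx, ]
test  <- cells[-train_idx, ]

# Step 3: fit one candidate model (logistic regression) on the training data.
fit <- glm(malignant ~ nuclear_size + nuclear_shape + mitoses,
           data = train, family = binomial)

# Step 4: predict probabilities on the test data and measure accuracy,
# using 0.5 as the classification threshold.
prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)
accuracy <- mean(pred == test$malignant)

print(table(predicted = pred, actual = test$malignant))
cat("Test accuracy:", accuracy, "\n")
```

Repeating steps 3 and 4 for each candidate algorithm on the same split would give the per-model accuracies needed for the comparison, and `summary(fit)` reports which attributes carry the most weight in this particular model.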