CV
Claire Darnell McWhite, Ph.D.
clairemcwhite@arizona.edu | clairemcwhite.github.io | @clairemcwhite | +1 240-418-2083
Research interests
Interpretability of Large Language Models of biomolecules and gene expression, Agentic programming, Algorithms for bioinformatics, Big Data
Positions
(2024-current) Assistant Professor (Tenure-track), Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ, USA
(2025) Expert Consultant for LincSwitch Therapeutics, LLC
- Field: Biotechnology, analysis of single-cell sequencing
(2020-2024) Lewis-Sigler Scholar, Princeton University, Princeton, NJ, USA
- Mentors:
- Mona Singh, Department of Computer Science, Lewis-Sigler Institute for Integrative Genomics
- Joshua Akey, Lewis-Sigler Institute for Integrative Genomics
- Martin Jonikas, Department of Molecular Biology
Editorial Responsibilities
- 2025-current – Associate Editor: Journal of Molecular Evolution (Large Language Model focus)
Professional Affiliations
- International Society for Computational Biology
Degrees
- (2014-2020) Ph.D. Cell and Molecular Biology, University of Texas at Austin, Supervisor: Prof. Edward Marcotte. Thesis: Conservation and Comparison of Protein Interactions Across Evolution
- (2010-2014) B.S. Biochemistry and Cell Biology, Rice University
Other Research Experiences
(2014-2020) The University of Texas at Austin, Department of Molecular Biosciences
- (2014-2020) Research group of Dr. Edward Marcotte: Projects related to computational prediction of protein complexes
- (2/2015-8/2015) Research group of Dr. Claus Wilke: Projects related to artifacts in computational measurement of evolutionary rate in influenza
- (11/2014-2/2015) Research group of Dr. John Wallingford: Projects related to functional gene replacement in multicellular organisms using genes from unicellular organisms
(2010-2014) Rice University, Department of Chemical and Biomolecular Engineering
- Research group of Dr. Laura Segatori: Projects related to discovering new compounds and gene activators of the proteasome in mammalian and yeast models
(2008-2012) National Cancer Institute, National Institutes of Health, Bethesda, MD
- Research group of Dr. Ilona Linnoila, Experimental Pathology Section: Projects related to cancer and lung epithelial cells in abnormal lesions and chemical damage
Publications
- A Shamail and CD McWhite, “A General Algorithm for Detecting Higher-Order Interactions via Random Sequential Additions”, arXiv:2511.21614, 2025
- A Shamail and CD McWhite, “Automated Protein Motif Localization using Concept Activation Vectors in Protein Language Model Embedding Space”, arXiv:2511.21614, 2025
- R Shaw, SD Love, CD McWhite, “Evaluating pretrained protein language model embeddings as proxies for functional similarity”, Journal of Molecular Evolution, 1–12, 2025
- S Majidian, A Hadziahmetovic, F Langschied, S Pascarelli, … CD McWhite, … et al., “Quest for orthologs in the era of data deluge and AI: challenges and innovations in orthology prediction and data integration”, Journal of Molecular Evolution, 1–18, 2025
- H Xu, R Bierman, D Akey, C Koers, T Comi, CD McWhite, JM Akey, “Landscape of human protein-coding somatic mutations across tissues and individuals”, bioRxiv, 2025.01.07.631808, 2025. In Review at Science
- V Dang, B Voigt, D Yang, G Hoogerbrugge, M Lee, R Cox, O Papoulas, CD McWhite, R Pradeep, JC Leggere, RS Gray, EM Marcotte, “VerteBrain reveals novel neural and non-neural protein assemblies conserved across vertebrate evolution”, bioRxiv, 2025.05.26.656196, 2025. In Review as a Cell Press Multi-Journal Submission
- CD McWhite, W Sae-Lee, Y Yuan, AL Mallam, NA Gort-Freitas, S Ramundo, M Onishi, EM Marcotte, “Alternative proteoforms and proteoform-dependent assemblies in humans and plants”, Molecular Systems Biology, 1–19, 2024
- RM Cox, O Papoulas, S Shril, C Lee, T Gardner, AM Battenhouse, M Lee, K Drew, CD McWhite, D Yang, JC Leggere, D Durand, F Hildebrandt, JB Wallingford, EM Marcotte, “Ancient eukaryotic protein interactions illuminate modern genetic traits and disorders”, bioRxiv, 2024.05.26.595818, 2024. In Review at Cell Genomics
- ES Wallner, A Mair, D Handler, CD McWhite, SL Xu, L Dolan, DC Bergmann, “Spatially resolved proteomics of the Arabidopsis stomatal lineage identifies polarity complexes for cell divisions and stomatal pores”, Developmental Cell, 10.1016/j.devcel.2024.03.001, 2024
- CD McWhite, I Armour-Garb, M Singh, “Leveraging protein language models for accurate multiple sequence alignment”, Genome Research, 33(7):1145-1153, 2023
- M Kafri, W Patena, L Martin, L Wang, G Gomer, AK Sirkejyan, A Goh, AT Wilson, SE Gavrilenko, M Breker, A Roichman, CD McWhite, JD Rabinowitz, FR Cross, M Wuhr, MC Jonikas, “Systematic identification and characterization of novel genes in the regulation and biogenesis of photosynthetic machinery”, Cell 186 (25), 5638–5655.e25, 2023
- L Wang, W Patena, KA Van Baalen, Y Xie, ER Singer, S Gavrilenko, M Warren-Williams, L Han, HR Harrigan, LD Hartz, V Chen, VTNP Ton, S Kyin, HH Shwe, MH Cahn, AT Wilson, M Onishi, J Hu, DJ Schnell, CD McWhite, MC Jonikas, “A chloroplast protein atlas reveals punctate structures and spatial organization of biosynthetic pathways”, Cell, S0092-8674(23)00676-1, 2023
- V de Crecy-Lagard, …, CD McWhite, …, “A roadmap for the functional annotation of protein families: a community perspective”, Database, baac062, 2022
- W Sae-Lee, CL McCafferty, EJ Verbeke, PC Havugimana, O Papoulas, CD McWhite, JR Houser, K Vanuytsel, G Murphy, K Drew, A Emili, DW Taylor, EM Marcotte, “The protein organization of a red blood cell”, Cell Reports 40(3):111103, 2022
- CD McWhite, O Papoulas, K Drew, V Dang, JC Leggere, W Sae-Lee, EM Marcotte, “Co-Fractionation Mass Spectrometry to Identify Protein Complexes”, STAR Protocols 2:1, 2021
- A Zeileis, JC Fisher, K Hornik, R Ihaka, CD McWhite, P Murrell, R Stauffer, CO Wilke, “colorspace: A Toolbox for Manipulating and Assessing Colors and Palettes”, Journal of Statistical Software, 96:1, 2020
- K Drew, C Lee, RM Cox, V Dang, CC Devitt, CD McWhite, O Papoulas, RL Huizar, EM Marcotte, JB Wallingford, “A systematic, label-free method for identifying RNA-associated proteins in vivo provides insights into vertebrate ciliary beating machinery”, Developmental Biology 467:1-2, 2020
- CD McWhite, O Papoulas, K Drew, RM Cox, V June, OX Dong, T Kwon, C Wan, ML Salmi, SJ Roux, KS Browning, ZJ Chen, PC Ronald, EM Marcotte, “A pan-plant protein complex map reveals deep conservation and novel assemblies”, Cell 181:2, 2020
- CD DuPai, CD McWhite, CB Smith, R Garten, S Maurer-Stroh, CO Wilke, “Influenza passaging annotations: what they tell us and why we should listen”, Virus Evolution 5 (1), vez016, 2019
- W Zhao, B Bachhav, CD McWhite, L Segatori, “A yeast selection system for the detection of proteasomal activation”, Protein Engineering, Design and Selection 31 (11), 437–445, 2018
- AH Kachroo, JM Laurent, A Akhmetov, M Szilagyi-Jones, CD McWhite, A Zhao, EM Marcotte, “Systematic bacterialization of yeast genes identifies a near-universally swappable pathway”, eLife 6:e25093, 2017
- KS Drew, C Lee, RL Huizar, F Tu, B Borgeson, CD McWhite, Y Ma, JB Wallingford, EM Marcotte, “Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes”, Molecular Systems Biology 13:932, 2017
- BJ Liebeskind, CD McWhite, EM Marcotte, “Towards Consensus Gene Ages”, Genome Biology and Evolution 8 (6): 1812–1823, 2016
- CD McWhite, AG Meyer, CO Wilke, “Sequence amplification via cell passaging creates spurious signals of positive adaptation in influenza virus H3N2 hemagglutinin”, Virus Evolution 2 (2):vew026, 2016
- BJ Liebeskind, CD McWhite, EM Marcotte, “Applications of comparative evolution to human disease genetics”, Current Opinion in Genetics & Development 35, 16–24, 2015
- W Zhao, M Bonem, CD McWhite, JJ Silberg, L Segatori, “Sensitive detection of proteasomal activation using the Deg-On mammalian synthetic gene circuit”, Nature Communications 5, 2014
- A Demelash, P Rudrabhatla, HC Pant, X Wang, ND Amin, CD McWhite, X Naizhen, RI Linnoila, “Achaete-scute homologue-1 (ASH1) stimulates migration of lung cancer cells through Cdk5/p35 pathway”, Molecular Biology of the Cell 23 (15), 2856–2866, 2012
Grants and Scholarships
- 2025 – Arizona Big Ideas Challenge Award - “Summoning microbial allies to reduce nitrogen fertilizer dependency in modern agriculture” - Co-Principal Investigator ($250,000)
- 2020 – NIH F31 Award, “Proteomics of ciliopathy protein complexes”. Percentile: 1%, Impact points: 10 (maximum)
- 2020-2024 – Lewis-Sigler Scholar, Princeton University
- 2017 – Graduate School Summer Fellowship
- 2014-2015 – Spring/Fall Graduate Fellowship, UT Austin
- 2010-2014 – National Merit Scholarship
- 2010-2014 – Trustee’s Distinguished Scholarship, Rice University
- 2010-2012 – Century Scholarship for Research, Rice University
Academic Service
Student supervision
- Supervision of Doctoral Students (University of Arizona): Samuel Love, Robert Shaw
- Supervision of Master’s Student (University of Arizona): Ahmad Shamail
- Supervision of Undergraduate Students (Princeton University): Vivian Chen, Mark Castellano
Student Committees
- Robert Shaw, Samuel Love, Isabella Johnson, Harvey Ortiz
Refereeing
- I have reviewed articles for international journals including Nature, eLife, Nature Communications, Cell Systems, Journal of Molecular Evolution, Plant Physiology, The Plant Cell, Science Advances, PLOS ONE
Grant Panels
- 2025 – NSF/DBI Capacity Panel
- 2024 – NSF/BIO/DBI Innovation Panel
Awards
- 2025 – FIS3, rated eligible (“Idonea”), Macro-sector LS2 (final ranking: 6th position; top 4 funded)
- 2023 – Princeton President’s Award for Distinguished Teaching
- 2018 – Best Presentation, First International Plant Systems Biology Conference
- 2017 – Student Choice Award, UT Austin Natural Sciences Council’s Art in Science
- 2016 – Honorable Mention, NSF Graduate Research Fellowship Program
- 2016 – Visualizing Science Competition, 2nd Place, UT Austin College of Natural Sciences
- 2016 – Graduate Student Research Award, Society of Systematic Biologists
- Selected Artist, Evolution Art Exhibition, Art.Science.Gallery
- 2015 – Best Poster, UT Austin Institute for Cellular and Molecular Biology Retreat
- 2015 – Best Poster, Big Data in Biology Symposium, Center for Computational Biology and Bioinformatics, Austin, Texas
Published Scientific Software
- vsmsa: A new algorithm for Multiple Sequence Alignment using protein language models
- colorblindr: An R package to aid design of colorblind-interpretable figures
- colorspace: A Toolbox for Manipulating and Assessing Colors and Palettes
- cfmsflow: A Pipeline for analysis of Co-Fractionation MS experiments
Conferences and Seminars
- 2026 – Nobel Symposium, Umeå University, Sweden (upcoming)
- 2026 – Plant Biology 2026, Ottawa, Canada (upcoming)
- 2025 – SynBio Young Speaker Series, virtual
- 2025 – Human Technopole, Milan, Italy
- 2024 – Applied Math Colloquium, University of Arizona, USA
- 2024 – Quest for Orthologs Conference, Montreal, Canada
- 2024 – UNIST, Ulsan, South Korea
- 2024 – MIT
- 2023 – GMI Vienna
- 2023 – Cornell University, Department of Molecular Biology
- 2023 – University of Arizona, Department of Molecular Biology
- 2023 – Duke University, Cell Biology and Biostatistics Departments
- 2023 – Rice University, Biosciences Department
- 2022 – Mendel Symposium, GMI Vienna
- 2022 – Systems, Synthetic, and Physical Biology Seminar, Rice University
- 2022 – International Society for Computational Biology Webinar, Virtual
- 2022 – Western Photosynthesis Conference, virtual
- 2021 – Plant Cell Atlas, virtual
- 2020 – Polyploidy Webinar, virtual
- 2019 – Computing on Phenotypes, Iowa State University, Ames, IA
- 2018 – First International Plant Systems Biology Conference, Roscoff, France
Posters
- 2022 – US HUPO, Charleston, SC
- 2021 – US HUPO, virtual
- 2017 – The Society for Molecular Biology and Evolution, Austin, TX
- 2017 – Quest for Orthologs Meeting, Los Angeles, CA
- 2017 – Omics Approaches to Study the Proteome Keystone Conference, Breckenridge, CO
- 2016 – Plant Biology, Austin, TX
- 2016 – Evolution, Austin, TX
- 2015 – Quest for Orthologs Meeting, Barcelona, Spain
- 2015 – Big Data in Biology Symposium, Austin, TX
- 2013 – Rice Undergraduate Research Symposium
- 2012 – Rice Undergraduate Research Symposium
- 2011 – Rice Undergraduate Research Symposium
- 2011 – NIH Summer Research Symposium
Teaching Experience
- 2025 – Instructor, MCB 447/547: Big Data in Molecular Biology and Biomedicine, The University of Arizona, Fall
- 2023 – Instructor, QCB455/MOL455/COS455 Introduction to Genomics and Computational Molecular Biology, Princeton, Fall
- 2022 – Instructor, QCB455/MOL455/COS455 Introduction to Genomics and Computational Molecular Biology, Princeton, Fall
- 2021 – Instructor, QCB455/MOL455/COS455 Introduction to Genomics and Computational Molecular Biology, Princeton, Fall
- 2020 – Instructor, QCB455/MOL455/COS455 Introduction to Genomics and Computational Molecular Biology, Princeton, Fall
- 2019 – Teaching Assistant, Advanced Bash Scripting Short Course, UT Austin, Spring
- 2018 – Teaching Assistant, Intro to Next-Generation Sequencing Short Course, UT Austin, Spring
- 2017 – Co-leader, Peer-led Biocomputing Working Group: Python/Bash, UT Austin, Spring
- 2016 – Instructor (Data visualization in R), Peer-led Biostatistics Seminar, UT Austin, Fall
- 2016 – Teaching Assistant, BCH339N Systems Biology & Bioinformatics, UT Austin, Spring
Outreach
- Fall 2025 – Careers in Biology Talk to Evolutionary Biology Students
- Spring 2025 – Introduction to LLM-assisted coding
- Fall 2024 – Center for Recruitment of Mathematics Teachers Community Game Night
- Fall 2022 – Introduction to AlphaFold2
- Fall 2019 – Introduction to Data Visualization in R
- Spring 2018 – Introduction to Network Visualization, Austin R Users Group
- Spring 2018 – Speaker at BAHFest Texas, A festival of bad ad-hoc hypotheses
- Fall 2017 – Designing for Colorblindness, Austin R Users Group
Language
English, Italian (fluent)
