Recent and Current Projects
    Removing Bias in Sequence Models of Protein Fitness, Doctoral Thesis Topic
        Unsupervised sequence models of protein fitness have emerged as powerful tools for designing therapeutics and industrial enzymes, yet they are strongly biased towards designs that are close to their training data. This hinders their ability to generate functional sequences that are far from natural sequences, as is often desired when designing new functions. To address this problem, we introduce a de-biasing approach that enables the comparison of protein sequences across mutational depths, overcoming the sequence similarity bias in natural sequence models. We demonstrate that our method improves the relative predictions of natural sequence models for experimentally measured variant functions across mutational depths. Using case studies of proteins with very low functional percentages far from the wild type, we demonstrate that our method improves the recovery of top-performing variants in these sparsely functional regimes. Our method is applicable to any unsupervised fitness prediction model and to any function of any protein, and can thus easily be incorporated into any computational protein design pipeline. This work has the potential to yield more efficient and cost-effective computational methods for designing diverse functional proteins and to inform experimental library design so that it takes full advantage of machine learning capabilities.
        
 Pre-print
     
    Deep Learning Prediction of Enzyme Optimum pH, Doctoral Research Topic
        Compiled database of 200+ measurements of point-mutation effects on pH tolerance across 50 enzymes 
        Developed large language modeling methods to infer biological drivers of pH tolerance in enzymes. Major contributor. 
       
Pre-print 
     
               
    Protein Design using Structure-based Residue Preferences, Doctoral Research Topic
    An unsupervised design approach that learns residue mutation preferences from local structural dependencies. Major contributor. 
    Co-author paper submitted to Nature Structural & Molecular Biology June 2023. 
        
Pre-print
     
    An in silico method to assess antibody fragment polyreactivity, Graduate Research Topic
        Used AWS servers to host an online machine learning model to predict nanobody poly-specificity and visualize sequence biometrics. Users can visit 
http://18.224.60.30:3000/ to input nanobody sequences and get predictions. Work published in Nature Communications:
        Harvey, E.P., Shin, JE., Skiba, M.A. et al. An in silico method to assess antibody fragment polyreactivity. Nat Commun 13, 7554 (2022). 
https://doi.org/10.1038/s41467-022-35276-4
     
    Learning PET hydrolase activity from sparse experiment data, Graduate Research Topic
        Developed machine learning methods to learn and predict from sparse, disparate enzyme activity data. Work in collaboration with the National Renewable Energy Laboratory (NREL) to develop plastic-eating enzymes.
         First co-author paper will be submitted in July 2023.
    
    Past Projects
    Vertical Resolution Requirements to Simulate Transpacific Ozone Pollution, Graduate Research Topic
        Observations show that chemical plumes injected to high altitudes (the free troposphere) by
        convection, volcanoes, or stratospheric intrusions can retain their identity as well-defined
        vertical layers for a week or more as they are transported on intercontinental scales. Global
        atmospheric models fail to reproduce these persistent plumes due to rapid numerical diffusion.
        Under realistic shear flow, plumes filament down to the grid scale where any advection scheme
        collapses to first order and rapid numerical dissipation occurs. Eastham and Jacob (2017) found
        that the primary limitation to resolving the plumes was vertical resolution, as general
        circulation models (GCMs) have prioritized increasing horizontal resolution over vertical
        resolution. The new 132-level (L132) GEOS-5 model has the potential to improve the transport of transcontinental plumes
        and to better represent intercontinental influences on non-linear surface ozone chemistry. Improvement in transport is
        evaluated via comparisons between inert tracer plumes simulated in L132 and the 72-level
        GEOS-5 model (L72). I focus on transpacific Asian pollution plumes transported to North
        America, for which we have extensive complementary aircraft, satellite, surface, and
        ozonesonde measurements, to evaluate L132's quantification of Asian pollution influence on
        western US surface ozone.
        I primarily identified GEOS-5 model bugs and physical inconsistencies, and collaborated with NASA scientists to fix them and ensure valid model
        performance and output. To do this, I developed an understanding of the FORTRAN 77 and 90 model framework and, in particular,
        familiarized myself with the code structure of the chemistry and dynamics modules.
        Languages: Python (for analysis), FORTRAN
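The numerical diffusion described above can be illustrated with a minimal sketch. It assumes a 1-D periodic domain, a constant positive wind, and a first-order upwind scheme; the grid size, plume width, and Courant number are illustrative choices and are not taken from GEOS-5.

```python
def upwind_step(q, courant):
    """Advance a periodic 1-D tracer profile one step with first-order upwind.

    Assumes a constant positive wind; courant = u * dt / dx must lie in [0, 1].
    """
    n = len(q)
    # For i = 0, q[i - 1] wraps to the last cell, which gives periodic boundaries.
    return [q[i] - courant * (q[i] - q[i - 1]) for i in range(n)]


def advect(q, courant, steps):
    """Apply upwind_step repeatedly and return the final profile."""
    for _ in range(steps):
        q = upwind_step(q, courant)
    return q


# A thin "plume" three cells wide in a 100-cell periodic domain (illustrative).
plume = [0.0] * 100
for i in range(10, 13):
    plume[i] = 1.0

final = advect(plume, courant=0.5, steps=100)
print(f"initial peak = {max(plume):.2f}, final peak = {max(final):.2f}")
print(f"mass before = {sum(plume):.2f}, mass after = {sum(final):.2f}")
```

Total tracer mass is conserved, but the peak is smeared over many cells, which is the behaviour that erases thin layers once they approach the grid scale.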
        
Symposium Presentation on Research
     
     Dog Prediction Neural Networks, Fall 2018 AC209a
        We optimized and compared the ability of ResNets, convolutional neural networks, and artificial neural networks
        to predict the breeds of 20,000+ purebred dogs. We used Keras to implement and test
        our networks.
        Language: Python
        
Project Website
       
    Ozone Laminae Prediction Algorithm, Fall 2017 EPS236
               I developed an algorithm to detect ozone laminae off the coast of Northern California, using data from Trinidad
               Head, CA ozonesondes. The algorithm filters out high-frequency noise, defines the free troposphere, and
               recognizes high ozone peaks that fit the criteria for free-tropospheric ozone laminae.
               Language: R
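The three steps above (smoothing, selecting the free troposphere, and flagging peaks) can be sketched as follows. This is a hypothetical Python illustration, not the original R code: the moving-average filter, the 2-10 km altitude bounds, the median background, and the 20 ppb excess threshold are all assumptions, and the profile is synthetic.

```python
def moving_average(values, window):
    """Smooth a profile with a centered moving average (window should be odd)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def find_laminae(altitude_km, ozone_ppb, window=3,
                 ft_bounds=(2.0, 10.0), min_excess_ppb=20.0):
    """Return (altitude, smoothed ozone) pairs for candidate laminae.

    All thresholds are illustrative assumptions, not the project's criteria.
    """
    smooth = moving_average(ozone_ppb, window)
    # Keep only levels inside the assumed free-troposphere altitude range.
    ft = [i for i, z in enumerate(altitude_km)
          if ft_bounds[0] <= z <= ft_bounds[1]]
    if not ft:
        return []
    # Use the median smoothed free-tropospheric value as the background.
    ft_values = sorted(smooth[i] for i in ft)
    background = ft_values[len(ft_values) // 2]
    peaks = []
    for i in ft:
        if i == 0 or i == len(smooth) - 1:
            continue  # a local maximum needs a neighbor on both sides
        is_local_max = smooth[i] > smooth[i - 1] and smooth[i] >= smooth[i + 1]
        if is_local_max and smooth[i] - background >= min_excess_ppb:
            peaks.append((altitude_km[i], smooth[i]))
    return peaks


# Synthetic profile: ~50 ppb background with small alternating noise
# and one enhanced layer centered at 5 km.
altitude = [0.5 * i for i in range(25)]
ozone = [50.0 + (2.0 if i % 2 else -2.0) for i in range(25)]
ozone[9], ozone[10], ozone[11] = 80.0, 110.0, 80.0

peaks = find_laminae(altitude, ozone)
print(peaks)
```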
               
                   Presentation Download
     
    Analysis of Advection Schemes for Application in a Turbulent Propeller Wake, Fall 2018 AM205
        We coded and tested three advection schemes: Essentially Non-Oscillatory (ENO), Superbee, and
        Monotonic Upstream-centered Scheme for Conservation Laws (MUSCL), using standard 1-D and 2-D test cases. We applied the
        lowest-error schemes to a steady-state velocity field produced by a weather balloon propeller in the stratosphere.
        
        Languages: Python, MATLAB
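As an illustration of one of the schemes named above, here is a minimal sketch of a flux-limited 1-D advection step using the Superbee limiter, compared against plain first-order upwind on a square wave. It assumes a constant positive velocity and periodic boundaries; the function names and test setup are hypothetical and are not the project's code.

```python
def superbee(r):
    """Superbee flux limiter: max(0, min(2r, 1), min(r, 2))."""
    return max(0.0, min(2.0 * r, 1.0), min(r, 2.0))


def no_limiter(r):
    """A limiter of zero reduces the scheme to first-order upwind."""
    return 0.0


def limited_step(q, courant, limiter):
    """One flux-limited step for constant positive velocity, periodic domain."""
    n = len(q)
    flux = [0.0] * n  # flux[i] is the normalized flux through face i + 1/2
    for i in range(n):
        dq_down = q[(i + 1) % n] - q[i]   # downwind difference
        dq_up = q[i] - q[i - 1]           # upwind difference (wraps at i = 0)
        # When dq_down is zero the correction term vanishes, so skip the ratio.
        phi = limiter(dq_up / dq_down) if dq_down != 0.0 else 0.0
        flux[i] = q[i] + 0.5 * (1.0 - courant) * phi * dq_down
    return [q[i] - courant * (flux[i] - flux[i - 1]) for i in range(n)]


def run(q, courant, steps, limiter):
    for _ in range(steps):
        q = limited_step(q, courant, limiter)
    return q


def l1_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Standard 1-D test: advect a square wave and compare with the exact shift.
square = [1.0 if 40 <= i < 60 else 0.0 for i in range(100)]
exact = [square[(i - 50) % 100] for i in range(100)]  # 100 steps * 0.5 = 50 cells

upwind_result = run(square, 0.5, 100, no_limiter)
superbee_result = run(square, 0.5, 100, superbee)
print("upwind L1 error:  ", l1_error(upwind_result, exact))
print("superbee L1 error:", l1_error(superbee_result, exact))
```

The limiter adds a second-order correction where the profile is smooth and falls back towards upwind near extrema, which keeps the square wave sharper without introducing new oscillations.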
    
    Verification of Goldbach’s Conjecture, Spring 2018 CS205
        We designed a simple algorithm in C for verifying Goldbach's conjecture and developed several parallel 
        implementations of the code to identify the best strategies for tackling the problem as integer size increases. 
        We tested the following forms of parallelism: OpenMP shared-memory parallelism, MPI distributed-memory parallelism,
        hybrid MPI-OpenMP parallelism, and OpenACC GPU-accelerated computing.
        Language: C
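A minimal serial sketch of this kind of verification is shown below, written in Python for brevity rather than the project's C. The sieve-based approach, the function names, and the chunk size are assumptions for illustration. Each chunk of even numbers is checked independently, which is the property that shared-memory or distributed-memory parallelism can exploit.

```python
def prime_sieve(limit):
    """Sieve of Eratosthenes: is_prime[k] is True when k is prime, k <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return is_prime


def goldbach_pair(n, is_prime):
    """Return the pair (p, n - p) of primes with the smallest p, or None."""
    for p in range(2, n // 2 + 1):
        if is_prime[p] and is_prime[n - p]:
            return p, n - p
    return None


def verify_range(start, stop, is_prime):
    """Return every even n in [start, stop), n >= 4, that has no prime pair."""
    first_even = max(4, start + (start % 2))
    return [n for n in range(first_even, stop, 2)
            if goldbach_pair(n, is_prime) is None]


limit = 10_000
chunk = 1_000  # illustrative chunk size; each chunk could go to one worker
is_prime = prime_sieve(limit)

failures = []
for lo in range(4, limit + 1, chunk):
    failures.extend(verify_range(lo, min(lo + chunk, limit + 1), is_prime))

print("counterexamples found:", failures)
print("pair for 10:", goldbach_pair(10, is_prime))
```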
        
Project Website 
     
    
             Investigating Extracellular Electron Transport in Ammonia Oxidizing Bacteria, Undergraduate Research Project
             Anaerobic ammonia-oxidizing (Anammox) bacteria can anaerobically capture aqueous carbon dioxide and convert nitrite and
             ammonia to dinitrogen.
             Their slow doubling time of 7-22 days, complicated symbiosis with other microbes, and inability to be isolated
             remain major hurdles to their use for both carbon capture and energy-efficient wastewater treatment.
             Scientific research on Anammox focuses on optimizing growth conditions by understanding
             the carbon fixation pathways, their population dynamics, and the exchange of materials between members of the Anammox
             consortium.
             
             Researchers have examined the variation of growth and activity of Anammox with different inorganic or organic 
             electron donors. 
             However, the need for the addition of chemical electron donors could be satisfied by supplying
             electrons directly via current. 
             Extracellular electron transfer (EET) through cellular electron carriers, 
             notably c-type cytochrome, is used for transfer of electrons between cells and toward the surrounding environment.
             Specifically, extracellular c-type cytochrome could allow the Anammox cell to utilize the electric current 
             supplied by electrodes in cellular metabolic processes.
             Extracellular electron transport is attributed to extracellular membrane electron carriers such as quinones
             (notably anthraquinone-2,6-disulfonate (AQDS)), phenazines, flavins (Kotloski & Gralnick, 2013), and heme
             proteins (notably c-type cytochrome (Rosenbaum, Aulenta, Villano, & Angenent, 2011)).
            
             Furthermore, these surface electron carriers facilitate the transport of electrons from the surface of the cell to
             the internal metabolic processes within the cell (Thrash & Coates, 2008).
                
             Examination of the structure of the Anammox extracellular matrix reveals an abundance of c-type cytochromes, which form the majority of the
             extracellular complex and give the Anammox aggregate its vermilion color (Kartal et al., 2011).
             
             C-type cytochrome, an electron transport protein in the electron transport chain, has previously been
             demonstrated to be a feasible extracellular electron carrier in Geobacter, Desulfovibrio, and various other
             denitrifiers (Thrash & Coates, 2008).
             
             

             
             According to the figure above (de Almeida, Wessels et al. 2016), c-type cytochrome redox is strongly correlated with hydrazine 
             dehydrogenase and hydrazine synthase. 
                
             It may be presumed that the addition of electrons to c-type cytochrome will result in a negative charge 
             accumulation in the Anammoxosome lumen and result in a stronger proton-motive force to generate more ATP.
             Previous studies involving extracellular electron transport via electrodes to biofilm bacteria revealed an
             increase in population growth and cellular metabolites, which is correlated with discrete current measurements
             spanning 10-100 mA (Thrash & Coates, 2008).
             
             I applied a series of different currents to an Anammox bioreactor to answer two questions:
             1) Does adding current to Anammox bacteria increase their metabolic activity and thus the
             nitrogen removal rate? 2) If so, what is the optimal current for their growth?
        
 
    Indoor Air Quality, Undergraduate Research Project
        I determined the volumetric flux above various types of lightbulbs to explore the relationship between flow rate and
        power consumption. I designed and built a wood/plastic sheet greenhouse structure to prevent outside influence from 
        introducing artifacts in my data. This work was presented at the Spring 2016 UC Berkeley Student Undergraduate Research
        Forum.
        Language: MATLAB
        
Poster Presentation
       
    Biofuels Technology Club, engineer, Spring 2017
        The goal of the Biofuels Technology Club is to produce biodiesel from cooking oil from the UC Berkeley Dining Commons.
        The ultimate goal is to first develop and optimize a laboratory-scale process and then begin
        scaling up the project to a pilot plant.
        Each process (the titration removal of free fatty acids, transesterification, and water washing) needed a team of engineers
        who were detail-oriented, self-driven, and able to work well with others.
        I worked with a team of students on the water washing process: documenting how much water we were using to wash out the
        glycerol and communicating with the downstream quality-testing process to ensure that our biodiesel product met standards.
        
        
Project Website