Projects

Recent and Current Projects


Removing Bias in Sequence Models of Protein Fitness, Doctoral Thesis Topic

Unsupervised sequence models for protein fitness have emerged as powerful tools for designing therapeutics and industrial enzymes, yet they are strongly biased towards designs that are close to their training data. This hinders their ability to generate functional sequences that are far from natural sequences, as is often needed to design new functions. To address this problem, we introduce a de-biasing approach that enables the comparison of protein sequences across mutational depths, overcoming the sequence similarity bias in natural sequence models. We demonstrate that our method improves the relative predictions of natural sequence models for experimentally measured variant functions across mutational depths. Using case studies of proteins with very low functional percentages further away from the wild type, we demonstrate that our method improves the recovery of top-performing variants in these sparsely functional regimes. Our method is applicable to any unsupervised fitness prediction model, for any function of any protein, and can thus be incorporated into any computational protein design pipeline. These studies could lead to more efficient and cost-effective computational methods for designing diverse functional proteins and could inform experimental library design to best take advantage of machine learning capabilities. Pre-print

Deep Learning Prediction of Enzyme Optimum pH, Doctoral Research Topic

Compiled a database of 200+ measurements of point-mutation effects on pH tolerance across 50 enzymes.
Developed large language modeling methods to infer biological drivers of pH tolerance in enzymes. Major contributor. Pre-print

Protein Design using Structure-based Residue Preferences, Doctoral Research Topic

An unsupervised design approach that learns residue mutation preferences from local structural dependencies. Major contributor.
Co-author paper submitted to Nature Structural & Molecular Biology June 2023. Pre-print

An in silico method to assess antibody fragment polyreactivity, Graduate Research Topic

Used AWS servers to host an online machine learning model to predict nanobody poly-specificity and visualize sequence biometrics. Users can visit http://18.224.60.30:3000/ to input nanobody sequences and get predictions. Work published in Nature Communications: Harvey, E.P., Shin, JE., Skiba, M.A. et al. An in silico method to assess antibody fragment polyreactivity. Nat Commun 13, 7554 (2022). https://doi.org/10.1038/s41467-022-35276-4

Learning PET hydrolase activity from sparse experiment data, Graduate Research Topic

Developed machine learning methods to learn and predict from sparse, disparate enzyme activity data. Work in collaboration with the National Renewable Energy Laboratory (NREL) to develop plastic-eating enzymes.
First co-author paper will be submitted in July 2023.

Past Projects


Vertical Resolution Requirements to Simulate Transpacific Ozone Pollution, Graduate Research Topic

Observations show that chemical plumes injected to high altitudes (the free troposphere) by convection, volcanoes, or stratospheric intrusions can retain their identity as well-defined vertical layers for a week or more as they are transported on intercontinental scales. Global atmospheric models fail to reproduce these persistent plumes due to rapid numerical diffusion. Under realistic shear flow, plumes filament down to the grid scale, where any advection scheme collapses to first order and rapid numerical dissipation occurs. Eastham and Jacob (2017) found that the primary limitation to resolving the plumes was vertical resolution, as general circulation models (GCMs) have prioritized increasing horizontal resolution over vertical resolution. The new 132-level (L132) GEOS-5 model has the potential to improve the transport of transcontinental plumes and to better represent intercontinental influences on non-linear surface ozone chemistry. Improvement in transport is evaluated via comparisons between inert tracer plumes simulated in L132 and in the 72-level GEOS-5 model (L72). I focus on transpacific Asian pollution plumes transported to North America, for which we have extensive complementary aircraft, satellite, surface, and ozonesonde data, to evaluate the L132’s quantification of Asian pollution influence on western US surface ozone.
I primarily identified GEOS-5 model bugs and physical inconsistencies, and collaborated with NASA scientists to fix them and ensure valid model performance and output. To do this, I developed an understanding of the FORTRAN 77 and 90 model framework and, in particular, familiarized myself with the code structure of the chemistry and dynamics modules.
Languages: Python (for analysis), FORTRAN
Symposium Presentation on Research

Dog Prediction Neural Networks, Fall 2018 AC209a

We optimized and compared the ability of ResNets, convolutional neural networks, and artificial neural networks to predict the breeds of 20,000+ purebred dogs. We used Keras to implement and test our networks.
Language: Python
Project Website

Ozone Laminae Prediction Algorithm, Fall 2017 EPS236

I developed an algorithm to detect ozone laminae off the coast of Northern California, using data from Trinidad Head, CA ozonesondes. The algorithm was able to filter out high-frequency noise, define the free troposphere, and recognize high ozone peaks that fit the criteria for free tropospheric ozone laminae.
Language: R
Presentation Download
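
The three steps described above (noise filtering, restricting to the free troposphere, and peak recognition) can be illustrated with a minimal Python sketch. This is not the project's R code: the function name `detect_laminae`, the moving-average filter, the fixed 3-10 km altitude band, and the 20 ppb threshold are all hypothetical choices made only for illustration.

```python
import numpy as np

def detect_laminae(alt_km, ozone_ppb, window=5, ft_bounds_km=(3.0, 10.0), excess_ppb=20.0):
    """Return altitudes (km) of candidate ozone laminae in one sounding.

    All defaults are illustrative assumptions, not the project's criteria.
    """
    alt_km = np.asarray(alt_km, dtype=float)
    ozone_ppb = np.asarray(ozone_ppb, dtype=float)

    # Step 1: suppress high-frequency noise with a centered moving average.
    # The window must be odd so the output has the same length as the input.
    half = window // 2
    padded = np.pad(ozone_ppb, half, mode="edge")
    smooth = np.convolve(padded, np.ones(window) / window, mode="valid")

    # Step 2: restrict to an assumed free-troposphere altitude band and
    # take the median there as the background ozone level.
    in_ft = (alt_km >= ft_bounds_km[0]) & (alt_km <= ft_bounds_km[1])
    background = np.median(smooth[in_ft])

    # Step 3: keep local maxima inside the band that exceed the background
    # by at least excess_ppb.
    is_peak = np.zeros(smooth.shape, dtype=bool)
    is_peak[1:-1] = (smooth[1:-1] > smooth[:-2]) & (smooth[1:-1] >= smooth[2:])
    keep = is_peak & in_ft & (smooth - background >= excess_ppb)
    return alt_km[keep]

# Synthetic sounding: 50 ppb background with one enhanced layer near 6 km.
alt = np.arange(0.0, 15.0, 0.1)
profile = 50.0 + 40.0 * np.exp(-0.5 * ((alt - 6.0) / 0.3) ** 2)
laminae = detect_laminae(alt, profile)
```

A fixed altitude band is the simplest stand-in for "defining the free troposphere"; a data-driven definition would replace step 2 without changing the other two steps.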

Analysis of Advection Schemes for Application in a Turbulent Propeller Wake, Fall 2018 AM205

We coded and tested three advection schemes: Essentially Non-Oscillatory (ENO), Superbee, and Monotonic Upstream-centered Scheme for Conservation Laws (MUSCL), using standard 1-D and 2-D test cases. We applied the lowest-error schemes to a steady-state velocity field produced by a weather balloon propeller in the stratosphere.
Language: Python, MATLAB
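
As an illustration of the kind of 1-D test involved, here is a minimal sketch of linear advection on a periodic grid using the Superbee flux limiter, with first-order upwind as a baseline. It is not the project code: the function names, the square-wave test case, and the constant positive velocity are assumptions, and ENO and MUSCL are not shown.

```python
import numpy as np

def superbee(r):
    """Superbee flux limiter: max(0, min(2r, 1), min(r, 2))."""
    return np.maximum(0.0, np.maximum(np.minimum(2.0 * r, 1.0), np.minimum(r, 2.0)))

def advect(u0, courant, n_steps, limiter=None):
    """Advect u0 on a periodic 1-D grid with constant positive velocity.

    courant = a * dt / dx and must satisfy 0 < courant <= 1.
    limiter=None gives first-order upwind; otherwise the scheme is
    flux-limited Lax-Wendroff using the given limiter.
    """
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(n_steps):
        du = np.roll(u, -1) - u      # u[i+1] - u[i]
        du_up = u - np.roll(u, 1)    # u[i] - u[i-1]
        if limiter is None:
            phi = np.zeros_like(u)
        else:
            # Smoothness ratio r_i = (u_i - u_{i-1}) / (u_{i+1} - u_i),
            # set to 0 where the denominator vanishes.
            safe = np.where(du == 0.0, 1.0, du)
            r = np.where(du == 0.0, 0.0, du_up / safe)
            phi = limiter(r)
        # Flux at face i+1/2 (divided by the velocity): upwind value plus
        # a limited anti-diffusive correction.
        flux = u + 0.5 * (1.0 - courant) * phi * du
        u = u - courant * (flux - np.roll(flux, 1))
    return u

def square_wave(n_cells=100):
    """Square pulse on the unit interval, a common advection test profile."""
    x = (np.arange(n_cells) + 0.5) / n_cells
    return np.where((x > 0.25) & (x < 0.5), 1.0, 0.0)

# One full revolution: 200 steps at Courant number 0.5 on 100 cells.
u0 = square_wave(100)
u_upwind = advect(u0, 0.5, 200)
u_superbee = advect(u0, 0.5, 200, limiter=superbee)
```

Comparing `np.abs(u - u0).sum()` for the two results after one revolution gives a simple L1 error measure; the limited scheme keeps the pulse edges much sharper than first-order upwind, which smears them through numerical diffusion.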

Verification of Goldbach’s Conjecture, Spring 2018 CS205

We designed a simple algorithm in C for verifying Goldbach's conjecture and developed several parallel implementations of the code to identify the best strategies for tackling the problem as integer size increases. We tested the following forms of parallelism: OpenMP shared-memory parallelism, MPI distributed-memory parallelism, hybrid MPI-OpenMP parallelism, and OpenACC GPU-accelerated computing.
Language: C
Project Website
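
For readers unfamiliar with the problem, the following is a minimal serial sketch of one way to verify the conjecture up to a bound. It is written in Python for brevity rather than C, and it is not the project's algorithm: the sieve-based approach and the function names are assumptions for illustration.

```python
def prime_sieve(limit):
    """Sieve of Eratosthenes: is_prime[k] is 1 if k is prime, else 0 (limit >= 1)."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Cross out multiples of p starting at p*p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return is_prime

def goldbach_partition(n, is_prime):
    """Return (p, n - p) with the smallest prime p such that n - p is also prime, or None."""
    for p in range(2, n // 2 + 1):
        if is_prime[p] and is_prime[n - p]:
            return p, n - p
    return None

def verify_goldbach(limit):
    """Check every even n in [4, limit]; return the first counterexample or None."""
    is_prime = prime_sieve(limit)
    for n in range(4, limit + 1, 2):
        if goldbach_partition(n, is_prime) is None:
            return n
    return None

counterexample = verify_goldbach(10_000)
```

Each even `n` is checked independently once the sieve is built, so the outer loop in `verify_goldbach` is the natural place to split work across threads, processes, or GPU threads.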

Investigating Extracellular Electron Transport in Ammonia Oxidizing Bacteria, Undergraduate Research Project

Anaerobic ammonia-oxidizing (Anammox) bacteria can anaerobically capture aqueous carbon dioxide and convert nitrite and ammonia to dinitrogen. Their slow doubling time of 7-22 days, complicated symbiosis with other microbes, and resistance to isolation remain major hurdles to their use for both carbon capture and energy-efficient wastewater treatment. Research on Anammox focuses on optimizing growth conditions by understanding their carbon fixation pathways, their population dynamics, and the exchange of materials between members of the Anammox consortium. Researchers have examined how the growth and activity of Anammox vary with different inorganic or organic electron donors. However, the need for added chemical electron donors could instead be met by supplying electrons directly via electric current. Extracellular electron transfer (EET) through cellular electron carriers, notably c-type cytochrome, is used to transfer electrons between cells and toward the surrounding environment. Specifically, extracellular c-type cytochrome could allow the Anammox cell to use the electric current supplied by electrodes in cellular metabolic processes. Extracellular electron transport is attributed to extracellular membrane electron carriers such as quinones (notably anthraquinone-2,6-disulfonate (AQDS)), phenazines, flavins (Kotloski & Gralnick, 2013), and heme proteins (notably c-type cytochrome (Rosenbaum, Aulenta, Villano, & Angenent, 2011)). Furthermore, these surface electron carriers facilitate the transport of electrons from the surface of the cell to the internal metabolic processes within the cell (Thrash & Coates, 2008). Examination of the structure of the Anammox extracellular matrix reveals an abundance of c-type cytochrome. These proteins form the majority of the extracellular complex, giving the Anammox aggregate its vermilion color (Kartal et al., 2011).
C-type cytochrome, an electron transport protein in the electron transport chain, has previously been demonstrated to be a feasible extracellular electron carrier in Geobacter, Desulfovibrio, and various other denitrifiers (Thrash & Coates, 2008).
Figure: anammox process
According to the figure above (de Almeida, Wessels et al. 2016), c-type cytochrome redox is strongly correlated with hydrazine dehydrogenase and hydrazine synthase. It may be presumed that the addition of electrons to c-type cytochrome will result in a negative charge accumulation in the anammoxosome lumen and thus a stronger proton-motive force to generate more ATP. Previous studies involving extracellular electron transport via electrodes to biofilm bacteria revealed an increase in population growth and cellular metabolites, correlated with discrete current measurements ranging from 10 to 100 mA (Thrash & Coates, 2008). I applied a series of different currents to an Anammox bioreactor for two purposes: 1) to investigate whether adding current to Anammox bacteria could increase their metabolic activity and thus the nitrogen removal rate, and 2) if so, to determine the optimal current for their growth.


Indoor Air Quality, Undergraduate Research Project

I determined the volumetric flux above various types of lightbulbs to explore the relationship between flow rate and power consumption. I designed and built a wood/plastic sheet greenhouse structure to prevent outside influences from introducing artifacts into my data. This work was presented at the Spring 2016 UC Berkeley Student Undergraduate Research Forum.
Language: MATLAB
Poster Presentation

Biofuels Technology Club, engineer, Spring 2017

The goal of the Biofuels Technology Club is to produce biodiesel from cooking oil from the UC Berkeley Dining Commons. The ultimate goal is to first optimise and develop a laboratory-scale process and then begin scaling up the project to a pilot plant. Each process (the titration removal of free fatty acids, transesterification, and water washing) needed a team of engineers who were detail-oriented, self-driven, and able to work well with others. I worked with a team of students on the water washing process: documenting how much water we were using to wash out the glycerol and communicating with the downstream quality-testing process to ensure that our biodiesel product met standards.
Project Website
Contact Info
110G Pierce Hall
Cambridge, Massachusetts 02138
ayshaw "at" g "dot" harvard "dot" edu
designed by Ada Shaw 2016. Email ashaw3895@gmail.com for web inquiries.