Work Experience

Knights of Columbus

02 June 2025 — 25 July 2025

This internship was an amazing hands-on learning experience. Not only did I do interesting work, but the projects I was given were useful and meaningful.

The first project I was given was to input and verify data in the Knights' yearly expense study. It was an interesting project because I got to comb through the Excel workbook and investigate how it was all put together. For example, some processes had their expenses broken down by department, and each department was weighted by how it spends its money. This involved matrix multiplication, which I was not expecting.
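
As a rough illustration of that kind of weighted allocation (with made-up categories, departments, and numbers, not the actual study), here is how the matrix multiplication plays out:

```python
import numpy as np

# Hypothetical example: three expense categories split across two departments.
# Each row of the weight matrix says how a category's spend is allocated.
weights = np.array([
    [0.70, 0.30],   # category A: 70% to dept 1, 30% to dept 2
    [0.20, 0.80],   # category B
    [0.50, 0.50],   # category C
])

expenses = np.array([10_000.0, 4_000.0, 6_000.0])  # total spend per category

# Matrix multiplication collapses category-level expenses into department totals.
dept_totals = expenses @ weights
print(dept_totals)  # -> [10800.  9200.]
```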

After that, my supervisors realized I was more competent at coding in Python than they expected, so I was given a new project: building a logistic regression model to predict whether a policyholder will decrement from our active lives pool by death. This project was a great challenge because my job was to reverse engineer and debug a script that was already trying to do that. For the sake of longevity and maintainability, I developed a Python class with multiple methods and dynamically instantiated attributes. This made the code more modular, so small changes and debugging became far easier. I also had to engineer some interesting features, such as exposure and duration. There were some interesting caveats, however. The study was done from a calendar year perspective, not a policy year perspective. That raises questions such as: what duration is a policy in if it was issued on, say, June 30, 2024 and the study year is 2025? Is it in its first year or its second? Or both? For the sake of the regression, we decided to make this feature continuous and simply do a weighted interpolation of the duration; for the example above, that would be 0.5*1 + 0.5*2 = 1.5 (see the sketch below).
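
Here is a minimal sketch of that interpolation, assuming a hypothetical helper and a simple days-based weighting; the production logic handled more cases than this:

```python
from datetime import date

def calendar_year_duration(issue_date: date, study_year: int) -> float:
    """Fractional policy duration for a calendar-year study.

    Hypothetical helper: weights the two policy durations that overlap the
    study year by the portion of the calendar year each one covers.
    """
    anniversary = date(study_year, issue_date.month, issue_date.day)
    year_start = date(study_year, 1, 1)
    days_in_year = (date(study_year + 1, 1, 1) - year_start).days

    # Completed policy years as of January 1 of the study year.
    duration_at_start = study_year - issue_date.year

    # Weight on the earlier duration = fraction of the year before the anniversary.
    w_early = (anniversary - year_start).days / days_in_year
    return w_early * duration_at_start + (1 - w_early) * (duration_at_start + 1)

# A June 30, 2024 issue observed during calendar year 2025:
print(round(calendar_year_duration(date(2024, 6, 30), 2025), 2))  # ~0.5*1 + 0.5*2 = 1.5
```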

That was the hard part. The easy part was implementing the logistic regression model from scikit-learn, and what was cool about that was that I got to brush up on my knowledge: my supervisor gave me a good resource to use (IBM's website) and helped explain the different parts of the regression. The coolest part, in my opinion, was the interaction between our objective function (modified so that the probabilities are exponentiated to the record's total exposure for the year, per best actuarial practice) and the minimization function. It was a really cool thing to code how those two functions interacted.
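
Since scikit-learn's LogisticRegression doesn't take an exposure exponent like that out of the box, here is a rough sketch of the kind of custom objective/minimizer pairing described, using scipy.optimize.minimize and made-up data; the actual functional form and optimizer used at work may have differed:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(beta, X, deaths, exposure):
    """Exposure-adjusted negative log-likelihood for a logistic-style model.

    Hypothetical sketch: the annual survival probability is raised to the
    record's exposure (fraction of the year observed) before entering the
    likelihood.
    """
    q = 1.0 / (1.0 + np.exp(-X @ beta))      # annual mortality probability
    p_survive = (1.0 - q) ** exposure        # survival prob. over the exposed period
    p_die = 1.0 - p_survive                  # death prob. over the exposed period
    eps = 1e-12                              # guard against log(0)
    ll = deaths * np.log(p_die + eps) + (1 - deaths) * np.log(p_survive + eps)
    return -ll.sum()

# Tiny synthetic example: intercept + duration feature, partial-year exposures.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.uniform(1, 10, 500)])
exposure = rng.uniform(0.25, 1.0, 500)
true_q = 1 / (1 + np.exp(-(X @ np.array([-4.0, 0.2]))))
deaths = rng.binomial(1, 1 - (1 - true_q) ** exposure)

result = minimize(neg_log_likelihood, x0=np.zeros(2), args=(X, deaths, exposure))
print(result.x)  # fitted coefficients, roughly near [-4.0, 0.2] (noisy on a tiny sample)
```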

Another really cool project I had was to automate a process that had been identified as having many vectors for user error. Before the automation it was done in Microsoft Access, so I had to go into Access, reverse engineer the SQL queries, and rewrite them in Python. What made this project far more interesting was that all of our input data was in position-delimited files. This required me to dig through the Access process to find where the read positions were actually hard-coded. Then I had to pull that data into Python, clean it, and export it as a list of tuples for pandas to use as the column specification when reading a fixed-width file (see the sketch below). After that, the coding was easy. Easy until the verification process, that is. I had a suspicion that my output and the expected output were the same, just sorted in a different order, which meant my verification methods (pandas .compare() and .equals()) always returned False. There were also some records that were near-duplicates; I suspect that whoever was working with the policyholder mis-entered something and simply made a new record instead. So I had records which differed by potentially only one column (out of 40+). That meant I had to sort each dataframe by every. single. column. Which worked, once I made sure that all the datatypes matched and were correctly typecast.
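
Here is a toy sketch of both halves, with a made-up record layout and field names rather than the real ones: pandas.read_fwf taking the recovered (start, end) positions as colspecs, and a sort-everything-then-compare verification:

```python
import io
import pandas as pd

# Hypothetical layout recovered from the Access process: (start, end) positions
# for each field, half-open intervals as pandas expects.
colspecs = [(0, 4), (4, 14), (14, 22)]
names = ["policy_id", "name", "issue_date"]

raw = io.StringIO(
    "0123John Smith20240630\n"
    "0456Jane Doe  20230115\n"
)

df = pd.read_fwf(raw, colspecs=colspecs, names=names, dtype=str)

def matches(actual: pd.DataFrame, expected: pd.DataFrame) -> bool:
    """Sort both frames by every column so row order can't hide a match.

    Types are normalized first, since .equals() fails on dtype mismatches.
    """
    actual = actual.astype(str).sort_values(list(actual.columns)).reset_index(drop=True)
    expected = expected.astype(str).sort_values(list(expected.columns)).reset_index(drop=True)
    return actual.equals(expected)

print(matches(df, df.iloc[::-1]))  # True: same rows, different original order
```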

A more mathematical project I had was to work with our new and old “Life Insurance Margin Adequacy Test” worksheets. The auditors told my supervisor that the worksheet was too convoluted and needed to be simplified, so a simplified sheet was made. The only problem was that the simplified sheet produced different numbers, when supposedly they were supposed to be the same. The suspected reason was that the old sheet converted rates from spot rates to forward rates, and was convoluted in that sense. I found that the inputs were actually different. Both the new and old worksheets pulled the risk-free rates from the same financial data service, but the old method only pulled rates at specific tenors and interpolated the rest, because our spread data was only available for those specific years.
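
For reference, this is roughly what a spot-to-forward conversion looks like, with made-up rates and annual compounding assumed; the worksheet's exact convention may have differed:

```python
# Hypothetical risk-free spot rates by tenor (years), not the actual inputs.
spot = {1: 0.030, 2: 0.032, 3: 0.035}

def one_year_forward(spot_rates: dict, t: int) -> float:
    """One-year forward rate from year t to t+1, from accumulation factors:
    (1 + s_{t+1})^(t+1) / (1 + s_t)^t = 1 + f_{t, t+1}
    """
    return (1 + spot_rates[t + 1]) ** (t + 1) / (1 + spot_rates[t]) ** t - 1

for t in (1, 2):
    print(t, round(one_year_forward(spot, t), 5))
```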