
Part three of Murad's Computer Science 101 series.
This article is part of a series on computer science fundamentals. If you’re new here, you should probably catch up with previous pieces before reading on.
In the previous post, we learned that inserting values into a Linked List is much faster than doing the same thing with an Array. But how much faster? Worry not, the computer science nerds already have just the thing for you, and it's called Big-O notation: O(n).
So what do we mean when we say that a mathematical function f(n) is O(g(n))?
To try and put it simply, it means that at all values of n that are big enoughᵀᴹ, functions f(n) and g(n) grow at roughly the same pace (strictly speaking, f(n) grows no faster than g(n)).
Where did that g(n) come from? Don't worry, it's just maths speak. g(n) could simply be n, then we're talking about O(n). If g(n) = n², then we're talking about O(n²). If g(n) = aⁿ, then we're talking about O(aⁿ).
So, we don't really care about constant multipliers here. For example, 3n = O(n). And 30n = O(n). And 3000n = O(n). Because when n approaches infinity, these constant factors are dwarfed in comparison.
And this is very important to realise: complexity matters for big values of N.
If you’re processing an array of 10 items, don’t care about complexity. If you’re processing an array of 100 items, don’t care about complexity. If you’re processing an array of 10,000, do care about complexity.
Yes, 10,000 is a bit far from infinity, but we’re not in the science world, we’re in the real world. There’s no infinity in the real world. 10,000 will have to do.
Okay, with Big-O notation, we have the tool to classify the complexity of our algorithms and data structures. How do we measure it, though?
The first thing is, we have to understand what N is for us. N must be directly related to the inputs (note the plural form) of our algorithm. Let's consider the algorithm of inserting a new number into an array.
What are its inputs?
Is it only the number K that we need to insert? Oh no. Our array of size N is also an input.
A few considerations here.
1. Surely, it does not matter how big our number K is, so its value plays no part in the complexity of our algorithm. That's because we know that all numbers take up the same amount of space in memory. (See this blog post for a little refresher).
2. Let's assume that we only need to relocate the numbers after the insertion position. That would mean that we need to relocate anywhere from 0 to N numbers. However, in complexity estimations we usually tend to estimate the worst case, so we would assume that we need to move N numbers.
So, we need to move N numbers, where N is the size of the array, and our complexity is O(n)!
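To make that concrete, here's a rough sketch of such an insertion (the method name is made up for illustration):

```ruby
# Illustrative sketch: insert `value` at `position` by shifting every
# element after `position` one slot to the right.
def insert_at(array, position, value)
  i = array.length
  while i > position
    array[i] = array[i - 1] # relocating one number; up to N of these moves
    i -= 1
  end
  array[position] = value
  array
end

insert_at([1, 2, 4, 5], 2, 3) # => [1, 2, 3, 4, 5]
```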
And to close the loop (pun not intended) on linked lists, our complexity for prepending a number would be O(1) - that's because we always know where the start is. However, it would still be O(n) for inserting a number at an arbitrary place - that's because we would first need to make up to N jumps to get there from the start of the list. Life sucks!
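For illustration only (this isn't the implementation from the previous post), prepending versus inserting at an arbitrary position could look roughly like this:

```ruby
# A minimal, made-up node type for the sketch.
Node = Struct.new(:value, :next_node)

# O(1): we only ever touch the head of the list.
def prepend(head, value)
  Node.new(value, head)
end

# O(n): we have to walk up to N nodes before we can link the new one in.
def insert_after_index(head, index, value)
  node = head
  index.times { node = node.next_node } # the N jumps from the start
  node.next_node = Node.new(value, node.next_node)
  head
end
```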
Let’s look at a few implementations of the infinite change calculator and try to estimate their complexity.
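Here's a rough sketch of the first approach (the method name and the exact denominations are assumptions for illustration; the shape of the loops is what matters):

```ruby
DENOMINATIONS = [200, 100, 50, 20, 10, 5, 1] # always 7 values

def make_change(amount)
  coins = []
  while amount > 0                        # outer loop: keeps shrinking the amount
    DENOMINATIONS.each do |denomination|  # inner loop: biggest denomination <= amount
      next if denomination > amount
      coins << denomination
      amount -= denomination
      break
    end
  end
  coins
end

make_change(280) # => [200, 50, 20, 10]
```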
Here, we have two loops, one is nested inside another. The inner loop finds the biggest denomination that is less than or equal to the amount. The outer loop decrements the amount by the value of the biggest denomination.
A common misconception is that this algorithm has a complexity of O(n²) because of the nested loops.
However, let's consider our inputs here. We have only got one, amount, which is our N. The list of denominations, however, is static, and is always of size 7.
That means that our inner loop has a constant complexity, or O(1) (even though it's 7 iterations, we don't use multipliers in Big-O notation).
Our outer loop, however, has a linear complexity. If amount is 200, we need one loop iteration. If amount is 2,000, we need 10 loop iterations. If amount is 20,000,000, we need 100,000 loop iterations. So, technically, it's N/200, but as we said before, we don't do multipliers, so it's O(n).
Now, to find the complexity of nested loops, we need to multiply the complexities of each individual loop: O(n) * O(1) = O(n).
Let’s look at this solution.
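Again as a sketch, reusing the same assumed denominations and method name:

```ruby
def make_change(amount)
  coins = []
  DENOMINATIONS.each do |denomination|           # a single loop: always 7 iterations
    count, amount = amount.divmod(denomination)  # how many of this coin fit, and what's left
    count.times { coins << denomination }        # add one entry per coin to the result
  end
  coins
end
```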
It seems like we only have one loop here. And this loop has a complexity of O(1), as we have seen earlier.
divmod is a nifty little method that does integer division and returns both the result of the division and its modulo, aka the remainder. E.g., 250.divmod(200) == [1, 50].
The cool thing about these divisions is that they're done by the highly efficient CPU. While not as quick as integer addition, integer division is still significantly faster than our naive algorithm from the previous section. For all intents and purposes, we can consider integer division to have a complexity of O(1).
So what, is our total complexity a coveted O(1) now?
Not exactly.
If we're calculating change for 20,000,000 again, we would need to spend 100,000 loop iterations just to build the result. All that CPU efficiency goes straight into the bin!
Can we solve it? Yes!
With one simple substitution, we get rid of the one final O(n) operation, thus bringing our complexity to O(1). Congratulations!
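One way that substitution could look, still as a sketch: return counts per denomination instead of pushing one array entry per coin.

```ruby
def make_change(amount)
  counts = {}
  DENOMINATIONS.each do |denomination|           # still just 7 iterations
    count, amount = amount.divmod(denomination)
    counts[denomination] = count if count > 0    # one hash entry instead of `count` pushes
  end
  counts
end

make_change(20_000_000) # => {200=>100000}
```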
Another thing computer science sometimes considers is memory complexity. Similarly to compute complexity, which basically measures how the number of loop iterations grows with the inputs, memory complexity measures how memory consumption grows with the same inputs.
For example, say we needed to square every number in an array. We could write something like:
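(Sketched here with a tiny example input.)

```ruby
array_of_numbers = [1, 2, 3, 4]
array_of_squares = array_of_numbers.map { |n| n * n } # allocates a brand new array
array_of_squares # => [1, 4, 9, 16]
```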
which would have a memory complexity of O(n). Why? Because map leaves the original array_of_numbers intact, and instead stores our array_of_squares in a new place. So, if our input array is a 10GB beast, we would end up using another 10GB on top.
How could we fix it?
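Roughly like this (again a sketch):

```ruby
array_of_numbers.map! { |n| n * n } # rewrites each element in place, no second array
```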
map! modifies the array in place. In fact, that's true for most Array methods with a bang: select!, map!, compact!, etc.
Boom, we're at O(1) again. This algorithm's memory consumption does not grow with its input size.
Hopefully, we have learned what Big-O notation means, and even how to use it to estimate compute complexity. With any luck, it should make reasoning about all the new and exciting data structures that I’m about to introduce… just a tiny bit easier.
Next on our (linked) list are trees. The exciting things are finally coming, I promise!