Algorithms — why, when, where

It’s vital to know good algorithms to be good programmers, and at the age of Machine Learning and AI, it’s crazy important to know them well. If you know and can create programmes using good algorithms, businesses don’t have to spend a lot on hardware, although it’s inexpensive, it still costs a lot for scalability if not programmed well.

It’s no rocket science and shouldn’t get intimidated, once we study them in University but almost never have to write them or think much about them as we get to use such good frameworks which take care of them for us. We tend to forget about them and going back to them feels like good back to basics or feels a bit boring. Though you don’t have to remember them all by heart knowing some of them which you can think of in terms of their applications in real-world programs is all that matters.

Recursion

When we have data in the hierarchal/nested format, iteration through recursion becomes first choice to do that. In recursion, method (function) call themselves until a certain condition is met. But have to be very careful as if it doesn’t end (condition never met), it cause Stackoverflow. As I explained in my previous article on Data structure, Stack DS is best for function calls, when we put something indefinitely in any data structure, that’s overflow.

Remember the movie Inception, dream within dreams and if they couldn’t wake up, they have to live in dream indefinitely, by the way, still not sure if in the end it was real or dream. That was recursion of dreams.

It’s very easy to program but as I said before, has to make sure it ends.

Or as our lovely Dory says, Just keep swimming, just keep swimming in FindingDory movie… she wants to swim recursively ;)

Recursion can be also replaced by Iterative approach (loops) and vice versa so it’s your choice as a programmer, I usually prefer recursion over iteration due to its more dynamic and less code approach. Based on the following example, you be the judge, I agree, it’s a bit easily confusing and difficult to debug sometimes

In terms of the application of recursion, if dealing with a tree or converting into a tree data structure, sorting it, searching in it, recursion is more advisable. rule of thumb for me is, you can use recursion in almost everywhere iteration is used so wherever you see solution of sort Divide and Conquer the problem when dividing into subproblems and keeping instances in same nature, you can go for recursion over iteration.

Be mindful, with recursion if not used properly, space complexity and StackOverflow can happen.

Sorting

Sorting is very important especially when dealing with big data, most of the time when we working/developing commercial business applications, its no big deal but when we need to work on medium to the large size of application with large data to process, sorting could be quite a time/space complexity wise.

The good thing is we don’t have to reinvent the wheel and could use framework based sorting algorithms but still knowing which algorithm to use is the key to success here as there is no single winner algorithm there it all depends how much data is there and what kind of data we dealing with. so based on the nature of input we determine which sorting algorithm is right.

Bubble Sort

Probably one the easiest to implement and worst-performing in most cases as no pain no gain. The reason I want to start with this is its simple to implement. so just to revise it, you compare each element with the next one and swap elements if needed and you keep doing it until you don’t see it needed any more, but as it is done through nested loops so you can imagine its time complexity is O(n ^ 2) but space complexity is not bad. Just to jog your memory look at the code below

Selection Sort

It’s not much different from Bubble sort int terms of performance but implementation wise. It doesn’t swap with next element instead it swaps with the smallest item in the series.

so time complexity wise O (n²) but space complexity is good.

for implementation wise, still required nested loops which is nay in an ideal world

Insertion Sort

It’s better than the above two in many ways and quite natural in terms of implementation. As when you go through the items, you compare with next one if it is smaller you swap but not only with itself, you keep comparing and swapping with all previous items and insert it into its right place once for all.

Normally you do this when a data set is small and already quite sorted. The implementation below shows it’s still using nested loops cause more than linear time complexity O (n²)

Merge Sort

Finally, some sorting algorithm which doesn’t rely on nested loops and has better time complexity than O(n²), in fact, its worst, best and avg times are same O(n log(n))

Divide & Conquer is technique played here, first, you divide the list into half then it’s half until you left each item individually to compare and then you start a comparison and use bubble sort to sort at first stage then you compare each item with its pair and next pair and start a group in sorted form. It’s more like a Divide and Merge.

It’s fast and good, maybe a sound bit confusing and complicated but the following gif should help

As I mentioned before, recursion is a good way to implement merge sort.

One thing I want you to google yourself, Stable vs Unstable Sort algorithms… and find out if Merge sort is stable or unstable?

Quick sort

Quicksort also works in Divide & conquer rule and very efficient. Performance-wise in avg and best scenario, it’s same as Merge sort but in worst case scenario it’s quite worst than Merge sort so chooses wisely, know your data before you choose your algorithm.

In terms of implementation, you choose one element in the list and lets called it PIVOT and then you compare with rest of the elements and try to bring it to its actual place. once you do that, PIVOT element might end up somewhere in middle and you have two lists, one on its right and other on its left same as Merge sort. in those sides (lists) you chose another PIVOT and start comparing and keep doing it until you sort everything, recursion would be best fit for such implementations.

Now the million $ question: which sort to use when

Obviously, there is no hard and fast rule but I’ll try to give you an idea.

Insertion sort : a small set of list with mostly sorted data.

Bubble sort : kind of never should have used but probably mostly used as its super easy, still ok with a small set of data. You mostly use in conjunction with other algo like Merge sort.

Selection sort : don’t use it, I repeat don’t use it. performance very bad.

Merge sort : probably should be mostly used, even in a worst-case scenario, it sorting time is same as best time complexity O(n log(n)) but space complexity is high O(n) so if you doing it in memory with big data, know your limits. Its stable sort as well ;-)

Quick sort : probably the most popular one, but if you don’t pick PIVOT properly then worst case scenario time complexity is very bad but normally in avg time and best time scenario it’s bit better than merge sort. especially in terms of space complexity O(log(n))

Heap sort

Heap sort applies on Binary Heap data structure only and very efficient, especially in term of space complexity as it sorts in place so no extra space is needing.

It is similar to selection sort where we first find the maximum element and place the maximum element at the end. We repeat the same process for the remaining elements.

Heap sort performance is almost same as aforementioned sorting algo O(n log(n)) but space complexity is much better but obviously, you can apply this on Binary heap data structure.

so the question is if we can do better than O(n log(n)) … the answer is YES but…. aforementioned algos are comparison sorts so you have to think out of the box to achieve better performance which in this case would be non-comparison sort. so the question is if there are better and faster algos why don’t we just use them, the answer is not every algo is universally applicable. Non-comparison sorting algorithms like counting and Radix sort are only applicable on numerical (integer type) data eventually you achieve time complexity of O (n+k) or O(nk) which is bit confusing and debatable as not everyone thinks it’s faster always

Counting sort visualization

Radix sort visualization

There is way more number of sorting algorithms and different algorithms are there for different types of data. You probably have never needed to implement any algorithm from scratch but knowing them pays well. Good luck…

ref: bigocheatsheet.com and wikipedia.com

Search Algorithms

There are different search algorithms e.g, linear, binary, BFS, DFS. For revision purposed I’ll talk about some of these and others:

Linear search

Definitely most simplest and that’s why most popular but kind of not very efficient kind of search it is as in worst case you have to go through all items at least ones. Nevertheless, you can use this search on any kind of data, its universally applicable and can be applied to any type of data whereas other search. Time complexity wise O(n)

Binary search

When you hear binary, quickly understand, its something to do with either this or that, with binary search, if there is a list of items, you don’t start your search from start or bottom of the list, you start from the middle if it doesn’t match, you see if the number is bigger or smaller if smaller you move to left otherwise left and then again compare the middle item of that side of the list. so, in that case, you rule out very quickly one side of the list which saves time but it can’t be applied on non sorted data if data is sorted then only you can apply otherwise you never going to get right results. Time complexity wise its O(n log(n))

I believe, as its only applicable on sorted data so obviously sorting times needs to be added but it all depends, if you have more search-oriented application or CRUD oriented, that’s why its not like Binary search is better than Linear then you only and always implement Binary but it all depends on application/components need.

Traversel Techniques — BFS vs DFS

When we learn tree/graph data structure, we are explained in simple numbers but in real world applications we have more complex data than just simple numbers, we have objects containing different kind of information like names, age, address and when we store them in tree/graph format and have to search through them, we can’t just compare each node with binary search it becomes more of a linear search then as we have to go through all nodes so then we need to decide what would be best technique, how we added our data in tree structure. With BFS, we search all elements on each level and with DFS we search child nodes on one side then move to other, its better explained in illustration below:

Do you remember, InOrder, PreOrder, PostOrder…. does it ring any bell, without googling, if you do, mention in a comment what you remember

BFS is very good to find closest, imagine in real-world, closest or shortest Path in google maps, related items on amazon, closest friend on FaceBook, closest connection on LinkedIn. It takes more memory than DFS.

DFS is good to find if there is something exists, find the deepest connection, further-est connection or can find if that path even exists. Bit slower than BFS when searching but it depends on how data arranged so not always the case. Better memory-wise operations.

BFS and DFS are not searching algorithms rather more of a traversal technique which are used in searching algorithms like Dijkstra (BFS based with weighted graph and non-negative weights to find shortest paths) or Bellman-Ford (shortest pathfinding with negative weight as well, time complexity wise O(n²) which is quite a time taking).

Love writing, learning, sharing and love being sarcastic :)