Massachusetts Institute of Technology: Srini Devadas' "Insertion Sort and Merge Sort"
Why bother with sorting data? This lecture discusses that question and then goes into two different sorting methods. You will notice that many search algorithms assume sorted data, especially with the large data volumes that are common today.
Transcript:
PROFESSOR: So today's lecture is on sorting. We'll be talking about specific sorting algorithms today. I want to start by motivating why we're interested in sorting, which should be fairly easy. Then I want to discuss a particular sorting algorithm that's called insertion sort. That's probably the simplest sorting algorithm you can write; it's five lines of code. It's not the best sorting algorithm that's out there, and so we'll try and improve it. We'll also talk about merge sort, which is a divide and conquer algorithm, and that's going to motivate the last thing that I want to spend time on, which is recurrences and how you solve recurrences. Typically, the recurrences that we'll be looking at in 6.006 are going to come from divide and conquer problems like merge sort, but you'll see this over and over.
So let's talk about why we're interested in sorting. There are some fairly obvious applications: if you want to maintain a phone book, you've got a bunch of names and numbers corresponding to a telephone directory, and you want to keep them in sorted order so it's easy to search; MP3 organizers, spreadsheets, et cetera. So there are lots of obvious applications. There are also some interesting problems that become easy once items are sorted. One example of that is finding a median.
So let's say that you have a bunch of items in an array A 0 through n, and A 0 through n contains n numbers, and they're not sorted. When you sort, you turn this into B 0 through n, where, if it's just numbers, then you may sort them in increasing order or decreasing order. Let's just call it increasing order for now. Or if they're records, and they're not numbers, then you have to provide a comparison function to determine which record is smaller than another record. And that's another input that you have to have in order to do the sorting.
So it doesn't really matter what the items are, as long as you have the comparison function. Think of it as less than or equal to. And if you have that, it's straightforward, obviously, to check that 3 is less than 4, et cetera. But it may be a little more complicated for more sophisticated sorting applications.
But the bottom line is that, if you have your algorithm that takes a comparison function as an input, you're going to be able to, after a certain amount of time, get B 0 through n. Now, if you wanted to find the median of the set of numbers that were originally in the array A, what would you do once you have the sorted array B?
AUDIENCE: Isn't there a more efficient algorithm for median?
PROFESSOR: Absolutely. But this is sort of a side effect of having a sorted list. If you happen to have a sorted list, there are many ways that you could imagine building up a sorted list. One way is you have something that's completely unsorted and you run insertion sort or merge sort. Another way would be to maintain a sorted list as you're getting items put into the list.
So if you happened to have a sorted list, and you need to have this sorted list for some reason, the point I'm making here is that finding the median is easy. And it's easy because all you have to do is look at-- depending on whether n is odd or even-- look at B of n over 2. That would give you the median, because you'd have a bunch of numbers that are less than that and an equal-sized set of numbers that are greater than that, which is the definition of median.
So this is not necessarily the best way, as you pointed out, of finding the median. But it's constant time if you have a sorted list. That's the point I wanted to make.
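The point above can be sketched in a few lines of Python. The function name is mine, not from the lecture; the sort is the expensive part, and the median lookup afterwards is constant time:

```python
def median_via_sorting(a):
    """Median by sorting first: Theta(n log n) for the sort,
    then Theta(1) to index into the sorted array."""
    b = sorted(a)  # B 0 through n, in increasing order
    n = len(b)
    if n % 2 == 1:
        return b[n // 2]                        # odd n: the middle element
    return (b[n // 2 - 1] + b[n // 2]) / 2      # even n: average the two middle elements

print(median_via_sorting([5, 2, 4, 6, 1, 3]))  # -> 3.5
```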
There are other things that you could do. And this came up in Erik's lecture, which is the notion of binary search-- finding an element in an array-- a specific element. You have a list of items-- again, A 0 through n. And you're looking for a specific number or item.
You could, obviously, scan the array, and that would take you linear time to find this item. If the array happened to be sorted, then you can find this in logarithmic time using what's called binary search. Let's say you're looking for a specific item. Let's call it k. Binary search, roughly speaking, would work like-- you go compare k to, again, B of n over 2, and decide, given that B is sorted, you get to look at 1/2 of the array.
If B of n over 2 is not exactly k, then-- well, if it's exactly k, you're done. Otherwise, you look at the left half. You do your divide and conquer paradigm. And you can do this in logarithmic time. So keep this in mind, because binary search is going to come up in today's lecture and again in other lectures.
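A minimal sketch of the idea (an iterative version rather than the recursive one described): each comparison discards half of the remaining range, which is where the logarithmic time comes from.

```python
def binary_search(b, k):
    """Find k in the sorted array b; return its index, or -1 if absent.
    Each comparison halves the remaining range: Theta(log n)."""
    lo, hi = 0, len(b) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if b[mid] == k:
            return mid
        elif b[mid] < k:
            lo = mid + 1   # k can only be in the right half
        else:
            hi = mid - 1   # k can only be in the left half
    return -1

b = [1, 2, 3, 4, 5, 6]
print(binary_search(b, 4))  # -> 3
print(binary_search(b, 7))  # -> -1
```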
It's really a great paradigm of divide and conquer-- probably the simplest. And it, essentially, takes something that's linear-- a linear search-- and turns it into logarithmic search. So those are a couple of problems that become easy if you have a sorted list. And there are some not so obvious applications of sorting-- for example, data compression.
If you wanted to compress a file, one of the things that you could do is to-- and it's a set of items-- you could sort the items. And that automatically finds duplicates. And you could say, if I have 100 items that are all identical, I'm going to compress the file by representing the item once and, then, having a number associated with the frequency of that item-- similar to what document distance does.
Document distance can be viewed as a way of compressing your initial input. Obviously, you lose the works of Shakespeare or whatever it was. And it becomes a bunch of words and frequencies. But it is something that compresses the input and gives you a different representation. And so people use sorting as a subroutine in data compression.
Computer graphics uses sorting. Most of the time, when you render scenes in computer graphics, you have many layers corresponding to the scenes. It turns out that, in computer graphics, most of the time you're actually rendering front to back because, when you have a big opaque object in front, you want to render that first, so you don't have to worry about everything that's occluded by this big opaque object. And that makes things more efficient. And so you keep things sorted front to back, most of the time, in computer graphics rendering.
But some of the time, if you're worried about transparency, you have to render things back to front. So typically, you have sorted lists corresponding to the different objects in both orders-- both increasing order and decreasing order. And you're maintaining that. So sorting is a really important subroutine in pretty much any sophisticated application you look at.
So it's worthwhile to look at the variety of sorting algorithms that are out there. And we're going to do some simple ones today. But if you go and look at Wikipedia and do a Google search, there are all sorts of sorts, like cocktail sort, and bitonic sort, and what have you. And there are reasons why each of these sorting algorithms exists, because in specific cases, they end up winning on types of inputs or types of problems.
So let's take a look at our first sorting algorithm. I'm not going to write code, but it will be in the notes. And it is in your document distance Python files. But I'll just give you pseudocode here and walk through what insertion sort looks like, because the purpose of describing this algorithm to you is to analyze its complexity. We need to do some counting here, with respect to this algorithm, to figure out how fast it's going to run and what the worst case complexity is.
So what is insertion sort? For i equals 1, 2, through n, given an input to be sorted, what we're going to do is we're going to insert A of i in the right position. And we're going to assume that we are sort of midway through the sorting process, where we have sorted A 0 through i minus 1. And we're going to expand this array to have i plus 1 elements. And A of i is going to get inserted into the correct position.
And we're going to do this by pairwise swaps down to the correct position for the number that is initially in A of i. So let's go through an example of this. We're going to sort in increasing order. Just have six numbers. And initially, we have 5, 2, 4, 6, 1, 3. And we're going to take a look at this.
And you start with the index 1, or the second element, because the very first element-- it's a single element and it's already sorted by definition. But you start from here. And this is what we call our key. And that's essentially a pointer to where we're at, right now. And the key keeps moving to the right as we go through the different steps of the algorithm.
And so what you do is you look at this and you have-- this is A of i. That's your key. And you have A of 0 to 0, which is 5. And since we want to sort in increasing order, this is not sorted. And so we do a swap.
So what this would do in this step is a swap. And we would obtain 2, 5, 4, 6, 1, 3. So all that's happened here, in this step-- in the very first step, where the key is in the second position-- is one swap.
Now, your key is here, at item 4. Again, you need to put 4 into the right spot. And so you do pairwise swaps. And in this case, you have to do one swap. And you get 2, 4, 5. And you're done with this iteration.
So what happens here is you have 2, 4, 5, 6, 1, 3. And now, the key is over here, at 6. Now, at this point, things are kind of easy, in the sense that you look at it and you say, well, I know this part is already sorted. 6 is greater than 5. So you have to do nothing.
So there are no swaps that happen in this step. So all that happens here is you're going to move the key one step to the right. So you have 2, 4, 5, 6, 1, 3. And your key is now at 1. Here, you have to do more work. Now, you see one aspect of the complexity of this algorithm-- given that you're doing pairwise swaps-- the way this algorithm was defined, in pseudocode, out there, was I'm going to use pairwise swaps to find the correct position.
So what you're going to do is you're going to have to swap first 1 and 6. And then you'll swap-- 1 is over here. So you'll swap this position and that position. And then you'll swap-- essentially, do 4 swaps to get to the point where you have 1, 2, 4, 5, 6, 3. So this is the result. 1, 2, 4, 5, 6, 3.
And the important thing to understand here is that you've done four swaps to get 1 to the correct position. Now, you could imagine a different data structure where you move this over there and you shift them all to the right. But in fact, that shifting of these four elements is going to be counted in our model as four operations, or four steps, anyway. So there's no getting away from the fact that you have to do four things here. And the way the code that we have for insertion sort does this is by using pairwise swaps.
So we're almost done. Now, we have the key at 3. And now, 3 needs to get put into the correct position. And so you've got to do a few swaps. This is the last step. And what happens here is 3 is going to get swapped with 6. And then 3 needs to get swapped with 5. And then 3 needs to get swapped with 4. And then, since 3 is greater than 2, you're done. So you have 1, 2, 3, 4, 5, 6.
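The pseudocode described above comes out to roughly the following Python. This is a sketch in the spirit of the course's document distance files, not the exact course code:

```python
def insertion_sort(a):
    """Sort the list a in place and return it. For each key a[i],
    pairwise-swap it leftward until a[0..i] is sorted.
    Worst case: Theta(n^2) compares and Theta(n^2) swaps."""
    for i in range(1, len(a)):        # invariant: a[0..i-1] is already sorted
        j = i
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]   # pairwise swap toward the correct position
            j -= 1
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # -> [1, 2, 3, 4, 5, 6]
```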
And that's it. So, analysis. How many steps do I have?
AUDIENCE: n squared?
PROFESSOR: No, how many steps do I have? I guess that wasn't a good question. If I think of a step as being a movement of the key, how many steps do I have? I have theta n steps. And in this case, you can think of it as n minus 1 steps, since you started with 2. But let's just call it theta n steps, in terms of key positions.
And you're right. It is n squared because, at any given step, it's quite possible that I have to do theta n work. And one example is this one, right here, where I had to do four swaps. And in general, you can construct a scenario where, towards the end of the algorithm, you'd have to do theta n work. If you had a list that was reverse sorted, you would, essentially, have to do, on average, n over 2 swaps as you go through each of the steps. And that's theta n.
So each step is theta n swaps. And when I say swaps, I could also say each step is theta n compares and swaps. And this is going to be important, because I'm going to ask you an interesting question in a minute. But let me summarize. What I have here is a theta n squared algorithm. The reason this is a theta n squared algorithm is because I have theta n steps, and each step is theta n.
When I'm counting, what am I counting in terms of operations? The assumption here-- unspoken assumption-- has been that an operation is a compare and a swap, and they're, essentially, equal in cost.
And in most computers, that's true. You have a single instruction in, say, the x86 or the MIPS architecture that can do a compare, and the same thing for swapping registers. So it's a perfectly reasonable assumption that compares and swaps for numbers have exactly the same cost. But if you had a record and you were comparing records, and the comparison function that you used for the records was in itself a method call or a subroutine, it's quite possible that all you're doing is swapping pointers or references to do the swap, but the comparison could be substantially more expensive.
Most of the time-- and we'll differentiate if it becomes necessary-- we're going to be counting comparisons in the sorting algorithms that we'll be putting out. And we'll be assuming either that compares and swaps are roughly the same, or that compares are-- and we'll say which one, of course-- that compares are substantially more expensive than swaps. So if you had either of those cases for insertion sort, you have a theta n squared algorithm. You have theta n squared compares and theta n squared swaps.
Now, here's a question. Let's say that compares are more expensive than swaps. And so, I'm concerned about the theta n squared comparison cost. I'm not as concerned, because of the constant factors involved, with the theta n squared swap cost.
So here's the question. What's a simple fix-- a change to this algorithm that would give me a better complexity in the case where compares are more expensive, or I'm only looking at the complexity of compares? So the theta whatever of compares. Anyone? Yeah, back there.
AUDIENCE: [INAUDIBLE]
PROFESSOR: You could compare with the middle. What did I call it? I called it something. What you just said, I called it something.
AUDIENCE: Binary search.
PROFESSOR: Binary search. That's right. Two cushions for this one. So you pick them up after lecture. So you're exactly right. You got it right. I called it binary search, up here.
And so you can take insertion sort, and you can sort of trivially turn it into a theta n log n algorithm, if we are talking about the number of compares. And all you have to do to do that is to say, you know what, I'm going to replace this with binary search. And you can do that-- and that was the key observation-- because A of 0 through i minus 1 is already sorted. And so you can do binary search on that part of the array.
So let me just write that down. Do a binary search on A of 0 through i minus 1, which is already sorted. And essentially, you can think of it as theta log i time for each of those steps. And so then you get your theta n log n in terms of compares.
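A sketch of binary insertion sort using Python's standard `bisect` module to do the binary search on the sorted prefix. Note the point made next in the lecture: the compares drop to theta log i per step, but the insert itself still shifts elements, so element moves stay theta n squared in the worst case.

```python
import bisect

def binary_insertion_sort(a):
    """Insertion sort with binary search for the insert position:
    Theta(n log n) compares, but still Theta(n^2) element moves,
    because inserting into the sorted prefix shifts elements right."""
    for i in range(1, len(a)):
        key = a.pop(i)                            # remove a[i]; a[0..i-1] stays sorted
        pos = bisect.bisect_right(a, key, 0, i)   # Theta(log i) compares on the sorted prefix
        a.insert(pos, key)                        # shift-insert: worst case Theta(i) moves
    return a

print(binary_insertion_sort([5, 2, 4, 6, 1, 3]))  # -> [1, 2, 3, 4, 5, 6]
```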
Does this help the swaps for an array data structure? No, because binary search will require insertion into A of 0 through i minus 1. So here's the problem. Why don't we have a full-fledged theta n log n algorithm, regardless of the cost of compares or swaps? We don't quite have that. We don't quite have that because we need to insert our A of i into the right position in A of 0 through i minus 1.
When you do that with an array structure, the item might go into the middle. And you have to shift things over to the right. And when you shift things over to the right, in the worst case, you may be shifting a lot of things over to the right. And that gets back to a worst case complexity of theta n.
So binary search in insertion sort gives you theta n log n for compares. But it's still theta n squared for swaps. So as you can see, there are many varieties of sorting algorithms. We just looked at a couple of them. And they were both insertion sort.
The second one that I just put up is, I guess, technically called binary insertion sort, because it does binary search. And the vanilla insertion sort is the one that you have the code for in the docdist program, or at least one of the docdist files.
So let's move on and talk about a different algorithm. So what we'd like to do, now-- this class is about constant improvement. We're never happy. We always want to do a little bit better. And eventually, once we run out of room from an asymptotic standpoint, you take these other classes where you try and improve constant factors and get 10%, and 5%, and 1%, and so on, and so forth.
But we'll stick to improving asymptotic complexity. And we're not quite happy with binary insertion sort because, in the case of numbers, our binary insertion sort has theta n squared complexity, if you look at swaps. So we'd like to go find an algorithm that is theta n log n.
And I guess, eventually, we'll have to stop. But Erik will take care of that. There's a reason to stop. It's when you can prove that you can't do any better. And so we'll get to that, eventually.
So merge sort is also something that you've probably seen. But there probably will be a couple of subtleties that come out as I describe this algorithm that, hopefully, will be interesting to those of you who already know merge sort. And for those of you who don't, it's a very pretty algorithm.
It's a standard recursive algorithm-- similar to a binary search. What we do, here, is we have an array, A. We split it into two parts, L and R. And essentially, we kind of do no work, really.
In terms of the L and R, in the sense that we just call; we keep splitting, splitting, splitting. And all the work is done down at the bottom in this routine called merge, where we are merging a pair of elements at the leaves. And then, we merge two pairs and get four elements. And then we merge four-tuples of elements, et cetera, and go all the way up.
So while I'm just saying L turns into L prime, out here, there's no real explicit code that you can see that turns L into L prime. It happens later, really. There's no real sorting code, here. It happens in the merge routine. And you'll see that quite clearly when we run through an example.
So you have L and R turn into L prime and R prime. And what we end up getting is a sorted array, A. And we have what's called a merge routine that takes L prime and R prime and merges them into the sorted array.
So at the top level, what you see is: split into two, and do a merge, and get to the sorted array. The input is of size n. You have two arrays of size n over 2. These are two sorted arrays of size n over 2. And then, finally, you have a sorted array of size n.
So if you want to follow the recursive execution of this in a small example, then you'll be able to see how this works. And we'll do a fairly straightforward example with 8 elements. So at the top level-- before we get there, merge is going to assume that you have two sorted arrays, and merge them together. That's the invariant in merge sort, or for the merge routine. It assumes the inputs are sorted-- L and R. Actually, I should say, L prime and R prime.
So let's say you have 20, 13, 7, and 2. You have 12, 11, 9, and 1. And this could be L prime. And this could be R prime. What you have is what we call a two finger algorithm.
And so you've got two fingers, and each of them points to something. And in this case, one of them is pointing to L. My left finger is pointing to L prime, or some element in L prime. My right finger is pointing to some element in R prime.
And I'm going to compare the two elements that my fingers are pointing to. And I'm going to choose, in this case, the smaller of those elements. And I'm going to put them into the sorted array.
So start out here. Look at that and that. And I compared 2 and 1. And which is smaller? 1 is smaller. So I'm going to write 1 down. This is a two finger algo for merge. And I put 1 down.
When I put 1 down, I had to cross out 1. So effectively, what happens is-- let me just circle that instead of crossing it out. And my finger moves up to 9. So now I'm pointing at 2 and 9. And I repeat this step.
So now, in this case, 2 is smaller. So I'm going to go ahead and write 2 down. And I can cross out 2 and move my finger up to 7. And so that's it. I won't bore you with the rest of the steps.
It's essentially walking up. You have a couple of pointers, and you're walking up these two arrays. And you're writing down 1, 2, 7, 9, 11, 12, 13, 20. And that's your merge routine.
And all of the work, really, is done in the merge routine because, other than that, the body is simply a recursive call. You have to, obviously, split the array. But that's fairly straightforward.
If you have an array, A 0 through n-- and depending on whether n is odd or even-- you could imagine that you set L to be A 0 through n over 2 minus 1, and R similarly. And so you just split it halfway in the middle. I'll talk about that a little bit more. There's a subtlety associated with that that we'll get to in a few minutes.
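The whole algorithm described above can be sketched like so: the two finger walk is the merge routine, and the recursive body just splits and calls itself. The example input reuses the numbers from the board (2, 7, 13, 20 and 1, 9, 11, 12).

```python
def merge(left, right):
    """Two finger merge of two sorted lists: Theta(n) for n total elements."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:        # compare the two elements the fingers point at...
            out.append(left[i]); i += 1
        else:                          # ...and take the smaller one
            out.append(right[j]); j += 1
    out.extend(left[i:])               # one side is exhausted; copy the rest
    out.extend(right[j:])
    return out

def merge_sort(a):
    """Split in half, sort each half recursively, merge: T(n) = 2 T(n/2) + Theta(n)."""
    if len(a) <= 1:
        return a                       # a single element is sorted by definition
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))

print(merge_sort([2, 7, 13, 20, 1, 9, 11, 12]))  # -> [1, 2, 7, 9, 11, 12, 13, 20]
```

Note that this version allocates new lists for the halves and the merged output, which is exactly the theta n auxiliary space the lecture comes back to later.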
But to finish up in terms of the computation of merge sort: this is it. The merge routine is doing most, if not all, of the work. And this two finger algorithm is going to be able to take two sorted arrays and put them into a single sorted array by interspersing, or interleaving, these elements. And what's the complexity of merge if I have two arrays of size n over 2, here? What do I have?
AUDIENCE: n.
PROFESSOR: n. We'll give you a cushion, too. Theta n complexity. So far, so good.
I know you know the answer as to what the complexity of merge sort is. But I'm guessing that most of you won't be able to prove it to me, because I'm kind of a hard guy to prove something to. And I could always say, no, I don't believe you, or I don't understand.
The complexity-- and you've said this before, in class, and I think Erik's mentioned it-- the overall complexity of this algorithm is theta n log n. And where does that come from? How do you prove that?
And so what we'll do, now, is take a look at merge sort. And we'll look at the recursion tree. And we'll try and-- there are many ways of proving that merge sort is theta n log n.
The way we're going to do this is what's called proof by picture. And it's not an established proof technique, but it's something that is very helpful to get an intuition behind the proof and why the result is true. And you can always take that and you can formalize it and make this something that everyone believes. And we'll also look at substitution, possibly in section tomorrow, for recurrence solving.
So where we are right now is that we have a divide and conquer algorithm that has a merge step that is theta n. And so, if I just look at this structure that I have here, I can write a recurrence for merge sort that looks like this. So when I say complexity, I can say T of n, which is the work done for n items, is going to be some constant time in order to divide the array.
So this could be the part corresponding to dividing the array. And there are going to be two problems of size n over 2. And so I have 2 T of n over 2. And this is the recursive part. And I'm going to have c times n, which is the merge part. And that's some constant times n, which is what we have, here, with respect to the theta n complexity.
So you have a recurrence like this, and I know some of you have seen recurrences in 6.042. And you know how to solve this. What I'd like to do is show you this recursion tree expansion that not only tells you how to solve this recurrence, but also gives you a means of solving recurrences where, instead of having c n, you have something else out here. You have f of n, which is a different function from the linear function. And this recursion tree is, in my mind, the simplest way of arguing the theta n log n complexity of merge sort.
So what I want to do is expand this recurrence out. And let's do that over here. So I have c n on top. I'm going to ignore this constant factor, because c n dominates. So I'll just start with c n. I want to break things up, as I do the recursion.
So when I go c n, at the top level-- that's the work I have to do at the merge, at the top level. And then when I go down to two smaller problems, each of them is size n over 2. So I do c times n divided by 2 [INAUDIBLE]. So this is just a constant c. I didn't want to write thetas up here. You could. And I'll say a little bit more about that later. But think of this cn as representing the theta n complexity. And c is this constant.
So c times n, here. c times n over 2, here. And then when I keep going, I have c times n over 4, c times n over 4, et cetera, and so on, and so forth. And when I come down all the way here, n is eventually going to become 1-- or essentially a constant-- and I'm going to have a bunch of c's here.
So here's another question that I'd like you to answer. Someone tell me what the number of levels in this tree is, precisely, and the number of leaves in this tree, precisely.
AUDIENCE: The number of levels is log n plus 1.
PROFESSOR: Log n plus 1. Log to the base 2, plus 1. And the number of leaves? You raised your hand back there, first. Number of leaves.
AUDIENCE: I think n.
PROFESSOR: Yeah, you're right. You think right. So 1 plus log n levels, and n leaves. When n becomes 1, how many of them do you have? You're down to a single element, which is, by definition, sorted. And you have n leaves.
So now let's add up the work. I really like this picture, because it's just so intuitive in terms of getting us the result that we're looking for. So you add up the work in each of the levels of this tree. So the top level is cn. The second level is cn, because I added 1/2 and 1/2. cn, cn. Wow. What symmetry.
So you're doing the same amount of work, modulo the constant factors, here, with what's going on with the c1, which we've ignored, but roughly the same amount of work in each of the levels. And now, you know how many levels there are. It's 1 plus log n.
So if you want to write an equation for T of n, it's 1 plus log n, times c n, which is theta of n log n. So I've mixed in constants c and thetas. For the purposes of this description, they're interchangeable. You will see recurrences that look like this, in class. And things like that.
Don't get confused. It's just a constant multiplicative factor in front of the function that you have. And it's just a little easier, I think, to write down these constant factors and realize that the amount of work done is the same in each of the levels. And once you know the dimensions of this tree, in terms of levels and in terms of the number of leaves, you get your result.
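You can sanity-check the picture numerically: for n a power of 2, unrolling T of n equals 2 T of n over 2 plus cn, with base case T of 1 equals c, gives exactly cn times 1 plus log n -- cn of work per level, times the 1 plus log n levels. A quick check (the function name is mine):

```python
import math

def T(n, c=1):
    """Unroll T(n) = 2*T(n/2) + c*n with T(1) = c, for n a power of 2."""
    if n == 1:
        return c
    return 2 * T(n // 2, c) + c * n

for n in [2, 4, 8, 1024]:
    levels = 1 + int(math.log2(n))
    assert T(n) == n * levels   # cn of work per level, (1 + log2 n) levels

print(T(1024))  # -> 11264, i.e., 1024 * (1 + 10)
```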
So we've looked at two algorithms, so far. And insertion sort, if you talk about numbers, is theta n squared for swaps. Merge sort is theta n log n. Here's another interesting question. What is one advantage of insertion sort over merge sort?
AUDIENCE: [INAUDIBLE]
PROFESSOR: What does that mean?
AUDIENCE: You don't have to move elements outside of [INAUDIBLE].
PROFESSOR: That's exactly right. That's exactly right. So the two guys who answered the questions before, with the levels, and you. Come to me after class.
So that's a great answer. In-place sorting is something that has to do with auxiliary space. And so what you see, here-- and it was a bit hidden, here. But the fact of the matter is that you had L prime and R prime. And L prime and R prime are different from L and R, which were the initial halves of the inputs to the sorting algorithm.
And what I said here is, we're going to dump this into A. That's what this picture shows. This says sorted array, A. And so you had to make a copy of the array-- the two halves, L and R-- in order to do the recursion, and then to take the results and put them into the sorted array, A.
So you needed-- in merge sort-- you needed theta n auxiliary space. So in merge sort, you need theta n extra space. And the definition of in-place sorting implies that you have theta 1-- constant-- auxiliary space.
The auxiliary space for insertion sort is simply that temporary variable that you need when you swap two elements. So when you want to swap a couple of registers, you've got to store one of the values in a temporary location, overwrite the other, et cetera. And that's the theta 1 auxiliary space for insertion sort.
So there is an advantage of the version of insertion sort we've talked about, today, over merge sort. And if you have a billion elements, that's potentially something you don't want to store in memory. If you want to do something really fast and do everything in cache or main memory, and you want to sort billions or maybe even trillions of items, this becomes an important consideration.
I will say that you can reduce the constant factor of the theta n. So in the vanilla scheme, you could imagine that you have to have a copy of the array. So if you had n elements, you essentially have n extra items of storage. You can make that n over 2 with a simple coding trick, by keeping 1/2 of A.
You can throw away one of the L's or one of the R's. And you can get it down to n over 2. And that turns out-- it's a reasonable thing to do if you have a billion elements and you want to reduce your storage by a constant factor. So that's one coding trick.
Now it turns out that you can actually go further. And there's a fairly sophisticated algorithm that's sort of beyond the scope of 6.006 that's an in-place merge sort. And this in-place merge sort is kind of impractical, in the sense that it doesn't do very well in terms of the constant factors.
So while it's in-place, and it's still theta n log n, the problem is that the running time of an in-place merge sort is much worse than the regular merge sort that uses theta n auxiliary space.
So people don't really use in-place merge sort. It's a great paper. It's a great thing to read. Its analysis is a bit sophisticated for 6.006. So we won't go there. But it does exist. So you can take merge sort, and I just want to let you know that you can do things in-place.
In terms of numbers, some experiments we ran a few years ago-- so these may not be completely valid, because I'm going to actually give you numbers-- but merge sort in Python, if you write a little curve fit program to do this, is 2.2 n log n microseconds for a given n.
So this is the merge sort routine. And if you look at insertion sort, in Python, that's something like 0.2 n squared microseconds. So you see the constant factors here.
If you do insertion sort in C, which is a compiled language, then it's much faster. It's about 20 times faster. It's 0.01 n squared microseconds. So a little bit of practice on the side. We do ask you to write code. And this is important. The reason we're interested in algorithms is because people want to run them.
And what you can see is that you can actually find an n-- so regardless of whether you're using Python or C, this tells you that asymptotic complexity is pretty important-- because, once n gets beyond about 4,000, you're going to see that merge sort in Python beats insertion sort in C.
So the constant factors get subsumed beyond certain values of n. So that's why asymptotic complexity is important. You do have a factor of 20, here, but that doesn't really help you in terms of keeping an n squared algorithm competitive. It stays competitive for a little bit longer, but then falls behind.
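Taking the fitted costs at face value (and assuming the log is base 2, which the lecture doesn't specify), you can check the crossover directly: at n = 1,000 the compiled n squared code still wins, and by n = 4,000 the interpreted n log n code has overtaken it.

```python
import math

def merge_py_us(n):
    """Fitted cost of merge sort in Python, in microseconds (log base 2 assumed)."""
    return 2.2 * n * math.log2(n)

def insertion_c_us(n):
    """Fitted cost of insertion sort in C, in microseconds."""
    return 0.01 * n * n

# The n^2 term must eventually dominate, whatever the constant factors:
assert insertion_c_us(1000) < merge_py_us(1000)   # small n: insertion sort in C wins
assert insertion_c_us(4000) > merge_py_us(4000)   # by n = 4000, merge sort in Python wins
```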
That's what I wanted to cover for sorting. So hopefully, you have a sense of what happens with these two sorting algorithms. We'll look at a very different sorting algorithm next time, using heaps, which is a different data structure.
The last thing I want to do, in the couple minutes I have left, is give you a little more intuition as to recurrence solving, based on this diagram that I wrote up there. And so we're going to use exactly this structure. And we're going to look at a couple of different recurrences that I won't really motivate in terms of having a specific algorithm; I'll just write out the recurrence. And we'll look at the recursion tree for that. And I'll try and tease out of you the overall complexity associated with these recurrences.
So let's take a look at T of n equals 2 T of n over 2, plus c n squared. Let me just call that c-- no need for the brackets. So constant c times n squared.
So if you had a crummy merge routine, and it was taking n squared because you coded it up wrong-- it's not a great motivation for this recurrence, but it's a way this recurrence could have come up.
So what does this recursive tree look like? Well, it looks kind of the same, obviously. You have c n squared; you have c n squared divided by 4; c n squared divided by 4; c n squared divided by 16, four times. Looking a little bit different from the other one.
The levels and the leaves are exactly the same. Eventually, n is going to go down to 1. So you will see c all the way here. And you're going to have n leaves. And you will have, as before, 1 plus log n levels. Everything is the same.
And this is why I like this recursive tree formulation so much, because, now, all I have to do is add up the work associated with each of the levels to get the solution to the recurrence. Now, take a look at what happens, here. c n squared; c n squared divided by 2; c n squared divided by 4. And this is n times c. So what does that add up to?
AUDIENCE: [INAUDIBLE]
PROFESSOR: Yeah, exactly. Exactly right. So if you look at what happens, here, this dominates. All of the other things are actually less than that. And you said bounded by two c n squared, because this part is bounded by c n squared and I already have c n squared up at the top.
So this particular algorithm that corresponds to this crummy merge sort, or wherever this recurrence came from, is a theta n squared algorithm. And in this case, all of the work done is at the root-- at the top level of the recursion. Here, there was a roughly equal amount of work done in each of the different levels. Here, all of the work was done at the root.
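The "bounded by two c n squared" claim can be made exact: for n a power of 2, the level sums form a geometric series, and unrolling T of n equals 2 T of n over 2 plus c n squared, with T of 1 equals c, gives the closed form 2 c n squared minus c n. So the root's c n squared term really does dominate. A quick check (again, the function name is mine):

```python
def T(n, c=1):
    """Unroll T(n) = 2*T(n/2) + c*n^2 with T(1) = c, for n a power of 2."""
    if n == 1:
        return c
    return 2 * T(n // 2, c) + c * n * n

for n in [2, 4, 8, 1024]:
    assert T(n) == 2 * n * n - n   # closed form (c = 1): geometric series summed exactly
    assert T(n) < 2 * n * n        # bounded by twice the work at the root
```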
And so to close up shop, here, let me just give you, real quick, a recurrence where all of the work is done at the leaves, just for closure. So if I had, magically, a merge routine that actually happened in constant time, either through buggy analysis, or because it was buggy, then what does the tree look like for that?
And I can think of this as being theta 1. Or I can think of this as being just a constant c. I'll stick with that. So I have c, c, c. Whoa, I tried to move that up. That doesn't work.
So I have n leaves, as before. And so if I look at what I have, here, I have c at the top level. I have 2c, and so on and so forth. 4c. And then I go all the way down to nc.
And so what happens here is this dominates. And so, in this recurrence, the whole thing runs in theta n. So the solution to that is theta n. And what you have here is all of the work being done at the leaves.
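Same check for this leaves-dominated case: unrolling T of n equals 2 T of n over 2 plus c, with T of 1 equals c, for n a power of 2 gives the closed form 2n minus 1 times c, which is theta n, with essentially all of it coming from the n leaves:

```python
def T(n, c=1):
    """Unroll T(n) = 2*T(n/2) + c with T(1) = c, for n a power of 2."""
    if n == 1:
        return c
    return 2 * T(n // 2, c) + c

for n in [2, 4, 8, 1024]:
    assert T(n) == 2 * n - 1   # closed form (c = 1): theta(n), dominated by the n leaves
```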
We're not going to really cover the theorem that gives you a mechanical way of figuring this out, because we think the recursion tree is a better way of looking at it. But you can see that, depending on what that function is, in terms of the work being done in the merge routine, you'd have different versions of recurrences. I'll stick around, and people who answered questions, please pick up your cushions. See you next time.
Source: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-3-insertion-sort-merge-sort/
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License.