All about memory management in .NET and important related topics.
Introduction
This article is about memory management in the .NET framework and some of the important topics and concepts related to it.
Agenda
These are the topics we are going to cover in this article.
👉 Stack Memory
👉 Heap Memory
👉 Variables Allocation
👉 What About System.String?
👉 Boxing and Unboxing
👉 Garbage Collection
Heap Memory Types
What Is Garbage?
Performance Measures
Small Objects Heap (SOH) Memory Generations
What About Large Objects Heap (LOH)?
Garbage Collection Triggers
Garbage Collection Process
👉 Dispose and Finalize
Managed and Unmanaged
Memory Leak
Finalization Process
Finalizer Implementation
Dispose and Finalize Design Pattern
Stack Memory
Stack memory is allocated in the computer’s RAM. It is used for static memory allocation.
The advantages of the Stack memory are:
Allocated variables are stored directly in the memory.
The size of each allocation is known when the program is compiled.
Access to this memory is very fast.
Allocations on the Stack are reserved in a Last In First Out (LIFO) manner, which means that the most recently reserved block is always the next block to be freed. That’s why the Stack memory is used to keep track of nested function calls.
Let’s say that we have the following simple piece of code.
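The original snippet was shown as a screenshot; a minimal sketch consistent with the explanation below could look like this (only the name Function1 comes from the text, the rest is illustrative):

```csharp
public static void Function1()
{
    // ... some code to get executed ...
    Function2(); // a new Stack frame is pushed for Function2
    // ... some code to get executed ...
} // Function1's frame is freed only after it fully returns

public static void Function2()
{
    // ... some code to get executed ...
    Function3(); // another frame is pushed on top
}

public static void Function3()
{
    // ... some code to get executed ...
}
```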
Assuming that the comments actually represent some code to get executed, when we start calling Function1, this is what is going to happen.
As you can see, at the start of every function, a Stack frame is created, and this frame is not deallocated till the function is fully executed. When a child function is called, a child frame is created, and so on.
Worth mentioning here is that Stack memory is not shared between threads. In other words, every thread gets its own Stack memory which is not shared with any other thread.
Heap Memory
Heap memory is allocated in the computer’s RAM. It is used for dynamic memory allocation.
Allocations on the Heap are done at run time and can be accessed randomly.
Accessing the Heap memory is a bit slower, but the Heap size is only limited by the size of the virtual memory.
The Heap Memory is divided into two parts:
Small Objects Heap (SOH)
Large Objects Heap (LOH)
More details about this would come later in this article.
Worth mentioning here is that Heap memory is shared between threads.
Variables Allocation
Any variable used in .NET is one of two kinds: a Value type or a Reference type.
Value types are primitive data types like Byte, SByte, Int16, Int32, Int64, UInt16, UInt32, UInt64, Single, Double, Boolean, Char, Decimal, IntPtr, and UIntPtr, as well as Structs and Enums.
Reference types are Classes, Interfaces, Delegates, and Arrays, including built-in types like String and Object.
Now, you might ask:
Where are these variables stored? Are they stored in the Stack memory? Are they stored in the Heap memory?
The answer to this question is:
A Reference type object is stored in the Heap memory and a reference to it is stored in the Stack memory.
A Value type object is stored in the Stack memory when it is a local variable inside a method. However, when it is defined on the first level of a class (as a field), it is stored inside the containing object in the Heap memory.
You don’t believe me? Let me show you.
Let’s say that we defined the following two classes:
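The original class definitions were screenshots; a sketch matching the descriptions that follow could be (class names are illustrative; the member names F1, id, and Id come from the text):

```csharp
public class Class1
{
    // No Value type members on the first level of the class.
    public void F1()
    {
        int id = 1; // a Value type defined as a local member of the method
    }
}

public class Class2
{
    public int Id; // a Value type defined on the first level of the class
}
```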
Now, let’s say that we have this simple application code:
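A sketch of that application code (illustrative; the line numbers in the comments refer to the original screenshots, so they won’t match this block exactly):

```csharp
static void Main()
{
    int number = 1;        // line 6: defining a plain int

    var c1 = new Class1(); // line 10: instance of a class with no
    c1.F1();               // first-level Value type members

    var c2 = new Class2(); // line 15: instance of a class with a
    c2.Id = 2;             // first-level Value type member (line 17)
}
```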
Here we would examine all cases one by one and see what would be allocated into the Heap memory. To do this, we would use Rider memory tracer which shows us the Heap memory allocations and monitor the changes happening between calls.
The first step is to examine when defining an int as we did on line 6. So, we added a breakpoint on lines 6 and 10. When we hit the breakpoint on line 6, and before the line itself was executed, we refreshed the Heap memory tracer on Rider. Then we hit continue so that the debugger hits the breakpoint on line 10. Then, we refreshed the Heap memory tracer on Rider again to see the difference.
And this was the difference:
See, nothing is allocated into the Heap Memory.
Now, let’s examine when defining an instance of a class which doesn’t have any Value type defined on its first level. Also, it has a method where we defined a Value type as a local member of the method.
So, repeating the same steps with breakpoints on lines 10 and 15, we would get the following result:
See, we have only one new entry allocated into the Heap memory for the instance of the class itself. However, we don’t have any Heap memory allocations for the int id variable defined inside the method F1.
Now, let’s examine when defining an instance of a class which has a Value type defined on its first level.
So, repeating the same steps with breakpoints on lines 15 and 17, we would get the following result:
See, we have only one new entry allocated into the Heap memory for the instance of the class itself. However, when we check this entry in the Heap memory, we find that it also includes the int Id class member.
Therefore, as a quick summary:
Reference types are always stored in the Heap memory, with a reference in the Stack memory.
If a Value type is defined inside a function, it is stored in the Stack memory.
If a Value type is defined as a class member (on the first level of a class), it is stored in the Heap memory as part of the containing object.
If you are interested in watching a video explaining this topic, I really recommend Nick Chapsas's video. I stumbled on it while preparing this article, and I really liked the way he simplified the whole thing.
Thanks Nick Chapsas for this great video 🙂
Worth mentioning here: I have already published an article about the different ways of passing a parameter to a method in .NET C#.
It is important to know about this topic as well. Therefore, if you are interested, you can read the article Passing Parameters to a .NET C# Method.
What About System.String?
As a .NET developer, you might have heard about the nature of System.String. It is somewhat unique, and with this uniqueness come some peculiarities.
When dealing with System.String, you need to keep in mind that it is not just a Reference type; it is a special one. Understanding this will surely affect the way you deal with strings in your application.
I have already written a fully detailed article about this. I really recommend that you read it. If you are interested, you can read the article How String In .NET C# Works.
Boxing and Unboxing
If you are a .NET developer, you have most probably heard about Boxing and Unboxing before.
Let me show you a quick code sample:
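A minimal version of that sample, reconstructed to match the description below:

```csharp
int number = 1;           // line 1: a Value type
object boxed = number;    // line 2: Boxing, the int is wrapped into an
                          // object and stored in the Heap memory
int unboxed = (int)boxed; // line 3: Unboxing, the object is unwrapped
                          // back to an int
```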
On line 1, we defined an int. On line 2, we defined a new Object variable and set it to the same int we defined on line 1.
What happens behind the scenes is that the int variable is wrapped into an object and stored into the Heap memory. This is what we are calling Boxing.
On line 3, we are casting the Object variable back to an int. Therefore, the object is unwrapped back to an int. This is what we are calling Unboxing.
Now, you might ask:
Why would we do something like that? Isn’t it weird?
Actually, no. Sometimes Boxing and Unboxing happen without you even knowing, due to the design of some .NET built-in functions and modules.
Let me show you something interesting. Let’s say that you are implementing a simple function which would accept any kind of object as a parameter and do some basic thing like calling .ToString() on this object.
You might think that this is a good way of doing it:
public static void Func(object param)
{
param.ToString();
}
It would work and do the trick, but it would also trigger Boxing if you call this function passing in an int or some struct.
Do you want to know how to fix this? Let me show you.
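One way is Generics. As a sketch (the name FuncGeneric is illustrative):

```csharp
public static void FuncGeneric<T>(T param)
{
    // The call is constrained to T, so value types that override
    // ToString (like int and most structs) are not boxed.
    param.ToString();
}
```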
And calling these methods would be as follows:
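For example (assuming the generic version is named FuncGeneric in this sketch):

```csharp
Func(1);         // object parameter: the int gets boxed
FuncGeneric(1);  // T is inferred as int: no boxing happens
FuncGeneric("a");// reference types are unaffected either way
```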
And the memory analysis would be as follows:
See, this is actually one of the benefits of using Generics. However, this is not always an easy option…
In our example, it is easy to write the same function using Generics because it involves only one parameter. However, what if it involves an unlimited number of parameters which could have different types?
Let’s examine the implementation of String.Format in .NET
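Its most general overload takes a params array of objects, which is where the Boxing comes from. Simplified:

```csharp
// Simplified shape of one of the String.Format overloads.
// Any Value type passed as an argument is boxed by the caller
// while building the object[] array.
public static string Format(string format, params object[] args)
```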
Do you notice the expected parameters? They could be of any count and any type. Therefore, if we try to implement this method using Generics, we would fail.
Therefore, in cases like this, we can’t avoid Boxing.
Furthermore, just as a fun fact, when you do something like this:
int id = 1;
return $"The generated id equals {id}";
You are actually doing Boxing, because this would eventually use the String.Format method. (Worth noting: newer C# compilers may lower string interpolation to interpolated string handlers, which can avoid some of this Boxing, but the classic behavior is as described.)
Garbage Collection
In some languages like C and C++, the developer is responsible for cleaning up the allocated objects. This is an overhead, in addition to the risk of missing some allocated objects, which would eventually cause a memory leak.
In .NET, it is different. The .NET framework takes over the task of allocating objects into the Heap and it manages object allocations on your behalf.
As a .NET developer, all you need to do is define your variables, and .NET will take care of creating, initializing, and placing the object in the right memory area.
Furthermore, .NET keeps track of your objects and knows when an object is not needed anymore so that it could be deallocated.
So, now you might ask:
What is garbage collection?
Before answering this question, let me tell you about some important things that you first need to understand before jumping to garbage collection.
Heap Memory Types
As we said before, the Heap memory is divided into two parts; Small Objects Heap (SOH) and Large Objects Heap (LOH).
The Small Objects Heap (SOH) holds the allocated objects that are smaller than 85,000 bytes (roughly 85 KB) in size.
The Large Objects Heap (LOH) holds the allocated objects that are 85,000 bytes or larger.
The SOH and LOH are different in the way that the .NET framework manages them. Just keep this in mind and we will get back later to this point.
What Is Garbage?
The word Garbage refers to the objects allocated which are not used anymore and could be totally removed and deallocated by the Common Language Runtime (CLR) from the memory.
Now, you might ask:
How does the CLR know if an object is garbage?
The answer is simply: by checking if the object is referenced (directly or indirectly) by a root.
I know that you might ask:
And what is a root?
A root could be one of the following:
A reference in the Stack memory (e.g. a local variable or a method parameter).
A global object.
A static object.
When we say that an object is directly referenced by a root, this means that the object is referenced by one of the root types described above.
On the other hand, when we say that an object is indirectly referenced by a root, this means that the object is referenced by another object in the Heap which is itself referenced (directly or indirectly) by a root.
This means that there could be a series of references between objects and unless the first object in the series is directly referenced by a root, the whole series is considered as Garbage.
What the CLR does is that it maintains an updated graph of rooted objects to be used whenever the CLR needs to check if a certain object is rooted or not.
Therefore, now the CLR has all the information needed to decide if an object is Garbage or not, right?
Performance Measures
Now we know how the CLR decides if a certain object is Garbage or not. However, we need to keep in mind the number of objects allocated into the Heap.
At a certain moment, there could be a huge number of objects already allocated into the Heap. Therefore, if the CLR just checks all these objects every time some memory is needed for a new object to get allocated, this would have a bad impact on the overall performance.
That’s why the CLR needed to follow another approach when analyzing and handling memory allocations.
Genius minds at Microsoft thought of a great idea to tackle this challenge.
Small Objects Heap (SOH) Memory Generations
The genius minds at Microsoft decided to divide the SOH into three parts so that the CLR can only analyze and process one part at a time.
This means that the CLR is able to deal with a smaller number of objects every time instead of dealing with the whole huge set at once.
Now, you might ask:
On what basis are these three parts divided?
This is a good question. The idea behind the three parts is based on the nature of objects being allocated into the SOH.
Simply, new objects tend to get deallocated faster than older objects which could be referenced by global or static roots.
Based on that concept, the CLR divides the SOH into three parts which we call Generations.
This means that the SOH is divided into:
Generation 0: Objects allocated here tend to get deallocated faster than objects in Generation 1 and 2.
Generation 1: Objects allocated here tend to get deallocated faster than objects in Generation 2.
Generation 2: Objects allocated here tend to stay for longer time.
Now, you might ask:
Do the three generations all have the same size?
The answer is simply no.
The sizes of these three generations are decided by the CLR at runtime based on the number of objects that get allocated into each generation.
In other words, the CLR adapts the sizes of these three generations at runtime aiming for the best performance based on the history of objects allocations.
What About Large Objects Heap (LOH)?
The CLR manages objects allocations in the LOH in a different way. It is not divided into parts or generations.
Actually, people call the LOH Generation 3 just to follow the same naming convention, but in reality it doesn’t follow the same process as the SOH.
The LOH is not compacted because, as we said before, the objects allocated in the LOH are large, and it would take too long to copy these large objects on top of unused ones. This would have a bad impact on overall performance.
That’s why the LOH tracks all the free and used memory locations and space, and attempts to allocate new objects into the most appropriately-sized free slots left behind by collected objects.
Now you might ask:
What if there is no appropriately-sized free slot for a new object to get allocated?
The short answer is Fragmentation.
When no existing free slot fits, the CLR extends the LOH (or requests more memory from the operating system), and the unused gaps left between allocated objects remain as fragmentation, wasting memory. Starting with .NET Framework 4.5.1, you can ask the CLR to compact the LOH during the next full Garbage Collection, collapsing these free spaces so that there are more appropriately-sized slots for new objects.
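Since .NET Framework 4.5.1, an on-demand LOH compaction can be requested like this:

```csharp
using System;
using System.Runtime;

// Request a one-time compaction of the LOH during the next
// full, blocking Garbage Collection.
GCSettings.LargeObjectHeapCompactionMode =
    GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
```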
Garbage Collection Triggers
Now we know what Garbage is and how the CLR decides which objects to mark as Garbage.
Now you might ask:
When exactly does the CLR start deallocating the Garbage?
In the SOH, for each one of the three generations, the CLR sets a threshold; when this threshold is reached, the CLR triggers the collection mechanism for this generation and all the younger ones.
In other words:
When the threshold of Generation 0 is reached, collection of Generation 0 is triggered.
When the threshold of Generation 1 is reached, collection of Generation 1 and Generation 0 is triggered.
When the threshold of Generation 2 is reached, collection of Generation 2, Generation 1, and Generation 0 is triggered.
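You can observe these per-generation collections at runtime. For example:

```csharp
using System;

// GC.CollectionCount(g) returns how many times generation g has
// been collected so far. Because collecting an older generation
// also collects the younger ones, Gen 0's count is always at least
// as large as Gen 1's, and Gen 1's at least as large as Gen 2's.
Console.WriteLine($"Gen 0: {GC.CollectionCount(0)}");
Console.WriteLine($"Gen 1: {GC.CollectionCount(1)}");
Console.WriteLine($"Gen 2: {GC.CollectionCount(2)}");
```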
Now you might ask:
What about the LOH?
Actually, I need to tell you something now: the LOH is collected as part of Generation 2.
Therefore, whenever a Generation 2 collection is triggered, an LOH collection is triggered too. But keep in mind that the collection process is not that simple, and the algorithm is too complex to summarize in a few lines.
Garbage Collection Process
As a quick recap, up till now you know the following:
We have four generations (Gen 0, Gen 1, Gen 2, and the LOH, informally called Gen 3).
Each generation has a threshold set and adapted by the CLR.
When a generation threshold is reached, collection process of this generation is triggered and followed by firing the collection of younger generations.
Now, let’s proceed with more details about the collection process.
When the collection process of a certain generation is fired, the objects allocated into this generation are analyzed and processed as follows:
Rooted objects are spotted and marked using the graph which the CLR has already built. We refer to these objects as Survived objects.
The other objects are now marked as unrooted and ready for collection.
The collection process starts as described before (copying and overwriting for SOH and appropriately-sized replacements for LOH).
Survived objects are promoted to the next higher generation.
Here is a step-by-step GIF to help you visualize the Garbage Collection process.
I hope it makes it easier for you to understand.
Important Notes About Garbage Collection
When the Garbage Collection process starts, it halts everything else. This is why excessive triggering of the Garbage Collection process would have a bad impact on the performance of the whole system.
The Garbage Collection process can also be triggered manually by calling the System.GC.Collect() method.
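A small sketch showing both a manual trigger and the generation promotion discussed above:

```csharp
using System;

var data = new byte[1024];
Console.WriteLine(GC.GetGeneration(data)); // typically 0 right after allocation

GC.Collect(); // manual trigger: forces a full collection

// The object survived the collection (we still reference it),
// so it has been promoted to an older generation.
Console.WriteLine(GC.GetGeneration(data)); // typically 1 now
```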
Dispose and Finalize
If you have been a .NET developer for a while, you have most probably heard about Dispose and Finalize and the design pattern related to implementing them.
However, before jumping into details, I prefer to discuss some important basics related to this topic.
Therefore, let’s take it step by step.
Managed and Unmanaged
In .NET world we have two types of resources and code; managed and unmanaged.
Simply put, managed code is code that is compiled and maintained by the .NET framework. In this case, the code is compiled into the Microsoft Intermediate Language (MSIL), a common intermediate language, and then run by the .NET framework. This gives the .NET framework full control over the code: execution, memory allocation, exception checking, and so on.
On the other hand, unmanaged code is code that is compiled and maintained outside the boundaries of the .NET framework. In this case, the code is written in another language, compiled into machine language, and executed by the operating system directly. This is why the .NET framework has no control over the code, its execution, or its memory allocation. Examples of unmanaged resources are file handles, database connections, and other operating system resources.
Therefore, if your .NET system deals with some unmanaged code, the .NET framework can’t help you manage the memory allocated by that unmanaged code. This is a problem, as these resources still need to be deallocated whenever they are no longer needed.
Since the .NET framework doesn’t know how to deallocate these resources, it delegates this responsibility to the developer, who should be the one aware of them.
Now, you might ask:
How can the developer interfere to manage these unmanaged resources?
The simple answer to this question is through implementing a finalizer.
However, to understand what this statement means, proceed to the next section.
Memory Leak
Do you remember when we discussed the Garbage Collection process? Actually, there are some more details which I didn’t mention in order to focus on the GC process itself. But now, this is the right time.
When an object is marked by the CLR as unrooted and ready to be collected, at this point the CLR doesn’t know if this object is using any unmanaged resource or not.
This is a big problem as this means that the object could be using unmanaged resources and if these resources are not deallocated properly there would be a memory leak. By memory leak here we mean that there would be some memory allocations which would live as long as the whole system lives although they are not actually needed.
Furthermore, this memory leak could grow bigger and bigger over time, whenever the same type of object is created and collected without proper deallocation of the contained unmanaged resources.
Now, you might ask:
Then how do we tell the CLR that an object uses some unmanaged resources?
There is an agreement between the developer and the .NET framework that both sides should commit to.
This agreement is simply that whenever the developer implements a Finalizer to an object, the CLR should mark this object as using unmanaged resources.
Now, the next question is:
Then what? When and How exactly would this Finalizer be used by the CLR?
Again, to answer this question, let’s proceed to the next sections.
Finalization Process
Now you might want to know how to implement a Finalizer. What I am going to ask you now is to wait a little bit and I will show you this shortly.
For now, let’s assume that you know how to implement a Finalizer and you already implemented it for some of your system objects.
Now, back to the Garbage Collection process. When an object is allocated into the Heap, the CLR checks if this object implements a Finalizer or not.
If not, nothing special happens as we explained in the previous sections.
However, if the object implements a Finalizer, the CLR would add a reference to this object in a special data structure maintained by the CLR. This data structure is known as the Finalize Queue.
Then, when an object is ready for collection, the CLR checks if this object is referenced in the Finalize Queue or not.
If not, nothing special happens as we explained in the previous sections.
However, if the object is referenced in the Finalize Queue, the CLR removes this reference from the Finalize Queue and creates a new reference to the same object in another special data structure maintained by the CLR. This data structure is known as the Freachable Queue.
Till this point, the object is not collected yet as the CLR knows that a Finalizer on this object should be called before the collection of this object.
The Freachable Queue is maintained by a separate run-time thread. If the queue is empty, the thread sleeps till a new entry is added to the queue, and then the thread starts calling the Finalizers of all the objects in the Freachable Queue.
This is why it is always recommended never to depend on the time at which the Finalizer would be called, as finalization is, by design, a non-deterministic process.
Once the Finalizer is called, the reference to the object in the Freachable Queue is removed, but still, the object is not collected yet.
Then, when the Garbage Collection process is triggered again, the CLR is now sure that the object is ready for collection as it is neither rooted, nor referenced by the Finalize Queue, nor referenced by the Freachable Queue.
That’s why it is known that when an object implements a Finalizer, it lives longer for one more Garbage collection cycle.
Finalizer Implementation
In .NET C#, we can implement the Finalizer by defining a Destructor as follows:
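A minimal sketch of a Destructor (the class name is illustrative):

```csharp
public class MyClass
{
    // Destructor: called by the CLR's finalization thread,
    // at a non-deterministic time, before the object is collected.
    ~MyClass()
    {
        // clean up unmanaged resources here
    }
}
```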
Or, conceptually, by overriding the Finalize method. Worth noting here: in C# you cannot override Finalize directly (the compiler prevents it and asks you to provide a Destructor instead); the Destructor syntax is compiled into a Finalize override behind the scenes.
However, is this the best practice to follow?
No, there is a design pattern to follow when implementing a Finalizer.
Dispose and Finalize Design Pattern
Before diving into the details of the design pattern, let me tell you more about the Dispose and Finalize methods.
Mainly, both of them are responsible for clearing up the unmanaged resources. The main difference is that the Dispose method is called explicitly by the developer, while the Finalize method is called by the CLR in a non-deterministic way, as we explained in the previous sections.
There are other differences between them but these are the most important ones for now.
Now, let’s check the implementation of the Dispose and Finalize Design Pattern.
Let’s say I have the following MyClass class:
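A sketch of such a class (the member name is illustrative):

```csharp
public class MyClass
{
    // System.Timers.Timer implements IDisposable, so MyClass is
    // responsible for disposing of it in turn.
    private System.Timers.Timer m_Timer = new System.Timers.Timer(1000);
}
```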
As you can see, inside our class we are creating an instance of the System.Timers.Timer class, which already implements the IDisposable interface. This is an indicator that we need to call the Dispose method of this timer when disposing of our MyClass object.
Therefore, following the Dispose and Finalize Design Pattern as recommended by Microsoft:
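A sketch of the pattern applied to MyClass (matching the points noted below):

```csharp
public class MyClass : IDisposable
{
    private System.Timers.Timer m_Timer = new System.Timers.Timer(1000);
    private bool m_IsDisposed;

    public void Dispose()
    {
        Dispose(true);
        // We have cleaned everything up; tell the CLR to skip
        // calling the Finalizer for this object.
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (m_IsDisposed) return;

        if (disposing)
        {
            // Called from Dispose(): clean up managed resources too.
            m_Timer?.Dispose();
        }

        // Clean up unmanaged resources here (none in this sketch).

        m_IsDisposed = true;
    }

    // Finalizer: cleans up only unmanaged resources.
    ~MyClass()
    {
        Dispose(false);
    }
}
```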
What we can notice here:
We defined the private field m_IsDisposed to be used to check if the object has already been disposed before.
We defined protected virtual void Dispose(bool disposing) to centralize the cleanup logic.
If the passed-in parameter is true, this means that it is being called by the Dispose method. In this case, both managed and unmanaged resources are cleaned up.
If not, this means that it is being called by the Finalize method. In this case, only the unmanaged resources are cleaned up.
Keep in mind that it is a best practice to check that the managed resource is not null before calling its Dispose method.
In the public void Dispose() method, we call GC.SuppressFinalize(this); to tell the CLR to skip calling the Finalize method for this object, as we are already taking care of the unmanaged resources while disposing.
Now, let’s say that we have MyChildClass class inheriting from MyClass as follows:
Now, we need to implement the Dispose and Finalize Design Pattern as follows:
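A sketch of the derived class (member names illustrative), matching the points noted below:

```csharp
public class MyChildClass : MyClass
{
    private System.Timers.Timer m_ChildTimer = new System.Timers.Timer(2000);
    private bool m_IsDisposed;

    // Note: no new public Dispose() method; the base class already
    // provides it. We only override the protected Dispose(bool).
    protected override void Dispose(bool disposing)
    {
        if (!m_IsDisposed)
        {
            if (disposing)
            {
                // Clean up this class's own managed resources.
                m_ChildTimer?.Dispose();
            }

            // Clean up this class's own unmanaged resources here.

            m_IsDisposed = true;
        }

        // Always let the base class clean up its resources last.
        base.Dispose(disposing);
    }
}
```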
What we should notice here:
We didn’t implement the public void Dispose() method again.
We overrode the protected virtual void Dispose(bool disposing) method.
We called base.Dispose(disposing); at the end of the overridden Dispose(bool disposing) method.
Worth mentioning here: implementing Finalizers is not something that you should always do. As we said before, when the CLR recognizes that an object implements a Finalizer, this extends the life of the object by one more Garbage Collection cycle.
Therefore, only implement a Finalizer when your object is using an unmanaged resource.
Bonus: Read about System.Runtime.InteropServices.SafeHandle and how to use it instead of Finalizers.
Final Thoughts
In this article we have covered memory management in the .NET framework and some of the important topics related to it.
I encourage you to do your own research and read more about this topic. As you can see, there are a lot of details, and that’s why it is not easy to cover all of them in one article.
Also, I have published an article before about Recursion and its impact on memory. Maybe you would like to check it out. If you are interested, you can read the article Curse of Recursion in .NET C#.
Finally, I hope you enjoyed reading this article as I enjoyed writing it.