Understanding memory management in JavaScript: Garbage collection and more


10 min read

By Siddharth Gelera

Memory management ensures systems perform as they need to

Many businesses and developers opt for JavaScript due to its extensive support across various platforms, including web, mobile, servers, and embedded systems. Understanding how JavaScript manages memory is crucial for developers, as it relies on garbage collection instead of manual memory allocation and deallocation like in low-level languages such as C and C++. This knowledge enables developers to minimise memory overhead, optimise cache creation, and maintain application stability. In short, it means they can ensure systems perform as they need to.

This post will explain this with examples. We’ll cover topics such as:

  • The difference between reference and value types

  • Garbage collection

  • Weak data structures

  • Finalisation

Reference vs value

We'll now go through some basics of JavaScript's type system to provide context for the topics covered later in this article. This won't be a detailed explanation; for more information, you can look up JavaScript data structures on MDN.

Value

Any data represented by the primitive types will be considered a value. This means that you can assign the same primitive value to another variable, but you cannot modify the primitive itself.

Taking numbers as an example:

let foo = 1
let bar = foo // => 1

The value 1 is copied into a new variable bar but we can't possibly change the value 1 itself to something else.

Also, if the value of the newly created variable is changed to another primitive, JavaScript won’t change the first variable's data.

let foo = 1
let bar = foo // => 1
bar = 2

console.log(foo) // => 1
console.log(bar) // => 2
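
The same rule applies to every primitive, not just numbers. As a minimal sketch (the variable names are hypothetical and only used for illustration), calling a string method never alters the original string; it returns a new value instead:

```javascript
// Hypothetical names, used only for illustration
const original = "hello"

// toUpperCase() does not modify `original`; it returns a new string
const shouted = original.toUpperCase()

console.log(original) // => hello
console.log(shouted) // => HELLO
```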

More information about the primitive data types in JavaScript can be found on MDN.

Reference

In JavaScript, there is another type known as objects. All other constructs in the language, such as arrays, are either derived from objects or add special behaviours to them, as is the case with functions.

Objects in JavaScript are referential values, meaning that a variable holding an object contains a reference to the object's location in memory rather than the object itself.

The referential nature of objects has an important implication: any modification made to an object will be reflected wherever the object's reference is stored. This means that if you have multiple variables or data structures that reference the same object, any changes made to the object will be visible through all the variables which reference the same object.

Understanding the referential nature of objects is crucial when working with JavaScript. It allows you to manipulate and modify objects effectively, but it also requires careful consideration to avoid unintended side effects when multiple references to the same object exist.

Let's see an example of this concept:

const foo = {
  bar: 1
}

const fooClone = foo

// fooClone now references the same memory address as foo

fooClone.bar = 2

console.log(foo.bar) // => 2 (changed, even though we changed it on the clone)
console.log(fooClone.bar) // => 2 (as expected)
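
If an independent copy is what you actually want, one option is to create a new object with the spread syntax. This is only a sketch with hypothetical names, and it copies just the top level: nested objects would still be shared between the two.

```javascript
// Hypothetical names, used only for illustration
const settings = { bar: 1 }

// Spread copies the top-level properties into a brand new object
const settingsCopy = { ...settings }
settingsCopy.bar = 2

console.log(settings.bar) // => 1 (the original is untouched)
console.log(settingsCopy.bar) // => 2
console.log(settings === settingsCopy) // => false (two different objects)
```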

Garbage collection

Garbage collection is the process of freeing up memory by removing anything that the collector deems as no longer needed to be kept in memory.

Different programming languages use different algorithms to determine when memory should be released. Tracing, Reference Counting, and Escape Analysis are a few of them.

Each JavaScript runtime might use a different algorithm, so the algorithm itself is not important for the sake of this post, but the concept of garbage collection is.

Here’s an example:

function greet () {
  const greeting = { hello: "world" }
  const foo = { bar: "baz" }
  return greeting
}

const result = greet()

What we have here is a function that creates two objects and returns a reference to one of them, which is then assigned to the result variable.

The object referenced by foo is no longer reachable once the function has finished executing, so the garbage collector (GC) can reclaim it in its next run. The greeting object stays alive because result still references it.
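
What matters to the collector is whether an object can still be reached, not which variable created it. The sketch below (hypothetical names, and no claim about when a given engine actually runs its collector) shows an object staying reachable through a second reference after the first one is cleared:

```javascript
// Hypothetical names, used only for illustration
let session = { user: "ada" }
const activeSessions = [session]

// Clearing the variable is not enough: the array still references the object
session = null
const stillReachable = activeSessions[0].user
console.log(stillReachable) // => ada

// Once the array lets go of it too, nothing references the object any more
// and it becomes eligible for garbage collection
activeSessions.length = 0
console.log(activeSessions.length) // => 0
```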

Weak data structures

We know that objects are used for everything in JavaScript and that all other constructs are built on top of them. The most basic use case of an object is to store values indexed by a key.

The key is what defines how a particular value can be accessed again, but what if the key gets lost?

This cannot happen when working with regular objects because their keys are strings or symbols, and every property is held strongly: as long as the object itself is reachable, none of its values can be garbage collected. It isn't a big problem, but it makes it easy to keep a large number of references alive and fill up memory quickly and unnecessarily.

Here's an example:

const a = {}

for (let i = 0; i <= 10000; i++) {
  a[i] = {
    foo: "bar",
    bar: "foobar",
  }
}

console.log(a[0])

// => {
//   foo: "bar",
//   bar: "foobar",
// }

After the for loop executes, we only use a[0] and never reference the other properties again. However, the garbage collector cannot reclaim them, because each one is still reachable through the object a by its index. In other words, the GC cannot assume that the other entries are safe to collect.

Weak data structures differ from other, more traditional data structures in that they don't hold strong references to the objects they track: the keys of a WeakMap, the members of a WeakSet, and the target of a WeakRef.

JavaScript provides three types of Weak data structures:

  • WeakMap

  • WeakSet

  • WeakRef

Let's take an example where we define 2 objects and use them as keys and 2 other objects that will act as values. The example tries to demonstrate how un-referenced keys would get garbage collected using a WeakMap.

let keyOne = { id: 1 }
const valueOne = { data: "Some text" }

let keyTwo = { id: 2 }
const valueTwo = { data: "Some other text" }

const wMap = new WeakMap()
wMap.set(keyOne, valueOne)
wMap.set(keyTwo, valueTwo)

// the object created for keyOne is no longer referenced, so it (and its map entry) can be GC'd
keyOne = undefined

let obj = wMap.get(keyTwo) // => obj now references the same object as valueTwo
obj.data = 1

// obj lets go of the value, but valueTwo and the map entry for keyTwo still point to it
obj = undefined

// the keyOne object is unreachable past this point; the keyTwo object stays alive until keyTwo is cleared as well

To understand what happens, we need to consider that keyOne and keyTwo are 2 object references. When we pass them as the key to a WeakMap, the reference becomes the index used to access the value.

Once the key reference is cleaned up, there is no way for you to access the value that was stored with it, unless you are also hard referencing it from somewhere else. In our case, valueOne and valueTwo are hard references to the values, so we can still use them, and that prevents the values from being GC'd.

Let’s have a look at another example:

const keyOne = { id: 1 }
const keyTwo = { id: 2 }

const wMap = new WeakMap()

wMap.set(keyOne, { value: "1" })
wMap.set(keyTwo, { value: "2" })

let obj = wMap.get(keyTwo) // => now holds a reference to the object stored under keyTwo
obj.value = 1
obj = undefined

// we dropped our only direct reference to the object stored under `keyTwo`, so once `keyTwo` itself is no longer referenced, the entry becomes unreachable and can be GC'd
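
WeakSet follows the same rule, except that it only records membership instead of mapping keys to values. As a minimal sketch (the names below are hypothetical), it can be used to remember which objects have already been processed without keeping them alive:

```javascript
// Hypothetical names, used only for illustration
const visited = new WeakSet()

function markVisited(node) {
  // Only objects can be added; the set holds them weakly
  visited.add(node)
}

let page = { url: "/home" }
const otherPage = { url: "/home" } // same shape, but a different object

markVisited(page)

const pageWasVisited = visited.has(page)
console.log(pageWasVisited) // => true
console.log(visited.has(otherPage)) // => false (membership is by reference)

// With the last strong reference gone, the object and its set entry can be GC'd
page = undefined
```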

Because of this, when using weak data structures it is advisable to check for a missing value (undefined) before using the result, as the referenced object may have been collected while the program was running. FinalizationRegistry can help you react when that happens.
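
For a WeakRef, that check means calling deref() and handling undefined. The sketch below uses a hypothetical helper, readTitle, and a made-up placeholder string to illustrate the pattern:

```javascript
// Hypothetical helper: reads a property through a WeakRef, with a fallback
function readTitle(ref) {
  const target = ref.deref()

  // deref() returns undefined once the target has been garbage collected
  if (target === undefined) {
    return "<collected>"
  }

  return target.title
}

const doc = { title: "Report" }
const docRef = new WeakRef(doc)

// `doc` is still strongly referenced here, so the target is available
console.log(readTitle(docRef)) // => Report
```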

Finalisation

To address the issue of losing references held by weak data structures, it is convenient to have a mechanism that allows us to inform our code when the referenced value has been lost.

In other programming languages, such as Java, where the concept of finalisation is commonly used, a class can have a finalize method, which is called when an instance of the class is no longer reachable. In this method, you can perform necessary cleanup steps or modify flags to prevent further processing of the data associated with the instance.

Similarly, JavaScript provides FinalizationRegistry, although it is not tied to a particular class the way finalize is in Java. You register the object you want to track together with a value of your choice (the held value). Some time after that object has been garbage collected, the registered callback is invoked with the held value, not with the object itself. The specification does not guarantee when, or even whether, the callback runs.
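
Before the full cache example, here is a minimal sketch of the registry's API on its own. The names are hypothetical; the optional third argument to register is an unregister token, which lets you stop tracking an object that you have already cleaned up yourself.

```javascript
// Hypothetical names, used only for illustration
const cleanupLog = []

const cleanupRegistry = new FinalizationRegistry((heldValue) => {
  // Runs some time after a registered object is collected (timing is not guaranteed)
  cleanupLog.push(heldValue)
})

const resource = { name: "temp-buffer" }
const token = {} // unregister token; it must be an object

// Track `resource`; the callback would receive the string "temp-buffer"
cleanupRegistry.register(resource, "temp-buffer", token)

// If we release the resource manually, we no longer need the callback
const removed = cleanupRegistry.unregister(token)
console.log(removed) // => true
```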

In the example below, we will see a simple approach to creating a cache for data loaded from the network, with the intent of keeping it around for as long as it's referentially accessible. Once the data is lost, we don't need its entry in the cache, so we clear what remains of it and fetch from the network again.

To achieve this, we will need to:

  1. Store the cache in a data structure

  2. Fetch network data

  3. Map the data to the cache in a way that it is garbage collectable

  4. Fetch from the cache and fallback on the network if the cached value no longer exists

// Create a cache object
const cache = new Map()

// Create a FinalizationRegistry
const registry = new FinalizationRegistry((key) => {
  // Callback invoked some time after the data object is garbage-collected
  console.log(`Data with key ${key} has been garbage-collected.`)

  // Only delete the entry if it has not been replaced by fresh data in the meantime
  const weakRef = cache.get(key)
  if (weakRef && weakRef.deref() === undefined) {
    cache.delete(key)
  }
})

// Function to fetch data from API
async function fetchDataFromAPI(key) {
  // Fetch data from API
  const response = await fetch(`https://api.example.com/data/${key}`)
  const data = await response.json()

  // Store data in cache with a WeakRef
  // (this assumes the response is a JSON object or array; a WeakRef cannot wrap a primitive)
  cache.set(key, new WeakRef(data))

  // Register the original `data` reference to the FinalizationRegistry
  // with the `key` under which it is cached
  registry.register(data, key)

  return data
}

// Function to retrieve data from cache
// Declared async so callers always receive a promise, whether or not the data is cached
async function getDataFromCache(key) {
  const weakRef = cache.get(key)

  if (weakRef) {
    const data = weakRef.deref()

    if (data) {
      // Data still available in cache
      console.log("Getting data from cache:", data)
      return data
    }
  }

  // Data not available in cache, fetch from API
  console.log("Fetching data from API...")
  return fetchDataFromAPI(key)
}

// Usage example
getDataFromCache("user").then((data) => {
  console.log("Data:", data)
})

Let's go through the example to gain a better understanding of how it works:

  1. We create a Map to keep track of the hard keys, which will represent the path of the URL used to fetch the network data.

  2. Next, we instantiate a FinalizationRegistry to assist in tracking what has been cleared. In this specific case, the callback will be called with the same key that we might use to make the API call.

  3. We wrap the network call in a function that uses a WeakRef. Instead of storing the actual data in the cache, we store a WeakRef to it. This is necessary because putting the data directly into the Map would create another strong reference to it, preventing it from ever being garbage collected. A WeakRef does not keep its target alive: once the GC has collected the data, calling deref() on the WeakRef returns undefined.

  4. We register the original data reference, which was created during the network call in Step 3. During registration, we also provide a key to the register call. This is done to ensure that the finalisation registry knows what to pass to the callback.

  5. When reading, we retrieve the WeakRef and check whether the object it points to still exists. The two if conditions cover both cases: the first checks whether the key is present in the Map, and the second checks whether the retrieved WeakRef still points to the network data. If either check fails, the function fetches the data again.

By following this example, we will have a cache that only retains memory as long as the required data from the cache is being referenced somewhere in our program.

It's important to note that implementing these strategies can be challenging and should be approached only when there’s a real need to optimise an application’s memory footprint.

A deep dive into this fascinating topic can be found in Memory Management on MDN.

Conclusion

Having read this article, you now understand how:

  • The garbage collector works

  • We can run clean-up steps once the GC has collected an instance that we wanted to monitor, which is what finalisation and FinalizationRegistry are for

You also saw a simple example of how to create a cache for network data.

These concepts represent a subset of the methods available to mitigate memory overhead, create and manage caches, and enhance the fault tolerance of GC-driven processes.