How often have you heard the term “threading” in relation to a computer program, but you weren’t exactly sure what it meant? How about “processes?” You likely understand that a “thread” is somehow closely related to a “program” and a “process,” but if you’re not a computer science major, maybe that’s as far as your understanding goes.
Knowing what these terms mean is essential if you are a programmer, but understanding them can also be useful to the average computer user. Being able to read the Activity Monitor on the Macintosh, the Task Manager on Windows, or top on Linux can help you work out which programs are causing problems on your computer, or whether you might need to install more memory to make your system run better.
Let’s take a few minutes to delve into the world of computer programs and sort out what these terms mean. We’ll simplify some of the ideas, but the concepts we cover should help clarify the difference between the terms.
First of all, you probably are aware that a program is the code stored on your computer that is intended to fulfill a certain task. There are many types of programs, including programs that are part of the operating system and help your computer function, and other programs that fulfill a particular job. These task-specific programs are also known as “applications,” and include programs for word processing, web browsing, or emailing a message to another computer.
Perhaps you’ve heard the programmer’s joke, “There are only 10 types of people in the world, those who understand binary, and those who don’t.”
Whether a program is compiled ahead of time or interpreted as it runs, the end result is the same: when a program is run, it is loaded into memory in binary form. The computer’s central processing unit (CPU) understands only binary instructions, so that’s the form the program needs to be in when it runs.
Binary is the native language of computers because an electrical circuit at its basic level has two states, on or off, represented by a one or a zero. In the common numbering system we use every day, base 10, each digit position can be anything from 0 to 9. In base 2 (or binary), each position is either a zero or a one. (In a future blog post we might cover quantum computing, which goes beyond the concept of just ones and zeros in computing.)
|Decimal (Base 10)||Binary (Base 2)|
|0||0|
|1||1|
|2||10|
|3||11|
|4||100|
|8||1000|
|10||1010|
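If you'd like to see how a base 10 number turns into base 2, here is a minimal Python sketch. The function name `to_binary` is made up for this illustration; it uses the classic repeated-division-by-two method, and Python's built-in `format(value, "b")` is printed alongside it for comparison.

```python
def to_binary(number):
    """Convert a non-negative integer to base-2 digits by repeated division by 2."""
    if number < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        digits.append(str(number % 2))  # the remainder is the next digit, right to left
        number //= 2
    return "".join(reversed(digits))


if __name__ == "__main__":
    for value in (0, 1, 2, 3, 10):
        # format(value, "b") is Python's built-in conversion, shown for comparison
        print(value, to_binary(value), format(value, "b"))
```

This is also why the joke works: written in binary, "10" is the number two.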
How Processes Work
The program has been loaded into the computer’s memory in binary form. Now what?
An executing program needs more than just the binary code that tells the computer what to do. The program needs memory and various operating system resources in order to run. A “process” is what we call a program that has been loaded into memory along with all the resources it needs to operate. The “operating system” is the brains behind allocating all these resources, and comes in different flavors such as macOS, iOS, Microsoft Windows, Linux, and Android. The OS handles the task of managing the resources needed to turn your program into a running process.
Some essential resources every process needs are registers, a program counter, and a stack. The “registers” are data holding places that are part of the CPU. A register may hold an instruction, a storage address, or other kind of data needed by the process. The “program counter,” also called the “instruction pointer,” keeps track of where a computer is in its program sequence. The “stack” is a data structure that stores information about the active subroutines of a computer program and is used as scratch space for the process. It is distinguished from dynamically allocated memory for the process that is known as the “heap.”
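To make those three resources a little more concrete, here is a toy machine written in Python. Everything in it is invented for illustration (the `LOAD`, `ADD`, `CALL`, `RET`, and `HALT` instructions, the single `acc` register, and the `run` function); a real CPU is far more involved, but the roles of the register, the program counter, and the stack are the same in spirit.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a toy machine."""
    registers = {"acc": 0}   # a single register holding the working value
    program_counter = 0      # index of the next instruction to execute
    stack = []               # return addresses of the active subroutines

    while program_counter < len(program):
        opcode, argument = program[program_counter]
        program_counter += 1               # by default, move on to the next instruction
        if opcode == "LOAD":
            registers["acc"] = argument
        elif opcode == "ADD":
            registers["acc"] += argument
        elif opcode == "CALL":
            stack.append(program_counter)  # remember where to resume afterwards
            program_counter = argument     # jump to the subroutine
        elif opcode == "RET":
            program_counter = stack.pop()  # resume just after the CALL
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return registers["acc"]


PROGRAM = [
    ("LOAD", 1),     # 0: acc = 1
    ("CALL", 4),     # 1: run the subroutine that starts at index 4
    ("ADD", 10),     # 2: acc = acc + 10
    ("HALT", None),  # 3: stop
    ("ADD", 5),      # 4: subroutine body: acc = acc + 5
    ("RET", None),   # 5: return to the caller
]

if __name__ == "__main__":
    print(run(PROGRAM))  # prints 16
```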
A Computer Process
There can be multiple instances of a single program, and each instance of that running program is a process. Each process has a separate memory address space, which means that a process runs independently and is isolated from other processes. It cannot directly access data in other processes. Switching from one process to another takes a relatively long time, because the system has to save and load registers, memory maps, and other resources.
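Here is a rough sketch of that idea in Python, using the standard `subprocess` module to start several instances of the same tiny program. The one-line `PROGRAM` and the helper `launch_instances` are assumptions made up for this example. Each instance gets its own process ID and its own copy of `counter`, so every one of them prints 1, no matter how many are running.

```python
import subprocess
import sys

# The "program": a one-line script that every instance runs (a made-up example).
PROGRAM = "counter = 0; counter += 1; print(counter)"


def launch_instances(count):
    """Start `count` instances of the same program; return (pid, output) for each."""
    children = [
        subprocess.Popen([sys.executable, "-c", PROGRAM],
                         stdout=subprocess.PIPE, text=True)
        for _ in range(count)
    ]
    results = []
    for child in children:
        output, _ = child.communicate()  # wait for this process to finish
        results.append((child.pid, output.strip()))
    return results


if __name__ == "__main__":
    for pid, output in launch_instances(3):
        print(f"process {pid} printed {output}")
```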
This independence is valuable: the operating system does its best to isolate processes so that a problem with one process doesn’t corrupt or wreak havoc on another. You’ve undoubtedly run into the situation in which one application on your computer freezes or has a problem and you’ve been able to quit that program without affecting others.
How Threads Work
So, are you still with us? We finally made it to threads!
A thread is the unit of execution within a process. A process can have anywhere from just one thread to many threads.
Process vs. Thread
When a process starts, it is assigned memory and resources. Each thread in the process shares that memory and resources. In single-threaded processes, the process contains one thread. The process and the thread are one and the same, and there is only one thing happening.
In multithreaded processes, the process contains more than one thread, and the process is accomplishing a number of things at the same time. (Technically, sometimes it’s only almost at the same time; read more on that in the “What About Concurrency and Parallelism?” section below.)
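A quick way to see several threads living inside one process is to have each thread report the process ID it is running under. The sketch below uses Python's standard `threading` module; the names `run_threads` and `worker` are invented for this example.

```python
import os
import threading


def run_threads(count):
    """Run `count` threads and return the (pid, thread name) pair each one reports."""
    reports = [None] * count  # one slot per thread, so no two threads write the same item

    def worker(index):
        reports[index] = (os.getpid(), threading.current_thread().name)

    threads = [
        threading.Thread(target=worker, args=(i,), name=f"worker-{i}")
        for i in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every thread has finished
    return reports


if __name__ == "__main__":
    for pid, name in run_threads(4):
        print(f"{name} ran inside process {pid}")
```

Every thread reports the same process ID, because they all belong to the one process that started them.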
We talked about the two types of memory available to a process or a thread, the stack and the heap. It is important to distinguish between these two types of process memory because each thread will have its own stack, but all the threads in a process will share the heap.
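Python hides most of the details of stacks and heaps, but the same split shows up in a loose analogy: a variable local to a function call is private to the thread making that call, while an object that all threads can reach is shared. In this sketch, `shared_log`, `log_lock`, `worker`, and `run_workers` are names made up for illustration.

```python
import threading

shared_log = []               # one object that every thread can reach (heap-like)
log_lock = threading.Lock()   # guards shared_log against simultaneous updates


def worker(start):
    local_total = 0           # local to this call, so each thread has its own copy
    for value in range(start, start + 3):
        local_total += value
    with log_lock:
        shared_log.append((start, local_total))


def run_workers(starts):
    """Run one thread per starting value and return the shared log, sorted."""
    shared_log.clear()
    threads = [threading.Thread(target=worker, args=(start,)) for start in starts]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(shared_log)


if __name__ == "__main__":
    print(run_workers([0, 10, 20]))  # prints [(0, 3), (10, 33), (20, 63)]
```

Each thread's `local_total` never interferes with another thread's, yet all three results land in the same shared list.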
Threads are sometimes called lightweight processes because they have their own stack but can access shared data. Because threads share the same address space as the process and other threads within the process, the operational cost of communication between the threads is low, which is an advantage. The disadvantage is that a problem with one thread in a process can affect the other threads and even the viability of the process itself.
Threads vs. Processes
So to review:
- The program starts out as a text file of programming code.
- The program is compiled or interpreted into binary form.
- The program is loaded into memory.
- The program becomes one or more running processes.
- Processes are typically independent of each other.
- Threads exist as subsets of a process.
- Threads can communicate with each other more easily than processes can.
- Threads are more vulnerable to problems caused by other threads in the same process.
Processes vs. Threads: Advantages and Disadvantages
|Processes are heavyweight operations.||Threads are lighter weight operations.|
|Each process has its own memory space.||Threads use the memory of the process they belong to.|
|Inter-process communication is slower because processes have separate memory address spaces.||Inter-thread communication can be faster than inter-process communication because threads of the same process share memory with the process they belong to.|
|Context switching between processes is more expensive.||Context switching between threads of the same process is less expensive.|
|Processes don’t share memory with other processes.||Threads share memory with other threads of the same process.|
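The communication row of the comparison can be sketched in Python. Between threads, a message can be handed over as the very same object in memory. Between processes, it has to be serialized, copied through a pipe, and rebuilt on the other side. The function names and the echo-style child program below are assumptions made up for this example, and JSON is just one of many possible serialization formats.

```python
import json
import queue
import subprocess
import sys
import threading


def send_between_threads(message):
    """Hand an object to another thread; the receiver gets the very same object."""
    mailbox = queue.Queue()
    received = []
    receiver = threading.Thread(target=lambda: received.append(mailbox.get()))
    receiver.start()
    mailbox.put(message)
    receiver.join()
    return received[0]


# Child program (hypothetical): read JSON from stdin and echo it back on stdout.
ECHO_PROGRAM = "import sys, json; json.dump(json.load(sys.stdin), sys.stdout)"


def send_between_processes(message):
    """Send data through another process; it is serialized and copied via pipes."""
    completed = subprocess.run(
        [sys.executable, "-c", ECHO_PROGRAM],
        input=json.dumps(message),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    message = {"tab": 1, "urls": ["a", "b"]}
    print(send_between_threads(message) is message)    # same object: True
    print(send_between_processes(message) is message)  # a rebuilt copy: False
```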
What About Concurrency and Parallelism?
A question you might ask is whether processes or threads can run at the same time. The answer is: It depends. On a system with multiple processors or CPU cores (as is common with modern processors), multiple processes or threads can be executed in parallel. On a single processor, though, it is not possible to have processes or threads truly executing at the same time. In this case, the CPU is shared among running processes or threads using a scheduling algorithm that divides the CPU’s time and yields the illusion of parallel execution. The time given to each task is called a “time slice.” The switching back and forth between tasks happens so fast it is usually not perceptible. The term “parallelism” refers to genuine simultaneous execution, while “concurrency” refers to interleaving tasks in time to give the appearance of simultaneous execution.
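Time slicing on a single processor can be imitated with a small round-robin scheduler. This is only a simulation under simple assumptions (every unit of work takes the same time, and tasks never wait on anything); the names `task` and `round_robin` are invented for this sketch, and real operating system schedulers are considerably more sophisticated.

```python
from collections import deque


def task(name, steps):
    """A task that needs `steps` units of work and pauses after each unit."""
    for step in range(1, steps + 1):
        yield f"{name}{step}"


def round_robin(tasks, time_slice):
    """Run tasks on one simulated CPU, giving each `time_slice` units per turn."""
    ready = deque(tasks)
    timeline = []
    while ready:
        current = ready.popleft()
        for _ in range(time_slice):
            try:
                timeline.append(next(current))
            except StopIteration:
                break                  # task finished, so it leaves the queue
        else:
            ready.append(current)      # slice used up: go to the back of the queue
    return timeline


if __name__ == "__main__":
    print(round_robin([task("A", 3), task("B", 2)], time_slice=1))
    # prints ['A1', 'B1', 'A2', 'B2', 'A3']
```

Only one task is ever running at any instant, yet both make steady progress. That interleaving is concurrency without parallelism.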
Why Choose Process Over Thread, or Thread Over Process?
So, how would a programmer choose between a process and a thread when creating a program in which they want to execute multiple tasks at the same time? We’ve covered some of the differences above, but let’s look at a real world example with a program that many of us use, Google Chrome.
When Google was designing the Chrome browser, they needed to decide how to handle the many different tasks that needed computer, communications, and network resources at the same time. Each browser window or tab communicates with multiple servers on the internet to retrieve text, programs, graphics, audio, video, and other resources, and renders that data for display and interaction with the user. In addition, the browser can open many windows, each with many tasks.
Google chose a multi-process design for Chrome, and it was a calculated trade-off. Starting a new process for each browser window has a higher fixed cost in memory and resources than using threads, but Google was betting that the approach would end up with less memory bloat overall.
Using processes instead of threads also provides better memory usage when memory gets low. An inactive window is treated as a lower priority by the operating system and becomes eligible to be swapped to disk when memory is needed for other processes. That helps keep the user-visible windows more responsive. If the windows were threaded, it would be more difficult to separate the used and unused memory as cleanly, wasting both memory and performance.
The screen capture below shows the Google Chrome processes running on a MacBook Air with many tabs open. Some Chrome processes are using a fair amount of CPU time and resources, and some are using very little. You can see that each process also has many threads running as well.
The Activity Monitor or Task Manager on your system can be a valuable ally in fine-tuning your computer or troubleshooting problems. If your computer is running slowly, or a program or browser window isn’t responding for a while, you can check its status using the system monitor. Sometimes you’ll see a process marked as “Not Responding.” Try quitting that process and see if your system runs better. If an application is a memory hog, you might consider choosing a different application that will accomplish the same task.
Made It This Far?
We hope this “Tron”-like dive into the fascinating world of computer programs, processes, and threads has helped clear up some questions you might have had.
The next time your computer is running slowly or an application is acting up, you know your assignment. Fire up the system monitor and take a look under the hood to see what’s going on. You’re in charge now.
We love to hear from you.
Are you still confused? Have questions? If so, please let us know in the comments. And feel free to suggest topics for future blog posts.
Addendum — August 18, 2017
I’ve added the example below to illustrate how processes or threads, when properly used, are able to accomplish tasks more effectively. Backblaze recently released Backblaze Computer Backup Version 5.0, which doubles the number of threads available for backup on both Mac and PC (up to 20). On its default settings, our client app will now automatically evaluate what’s best given your environment and set the number of threads accordingly, but you have manual control to set the threads to any number you wish.
The screenshot below from the Macintosh’s Activity Monitor shows one system running 20 threads to upload data to the cloud. This number of threads won’t be optimum for all systems—in fact on some, it could actually slow down uploads. If in doubt, it’s best to leave the client on Automatic Threading, and let it decide what is best for your system.
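To experiment with how thread count affects this kind of workload, here is a minimal Python sketch. It is not how the Backblaze client is implemented; `upload_chunk` merely simulates a network transfer with a short sleep, and the function names, chunk count, and delay are all assumptions made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def upload_chunk(chunk_id):
    """Stand-in for a network upload: wait briefly, then report which chunk finished."""
    time.sleep(0.05)  # simulated network latency (an assumption, not a measurement)
    return chunk_id


def upload_all(chunk_ids, thread_count):
    """Upload every chunk using at most `thread_count` worker threads."""
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        return list(pool.map(upload_chunk, chunk_ids))


if __name__ == "__main__":
    chunks = range(20)
    for thread_count in (1, 4, 20):
        started = time.perf_counter()
        upload_all(chunks, thread_count)
        elapsed = time.perf_counter() - started
        print(f"{thread_count} threads: {elapsed:.2f} seconds")
```

In this simulation more threads always finish sooner because the "uploads" only wait. On a real system, bandwidth, disk speed, and CPU all set limits, which is why the best number of threads varies from one machine to another.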