Two-Dimensional (2D) Image Basics

Last updated on 2025-08-27

Estimated time: 30 minutes

Overview

Questions

What is a 2D image in the context of digital image processing?
What parameters would describe a 2D image?
How can we open and visualise a digital image?

Objectives

Describe how a 2D image is composed of pixels.
Explain how colour digital images are often composed of three separate channels (Red, Green, Blue).
Explain image size, dimensions, and compression, and their significance.
Demonstrate how to open and visualise 2D images from a general dataset.
Identify/extract the data type, size, dimensions, and compression type from an image.

Introduction

In this notebook, we will explore the fundamental structure and information contained within a two-dimensional (2D) image. We will also introduce an open-source toolkit for image processing: the Python programming language, along with the scikit-image (skimage) library. When combined with thoughtful experimental design, Python provides a powerful and flexible platform for addressing a wide range of image analysis questions.

Image Basics

Images displayed on electronic devices and screens, printed on paper, or processed by software are ultimately stored in computers as numerical abstractions: digital representations of the visual information we perceive from the real world. Before diving into image processing with Python, it is essential to understand the underlying principles of how images, these digital representations of visual information, work.

Import useful libraries for image processing

We need to import several libraries and modules to help us with image processing. The NumPy library provides numerical computation, such as array operations. The Matplotlib library is used for plotting and viewing images, and gives us access to tools for plotting image analytics such as histograms. The statement import matplotlib.pyplot as plt loads the pyplot module and gives it a shorter name, plt. Next, we import the skimage library, which is widely used for image processing; its skimage.io.imread() function loads our images. Finally, ipympl provides the interactive plotting widget that the %matplotlib widget command enables.

PYTHON

# Python libraries for learning and performing image processing
import numpy as np               # NumPy: numerical arrays, under the short name np
import matplotlib.pyplot as plt  # Matplotlib's pyplot plotting module, under the short name plt
import skimage                   # scikit-image (skimage): the image-processing library
import skimage.io                # skimage.io: reading and writing image files
import skimage.color             # skimage.color: converting between colour spaces
import ipympl                    # ipympl: the interactive backend used by %matplotlib widget

PYTHON

%matplotlib widget


Pixels

It is essential to recognise that images are stored as rectangular arrays of hundreds, thousands, or millions of discrete “picture elements,” commonly referred to as pixels. Each pixel can be thought of as a single square point of coloured light. For example, consider the following photograph taken with my camera and examine its pixels.

PYTHON

# Load an image: choose swans.jpg from the given data directory (folder)
image = skimage.io.imread(fname='../data/2D_data/swans.jpg')

PYTHON

# Display the image on a new figure (fig) and set of axes (ax)
fig, ax = plt.subplots()  # optionally: plt.subplots(dpi=100, figsize=(10, 6))

ax.imshow(image)

OUTPUT

To zoom in on the picture, click the zoom button (the square icon) in the toolbar on the left of the figure and draw a rectangle around the area you wish to enlarge. We can now see the little squares that make up the image. Zooming in further makes the squares appear larger on screen, but each one still represents a single, very small pixel. Note that each square in the enlarged image area, each pixel, is all one colour, but a pixel can have a different colour from its neighbours. Viewed from a distance, these pixels blend to form the image we see.
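The same blocky structure can be reproduced without the data file. The sketch below assumes only NumPy and Matplotlib and uses a small hypothetical array named `tiny`: it builds a 3 × 4 RGB image by hand and displays it, so that every array element is drawn as one flat-coloured square.

PYTHON

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical 3 x 4 RGB image: 3 rows, 4 columns, 3 colour channels (all black to start)
tiny = np.zeros((3, 4, 3), dtype=np.uint8)
# Each entry tiny[row, col] is one pixel holding an [R, G, B] triplet
tiny[0, 0] = [255, 0, 0]      # top-left pixel: red
tiny[1, 2] = [0, 255, 0]      # middle row, third column: green
tiny[2, 3] = [255, 255, 255]  # bottom-right pixel: white

fig, ax = plt.subplots()
# interpolation='nearest' draws every array element as one flat-coloured square
ax.imshow(tiny, interpolation='nearest')
```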

1. Explore the image and observe

1. The interactive figure provides a few useful features. By moving the cursor over the image, we notice that the bottom of the figure shows the x, y coordinates followed by a list of three numbers in square brackets. The coordinates give the pixel's position, and the three numbers give the values of the Red, Green and Blue components of that pixel. Additive mixing of these three channels produces the colour of the individual pixel.
2. The three values form a 1 x 3 array (the RGB triplet), and each number lies in the range from 0 (none of that primary colour) to 255 (the maximum amount).
3. The coordinate system is different from the usual Cartesian one.

2. Additive colour mixing

24-bit RGB colour

As we observed, the RGB model is an additive colour model, which means that the primary colours are mixed together to form other colours. Most frequently, the amount of each primary colour is represented as an integer in the closed range [0, 255], as seen in the example. Therefore, there are 256 discrete amounts of each primary colour that can be combined. The number 256 corresponds to the number of bits used to hold each channel value: \(2^{8} = 256\), so each channel needs 8 bits. A monochrome (single-channel, greyscale) 8-bit image likewise stores values in the range [0, 255]. Since an RGB image has three channels with 8 bits each (8 + 8 + 8 = 24), this is referred to as 24-bit colour depth.
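The arithmetic behind 8-bit channels and 24-bit colour can be checked directly in Python. This is a minimal sketch; the variable names are illustrative, and `np.iinfo` is used to show that NumPy's `uint8` type covers exactly the range [0, 255].

PYTHON

```python
import numpy as np

bits_per_channel = 8
levels_per_channel = 2 ** bits_per_channel       # 256 possible values per channel
bits_per_pixel = 3 * bits_per_channel            # 8 + 8 + 8 = 24 bits for R, G and B
distinct_colours = levels_per_channel ** 3       # 256 * 256 * 256 = 2**24 colours

# NumPy's unsigned 8-bit integer type spans exactly one channel's range
uint8_info = np.iinfo(np.uint8)

print(levels_per_channel, bits_per_pixel, distinct_colours)  # 256 24 16777216
print(uint8_info.min, uint8_info.max)                        # 0 255
```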

Any particular colour in the RGB model can be expressed by a triplet of integers in [0, 255], representing the red, green, and blue channels, respectively. A larger number in a channel means that more of that primary colour is present.

Red is represented by [255, 0, 0]
Green is represented by [0, 255, 0]
Blue is represented by [0, 0, 255]
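Additive mixing can be illustrated with these triplets. The sketch below assumes a hypothetical helper named `mix` that adds colours channel by channel and clips the result to [0, 255]. The clipping is our own choice, made because adding `uint8` values directly would wrap around past 255 instead of saturating.

PYTHON

```python
import numpy as np

red = np.array([255, 0, 0], dtype=np.uint8)
green = np.array([0, 255, 0], dtype=np.uint8)
blue = np.array([0, 0, 255], dtype=np.uint8)

def mix(*colours):
    """Hypothetical helper: add RGB triplets channel by channel, saturating at 255."""
    # Work in a wider integer type so the sum cannot overflow, then clip back to [0, 255]
    total = sum(c.astype(np.int32) for c in colours)
    return np.clip(total, 0, 255).astype(np.uint8)

yellow = mix(red, green)    # [255, 255, 0]
cyan = mix(green, blue)     # [0, 255, 255]
magenta = mix(red, blue)    # [255, 0, 255]
print(yellow, cyan, magenta)
```

Without the conversion to a wider type, `red + red` on `uint8` arrays would wrap around to [254, 0, 0] rather than staying at 255.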

Challenge

Challenge 1:

What does the pixel [255,255,255] represent?
What does the [0,0,0] represent?

Output

OUTPUT

- White
- Black

3. Coordinate system

When we process images, we can access, examine, and/or change the colour of any pixel we wish. To do this, we need a convention on how to access pixels individually, a way to assign each one a name or an address of some sort. The most common method of doing this, and the one we will use in our programs, is to assign a modified Cartesian coordinate system to the image. The coordinate system we usually see in mathematics has a horizontal x-axis and a vertical y-axis, like the following figure:

The modified coordinate system used for our images will have only positive coordinates, the origin will be in the upper left corner instead of the centre, and the y coordinate values will get larger as they go down instead of up, like the following figure:

Until you have worked with images for a while, the most common mistake you will make with coordinates is to forget that y coordinates get larger as they go down instead of up, as in a standard Cartesian coordinate system. Consequently, it may be helpful to think in terms of counting down rows (r) for the y-axis and across columns (c) for the x-axis. This can be especially helpful in cases where you need to transpose image viewer data provided in x,y format to y,x format. Thus, we will use cx and ry where appropriate to help bridge these two approaches.
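In NumPy terms, a position reported by an image viewer as (x, y) corresponds to the index `[y, x]`, that is `[row, column]`. The sketch below uses a small hypothetical 2 × 3 array named `img` in place of a real photograph.

PYTHON

```python
import numpy as np

# Hypothetical 2-row x 3-column RGB image: shape is (rows, columns, channels)
img = np.array([
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],        # row 0 (top): red, green, blue
    [[0, 0, 0], [128, 128, 128], [255, 255, 255]],  # row 1: black, grey, white
], dtype=np.uint8)

n_rows, n_cols, n_channels = img.shape   # 2, 3, 3: height (y), width (x), channels

# A viewer reports (x, y); NumPy indexes [row, column], i.e. [y, x]
cx, ry = 2, 0          # column 2, row 0: the top-right pixel
pixel = img[ry, cx]    # RGB triplet at that position: blue
print(n_rows, n_cols, pixel.tolist())   # 2 3 [0, 0, 255]
```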

4. The image is stored as an array of numbers (a matrix): the NumPy library

PYTHON

# Print the array of pixel values (NumPy abbreviates large arrays with ...)
print(image)

OUTPUT

The image is stored as a NumPy array, so to work with it we use the NumPy library, which we imported as np.

PYTHON

# Use type() to find out what kind of object the image is
type(image)

OUTPUT

numpy.ndarray

PYTHON

# Read the image again, this time with Matplotlib.
# plt.imread returns the pixel values as an array; we name it array
# to distinguish the numbers from the image we visualise.
array = plt.imread('../data/2D_data/swans.jpg')

PYTHON

# To check the data type of the values in the array, use the .dtype attribute
array.dtype

OUTPUT

dtype('uint8')

PYTHON

# To check the shape of the array, use the .shape attribute
array.shape

OUTPUT

(2448, 3264, 3)

We see that the image is an array of 2448 rows by 3264 columns, and the final number, 3, indicates the Red, Green and Blue channels that are superimposed to form the colourful image.

For further diagnostics of the properties of the image, we use the attributes .itemsize (bytes per element), .size (number of elements) and .nbytes (total number of bytes).

PYTHON

array.itemsize

OUTPUT

1

PYTHON

array.size

OUTPUT

23970816

PYTHON

array.nbytes

OUTPUT

23970816
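These attributes are linked: `.size` is the product of the entries in `.shape`, and `.nbytes` is `.size` multiplied by `.itemsize`. The sketch below checks this on a hypothetical all-zero array named `demo` with the same shape and `uint8` data type as the swans.jpg array, so it runs without the data file.

PYTHON

```python
import numpy as np

# Stand-in array: same shape and dtype as swans.jpg, filled with zeros
demo = np.zeros((2448, 3264, 3), dtype=np.uint8)

rows, cols, channels = demo.shape
n_pixels = rows * cols                       # one pixel per (row, column) position
n_elements = rows * cols * channels          # one element per channel of every pixel

print(n_pixels, demo.size, demo.nbytes)          # 7990272 23970816 23970816
print(demo.size == n_elements)                   # True
print(demo.nbytes == demo.size * demo.itemsize)  # True: itemsize is 1 byte for uint8
```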

We can display all of these properties neatly by passing them to the print function.

PYTHON

# How to obtain array information.
# In each print call, the quoted text is a string label, separated by a comma from the computed value.
print('Data type:    ', array.dtype)
print('Shape:        ', array.shape)
print('Element size: ', array.itemsize)
print('Num. elements:', array.size)
print('Num. bytes:   ', array.nbytes)

OUTPUT

Data type:     uint8
Shape:         (2448, 3264, 3)
Element size:  1
Num. elements: 23970816
Num. bytes:    23970816

Functions

In Python, we can define a function to perform repetitive tasks, allowing us to execute the same block of code multiple times without rewriting it.

PYTHON

# A function definition starts with def, followed by the function name (here describe_array) and its parameters
def describe_array(array):
    """Print the data type, shape, element size, number of elements and number of bytes of a NumPy array."""
    # Each quoted string is a label, printed next to the value computed from the array
    print('Data type:    ', array.dtype)
    print('Shape:        ', array.shape)
    print('Element size: ', array.itemsize)
    print('Num. elements:', array.size)
    print('Num. bytes:   ', array.nbytes)

Now we can call describe_array on our image, which we stored in the variable array.

PYTHON

describe_array(array)

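OUTPUT

Data type:     uint8
Shape:         (2448, 3264, 3)
Element size:  1
Num. elements: 23970816
Num. bytes:    23970816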

Challenge

Challenge 2:

Import the image swans_1.png and name it array_1. Explore its type, shape, size and number of bytes. How does it compare with the image swans.jpg (array)? What do you observe?

Output

OUTPUT

array_1 = plt.imread('../data/2D_data/swans_1.png')
describe_array(array_1)

Observations:

1. Float values (float32) instead of 8-bit integers (uint8).
1. Four channels instead of three.
1. Element size of 4 bytes, bigger than before (1 byte).
1. More elements: 2448 x 3264 x 4 = 31961088, compared with 23970816.
1. Many more bytes: 127844352 compared with 23970816 (124,848 KiB vs 23,409 KiB).
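One plausible source of these differences is how Matplotlib handles PNG files: `plt.imread` returns PNG data as `float32` values scaled to [0, 1], and `plt.imsave` writes PNGs with an extra alpha (transparency) channel. We cannot confirm how swans_1.png was created, so the sketch below only reproduces the effect on a small hypothetical random image written to a temporary file.

PYTHON

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical small uint8 RGB image standing in for a photograph
rgb = np.random.default_rng(0).integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    png_path = os.path.join(tmp, 'demo.png')
    plt.imsave(png_path, rgb)         # Matplotlib writes the PNG as RGBA (adds an alpha channel)
    from_png = plt.imread(png_path)   # PNG data comes back as float32 in [0, 1]

print(from_png.dtype, from_png.shape, from_png.itemsize)   # float32 (4, 5, 4) 4

# Undo the scaling and drop the alpha channel to recover the original uint8 RGB values
restored = np.round(from_png[..., :3] * 255).astype(np.uint8)
print(np.array_equal(restored, rgb))   # True: PNG compression is lossless
```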

5. Image Compression and various image formats

Before exploring additional image formats, it is helpful to have a basic understanding of image compression. This includes familiarity with bits, bytes, and how these units are used to represent storage capacity in computing. If you’re already comfortable with these concepts, feel free to skip this section.

5.1 Bits and bytes

Before we talk about images, we first need to understand how numbers are stored in a modern digital computer. When we think of a number, we use a decimal, or base-10, place-value number system. For example, a number like 459 is \(4 \times 10^{2} + 5 \times 10^{1} + 9 \times 10^{0}\). Each digit in the number is multiplied by a power of 10, based on where it occurs, and there are 10 digits that can occur in each position (0, 1, 2, 3, 4, 5, 6, 7, 8, 9).

In principle, computers could be constructed to represent numbers in exactly the same way. However, the electronic circuits inside a computer are much easier to construct if we restrict the numeric base to two instead of 10. (It is easier for circuitry to tell the difference between two voltage levels than to differentiate among 10 levels.) So, values in a computer are stored using a binary, or base-2, place-value number system.

In this system, each symbol in a number is called a bit instead of a digit, and there are only two values for each bit (0 and 1). Consider the four-bit binary number 1101. Using the same kind of place-value expansion as we did above for 459, we see that \(1101_{2} = 1 \times 2^{3} + 1 \times 2^{2} + 0 \times 2^{1} + 1 \times 2^{0} = 8 + 4 + 0 + 1 = 13\) in decimal.
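Python's built-in functions convert between decimal and binary notation, which is a quick way to check expansions like the one above. A minimal sketch:

PYTHON

```python
# Parse a base-2 string into an integer
value = int('1101', 2)
# Format an integer in base 2: with the '0b' prefix, or padded to one 8-bit byte
as_binary = bin(13)
as_byte = format(13, '08b')
# The place-value expansion written out by hand
expanded = 1 * 2**3 + 1 * 2**2 + 0 * 2**1 + 1 * 2**0

print(value, as_binary, as_byte, expanded)  # 13 0b1101 00001101 13
```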

Internally, the smallest unit of memory a computer normally works with is a group of eight bits, called a byte. The amount of memory (RAM) and drive space our computers have is quantified in units such as mebibytes (MiB), gibibytes (GiB), and tebibytes (TiB). These binary units are often loosely called megabytes, gigabytes and terabytes, although strictly MB, GB and TB denote powers of 1000.

The following table provides more formal definitions for these terms.

Unit	Abbreviation	Size
Kibibyte (kilobyte)	KiB	1024 bytes
Mebibyte (megabyte)	MiB	1024 KiB
Gibibyte (gigabyte)	GiB	1024 MiB
Tebibyte (terabyte)	TiB	1024 GiB
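Converting a byte count into these units is a matter of repeated division by 1024. The helper `to_binary_units` below is hypothetical; the byte counts are the .nbytes values quoted in Challenge 2. Note that these are the sizes of the decoded arrays in memory, not the sizes of the files on disk.

PYTHON

```python
def to_binary_units(n_bytes):
    """Hypothetical helper: express a byte count in KiB, MiB and GiB (powers of 1024)."""
    return {
        'KiB': n_bytes / 1024,
        'MiB': n_bytes / 1024**2,
        'GiB': n_bytes / 1024**3,
    }

# In-memory sizes of the uint8 swans.jpg array and the float32 swans_1.png array
for name, n in [('swans.jpg array', 23970816), ('swans_1.png array', 127844352)]:
    units = to_binary_units(n)
    print(f"{name}: {units['KiB']:.0f} KiB = {units['MiB']:.2f} MiB")
# swans.jpg array: 23409 KiB = 22.86 MiB
# swans_1.png array: 124848 KiB = 121.92 MiB
```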

5.2 A few common image formats

1. Device-Independent Bitmap (BMP) file format. BMP files store raster graphics images as long sequences of binary-encoded numbers that specify the colour of each pixel in the image. Since computer files are one-dimensional structures, the pixel colours are stored one row at a time: one complete row of pixels, then the next row, and so on. (In a standard BMP file the rows are stored bottom-up, so the row with the largest y-coordinate comes first.) Depending on how it was created, a BMP image might have 8-bit, 16-bit, or 24-bit colour depth.
  
  24-bit BMP images have a relatively simple file format, can be viewed and loaded across a wide variety of operating systems, and have high quality. However, BMP images are usually not compressed, resulting in very large file sizes at any useful image resolution.
  
  The idea of image compression is important to us for two reasons: first, compressed images have smaller file sizes and are therefore easier to store and transmit; and second, compressed images may not have as much detail as their uncompressed counterparts, so our programs may not be able to detect some important aspect if we are working with compressed images. Since compression is important to us, it is worth keeping in mind for each of the remaining formats.
1. Joint Photographic Experts Group (JPEG). These images are perhaps the most commonly encountered digital images today. JPEG uses lossy compression, and the degree of compression can be tuned to your liking. It supports 24-bit colour depth, and since the format is so widely used, JPEG images can be viewed and manipulated easily on all computing platforms. Note that the larger number of bytes of swans_1.png compared with swans.jpg in Challenge 2 is not caused by compression: .nbytes measures the decoded array in memory, which depends only on its shape and data type (four float32 channels versus three uint8 channels), whereas compression affects the size of the file on disk.
1. Portable Network Graphics (PNG). PNG images are well suited for storing diagrams. PNG uses lossless compression and is hence often used in web applications for non-photographic images. The format can store RGB and plain luminance (single channel, without an associated colour) data, among others. Image data is stored row-wise and then, per row, a simple filter, like taking the difference of adjacent pixels, can be applied to increase the compressibility of the data. The filtered data is then compressed in the next step and written out to the disk.
1. Tag Image File Format (TIFF). TIFF images are popular with publishers, graphic designers, and photographers. TIFF images can be uncompressed, or compressed using either lossless or lossy compression schemes, depending on the settings used, and so TIFF images seem to have the benefits of both the BMP and JPEG formats. The main disadvantage of TIFF images (other than the size of images in the uncompressed version of the format) is that they are not universally readable by image viewing and manipulation software.
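The practical difference between lossless and lossy compression can be seen by saving the same image in both formats and reading it back. The sketch below assumes `skimage.io.imsave` can write PNG and JPEG files (it delegates to the imageio package) and uses a hypothetical random-noise image in a temporary folder; noise is hard to compress, which makes JPEG's changes to pixel values easy to detect.

PYTHON

```python
import os
import tempfile

import numpy as np
import skimage.io

# Hypothetical test image: random noise, 64 x 64 pixels, 3 uint8 channels
rng = np.random.default_rng(42)
original = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for ext in ('png', 'jpg'):
        path = os.path.join(tmp, f'noise.{ext}')
        skimage.io.imsave(path, original)      # write the same pixels in each format
        restored = skimage.io.imread(path)     # read them back as a NumPy array
        results[ext] = {
            'file_bytes': os.path.getsize(path),                 # size on disk
            'identical': np.array_equal(restored, original),    # were all pixels preserved?
        }

for ext, r in results.items():
    print(ext, r)
```

Expect `identical` to be `True` for PNG and `False` for JPEG; the exact file sizes depend on the image content and on the JPEG quality setting.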

5.3 Metadata

If you look at the images you acquire with your phone camera, you will find additional textual information called metadata. Metadata holds information about the image itself, such as when and where the image was captured, what type of camera was used and with what settings, etc. Digital medical images also carry metadata, describing the acquisition device, the patient, the acquisition itself, the image dimensions and resolution, and so on. We normally don't see this metadata when we view a 2D image, but programs exist that allow us to view it if we wish. For 3D images, we will see that metadata is accessible in the headers of DICOM and NIfTI files. At this stage, the important thing to be aware of is that you cannot rely on the metadata of an image being preserved when you use software to process that image. The image processing library we will use in the rest of this lesson, skimage, does not include metadata when saving new images. So remember: if metadata is important to you, take precautions to always preserve the original files. Although skimage does not provide a way to display or explore the metadata associated with an image (and consequently cannot preserve that metadata when modifying an image file), other software exists that can help you to do so, e.g. Fiji and ImageMagick. We recommend you explore these options if you need to work with the metadata of your images.
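As an illustration of metadata being dropped, the sketch below uses the Pillow library, a swapped-in tool not otherwise used in this lesson (it is installed as a dependency of scikit-image). It attaches a hypothetical camera make, "ExampleCam", to a JPEG as EXIF metadata, then reads and re-saves the pixels with skimage. Tag 271 is the standard EXIF "Make" tag.

PYTHON

```python
import os
import tempfile

import numpy as np
import skimage.io
from PIL import Image

MAKE_TAG = 271  # standard EXIF tag ID for the camera manufacturer ("Make")

# Hypothetical 8 x 8 RGB image
pixels = np.random.default_rng(1).integers(0, 256, size=(8, 8, 3), dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    # Write a JPEG with Pillow and attach the camera make as EXIF metadata
    tagged_path = os.path.join(tmp, 'tagged.jpg')
    exif = Image.Exif()
    exif[MAKE_TAG] = 'ExampleCam'
    Image.fromarray(pixels).save(tagged_path, exif=exif)
    with Image.open(tagged_path) as im:
        original_make = im.getexif().get(MAKE_TAG)

    # Round-trip through skimage: only the pixel array is read and written
    resaved_path = os.path.join(tmp, 'resaved.jpg')
    skimage.io.imsave(resaved_path, skimage.io.imread(tagged_path))
    with Image.open(resaved_path) as im:
        resaved_make = im.getexif().get(MAKE_TAG)

print(original_make, resaved_make)   # expected: ExampleCam None
```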

Challenge

Challenge 3: Can you compare images of different formats?

Open any .png image from the dataset and visualise it.
Print the size of the PNG image.
Save the image as a JPEG file in the same folder.
Open the JPEG image and visualise it.
Do you observe any visual differences between the PNG and the JPEG?
Print the size of the JPEG image.
Are there any differences between the file sizes of the PNG and JPEG formats?

Output

OUTPUT

The file size of the JPEG is 233 KB.


Key Points

2D digital images are matrices and are represented as rectangular arrays of square pixels.
In Python, images are described with a left-hand coordinate system, with the origin in the upper left corner, the x-axis running to the right, and the y-axis running down. Some learners may prefer to think in terms of counting down rows for the y-axis and across columns for the x-axis, so this lesson allows for both approaches.
Most frequently, digital images use an additive RGB model (Red, Green, Blue), with eight bits for the red, green, and blue channels.
skimage images are stored as multi-dimensional NumPy arrays.
Lossless compression retains all the details in an image, but lossy compression results in loss of some of the original image detail.
Many images, depending on the device (e.g. camera) that acquired them, carry additional textual information called metadata; we will see more of this with medical images.