Special Ops : Binary Data and Operators

This chapter is optional. If you expect to be working with individual bits, these operators are very helpful. Otherwise, if you don’t expect to be working with anything other than plain-old decimal numbers, you can skip this chapter.

While we write numbers using decimal digits, in base 10, computers don’t really work that way internally. We touched on the computer’s view in Octal and Hexadecimal – Counting by 8’s or 16’s. Internally, the computer works in binary, base 2, which makes the circuitry very simple and very fast. One of the benefits of using Python is that we don’t need to spend much time on the internals, so this chapter is optional.

We’ll take a close look at data in Bits and Bytes; this will provide some justification for having base 8 and base 16 numbers. We’ll add some functions to see base 8 and base 16 in Different Bases and Representations. Then we’ll look at the operators for working with individual bits in Operators for Bit Manipulation.

Bits and Bytes

The special operators that we’re going to cover in this chapter work on individual bits. First, we’ll have to look at what this really means. Then we can look at what the operators do to those things called bits.

A bit is a “binary digit”. The concept of a bit closely parallels the concept of a decimal digit, with one important difference: there are only two binary digits (0 and 1), but there are 10 decimal digits (0 through 9).

Decimal Numbers. Our decimal numbers are a sequence of digits using base 10. Each decimal digit’s place value is a power of 10. We have the 1,000’s place, the 100’s place, the 10’s place and the 1’s place. A number like 2185 is 2\times1000 + 1\times100 + 8\times10 + 5.

Binary Numbers. Binary numbers are a sequence of binary digits using base 2. Each bit’s place value in the number is a power of 2. We have the 256’s place, the 128’s place, the 64’s place, the 32’s place, the 16’s place, the 8’s, 4’s, 2’s and the 1’s place, with larger powers of 2 beyond those. Older versions of Python don’t let us write binary numbers directly (Python 2.6 and later accept a leading 0b), so we’ll show them as a series of bits, like this: 1-0-0-0-1-0-0-0-1-0-0-1. This has a 1 in the 2048’s place, a 1 in the 128’s place, a 1 in the 8’s place, and a 1 in the 1’s place, which is 2185.
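The place-value idea can be sketched in a few lines of Python. The function name from_digits is our own invention for illustration, not something built into Python, and it assumes the digits are supplied left to right as ordinary integers.

```python
def from_digits(digits, base):
    """Combine a left-to-right sequence of digits into a single number."""
    total = 0
    for digit in digits:
        # Moving one place to the left multiplies everything so far by the base.
        total = total * base + digit
    return total

# The decimal digits 2-1-8-5 and the twelve bits shown above name the same number.
decimal_value = from_digits([2, 1, 8, 5], 10)
binary_value = from_digits([1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1], 2)
print(decimal_value)
print(binary_value)
```

Both lines print 2185. The same function works for any base, as long as each digit is smaller than the base.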

Octal Numbers. Octal numbers use base 8. In Python, we begin octal numbers with a leading zero. Each octal digit’s place value is a power of 8. We have the 512’s place, the 64’s place, the 8’s place and the 1’s place. A number like 04211 is 4 \times 512 + 2 \times 64 + 1 \times 8 + 1. This has a value of 2185.

Each group of three bits forms an octal digit. This saves us from writing out all those bits in detail. Instead, we can summarize them.

Binary: 1-0-0  0-1-0  0-0-1  0-0-1
Octal:    4      2      1      1

Hexadecimal Numbers. Hexadecimal numbers use base 16. In Python, we begin hexadecimal numbers with a leading 0x. Since we only have 10 digits, and we need 16 digits, we’ll borrow the letters a, b, c, d, e and f to be the extra digits. Each hexadecimal digit’s place value is a power of 16. We have the 4096’s place, the 256’s place, the 16’s place and the 1’s place. A number like 0x8a9 is 8 \times 256 + 10 \times 16 + 9, which has a value of 2217.

Each group of four bits forms a hexadecimal digit. This saves us from writing out all those bits in detail. Instead, we can summarize them.

Binary:      1-0-0-0  1-0-1-0  1-0-0-1
Hexadecimal:    8        a        9
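If you’d like to see these groupings for other numbers, here is a small sketch. The helper name grouped_bits is hypothetical (ours, not Python’s), and it relies on the built-in format() function with the 'b' code, which is an assumption about your Python version (2.6 or later).

```python
def grouped_bits(number, group_size):
    """Return the bits of a non-negative number, split into groups from the right."""
    bits = format(number, 'b')
    # Pad on the left with zeros so the length is a multiple of group_size.
    padding = (-len(bits)) % group_size
    bits = '0' * padding + bits
    groups = [bits[i:i + group_size] for i in range(0, len(bits), group_size)]
    return ' '.join(groups)

print(grouped_bits(0x8a9, 4))   # 1000 1010 1001, one group per hexadecimal digit
print(grouped_bits(2185, 3))    # 100 010 001 001, one group per octal digit
```

Reading each group as a small number gives back the hexadecimal digits 8, a, 9 and the octal digits 4, 2, 1, 1.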

Bytes. A byte is 8 bits. That means that a byte contains bits with place values of 128, 64, 32, 16, 8, 4, 2, 1. If we set all of these bits to 1, we get a value of 255. A byte has 256 distinct values, 0 through 255. Computer memory is addressed at the individual byte level; that’s why you buy memory in units measured in megabytes or gigabytes.

In addition to small numbers, a single byte can store a single character encoded in ASCII. It takes as many as four bytes to store characters encoded with Unicode.
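Here is a rough check of these byte facts. The choice of UTF-8 as the Unicode encoding, and the particular characters, are our assumptions for the sake of an example; other encodings use different byte counts.

```python
# All eight bits of a byte set to 1: add up the place values.
all_ones = 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1

# Number of distinct values one byte can hold.
distinct_values = 2 ** 8

# Encode two characters and count the bytes each one needs.
# UTF-8 is an assumed encoding, chosen only for illustration.
ascii_bytes = u'A'.encode('ascii')
unicode_bytes = u'\U0001F600'.encode('utf-8')

print(all_ones)            # 255
print(distinct_values)     # 256
print(len(ascii_bytes))    # 1
print(len(unicode_bytes))  # 4
```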

For this chapter, we’ll treat an integer as having 4 bytes, which is 32 bits; on some computers a plain integer has more. In looking at the special operators, we’ll look at them using integer values. Python can work with individual bytes, but it does this by unpacking a byte’s value and saving it in a full-sized integer.

Different Bases and Representations

In Octal and Hexadecimal – Counting by 8’s or 16’s we saw that Python will accept base 8 or base 16 (octal or hexadecimal) numbers. We begin octal numbers with 0, and use digits 0 through 7. We begin a hexadecimal number with 0x and use digits 0 through 9 and a through f.

Python normally answers us in decimal. How can we ask Python to answer in octal or hexadecimal instead?

The hex() function converts its argument to a hexadecimal (base 16) string. A string is used because additional digits are needed beyond 0 through 9; a-f are pressed into service. A leading 0x is placed on the string as a reminder that this is hexadecimal. Here are some examples:

>>> hex(684)
'0x2ac'
>>> hex(1023)
'0x3ff'
>>> 0xffcc33
16763955
>>> hex(_)
'0xffcc33'

Note that the result of the hex() function is technically a string. An ordinary number would be presented as a decimal value, and couldn’t contain the extra hexadecimal digits. That’s why there are apostrophes in our output.

The oct() function converts its argument to an octal (base 8) string. A leading 0 is placed on the string as a reminder that this is octal not decimal. Here are some examples:

>>> oct(512)
'01000'
>>> oct(509)
'0775'

Here are the formal definitions.

hex(number) → string
Creates a hexadecimal string representation of number.
oct(number) → string
Creates an octal string representation of number.
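As an aside that goes slightly beyond hex() and oct(): the built-in format() function can produce just the digits, without the leading reminder. This is a sketch assuming Python 2.6 or later. It is handy because the prefix that oct() adds differs between Python versions, while the digits themselves do not.

```python
# 'x' asks for hexadecimal digits, 'o' for octal digits, 'b' for binary digits.
hex_digits = format(684, 'x')
octal_digits = format(509, 'o')
binary_digits = format(21, 'b')

print(hex_digits)     # 2ac
print(octal_digits)   # 775
print(binary_digits)  # 10101
```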

More Hexadecimal and Octal tools. The hex() and oct() functions make a number into a specially-formatted string. The hex() function creates a string using the hexadecimal digit characters. The oct() function uses the octal digits. There is a function which goes the other way: it can convert strings of digit characters into proper numbers so we can do arithmetic.

The int() function has two forms. The int(x) form converts a decimal string, x, to an integer. For example int('25') is 25. The int(x,b) form converts a string, x, in base b to an integer.

In case you don’t recall how this works, remember that in the number 1985, we’re implicitly computing 1*10**3 + 9*10**2 + 8*10 + 5. Each digit has a place value that is a power of some number. That number is the “base” for the numbers we’re writing. Python assumes that a string of digits is decimal. A string of digits which begins with 0 is in base 8. A string of digits which begins with 0x is in base 16.

Here are some examples of converting strings that are in other bases to good old base 10 numbers.

>>> int('010101',2)
21
>>> int('321',4)
57
>>> int('2ac',16)
684

In base 2, the place values are 32, 16, 8, 4, 2, 1. The string 010101 is evaluated as 1\times16 + 1\times4 + 1 = 21.

In base 4, the place values are 16, 4 and 1. The string 321 is evaluated as 3\times16 + 2\times4 + 1 = 57.

Recall from Octal and Hexadecimal – Counting by 8’s or 16’s that we have to press additional symbols into service to represent base 16 numbers. We use the letters a-f for the digits after 9. The place values are 256, 16, 1; the string 2ac is evaluated as 2\times256 + 10\times16 + 12 = 684.

While it seems so small, it’s really important that numbers in another base are written using strings. To Python, 123 is a decimal number. '123' is a string, and could mean anything. When you say int('123',4), you’re telling Python that the string '123' should be interpreted as a base 4 number, which maps to 27 in base 10 notation. On the other hand, when you say int('123'), you’re telling Python that the string '123' should be interpreted as a base 10 number, which is 123.

int(object[, base]) → number
Generates an integer from the value object. If object is a string, and base is supplied, object must be a proper number in the given base. If base is omitted, and object is a string, it must be decimal.
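To tie hex() and int() together, here is a small round-trip sketch. The variable names are ours, chosen only for illustration.

```python
# Round trip: number -> hexadecimal string -> number.
original = 684
as_hex = hex(original)          # the string '0x2ac'
back_again = int(as_hex, 16)    # the 0x prefix is accepted when the base is 16

# The same digit string names different numbers in different bases.
in_base_4 = int('123', 4)       # 1*16 + 2*4 + 3
in_base_10 = int('123')         # ordinary decimal
in_base_16 = int('123', 16)     # 1*256 + 2*16 + 3

print(back_again)   # 684
print(in_base_4)    # 27
print(in_base_10)   # 123
print(in_base_16)   # 291
```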

Operators for Bit Manipulation

We’ve already seen the usual math operators: +, -, *, /, %, **; as well as a large collection of mathematical functions. While these do a lot, there are still more operators available to us. In this section, we’ll look at operators that directly manipulate the binary representation of numbers. The inhabitants of Binome (see Binary Codes) are more comfortable with these operators than we are.

We won’t wait for the FAQ’s to explain why we even have these operators. These operators exist to provide us with a view of the real underlying processor. Consequently, they are used for some rather specialized purposes. We present them because they can help newbies get a better grip on what a computer really is.

In this section, we’ll see a lot of hexadecimal and octal numbers. This is because base 16 and base 8 are also nifty short-hand notations for lengthy base 2 numbers. We looked at hexadecimal and octal numbers in Different Bases and Representations; now we’ll look at the bit-manipulation operators.

There are some other operators available, but, strictly speaking, they’re not arithmetic operators, they’re logic operations. We’ll return to them in Processing Only When Necessary : The if Statement.

Precedence. We know one basic precedence rule that applies to multiplication and addition: Python does multiplication first, and addition later. The second rule is that ()’s group things, which can change the precedence rules. 2*3+4 is 10, but 2*(3+4) is 14.

Where do these special operators fit? Are they more important than multiplication? Less important than addition? There isn’t a simple rule. Consequently, you’ll often need to use ()’s to make sure things work out the way you want.

The ~ operator

The unary ~ operator flips all the bits in a plain or long integer. 1’s become 0’s and 0’s become 1’s. Note that this has surprising consequences when the resulting bits are shown as a decimal number.

>>> ~0x12345678
-305419897
>>> hex(~0x12345678)
'-0x12345679'

What makes this murky is the way Python interprets the number as having a sign. The computer hardware uses a very clever trick to handle signed numbers. First, let’s visualize the unsigned, binary number line; it has 4 billion positions. At the left we have all bits set to zero. In the middle we have a value where the 2-billionth place is 1 and all other bits are zero. At the right we have all bits set to one.

[Figure: The Basic Number Line]

Now, let’s redraw the number line with positive and negative signs. Above the line, we put the signed values that Python will show us. Below the line, we put the internal codes used. The positive numbers are what we expected: 0x00000000 is the full 32-bit value for zero, 1 is 0x00000001; no surprise there. Below the 2 billion, we put 0x7fffffff. That’s the full 32-bit value for positive 2 billion (try it in Python and see.) Below the -2 billion, we put 0x80000000, the full 32-bit value for -2 billion. Below the -1, we put 0xffffffff.

[Figure: Encoding Signs On The Number Line]

This works very nicely. Let’s start with -2 (0xfffffffe). We add 1 to this and get -1 (0xffffffff), just what we want. We add 1 to that and get 0x00000000, and we have to carry the 1 into the next place value. However, there is no next place value, so the 1 is discarded, and we have a good-old zero.

This technique is called 2’s complement. Consequently, the ~ operation is mathematically equivalent to adding 1 and switching the number’s sign between positive and negative; that is, ~x has the same value as -(x+1).
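We can peek at these 32-bit codes from inside Python. This is only a sketch: the name bits32 is hypothetical, and it leans on the & operator (described later in this chapter) to keep just the right-most 32 bits, because Python’s own integers are not limited to 32 bits.

```python
MASK32 = 0xffffffff   # thirty-two 1 bits

def bits32(number):
    """Show the 32-bit 2's complement code for a number as 8 hex digits."""
    return format(number & MASK32, '08x')

print(bits32(-2))           # fffffffe
print(bits32(-1))           # ffffffff
print(bits32(-1 + 1))       # 00000000
print(bits32(0x12345678))   # 12345678
print(bits32(~0x12345678))  # edcba987, every hex digit flipped

# Flipping the bits is the same as adding 1 and switching the sign.
for x in range(-5, 6):
    assert ~x == -(x + 1)
```

Comparing the last two printed lines digit by digit shows the flip: 1 becomes e, 2 becomes d, and so on.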

This operator has the same very high precedence as the ordinary negation operation, -. Try the following to see what happens. First, what’s the value of -5+4? Now, add the two possible ()’s and see which result is the same: (-5)+4 and -(5+4). The one that produces the same result as -5+4 reveals which way Python performs the operations.

Here are the results for ordinary negation mixed with addition.

>>> -5+4
-1
>>> -(5+4)
-9
>>> (-5)+4
-1

The & operator

The binary & operator returns 1-bits everywhere that the two input bits are both 1. Each result bit depends on one input bit and the other input bit both being 1. The following example shows all four combinations of bits that work with the & operator.

>>> 0&0, 1&0, 1&1, 0&1
(0, 0, 1, 0)

Here’s the same kind of example, combining sequences of bits. This takes a bit of conversion to base 2 to understand what’s going on.

>>> 3 & 5
1

The number 3, in base 2, is 0011. The number 5 is 0101. Let’s match up the bits from left to right:

  0 0 1 1
& 0 1 0 1
  -------
  0 0 0 1

This is a very low-priority operator, and almost always needs parentheses when used in an expression with other operators. Here are some examples that show you how & and + combine.

>>> 3&2+3
1
>>> 3&(2+3)
1
>>> (3&2)+3
5
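A common use of & is to pick out particular bits with a “mask”: a number that has 1’s only in the positions we care about. Here is a minimal sketch; the function name is_bit_set is our own, not a Python built-in.

```python
def is_bit_set(number, position):
    """True if the bit with place value 2**position is 1 in number."""
    mask = 2 ** position
    return (number & mask) != 0

# 5 is 0-1-0-1 in binary: the 1's place and the 4's place are set.
print(is_bit_set(5, 0))   # True
print(is_bit_set(5, 1))   # False
print(is_bit_set(5, 2))   # True

# Keep only the right-most four bits of 0xa9 (1-0-1-0 1-0-0-1).
low_four_bits = 0xa9 & 0x0f
print(low_four_bits)      # 9
```

Note the parentheses around number & mask: without them, the comparison would be done before the &.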

The ^ operator

The binary ^ operator returns a 1-bit if one of the two inputs is 1 but not both. This is sometimes called the exclusive or operation to distinguish it from the inclusive or. Some people write “and/or” to emphasize the inclusive sense of or. They write “either-or” to emphasize the exclusive sense of or.

>>> 3^5
6

Let’s look at the individual bits.

  0 0 1 1
^ 0 1 0 1
  -------
  0 1 1 0

Which is the binary representation of the number 6.

This is a very low-priority operator; be sure to parenthesize your expression correctly.
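One handy property of ^ is that it toggles bits: applying the same mask twice gets us back where we started. The values below are arbitrary, picked only to make the pattern easy to see.

```python
pattern = 0x0f0f
mask = 0x00ff

flipped = pattern ^ mask    # bits under the mask's 1's are toggled
restored = flipped ^ mask   # toggling again undoes the change

print(hex(flipped))    # 0xff0
print(hex(restored))   # 0xf0f

# Precedence: + is done before ^ unless we add parentheses.
print(3 ^ 5 + 1)       # 5, because this is 3 ^ 6
print((3 ^ 5) + 1)     # 7
```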

The | operator

The binary | operator returns a 1-bit if either of the two inputs is 1. This is sometimes called the inclusive or to distinguish it from the exclusive or operator.

>>> 3|5
7

Let’s look at the individual bits.

  0 0 1 1
| 0 1 0 1
  -------
  0 1 1 1

Which is the binary representation of the number 7.

This is a very low-priority operator, and almost always needs parentheses when used in an expression with other operators. When we combine &’s and |’s we have to be sure we’ve grouped them properly. Here’s the kind of thing that you’ll sometimes see in programs that build up specific patterns of bits.

>>> 3&0x1f | 0x80 | 0x100
387
>>> hex(_)
'0x183'

Let’s look at this in a little bit of detail. Our first expression has two or operations; they’re the lowest priority operators. The first or operation combines 3&0x1f with 0x80. So, Python does the following steps to evaluate this expression.

  1. Calculate the and of 3 and 0x1f. This is 3 (try it and see). You can work it out by hand if you know that 3 is 0-0-0-1-1 in binary and 0x1f is 1-1-1-1-1.
  2. Calculate the or of the previous result (3) and 0x80.
  3. Calculate the or of the previous result (0x83) and 0x100. This has the decimal value of 387.
  4. Calculate the hex string for the previous result, using the _ short-hand for the previously printed result. This shows that the hex value is 0x183, what we expected.
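The steps above can be spelled out one at a time. The names step1, step2 and step3 are ours; they simply hold each intermediate result so we can compare against the one-line expression.

```python
step1 = 3 & 0x1f        # the and is done first
step2 = step1 | 0x80    # then the first or
step3 = step2 | 0x100   # then the second or

print(step1)            # 3
print(hex(step2))       # 0x83
print(hex(step3))       # 0x183
print(step3)            # 387

# The one-line form gives the same answer.
one_line = 3 & 0x1f | 0x80 | 0x100
print(one_line == step3)   # True

# Grouping the or's first gives something quite different.
regrouped = 3 & (0x1f | 0x80 | 0x100)
print(regrouped)        # 3
```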

The << Operator

The << is the left-shift operator. The left argument is the bit pattern to be shifted, the right argument is the number of bits. This is mathematically equivalent to multiplying by a power of two, and it is an operation the underlying processor can do very quickly. Shifting left 3 positions, for example, multiplies the number by 8.

This operator is higher priority than &, ^ and |. Be sure to use parentheses appropriately.

>>> 0xA << 2
40

0xA is hexadecimal; the bits are 1-0-1-0. This is 10 in decimal. When we shift this two bits to the left, it’s like multiplying by 4. We get bits of 1-0-1-0-0-0. This is 40 in decimal.
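A quick sketch confirms the connection between shifting left and multiplying by a power of two. Nothing here is special; the loop just compares the two calculations for a few shift amounts.

```python
shifted = 0xA << 2
print(shifted)                # 40
print(format(shifted, 'b'))   # 101000

# Shifting left by n places matches multiplying by 2**n.
for places in range(6):
    assert (0xA << places) == 0xA * 2 ** places
```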

The >> Operator

The >> is the right-shift operator. The left argument is the bit pattern to be shifted, the right argument is the number of bits. Python always behaves as though it is running on a 2’s complement computer. The left-most bit is always the sign bit, so sign bits are shifted in. This is mathematically equivalent to dividing by a power of two and rounding down, and it is an operation the underlying processor can do very quickly. Shifting right 4 positions, for example, divides the number by 16.

This operator is higher priority than &, ^ and |. Be sure to use parentheses appropriately.

>>> 80 >> 3
10

The number 80, with bits of 1-0-1-0-0-0-0, shifted right 3 bits, yields bits of 1-0-1-0, which is 10 in decimal.
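Here is a sketch of the right shift at work, including negative numbers, where the shifted-in sign bits matter. We compare against the // operator, which divides and rounds down.

```python
print(80 >> 3)     # 10
print(80 // 8)     # 10

# Bits that fall off the right end are simply lost.
print(83 >> 3)     # 10

# With negative numbers, sign bits are shifted in, so the result stays negative.
print(-80 >> 3)    # -10
print(-81 >> 3)    # -11, rounded down (toward the more negative side)
print(-1 >> 5)     # -1

# Shifting right by n places matches dividing by 2**n and rounding down.
for number in (80, 83, -80, -81, -1):
    assert (number >> 3) == number // 2 ** 3
```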

Tip

Debugging Special Operators

The most common problem with the bit-fiddling operators is confusion about the relative priority of the operations. For conventional arithmetic operators, ** is the highest priority, * and / are lower priority and + and - are the lowest priority. However, among &, ^ and |, << and >> it isn’t obvious what the priorities are or should be.

When in doubt, add parentheses to force the order you want.
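For the curious, Python does have a fixed order: arithmetic first, then the shifts, then &, then ^, and finally |. The sketch below checks each neighboring pair with small numbers chosen so that the two possible groupings give different answers. Even so, explicit parentheses remain the safer habit.

```python
# + is done before <<
print(1 + 2 << 3)       # 24, the same as (1 + 2) << 3
print(1 + (2 << 3))     # 17

# << is done before &
print(1 << 3 & 5)       # 0, the same as (1 << 3) & 5
print(1 << (3 & 5))     # 2

# & is done before ^
print(6 & 3 ^ 1)        # 3, the same as (6 & 3) ^ 1
print(6 & (3 ^ 1))      # 2

# ^ is done before |
print(4 ^ 1 | 5)        # 5, the same as (4 ^ 1) | 5
print(4 ^ (1 | 5))      # 1
```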

Special Ops Exercises

  1. Bit Masking.

    One common color-coding scheme uses three distinct values for the level of red, green and blue that make up each picture element (pixel) in an image. If we allow 256 different levels of red, green and blue, we can mash a single pixel into 24 bits. We can then cram 4 pixels into 3 plain-integer values. How do we unwind this packed data?

    We’ll have to use our bit-fiddling operators to unwind this compressed data into a form we can process. First, we’ll look at getting the red, green and blue values out of a single plain integer.

    We can code 256 levels in 8 bits, which is two hexadecimal digits. This gives us red, green and blue levels from 0x00 to 0xFF (0 to 255 decimal). We can string the red, green and blue together to make a larger composite number like 0x0c00a2 for a very bluish purple.

    What is 0x0c00a2 & 0xff? Is this the blue value of 0xa2? Does it help to do hex( 0x0c00a2 & 0xff)?

    What is (0x0c00a2 & 0xff00) >> 8? hex( (0x0c00a2 & 0xff00) >> 8 )?

    What is (0x0c00a2 & 0xff0000) >> 16? hex( (0x0c00a2 & 0xff0000) >> 16 )?

  2. Division.

    How can we break a number down into different digits?

    What is 1956 / 1000? 1956 % 1000?

    What is 956 / 100? 956 % 100?

    What is 56 / 10? 56 % 10?

    What happens if we do this procedure with 1956., 956. and 56. instead of 1956, 956 and 56? Can we use the // operator to make this work out correctly?
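As a starting point for the Bit Masking exercise, here is a sketch of the opposite direction: packing three levels into one number. The function name pack_rgb and the sample color are our own inventions; unwinding the packed value is left for the exercise.

```python
def pack_rgb(red, green, blue):
    """Combine three levels (each 0 to 255) into one 24-bit number."""
    # Red goes in the left-most 8 bits, green in the middle, blue on the right.
    return (red << 16) | (green << 8) | blue

packed = pack_rgb(0x33, 0x66, 0x99)
print(format(packed, '06x'))   # 336699
```

The parentheses around each shift are not strictly required, but they make the intended grouping obvious.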

Special Ops FAQ’s

Why is there bit-fiddling?
Some processing requires manipulating individual bits. In particular, sound and image data is often coded up in a way that is best processed using these bit-fiddling operations. Additionally, the various compression schemes like MP3 and JPEG use considerable manipulation of individual bits of data.
Why are there two division operators?
Sometimes we expect division to create precise answers, usually the floating-point equivalents of fractions. Other times, we want a rounded-down integer result. One way to work around this problem is to add lots of int functions to force integer operations. Another way is to provide two division operators with different meanings.
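A small sketch of the two meanings follows. We use a floating-point value with / so that the result doesn’t depend on which version of Python you have; the variable names are ours.

```python
precise = 7.0 / 2          # floating-point division
rounded_down = 7 // 2      # integer result, rounded down
float_floor = 7.0 // 2     # still rounded down, but a floating-point value
negative_floor = -7 // 2   # rounding down means toward the more negative side

print(precise)          # 3.5
print(rounded_down)     # 3
print(float_floor)      # 3.0
print(negative_floor)   # -4
```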
