Introduction to Quantum Computing


Introduction

Boiled down, quantum computation allows programmers to efficiently apply matrix operations to exponentially large state spaces in constant time. The implications of this are not immediately apparent. Exploiting various quirks and properties of quantum systems can allow computations to be performed before their dependencies are resolved (thus seeming to break causality), vast input spaces to be analyzed in a single operation, and common encryption methods to be rendered obsolete. In this post I will try to explain a little about quantum mechanics (if you are familiar you may skip that section), then dive into common quantum computation idioms and techniques. I'll also share an open source Python library I have been developing called QIP, so you can test algorithms on your own computer with a TensorFlow-like API.

I will be referencing Griffiths' Introduction to Quantum Mechanics (ISBN 1107179866) for the physics material and Quantum Computation and Quantum Information by Nielsen and Chuang (ISBN 9781107002173) for the quantum computing material.

Quantum Mechanics in a nutshell

Quantum Mechanics has many properties which may be very unintuitive at first, but I aim to give a quick summary of the important points and intuition behind them so that Quantum Computing doesn't seem like magic.

The first hurdle to consider is that of the wave function. A wave function, commonly written \(\psi\), maps each configuration of a system to a complex number; for example, you may look at the wave function with respect to position, \(\psi(x)\). The standard interpretation is that \(\lvert\psi(x)\rvert^2\) is the probability (density) of finding the particle at position \(x\). Since all the probabilities must add up to one, you will see a lot of normalization constants like \(\frac{1}{\sqrt{2}}\) floating around.
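Concretely, normalization just means that the squared magnitudes of the amplitudes sum to one. A minimal numpy sketch (numpy is my choice for illustration here, not something the post's material depends on):

```python
import numpy as np

# An unnormalized two-component state (amplitudes may be complex).
psi = np.array([1.0, 1.0j])

# Normalize so the squared magnitudes of the amplitudes sum to one;
# here the normalization constant works out to 1/sqrt(2).
psi = psi / np.linalg.norm(psi)

# Each |amplitude|^2 is now a probability.
probs = np.abs(psi) ** 2
assert np.isclose(probs.sum(), 1.0)
```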

These wave functions must obey the Schrödinger Equation: \[ i \hbar \frac{\partial}{\partial t} \psi(\vec{x},t) = \hat{H}\psi(\vec{x},t) \]

Here, \(\hat{H}\) is the Hamiltonian of the system - the total energy operator. An operator is a mathematical object which takes a wave function and returns another, possibly scaled by some value; for example, the operator \(\hat{Q}\) on \(\psi_1\) may give \(\hat{Q}\psi_1 = \alpha \psi_2\) (I will usually denote operators with a hat to distinguish them from constants, though keep in mind derivatives are operators as well). The Hamiltonian will take a wave function with energy \(E\), let's call it \(\psi_E\), and return that wave function multiplied by \(E\). Since the Hamiltonian is an operator which scales certain functions by a fixed value, those functions are said to be the eigenfunctions of the Hamiltonian. \[ \hat{H}\psi_E = E \psi_E \] These operators are also linear, so given \(\psi_{E_1}\) and \(\psi_{E_2}\) with respective energies we find that operating on \(\psi = \frac{1}{\sqrt{2}}(\psi_{E_1} + \psi_{E_2})\) gives \[\hat{H}\psi =\frac{1}{\sqrt{2}}\hat{H}(\psi_{E_1} + \psi_{E_2}) = \frac{1}{\sqrt{2}}(E_1 \psi_{E_1} + E_2 \psi_{E_2})\] Notice that \(\frac{1}{\sqrt{2}}(\psi_{E_1} + \psi_{E_2})\) is not "a wave function with energy \(E_1 + E_2\)" since \[\frac{1}{\sqrt{2}}\hat{H}(\psi_{E_1} + \psi_{E_2}) = \frac{1}{\sqrt{2}}( E_1\psi_{E_1} + E_2\psi_{E_2}) \neq \frac{1}{\sqrt{2}}(E_1 + E_2)(\psi_{E_1} + \psi_{E_2})\] Thus \(\frac{1}{\sqrt{2}}(\psi_{E_1} + \psi_{E_2})\) is not an eigenfunction of \(\hat{H}\) and does not have an associated eigenvalue (thus no single associated energy).
This is all to say that the energy of \(\psi_{E_1} + \psi_{E_2}\) is not \(E_1 + E_2\) but rather \(E_1\) or \(E_2\) (more on this later).

These eigenfunctions are orthogonal, meaning we can create a basis out of them and switch over to doing linear algebra to avoid differential equations.

Let the wave function \(\psi_{E_1}\) be the vector \( \left(\begin{smallmatrix}1\\0\end{smallmatrix}\right) \) and let \(\psi_{E_2}\) be \( \left(\begin{smallmatrix}0\\1\end{smallmatrix}\right) \). We may now write our hamiltonian as \[\hat{H}=\begin{bmatrix}E_1 & 0\\0 & E_2\end{bmatrix}\]
A state like \(\psi = \alpha \psi_{E_1} + \beta \psi_{E_2}\) could thus be written as \(\psi = \alpha \left(\begin{smallmatrix}1\\0\end{smallmatrix}\right) + \beta \left(\begin{smallmatrix}0\\1\end{smallmatrix}\right) = \left(\begin{smallmatrix} \alpha \\ \beta \end{smallmatrix}\right) \)
Using the Hamiltonian as an operator is simply matrix multiplication: \[\hat{H} \left(\begin{smallmatrix}\alpha \\ \beta \end{smallmatrix}\right)=\begin{bmatrix}E_1 & 0\\0 & E_2\end{bmatrix} \left(\begin{matrix} \alpha \\ \beta \end{matrix}\right) = \alpha E_1 \left(\begin{matrix} 1\\ 0 \end{matrix}\right) + \beta E_2 \left(\begin{matrix} 0 \\ 1\end{matrix}\right) \] Which agrees with \[\hat{H}(\alpha \psi_{E_1} + \beta \psi_{E_2}) = \alpha E_1 \psi_{E_1} + \beta E_2 \psi_{E_2}\] You can see that all the previously described properties of \(\hat{H}\) are preserved. We will be using this matrix notation for all the Quantum Computing operations described hereafter, though instead of \( \left(\begin{smallmatrix}1\\0\end{smallmatrix}\right) \) and \( \left(\begin{smallmatrix}0\\1\end{smallmatrix}\right) \) for the energy eigenfunctions we will use \(\lvert 0 \rangle\) and \(\lvert 1 \rangle\) (the zero bit is the \(\psi_{E_1}\) state and the one bit is the \(\psi_{E_2}\) state). This is called bra-ket notation.
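To make the matrix picture concrete, here is a small numpy sketch (the energies are arbitrary illustrative values):

```python
import numpy as np

E1, E2 = 1.0, 2.0  # arbitrary example energies
H = np.array([[E1, 0.0],
              [0.0, E2]])  # Hamiltonian in its eigenbasis

psi_E1 = np.array([1.0, 0.0])
psi_E2 = np.array([0.0, 1.0])

# Eigenfunctions are simply scaled by their energies.
assert np.allclose(H @ psi_E1, E1 * psi_E1)
assert np.allclose(H @ psi_E2, E2 * psi_E2)

# A superposition is not an eigenvector: each component is scaled
# by its own energy, so the result is not a multiple of psi.
alpha, beta = 1 / np.sqrt(2), 1 / np.sqrt(2)
psi = alpha * psi_E1 + beta * psi_E2
assert np.allclose(H @ psi, alpha * E1 * psi_E1 + beta * E2 * psi_E2)
```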

One last note to wrap up bra-ket notation: multiple particles. Given two particles in states \(\lvert a \rangle\) and \(\lvert b \rangle\), we write the combined state as \(\lvert a b \rangle\) (order sensitive). This is simply the tensor product of the two states: \(\lvert a b \rangle = \lvert a \rangle \otimes \lvert b \rangle\). Operators must explicitly operate on one of the particles; for example, \(\hat{Q}^{(0)}\) may operate on the first particle (switching over to zero indexing to get into a CS mindset). The remaining indices can be thought of as being operated on by the identity: \(\hat{Q}^{(0)} \rightarrow \hat{Q} \otimes \mathbb{I}\)

Let's say \[\hat{Q} =\frac{1}{\sqrt{2}}\begin{bmatrix}1 & 1 \\ 1 & -1\end{bmatrix}\] Where the factor of \(\frac{1}{\sqrt{2}}\) is a normalization constant so that any wave function operated on by \(\hat{Q}\) is still normalized.

Thus \(\hat{Q}^{(0)} \lvert 0 b \rangle \implies (\hat{Q}^{(0)} \lvert 0 \rangle) \otimes \lvert b \rangle = \frac{1}{\sqrt{2}}(\lvert 0 \rangle + \lvert 1 \rangle) \otimes \lvert b \rangle = \frac{1}{\sqrt{2}}(\lvert 0 b \rangle + \lvert 1 b \rangle) \).
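The same tensor-product bookkeeping can be checked numerically with numpy's kron (again, just an illustrative sketch, not part of any library discussed here):

```python
import numpy as np

Q = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)
I = np.eye(2)

# Q acting on the first of two qubits: Q^(0) = Q (x) I
Q0 = np.kron(Q, I)

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# |0 b> with b = 1
state = np.kron(ket0, ket1)

# Expect (|0 1> + |1 1>) / sqrt(2)
expected = (np.kron(ket0, ket1) + np.kron(ket1, ket1)) / np.sqrt(2)
assert np.allclose(Q0 @ state, expected)
```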

Finally, let's discuss measurement. I said before that \(\lvert\psi(x)\rvert^2\) was the probability of finding the particle at position \(x\). This idea generalizes to any state of the particle, not just position. The inner product between two wave functions gives the probability amplitude of finding the particle in state \(\psi_0\), given that it's in state \(\psi\): \[ \langle \psi_0 \vert \psi \rangle = \int \psi_0^\dagger(x) \psi(x)\, dx \] and the probability itself is \(\lvert \langle \psi_0 \vert \psi \rangle \rvert^2\). For a position measurement at \(x_0\), the state we project onto is \(\psi_0(x) = \delta(x - x_0)\), giving \[\int \delta(x - x_0) \psi(x)\, dx = \psi(x_0)\] whose squared magnitude is \(\lvert\psi(x_0)\rvert^2\), as promised. See Dirac delta functions if this doesn't follow.

This inner product works perfectly with our orthogonal vectors of states: the inner product of two states \(\lvert a \rangle\) and \(\lvert b \rangle\) is written \( \langle a \vert b \rangle\) and can be calculated using the dot product on the vector basis discussed earlier. Thus \(\langle 0 \vert 0 \rangle = 1\) and \(\langle 0 \vert 1 \rangle = 0\). As an example, suppose we want the probability of finding the first particle in state \(\lvert 0 \rangle\) given \(\psi = \frac{1}{\sqrt{2}}(\lvert 0 b \rangle + \lvert 1 b \rangle ) \) from before. We sum the squared amplitudes \(\lvert \langle \psi_i \vert \psi \rangle \rvert^2\) over every basis state \(\psi_i\) whose first particle is a zero, namely \(\lvert 0 0 \rangle\) and \(\lvert 0 1 \rangle\): \[\langle 0 0 \vert \psi \rangle = \frac{1}{\sqrt{2}} \langle 0 0 \vert 0 b \rangle + \frac{1}{\sqrt{2}} \langle 0 0 \vert 1 b \rangle \] \[\langle 0 1 \vert \psi \rangle = \frac{1}{\sqrt{2}} \langle 0 1 \vert 0 b \rangle + \frac{1}{\sqrt{2}} \langle 0 1 \vert 1 b \rangle \] Since the basis states are orthogonal, only the term matching \(b\) survives: if \(b = 0\) then \(\langle 0 0 \vert \psi \rangle = \frac{1}{\sqrt{2}}\) and \(\langle 0 1 \vert \psi \rangle = 0\), and if \(b = 1\) the roles are reversed. Either way the total probability is \(\lvert\frac{1}{\sqrt{2}}\rvert^2 = \frac{1}{2}\). So we have concluded that in the state \(\frac{1}{\sqrt{2}}(\lvert 0 b \rangle + \lvert 1 b \rangle )\) there's a 50% chance of finding the first particle in state \(\lvert 0 \rangle\)

As a shortcut, whenever we measure the "chance that a state occurs" we can simply take the squared norm of the sum of all terms in \(\psi\) which meet the criteria. It's the same as summing squared amplitudes over all possibilities, since the possibilities which aren't in \(\psi\) simply drop away: \[ \frac{1}{\sqrt{2}}(\lvert 0 b \rangle + \lvert 1 b \rangle ) \implies \vert \frac{1}{\sqrt{2}}\lvert 0 b \rangle \vert ^2 = \frac{1}{2} \langle 0 b \vert 0 b \rangle = \frac{1}{2} \] Since we can only observe one of the states in \(\psi\) on any given measurement, that also means that when we measure the energy (or any other observable) of a state, it can only be the eigenvalue of one of the eigenfunctions present in \(\psi\). So if \(\psi = \sqrt{\frac{1}{3}}\psi_{E_1} + \sqrt{\frac{2}{3}} \psi_{E_2}\) then we have a \(\frac{1}{3}\) chance of measuring the energy \(E_1\) and a \(\frac{2}{3}\) chance of measuring \(E_2\).
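This shortcut is easy to verify numerically: keep the amplitudes whose basis states meet the criterion and sum their squared magnitudes (a numpy sketch, taking \(b = 1\)):

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

b = ket1  # take the second particle to be |1>

# psi = (|0 b> + |1 b>) / sqrt(2), in the basis |00>, |01>, |10>, |11>
psi = (np.kron(ket0, b) + np.kron(ket1, b)) / np.sqrt(2)

# The first two basis states (|00> and |01>) have the first qubit
# equal to 0; summing their squared amplitudes gives the probability.
p_first_is_zero = np.sum(np.abs(psi[:2]) ** 2)
assert np.isclose(p_first_is_zero, 0.5)
```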

Note: Any physicists who feel as though I may have left out some key topics in quantum (*cough* related to your particular field *cough*) I completely agree. However, I feel like this introduction is sufficient for understanding quantum computing, and that's the goal for this post.

Quantum Computing

When we consider a quantum computing algorithm, we must first decide how many quantum bits (qubits) we want to work with. These are drawn as wires with gates acting on them. As such any algorithm looks like a classical circuit. Time moves from left to right. Below we have a 3-qubit circuit in which the second qubit has the operator \(\hat{Q}\) applied to it.

[Circuit diagram: three qubit wires labeled \(a\), \(b\), \(c\), with the gate \(Q\) applied to the second wire]

\(a\), \(b\), and \(c\) describe the initial conditions of the qubits; for example, I may write \(\lvert 0 \rangle\) instead, meaning that the starting state is \(\lvert 0 0 0 \rangle\).

The circuit as a whole simply performs \(\hat{Q}^{(1)} \lvert 0 0 0 \rangle \) for some operator \(\hat{Q}\).

Let's take a commonly used gate as an example: \(H\), the Hadamard gate (sorry that it happens to share a letter with the Hamiltonian). \[H = \frac{1}{\sqrt{2}}\begin{bmatrix}1 & 1 \\ 1 & -1 \end{bmatrix} \] Acting on the basis states it gives: \[H \lvert 0 \rangle = \frac{1}{\sqrt{2}} (\lvert 0 \rangle + \lvert 1 \rangle) \] \[H \lvert 1 \rangle = \frac{1}{\sqrt{2}} (\lvert 0 \rangle - \lvert 1 \rangle) \] If we were to take the above 3-qubit circuit with \(\hat{Q}\) substituted with \(H\) we would have: \[H^{(1)} \lvert 0 0 0 \rangle = \frac{1}{\sqrt{2}} ( \lvert 0 0 0 \rangle + \lvert 0 1 0 \rangle ) \]
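As a sanity check, the action of \(H^{(1)}\) on \(\lvert 0 0 0 \rangle\) can be simulated directly (a numpy sketch; basis states are ordered \(\lvert 000 \rangle, \lvert 001 \rangle, \ldots, \lvert 111 \rangle\)):

```python
import numpy as np

H = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)
I = np.eye(2)

# H on the middle qubit of three: H^(1) = I (x) H (x) I
H1 = np.kron(I, np.kron(H, I))

state = np.zeros(8)
state[0] = 1.0  # |000>

out = H1 @ state
print(np.nonzero(out)[0])  # [0 2] -> amplitudes on |000> and |010>
```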

Now if we imagine concatenating two copies of the circuit:

[Circuit diagram: three qubits initialized to \(0\), with two copies of the gate \(Q\) applied in sequence to the second qubit]
\[H^{(1)}H^{(1)} \lvert 0 0 0 \rangle = H^{(1)} \frac{1}{\sqrt{2}} ( \lvert 0 0 0 \rangle + \lvert 0 1 0 \rangle ) \] \[= \frac{1}{2} ( (\lvert 0 0 0 \rangle + \lvert 0 1 0 \rangle) + (\lvert 0 0 0 \rangle - \lvert 0 1 0 \rangle)) \] \[ = \lvert 0 0 0 \rangle \] So \(H^2\) is the identity. This gate is very useful since it allows us to put circuits into superpositions of each state, thus allowing what seems like the parallel evaluation of the circuit across each possible state. The reality is more complex since only one state can be measured in the end independent of how many states were present during execution. If this were a classical computer executing upon each state you could consider each parallel execution independently and each would provide its own output.
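The claim that \(H^2\) is the identity is quick to check numerically (numpy sketch):

```python
import numpy as np

H = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)

# H is its own inverse.
assert np.allclose(H @ H, np.eye(2))

# The same holds when H acts on one qubit of a larger register:
# applying I (x) H (x) I twice to |000> returns |000>.
H1 = np.kron(np.eye(2), np.kron(H, np.eye(2)))
state000 = np.zeros(8)
state000[0] = 1.0
assert np.allclose(H1 @ (H1 @ state000), state000)
```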

Quantum circuits are entirely defined by a series of these gates, each a unitary transformation (one which keeps the sum of probabilities at \(1\)). This poses some problems: unitary matrices have inverses, so they don't lose information, whereas classical gates cannot necessarily be undone - consider the OR gate: if it outputs a 1 you don't know whether the input was 01, 10, or 11. Furthermore, qubits adhere to the no-cloning theorem, which, put simply, states that we cannot make copies of an unknown state \(\lvert \psi \rangle\), whereas in classical computers we can simply branch a wire in half and use each half in its own computation. Together these mean that each step of the computation uses the same set of \(n\) qubits you started with. Reasoning about quantum circuits can be difficult, and constructing them perhaps even more so, due to these barriers.

One tool at our disposal to start building these circuits is the control "gate": an extension which can be added to another gate to turn it on or off based on the state of a given qubit. The general formula for the control gate \(C(\hat{Q})\) is \[ C(\hat{Q}) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & q_{00} & q_{01} \\ 0 & 0 & q_{10} & q_{11} \end{bmatrix} \] Each row/column represents one of the possible combinations of states of the two qubits: \(\lvert 0 0 \rangle\), \(\lvert 0 1 \rangle\), \(\lvert 1 0 \rangle\), and \(\lvert 1 1 \rangle\). In a circuit it is depicted as follows:
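The block structure of \(C(\hat{Q})\) translates directly into code. A numpy sketch (the helper name controlled is my own, not from any library):

```python
import numpy as np

def controlled(Q):
    # Identity on the |00>, |01> block (control qubit is 0);
    # Q on the |10>, |11> block (control qubit is 1).
    return np.block([[np.eye(2), np.zeros((2, 2))],
                     [np.zeros((2, 2)), Q]])

X = np.array([[0.0, 1.0],
              [1.0, 0.0]])  # the NOT gate

# The CNOT matrix discussed next falls out of the general
# formula with Q = NOT.
CNOT = controlled(X)
assert np.allclose(CNOT, [[1, 0, 0, 0],
                          [0, 1, 0, 0],
                          [0, 0, 0, 1],
                          [0, 0, 1, 0]])
```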

[Circuit diagram: a control dot on qubit \(a\) connected to the gate \(Q\) on qubit \(b\); qubit \(c\) is untouched]
So the CNOT gate acts on 2 qubits (\(a\) and \(b\)) as "if qubit \(a\) is 0 then leave \(b\) alone, else negate \(b\)": \[ \mbox{CNOT} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} \] For example: \[\mbox{CNOT} ( \alpha \lvert 0 0 \rangle + \beta \lvert 1 1 \rangle ) = \alpha \lvert 0 0 \rangle + \beta \lvert 1 0 \rangle \]
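The example above can be checked with a direct matrix-vector multiply (numpy sketch; \(\alpha\) and \(\beta\) are arbitrary values with \(\lvert\alpha\rvert^2 + \lvert\beta\rvert^2 = 1\)):

```python
import numpy as np

CNOT = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0],
                 [0.0, 0.0, 1.0, 0.0]])

alpha, beta = np.sqrt(1 / 3), np.sqrt(2 / 3)

# alpha|00> + beta|11> in the basis |00>, |01>, |10>, |11>
state = np.array([alpha, 0.0, 0.0, beta])

# Expect alpha|00> + beta|10>: the second qubit is flipped
# only in the term where the first qubit is 1.
out = CNOT @ state
assert np.allclose(out, [alpha, 0.0, beta, 0.0])
```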

Let's look at a counterintuitive example which shows how mixing all the above information can lead to useful calculations. First let's assume we have a SWAP gate which acts on two sets of \(m\) qubits \(\psi_m\) and \(\phi_m\). The swap gate simply swaps the states between the qubits: \(\mbox{SWAP} \lvert \psi_m \phi_m \rangle = \lvert \phi_m \psi_m \rangle\)

Let's now examine the following circuit (where × positions mean SWAP):

[Circuit diagram: a control qubit starting in \(\lvert 0 \rangle\) with \(H\) applied, a controlled SWAP (the × positions) acting on \(\psi\) and \(\phi\), then \(H\) on the control again]
Let's work out the math for the end state: \[H^{(0)} \lvert 0 \psi \phi \rangle = \frac{1}{\sqrt{2}}(\lvert 0 \psi \phi \rangle + \lvert 1 \psi \phi \rangle) \] \[\mbox{CSWAP}\frac{1}{\sqrt{2}}(\lvert 0 \psi \phi \rangle + \lvert 1 \psi \phi \rangle) = \frac{1}{\sqrt{2}} (\lvert 0 \psi \phi \rangle + \lvert 1 \phi \psi \rangle)\] \[ H^{(0)} \frac{1}{\sqrt{2}} (\lvert 0 \psi \phi \rangle + \lvert 1 \phi \psi \rangle) = \frac{1}{2}((\lvert 0 \psi \phi \rangle + \lvert 1 \psi \phi \rangle) + (\lvert 0 \phi \psi \rangle - \lvert 1 \phi \psi \rangle) )\] This state may seem like nonsense at first, but now let's consider measuring the probability that the zeroth qubit is in the \(\lvert 0 \rangle\) state. \[\frac{1}{2}((\lvert 0 \psi \phi \rangle + \lvert 1 \psi \phi \rangle) + (\lvert 0 \phi \psi \rangle - \lvert 1 \phi \psi \rangle) )\] \[= \frac{1}{2}(\lvert 0 \rangle \otimes (\lvert \psi \phi \rangle + \lvert \phi \psi \rangle) + \lvert 1 \rangle \otimes ( \lvert \psi \phi \rangle - \lvert \phi \psi \rangle) )\] Square the coefficient for \(\lvert 0 \rangle\) to measure: \[\mathbb{P}(\lvert 0 \rangle) = \vert \frac{1}{2} (\lvert \psi \phi \rangle + \lvert \phi \psi \rangle) \vert^2 \] \[ = \frac{1}{4} (\langle \psi \phi \vert \psi \phi \rangle + \langle \phi \psi \vert \phi \psi \rangle + \langle \phi \psi \vert \psi \phi \rangle + \langle \psi \phi \vert \phi \psi \rangle) \] \[= \frac{1}{4} (2 + \langle \phi \psi \vert \psi \phi \rangle + \langle \psi \phi \vert \phi \psi \rangle) \] \(\langle \phi \psi \vert \psi \phi \rangle = \langle \psi \phi \vert \phi \psi \rangle\) \[= \frac{1}{4} (2 + 2 \langle \phi \psi \vert \psi \phi \rangle) \] \(\langle \phi \psi \vert \psi \phi \rangle = \langle \phi \vert \psi \rangle \langle \psi \vert \phi \rangle = \vert \langle \psi \vert \phi \rangle \vert^2 \) \[\mathbb{P}(\lvert 0 \rangle) = \frac{1}{2} + \frac{1}{2} \vert \langle \psi \vert \phi \rangle \vert^2 \] The probability of \(\lvert 0 \rangle\) is a function of the 
inner product of two states of \(m\) qubits (vectors in a \(2^m\)-dimensional space), and yet was accessible in constant time (3 operations)! This circuit is known as the swap test. Notice, however, that we cannot actually measure the probability of \(\lvert 0 \rangle\) in a single action; we would have to make repeated (possibly parallel) measurements to reach arbitrary precision. Still, for sufficiently large \(m\) this can be more efficient than a classical computer, though the real strength comes from using it as part of a larger quantum algorithm.
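The whole derivation can be simulated end to end. Below is a numpy sketch of the \(m = 1\) case (single-qubit \(\psi\) and \(\phi\), 3 qubits total; all names here are mine, not from any library):

```python
import numpy as np

H = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2)
I2 = np.eye(2)

# Controlled-SWAP on 3 qubits (control = qubit 0): it exchanges
# the basis states |101> (index 5) and |110> (index 6).
CSWAP = np.eye(8)
CSWAP[[5, 6]] = CSWAP[[6, 5]]

H0 = np.kron(H, np.kron(I2, I2))  # H on the control qubit

def swap_test_p0(psi, phi):
    # Probability of measuring |0> on the control after H, CSWAP, H.
    state = np.kron(np.array([1.0, 0.0]), np.kron(psi, phi))
    state = H0 @ (CSWAP @ (H0 @ state))
    return np.sum(np.abs(state[:4]) ** 2)  # first half: control = |0>

psi = np.array([1.0, 0.0])               # |0>
phi = np.array([1.0, 1.0]) / np.sqrt(2)  # (|0> + |1>)/sqrt(2)

p0 = swap_test_p0(psi, phi)
overlap = abs(np.vdot(psi, phi)) ** 2
assert np.isclose(p0, 0.5 + 0.5 * overlap)  # 1/2 + 1/2 |<psi|phi>|^2
```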

"Quantum Computing" at Home

I have been working on the qip library for the last few months; its goal is to provide a TensorFlow-like API for quantum computer simulations.

Let's examine the CSWAP example from above in the qip library. We define 3 groups of qubits (q1, q2, and q3). The first represents \(n=1\) qubit whereas the next two each represent \(n=2\) qubits, giving a total of 5 qubits. (Currently the library reasonably supports up to about 29 qubits using 24 GB of RAM, though 5 qubits take effectively no resources.)

from qip.qip import Qubit, Measure
from qip.operators import C, H, Swap

# One control qubit plus two 2-qubit registers (5 qubits total).
q1 = Qubit(n=1)
q2 = Qubit(n=2)
q3 = Qubit(n=2)

h1 = H(q1)                        # Hadamard on the control qubit
c1, c2, c3 = C(Swap)(h1, q2, q3)  # controlled-SWAP of q2 and q3
mh1 = H(c1)                       # Hadamard on the control again

m1 = Measure(mh1)                 # measure the control qubit

We see here that instead of specifying the index of the qubit to which each operation applies, we use the output qubit object itself. So H is applied to the q1 qubit, which in turn is used in the CSWAP (constructed by applying C to Swap) along with the original q2 and q3. The first input is used as the control, with the remaining two forwarded to whatever operation is being controlled. We then apply H to the new control qubit state. Finally, to simulate measurement, we use the Measure operation.

We have constructed m1, which contains the entire circuit defined up to that point; we may now run m1 to measure the first qubit using a variety of input states.

Below we choose the following inputs: \[ \lvert q1 \rangle = 1.0 \lvert 0 \rangle + 0.0 \lvert 1 \rangle = \lvert 0 \rangle \] \[ \lvert q2 \rangle = 0.0 \lvert 0 0 \rangle + 0.0 \lvert 0 1 \rangle + 0.0 \lvert 1 0 \rangle + 1.0 \lvert 1 1 \rangle = \lvert 1 1 \rangle \] \[ \lvert q3 \rangle = ~...~ = \lvert 0 0 \rangle \]

out, classic = m1.run(feed={q1: [1.0, 0.0], 
                            q2: [0.0, 0.0, 0.0, 1.0], 
                            q3: [1.0, 0.0, 0.0, 0.0]})

measured, measured_prob = classic[m1]

out gives the quantum state after the circuit is executed. Note that since one of the qubits is measured, the remaining quantum state only includes 4 qubits (q2 + q3), and thus out is an array of \(2^4\) complex values (one for each combination \(\lvert a b c d \rangle\)).

The classic dictionary maps each measured object (here just m1) to the value recovered as well as the probability of obtaining that value (which is not available on actual quantum hardware but is useful nonetheless). Given \[\mathbb{P}(\lvert 0 \rangle) = \frac{1}{2} + \frac{1}{2} \vert \langle \psi \vert \phi \rangle \vert^2 \] and \[\langle q_2 \vert q_3 \rangle = \langle 1 1 \vert 0 0 \rangle = 0\] we expect \(\mathbb{P}(\lvert 0 \rangle) = \frac{1}{2}\).
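We can sanity-check that expectation without the library: the two input registers are orthogonal, so the formula gives exactly one half (numpy sketch):

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

q2 = np.kron(ket1, ket1)  # |11>
q3 = np.kron(ket0, ket0)  # |00>

# |<q2|q3>|^2 = 0, so P(|0>) = 1/2 + 0/2 = 1/2.
overlap = abs(np.vdot(q2, q3)) ** 2
p0 = 0.5 + 0.5 * overlap
print(p0)  # 0.5
```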

We therefore expect measured to be either 0 or 1 with equal probability and measured_prob to always be 0.5.

Summary and Conclusion

This post does not aim to provide someone with all the tools necessary to begin writing their own quantum algorithms - rather it aims to provide some degree of background on how quantum computers operate. Moving forward I hope to post some more in-depth articles where I can provide some concrete examples of applications of quantum algorithms which can outperform their classical counterparts - specifically Grover's Algorithm for fast database lookup and Shor's Algorithm for finding the prime factors of an integer.