AI and Machine Learning Applications

One of the most practically significant aspects of the Q-Memory photonic platform is that the same hardware used for quantum computation can also accelerate classical AI workloads — specifically the matrix-vector multiplications that dominate neural network inference and training.

This page explains how the platform achieves AI acceleration, what workloads it targets, and how it compares to GPU-based approaches.

Every layer of a neural network — whether a transformer, convolutional network, or recurrent model — performs some form of matrix-vector multiplication:

$$\mathbf{y} = W\mathbf{x}$$

where $W$ is the weight matrix and $\mathbf{x}$ is the input vector. For large models, this operation is the dominant cost in both inference and training.
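As a concrete, illustrative NumPy example (sizes chosen to match the discussion below, not taken from any benchmark), a single dense layer reduces to exactly this product:

```python
import numpy as np

# Illustrative sizes only: one dense layer with a 1,000 x 1,000 weight matrix.
rng = np.random.default_rng(0)
n_in, n_out = 1000, 1000
W = rng.standard_normal((n_out, n_in)).astype(np.float32)  # weight matrix
x = rng.standard_normal(n_in).astype(np.float32)           # input vector

y = W @ x  # the matrix-vector product that dominates inference cost
```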

The memory bandwidth problem: On conventional hardware such as a GPU, the bottleneck is moving the weight matrix $W$ from memory to the compute units. For a 1,000 × 1,000 weight matrix stored at 32-bit precision, this means reading 4 MB per forward pass. Modern GPUs can spend much of their time waiting for data rather than computing.

The photonic solution: In the optical mesh, $W$ is encoded as the phase configuration of the beam splitter network. The matrix multiplication happens as light propagates through the network — in constant time, regardless of matrix size, with no data movement.
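The bandwidth figure above is easy to check with back-of-envelope arithmetic (32-bit weights assumed):

```python
# Data movement for a 1,000 x 1,000 fp32 weight matrix, read once per pass.
rows, cols = 1000, 1000
bytes_per_weight = 4  # fp32

bytes_per_pass = rows * cols * bytes_per_weight
print(bytes_per_pass / 1e6, "MB per forward pass")  # 4.0 MB

# Each weight participates in one multiply-accumulate (2 FLOPs) per pass,
# so the arithmetic intensity is very low -- a memory-bound workload.
flops = 2 * rows * cols
print(flops / bytes_per_pass, "FLOPs per byte")  # 0.5
```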

  1. Encode inputs: The input vector $\mathbf{x}$ is encoded as the amplitudes of optical signals injected into the N input ports of the mesh

  2. Program the matrix: The weight matrix $W$ is encoded as phase settings across the optical mesh elements — using either thermal phase shifters (for slowly-varying weights) or non-volatile optical memory elements (for fixed inference weights)

  3. Compute: Light propagates through the mesh. Optical interference implements the matrix multiplication. All elements of the output vector $\mathbf{y}$ are produced simultaneously

  4. Read output: Detectors at the output ports measure the amplitude of each output — giving the result of $W\mathbf{x}$ in a single propagation step

The computation time is determined by the time for light to cross the chip — nanoseconds — regardless of matrix dimensions.
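The four steps above can be sketched numerically. A lossless beam-splitter mesh implements a unitary transfer matrix, so a random unitary stands in here for the programmed phase configuration (a toy model, not the platform's API):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8  # number of mesh modes (assumption for illustration)

# Step 2: "program the matrix" -- a random unitary stands in for the
# phase-shifter configuration that encodes the weights.
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
U, _ = np.linalg.qr(A)  # QR decomposition yields a random unitary Q

# Step 1: encode the input vector as complex optical field amplitudes.
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Step 3: propagation through the mesh applies U in a single pass.
y = U @ x

# Step 4: detectors at the output ports measure each output.
intensities = np.abs(y) ** 2

# A lossless (unitary) mesh conserves total optical power.
assert np.isclose(intensities.sum(), (np.abs(x) ** 2).sum())
```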

| Operation | Conventional approach | Photonic approach |
| --- | --- | --- |
| Matrix-vector multiply | Memory-bound; scales with matrix size | Constant time (nanoseconds) |
| Weight access | Read from DRAM or HBM | Encoded in optical elements |
| Reprogramming | Not applicable | Microseconds (thermal) to nanoseconds (electro-optic) |

The dominant power cost in GPU-based AI inference is memory access — moving weights from DRAM to compute units repeatedly for each forward pass. The photonic platform eliminates this:

  • Non-volatile optical memory holds weight matrices in the mesh without any power
  • No data movement means no memory bandwidth power
  • Computation in the optical domain is passive (light propagates without amplification for small networks)

For inference workloads where the same weights are used for thousands of forward passes, the non-volatile optical memory means the weight-holding power is zero — a fundamental difference from DRAM-based approaches.

The same optical hardware runs quantum algorithms and AI matrix operations. A platform scheduled for quantum key distribution workloads in the morning can switch to AI inference acceleration in the afternoon — by reprogramming the same phase elements.

Best fit: Models that perform the same weight matrix operation repeatedly on different inputs — image classifiers, natural language processing, recommendation systems.

The optical mesh size (N modes) sets the maximum matrix dimension it can handle natively. Larger models are handled by block decomposition — breaking large matrices into N × N blocks processed sequentially.
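A host-side sketch of block decomposition, with each N × N block multiply standing in for one optical pass (hypothetical scheduling code, not the platform's API):

```python
import numpy as np

def blocked_matvec(W, x, N):
    """Compute W @ x by streaming N x N blocks through an N-mode mesh.

    Each block multiply models one optical pass; partial results are
    accumulated electronically. Assumes W's dimensions are multiples of N
    (pad the matrix otherwise).
    """
    rows, cols = W.shape
    y = np.zeros(rows, dtype=W.dtype)
    for i in range(0, rows, N):
        for j in range(0, cols, N):
            # One "optical pass": program the block, inject x[j:j+N], read out.
            y[i:i + N] += W[i:i + N, j:j + N] @ x[j:j + N]
    return y

rng = np.random.default_rng(2)
W = rng.standard_normal((512, 512))
x = rng.standard_normal(512)
assert np.allclose(blocked_matvec(W, x, 64), W @ x)
```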

Phase 1 (~64 modes): Suitable for smaller inference models and research demonstrations

Phase 2 (~256 modes): Suitable for layers in production-scale language models

Training is more complex than inference because weights must be updated each iteration. This requires:

  1. Forward pass (matrix multiply — handled optically)
  2. Backward pass (gradient computation — handled by CMOS electronics or host)
  3. Weight update (reprogramming phase elements with new values)

The reprogramming step is the bottleneck for training. With electro-optic phase shifters (nanosecond switching), the reprogramming overhead is small compared to the computation. With thermal phase shifters (microsecond switching), it adds overhead for rapidly-changing weights.
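The three training steps can be sketched as a hybrid loop on a toy regression task (the task, sizes, and learning rate are illustrative assumptions, not the platform's API):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 16
W = 0.1 * rng.standard_normal((N, N))        # simulated phase-encoded weights
target_map = np.roll(np.eye(N), 1, axis=0)   # toy task: learn a cyclic shift
lr = 0.01

losses = []
for step in range(200):
    x = rng.standard_normal(N)
    y = W @ x                         # 1. forward pass (optical matvec)
    err = y - target_map @ x          # prediction error
    losses.append(float(err @ err))
    grad_W = np.outer(err, x)         # 2. backward pass (CMOS/host)
    W -= lr * grad_W                  # 3. weight update = mesh reprogramming

assert losses[-1] < losses[0]         # the hybrid loop reduces the error
```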

Phase 2 targets running transformer training partially in optics, with CMOS electronics handling gradient accumulation and weight updates.

The attention mechanism in transformer models — central to every large language model — is dominated by matrix multiplications:

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right) V$$

The $Q K^T$ product and the final $V$ multiplication are both matrix operations that map naturally onto the optical mesh. Phase 2 targets demonstrating attention mechanism acceleration.
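A minimal NumPy rendering of the attention formula makes the two mesh-friendly matrix products explicit; the softmax nonlinearity would remain electronic. Shapes are illustrative:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention. The Q @ K.T product and the final
    multiplication by V are the operations that map onto the optical mesh."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # first mesh-friendly matmul
    weights = softmax(scores)         # nonlinearity stays electronic
    return weights @ V                # second mesh-friendly matmul

rng = np.random.default_rng(4)
seq_len, d_k = 5, 8
Q, K, V = (rng.standard_normal((seq_len, d_k)) for _ in range(3))
out = attention(Q, K, V)
assert out.shape == (seq_len, d_k)
```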

| Metric | GPU (current) | Photonic platform (Phase 2) |
| --- | --- | --- |
| Matrix compute latency | Microseconds (memory-bound) | Nanoseconds (optical) |
| Static inference power | 100s of watts | Near zero (non-volatile optical memory) |
| Weight reprogramming | Not applicable | Microseconds to nanoseconds |
| On-chip weight storage | External DRAM | Encoded in optical elements |
| Quantum capability | None | Same hardware, reprogrammed |

The photonic platform is not positioned as a general-purpose GPU replacement — GPUs remain superior for tasks involving irregular memory access patterns, branching, and operations that don’t reduce to matrix multiplications. The photonic advantage is specific to dense matrix operations run repeatedly with the same or slowly-changing weights.

| Phase | AI capability |
| --- | --- |
| Phase 0 | Demonstrate optical matrix multiplication with a small (4×4) matrix as part of component validation |
| Phase 1 | First AI inference demonstration; ~64×64 matrices; non-volatile weight storage |
| Phase 2 | Production-scale AI inference; ~256×256 matrix blocks; transformer attention acceleration |
| Phase 3 | Full photonic AI training loop; co-scheduled with quantum workloads |