:py:mod:`kosmos.ml.models.vqc.circuit.qiskit_circuit.gradient_method`
======================================================================

.. py:module:: kosmos.ml.models.vqc.circuit.qiskit_circuit.gradient_method


Module Attributes
-----------------

.. py:data:: DEVICE
   :value: 'cpu'


Classes
-------

.. py:class:: GradientMethod

   Bases: :py:class:`abc.ABC`

   Abstract base class for quantum-circuit gradient methods.

   Implementations may follow analytic rules (e.g., parameter-shift) or
   stochastic gradient-free approaches (e.g., SPSA). All subclasses must
   implement the ``jacobian`` method that computes d(outputs)/d(weights).

   .. admonition:: Notes

      This package provides two implementations:

      - ``ParameterShiftRule``: exact, low variance, but requires 2 evaluations
        per parameter.
      - ``SPSA``: stochastic, higher variance, but requires only 2 evaluations
        per sample, independent of parameter count.

   Initialize the gradient method.

   .. rubric:: Methods

   .. py:method:: set_parameterized_circuit(parameterized_circuit: QiskitParameterizedCircuit) -> None

      Assign the parameterized circuit to be used by the gradient method.
      This must be called before computing gradients.

      :param parameterized_circuit: The Qiskit parameterized circuit.
      :type parameterized_circuit: QiskitParameterizedCircuit

   .. py:method:: validate_parameterized_circuit() -> None

      Validate that the parameterized circuit has been set.

   .. py:method:: jacobian(x: kosmos.ml.typing.TensorNpArray, weights: numpy.ndarray) -> torch.Tensor

      Compute d(outputs)/d(weights).

      :param x: Input values, shape (len_x, input_dim).
      :type x: TensorNpArray
      :param weights: Weights values.
      :type weights: np.ndarray
      :returns: Jacobian of shape (len_x, output_dim, followed by weights.shape).
      :rtype: torch.Tensor

----

.. py:class:: ParameterShiftRule(shift: float = np.pi / 2)

   Bases: :py:class:`GradientMethod`

   Gradient computation using the parameter-shift rule.
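As an illustration of the rule this class implements, here is a minimal NumPy sketch that does not use the classes documented here; ``expectation`` is a hypothetical stand-in for a circuit evaluation. For an ``RY(theta)`` rotation on :math:`|0\rangle` measured in the Z basis, the expectation is :math:`f(\theta) = \cos\theta`, and two shifted evaluations recover :math:`f'(\theta) = -\sin\theta` exactly.

```python
import numpy as np

def expectation(theta: float) -> float:
    # <Z> after an RY(theta) rotation on |0>: f(theta) = cos(theta).
    # Hypothetical stand-in for a real circuit evaluation.
    return float(np.cos(theta))

def parameter_shift_grad(f, theta: float, shift: float = np.pi / 2) -> float:
    # f'(theta) = (f(theta + s) - f(theta - s)) / (2 * sin(s))
    return (f(theta + shift) - f(theta - shift)) / (2 * np.sin(shift))

theta = 0.3
grad = parameter_shift_grad(expectation, theta)
# Agrees with the analytic derivative -sin(theta) up to floating-point error.
```

The same two-evaluation pattern is applied per parameter (and per input row) when building the full Jacobian.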
   The parameter-shift rule is an analytic approach to computing gradients in
   variational quantum circuits. The gradient of an expectation value can be
   computed exactly using two circuit evaluations per parameter:

   .. math::

      f'(\theta) = \frac{f(\theta + s) - f(\theta - s)}{2 \sin(s)}

   In the common case of Pauli rotations, the canonical shift is
   :math:`s = \pi/2`.

   .. admonition:: Notes

      - The method provides low-variance, unbiased gradients.
      - Computational cost scales linearly with the number of parameters
        (two evaluations per parameter).
      - Requires the circuit to be differentiable in the parameter of interest
        and the underlying generator to have a known spectrum.

   Initialize the parameter-shift rule.

   :param shift: Shift magnitude. Defaults to :math:`\pi/2`.
   :type shift: float

   .. rubric:: Methods

   .. py:method:: jacobian(x: kosmos.ml.typing.TensorNpArray, weights: numpy.ndarray) -> torch.Tensor

      Compute d(outputs)/d(weights) via the parameter-shift rule.

      :param x: Input values, shape (len_x, input_dim).
      :type x: TensorNpArray
      :param weights: Weights values.
      :type weights: np.ndarray
      :returns: Jacobian of shape (len_x, output_dim, followed by weights.shape).
      :rtype: torch.Tensor

----

.. py:class:: SPSA(epsilon: float = 0.01, num_samples: int = 3)

   Bases: :py:class:`GradientMethod`

   Gradient computation using Simultaneous Perturbation Stochastic
   Approximation (SPSA).

   SPSA is a gradient-free method that estimates all partial derivatives using
   only two circuit evaluations per random perturbation, independent of the
   number of parameters. A random perturbation vector :math:`\Delta` is drawn
   from a symmetric Bernoulli distribution over :math:`\{-1, +1\}`, which is
   optimal in the sense of minimizing estimator variance (Sadegh & Spall,
   1998). The gradient estimate for parameter :math:`i` is:

   .. math::

      \hat{g}_i \approx \frac{f(\theta + \epsilon \Delta) - f(\theta - \epsilon \Delta)}{2 \epsilon \Delta_i}

   Multiple samples are averaged to reduce variance.

   .. admonition:: Notes

      - Cost is :math:`O(\mathrm{num\_samples})`, compared to
        :math:`O(\mathrm{num\_parameters})` for the parameter-shift rule.
      - Produces an unbiased gradient estimator under mild regularity
        conditions.
      - Typically more robust to noise than analytic gradient methods such as
        the parameter-shift rule.
      - Only gradient estimation is implemented here; the optimizer gain
        sequences :math:`a_k` and :math:`c_k` are not part of this component.
      - This implementation follows standard formulations used in quantum
        optimization, such as PennyLane's ``SPSAOptimizer``
        (https://docs.pennylane.ai/en/stable/_modules/pennylane/optimize/spsa.html#SPSAOptimizer.compute_grad).

   Initialize the SPSA gradient method.

   :param epsilon: Perturbation magnitude. Defaults to 0.01.
   :type epsilon: float
   :param num_samples: Number of random perturbation samples to average over.
      Defaults to 3.
   :type num_samples: int

   .. rubric:: Methods

   .. py:method:: jacobian(x: kosmos.ml.typing.TensorNpArray, weights: numpy.ndarray) -> torch.Tensor

      Compute d(outputs)/d(weights) via SPSA.

      :param x: Input values, shape (len_x, input_dim).
      :type x: TensorNpArray
      :param weights: Weights values.
      :type weights: np.ndarray
      :returns: Jacobian of shape (len_x, output_dim, followed by weights.shape).
      :rtype: torch.Tensor
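The SPSA estimator above can be sketched in plain NumPy, independent of the classes documented here; the objective ``f`` below is a hypothetical stand-in for a circuit expectation value, and ``num_samples`` is set far higher than this class's default of 3 so the averaged estimate is visibly close to the analytic gradient.

```python
import numpy as np

def f(theta: np.ndarray) -> float:
    # Toy objective standing in for a circuit expectation value;
    # its analytic gradient is -sin(theta), elementwise.
    return float(np.sum(np.cos(theta)))

def spsa_grad(f, theta: np.ndarray, epsilon: float = 0.01,
              num_samples: int = 2000, seed: int = 0) -> np.ndarray:
    # Average num_samples two-point SPSA estimates:
    #   g_i = (f(theta + eps*Delta) - f(theta - eps*Delta)) / (2 * eps * Delta_i)
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(theta)
    for _ in range(num_samples):
        # Symmetric Bernoulli perturbation over {-1, +1}.
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        diff = f(theta + epsilon * delta) - f(theta - epsilon * delta)
        grad += diff / (2 * epsilon * delta)
    return grad / num_samples

theta = np.array([0.1, 0.5, 1.0])
estimate = spsa_grad(f, theta)  # approximates -sin(theta)
```

Note that each iteration costs two evaluations of ``f`` regardless of how many parameters ``theta`` holds, which is the cost advantage over the parameter-shift rule described above.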