from numpy import asarray, count_nonzero
from typing import Iterable
import numpy as np
import scipy
def determine_independent_subset(A: np.ndarray) -> Iterable:
U, S, VH = np.linalg.svd(A, compute_uv=True)
def compute_rank(S: np.ndarray, tol: float | None = None):
# Provide tol if you have some prior on the problem
# Singular values <= tol are considered 0
if tol is None:
tol = S.max(axis=-1, keepdims=True) * (
max(A.shape[-2:]) * np.finfo(S.dtype).eps)
else:
tol = asarray(tol)[..., np.newaxis]
return count_nonzero(S > tol, axis=-1)
r = compute_rank(S)
Vr = VH[0:r, :]
Q, R, P = scipy.linalg.qr(Vr, pivoting=True, mode='full')
return P[0:r]Subset selection using SVD
Do you want to find the most independent subset of columns of a matrix? This can be useful for example for determining independent subset of exogenous variables, or catching an error when adding a column that is a linear combination of remaining ones. The most general formulation of this would involve checking all size \(k\) subsets of columns. Luckily there is a simple heuristics, which works well for badly conditioned matrices, see Golub et al, Matrix Computations. Surprisingly enough the very good approximation can be obtained with following pretty straightforward procedure:
- Compute svd decomposition, \(A= U S V^H\)
- Determine rank estimate \(r \leq \texttt{rank}(A)\)
- Set \(V_r = V[:, 0:r]^H\) and compute QR with column pivoting \(Q^H V_r P = \begin{bmatrix}R_{11} & R_{12} \end{bmatrix}\)
- Set \(A P = \begin{bmatrix} B_1 & B_2 \end{bmatrix}\) and partition \(P = \begin{bmatrix} P_1 & P_2\end{bmatrix},\) where \(P_1\) represents “the most independent” columns in sense that this corresponds to heuristic that QR with pivoting produces well conditioned \(R_{11}.\)
Python implementation
A = np.array([[1,2, 3], [1, 2, 3],[1, 2, 4]])
s = determine_independent_subset(A)
print(s)[2 1]