libcamera: software_isp: Apply CCM in debayering

This patch applies color correction matrix (CCM) in debayering if the
CCM is specified.  Not using CCM must still be supported for performance
reasons.

The CCM is applied as follows:

  [r1 g1 b1]   [r]
  [r2 g2 b2] * [g]
  [r3 g3 b3]   [b]

The CCM matrix (the left side of the multiplication) is constant during
single frame processing, while the input pixel (the right side) changes.
Because each of the color channels is only 8-bit in software ISP, we can
make 9 lookup tables with 256 input values for multiplications of each
of the r_i, g_i, b_i values.  This way we don't have to multiply each
pixel, we can use table lookups and additions instead.  Gamma (which is
non-linear and thus cannot be a part of the 9 lookup tables values) is
applied on the final values rounded to integers using another lookup
table.

Because the changing part is the pixel value with three color elements,
only three dynamic table lookups are needed.  We use three lookup tables
to represent the multiplied matrix values, each of the tables
corresponding to the given matrix column and pixel color.

We use int16_t to store the precomputed multiplications.  This seems to
be noticeably (>10%) faster than `float' for the price of slightly less
accuracy and it covers the range of values that sane CCMs produce.  The
selection and structure of data is performance critical, for example
using bytes would add significant (>10%) speedup but would be too short
to cover the value range.

The color lookup tables can be represented either as unions,
accommodating tables for both the CCM and non-CCM cases, or as separate
tables for each of the cases, leaving the tables for the other case
unused.  The latter is selected as a matter of preference.

The tables are copied (as before), which is not elegant but also not a
big problem.  There are patches posted that use shared buffers for
parameters passing in software ISP (see software ISP TODO #5) and they
can be adjusted for the new parameter format.

Color gains from white balance are supposed not to be a part of the
specified CCM.  They are applied on it using matrix multiplication,
which is simple and in correspondence with future additions in the form
of matrix multiplication, like saturation adjustment.

With this patch, the reported per-frame slowdown when applying CCM is
about 45% on Debix Model A and about 75% on TI AM69 SK.

Using std::clamp in debayering adds some performance penalty (a few
percent).  The clamping is necessary to eliminate out of range values
possibly produced by the CCM.  If it could be avoided by adjusting the
precomputed tables some way then performance could be improved a bit.

Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
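
For illustration, the table-based CCM application described in the commit message can be sketched roughly as follows. This is a standalone sketch, not the debayering code from libcamera: the names `CcmColumn`, `CcmTable`, `GammaTable`, `Pixel`, `scaled`, `buildColumnTable` and `applyCcm` are hypothetical, and the tables are assumed to have 256 entries with an 8-bit gamma output.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

/* Hypothetical types for illustration; not the libcamera definitions. */
struct CcmColumn {
	int16_t r;
	int16_t g;
	int16_t b;
};

using CcmTable = std::array<CcmColumn, 256>;
using GammaTable = std::array<uint8_t, 256>;

struct Pixel {
	uint8_t r;
	uint8_t g;
	uint8_t b;
};

/* Precompute one product, rounded to the nearest integer. */
inline int16_t scaled(unsigned int i, float coeff)
{
	return static_cast<int16_t>(std::lround(i * coeff));
}

/*
 * Build the table for one matrix column: entry i holds the contribution of
 * an input channel value i to each of the three output channels.
 */
inline CcmTable buildColumnTable(const float (&ccm)[3][3], unsigned int column)
{
	CcmTable table{};
	for (unsigned int i = 0; i < table.size(); i++) {
		table[i].r = scaled(i, ccm[0][column]);
		table[i].g = scaled(i, ccm[1][column]);
		table[i].b = scaled(i, ccm[2][column]);
	}
	return table;
}

/* Three lookups and additions per pixel, then clamping and gamma. */
inline Pixel applyCcm(const CcmTable &red, const CcmTable &green,
		      const CcmTable &blue, const GammaTable &gamma, Pixel in)
{
	const CcmColumn &fromR = red[in.r];
	const CcmColumn &fromG = green[in.g];
	const CcmColumn &fromB = blue[in.b];

	/* The sums may fall outside 0..255, so clamp before the gamma lookup. */
	const int r = std::clamp(fromR.r + fromG.r + fromB.r, 0, 255);
	const int g = std::clamp(fromR.g + fromG.g + fromB.g, 0, 255);
	const int b = std::clamp(fromR.b + fromG.b + fromB.b, 0, 255);

	return { gamma[r], gamma[g], gamma[b] };
}
```

In this sketch the per-pixel work is three table lookups, six additions, three clamps and three gamma lookups; all multiplications happen once per table rebuild.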
author: Milan Zamazal <mzamazal@redhat.com> 2025-03-26 10:08:47 +0100
committer: Kieran Bingham <kieran.bingham@ideasonboard.com> 2025-03-26 10:45:01 +0000
commit: e2b4000dc9cc9440234540a4fe28d2f08d84f5f3 (patch)
tree: df86409448ea57e2072af3526fa6f15f057c560b /src/ipa/simple/algorithms
parent: ac3068655643a8b2e9a5d002ad6fab104832e1c0 (diff)
2 files changed, 42 insertions, 12 deletions
diff --git a/src/ipa/simple/algorithms/lut.cpp b/src/ipa/simple/algorithms/lut.cpp
index 3a3daed7..a06cdeba 100644
--- a/src/ipa/simple/algorithms/lut.cpp
+++ b/src/ipa/simple/algorithms/lut.cpp
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: LGPL-2.1-or-later */
 /*
- * Copyright (C) 2024, Red Hat Inc.
+ * Copyright (C) 2024-2025, Red Hat Inc.
  *
  * Color lookup tables construction
  */
@@ -80,6 +80,11 @@ void Lut::updateGammaTable(IPAContext &context)
 	context.activeState.gamma.contrast = contrast;
 }
 
+int16_t Lut::ccmValue(unsigned int i, float ccm) const
+{
+	return std::round(i * ccm);
+}
+
 void Lut::prepare(IPAContext &context,
 		  [[maybe_unused]] const uint32_t frame,
 		  [[maybe_unused]] IPAFrameContext &frameContext,
@@ -91,22 +96,46 @@ void Lut::prepare(IPAContext &context,
 	 * observed, it's not permanently prone to minor fluctuations or
 	 * rounding errors.
 	 */
-	if (context.activeState.gamma.blackLevel != context.activeState.blc.level ||
-	    context.activeState.gamma.contrast != context.activeState.knobs.contrast)
+	const bool gammaUpdateNeeded =
+		context.activeState.gamma.blackLevel != context.activeState.blc.level ||
+		context.activeState.gamma.contrast != context.activeState.knobs.contrast;
+	if (gammaUpdateNeeded)
 		updateGammaTable(context);
 
 	auto &gains = context.activeState.awb.gains;
 	auto &gammaTable = context.activeState.gamma.gammaTable;
 	const unsigned int gammaTableSize = gammaTable.size();
-
-	for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
-		const double div = static_cast<double>(DebayerParams::kRGBLookupSize) /
-				   gammaTableSize;
-		/* Apply gamma after gain! */
-		const RGB<float> lutGains = (gains * i / div).min(gammaTableSize - 1);
-		params->red[i] = gammaTable[static_cast<unsigned int>(lutGains.r())];
-		params->green[i] = gammaTable[static_cast<unsigned int>(lutGains.g())];
-		params->blue[i] = gammaTable[static_cast<unsigned int>(lutGains.b())];
+	const double div = static_cast<double>(DebayerParams::kRGBLookupSize) /
+			   gammaTableSize;
+
+	if (!context.ccmEnabled) {
+		for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
+			/* Apply gamma after gain! */
+			const RGB<float> lutGains = (gains * i / div).min(gammaTableSize - 1);
+			params->red[i] = gammaTable[static_cast<unsigned int>(lutGains.r())];
+			params->green[i] = gammaTable[static_cast<unsigned int>(lutGains.g())];
+			params->blue[i] = gammaTable[static_cast<unsigned int>(lutGains.b())];
+		}
+	} else if (context.activeState.ccm.changed || gammaUpdateNeeded) {
+		Matrix<float, 3, 3> gainCcm = { { gains.r(), 0, 0,
+						  0, gains.g(), 0,
+						  0, 0, gains.b() } };
+		auto ccm = gainCcm * context.activeState.ccm.ccm;
+		auto &red = params->redCcm;
+		auto &green = params->greenCcm;
+		auto &blue = params->blueCcm;
+		for (unsigned int i = 0; i < DebayerParams::kRGBLookupSize; i++) {
+			red[i].r = ccmValue(i, ccm[0][0]);
+			red[i].g = ccmValue(i, ccm[1][0]);
+			red[i].b = ccmValue(i, ccm[2][0]);
+			green[i].r = ccmValue(i, ccm[0][1]);
+			green[i].g = ccmValue(i, ccm[1][1]);
+			green[i].b = ccmValue(i, ccm[2][1]);
+			blue[i].r = ccmValue(i, ccm[0][2]);
+			blue[i].g = ccmValue(i, ccm[1][2]);
+			blue[i].b = ccmValue(i, ccm[2][2]);
+			params->gammaLut[i] = gammaTable[i / div];
+		}
 	}
 }
 
diff --git a/src/ipa/simple/algorithms/lut.h b/src/ipa/simple/algorithms/lut.h
index 889f864b..77324800 100644
--- a/src/ipa/simple/algorithms/lut.h
+++ b/src/ipa/simple/algorithms/lut.h
@@ -33,6 +33,7 @@ public:
 
 private:
 	void updateGammaTable(IPAContext &context);
+	int16_t ccmValue(unsigned int i, float ccm) const;
 };
 
 } /* namespace ipa::soft::algorithms */
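
The way Lut::prepare() folds the white balance gains into the CCM can be looked at in isolation. The following is a minimal sketch under stated assumptions: `Matrix3`, `multiply` and `combineGainsWithCcm` are hypothetical helpers, not the libcamera `Matrix` class, and the multiplication order (diagonal gain matrix on the left) simply mirrors the `gainCcm * context.activeState.ccm.ccm` expression in the diff above.

```cpp
#include <array>

/* Hypothetical 3x3 matrix type for illustration, indexed as [row][column]. */
using Matrix3 = std::array<std::array<float, 3>, 3>;

inline Matrix3 multiply(const Matrix3 &a, const Matrix3 &b)
{
	Matrix3 out{};
	for (unsigned int i = 0; i < 3; i++)
		for (unsigned int j = 0; j < 3; j++)
			for (unsigned int k = 0; k < 3; k++)
				out[i][j] += a[i][k] * b[k][j];
	return out;
}

/*
 * Put the gains on the diagonal of an otherwise zero matrix and multiply it
 * with the CCM, gain matrix on the left as in the diff. With this order,
 * row i of the CCM is scaled by the gain of output channel i.
 */
inline Matrix3 combineGainsWithCcm(float gainR, float gainG, float gainB,
				   const Matrix3 &ccm)
{
	Matrix3 gains{};
	gains[0][0] = gainR;
	gains[1][1] = gainG;
	gains[2][2] = gainB;
	return multiply(gains, ccm);
}
```

Because the result is again a 3x3 matrix, the lookup tables are built from it exactly as they would be from a plain CCM, and further adjustments expressed as matrices can be multiplied in the same way.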