In your question you focus on the pixels, and on the fact that the same pixels are being used, as if the difference must be due to the pixels. It isn't. The difference is a difference in noise, and that noise doesn't come from the sensels/pixels. You may have heard that one reason FF cameras have better low light performance is that they have larger pixels, and possibly that idea stuck with you. That was true once but is now practically meaningless. The noise that comes from the sensels is read noise. Go back 15 years and we could see read noise in our images and see that larger sensels generated less of it. Times change. Sensor designers have engineered the read noise in our modern sensors down to a level that is insignificant.
The size of the sensels/pixels is responsible for a difference that you now can't see, if you can even detect it. So the fact that an FX sensor placed in DX mode is still using the same pixels is pretty meaningless. Again we can look to sensor DR for verification of this:
Canon R5 and R6: The R6 is a 24 MP FF camera while the R5 is a 45 MP FF camera. The sensels/pixels in the R5 have only about half the area of those in the R6.
You would expect the smaller sensels/pixels to be noisier, but they're not. The DR plots for the two cameras overlay.
Pixel size doesn't mean squat. It used to, but we fixed that.
Read noise was always a secondary source, less important than shot noise, which is the dominant source of noise in our images. The pixels and their size have nothing to do with shot noise. The noise is in the signal itself (the light), and the only way to make it less visible is to strengthen the signal (light): more exposure and/or more total signal collected.
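To put rough numbers on that, here is a minimal Python sketch. It is my own illustration, not measured data. It assumes light arrives with Poisson statistics (so shot-noise variance equals the signal), that every photon becomes one electron, and that read noise adds independently once per pixel. The function name `patch_snr`, the 10,000-photon patch and the 2 e- read noise are all made-up values chosen only to show the shape of the result.

```python
import math

def patch_snr(total_photons: float, pixel_count: int, read_noise_e: float = 0.0) -> float:
    """SNR of a fixed sensor patch divided into pixel_count pixels.

    Assumes Poisson shot noise (variance = signal) plus independent
    read noise of read_noise_e electrons RMS added once per pixel.
    """
    variance = total_photons + pixel_count * read_noise_e ** 2
    return total_photons / math.sqrt(variance)

# Hypothetical patch: same area and same light, cut into 24 big or 45 small pixels.
light = 10_000.0
for pixels in (24, 45):
    shot_only = patch_snr(light, pixels)
    with_read = patch_snr(light, pixels, read_noise_e=2.0)
    print(f"{pixels} pixels: shot-only SNR {shot_only:.1f}, with read noise {with_read:.1f}")
```

With read noise set to zero the two results are identical, because the pixel count drops out of the sum entirely. With a couple of electrons of read noise per pixel, the gap between the two stays well under one percent in this example.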
Back to the cookie tins in the rain analogy. The rain is dirty (the light is noisy). When we collect more water the dirt in the water is less visible. As we collect less water the dirt in the water becomes more visible. When we collect more light (by total area) the noise in the light is less visible (and the pixels aren't involved). When we collect less light (by total area) the noise in the light is more visible (and the pixels aren't involved).
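The tins-in-the-rain picture can be checked with a small simulation. This is only a sketch under stated assumptions: raindrops (photons) are treated as Poisson arrivals, the helper names `poisson_sample` and `relative_noise` are mine, and the averages of 25 and 100 drops per tin are arbitrary. The sampler uses Knuth's multiplication method because Python's standard library has no built-in Poisson generator.

```python
import math
import random
import statistics

def poisson_sample(mean: float, rng: random.Random) -> int:
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def relative_noise(mean_drops: float, trials: int, rng: random.Random) -> float:
    """Standard deviation divided by mean over many simulated tins."""
    samples = [poisson_sample(mean_drops, rng) for _ in range(trials)]
    return statistics.pstdev(samples) / statistics.fmean(samples)

rng = random.Random(0)  # fixed seed so the run is repeatable
small_tin = relative_noise(25, 5000, rng)    # tin that catches 25 drops on average
large_tin = relative_noise(100, 5000, rng)   # tin with four times the area
print(f"small tin relative noise: {small_tin:.3f}")
print(f"large tin relative noise: {large_tin:.3f}")
```

The tin with four times the area should come out with roughly half the relative noise, which is the square-root relationship behind shot noise. Nothing in the simulation knows or cares how the tin is subdivided.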
Below is quoted from Richard Butler's article in DPReview on noise:
What's that noise? Part one: Shedding some light on the sources of noise
[my bold] "There are three factors that affect how much light is available for your sensor to capture: your shutter speed, f-number and the size of your sensor.
...at the same f-number (both cameras set to F2.8), the full frame camera will see four times as much light as a camera with a Four Thirds sensor, since it is exposed to the same light-per-unit-area but has a sensor with four times the area.
As a result, when you shoot two different sized sensors with the same shutter speed, f-number and ISO, the camera with the smaller sensor has to produce the same final image brightness (which the ISO standard demands) from less total light. And, since we've established that capturing more light improves your signal-to-noise ratio, this means every output tone from the larger sensor will have a better signal-to-noise ratio, so will look cleaner."
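The arithmetic in that quote is easy to run for yourself. The sketch below is my own and rests on two assumptions: the sensor dimensions are nominal figures (real sensors vary a little by model), and the image is shot-noise limited, so SNR scales with the square root of total light. The names `SENSORS_MM`, `area_mm2` and `light_advantage` are just labels I picked for the illustration.

```python
import math

# Nominal sensor dimensions in mm (approximate; exact sizes vary by model).
SENSORS_MM = {
    "full frame": (36.0, 24.0),
    "APS-C (DX)": (23.5, 15.6),
    "Four Thirds": (17.3, 13.0),
}

def area_mm2(name: str) -> float:
    """Sensor area in square millimetres."""
    width, height = SENSORS_MM[name]
    return width * height

def light_advantage(larger: str, smaller: str) -> tuple[float, float, float]:
    """Return (total-light ratio, stops, shot-noise SNR ratio) at equal exposure."""
    ratio = area_mm2(larger) / area_mm2(smaller)
    return ratio, math.log2(ratio), math.sqrt(ratio)

for small in ("APS-C (DX)", "Four Thirds"):
    ratio, stops, snr = light_advantage("full frame", small)
    print(f"full frame vs {small}: {ratio:.2f}x light, {stops:.2f} stops, {snr:.2f}x SNR")
```

With these nominal dimensions the full frame to Four Thirds ratio comes out a little under the "four times" in the quote, which is a rounded figure. The DX line is the case from your question: an FX sensor in DX mode uses only the DX-sized area, so it collects the smaller amount of light no matter which pixels are doing the collecting.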