Gavjenks, there are, to put it mildly, substantial problems with manufacturing chips that are 4x5 inches, and I am not sure that the properties (really, rules of thumb) that describe how sensors work scale up that far. It's an interesting thought experiment, but I am pretty sure there are likely to be multiple axes of "impossible, or damn near" involved.
I know little or nothing about the manufacturing costs (other than the simple "this many fit on a wafer, minus defects" arithmetic that is bandied around blogs all the time). I'm merely assuming they will come down in price over time at something like a Moore's-law pace, as almost all similar electronic devices have in the past. It's hard to verify whether this has already happened, because I can't find any graphs of the cost of the sensors alone, but it seems very likely, given how young digital photography is and the huge changes in affordability we have already seen in that time.
However, I do know a fair amount about the sensor advantages, and I see no reason why they wouldn't scale with larger pixel pitch. You should certainly see scaling benefits in dynamic range and ISO performance, and you could certainly see benefits in resolution. I'm not 100% sure you can make a sensor that scales in both at once, since that would depend on pooling across pixels to get your high-ISO benefits, which may not work as well as one might expect (and there are not many examples to look to for answers: there are only one or two crop / full-frame camera pairs with equal pixel pitch, and they are at completely different levels of professional build, may have different software involved, blah blah). But I suspect that if you made a 250 MP 4x5 sensor, with the right software, you could get successfully scaling advantages in ISO, dynamic range, and resolution all at once.
But certainly one or the other. Dynamic range and ISO are just a result of the simple arithmetic of signal-to-noise ratios: essentially a matter of the number of photons collected versus the amount of noise (which grows too, but more slowly than the light-gathering area does). And resolution is simply a result of having more pixels to work with, pixels which should easily be able to resolve actual information from a decent lens if they are the same size or larger than modern ones.
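To make that arithmetic concrete, here's a minimal sketch of the shot-noise model, with made-up but plausible numbers (a ~45x area ratio between a small modern pixel and a big 4x5-scale pixel, 3 e- read noise): signal grows with area, shot noise only grows with its square root, so SNR improves roughly as the square root of the area.

```python
import math

def snr(photons, read_noise_e=3.0):
    """Shot noise grows as sqrt(signal), so SNR = S / sqrt(S + r^2)."""
    return photons / math.sqrt(photons + read_noise_e**2)

# Illustrative pixels: a ~4 micron pitch vs a ~27 micron pitch
# (roughly 45x the light-gathering area), same exposure and efficiency.
small_pixel_photons = 1_000
big_pixel_photons = small_pixel_photons * 45

print(f"small pixel SNR: {snr(small_pixel_photons):.1f}")
print(f"big pixel SNR:   {snr(big_pixel_photons):.1f}")
# The big pixel's SNR is close to sqrt(45) ~ 6.7x better.
```

The same arithmetic drives dynamic range: a bigger pixel's larger full-well capacity raises the ceiling while noise barely moves the floor.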
There are also other ancillary benefits. For example, if you went with just 18 MP but made the pixels all much larger on a 4x5, then your images would not yet be diffraction limited even at f/45! And depth of field gets effectively narrower, so you could do your "in-camera Brenizer" just by snapping a single photo. After all, the whole point of the Brenizer method is to make it look like you used an MF or LF camera!
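A rough sanity check on that f/45 figure, assuming 550 nm green light and the common rule of thumb that diffraction starts to bite once the Airy disk spans about two pixels (both assumptions, and the rule of thumb is fuzzy): an 18 MP 4x5 sensor works out to a ~27 micron pitch, which puts the onset right around f/40-f/45, so the claim is in the right ballpark.

```python
import math

# Back-of-the-envelope diffraction check for a hypothetical 18 MP 4x5 sensor.
sensor_w_mm, sensor_h_mm = 101.6, 127.0   # 4x5 inches in mm
megapixels = 18e6
wavelength_mm = 550e-6                    # green light, 550 nm

# Pixel grid matching the 4:5 aspect ratio.
px_w = math.sqrt(megapixels * sensor_w_mm / sensor_h_mm)
pitch_um = sensor_w_mm / px_w * 1000      # ~27 microns

def airy_diameter_um(f_number):
    """Airy disk diameter (first minimum to first minimum): 2.44 * lambda * N."""
    return 2.44 * wavelength_mm * f_number * 1000

print(f"pixel pitch:   {pitch_um:.1f} um")
print(f"Airy at f/45:  {airy_diameter_um(45):.1f} um vs two pixels = {2 * pitch_um:.1f} um")
```

Compare that with a typical 4-5 micron pitch on today's sensors, where the same rule puts the onset down around f/7-f/8.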
"shoot a bunch of frames at 60fps for a while and try to sort out something from that"
There are technologies that do pretty much exactly this already, in a much more controlled way obviously.
I can't remember the exact name to find a link, but some of those "gigapixel" images you see are made by using, essentially, a hundred little affordable sensors arrayed in a hemisphere around a spherical condensing lens. The image comes in from the front at various angles and, depending on the angle, condenses onto a different one of those sensors, each of which has one or two small secondary lenses for finer control. This all lets them take a hundred precisely aligned panorama frames in a single instant, which can then be stitched together. And the cost of the sensors only scales linearly with area, not exponentially as it would if you made it just one big monolithic one.
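The "exponentially" part isn't just hand-waving: under the standard first-order (Poisson) yield model, the fraction of defect-free dies falls off as exp(-defect_density * die_area). Here's a sketch with entirely made-up foundry numbers, comparing a hundred small sensors against one monolithic sensor of the same total area:

```python
import math

# First-order Poisson yield model: fraction of good dies = exp(-D * A),
# where D is defect density and A is die area.  All numbers below are
# illustrative, not real foundry data.
defects_per_cm2 = 0.2
wafer_cost = 5000.0               # dollars, hypothetical
usable_wafer_area_cm2 = 600.0

def cost_per_good_die(die_area_cm2):
    dies_per_wafer = usable_wafer_area_cm2 / die_area_cm2
    yield_frac = math.exp(-defects_per_cm2 * die_area_cm2)
    return wafer_cost / (dies_per_wafer * yield_frac)

small = cost_per_good_die(1.0)    # one small sensor
big = cost_per_good_die(100.0)    # one monolithic sensor, 100x the area

print(f"100 small sensors: ${100 * small:,.0f}")
print(f"1 big sensor:      ${big:,.0f}")
```

With these numbers, a hundred small good dies cost on the order of a thousand dollars, while a single giant die is effectively unmanufacturable because almost every one catches a defect. Real processes use better yield models and defect-tolerant designs, but the direction of the effect is the same.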
Video:
In the near future, this may very well be the method by which larger-format sensors are practically achieved. If it becomes possible to make a larger sensor all in one piece cheaply, though, that would still be better, simply because there are fewer small parts that can get out of alignment, etc., and ruin your camera.
Also, using either your method or a sphere camera like the one in the video, the result will look different from a flat sensor's, because when you pivot, you are changing your perspective. To make it look like a rectilinear image, software has to bend and warp a whole bunch of lines, which can degrade local resolution and do other weird, undesirable things (look closely in the video: when they zoom in on part of the gigapixel image, there is a huge curved line of changing resolution in one spot, for example). A single flat sensor avoids such issues.
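One way to see why the warping degrades local resolution, as a minimal sketch: a rectilinear projection maps a ray at angle theta off-axis to x = f * tan(theta), so the local stretch factor is 1/cos(theta)^2. Frames captured far off-axis get magnified when remapped flat, smearing their detail over more output pixels.

```python
import math

# Rectilinear remapping: a ray at angle theta lands at x = f * tan(theta),
# so the local stretch is d(x)/d(theta) proportional to 1 / cos(theta)^2.
# Detail captured off-axis gets spread over more output pixels.
def rectilinear_stretch(theta_deg):
    return 1.0 / math.cos(math.radians(theta_deg)) ** 2

for theta in (0, 20, 45, 60):
    print(f"{theta:2d} deg off-axis: stretched {rectilinear_stretch(theta):.2f}x")
```

At 45 degrees off-axis the stretch is already 2x, and it blows up toward 90 degrees, which is why very wide stitched panoramas show those bands of uneven resolution when forced into a flat projection.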