Yes, disaggregated prefill has improved by now, although setting it up correctly still takes some custom work. The original challenge of multi-user serving with heterogeneous load patterns and priorities is still there, though.
Prefill and decode can be processed in the same batch because the operations at the token level are identical. During prefill, we simply throw away the logits for all but the very last prompt token. Chunked prefill does nothing special in how it combines prefill and decode in a single batch. It just makes more sense to do it with chunked prefill, because otherwise a decode step scheduled in the same batch as a very long prefill would take ages: it can't return its next token until the whole prompt has been processed in that forward pass.
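To make the batching idea concrete, here is a minimal sketch of how one forward pass could be packed. It assumes a hypothetical `Request` type and hypothetical `build_batch`/`logit_positions` helpers (not any particular engine's API). The decode-first policy and the single token budget are likewise illustrative assumptions. The sketch only models token counts and which flattened positions keep their logits. It does not model attention, where prefill tokens attend to the rest of their prompt and decode tokens attend to their KV cache.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request state: prompt length and how much of it is already prefilled."""
    rid: str
    prompt_len: int
    prefilled: int = 0

    @property
    def in_decode(self) -> bool:
        # Once the whole prompt is processed, the request adds one token per step.
        return self.prefilled >= self.prompt_len


def build_batch(requests, token_budget):
    """Pack one forward pass as (rid, num_tokens, needs_logits) entries.

    Side effect: advances each request's `prefilled` counter for the chunk scheduled.
    """
    batch = []
    budget = token_budget
    # Assumed policy: decode requests go first, each contributing exactly one token.
    for r in requests:
        if r.in_decode and budget > 0:
            batch.append((r.rid, 1, True))
            budget -= 1
    # The remaining budget is filled with (possibly partial) prefill chunks.
    for r in requests:
        if not r.in_decode and budget > 0:
            n = min(r.prompt_len - r.prefilled, budget)
            r.prefilled += n
            # Logits are only needed if this chunk ends at the last prompt token.
            batch.append((r.rid, n, r.prefilled == r.prompt_len))
            budget -= n
    return batch


def logit_positions(batch):
    """Indices in the flattened token batch whose logits are kept; all others are discarded."""
    positions, offset = [], 0
    for _rid, n, needs_logits in batch:
        if needs_logits:
            positions.append(offset + n - 1)
        offset += n
    return positions


# With a 512-token budget, the decode of "b" shares a pass with only a 511-token chunk of "a".
chunked = build_batch([Request("a", 10_000), Request("b", 5, prefilled=5)], token_budget=512)
print(chunked)                    # [('b', 1, True), ('a', 511, False)]
print(logit_positions(chunked))   # [0]

# Without chunking, the same pass must process all 10,000 prompt tokens of "a".
unchunked = build_batch([Request("a", 10_000), Request("b", 5, prefilled=5)], token_budget=10**9)
print(unchunked)                  # [('b', 1, True), ('a', 10000, True)]
print(logit_positions(unchunked)) # [0, 10000]
```

Both runs use exactly the same batching code. The token budget is the only thing that decides whether the decode of "b" waits on a short chunk or on the entire 10,000-token prompt.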