It looks like your "Greater or equal" comparison is the expensive part of the design.
It seems like you're sending data from the host via DMA FIFO and using the first 32 bits as some kind of signal for when to stop. Can you not replace this with a single boolean to signal the end of the data? Even comparing to 0 is relatively expensive for a 32-bit number. Switching to a simple boolean (for example, the first byte of the input data instead of the whole 32-bit number) would make that check much cheaper.
You can then send a "stop" signal as a simple boolean from the host to tell the loop when to stop.
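Since LabVIEW code is graphical, here is only a rough Python model of the idea, not actual FPGA or host code. It assumes one possible layout that is not from your VI: the top bit of each U32 FIFO element is reserved as the "stop" flag and the remaining 31 bits carry the data. The names `STOP_BIT`, `pack_elements` and `consume` are made up for illustration.

```python
# Hypothetical layout: bit 31 of each U32 element is the "stop" flag,
# bits 0..30 carry the payload. This is an assumption, not your actual format.
STOP_BIT = 1 << 31
PAYLOAD_MASK = STOP_BIT - 1


def pack_elements(values):
    """Host side: pack 31-bit payloads into U32 words and flag the last one."""
    words = []
    for i, value in enumerate(values):
        if not 0 <= value <= PAYLOAD_MASK:
            raise ValueError("payload must fit in 31 bits")
        is_last = i == len(values) - 1
        words.append(value | (STOP_BIT if is_last else 0))
    return words


def consume(words):
    """Model of the FPGA loop: it only tests one bit to decide when to stop."""
    outputs = []
    for word in words:
        outputs.append(word & PAYLOAD_MASK)  # data that would go to the outputs
        if word & STOP_BIT:                  # single-bit test, no 32-bit compare
            break
    return outputs


if __name__ == "__main__":
    print(consume(pack_elements([5, 0, 7])))
```

The point is that the loop condition becomes a test of one bit instead of a comparison across all 32 bits. A separate boolean control written from the host would work the same way on the FPGA side.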
The "fan out" error is an optimisation problem in the Xilinx compiler and has nothing to do with the "fanout" of your 32-bit number to the digital outputs.