Microprocessor Design/Multiply and Divide Blocks
Multiply and Divide Problems 
Multiplication and division operations are significantly more complicated than addition or subtraction operations. This additional complexity leads to more hardware, more complicated hardware, and longer processing time.
In hardware, multiplication and division are performed by a series of sequential additions (or subtractions, in the case of division) and shifts. For this reason, it is imperative that we have efficient adders and shifters at our disposal.
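The "additions and shifts" idea can be sketched in software. The following is a minimal model of unsigned shift-and-add multiplication, not a description of any particular hardware design; the function name `shift_add_multiply` and the `width` parameter are assumptions made for this illustration.

```python
def shift_add_multiply(multiplicand, multiplier, width=8):
    """Multiply two unsigned `width`-bit integers using only shifts and adds.

    Each loop iteration models one step of a simple sequential multiplier:
    examine the lowest multiplier bit, conditionally add the (shifted)
    multiplicand to the running product, then shift both operands.
    """
    mask = (1 << width) - 1
    a = multiplicand & mask   # truncate inputs to the chosen word width
    b = multiplier & mask
    product = 0
    for _ in range(width):
        if b & 1:             # low bit set: add the current partial product
            product += a
        a <<= 1               # multiplicand moves one bit position left
        b >>= 1               # expose the next multiplier bit
    return product            # result needs up to 2 * width bits
```

For example, `shift_add_multiply(13, 11)` adds 13, 26 and 104 (bits 0, 1 and 3 of 11 are set) to produce 143. The loop runs once per multiplier bit, which is why a simple sequential multiplier takes many more cycles than a single addition.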
Multipliers and dividers are composed of shifters and adders. It is typically not possible, or not desirable, to use the main adder and shifter of the ALU for this purpose, so a microprocessor will typically contain several adders and shifters: a primary ALU for addition and subtraction, and additional units embedded in the multiply and divide blocks. This is another good reason why our adders and shifters need to be small and fast.
Multiplication Algorithms 
Booth's Algorithm 
Cascaded Multiplication 
Division Algorithm 
Multiply and Accumulate 
Multiply and accumulate (MAC) operations perform a multiplication and an addition in a single instruction. For instance, the instruction:
MAC A, B, C
would perform the operation:
A = A + (B × C)
This is valuable for math-intensive processors, such as graphics processors and DSPs.
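To illustrate why this is useful, here is a minimal sketch of a dot product, the kind of inner loop that is common in DSP and graphics code, written so that each iteration is exactly one MAC. The helper names `mac` and `dot_product` are hypothetical and chosen only for this example.

```python
def mac(a, b, c):
    """Model of the three-operand instruction `MAC A, B, C`: returns A + (B * C)."""
    return a + (b * c)


def dot_product(xs, ys):
    """Compute the dot product of two equal-length sequences, one MAC per element pair."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)   # a single MAC replaces a multiply followed by an add
    return acc
```

With `xs = [1, 2, 3]` and `ys = [4, 5, 6]`, the running value is 4, then 14, then 32. On a processor without a MAC instruction, each loop iteration would need a separate multiply and add.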
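A MAC tends to have a long critical path, because the addition cannot complete until the product is available. If the clock period of your processor is long enough to fit a single-cycle MAC, it is probably long enough to fit other complicated arithmetic operations as well.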
In a processor with an accumulator architecture, MAC operations will use the accumulator as the destination register, so the instruction:
MAC B, C
will perform the operation:
ACC = ACC + (B × C)
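The accumulator form can be modelled with a small class in which the destination register is implicit. This is only a sketch: the class name `AccumulatorMachine` and its `clear` method are assumptions made for illustration and do not correspond to any specific instruction set.

```python
class AccumulatorMachine:
    """Toy model of an accumulator architecture with a two-operand MAC."""

    def __init__(self):
        self.acc = 0           # the single implicit accumulator register

    def clear(self):
        """Reset the accumulator (hypothetical helper instruction)."""
        self.acc = 0

    def mac(self, b, c):
        """Model of `MAC B, C`: ACC = ACC + (B * C)."""
        self.acc = self.acc + (b * c)


machine = AccumulatorMachine()
machine.mac(3, 4)              # ACC = 0 + 12
machine.mac(5, 6)              # ACC = 12 + 30
```

After the two instructions above, `machine.acc` holds 42. Because the destination is implied, the instruction only needs to encode two source operands.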
Fused Multiply-Add 
A fused multiply-add operation is a floating-point operation that is similar to the MAC. However, in the fused operation the product is not rounded before the addition; only the final result is rounded. For more information about floating-point rounding, see Floating Point.
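The effect of the single rounding can be seen in a small software model. The sketch below assumes finite double-precision inputs and emulates the fused operation by computing the exact result with rational arithmetic and rounding once at the end; real hardware does not work this way internally, and the function names are hypothetical.

```python
from fractions import Fraction


def unfused_multiply_add(a, b, c):
    """Multiply, round to a float, then add and round again (two roundings)."""
    product = a * b            # first rounding happens here
    return product + c         # second rounding happens here


def fused_multiply_add(a, b, c):
    """Emulate a fused multiply-add: exact a*b + c, rounded once at the end.

    Assumes finite inputs; Fraction represents each float exactly.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)        # the only rounding step


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
```

Here the exact product is 1 - 2^-60, which rounds to 1.0 as a double. The unfused version therefore returns 0.0, while the fused version keeps the small term and returns -2^-60.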