This is a note about DeepSeek-V2 Multi-Head Latent Attention.
This is a note about throughput and latency from both the software and hardware perspectives.
Roofline Model for Performance Analysis
A note about the Roofline model.