The "work" aspect refers to how GGML optimizes these operations for specific hardware. A naive implementation would loop through the arrays element by element, which is slow. GGML takes a different approach depending on the backend.
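To make the contrast with a naive loop concrete, here is a minimal sketch in C that uses a dot product as a stand-in for the operations discussed. The function names `dot_naive` and `dot_unrolled` are hypothetical and are not taken from GGML; the second function only illustrates the general idea of restructuring a loop so the hardware can do more work per iteration, and GGML's actual kernels may look quite different.

```c
#include <stddef.h>

/* Naive version: one multiply-add per iteration, with a single
   accumulator, so every addition waits on the previous one. */
float dot_naive(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Hardware-friendlier version: four independent accumulators, which
   gives the compiler room to use SIMD lanes or overlap the additions.
   Generic illustration only; this is not code from GGML. */
float dot_unrolled(const float *a, const float *b, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    float sum = (s0 + s1) + (s2 + s3);
    /* Handle leftover elements when n is not a multiple of 4. */
    for (; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Both functions compute the same mathematical result. For general floating-point inputs the two can differ slightly in the last bits, because the unrolled version adds the terms in a different order.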
So ggmlmediumbin is literally a .
./perplexity -m model.q4_0.bin -f wiki.test.raw