Should I always use -w 4 for maximum speed?

Only on a headless cracking box. Workload profile 4 hands the GPU the longest possible batches, which maximizes hashrate but makes the machine unresponsive and can cause display driver timeouts on a desktop you are also using. On a workstation, -w 3 is the sane ceiling and -w 2 keeps the UI usable.

What does the -O flag actually trade away?

Optimized kernels (-O) are faster, sometimes much faster, but they cap the maximum password length they will test, often around 31 characters or shorter depending on the mode. hashcat prints the kernel's maximum length once at startup and then skips longer candidates without further notice. You can run a full attack, see no hits, and never realize the long passwords were never tried.

Why does my hashrate drop after ten minutes?

Thermal throttling. The first benchmark numbers come from a cool card. Once the die heats up and hits its temperature limit, the GPU lowers clocks to stay safe, and sustained hashrate settles well below the burst figure. Watch the hardware-monitor readings in hashcat's status output (monitoring is on by default) and judge by the steady-state number, not the opening one.

Tuning hashcat for real GPU throughput

The first thing people do with a new GPU is run a benchmark and quote the headline number. That number is real for about thirty seconds. Sustained throughput on a real attack, on a warm card, with the kernel your hash actually uses, is the figure that decides whether a job finishes tonight or next week. Tuning is the gap between those two.

Benchmark the mode you are running

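hashcat -b benchmarks a default set of common modes and prints a hashrate for each; add --benchmark-all to walk every mode. Useful for comparing cards, useless for planning a specific job, because the algorithm dominates everything.

hashcat -b              # default set of common modes
hashcat -b --benchmark-all  # every mode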
hashcat -b -m 0         # MD5 only
hashcat -b -m 3200      # bcrypt only

Run the -m 0 and -m 3200 benchmarks and the gap is the whole story. MD5 on a current GPU benchmarks in the tens of billions of hashes per second (read 21000.0 MH/s as 21 billion H/s). bcrypt at cost 5, the cost the benchmark uses, lands in the tens of thousands on the same card. That is not a typo: it is roughly six orders of magnitude. On fast hashes the GPU is the whole game and tuning matters enormously. On slow hashes the algorithm has already won; a faster card buys you almost nothing, and your effort belongs in the wordlist instead.
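To put that gap in wall-clock terms, here is a small bash sketch that turns a keyspace and a hashrate into seconds to exhaust. The 21 GH/s MD5 rate is the benchmark reading quoted above; the 20 kH/s bcrypt rate is an assumed, illustrative figure, not a measurement, and the function name exhaust_seconds is hypothetical.

```shell
#!/usr/bin/env bash
# exhaust_seconds KEYSPACE RATE
# Prints the seconds needed to try every candidate at RATE hashes per second,
# rounded up. Bash arithmetic is 64-bit signed, so KEYSPACE must stay below 2^63.
exhaust_seconds() {
  local keyspace=$1 rate=$2
  echo $(( (keyspace + rate - 1) / rate ))
}

# Six printable-ASCII characters (?a is 95 symbols): 95^6 candidates.
ks=$(( 95**6 ))

md5_rate=21000000000   # 21 GH/s, the MD5 benchmark reading quoted in the text
bcrypt_rate=20000      # 20 kH/s, an ASSUMED bcrypt cost-5 rate for illustration

md5_s=$(exhaust_seconds "$ks" "$md5_rate")
bcrypt_s=$(exhaust_seconds "$ks" "$bcrypt_rate")

echo "keyspace:       $ks"
echo "MD5 seconds:    $md5_s"
echo "bcrypt seconds: $bcrypt_s (about $(( bcrypt_s / 86400 )) days)"
```

With those inputs, six printable characters fall in about 36 seconds against MD5 and take roughly 425 days against bcrypt.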

Workload profiles

-w sets how aggressively hashcat feeds the GPU, from 1 to 4.

hashcat -m 0 -w 3 hashes.txt rockyou.txt

Profile 1 is for a machine you are actively using. Profile 2 is hashcat's default, a compromise that keeps a desktop responsive; profile 3 is right for most dedicated runs. Profile 4 squeezes out the last few percent by handing the card very long batches, and on a desktop where the same GPU drives your display that backfires: the screen stutters, input lags, and on some drivers the watchdog kills the kernel for not yielding. Headless box, -w 4. Daily driver, cap it at 3, and do not be surprised when 4 makes the machine unusable for a single-digit speed gain.
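If one launch script runs on several kinds of machine, the rule of thumb above can be encoded instead of remembered. A minimal sketch, assuming you tell the script which role the box plays; the function name workload_for and the role words are hypothetical, and the hashcat line is only printed, not run.

```shell
# workload_for ROLE
# Maps a machine role to a hashcat -w value following the rule of thumb:
# headless boxes get 4, a desktop left alone for the run is capped at 3,
# and a machine you are actively using gets 1.
workload_for() {
  case "$1" in
    headless)       echo 4 ;;
    desktop-idle)   echo 3 ;;   # same GPU drives the display, so never 4
    desktop-in-use) echo 1 ;;   # you are typing on it while it cracks
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}

echo hashcat -m 0 -w "$(workload_for headless)" hashes.txt rockyou.txt
```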

Optimized kernels and the length trap

-O enables optimized kernels. They are faster, sometimes by a wide margin. The catch is a hard cap on candidate length, often 31 characters or less depending on the mode, and hashcat will not warn you mid-run that it skipped everything longer.

hashcat -m 0 -O hashes.txt rockyou.txt

This is the gotcha covered in finding the right hashcat mode: a clean run with zero hits does not prove the password is uncrackable, it might prove -O never tried the long ones. Use -O for fast hashes and short masks where you know the length ceiling is irrelevant. Drop it the moment you suspect long passphrases, and rerun without it before you call a hash uncracked.
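Before trusting a clean -O run against a wordlist, it is cheap to measure how much of that list the cap would exclude. A sketch in bash and awk, assuming a cap of 31 bytes; read the real limit from the maximum-length line hashcat prints at startup for your mode. It only covers words as they appear in the list, not candidates that rules lengthen, and the function name count_over_cap is hypothetical.

```shell
# count_over_cap WORDLIST [CAP]
# Counts lines longer than CAP bytes (default 31, an ASSUMED -O limit).
# LC_ALL=C makes awk measure bytes, which is how hashcat counts length,
# so multi-byte UTF-8 passwords are not undercounted.
count_over_cap() {
  local wordlist=$1 cap=${2:-31}
  LC_ALL=C awk -v cap="$cap" 'length($0) > cap { n++ } END { print n + 0 }' "$wordlist"
}

# Example: a throwaway three-line list containing one 40-character passphrase.
tmp=$(mktemp)
printf '%s\n' 'password' 'correcthorsebatterystaplecorrecthorse123' 'letmein' > "$tmp"
count_over_cap "$tmp"   # prints 1: only the passphrase exceeds 31 bytes
rm -f "$tmp"
```

A nonzero count means a zero-hit -O run has not actually tested those words.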

Thermal throttling is why your numbers fall

Benchmarks run cold. Ten minutes into a real job the die is hot, the card hits its temperature limit, and the firmware lowers clocks to protect itself. Sustained hashrate settles below the burst figure, sometimes far below on a cramped case with poor airflow.

hashcat -m 0 -w 3 --hwmon-temp-abort=90 hashes.txt rockyou.txt

Hardware monitoring is on by default, so do not pass --hwmon-disable; --hwmon-temp-abort stops the run if the card reaches the given temperature in degrees Celsius. Watch temperature and fan speed in the status output, and judge a card by its steady-state number after it has warmed up. If clocks are dropping under load, the fix is airflow and a fan curve, not a hashcat flag.
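To put a number on how far sustained speed falls below the burst figure, compare the opening hashrate with the one after warm-up. A trivial bash sketch: both readings are ones you copy from hashcat's status output yourself, the example values are made up, and the function name throttle_drop_pct is hypothetical.

```shell
# throttle_drop_pct BURST STEADY
# Percentage lost between the cold opening hashrate and the warm steady-state
# hashrate, rounded down. Integers only: drop the decimal part of readings
# like 21000.0 MH/s, and use the same unit for both.
throttle_drop_pct() {
  local burst=$1 steady=$2
  echo $(( (burst - steady) * 100 / burst ))
}

# Hypothetical readings in MH/s: 21000 in the first minute, 17850 after warm-up.
throttle_drop_pct 21000 17850   # prints 15
```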

Multi-GPU and slow candidates

-d selects devices by index, so you can pin a job to specific cards or split work across them.

hashcat -I                                    # list device indices
hashcat -m 3200 -d 1,2 hashes.txt rockyou.txt # run on cards 1 and 2

-S switches to the slow-candidate path. Counterintuitively, for slow hashes like bcrypt this can be faster, because the bottleneck is the hash itself rather than candidate generation, and the slow path keeps the pipeline fed more efficiently. Benchmark both ways on your hash; do not assume.
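One way to run that comparison, sketched in bash: the same job twice, once on the default path and once with -S, each capped by --runtime so both get the same time budget, with --potfile-disable so hashes cracked in the first run are not skipped in the second. The script only prints the two command lines unless RUN=1 is set; the function and variable names are hypothetical, and you read the speed figures from each run's status output yourself.

```shell
# ab_slow_candidates MODE HASHFILE WORDLIST [SECONDS]
# Builds two otherwise identical hashcat runs, default path and -S,
# each stopped by --runtime after SECONDS. Prints each command line and
# executes it only when RUN=1 is set in the environment.
ab_slow_candidates() {
  local mode=$1 hashes=$2 wordlist=$3 secs=${4:-120}
  local extra
  for extra in "" "-S"; do
    local cmd=(hashcat -m "$mode")
    if [ -n "$extra" ]; then
      cmd+=("$extra")
    fi
    cmd+=(--runtime="$secs" --potfile-disable "$hashes" "$wordlist")
    echo "${cmd[*]}"
    if [ "${RUN:-0}" = 1 ]; then
      "${cmd[@]}"
    fi
  done
}

ab_slow_candidates 3200 hashes.txt rockyou.txt 120
```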

Segmenting big attacks

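A keyspace that takes a week on one box should be split. -s (skip) and -l (limit), also spelled --skip and --limit, carve a contiguous slice of the keyspace so you can hand each segment to a different machine. Both count in hashcat's own keyspace units, the figure --keyspace prints, which for a mask attack is usually much smaller than the total number of candidates because hashcat runs part of the mask inside the GPU kernel. Take the numbers from --keyspace, not from your own arithmetic on the mask.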

N=$(hashcat -m 0 -a 3 --keyspace --quiet '?a?a?a?a?a?a?a?a')                  # keyspace in hashcat's units; no hash file
hashcat -m 0 -a 3 hashes.txt '?a?a?a?a?a?a?a?a' -s 0          -l $((N / 2))      # worker 1
hashcat -m 0 -a 3 hashes.txt '?a?a?a?a?a?a?a?a' -s $((N / 2)) -l $((N - N / 2))  # worker 2 takes the remainder

Query --keyspace, divide by the number of workers, and dispatch the slices, giving the last worker any remainder so nothing is skipped. This is hand-rolled distribution, and it works for a handful of nodes. Past that, coordinating by hand costs more than standing up a real distributed setup.
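The divide-and-dispatch step is plain integer arithmetic. A bash sketch, assuming N is the number --keyspace printed and W is the worker count; it prints one -s/-l pair per worker and hands the remainder to the last one, so the slices exactly cover the keyspace. The function name keyspace_slices and the example keyspace are hypothetical.

```shell
# keyspace_slices N W
# Prints "WORKER -s SKIP -l LIMIT" lines that split keyspace N across W workers.
# Each worker gets N/W (rounded down); the last also takes the remainder,
# so the skips are contiguous and the limits sum to exactly N.
keyspace_slices() {
  local n=$1 w=$2 i skip limit
  (( w > 0 )) || { echo "need at least one worker" >&2; return 1; }
  local base=$(( n / w ))
  for (( i = 0; i < w; i++ )); do
    skip=$(( i * base ))
    limit=$base
    if (( i == w - 1 )); then
      limit=$(( n - skip ))   # last worker absorbs the remainder
    fi
    echo "$i -s $skip -l $limit"
  done
}

# Hypothetical keyspace of 1000000 split across 3 workers:
# 0 -s 0 -l 333333 / 1 -s 333333 -l 333333 / 2 -s 666666 -l 333334
keyspace_slices 1000000 3
```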

The cloud rental math, honestly

Renting GPUs by the hour is genuinely worth it for a short burst on a fast hash. A few hours on rented hardware can replace days on your own card, and you pay only for the run. Where it stops making sense is sustained cracking of slow hashes. bcrypt does not care that you rented eight high-end cards; the cost factor caps your guesses per second per device, so you pay premium hourly rates to grind through a keyspace the algorithm was designed to make expensive to search. Rent for fast-hash bursts and short mask exhaustion. Do not rent a fleet to brute-force a well-configured KDF, because the math that protects the defender protects them against your rented fleet too.
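The arithmetic behind that advice fits in a few lines. A bash sketch with made-up inputs: a per-card hashrate, a per-card hourly price (an assumed 2.00 USD), and a card count. Because wall-clock time shrinks exactly as cards are added, the bill stays essentially flat; a bigger fleet buys speed, not a cheaper crack. The bcrypt rate is an assumed illustrative figure, and the function name rental_cost is hypothetical.

```shell
# rental_cost KEYSPACE RATE_PER_CARD CARDS CENTS_PER_CARD_HOUR
# Prints wall-clock hours (rounded down) and total cost in whole dollars
# (rounded down) to exhaust KEYSPACE on CARDS identical rented cards.
rental_cost() {
  local keyspace=$1 rate=$2 cards=$3 price=$4
  local fleet_rate=$(( rate * cards ))
  local secs=$(( (keyspace + fleet_rate - 1) / fleet_rate ))
  local cents=$(( secs * cards * price / 3600 ))
  echo "cards=$cards hours=$(( secs / 3600 )) cost=\$$(( cents / 100 ))"
}

price=200   # ASSUMED 2.00 USD per card-hour
for cards in 1 8; do
  # MD5, eight ?a characters, 21 GH/s per card
  rental_cost $(( 95**8 )) 21000000000 "$cards" "$price"
done
for cards in 1 8; do
  # bcrypt, six ?a characters, ASSUMED 20 kH/s per card
  rental_cost $(( 95**6 )) 20000 "$cards" "$price"
done
```

With these inputs, both MD5 runs cost about $175 (87 hours on one card, 10 on eight) and both bcrypt runs about $20419 (10209 hours versus 1276). Eight cards make each job finish sooner, and the bill stays the same.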