Tracks the performance of LLMs, VLMs, and agents on web navigation tasks
Transform images based on text instructions