MXFP4 corrupted?

#1
by tarruda - opened
% llama-server --no-mmap --no-warmup --model ./gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part1of2 
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.011 sec
ggml_metal_device_init: GPU name:   Apple M1 Ultra
ggml_metal_device_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 134217.73 MB
main: setting n_parallel = 4 and kv_unified = true (add -kvu to disable this)
build: 7193 (d82b7a7c1) with Apple clang version 16.0.0 (clang-1600.0.26.3) for arm64-apple-darwin24.0.0
system info: n_threads = 16, n_threads_batch = 16, total_threads = 20

system_info: n_threads = 16 (n_threads_batch = 16) / 20 | Metal : EMBED_LIBRARY = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | DOTPROD = 1 | LLAMAFILE = 1 | ACCELERATE = 1 | REPACK = 1 | 

init: using 19 threads for HTTP server
start: binding port with default address family
main: loading model
srv    load_model: loading model './gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part1of2'
llama_model_load_from_file_impl: using device Metal (Apple M1 Ultra) (unknown id) - 127999 MiB free
llama_model_load: error loading model: tensor 'blk.17.ffn_up_exps.weight' data is not within the file bounds, model is corrupted or incomplete
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model './gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part1of2', try reducing --n-gpu-layers if you're running out of VRAM
srv    load_model: failed to load model, './gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part1of2'
srv    operator(): operator(): cleaning up before exit...
main: exiting due to model loading error

sha256sum:

% sha256sum gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.*
5160a84eaf7042d5a4851a7e04cb82f5c8e444fef03b5012ad6d3ab76f415408  gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part1of2
44f8bf3142d5ba6daf1a0861fa93ffad02b99ef607877bdfcb9e36b39f145a0c  gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part2of2
🥲 Failed to load the model

error loading model: tensor 'blk.17.ffn_up_exps.weight' data is not within the file bounds, model is corrupted or incomplete

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Laptop GPU, compute capability 8.9, VMM: yes
CUDA : ARCHS = 750,800,890,900,1000,1200 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4070 Laptop GPU) (0000:01:00.0) - 7056 MiB free
llama_model_loader: ------------------------ Adding override for key 'gpt-oss.expert_used_count'
llama_model_load: error loading model: tensor 'blk.17.ffn_up_exps.weight' data is not within the file bounds, model is corrupted or incomplete
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'D:\Models\GGUF\GGUFs\gpt-oss-120b-Derestricted.i1-MXFP4_MOE-00001-of-00002.gguf', try reducing --n-gpu-layers if you're running out of VRAM
lmstudio-llama-cpp: failed to load model. Error: error loading model: tensor 'blk.17.ffn_up_exps.weight' data is not within the file bounds, model is corrupted or incomplete

It's fine, you've got to cat the 2 files together.
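
The .part1of2/.part2of2 files are just a byte split of a single GGUF file: part 1 holds the complete GGUF header, which lists every tensor and its data offset, but only the first chunk of the tensor data, so any tensor whose offset falls past the end of part 1 (like blk.17.ffn_up_exps.weight above) gets reported as "not within the file bounds". A quick way to see the mismatch, assuming both parts sit in the working directory and their combined size is the size of the complete model:

ls -lh gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part*
du -ch gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part* | tail -n1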

@gghfez thanks

Yes, all I needed to do was:

cat gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part1of2 gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part2of2 > gpt-oss-120b-Derestricted.MXFP4_MOE.gguf
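
After the cat, the merged file should be exactly the size of the two parts combined, and the original command loads it once it points at the merged .gguf (same flags as before, only the path changes):

ls -l gpt-oss-120b-Derestricted.MXFP4_MOE.gguf*
llama-server --no-mmap --no-warmup --model ./gpt-oss-120b-Derestricted.MXFP4_MOE.gguf

(If the Windows/LM Studio files above are the same kind of raw split, copy /b with the two part files in cmd.exe does the equivalent binary concatenation.)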
tarruda changed discussion status to closed
