MXFP4 corrupted?

#1
by tarruda - opened
% llama-server --no-mmap --no-warmup --model ./gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part1of2 
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.011 sec
ggml_metal_device_init: GPU name:   Apple M1 Ultra
ggml_metal_device_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = false
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 134217.73 MB
main: setting n_parallel = 4 and kv_unified = true (add -kvu to disable this)
build: 7193 (d82b7a7c1) with Apple clang version 16.0.0 (clang-1600.0.26.3) for arm64-apple-darwin24.0.0
system info: n_threads = 16, n_threads_batch = 16, total_threads = 20

system_info: n_threads = 16 (n_threads_batch = 16) / 20 | Metal : EMBED_LIBRARY = 1 | CPU : NEON = 1 | ARM_FMA = 1 | FP16_VA = 1 | DOTPROD = 1 | LLAMAFILE = 1 | ACCELERATE = 1 | REPACK = 1 | 

init: using 19 threads for HTTP server
start: binding port with default address family
main: loading model
srv    load_model: loading model './gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part1of2'
llama_model_load_from_file_impl: using device Metal (Apple M1 Ultra) (unknown id) - 127999 MiB free
llama_model_load: error loading model: tensor 'blk.17.ffn_up_exps.weight' data is not within the file bounds, model is corrupted or incomplete
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model './gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part1of2', try reducing --n-gpu-layers if you're running out of VRAM
srv    load_model: failed to load model, './gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part1of2'
srv    operator(): operator(): cleaning up before exit...
main: exiting due to model loading error

sha256sum:

% sha256sum gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.*
5160a84eaf7042d5a4851a7e04cb82f5c8e444fef03b5012ad6d3ab76f415408  gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part1of2
44f8bf3142d5ba6daf1a0861fa93ffad02b99ef607877bdfcb9e36b39f145a0c  gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part2of2
🥲 Failed to load the model

error loading model: tensor 'blk.17.ffn_up_exps.weight' data is not within the file bounds, model is corrupted or incomplete

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Laptop GPU, compute capability 8.9, VMM: yes
CUDA : ARCHS = 750,800,890,900,1000,1200 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4070 Laptop GPU) (0000:01:00.0) - 7056 MiB free
llama_model_loader: ------------------------ Adding override for key 'gpt-oss.expert_used_count'
llama_model_load: error loading model: tensor 'blk.17.ffn_up_exps.weight' data is not within the file bounds, model is corrupted or incomplete
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'D:\Models\GGUF\GGUFs\gpt-oss-120b-Derestricted.i1-MXFP4_MOE-00001-of-00002.gguf', try reducing --n-gpu-layers if you're running out of VRAM
lmstudio-llama-cpp: failed to load model. Error: error loading model: tensor 'blk.17.ffn_up_exps.weight' data is not within the file bounds, model is corrupted or incomplete

It's fine, you've got to cat the 2 files together.
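
The .part1of2/.part2of2 files are just a byte split of a single GGUF file: part 1 holds the complete GGUF header, which lists every tensor and its data offset, but only the first chunk of the tensor data, so any tensor whose offset falls past the end of part 1 (like blk.17.ffn_up_exps.weight above) gets reported as "not within the file bounds". A quick way to see the mismatch, assuming both parts sit in the working directory and their combined size is the size of the complete model:

ls -lh gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part*
du -ch gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part* | tail -n1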

@gghfez thanks

Yes, all I needed to do was:

cat gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part1of2 gpt-oss-120b-Derestricted.MXFP4_MOE.gguf.part2of2 > gpt-oss-120b-Derestricted.MXFP4_MOE.gguf
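
After the cat, the merged file should be exactly the size of the two parts combined, and the original command loads it once it points at the merged .gguf (same flags as before, only the path changes):

ls -l gpt-oss-120b-Derestricted.MXFP4_MOE.gguf*
llama-server --no-mmap --no-warmup --model ./gpt-oss-120b-Derestricted.MXFP4_MOE.gguf

(If the Windows/LM Studio files above are the same kind of raw split, copy /b with the two part files in cmd.exe does the equivalent binary concatenation.)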
tarruda changed discussion status to closed
