Version: v0.2.23

# inline::meta-reference

## Description

Meta's reference implementation of inference with support for various model formats and optimization techniques.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `model` | `str \| None` | No | | |
| `torch_seed` | `int \| None` | No | | |
| `max_seq_len` | `int` | No | `4096` | |
| `max_batch_size` | `int` | No | `1` | |
| `model_parallel_size` | `int \| None` | No | | |
| `create_distributed_process_group` | `bool` | No | `True` | |
| `quantization` | `Bf16QuantizationConfig \| Fp8QuantizationConfig \| Int4QuantizationConfig` (discriminated on `type`) | No | | |
| `checkpoint_dir` | `str \| None` | No | | |

## Sample Configuration

```yaml
model: Llama3.2-3B-Instruct
checkpoint_dir: ${env.CHECKPOINT_DIR:=null}
quantization:
  type: ${env.QUANTIZATION_TYPE:=bf16}
model_parallel_size: ${env.MODEL_PARALLEL_SIZE:=0}
max_batch_size: ${env.MAX_BATCH_SIZE:=1}
max_seq_len: ${env.MAX_SEQ_LEN:=4096}
```