Delegated inference
Delegate inference to a remote peer out of the box over the Holepunch stack, so devices can share compute resources.
Overview
Delegated inference lets a consumer send inference requests to a remote provider peer-to-peer over the Hyperswarm DHT. Use it when an inference needs more resources than the local device can provide.
Connectivity is direct: the consumer opens a dht.connect(providerPublicKey) connection straight to the provider, with no topic or discovery phase. The provider exposes itself on the DHT via its keypair, and consumers connect by public key.
Delegation is configured at model-load time by passing a delegate object to loadModel(). The provider is started separately using startQVACProvider().
Functions
Provider:
- startQVACProvider(): bind the DHT server on the provider's keypair and start accepting delegated requests
- stopQVACProvider(): stop accepting requests
Consumer:
- loadModel(): with the delegate option
- completion() / transcribe() / translate() / etc.: same as local
- unloadModel()
For how to use each function, see SDK — API reference.
Provider
Binds the DHT server on its keypair and serves delegated requests. It publishes its public key; consumers use that key to connect directly.
Consumer
Creates a delegated model via loadModel({ delegate: ... }).
Main delegate options:
- providerPublicKey: provider public key (required)
- timeout: request timeout in ms (optional). Use a generous value (e.g. 60_000) on the first call, because cold-DHT bootstrap can take 15–45s. Subsequent calls reuse the open socket and are sub-second.
- fallbackToLocal: if true, run locally when delegation fails (optional)
- forceNewConnection: if true, do not reuse cached connections (optional)
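The options above can be pictured as a plain object. The sketch below is illustrative only: the `DelegateOptions` interface and the `delegateFor` helper are hypothetical names introduced here, not SDK exports. Only the four field names come from the list above. The helper picks the generous timeout for the first (cold-DHT) call and a shorter one afterwards; the 10-second warm value is an assumption.

```typescript
// Hypothetical type mirroring the documented delegate fields; not an SDK export.
interface DelegateOptions {
  providerPublicKey: string;
  timeout?: number; // milliseconds
  fallbackToLocal?: boolean;
  forceNewConnection?: boolean;
}

const COLD_TIMEOUT_MS = 60_000; // first call: cold-DHT bootstrap can take 15-45 s
const WARM_TIMEOUT_MS = 10_000; // assumed value for later calls on a warm DHT

// Hypothetical helper: build the object passed as loadModel({ delegate: ... }).
function delegateFor(providerPublicKey: string, firstCall: boolean): DelegateOptions {
  if (!providerPublicKey) {
    throw new Error("providerPublicKey is required");
  }
  return {
    providerPublicKey,
    timeout: firstCall ? COLD_TIMEOUT_MS : WARM_TIMEOUT_MS,
    fallbackToLocal: true, // run locally if delegation fails
  };
}
```

The returned object would be passed as the delegate field of loadModel(), as in the consumer example in the Examples section.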
Examples
Consumer
The following script shows an example of a consumer that delegates completion() requests to a provider:
import { completion, LLAMA_3_2_1B_INST_Q4_0, loadModel, close } from "@qvac/sdk";
const providerPublicKey = process.argv[2];
if (!providerPublicKey) {
  console.error("❌ Provider public key is required. Usage: node consumer.ts <provider-public-key> [consumer-seed]");
  process.exit(1);
}
try {
  // Optional: Consumer seed for deterministic consumer identity (for firewall testing)
  const consumerSeed = process.argv[3];
  console.log(`🚀 Testing delegated inference`);
  console.log(`🔑 Provider: ${providerPublicKey}`);
  if (consumerSeed) {
    // Set the seed only when one was given: assigning undefined to
    // process.env would store the literal string "undefined".
    process.env["QVAC_HYPERSWARM_SEED"] = consumerSeed;
    console.log(`🔑 Consumer seed: ${consumerSeed.substring(0, 16)}... (deterministic identity)`);
  } else {
    console.log(`🎲 No consumer seed provided (random identity)`);
  }
  const modelId = await loadModel({
    modelSrc: LLAMA_3_2_1B_INST_Q4_0,
    modelType: "llm",
    delegate: {
      providerPublicKey,
      // Generous timeout for the first call on a cold DHT: bootstrapping
      // hyperdht and looking up the provider's key can take 15–45s on the
      // very first run. Subsequent connections in the same process are
      // sub-second because the DHT is already warm.
      timeout: 60_000,
      fallbackToLocal: true, // Optional: Fall back to local inference if delegation fails
      // forceNewConnection: true, // Optional: Force a new connection instead of reusing cached one
    },
    onProgress: (progress) => {
      console.log(`📊 Download progress: ${progress.percentage.toFixed(1)}% (${progress.downloaded}/${progress.total} bytes)`);
    },
  });
  console.log(`✅ Delegated model registered: ${modelId}`);
  const response = completion({
    modelId,
    history: [{ role: "user", content: "Hello!" }],
    stream: true,
  });
  for await (const token of response.tokenStream) {
    console.log(`📨 Response: ${token}`);
  }
  console.log("🔍 Stats:", await response.stats);
  console.log("\n🎯 Delegation infrastructure working! Server correctly detected and routed the delegated request.");
  void close();
} catch (error) {
  console.error("❌ Error:", error);
  process.exit(1);
}
Provider
The following script shows an example of starting a provider and printing its publicKey for consumers:
import { startQVACProvider } from "@qvac/sdk";
// Optional: Seed for deterministic provider identity (64-character hex string)
const seed = process.argv[2];
if (seed) {
  process.env["QVAC_HYPERSWARM_SEED"] = seed;
}
// Optional: Consumer public key for firewall (allow only this consumer)
const allowedConsumerPublicKey = process.argv[3];
console.log(`🚀 Starting provider service...`);
try {
  if (allowedConsumerPublicKey) {
    console.log(`🔒 Firewall enabled: only allowing consumer ${allowedConsumerPublicKey}`);
  }
  const response = await startQVACProvider({
    firewall: allowedConsumerPublicKey
      ? {
          mode: "allow",
          publicKeys: [allowedConsumerPublicKey],
        }
      : undefined,
  });
  console.log("✅ Provider service started successfully!");
  console.log("🔗 Provider is now available for delegated inference requests");
  console.log("");
  console.log("📋 Connection Details:");
  console.log(` 🆔 Provider Public Key: ${response.publicKey}`);
  console.log("");
  console.log("💡 Consumer command:");
  console.log(` node consumer.ts ${response.publicKey}`);
  console.log("");
  console.log("💡 To reproduce this provider identity:");
  console.log(` node provider.ts ${seed || "<random-seed>"}`);
  if (!seed) {
    console.log(" (Note: seed was random this time, set one for reproducible identity)");
  }
  console.log("");
  console.log("🔒 For firewall testing:");
  console.log(" 1. Generate a consumer seed (64-char hex)");
  console.log(" 2. Get consumer public key: getConsumerPublicKey(consumerSeed)");
  console.log(" 3. Restart provider with consumer public key as 2nd argument");
  console.log(` 4. Run consumer with: node consumer.ts ${response.publicKey} <consumer-seed>`);
  console.log("📡 Provider is running... Press Ctrl+C to stop");
  process.on("SIGINT", () => {
    console.log("\n🛑 Provider service stopped");
    process.exit(0);
  });
  process.stdin.resume();
} catch (error) {
  console.error("❌ Error:", error);
  process.exit(1);
}
Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.
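Both scripts accept an optional seed, which the provider script describes as a 64-character hex string. A minimal sketch for generating and checking such a seed with Node's built-in crypto module follows. `generateSeed` and `isValidSeed` are hypothetical helper names, not SDK functions, and the check enforces only the 64-hex-character format stated in the provider script.

```typescript
import { randomBytes } from "node:crypto";

// 32 random bytes encode to exactly 64 hex characters.
function generateSeed(): string {
  return randomBytes(32).toString("hex");
}

// Format check only: 64 hexadecimal characters, nothing else.
function isValidSeed(seed: string): boolean {
  return /^[0-9a-fA-F]{64}$/.test(seed);
}
```

A generated value can then be passed as the seed argument, for example node provider.ts <seed>, to keep the same identity across restarts.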
Notes
- Consumers do not handle reconnection automatically yet. If the provider restarts, restart the consumer.
- To stop a running provider, call stopQVACProvider().
- When starting the provider, you can optionally set a firewall rule to allow/deny specific consumer public keys.
- Cold-start DHT bootstrap on the first connect can take 15–45s; subsequent connections in the same process are sub-second.
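To make the firewall note concrete, the following sketch shows one way an allow/deny rule could be evaluated against a consumer's public key. It is not the SDK's implementation: the `FirewallRule` type and the `isConsumerAllowed` function are hypothetical, the `mode: "allow"` and `publicKeys` fields are taken from the provider example, and the `"deny"` mode value is an assumption based on the note above.

```typescript
// Hypothetical shape; "allow" and publicKeys appear in the provider example,
// "deny" is an assumed counterpart.
type FirewallRule = { mode: "allow" | "deny"; publicKeys: string[] };

// Returns true when a consumer with this public key should be accepted.
function isConsumerAllowed(rule: FirewallRule | undefined, consumerPublicKey: string): boolean {
  if (!rule) return true; // no firewall configured: accept every consumer
  const listed = rule.publicKeys.includes(consumerPublicKey);
  // "allow": only listed keys pass. "deny": everything except listed keys passes.
  return rule.mode === "allow" ? listed : !listed;
}
```

The comparison here is an exact string match, so keys would need to be in the same encoding and case on both sides.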