NEURAL SYNTHETIC DATA FACTORY
Build, generate, and fine-tune Small Language Models entirely from your phone. The factory that makes AI smarter — mobile-first, self-healing, adversarial.
The NEURAL Flow
Five interconnected agents forming a self-sustaining loop. Seed → Generate → Validate → Fix → Deploy.
What NEURAL Does
Roadmap
Four quarters. From self-healing pipeline to scaled B2B infrastructure.
Generate Synthetic Data
Configure your data generation parameters. Vokryl creates, Vigilis validates, DevMasiha fixes.
Pipeline Architecture
Full-stack view of the NEURAL self-healing data pipeline across KRYV infrastructure.
Model Registry
Track your fine-tuned Small Language Models. Each is a specialized expert forged from synthetic data.
LLMs Guide
Large Language Models (LLMs)
Large Language Models are neural networks trained on massive text datasets to understand and generate human language. They are the foundation from which we distill Small Language Models (SLMs).
What they are
LLMs like GPT-4, Claude, and Llama-3 contain billions of parameters — numerical weights learned during training. These weights encode statistical patterns across language, enabling the model to predict the next token in a sequence.
Transformer Architecture
All modern LLMs use the Transformer architecture (2017). Key components:
- Self-Attention: Each token attends to every other token, capturing long-range dependencies (sketched in code after this list)
- Multi-Head Attention: Multiple attention heads learn different relationship patterns simultaneously
- Feed-Forward Layers: Dense networks that transform attended representations
- Positional Encoding: Injects position information since Transformers have no inherent sequence order
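To make the self-attention step concrete, here is a minimal single-head sketch in plain numpy. The toy shapes and random weights are illustrative only, not taken from any real model:

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Project tokens into queries, keys, and values
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Scaled dot-product: token-to-token affinity scores
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax each row so the attention weights sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted mix of all value vectors
    return weights @ V

# Toy run: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)  # shape (4, 8)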
Key Models for NEURAL
These LLMs are used as teachers to generate synthetic data via API:
- GPT-4o: Best quality generation, highest cost ($5/1M tokens). Use for seed data
- Claude-3 Sonnet: Excellent instruction following, good at structured JSON output
- Llama-3-70B: Open-source, run on Oracle VM, zero cost after setup
- Mixtral-8x7B (MoE): Fast inference via Cloudflare AI Workers
import openai

# Generate synthetic legal seed data with GPT-4o.
# JSON mode (response_format) returns a single JSON object, so ask for
# the pairs wrapped in one object rather than raw JSONL.
client = openai.OpenAI(api_key="your-key")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "system",
        "content": 'Generate 10 legal Q&A pairs as JSON: {"pairs": [{"question": ..., "answer": ...}]}'
    }],
    response_format={"type": "json_object"}
)
data = response.choices[0].message.content
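Once the call returns, the payload can be appended to a JSONL seed file. A short sketch, assuming the prompt above so the object carries a "pairs" array:

import json

pairs = json.loads(data)["pairs"]  # "pairs" key matches the prompt above
with open("seed.jsonl", "a") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")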
Prompting for Data Generation
The quality of synthetic data depends heavily on your prompt engineering. Use these techniques:
- Few-shot examples: Provide 2-3 examples of desired output format before asking for generation
- Persona assignment: "You are a senior lawyer answering a client's contract question"
- Adversarial injection: "Include one subtly incorrect answer that a model must learn to reject"
- JSON schema enforcement: Always request structured output with explicit schema
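A minimal sketch of how these techniques combine in one generation prompt. The legal examples and the schema below are invented for illustration:

import json

# Persona plus an explicit JSON schema in the system message
SYSTEM = (
    "You are a senior lawyer answering a client's contract question. "
    'Respond ONLY with JSON: {"pairs": [{"question": str, "answer": str, "is_trap": bool}]}'
)

# Few-shot examples, including one adversarial (subtly wrong) answer
FEW_SHOT = [
    {"question": "Can a verbal contract be binding?",
     "answer": "Yes, in many jurisdictions, though it is harder to prove.",
     "is_trap": False},
    {"question": "Does a signature waive all legal rights?",
     "answer": "Yes, signing anything waives every right you have.",  # deliberately wrong
     "is_trap": True},
]

prompt = (SYSTEM + "\n\nExamples:\n"
          + "\n".join(json.dumps(e) for e in FEW_SHOT)
          + "\n\nGenerate 10 new pairs; exactly one must have is_trap=true.")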
Multi-Model Distillation Strategy
Run Llama-3 and Mistral in parallel on Cloudflare AI Workers. Compare outputs and use Genesis to select the highest-quality response for the final dataset. This "best-of-N" strategy dramatically improves data quality without increasing per-record cost.
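A sketch of that selection loop, assuming the Workers AI REST endpoint (api.cloudflare.com/client/v4/accounts/<id>/ai/run/<model>) and a toy length-based scorer where Genesis would plug in:

import os
import requests
from concurrent.futures import ThreadPoolExecutor

ACCOUNT = os.environ["CF_ACCOUNT_ID"]
TOKEN = os.environ["CF_API_TOKEN"]
MODELS = ["@cf/meta/llama-3-8b-instruct", "@cf/mistral/mistral-7b-instruct-v0.1"]

def generate_with(model: str, prompt: str) -> str:
    # One Workers AI inference call per candidate model
    r = requests.post(
        f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT}/ai/run/{model}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"messages": [{"role": "user", "content": prompt}]},
    )
    return r.json()["result"]["response"]

def quality_score(text: str) -> float:
    return float(len(text))  # placeholder: Genesis scores coherence/validity here

def best_of_n(prompt: str) -> str:
    # Query all candidates in parallel, keep the top-scoring output
    with ThreadPoolExecutor() as pool:
        candidates = pool.map(lambda m: generate_with(m, prompt), MODELS)
    return max(candidates, key=quality_score)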
Cost & Scale
To generate 100,000 training records using GPT-4o at 500 tokens per record:
- Input: ~50M tokens → $250
- Output: ~50M tokens → $750
- Total: ~$1,000 for 100K premium records
- Alternative: Use free Llama-3-70B on Oracle VM → $0
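The arithmetic as a quick back-of-envelope check, assuming the token rates implied by the figures above ($5/1M input, $15/1M output):

records, tokens_each_way = 100_000, 500
total_tokens = records * tokens_each_way             # 50M tokens per direction
cost = (total_tokens / 1e6) * 5 + (total_tokens / 1e6) * 15
print(f"${cost:,.0f}")                               # $1,000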
A fine-tuned SLM trained on these records can then replace GPT-4 calls for a client, saving them $10,000+/month. That's the business model.
SLMs Guide
Small Language Models (SLMs)
SLMs are the core product of the NEURAL factory. They are fine-tuned versions of open-source base models, compressed and specialized for a single domain, where they can match or beat GPT-4 on that narrow task while costing roughly 90% less to run.
Why SLMs beat LLMs for B2B
- Cost: Run a 1B model for $0.002/hour on a VPS vs. $0.05/1K tokens for GPT-4
- Latency: 45ms response vs. 2-3 seconds for hosted LLMs
- Privacy: Runs on-premise — no data ever leaves the client's server
- Accuracy: A 1B model trained on 50K domain-specific examples can beat a 70B general model on that task
Fine-Tuning Process
Fine-tuning takes a base model and adapts its weights using your domain-specific data. We use Parameter-Efficient Fine-Tuning (PEFT) — specifically QLoRA — to do this without massive GPU requirements.
- Step 1: Generate 10K-100K synthetic training pairs via NEURAL pipeline
- Step 2: Validate and clean with Vigilis (target: >95% pass rate)
- Step 3: Format data as Alpaca or ShareGPT JSONL
- Step 4: Run QLoRA training on Oracle ARM VM (4 vCPU / 24GB)
- Step 5: Merge LoRA adapters back into base model
- Step 6: Quantize to GGUF (4-bit) for deployment
- Step 7: Serve via llama.cpp or Ollama on Cloudflare
Data Formats
Training data must be in a specific format. NEURAL generates all formats:
# Alpaca Format (instruction tuning)
{
  "instruction": "Analyze this contract clause for red flags",
  "input": "Party A agrees to indemnify Party B for all losses...",
  "output": "⚠️ Red flag: Unlimited indemnification clause detected..."
}

# ShareGPT Format (conversation tuning)
{
  "conversations": [
    {"from": "human", "value": "What does force majeure mean?"},
    {"from": "gpt", "value": "Force majeure refers to..."}
  ]
}
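Step 2's pass-rate gate can be approximated with a few lines of validation. A minimal stand-in for the Vigilis check, using the Alpaca fields shown above:

import json

REQUIRED = ("instruction", "input", "output")

def is_valid(rec: dict) -> bool:
    # All fields present as strings; instruction and output must be non-empty
    return (all(isinstance(rec.get(k), str) for k in REQUIRED)
            and rec["instruction"].strip() != ""
            and rec["output"].strip() != "")

def pass_rate(path: str) -> float:
    with open(path) as f:
        records = [json.loads(line) for line in f]
    return sum(map(is_valid, records)) / len(records)

# Pipeline target from step 2: pass_rate("train.jsonl") > 0.95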
QLoRA Training on Oracle VM
QLoRA (Quantized Low-Rank Adaptation) allows fine-tuning on consumer hardware. The Oracle Free ARM VM (24GB RAM) is sufficient for models up to 7B parameters.
# Install dependencies (run on Oracle VM via SSH)
pip install transformers peft trl bitsandbytes datasets accelerate

# training_script.py
from datasets import load_dataset
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer

# Validated JSONL from the NEURAL pipeline (steps 1-3 above)
dataset = load_dataset("json", data_files="train.jsonl", split="train")

# Load the base model in 4-bit (the "Q" in QLoRA)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True)
)

# Low-rank adapters on the attention projections (the "LoRA" part)
lora_config = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)

trainer = SFTTrainer(model=model, train_dataset=dataset)
trainer.train()
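Step 5, merging the adapters, is usually done by reloading the base model in full precision and folding the saved adapter back in with PEFT's merge_and_unload(). A sketch, assuming the adapter was saved to a hypothetical "lora-adapter/" directory via trainer.save_model():

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Reload the base in full precision, apply the saved adapter, then merge
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = PeftModel.from_pretrained(base, "lora-adapter/")  # path is illustrative
merged = model.merge_and_unload()  # fold low-rank deltas into the base weights
merged.save_pretrained("legal-auditor-1b")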
Deployment via Cloudflare
After training, quantize the model and serve it:
# Convert merged weights to GGUF, then quantize to 4-bit
python convert_hf_to_gguf.py legal-auditor-1b --outfile legal-auditor-1b.gguf
./llama-quantize legal-auditor-1b.gguf legal-auditor-1b.Q4_K_M.gguf Q4_K_M
# Serve with llama.cpp HTTP server
./llama-server -m legal-auditor-1b.Q4_K_M.gguf \
--port 8080 --host 0.0.0.0 -n 512
# Expose the server through a Cloudflare Tunnel so a Worker can proxy to it
cloudflared tunnel --url http://localhost:8080
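Once the server is up, clients (or the Cloudflare Worker) reach it over HTTP; llama-server exposes an OpenAI-compatible chat route, so a quick sanity check looks like this:

import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # llama-server's OpenAI-compatible route
    json={"messages": [{"role": "user", "content": "Flag the risks in this clause: ..."}],
          "max_tokens": 256},
)
print(resp.json()["choices"][0]["message"]["content"])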
Business Model
- API Access (B2B): $499/month — unlimited calls to your SLM via Cloudflare
- On-Premise License: $50,000/year — model weights delivered, runs on their servers
- Custom Forging: $5,000 flat — client provides 50 examples, you return a fine-tuned model in 48h
- Marketplace (Kriyex): $10-99/month — rent specialized agents per seat