AB Lite Training Progress Report

This monthly progress report details AB Lite model training achievements, challenges, and next steps for May 2026. All metrics are from internal testing on our local GPU infrastructure.

Training Overview

Architecture Updates

Parameters: 7.2B (optimized from 8.1B April configuration)
Context Window: 32K tokens (increased from 16K)
Attention Mechanism: Flash Attention 3 implementation
MoE Layers: 4 experts with 1.8B active parameters per forward pass
Quantization: INT8 inference with FP16 training precision

Dataset Composition

Malaysian Text: 45% (news, government documents, educational content)
Technical Documentation: 20% (programming, DevOps, AI/ML papers)
Conversational Data: 15% (customer service, forum discussions)
Code Samples: 12% (Python, JavaScript, TypeScript, Go)
Multilingual: 8% (English, Malay, Chinese, Tamil)
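
As an illustration of how a mixture like the one above can drive data loading, here is a minimal sketch that samples source names in proportion to the listed percentages. The source keys and the `sample_sources` helper are hypothetical names for this example, not our actual loader.

```python
import random

# Mixture weights copied from the Dataset Composition list (percent of data).
# The dictionary keys are illustrative labels, not real dataset identifiers.
MIXTURE = {
    "malaysian_text": 45,
    "technical_docs": 20,
    "conversational": 15,
    "code": 12,
    "multilingual": 8,
}

def sample_sources(n, seed=0):
    """Draw n source names with probability proportional to MIXTURE."""
    rng = random.Random(seed)
    names = list(MIXTURE)
    weights = [MIXTURE[name] for name in names]
    return rng.choices(names, weights=weights, k=n)

# The five shares should account for the whole dataset.
assert sum(MIXTURE.values()) == 100
```

Over a large number of draws, the share of each source converges on its listed percentage.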

Benchmark Results

Performance Metrics (May 2026)

Benchmark	April 2026	May 2026	Improvement (percentage points)
MMLU (General Knowledge)	68.4%	72.1%	+3.7
HumanEval (Code Generation)	61.2%	65.8%	+4.6
GSM8K (Mathematical Reasoning)	58.9%	63.4%	+4.5
Malay Language Understanding	74.3%	79.6%	+5.3
Context Retention (32K)	82.1%	86.7%	+4.6
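
The improvement column is the difference between the two scores, which is not the same as the relative gain. A small sketch of both calculations, using the scores from the table (the `deltas` helper is a name chosen for this example):

```python
# (April, May) scores in percent, copied from the table above.
RESULTS = {
    "MMLU": (68.4, 72.1),
    "HumanEval": (61.2, 65.8),
    "GSM8K": (58.9, 63.4),
    "Malay Language Understanding": (74.3, 79.6),
    "Context Retention (32K)": (82.1, 86.7),
}

def deltas(april, may):
    """Return (absolute change in points, relative change in percent)."""
    points = round(may - april, 1)
    relative = round((may - april) / april * 100, 1)
    return points, relative

for name, (april, may) in RESULTS.items():
    points, relative = deltas(april, may)
    print(f"{name}: +{points} points, +{relative}% relative")
```

For MMLU, for example, the 3.7-point gain corresponds to a relative improvement of about 5.4% over the April score.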

Inference Performance

Latency (P50): 45ms per token
Latency (P95): 78ms per token
Throughput: 120 tokens/sec on a single A100 80GB
Memory Usage: 14.2GB VRAM (INT8 quantized)
Batch Size: 32 concurrent requests without degradation
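
To put the memory figure in context, here is a rough back-of-the-envelope sketch. It assumes every weight is stored as a single INT8 byte and that "GB" means decimal gigabytes. Neither assumption is stated in our measurements, so treat the split as an estimate rather than a profiler result.

```python
# Rough INT8 weight-memory estimate (a sketch, not a profiler measurement).
PARAMS = 7.2e9            # parameter count from Architecture Updates
BYTES_PER_PARAM_INT8 = 1  # assumption: each weight is stored as one byte
REPORTED_VRAM_GB = 14.2   # from Inference Performance
GB = 1e9                  # assumption: decimal gigabytes

def weight_memory_gb(params, bytes_per_param):
    """Memory needed for the weights alone, in GB."""
    return params * bytes_per_param / GB

weights_gb = weight_memory_gb(PARAMS, BYTES_PER_PARAM_INT8)
other_gb = REPORTED_VRAM_GB - weights_gb  # KV cache, activations, runtime overhead
print(f"weights: {weights_gb:.1f} GB, everything else: {other_gb:.1f} GB")
```

Under these assumptions, the weights take roughly 7.2 GB, and about 7.0 GB remains for the KV cache, activations, and runtime overhead.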

Training Infrastructure

Hardware Configuration

GPU Cluster: 8x NVIDIA A100 80GB
Interconnect: Third-generation NVLink for GPU-to-GPU communication
Storage: 50TB NVMe SSD for dataset and checkpoint storage
Network: 100Gbps InfiniBand for distributed training
Power: Dedicated 50kW circuit with UPS backup

Training Pipeline

Framework: PyTorch 2.4 with FSDP (Fully Sharded Data Parallel)
Optimizer: AdamW with cosine learning rate schedule
Batch Size: 2M tokens per training step
Training Steps: 145,000 completed (target: 200,000)
Checkpoint Frequency: Every 5,000 steps
Training Time: 28 days to reach the current step count
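
The pipeline figures above imply a few derived numbers. The sketch below works them out, assuming "2M" means exactly 2,000,000 tokens per step and that throughput stays constant for the remaining steps. Both are simplifying assumptions.

```python
STEPS_DONE = 145_000
STEPS_TARGET = 200_000
TOKENS_PER_STEP = 2_000_000  # assumption: "2M" means 2 x 10^6 tokens
DAYS_ELAPSED = 28

def progress_summary(done, target, tokens_per_step, days):
    """Derive completion, tokens processed, and a naive time-to-finish."""
    steps_per_day = done / days
    return {
        "percent_complete": 100 * done / target,
        "tokens_seen": done * tokens_per_step,
        # Assumes the remaining steps run at the same average rate.
        "days_remaining": (target - done) / steps_per_day,
    }

summary = progress_summary(STEPS_DONE, STEPS_TARGET, TOKENS_PER_STEP, DAYS_ELAPSED)
print(summary)
```

With these inputs, training is 72.5% complete, has processed about 290 billion tokens, and would need roughly 10.6 more days at the current rate.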

Key Improvements This Month

Architecture Optimization

Reduced parameter count by 11% while maintaining performance
Implemented Flash Attention 3 for 25% faster training
Optimized MoE routing for better expert utilization
Added rotary positional embeddings for improved long-context handling
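
One simple way to look at expert utilization is to count how often the router picks each expert. The sketch below assumes top-1 routing over 4 experts and uses made-up router logits. The routing scheme and the function names are assumptions for this example, not a description of the AB Lite router.

```python
from collections import Counter

def route_top1(logits):
    """Return the index of the expert with the highest router logit."""
    return max(range(len(logits)), key=logits.__getitem__)

def expert_utilization(token_logits, n_experts=4):
    """Fraction of tokens routed to each expert under top-1 routing."""
    counts = Counter(route_top1(logits) for logits in token_logits)
    total = len(token_logits)
    return [counts[e] / total for e in range(n_experts)]

# Hypothetical router logits for four tokens (one row per token).
example_logits = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 5.0],
]
utilization = expert_utilization(example_logits)
```

A perfectly balanced router would give each of the 4 experts a share of 0.25. In this toy input, one expert receives no tokens at all, which is the kind of imbalance routing optimization aims to reduce.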

Dataset Enhancements

Added 2.3M new Malaysian documents from verified sources
Improved code dataset with 450K additional samples
Enhanced multilingual coverage with Tamil educational content
Removed 850K low-quality samples identified by automated filtering

Training Stability

Gradient clipping improved training stability by 40%
Learning rate warmup extended to 10,000 steps
Checkpoint validation reduced overfitting indicators
Distributed training synchronization optimized for 8-GPU setup
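
For readers unfamiliar with these techniques, here is a minimal pure-Python sketch of a linear warmup followed by cosine decay, plus gradient clipping by global norm. The 10,000 warmup steps and 200,000 total steps come from this report. The peak and minimum learning rates and the clipping threshold are placeholder values, not our training configuration.

```python
import math

WARMUP_STEPS = 10_000   # from this report
TOTAL_STEPS = 200_000   # training target from this report
PEAK_LR = 3e-4          # hypothetical value for illustration
MIN_LR = 3e-5           # hypothetical value for illustration

def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine

def clip_by_global_norm(grads, max_norm):
    """Scale a flat list of gradients so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]
```

In a PyTorch training loop, the same ideas are usually expressed with a learning rate scheduler and `torch.nn.utils.clip_grad_norm_`.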

Challenges & Solutions

Challenge 1: Memory Constraints

Issue: 32K context window increased memory requirements beyond A100 capacity.

Solution: Implemented activation checkpointing and gradient accumulation to reduce peak memory usage by 35%.
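
Gradient accumulation works because the gradient of a mean loss over a full batch equals the average of the gradients over equal-sized micro-batches. The toy sketch below shows this for a one-parameter linear model. It illustrates the accumulation idea only, not activation checkpointing, and the model and data are invented for the example.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch):
    """Average per-micro-batch gradients instead of using one large batch."""
    if len(xs) % micro_batch != 0:
        raise ValueError("batch must split into equal-sized micro-batches")
    total, n_micro = 0.0, 0
    for start in range(0, len(xs), micro_batch):
        end = start + micro_batch
        total += grad_mse(w, xs[start:end], ys[start:end])
        n_micro += 1
    return total / n_micro

# Toy data: only micro_batch examples need to be in memory at once.
xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
full = grad_mse(0.5, xs, ys)
accumulated = accumulated_grad(0.5, xs, ys, micro_batch=2)
```

Both paths produce the same gradient, so the optimizer step is unchanged while peak memory depends on the micro-batch size rather than the full batch.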

Challenge 2: Malay Language Performance

Issue: Initial benchmarks showed a 12% gap between English and Malay understanding.

Solution: Augmented the training dataset with 1.8M additional Malay samples and implemented language-balanced batching.
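
One possible reading of language-balanced batching is that every batch contains the same number of samples per language. The sketch below implements that reading. The `balanced_batches` helper, the pool layout, and the choice to stop when the smallest language runs out are assumptions for this example.

```python
import random

def balanced_batches(pools, batch_size, seed=0):
    """Yield batches with an equal number of samples from each language.

    pools maps a language code to a list of samples. Iteration stops when
    the smallest pool can no longer fill its share of a batch.
    """
    langs = sorted(pools)
    if batch_size % len(langs) != 0:
        raise ValueError("batch_size must be divisible by the number of languages")
    per_lang = batch_size // len(langs)
    rng = random.Random(seed)
    shuffled = {lang: rng.sample(pools[lang], len(pools[lang])) for lang in langs}
    n_batches = min(len(items) for items in shuffled.values()) // per_lang
    for b in range(n_batches):
        batch = []
        for lang in langs:
            batch.extend(shuffled[lang][b * per_lang:(b + 1) * per_lang])
        rng.shuffle(batch)  # avoid a fixed language order inside the batch
        yield batch

# Hypothetical pools: 10 English and 4 Malay samples, tagged by language.
pools = {
    "en": [("en", i) for i in range(10)],
    "ms": [("ms", i) for i in range(4)],
}
batches = list(balanced_batches(pools, batch_size=4))
```

This variant under-samples the larger language. Repeating samples from the smaller pool instead is an equally valid design, with a higher risk of overfitting to the repeated data.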

Challenge 3: Code Generation Accuracy

Issue: HumanEval scores plateaued at 61% for three consecutive weeks.

Solution: Introduced curriculum learning with progressive code complexity and added an execution feedback loop.
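
The two ideas can be sketched in a few lines. The curriculum function unlocks harder samples as training progresses, and the execution check runs a candidate solution against its tests. The `difficulty` field, the four-stage split, and both function names are hypothetical choices for this example, and a real pipeline would run untrusted code in a sandbox rather than with a bare `exec`.

```python
def curriculum_pool(samples, step, total_steps, n_stages=4):
    """Return the samples available at this step, easiest first.

    Each stage unlocks another 1/n_stages of the difficulty-sorted data.
    """
    ordered = sorted(samples, key=lambda s: s["difficulty"])
    stage = min(n_stages, 1 + step * n_stages // total_steps)
    cutoff = max(1, len(ordered) * stage // n_stages)
    return ordered[:cutoff]

def passes_tests(source, test_source):
    """Execute a candidate solution and its tests; True if nothing raises.

    WARNING: exec runs arbitrary code. Use a sandbox for untrusted output.
    """
    namespace = {}
    try:
        exec(source, namespace)
        exec(test_source, namespace)
    except Exception:
        return False
    return True

# Hypothetical samples with difficulty scores from 1 (easy) to 8 (hard).
samples = [{"id": i, "difficulty": i} for i in range(1, 9)]
early = curriculum_pool(samples, step=0, total_steps=100)
late = curriculum_pool(samples, step=100, total_steps=100)
```

Samples that pass their tests can then be used as a positive signal, while failures are filtered out or down-weighted.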

June 2026 Roadmap

Training Milestones

Complete 200,000 training steps (target: June 25)
Achieve 75% MMLU score
Reach 70% HumanEval score
Improve Malay understanding to 82%

Infrastructure Upgrades

Integrate 2 additional A100 GPUs for faster training
Upgrade storage to 100TB NVMe array
Implement automated hyperparameter optimization
Deploy monitoring dashboard for real-time training metrics

Evaluation & Testing

Third-party benchmark validation by independent research partner
User acceptance testing with 50 beta testers
Security audit for model weight integrity and data leakage
Compliance review against Malaysian AI ethics guidelines

Transparency & Open Research

AI Bradaa commits to transparent model development:

Monthly progress reports published on this blog
Training logs and metrics available to research partners
Open-source evaluation scripts on GitHub
Collaboration opportunities with academic institutions

Get Involved

We welcome collaboration on AB Lite development:

Dataset Contributions: Submit high-quality Malaysian text samples
Benchmark Testing: Help evaluate model performance on domain-specific tasks
Research Partnerships: Collaborate on model architecture and training methodologies
Beta Testing: Apply for early access to AB Lite API

Conclusion

May 2026 delivered gains of 3.7 to 5.3 percentage points across all five tracked benchmarks. Architecture optimizations, dataset enhancements, and training stability improvements position us well for the June milestones. Our commitment to transparent development and sovereign AI principles continues to guide the project.

The next progress report is scheduled for June 18, 2026. Subscribe to our newsletter for updates.