This is a rough estimate based on an average development setup without a GPU:
Current Stats:
- 11 active channels
- 1,389 messages over 10 days of telegram_date (Nov 23 - Dec 3)
- Processing rate: ~130-176 msg/hour (average ~135 msg/hr)
- Spam rate: 7.5% (only 7.5% filtered as spam, 92.5% kept)
- Average: 138 messages/day across all 11 channels = 12.5 msg/channel/day
Scaling to 200 channels from Feb 2022:
- February 2022 to December 2025 = ~46 months = ~1,400 days
- 200 channels
- Messages per channel per day: 12.5
- Total messages: 200 × 12.5 × 1,400 = 3,500,000 messages (before spam filter)
- After 7.5% spam filter: ~3,237,500 messages need LLM processing
Processing Time Calculation:
- Current rate: ~135 messages/hour (this is with 1 processor worker)
- Time at current rate: 3,237,500 / 135 ≈ 23,981 hours ≈ 999 days ≈ 2.7 years
But wait - the processing rate includes the LLM bottleneck of 22-26s per message.
Let me recalculate with the LLM being the constraint:
- LLM takes ~24s per message (midpoint of 22-26s)
- 1 processor can do: 3600/24 = 150 messages/hour
- That’s close to the observed ~135/hr (the gap is pipeline overhead)
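A minimal Python sketch of this arithmetic, with the figures above (12.5 msg/channel/day, ~1,400 days, 7.5% spam, ~135 msg/hr per worker) hard-coded as assumptions:

```python
# Back-of-envelope backfill estimate; all constants are assumptions taken from
# the observed dev-setup averages quoted above.
CHANNELS = 200
MSGS_PER_CHANNEL_PER_DAY = 12.5     # observed average across the 11 test channels
DAYS = 1400                         # Feb 2022 -> Dec 2025
SPAM_RATE = 0.075                   # share filtered out before LLM processing
MSGS_PER_HOUR_PER_WORKER = 135      # observed single-worker throughput (CPU LLM)

raw = CHANNELS * MSGS_PER_CHANNEL_PER_DAY * DAYS
to_process = raw * (1 - SPAM_RATE)
hours = to_process / MSGS_PER_HOUR_PER_WORKER

print(f"raw messages:      {raw:,.0f}")         # ~3,500,000
print(f"after spam filter: {to_process:,.0f}")  # ~3,237,500
print(f"one worker:        {hours:,.0f} h = {hours / 24:,.0f} days")  # ~999 days
```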
Scaling Options:
- With 1 processor worker: ~999 days (~2.7 years)
- With 4 processor workers: 999 / 4 ≈ 250 days ≈ 8 months
- With 8 processor workers: 999 / 8 ≈ 125 days ≈ 4 months
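A quick check of these worker numbers in Python; this assumes near-linear scaling, i.e. the Ollama instance can actually serve the extra workers concurrently:

```python
# Divide the single-worker estimate across N processor workers.
# Assumes near-linear scaling, which only holds if the LLM backend
# keeps up with concurrent requests.
MESSAGES = 3_237_500                # post-spam-filter estimate from above
MSGS_PER_HOUR_PER_WORKER = 135      # observed single-worker throughput

for workers in (1, 4, 8):
    days = MESSAGES / (MSGS_PER_HOUR_PER_WORKER * workers) / 24
    print(f"{workers:>2} worker(s): ~{days:,.0f} days")
```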
With 8 workers, the processing time drops significantly. However, Telegram’s rate limits add another constraint. The GetHistory API allows roughly 300 requests per minute, and each request returns about 100 messages. Fetching ~3.5M messages for 200 channels over ~1,400 days therefore takes about 35,000 requests, or roughly two hours at 300 requests per minute. In other words, Telegram retrieval is quick compared to the LLM processing bottleneck.
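A quick check of that fetch-time claim, treating the ~300 requests/minute budget and ~100 messages/request page size as given:

```python
# Estimated Telegram history fetch time under the assumed GetHistory limits.
MESSAGES_RAW = 3_500_000        # everything must be fetched, spam included
MSGS_PER_REQUEST = 100          # assumed GetHistory page size
REQUESTS_PER_MINUTE = 300       # assumed rate-limit budget

requests = MESSAGES_RAW / MSGS_PER_REQUEST
minutes = requests / REQUESTS_PER_MINUTE
print(f"~{requests:,.0f} requests -> ~{minutes:,.0f} min (~{minutes / 60:.1f} h)")
```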
● Excellent data! Now let me do the math:
📊 Backfill Scaling Calculation: 200 Channels from Feb 2022
★ Insight ───────────────────────────────────── This calculation reveals why LLM processing is the true bottleneck. Telegram fetching is fast (~300 requests/min), but the qwen2.5:3b model running on CPU takes ~24s per message for classification. This is the constraint that dominates scaling. ─────────────────────────────────────────────────
Current Baseline (from actual data):
| Metric | Value |
|---|---|
| Active Channels | 11 |
| Messages (10 days) | 1,389 |
| Avg msgs/channel/day | 12.6 |
| Processing Rate | ~135 msg/hr |
| Spam Filter Rate | 7.5% |
| LLM Time per Message | ~24 seconds |
Scaled Scenario: 200 Channels, Feb 2022 → Dec 2025
Timeline: February 2022 → December 2025 = ~46 months ≈ 1,400 days
Total Messages Estimate: 200 channels × 12.6 msg/channel/day × 1,400 days = 3,528,000 messages (raw)
After Spam Filter (7.5% removed): ≈ 3,263,400 messages need LLM processing
Processing Time Estimates:
| Scenario | Processor Workers | Time to Backfill |
|---|---|---|
| Current | 1 worker | 24,173 hours = 1,007 days = ~2.8 years |
| Scaled | 2 workers | 504 days = ~1.4 years |
| Scaled | 4 workers | 252 days = ~8 months |
| Practical | 8 workers | 126 days = ~4 months |
| Maximum | 16 workers | 63 days = 9 weeks |
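The table can be reproduced, and re-run under other assumptions such as a shorter history window or a GPU speedup factor, with a small helper. The `backfill_days` function and its defaults below are just the figures from this page, not project code:

```python
# Reproduce the scaling table from the baseline metrics; tweak the keyword
# arguments to explore other scenarios (fewer days of history, GPU speedup, ...).
def backfill_days(channels=200, msgs_per_channel_day=12.6, days_of_history=1400,
                  spam_rate=0.075, msgs_per_hour_per_worker=135,
                  workers=1, llm_speedup=1.0):
    to_process = channels * msgs_per_channel_day * days_of_history * (1 - spam_rate)
    hours = to_process / (msgs_per_hour_per_worker * workers * llm_speedup)
    return hours / 24

for w in (1, 2, 4, 8, 16):
    print(f"{w:>2} workers: ~{backfill_days(workers=w):,.0f} days")

# 8 workers plus a hypothetical 5x GPU speedup:
print(f"8 workers + GPU (5x): ~{backfill_days(workers=8, llm_speedup=5):,.0f} days")
```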
Bottleneck Analysis:
- Telegram Fetching: ~2 hours (NOT a bottleneck)
  - ~35,000 API requests at 300/min
- LLM Classification: THE BOTTLENECK
  - 24 seconds × ~3.26M messages ≈ 78M seconds
  - ≈ 906 days with 1 worker at the ideal 150 msg/hr (~1,007 days at the observed 135 msg/hr)
- Database/Media: Fast (parallel, async)
Realistic Recommendation:
With 8 processor workers (reasonable for your server): → ~4 months to complete the full backfill
With GPU-accelerated LLM (RTX 3060 or better): → Could cut time by 5-10x → 2-4 weeks
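As a rough illustration of what "N processor workers" buys, here is a hypothetical asyncio worker-pool sketch; the `classify_message` stub and queue wiring are placeholders rather than the project's actual worker implementation, and the near-linear speedup above only materialises if Ollama is configured to serve that many requests in parallel (or runs on a GPU):

```python
# Hypothetical worker-pool sketch: N workers drain a shared backlog concurrently.
# classify_message() stands in for the ~24 s LLM call; real scaling also needs
# the Ollama side to accept concurrent requests.
import asyncio

WORKERS = 8

async def classify_message(msg: str) -> str:
    await asyncio.sleep(0.01)          # placeholder for the LLM round-trip
    return "kept"

async def worker(queue: asyncio.Queue) -> None:
    while True:
        msg = await queue.get()
        try:
            await classify_message(msg)
        finally:
            queue.task_done()

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    for i in range(100):               # placeholder backlog
        queue.put_nowait(f"message {i}")
    tasks = [asyncio.create_task(worker(queue)) for _ in range(WORKERS)]
    await queue.join()                 # wait for the backlog to drain
    for t in tasks:
        t.cancel()

asyncio.run(main())
```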
★ Insight ───────────────────────────────────── Key optimizations to consider:
- Scale workers: docker-compose up -d --scale processor-worker=8
- GPU LLM: Switch Ollama to GPU mode (if hardware available) → 5-10x speedup
- Skip old content: Many Feb 2022 messages may be less relevant - consider backfilling only the last 6-12 months for most channels
- Tiered approach: Backfill high-priority channels fully, others only recent history (see the sketch below)
─────────────────────────────────────────────────
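A minimal sketch of the tiered idea, assuming a hypothetical `backfill_channel(channel, since)` helper and made-up channel names:

```python
# Hypothetical tiered backfill: full history for priority channels,
# only the last 12 months for everyone else. Channel names and the
# backfill_channel() helper are illustrative, not part of the project.
from datetime import datetime, timedelta, timezone

FULL_HISTORY_START = datetime(2022, 2, 1, tzinfo=timezone.utc)
RECENT_WINDOW = timedelta(days=365)
PRIORITY_CHANNELS = {"channel_a", "channel_b"}

def backfill_channel(channel: str, since: datetime) -> None:
    print(f"backfilling {channel} since {since:%Y-%m-%d}")

for channel in ("channel_a", "channel_b", "channel_c"):
    if channel in PRIORITY_CHANNELS:
        since = FULL_HISTORY_START
    else:
        since = datetime.now(timezone.utc) - RECENT_WINDOW
    backfill_channel(channel, since)
```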