Completed on 7 Jan 2026, 6:56 pm

Output

🔍 SSE Disconnect Thrashing - Root Cause Analysis

❌ Problem Confirmed

The disconnect/reconnect thrashing issue is STILL happening despite commit b1c15b5b's attempted fix. The previous fix addressed some timing issues but missed the core architectural problem.

🐛 Root Causes Identified

1. Direct State Mutation (Bypassing State Machine)

Two functions directly modify state.connectionState outside the state machine:

  • startHeartbeat() line 597: state.connectionState = 'streaming'
  • reconnectWithEventSource() line 358: state.connectionState = 'polling'

This creates a state inconsistency: the state machine believes it is in one state while the state.connectionState variable holds another.
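A minimal sketch of routing every change through a single dispatch() entry point, so no caller assigns the state directly. The event names, the 'disconnected' state, the hadOutput: true branch, and the ConnectionMachine class are illustrative assumptions, not the actual console code.

```typescript
// Hypothetical state and event names; only 'streaming', 'polling' and
// 'idle' are mentioned in the analysis above.
type ConnectionState = "idle" | "streaming" | "polling" | "disconnected";

type ConnectionEvent =
  | { type: "stream_start" }
  | { type: "stream_end"; hadOutput: boolean }
  | { type: "reconnect_failed" }
  | { type: "disconnect" }
  | { type: "complete" };

// Pure transition function: the only place that decides the next state.
function transition(current: ConnectionState, event: ConnectionEvent): ConnectionState {
  switch (event.type) {
    case "stream_start":
      return "streaming";
    case "stream_end":
      if (current !== "streaming") return current;
      // Assumption: a stream that produced output needs no polling.
      return event.hadOutput ? "idle" : "polling";
    case "reconnect_failed":
      // Fall back to polling unless the request already finished.
      return current === "idle" ? current : "polling";
    case "disconnect":
      return current === "idle" ? current : "disconnected";
    case "complete":
      return "idle";
  }
}

class ConnectionMachine {
  // Private field: callers cannot write the state, they can only dispatch.
  private current: ConnectionState = "idle";

  get state(): ConnectionState {
    return this.current;
  }

  dispatch(event: ConnectionEvent): ConnectionState {
    this.current = transition(this.current, event);
    return this.current;
  }
}
```

Under this sketch, startHeartbeat() would call machine.dispatch({ type: "stream_start" }) instead of assigning 'streaming', and reconnectWithEventSource() would dispatch { type: "reconnect_failed" } instead of assigning 'polling'.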

2. EventSource Auto-Reconnect Loop

When konui restarts mid-turn:

  1. EventSource tries to connect to /api/konsole/dashboard/events
  2. Server returns 404 (request not found - it was cleared on restart)
  3. onerror fires with readyState === CLOSED
  4. Code calls pollForOutput() as fallback
  5. BUT EventSource also auto-reconnects internally
  6. New connection fails → step 3 repeats → INFINITE LOOP
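The loop above can be bounded by closing the failed source explicitly and counting attempts. The following is a sketch under stated assumptions: EventSourceLike is a hypothetical minimal interface (a thin adapter around the browser EventSource would be needed to satisfy it), and connectWithRetryLimit, maxAttempts, and onGiveUp are illustrative names. It retries immediately; a real fix would probably add a backoff delay.

```typescript
// Hypothetical minimal surface of an event source, so the logic can be
// exercised without a browser.
interface EventSourceLike {
  onopen: (() => void) | null;
  onerror: (() => void) | null;
  close(): void;
}

interface ReconnectOptions {
  createSource: () => EventSourceLike; // opens a fresh connection
  maxAttempts: number;                 // consecutive failures allowed
  onGiveUp: () => void;                // e.g. switch to pollForOutput()
}

function connectWithRetryLimit(opts: ReconnectOptions): { attempts: () => number; stop: () => void } {
  let attempts = 0;
  let stopped = false;
  let source: EventSourceLike | null = null;

  const open = (): void => {
    attempts += 1;
    const current = opts.createSource();
    source = current;

    // A successful open resets the consecutive-failure counter.
    current.onopen = () => {
      attempts = 0;
    };

    current.onerror = () => {
      // Detach and close first, so this instance can never retry on its own
      // or fire the handler a second time.
      current.onerror = null;
      current.close();
      if (stopped) return;

      if (attempts >= opts.maxAttempts) {
        stopped = true;
        opts.onGiveUp();
        return;
      }
      open();
    };
  };

  open();

  return {
    attempts: () => attempts,
    stop: () => {
      stopped = true;
      source?.close();
    },
  };
}
```

Because each failed source is closed before the next one is created, at most one connection is live at a time, and the fallback runs exactly once after maxAttempts consecutive failures.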

3. Heartbeat vs Polling Race

When stream_end fires with hadOutput: false:

  • State transitions to polling
  • Heartbeat continues running (only stops on completeRequest)
  • Heartbeat skips disconnect check when connectionState === 'polling'
  • But if reconnectWithEventSource errors, state becomes inconsistent
  • Heartbeat then sees non-idle, non-polling state → dispatches disconnect
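One way to remove this race is to tie the heartbeat timer to the connection state, so it runs only while streaming. This is a sketch, not the current implementation: the Heartbeat class, syncHeartbeat(), and the idea of calling it from a transition hook are assumptions.

```typescript
// Hypothetical state names, matching those used in the analysis where possible.
type HeartbeatState = "idle" | "streaming" | "polling" | "disconnected";

class Heartbeat {
  private timer: ReturnType<typeof setInterval> | null = null;
  private readonly intervalMs: number;
  private readonly onTick: () => void;

  constructor(intervalMs: number, onTick: () => void) {
    this.intervalMs = intervalMs;
    this.onTick = onTick;
  }

  get running(): boolean {
    return this.timer !== null;
  }

  // Idempotent: calling start() twice never creates two timers.
  start(): void {
    if (this.timer === null) {
      this.timer = setInterval(this.onTick, this.intervalMs);
    }
  }

  stop(): void {
    if (this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
  }
}

// Intended to be called with the new state after every transition
// (assumed hook). The heartbeat runs only while streaming, so it cannot
// dispatch a disconnect during polling or reconnect attempts.
function syncHeartbeat(heartbeat: Heartbeat, next: HeartbeatState): void {
  if (next === "streaming") {
    heartbeat.start();
  } else {
    heartbeat.stop();
  }
}
```

With this arrangement the heartbeat no longer needs a special case that skips the disconnect check in the polling state, because the timer is not running in that state at all.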

✅ Required Fixes

Fix                                                                       File                       Priority
Remove direct state mutation in startHeartbeat()                          console-ui.ts:597          P0
Remove direct state mutation in reconnectWithEventSource()                console-streaming.ts:358   P0
Add EventSource close on reconnect failure after N attempts               console-streaming.ts:371   P0
Stop heartbeat when entering polling state (not just on completeRequest)  console-sm-transitions.ts  P1
Add reconnect attempt counter to prevent infinite retries                 console-streaming.ts       P1

📊 Why Previous Fix Didn't Work

Commit b1c15b5b added:

  • ✓ Retry loop for kontask check (helps with persistence race)
  • ✓ Heartbeat skips polling state (helps with normal flow)
  • ✓ lastEventAt updates in EventSource handlers (helps with timing)

But it didn't address:

  • ✗ Direct state mutation bypassing state machine
  • ✗ EventSource auto-reconnect loop on permanent failure
  • ✗ Heartbeat continuing during reconnect attempts

🎯 Next Steps

Shall I implement the P0 fixes now? This will involve:

  1. Refactoring state mutations to use dispatch() exclusively
  2. Adding proper EventSource lifecycle management with max retry count
  3. Testing the fix by restarting konui mid-turn

Details

Type General
Status Completed
Scope vibetools
Tags console, sse, bug, critical
Created 7 Jan 2026, 6:56 pm
Updated 7 Jan 2026, 6:56 pm
Created By claude

Raw Data

{
  "id": "73653c26-3067-4c7b-a23f-efd7846b24dd",
  "type": "general",
  "status": "completed",
  "title": "SSE Disconnect Thrashing - Root Cause Analysis",
  "description": "Comprehensive investigation of the disconnect/reconnect thrashing issue at turn completion",
  "context": {
    "output": "<div style=\"background:#1e1e1e;color:#e0e0e0;padding:20px;border-radius:8px;font-family:'Consolas','Monaco',monospace;\">\n<h2 style=\"font-family:sans-serif;color:#fff;margin:0 0 16px 0;\">🔍 SSE Disconnect Thrashing - Root Cause Analysis</h2>\n\n<div style=\"background:#2d2d2d;padding:16px;border-radius:6px;margin-bottom:16px;\">\n<h3 style=\"color:#e5c07b;margin:0 0 12px 0;\">❌ Problem Confirmed</h3>\n<p style=\"color:#e0e0e0;margin:0;\">The disconnect/reconnect thrashing issue is STILL happening despite commit <code style=\"background:#3d3d3d;padding:2px 6px;border-radius:4px;\">b1c15b5b</code>'s attempted fix. The previous fix addressed some timing issues but missed the core architectural problem.</p>\n</div>\n\n<div style=\"background:#2d2d2d;padding:16px;border-radius:6px;margin-bottom:16px;\">\n<h3 style=\"color:#f44336;margin:0 0 12px 0;\">🐛 Root Causes Identified</h3>\n\n<h4 style=\"color:#56b6c2;margin:16px 0 8px 0;\">1. Direct State Mutation (Bypassing State Machine)</h4>\n<p style=\"color:#e0e0e0;\">Two functions directly modify <code style=\"background:#3d3d3d;padding:2px 6px;border-radius:4px;\">state.connectionState</code> outside the state machine:</p>\n<ul style=\"color:#98c379;margin:8px 0;padding-left:20px;line-height:1.8;\">\n<li><code style=\"background:#3d3d3d;padding:2px 6px;border-radius:4px;\">startHeartbeat()</code> line 597: <code style=\"color:#e06c75;\">state.connectionState = 'streaming'</code></li>\n<li><code style=\"background:#3d3d3d;padding:2px 6px;border-radius:4px;\">reconnectWithEventSource()</code> line 358: <code style=\"color:#e06c75;\">state.connectionState = 'polling'</code></li>\n</ul>\n<p style=\"color:#7f848e;\">This creates state inconsistency where the state machine thinks we're in one state but the variable says another.</p>\n\n<h4 style=\"color:#56b6c2;margin:16px 0 8px 0;\">2. 
EventSource Auto-Reconnect Loop</h4>\n<p style=\"color:#e0e0e0;\">When konui restarts mid-turn:</p>\n<ol style=\"color:#e0e0e0;margin:8px 0;padding-left:20px;line-height:1.8;\">\n<li>EventSource tries to connect to <code style=\"background:#3d3d3d;padding:2px 6px;border-radius:4px;\">/api/konsole/dashboard/events</code></li>\n<li>Server returns 404 (request not found - it was cleared on restart)</li>\n<li><code style=\"background:#3d3d3d;padding:2px 6px;border-radius:4px;\">onerror</code> fires with <code style=\"color:#e06c75;\">readyState === CLOSED</code></li>\n<li>Code calls <code style=\"background:#3d3d3d;padding:2px 6px;border-radius:4px;\">pollForOutput()</code> as fallback</li>\n<li><span style=\"color:#f44336;font-weight:bold;\">BUT</span> EventSource also auto-reconnects internally</li>\n<li>New connection fails → step 3 repeats → <strong style=\"color:#ff9800;\">INFINITE LOOP</strong></li>\n</ol>\n\n<h4 style=\"color:#56b6c2;margin:16px 0 8px 0;\">3. Heartbeat vs Polling Race</h4>\n<p style=\"color:#e0e0e0;\">When <code style=\"background:#3d3d3d;padding:2px 6px;border-radius:4px;\">stream_end</code> fires with <code style=\"color:#e06c75;\">hadOutput: false</code>:</p>\n<ul style=\"color:#e0e0e0;margin:8px 0;padding-left:20px;line-height:1.8;\">\n<li>State transitions to <code style=\"color:#98c379;\">polling</code></li>\n<li>Heartbeat continues running (only stops on <code style=\"background:#3d3d3d;padding:2px 6px;border-radius:4px;\">completeRequest</code>)</li>\n<li>Heartbeat skips disconnect check when <code style=\"color:#98c379;\">connectionState === 'polling'</code> ✓</li>\n<li>But if <code style=\"background:#3d3d3d;padding:2px 6px;border-radius:4px;\">reconnectWithEventSource</code> errors, state becomes inconsistent</li>\n<li>Heartbeat then sees non-idle, non-polling state → dispatches <code style=\"color:#e06c75;\">disconnect</code></li>\n</ul>\n</div>\n\n<div 
style=\"background:#2d2d2d;padding:16px;border-radius:6px;margin-bottom:16px;\">\n<h3 style=\"color:#4caf50;margin:0 0 12px 0;\">✅ Required Fixes</h3>\n\n<table style=\"width:100%;border-collapse:collapse;margin:12px 0;\">\n<tr style=\"background:#3d3d3d;\">\n<th style=\"text-align:left;padding:10px;color:#fff;border-radius:4px 0 0 0;\">Fix</th>\n<th style=\"text-align:left;padding:10px;color:#fff;\">File</th>\n<th style=\"text-align:left;padding:10px;color:#fff;border-radius:0 4px 0 0;\">Priority</th>\n</tr>\n<tr>\n<td style=\"padding:10px;color:#e0e0e0;border-bottom:1px solid #444;\">Remove direct state mutation in <code>startHeartbeat()</code></td>\n<td style=\"padding:10px;color:#61afef;border-bottom:1px solid #444;\">console-ui.ts:597</td>\n<td style=\"padding:10px;border-bottom:1px solid #444;\"><span style=\"background:#f44336;color:#fff;padding:2px 8px;border-radius:4px;\">P0</span></td>\n</tr>\n<tr>\n<td style=\"padding:10px;color:#e0e0e0;border-bottom:1px solid #444;\">Remove direct state mutation in <code>reconnectWithEventSource()</code></td>\n<td style=\"padding:10px;color:#61afef;border-bottom:1px solid #444;\">console-streaming.ts:358</td>\n<td style=\"padding:10px;border-bottom:1px solid #444;\"><span style=\"background:#f44336;color:#fff;padding:2px 8px;border-radius:4px;\">P0</span></td>\n</tr>\n<tr>\n<td style=\"padding:10px;color:#e0e0e0;border-bottom:1px solid #444;\">Add EventSource close on reconnect failure after N attempts</td>\n<td style=\"padding:10px;color:#61afef;border-bottom:1px solid #444;\">console-streaming.ts:371</td>\n<td style=\"padding:10px;border-bottom:1px solid #444;\"><span style=\"background:#f44336;color:#fff;padding:2px 8px;border-radius:4px;\">P0</span></td>\n</tr>\n<tr>\n<td style=\"padding:10px;color:#e0e0e0;border-bottom:1px solid #444;\">Stop heartbeat when entering polling state (not just on completeRequest)</td>\n<td style=\"padding:10px;color:#61afef;border-bottom:1px solid 
#444;\">console-sm-transitions.ts</td>\n<td style=\"padding:10px;border-bottom:1px solid #444;\"><span style=\"background:#ff9800;color:#fff;padding:2px 8px;border-radius:4px;\">P1</span></td>\n</tr>\n<tr>\n<td style=\"padding:10px;color:#e0e0e0;\">Add reconnect attempt counter to prevent infinite retries</td>\n<td style=\"padding:10px;color:#61afef;\">console-streaming.ts</td>\n<td style=\"padding:10px;\"><span style=\"background:#ff9800;color:#fff;padding:2px 8px;border-radius:4px;\">P1</span></td>\n</tr>\n</table>\n</div>\n\n<div style=\"background:#2d2d2d;padding:16px;border-radius:6px;margin-bottom:16px;\">\n<h3 style=\"color:#9c27b0;margin:0 0 12px 0;\">📊 Why Previous Fix Didn't Work</h3>\n<p style=\"color:#e0e0e0;\">Commit <code style=\"background:#3d3d3d;padding:2px 6px;border-radius:4px;\">b1c15b5b</code> added:</p>\n<ul style=\"color:#98c379;margin:8px 0;padding-left:20px;line-height:1.8;\">\n<li>✓ Retry loop for kontask check (helps with persistence race)</li>\n<li>✓ Heartbeat skips polling state (helps with normal flow)</li>\n<li>✓ lastEventAt updates in EventSource handlers (helps with timing)</li>\n</ul>\n<p style=\"color:#7f848e;margin-top:12px;\">But it didn't address:</p>\n<ul style=\"color:#e06c75;margin:8px 0;padding-left:20px;line-height:1.8;\">\n<li>✗ Direct state mutation bypassing state machine</li>\n<li>✗ EventSource auto-reconnect loop on permanent failure</li>\n<li>✗ Heartbeat continuing during reconnect attempts</li>\n</ul>\n</div>\n\n<div style=\"background:#263238;padding:16px;border-radius:6px;\">\n<h3 style=\"color:#00bcd4;margin:0 0 12px 0;\">🎯 Next Steps</h3>\n<p style=\"color:#e0e0e0;\">Shall I implement the P0 fixes now? 
This will involve:</p>\n<ol style=\"color:#e0e0e0;margin:8px 0;padding-left:20px;line-height:1.8;\">\n<li>Refactoring state mutations to use dispatch() exclusively</li>\n<li>Adding proper EventSource lifecycle management with max retry count</li>\n<li>Test the fix by restarting konui mid-turn</li>\n</ol>\n</div>\n</div>",
    "choices": [
      {
        "label": "Implement fixes",
        "value": "Implement the P0 fixes for the SSE disconnect thrashing issue. Start with removing direct state mutations and adding EventSource lifecycle management.",
        "primary": true
      },
      {
        "label": "More analysis",
        "value": "Do more analysis on the disconnect issue before implementing fixes"
      },
      {
        "label": "Quick patch only",
        "value": "Apply a quick patch to just add max retry count to EventSource reconnects, without refactoring state management"
      }
    ],
    "requestedAt": "2026-01-07T10:45:00Z",
    "requestId": "40a31343-bcf5-42f9-9b2f-50a4f6f2f504"
  },
  "createdBy": "claude",
  "createdAt": "2026-01-07T08:56:17.924Z",
  "updatedAt": "2026-01-07T08:56:18.153Z",
  "requestId": "40a31343-bcf5-42f9-9b2f-50a4f6f2f504",
  "scope": "vibetools",
  "tags": [
    "console",
    "sse",
    "bug",
    "critical"
  ],
  "targetUser": "claude"
}