-
Daily Report: Experiment Design v3 Finalized · Dataset Ready — Feb 25, 2026
-
Daily Report: Reverse-Engineering Scoring Flaws from Experimental Results — Rebuilding the Agent Trajectory Quality Framework — Feb 24, 2026
-
Daily Report: Kindergarten SaaS Architecture for German DSGVO — Physical Isolation, Zero Trust, and E2EE — Feb 23, 2026
-
Daily Log: Evaluation Complete & Results Overview — Feb 22, 2026
-
Research Log: Perplexity Evaluation Kickoff and Paper Submission Deliberations — Feb 21, 2026
-
Language as an Inner Tool
-
Research Log: Evaluation Dilemma & Proxy Metrics Alternative — Feb 18, 2026
-
Research Log: LoRA Fine-tune Testing & Baseline Parameter Selection — Feb 15, 2026
-
Research Log: Unwell Day & OpenHands 0.54.0 Initial Testing — Feb 14, 2026
-
Research Log: Model Debugging & OpenHands Migration — Feb 13, 2026
-
Research Log: A Small Pause and AI Safety — Feb 12, 2026
-
Research Log: OOM Debugging, H200 Migration, and Evaluation Troubleshooting — Feb 11, 2026
-
Research Log: Training Wrap-up, Debugging, and Qwen Baseline Testing — Feb 10, 2026
-
Research Log: Training Interruptions, Eval Debugging, and Manual Switching — Feb 9, 2026
-
Research Log: Training Pipeline Stabilization & Evaluation Architecture — Feb 8, 2026
-
Research Log: Multi-Subset Preprocessing & Cloud Training Stabilization — Feb 7, 2026
-
Research Log: Research Progress & Workflow Evolution — Feb 6, 2026
-
Research Log: Claude Opus 4.6 & Trajectory Quality Research — Feb 5, 2026
-
Research Log: Rethinking Research Direction & Tool Exploration — Feb 4, 2026
-
Research Log: Kindergarten App & Model Evaluation — Feb 3, 2026
-
Research Log: Optimizing SWE-agent Evaluation Pipelines — Feb 2, 2026
-
Running Is More Than Exercise
-
Research Log: vLLM & SWE Agent — Feb 1, 2026