Paper by Thibault Schrepel: “A digital brain, as coined by Andrej Karpathy, is a personal knowledge infrastructure built from documents you trust. It maps connections between sources, surfaces patterns and inconsistencies. It generates answers on demand with references to the underlying material. The more it is used, the more connections it builds. The output is a private, queryable Wikipedia. On demand, the system generates wiki pages on any theme covered by the corpus, and each new page feeds back into the knowledge base.
This guide documents a pipeline for building such a system from any document corpus, for research purposes. The pipeline adapts Karpathy’s methodology and adds six research-specific contributions. (1) A schema design procedure encodes authority hierarchies and surfaces gaps in the literature. (2) A centrality-weighted wiki generation procedure anchors each article around the most-connected sources. (3) A six-step research protocol produces hypotheses rather than retrieved information. (4) Claim-level extraction moves the unit of analysis from documents to propositions, which makes visible the incompatibilities that document-level graphs hide. (5) A persistent hypothesis register stores every query-generated conjecture and re-tests it as the corpus grows. (6) A complexity-theoretic diagnostic layer measures the graph’s network properties and reports how the field is structured. The pipeline is field-neutral. The two implementations documented here, one academic research corpus and one European Commission decisions dataset, are illustrative examples…”
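Contribution (2), anchoring each article around the most-connected sources, can be illustrated with a minimal sketch. The excerpt does not say which centrality measure the paper uses, so plain degree centrality stands in here as an assumption. The function names `degree_centrality` and `anchor_sources`, the edge-list format, and the sample source labels are all hypothetical and not taken from the paper.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Count the distinct neighbours of each source in an undirected graph.

    Assumption: degree centrality is a stand-in; the paper's actual
    measure is not given in the excerpt.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-links
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(adj) for node, adj in neighbours.items()}


def anchor_sources(edges, candidates, k=2):
    """Return the k most-connected candidate sources (ties broken by name)."""
    scores = degree_centrality(edges)
    ranked = sorted(candidates, key=lambda s: (-scores.get(s, 0), s))
    return ranked[:k]


# Hypothetical links between five sources in a corpus.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

# Sources relevant to one wiki theme; the article would be built
# around the top-ranked ones.
anchors = anchor_sources(edges, ["B", "C", "D", "E", "A"], k=2)
print(anchors)
```

With these sample edges, source A has three connections and B, C and D have two each, so A is ranked first and the alphabetical tie-break selects B as the second anchor. A weighted or eigenvector-based measure could be swapped in without changing the ranking step.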