Out of the box, LLMs are most useful at granular tasks: summarizing a paper, drafting a post, outlining a story, translating a text, etc. Broader tasks — summarizing a whole book, for example — can prove challenging. The more text there is, the harder it is for LLMs to see the forest for the trees. But the forest is what we care about in information architecture.
There are techniques for working around this limitation. I’ve written about my experiments with using RAG with knowledge graphs. That approach showed promise, but has downsides. For one, it’s expensive (in terms of cost and energy.) It’s also time-consuming. But more importantly, it’s produced mixed results for IA use cases.
There are two primary tasks I want help with:
- understanding large unstructured information sets during early project stages, and
- producing draft taxonomies to organize large unstructured information sets.
In theory, graph RAG should help with both tasks. In practice, it’s been more useful for the first than the second. Whenever I prompt a graph RAG instance for suggestions on how to organize a corpus, I get back echoes of the graph’s structure. That’s not what I want.
Recently, I experimented with another approach that shows promise. It entails using a tool that isn’t designed for this purpose: Claude Code.
What is Claude Code?
Claude Code is a tool built by Anthropic to assist software developers. It uses Anthropic’s LLMs to create and modify software code on your behalf. You can think of it as an “agentic” coding assistant — that is, one that can do things to and with your code directly.
To do this, you give the tool access to a code repository on your computer. You interact with it via the command line. When you first call the agent on a folder containing software code, it ingests and indexes the code. It develops an internal representation of your codebase’s architecture, allowing it to work on the codebase as a whole — the forest view.
Claude Code provides a prompt-driven user interface. That is, you chat with it like you do with ChatGPT or the web-based version of Claude. But there are a few differences. For one, your codebase — and a representation of its structure — is pulled into the conversation context. With your permission, Claude Code also can modify the code. For example, it can fix bugs and run tests to validate whether they worked or not.
These aren’t things a regular chatbot can do. While you can tell the web version of Claude about your code by creating a project and uploading files or pointing it to a repo, regular Claude can only read those files into its context. In contrast, Claude Code can also modify the files. More importantly for our use case, Claude Code also structures internal representations of the codebase in ways optimized for development.
Use Case: Generating Draft Taxonomies
I hope you see where this is going. Code is just text, right? What if instead of (say) Python, we fed Claude Code the content of a website?
I tried it recently with jarango.com, and the results are promising. This site’s content is managed using Jekyll, a static site generator. Content is stored as plain text files (using Markdown markup) and processed into HTML at render time, which in my case happens either manually (for testing) or when I commit changes to a GitHub repository (for production.)
That’s a major advantage in this scenario: I can call up Claude Code in the folder on my computer where the site’s content lives and have it treat the Markdown files as “software code.” After the initial ingestion, I can prompt Claude Code on the entire corpus, much like I can using graph RAG — but with certain superpowers.
(If you’re using a dynamic CMS such as WordPress or Drupal, you’ll have to first export the site’s content as plain text files. There are Unix utilities that can assist with that process. I’ll leave it up to you to figure out how — but note LLMs are useful in this scenario as well: they know a lot about shell scripting!)
Back to the experiment. The initial ingestion process was very fast. I wondered whether Claude understood what it was dealing with, so my initial prompt was intentionally vague:
What is this?
Claude’s response was spot-on:
This is Jorge Arango’s personal website and blog, built with Jekyll. It’s a content-focused site featuring:
- Blog posts about information architecture, UX design, and systems thinking (hundreds of posts since 2002)
- “The Informed Life” podcast exploring how people organize information
- Speaking and workshop information
- Books section (likely featuring his published works)
- Services for consulting/workshops
The site uses Jekyll as a static site generator with standard Jekyll structure (_posts, _layouts, _includes, etc.) and appears to focus on information architecture, design strategy, and digital product thinking.
This gave me enough confidence to get to the point:
Ok, good. Look through the posts in the site. I’m considering re-architecting this content to make it easier for users to find. What taxonomies could I use to describe this content?
This was Claude’s response:

Pretty good for a first try! Note this took a little over three minutes and close to 64,000 tokens — much faster and more efficient than graph RAG. But, critically, it’s also more useful than any result I’ve gotten from graph RAG.
Claude suggested a primary hybrid content type + subject domain taxonomy plus two alternatives: an audience-centered taxonomy and one centered on the content format. It also provided useful terms for all three and alternative navigation paths.
This is impressive, given that categorizing content from the top-down isn’t the tool’s primary use case. But still, this was a fairly generic request: I asked for a way to make the content easier to find but didn’t specify who would be finding content on the site. That was the focus of my next prompt:
This is good, thank you. Recently, I’ve started pivoting away from IA Practitioners and Designers toward business leaders more broadly. Also, I’ve started writing more about artificial intelligence. (Notice my new tagline: “Architecting Intelligence” — I’m arguing IA is essential for AI.) How would you change your proposed categorization schemes to support these new directions?
Again, its response exceeded my expectations:

In this instance, Claude recommended organizing the site according to a “Business-Focused + AI Forward” taxonomy that includes the following terms:
- Architecting Intelligence - AI/IA intersection, intelligent systems design
- Business Strategy - Strategic value of information architecture for leaders
- Systems Leadership - Systems thinking for executives and decision-makers
- Learning & Insights - Books, observations, thought leadership
- Methods & Frameworks - Practical tools for business application
- Reflections - Quick insights and commentary
Claude also provided alternative organization schemes and “content positioning” adjustments based on the strategic shift that is prompting this ostensible website redesign.
Now, would I use these new categorization schemes as-is? No, there’s a lot of tweaking to be done. I could also be much more precise with my prompting. For example, I haven’t told the LLM enough about the intended audience; it’s guessing about what I might mean by “business leaders” as the intended audience. But these suggestions provide a solid starting point for an IA design process — certainly better than I would’ve been able to come up with myself in under ten minutes.
There are also likely limitations. At close to 1,300 pages, jarango.com isn’t a particularly large site. I wonder how well this approach would scale to a site ten times that size. And of course, I’m dealing exclusively with text files. How would this work in a site with lots of PDFs? I expect that would require converting them to text, a process that yields mixed results.
Future Directions
It’s still early days, and there’s lots of experimentation to be done. But I’m encouraged by these first tries. Claude Code is much easier to use than GraphRAG, the tool I used for my experiments with graph RAG last year. It’s also faster and more energy- and cost-efficient.
Claude Code also has another advantage over GraphRAG: it can operate on the code itself. Although I haven’t tried it, I imagine it’d be possible to have Claude Code modify the site’s code, content, and metadata. (If you’re going to experiment with this approach, back up your code. There is no undo!)
But even if limited to read-only queries, Claude Code is an incredibly powerful tool in the information architect’s toolbox. Operating on the entire corpus (rather than individual documents) is the focus of my in-development AI product. The focus of that tool is precise control. But for “quick and dirty” operations, software development agents like Claude Code might be enough — given that you can access your content as plain text.