Supported File Types

Artifact supports the following file types for document parsing and for building catalogs used in retrieval-augmented generation (RAG):

  • .pdf - Portable Document Format
  • .doc - Microsoft Word 97-2003
  • .docx - Microsoft Word 2007-2019
  • .txt - Plain text
  • .md - Markdown
  • .html - HTML document
  • .ppt - Microsoft PowerPoint 97-2003
  • .pptx - Microsoft PowerPoint 2007-2019
  • .xls - Microsoft Excel 97-2003
  • .xlsx - Microsoft Excel 2007-2019
  • .csv - Comma-separated values

Max file size: 512 MB per file
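
A client can check the file type and size limits above before uploading. The sketch below is a minimal illustration and not part of Artifact's API: the names `SUPPORTED_EXTENSIONS`, `MAX_FILE_SIZE_BYTES`, and `check_file` are hypothetical. It also assumes that extensions are matched case-insensitively and that 1 MB means 1024 × 1024 bytes, neither of which is stated here.

```python
from pathlib import Path

# Extensions copied from the supported file types list.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".txt", ".md", ".html",
    ".ppt", ".pptx", ".xls", ".xlsx", ".csv",
}

# Assumption: "512 MB" is interpreted as 512 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 512 * 1024 * 1024


def check_file(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # Assumption: extension matching is case-insensitive.
    ext = Path(name).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file exceeds 512 MB")
    return problems
```

For a file on disk, the size can be obtained with `Path(path).stat().st_size` and passed as `size_bytes`. Returning a list of problems rather than a boolean lets the caller report every issue at once.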

Parsing Behavior

  • All supported formats are converted into clean, structured Markdown for use with LLMs. The one exception is .txt files, which are extracted as plain text.
  • Scanned and handwritten PDFs are also supported and are parsed using visual understanding.
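
Downstream code that consumes parsed output may need to know which kind of text to expect. The rule in the first bullet can be sketched as a small helper. `expected_output_format` is a hypothetical name used for illustration, not an Artifact function, and it assumes the file has already been checked against the supported types.

```python
from pathlib import Path


def expected_output_format(name: str) -> str:
    """Return "plain text" for .txt files and "markdown" for any other file.

    Assumes the file type is one of the supported formats.
    """
    return "plain text" if Path(name).suffix.lower() == ".txt" else "markdown"
```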