Manage Files

This page shows you how to manage files within a Catalog: how to view each file's processing status, file UID, and the corresponding single source of truth (stored as a markdown text file), and how to delete a file.

Manage Catalog Files via API

List Files

export INSTILL_API_TOKEN=********

curl -X GET 'HOST_URL/v1alpha/namespaces/NAMESPACE_ID/catalogs/CATALOG_ID/files' \
--header "Authorization: Bearer $INSTILL_API_TOKEN" \
--header "Content-Type: application/json"

from instill.clients import init_artifact_client

artifact = init_artifact_client(api_token="INSTILL_API_TOKEN", url="HOST_URL")
artifact.list_catalog_files(namespace_id="NAMESPACE_ID", catalog_id="CATALOG_ID")
artifact.close()

Optional Query Parameters

You can use the following optional query parameters to customize your request:

  • pageSize (integer): Specifies the number of results per page. The default is 10, and the maximum is 100.
  • pageToken (string): Provides the token for retrieving the next page of results. This is used for pagination, so subsequent requests can continue where the previous one left off.
  • filter.fileUids (array of strings): Filters the results by a list of specific file unique identifiers (UIDs).

Here is an example showing how to make an API request with optional query parameters:

export INSTILL_API_TOKEN=********

curl -X GET 'http://localhost:8080/v1alpha/namespaces/NAMESPACE_ID/catalogs/CATALOG_ID/files?pageSize=10&pageToken=abcd1234&filter.fileUids=file_uid_a&filter.fileUids=file_uid_b' \
--header "Authorization: Bearer $INSTILL_API_TOKEN" \
--header "Content-Type: application/json"

from instill.clients import init_artifact_client

artifact = init_artifact_client(api_token="INSTILL_API_TOKEN", url="http://localhost:8080")
artifact.list_catalog_files(
  namespace_id="NAMESPACE_ID",
  catalog_id="CATALOG_ID",
  page_size=10,
  page_token="abcd1234",
  files_filter=["file_uid_a", "file_uid_b"],
)
artifact.close()

Note that the NAMESPACE_ID and CATALOG_ID path parameters must be replaced by the Catalog owner's ID (namespace) and the identifier of the Catalog whose files are to be listed, respectively.
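
The optional query parameters can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `build_list_files_url` is hypothetical and not part of the Instill SDK. It assumes that each file UID is sent as its own repeated `filter.fileUids` pair, as in the cURL request above.

```python
from urllib.parse import urlencode


def build_list_files_url(host_url, namespace_id, catalog_id,
                         page_size=None, page_token=None, file_uids=()):
    """Return the List Files URL with any optional query parameters appended.

    Hypothetical helper for illustration; not part of the Instill SDK.
    """
    path = (
        f"{host_url.rstrip('/')}/v1alpha/namespaces/{namespace_id}"
        f"/catalogs/{catalog_id}/files"
    )
    params = []
    if page_size is not None:
        params.append(("pageSize", page_size))
    if page_token:
        params.append(("pageToken", page_token))
    # Assumption: each UID becomes its own repeated filter.fileUids pair.
    params.extend(("filter.fileUids", uid) for uid in file_uids)
    query = urlencode(params)
    return f"{path}?{query}" if query else path


url = build_list_files_url(
    "http://localhost:8080", "NAMESPACE_ID", "CATALOG_ID",
    page_size=10, page_token="abcd1234",
    file_uids=["file_uid_a", "file_uid_b"],
)
print(url)
```

Using `urlencode` rather than string concatenation means that tokens or UIDs containing reserved characters are percent-encoded automatically.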

Example Response

A successful response returns a list of files, each with the UID of the Catalog it belongs to (kbUid):

{
  "files": [
    {
      "fileUid": "file123",
      "name": "example_file.pdf",
      "type": "FILE_TYPE_PDF",
      "processStatus": "FILE_PROCESS_STATUS_COMPLETED",
      "ownerUid": "owner123",
      "kbUid": "kb123",
      "createTime": "2024-07-01T12:00:00Z",
      "updateTime": "2024-07-01T12:00:00Z",
      "size": 1024,
      "totalChunks": 10,
      "totalTokens": 100
    },
    {
      "fileUid": "file124",
      "name": "another_example_file.pdf",
      "type": "FILE_TYPE_PDF",
      "processStatus": "FILE_PROCESS_STATUS_COMPLETED",
      "ownerUid": "owner124",
      "kbUid": "kb124",
      "createTime": "2024-07-01T12:00:00Z",
      "updateTime": "2024-07-01T12:00:00Z",
      "size": 2048,
      "totalChunks": 20,
      "totalTokens": 200
    }
  ],
  "totalSize": 2,
  "pageSize": 10,
  "nextPageToken": "next1234",
  "filter": {
    "fileUids": [
      "file123",
      "file124"
    ]
  }
}
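
A response shaped like the one above can be checked for files that have not finished processing. The sketch below is illustrative only: the helper names are hypothetical, the sample payload is trimmed to the fields it needs, and the only status value it relies on is `FILE_PROCESS_STATUS_COMPLETED`, which appears in the example response.

```python
import json
from collections import Counter

# Trimmed copy of the example response; only the fields used here are kept.
response_text = """
{
  "files": [
    {"fileUid": "file123", "name": "example_file.pdf",
     "processStatus": "FILE_PROCESS_STATUS_COMPLETED"},
    {"fileUid": "file124", "name": "another_example_file.pdf",
     "processStatus": "FILE_PROCESS_STATUS_COMPLETED"}
  ],
  "totalSize": 2
}
"""

COMPLETED = "FILE_PROCESS_STATUS_COMPLETED"


def summarize_statuses(response):
    """Count how many files are in each processing status."""
    return Counter(f["processStatus"] for f in response.get("files", []))


def unfinished_files(response):
    """Return the names of files whose status is anything other than completed."""
    return [
        f["name"]
        for f in response.get("files", [])
        if f["processStatus"] != COMPLETED
    ]


response = json.loads(response_text)
print(dict(summarize_statuses(response)))
print(unfinished_files(response))
```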

Output Description

  • files: An array of objects where each object represents a file in the Catalog.
    • fileUid (string): The unique identifier of the file.
    • name (string): The name of the file.
    • type (string): The type of the file (e.g., FILE_TYPE_PDF, FILE_TYPE_MARKDOWN, FILE_TYPE_TEXT).
    • processStatus (string): The processing status of the file.
    • ownerUid (string): The UID of the file owner.
    • kbUid (string): The UID of the Catalog the file belongs to.
    • createTime (string): The creation time of the file.
    • updateTime (string): The last update time of the file.
    • size (integer): The size of the file in bytes.
    • totalChunks (integer): The total number of chunks in the file.
    • totalTokens (integer): The total number of tokens in the file.
  • totalSize (integer): The total number of files that match the query.
  • pageSize (integer): The number of files returned in the response.
  • nextPageToken (string): The token for the next page of results. This is used for pagination.
  • filter: The filter object used in the query.
    • fileUids (array of strings): The list of file UIDs used in the filter.

Delete a File

Please note that once a file is deleted, it cannot be recovered. All related Catalog entries (such as chunks, embeddings, etc.) will also be deleted.

export INSTILL_API_TOKEN=********

curl -X DELETE 'HOST_URL/v1alpha/catalogs/files?fileUid=FILE_UID' \
--header "Authorization: Bearer $INSTILL_API_TOKEN"

from instill.clients import init_artifact_client

artifact = init_artifact_client(api_token="INSTILL_API_TOKEN", url="HOST_URL")
artifact.delete_catalog_file(file_uid="FILE_UID")
artifact.close()

Note that the FILE_UID query parameter must be replaced by the unique identifier (UID) of the file to be deleted.
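
Because deletion cannot be undone, a script that deletes files may benefit from an explicit confirmation step. The following is a minimal sketch using only the Python standard library. The helper `build_delete_request` and its `confirm` flag are hypothetical and not part of the Instill SDK; the endpoint and header mirror the cURL request above. The request is only constructed here, not sent.

```python
import urllib.request
from urllib.parse import urlencode


def build_delete_request(host_url, file_uid, api_token, confirm=False):
    """Build (but do not send) the DELETE request for a single file.

    Hypothetical helper for illustration; the confirm flag is a safeguard
    added by this sketch, not an API parameter.
    """
    if not confirm:
        raise ValueError("Deletion is permanent; pass confirm=True to proceed.")
    query = urlencode({"fileUid": file_uid})
    url = f"{host_url.rstrip('/')}/v1alpha/catalogs/files?{query}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {api_token}"},
    )


request = build_delete_request(
    "http://localhost:8080", "FILE_UID", "my-token", confirm=True
)
print(request.get_method(), request.full_url)

# To actually send the request, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```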

Manage Catalog Files via Console

View Files

To view files of a Catalog from Console, follow these steps:

  1. Launch Console locally at http://localhost:3000.
  2. Navigate to the Artifacts page using the navigation bar.
  3. Click the Catalog card you wish to view files from.
  4. Select Files in the left panel.

A list of files from your selected Catalog will appear below.

If you click a file name, the single source of truth (markdown text) for that file will appear.

Delete a File

To delete a file from a Catalog via Console, follow these steps:

  1. Launch Console locally at http://localhost:3000.
  2. Navigate to the Artifacts page using the navigation bar.
  3. Click the Catalog card you wish to delete a file from.
  4. Select Files in the left panel.
  5. Click the Delete button next to the file you wish to delete.