
Questions

Predefined questions allow you to quickly set up interactions with your AI agents.

With each dataset, you have the choice to allow users to freely chat with your bot, or have the bot only answer "predefined questions". Predefined questions can be universal across all datasets, specific to a dataset, or be a follow-up question to another predefined question. Furthermore, each predefined question can trigger an action or exclude the use of a module assigned to a dataset.


Question Settings

You can manage question settings using the "Update Question" endpoint of the API.

| Setting Key | Type | Default | Description |
| --- | --- | --- | --- |
| cacheDelayInSeconds | number | 0 | In seconds. How long to throttle replies for when serving from the cache. |
| cacheAnswers | number | 0 | How many cached options to randomly choose from when replying to users. See the "Caching" section for more details. |
| cacheAnswerInSeconds | number | 0 | In seconds. How long to cache replies for before sending new requests to the AI models. If set to zero (0), caching will not be used. |
| skipModules | string[] | [] | The modules you want to skip when asking this question. Please see the modules page for a list of module keys. |
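
As an illustration, here is a minimal sketch of updating these settings through the "Update Question" endpoint. The URL, HTTP method, authentication header, and question ID are assumptions made for the example; consult the API reference for the actual request shape.

```typescript
// Hypothetical sketch: updating a question's cache settings via the
// "Update Question" endpoint. The URL, auth scheme, and payload shape
// are assumptions for illustration only -- check the API reference.
const QUESTION_ID = "q_123"; // hypothetical question ID

async function updateQuestionSettings(): Promise<void> {
  const response = await fetch(`https://api.example.com/questions/${QUESTION_ID}`, {
    method: "PATCH", // assumed method for partial updates
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.BELLA_API_KEY}`, // assumed auth scheme
    },
    body: JSON.stringify({
      cacheAnswers: 5,             // keep 5 cached replies to rotate through
      cacheAnswerInSeconds: 3600,  // refresh the cache every hour
      cacheDelayInSeconds: 2,      // throttle cached replies by 2 seconds
      skipModules: ["web-search"], // hypothetical module key
    }),
  });
  if (!response.ok) {
    throw new Error(`Update failed: ${response.status}`);
  }
}
```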

Caching

If you choose to use dataset caching, it's important to understand the potential consequences.


First Request Latency

Because the settings allow you to cache multiple responses, the first time a question is asked, before the cache has been populated, it may take quite a while to complete the requests to the various AI models. Please be patient during this initial stage.


Models

Even though a chat will show as using a specific model, when Bella is collecting answers to cache, it randomly rotates through all the models assigned to that dataset. A user's chat may therefore display one model while the cached reply came from a different one. This ensures the cached replies are not all the same and provides a more varied experience for users.
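
Conceptually, the cache-filling behavior looks something like the sketch below. The names and signatures are hypothetical; this is not BellaAI's actual implementation, only an illustration of rotating models while collecting cached answers.

```typescript
// Illustrative sketch: collect cached answers by rotating randomly
// through a dataset's models. All names here are hypothetical.
type Model = { name: string; ask: (question: string) => Promise<string> };

async function fillCache(
  question: string,
  models: Model[],
  cacheAnswers: number, // the cacheAnswers setting: how many replies to cache
): Promise<string[]> {
  const cached: string[] = [];
  for (let i = 0; i < cacheAnswers; i++) {
    // Pick a random model for each cached reply so the answers vary.
    const model = models[Math.floor(Math.random() * models.length)];
    cached.push(await model.ask(question));
  }
  return cached;
}
```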


Optionality

If you enable caching, BellaAI offers the ability to cache multiple replies to a user's query so that not every user gets the same answer. Using this creates more variability than serving everyone a single cached reply.
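
The selection itself amounts to a uniform random pick from the cached replies, roughly as sketched below (illustrative only, not the actual implementation):

```typescript
// Illustrative sketch: serve one of the cached replies at random so
// users don't all see the same answer.
function pickCachedReply(cached: string[]): string {
  return cached[Math.floor(Math.random() * cached.length)];
}
```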


Throttling

The vast majority of the time spent waiting for your agent to reply is taken up by the AI model "thinking". When caching isn't used, a reply can take anywhere from 2 to 20 seconds depending on the model, and over the course of the entire request, 95%+ of the time spent waiting for a reply is just the AI model thinking.

However, when using caching, the AI model is never requested, so the overall response time drops dramatically, from seconds to milliseconds. If you want to maintain the illusion of the AI "thinking", you can throttle your cached replies via the cacheDelayInSeconds setting.
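
As a rough illustration of the effect of cacheDelayInSeconds, the sketch below delays delivery of a cached reply so the response doesn't arrive suspiciously fast. The helper name is hypothetical; the actual throttling happens on the API side.

```typescript
// Illustrative sketch: hold a cached reply back for
// cacheDelayInSeconds to simulate "thinking" time.
// Hypothetical helper; not the actual API behavior.
async function replyWithThrottle(
  cachedReply: string,
  cacheDelayInSeconds: number,
): Promise<string> {
  // Wait before returning, so a cached reply takes seconds
  // rather than milliseconds to arrive.
  await new Promise((resolve) => setTimeout(resolve, cacheDelayInSeconds * 1000));
  return cachedReply;
}
```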