SemaDB Search Overview #
POST: /collections/{collectionId}/points/search
To search for points / documents in SemaDB, we make a post request to the search endpoint of a collection with a query object. The query object is a JSON object that contains the search parameters which can be of different types like text, vector, and hybrid. These can also be combined to create complex and powerful queries.
The overall request consists of the following parts:
- Query: What to search for and how? This can be a text query, a vector query, or a hybrid query.
- Select (optional): What fields to return in the search results. It is often common to not return all fields in the search results, especially vector fields. You can select nested fields using the dot notation such as
nested.field
or use*
to select all fields.- If you don’t select any fields, only the
_id
field along with any scoring / distance fields will be returned.
- If you don’t select any fields, only the
- Sort (optional): How to sort the search results. This can be based on a field or a distance from a vector. Any sort fields must be selected first.
- Offset (optional): How many results to skip from the beginning of the overall search results.
- Limit: How many results to return from the search results.
Please refer to the API documentation for more details on the search endpoint and parameters.
graph TD SR["Search Request"] --> Query SR --> Select("Select") SR --> Sort("Sort") SR --> Offset("Offset") SR --> Limit("Limit") Query --> Vector("My Vector Field") Vector --> VQ1["Vector Parameters"]
Here is a full single query example:
{
"query": {
"property": "productEmbedding",
// Vamana index search parameters
"vectorVamana": {
"vector": [1, 2],
"operator": "near",
// See the vector search section for parameter details
"searchSize": 75,
// This limit is independent of the overall limit
"limit": 10
}
},
"select": ["description", "price"],
"sort": [
{
"field": "price",
"descending": false // ascending order
}
],
"limit": 5
}
Notice how the limits in the query object are independent of the overall limit. What we are saying here is, get the nearest 10 points to the vector [1, 2]
and then sort them by price in ascending order and return the top 5 results.
Each query type has its own set of parameters that can be used to fine-tune the search results. The search results are returned in the form of a JSON object:
{
"points": [
{
"_distance": 0,
"_hybridScore": -0,
"_id": "faefe2b1-cf85-48db-9621-94b833ee9cc9",
"description": "A product",
"price": 100
},
{
"_distance": 314402.94,
// Since we don't have a hybrid query, the hybrid score is the same as the distance
// It is negated to have higher scores indicate better matches
"_hybridScore": -314402.94,
"_id": "d2e3ebb5-149a-46e4-8b3f-cd6c38530da3",
"description": "Another product",
"price": 200
}
]
}
Composite Queries #
Each Query object refers to a single field in the collection. To create complex queries, we can combine multiple queries using the _and
and _or
as the query property.
graph TD subgraph "Single Query" Query --> Vector("My Vector Field") Vector --> VQ1["Vector Parameters"] end subgraph "Composite Query" And["_and"] Or["_or"] And --> QA1["Query 1"] And --> QA2["..."] Or --> QA3["Query 1"] Or --> QA4["..."] end
Here is an example request:
{
"query": {
"property": "_and",
"_and": [
{
"property": "size",
"integer": {
"value": 10,
"operator": "equals",
}
},
{
"property": "price",
"float": {
"value": 100,
"operator": "lessThan",
}
}
]
},
"limit": 10
}