
Tutorial

This is a basic introduction to Milvus using PyMilvus.

For a runnable Python script, check out example.py in the PyMilvus GitHub repository, or hello milvus on the official Milvus website. Either one is a good place to start with Milvus and PyMilvus.

Note

Here we use float vectors as the example vector field data. If you want an example with binary vectors, see binary vector example.
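To make the difference between the two vector types concrete as Python data, here is a minimal sketch. It assumes that a binary vector is passed as a `bytes` object of `dim // 8` bytes (one bit per dimension), while a float vector is a list of `dim` floats; check the binary vector example for the exact format PyMilvus expects. `random_float_vector` and `pack_bits` are hypothetical helpers written for this illustration.

```python
import random

DIM = 128  # for binary vectors this sketch requires a multiple of 8


def random_float_vector(dim, rng):
    """A float vector: one Python float per dimension."""
    return [rng.random() for _ in range(dim)]


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, most significant bit first."""
    if len(bits) % 8:
        raise ValueError("the number of bits must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit  # shift earlier bits left, append this one
        packed.append(byte)
    return bytes(packed)


rng = random.Random(0)
float_vec = random_float_vector(DIM, rng)
binary_vec = pack_bits([rng.randint(0, 1) for _ in range(DIM)])
print(len(float_vec), len(binary_vec))  # 128 16
```

The float vector needs one value per dimension, whereas the packed binary vector needs only one byte per eight dimensions.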

Prerequisites

Before we start, make sure that:

  • You have a running Milvus instance.

  • PyMilvus is correctly installed (see Installation).

Connect to Milvus

First of all, we need to import connections from pymilvus.

>>> from pymilvus import connections

Then, we can connect to the Milvus server. By default, Milvus listens on localhost at port 19530, so you can connect using these default values.

>>> host = '127.0.0.1'
>>> port = '19530'
>>> connections.add_connection(default={"host": host, "port": port})
>>> connections.connect(alias='default')
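If you would rather not hard-code the address, a minimal sketch like the following can build the connection parameters from environment variables and fall back to the local defaults used in this tutorial. The variable names `MILVUS_HOST` and `MILVUS_PORT` and the helper `milvus_connection_params` are hypothetical, chosen only for this illustration; they are not settings documented in this tutorial.

```python
import os

# Local defaults from the tutorial: Milvus listens on 127.0.0.1, port 19530.
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = "19530"


def milvus_connection_params(env=None):
    """Return a dict suitable for connections.add_connection(default=...).

    MILVUS_HOST / MILVUS_PORT are hypothetical variable names for this sketch.
    """
    env = os.environ if env is None else env
    host = env.get("MILVUS_HOST", DEFAULT_HOST)
    port = str(env.get("MILVUS_PORT", DEFAULT_PORT))
    # Reject values that cannot be a TCP port before trying to connect.
    if not port.isdigit() or not 0 < int(port) < 65536:
        raise ValueError(f"invalid Milvus port: {port!r}")
    return {"host": host, "port": port}


print(milvus_connection_params({}))  # {'host': '127.0.0.1', 'port': '19530'}
print(milvus_connection_params({"MILVUS_PORT": "19531"})["port"])  # 19531
```

With a running server, the result could be passed as connections.add_connection(default=milvus_connection_params()) before calling connections.connect(alias='default').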

After connecting, we can communicate with Milvus as described in the following sections. If you are unfamiliar with the terminology, see Milvus Terminology for explanations.

Collection

Now let's create a new collection. Before we start, we can list all the collections that already exist. For a brand-new Milvus instance, the result should be empty.

>>> from pymilvus import list_collections
>>> list_collections()
[]

Create Collection

To create a collection, we need to provide a schema for it.

In this tutorial, we will create a collection with three fields: id, year and embedding.

The id field is of type INT64 and is set as the primary field with auto_id enabled. The year field is also of type INT64, and the embedding field is of type FLOAT_VECTOR with a dimension (dim) of 128.

Now we can create a collection:

>>> from pymilvus import Collection, DataType, FieldSchema, CollectionSchema
>>> dim = 128
>>> id_field = FieldSchema(name="id", dtype=DataType.INT64, description="primary_field")
>>> year_field = FieldSchema(name="year", dtype=DataType.INT64, description="year")
>>> embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim)
>>> schema = CollectionSchema(fields=[id_field, year_field, embedding_field], primary_field='id', auto_id=True, description='desc of collection')
>>> collection_name = "tutorial"
>>> collection = Collection(name=collection_name, schema=schema)
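To see what the schema encodes, here is a minimal plain-Python sketch that mirrors the three fields and runs a few simplified checks. `FieldSpec` and `validate_schema` are hypothetical names, not PyMilvus classes, and the checks (exactly one INT64 primary field, at least one vector field with a positive dim) are assumptions matching this tutorial rather than the full set of rules Milvus enforces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """Plain-Python stand-in for a field definition (hypothetical, not PyMilvus)."""
    name: str
    dtype: str
    dim: Optional[int] = None
    is_primary: bool = False


TUTORIAL_FIELDS = [
    FieldSpec("id", "INT64", is_primary=True),
    FieldSpec("year", "INT64"),
    FieldSpec("embedding", "FLOAT_VECTOR", dim=128),
]


def validate_schema(fields):
    """Run simplified checks and return the primary field's name."""
    primaries = [f for f in fields if f.is_primary]
    if len(primaries) != 1:
        raise ValueError("exactly one primary field is required")
    if primaries[0].dtype != "INT64":
        raise ValueError("this sketch only accepts an INT64 primary field")
    vectors = [f for f in fields if f.dtype.endswith("_VECTOR")]
    if not vectors:
        raise ValueError("at least one vector field is required")
    for f in vectors:
        if not f.dim or f.dim <= 0:
            raise ValueError(f"vector field {f.name!r} needs a positive dim")
    return primaries[0].name


print(validate_schema(TUTORIAL_FIELDS))  # id
```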

Now, if you list the collections, 'tutorial' appears in the result.

>>> list_collections()
['tutorial']

You can also get information about the collection, such as its description.

>>> collection.description
'desc of collection'

Because this is a basic introductory tutorial, building indexes is not covered here. To go further into indexes in Milvus, we recommend checking our index examples.

If you are already familiar with indexes from the index examples and want the full list of parameters supported by PyMilvus, check out the Index chapter of the PyMilvus documentation.

Furthermore, for a thorough overview of indexes, see Vector Index on the official Milvus website.

Create Partition

If you don't create a partition, there is a default one called "_default", and all entities are inserted into it. You can check this with the Collection.partitions property:

>>> collection.partitions
[{"name": "_default", "description": "", "num_entities": 0}]

You can provide a partition name to create a new partition.

>>> collection.create_partition("new_partition")
>>> collection.partitions
[{"name": "_default", "description": "", "num_entities": 0}, {"name": "new_partition", "description": "", "num_entities": 0}]
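The routing rule described above (entities go to "_default" unless a partition is named) can be modeled with a small in-memory class. This is a toy sketch for illustration only: `PartitionedStore` is hypothetical and keeps plain Python rows in memory, whereas a real collection stores its data on the Milvus server.

```python
class PartitionedStore:
    """Toy in-memory model of partition routing; not a PyMilvus class."""

    DEFAULT = "_default"

    def __init__(self):
        # Like a new collection, the store starts with only the default partition.
        self._partitions = {self.DEFAULT: []}

    def create_partition(self, name):
        if name in self._partitions:
            raise ValueError(f"partition {name!r} already exists")
        self._partitions[name] = []

    def insert(self, rows, partition_name=None):
        # Without an explicit partition name, rows land in "_default".
        target = partition_name or self.DEFAULT
        if target not in self._partitions:
            raise KeyError(f"partition {target!r} does not exist")
        self._partitions[target].extend(rows)

    @property
    def partitions(self):
        return [{"name": name, "num_entities": len(rows)}
                for name, rows in self._partitions.items()]


store = PartitionedStore()
store.create_partition("new_partition")
store.insert([{"year": 2020}, {"year": 2021}])
store.insert([{"year": 2022}], partition_name="new_partition")
print(store.partitions)
# [{'name': '_default', 'num_entities': 2}, {'name': 'new_partition', 'num_entities': 1}]
```

In PyMilvus itself, Collection.insert accepts an optional partition_name argument for the same purpose; check the API reference for your version.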

Insert Entities

An entity is a group of fields that corresponds to a real-world object. In this tutorial, the collection has three fields. Here is an example of 30 entities structured as a list of lists, with one inner list per field.

Note

The id field was set as the primary field with auto_id above, so we do not provide values for it when inserting.

>>> import random
>>> nb = 30
>>> years = list(range(nb))
>>> embeddings = [[random.random() for _ in range(dim)] for _ in range(nb)]
>>> entities = [years, embeddings]
>>> collection.insert(entities)
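The insert data above is column-based: one inner list per field, in schema order, with the auto-generated id left out. If your data starts out as one record per entity, a minimal sketch like the following can transpose it into that layout and run a few simplified sanity checks before calling collection.insert. `rows_to_columns` and `check_columns` are hypothetical helpers, and the checks are assumptions of this sketch rather than the validation PyMilvus performs.

```python
import random

FIELD_ORDER = ["year", "embedding"]  # schema order, without the auto_id field "id"
DIM = 128


def rows_to_columns(rows, field_order=FIELD_ORDER):
    """Turn [{'year': ..., 'embedding': ...}, ...] into [[years...], [embeddings...]]."""
    return [[row[name] for row in rows] for name in field_order]


def check_columns(columns, dim=DIM):
    """Return the entity count after basic shape checks (simplified, hypothetical)."""
    years, embeddings = columns
    if len(years) != len(embeddings):
        raise ValueError("every column needs one value per entity")
    for i, vector in enumerate(embeddings):
        if len(vector) != dim:
            raise ValueError(f"embedding {i} has {len(vector)} values, expected {dim}")
    return len(years)


rng = random.Random(42)
rows = [{"year": i, "embedding": [rng.random() for _ in range(DIM)]} for i in range(30)]
entities = rows_to_columns(rows)
print(check_columns(entities))  # 30
```

The resulting entities list has the same shape as the one built in the tutorial, so it could be passed to collection.insert in the same way.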