AssemblyAI

AssemblyAI Partner Service is available from CLI version 1.51.0 and greater

Cerebrium’s partnership with AssemblyAI helps teams deliver speech-to-text (STT) services with efficient deployment, minimized latency, and region selection for data privacy compliance needs.

Setup

Contact the AssemblyAI team in order to get a self-hosted Contract. You can contact them here
Create a simple cerebrium app with the CLI:

cerebrium init assembly-ai

AssemblyAI services use a simplified TOML configuration with the [cerebrium.runtime.assemblyai] section. Create a cerebrium.toml file with the following:

[cerebrium.deployment]
name = "assembly-ai"
disable_auth = true

[cerebrium.runtime.assemblyai]
port = 8080
model_name = "english"

[cerebrium.hardware]
cpu = 4
memory = 16
compute = "AMPERE_A10"
gpu_count = 1
region = "us-east-1"

[cerebrium.scaling]
min_replicas = 1
max_replicas = 3
cooldown = 120
replica_concurrency = 32
scaling_metric = "concurrency_utilization"
scaling_target = 70

The above disables auth meaning anyone can make requests to your endpoint. If you set disable_auth=false, then you need to use the API key from your Cerebrium Dashboard. key

Run cerebrium deploy to deploy the AssemblyAI service - the output of which should appear as follows:

App Dashboard: https://dashboard.cerebrium.ai/projects/p-xxxxxxxx/apps/p-xxxxxxxx-assembly-ai

Use the Deployment url from the output to send requests to the WS AssemblyAI service. We can use their example repo to test its working

1. git clone https://github.com/AssemblyAI/streaming-self-hosting-stack.git
2. cd streaming_example && python example_with_prerecorded_audio_file.py --audio-file example_audio_file.wav --endpoint wss://api.aws.us-east-1.cerebrium.ai/v4/p-xxxxxx/assembly-ai --language english

You should then see the following output:

00:01.040000-0:00:01.200000, end-of-turn: False: it's true
00:01.040000-0:00:01.280000, end-of-turn: False: it's true that
00:01.040000-0:00:01.600000, end-of-turn: False: it's true that assem
00:01.040000-0:00:01.680000, end-of-turn: False: it's true that assembly
00:01.040000-0:00:02.080000, end-of-turn: False: it's true that assembly a
00:01.040000-0:00:02.160000, end-of-turn: False: it's true that assembly ai
00:01.040000-0:00:02.320000, end-of-turn: False: it's true that assembly ai lets
00:01.040000-0:00:02.400000, end-of-turn: False: it's true that assembly ai lets you
00:01.040000-0:00:02.560000, end-of-turn: False: it's true that assembly ai lets you build

Scaling and Concurrency

AssemblyAI services support independent scaling configurations:

min_replicas: Minimum instances to maintain (0 for scale-to-zero). Recommended: 1.
max_replicas: Maximum instances during high load.
replica_concurrency: Concurrent requests per instance. Recommended: 3.
cooldown: Seconds an instance remains active after last request. Recommended: 32.
compute: Instance type. Recommended: AMPERE_A10.

Adjust these parameters based on traffic patterns and latency requirements. Best would be to consult the Rime team about concurrency and scalability For further documentation on AssemblyAI, see the AssemblyAI documentation.

Getting Started

Container Images

GPUs and Compute Resources

Scaling apps

Deployments

Endpoints

Storage

Partner Services

Other concepts

Setup

Scaling and Concurrency

Getting Started

Container Images

GPUs and Compute Resources

Scaling apps

Deployments

Endpoints

Storage

Partner Services

Other concepts

​Setup

​Scaling and Concurrency

Setup

Scaling and Concurrency