Open-source tool for running large language models such as BLOOM-176B collaboratively: you load a small part of the model, then team up with people serving the other parts to run inference or fine-tuning.
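The collaborative setup can be illustrated with a toy sketch (hypothetical names, not the tool's real API): each peer hosts a contiguous slice of the model's layers, and a client routes activations through the peers in order, so no single machine ever loads the full model.

```python
def make_layer(scale):
    """Stand-in for one transformer layer: a simple scaling function."""
    return lambda xs: [v * scale for v in xs]

# The full "model" is 6 layers; no single peer holds all of them.
all_layers = [make_layer(s) for s in (2, 3, 1, 2, 1, 3)]

class Peer:
    """Serves a contiguous block of layers, like one volunteer server."""
    def __init__(self, layers):
        self.layers = layers

    def forward(self, activations):
        for layer in self.layers:
            activations = layer(activations)
        return activations

# Three peers each load a small part of the model (2 layers each).
peers = [Peer(all_layers[i:i + 2]) for i in range(0, 6, 2)]

def distributed_inference(x):
    """Route activations through every peer in order, as a client would."""
    for peer in peers:
        x = peer.forward(x)
    return x

print(distributed_inference([1.0]))  # matches running all 6 layers locally
```

In the real system the peers are remote machines and activations travel over the network, but the pipeline structure is the same.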
Serverless GPU inference for ML models: a pay-per-millisecond API to run ML in production.