Pinboard (nharbour)
https://pinboard.in/u:nharbour/public/
Recent bookmarks from nharbour

neuralmagic/nm-vllm: A high-throughput and memory-efficient inference and serving engine for LLMs (2024-03-12)
https://github.com/neuralmagic/nm-vllm
tags: inference vllm alternative production llm deep-learning twitter tim-dettmers

Performance Tuning Tips - onnxruntime (2024-02-28)
https://pkreg101.github.io/onnxruntime/docs/performance/tips-to-tune-performance.html
tags: onnx ort onnx-runtime deep-learning omp thread threading threads numpy pytorch inference production

Pete Hunt 🚁 on X - mistral vs gpt saving money (2023-10-25)
https://twitter.com/floydophone/status/1715035183228018816
tags: llm openai gpt-3 llama mistral production serving inference twitter dagster quantisation quantization

Carton - Run any ML model from any programming language. (2023-10-02)
https://carton.run/
tags: inference compile pytorch python deep-learning production trace tracing

turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs (2023-09-13)
https://github.com/turboderp/exllamav2
tags: inference llama llama-2 deep-learning production

How is LLaMa.cpp possible? (2023-08-16)
https://finbarr.ca/how-is-llama-cpp-possible/
tags: llama-2 llama.cpp deep-learning memory llm karpathy inference

turboderp/exllama: A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. (2023-07-28)
https://github.com/turboderp/exllama
tags: llama inference deep-learning llama-2 gptq quantization quantisation

Bring Your Own Container With Amazon SageMaker | by Ram Vegiraju | Nov, 2021 | Towards Data Science (2021-11-17)
https://towardsdatascience.com/bring-your-own-container-with-amazon-sagemaker-37211d8412f4
tags: sagemaker deep-learning aws medium inference spacy docker

Using PyTorch-Neuron and the AWS Neuron Compiler - Deep Learning AMI (2021-06-15)
https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-pytorch-neuron.html
The PyTorch-Neuron compilation API provides a method to compile a model graph that you can run on an AWS Inferentia device.
tags: pytorch aws compiler inferentia ec2 deep-learning inference production

Boost inference speed of T5 models up to 5X & reduce the model size by 3X - 🤗Transformers - Hugging Face Forums (2021-03-17)
https://discuss.huggingface.co/t/boost-inference-speed-of-t5-models-up-to-5x-reduce-the-model-size-by-3x/4405
nharbourT5 models inference is naturally slow, as they undergo seq2seq decoding. To speed up the inference speed, we can convert the t5 model to onnx and run them on onnxruntime.
these are the steps to run T5 models on onnxrun…]]>t5 nlp transformers onnx production inferencehttps://pinboard.in/u:nharbour/b:339c7dce16e5/cortexlabs/cortex: Build machine learning APIs2020-08-26T06:25:27+00:00
https://github.com/cortexlabs/cortex
tags: machine-learning deep-learning api serving production inference reddit

pytorch/cpu_threading_torchscript_inference.rst at master · pytorch/pytorch (2020-07-31)
https://github.com/pytorch/pytorch/blob/master/docs/source/notes/cpu_threading_torchscript_inference.rst
tags: thread threading multithreaded threads pytorch deep-learning inference nlp

Scaling AllenNLP/PyTorch in Production - Snaptravel - Medium (2020-06-17)
https://medium.com/snaptravel/scaling-allennlp-pytorch-in-production-56746c76d710
tags: pytorch production gunicorn threading inference deep-learning allennlp

fatchord/WaveRNN: WaveRNN Vocoder + TTS (2020-05-17)
https://github.com/fatchord/WaveRNN
tags: tts text-to-speech facebook deep-learning reddit sota cpu inference

WaveRNN + TTS Outputs | model_outputs (2020-05-17)
https://fatchord.github.io/model_outputs/
tags: tts text-to-speech facebook deep-learning reddit sota cpu inference

MIL WebDNN (2020-05-06)
https://mil-tokyo.github.io/webdnn/
WebDNN is an open source software framework for fast execution of deep neural network (DNN) pre-trained models in the web browser.
tags: webdnn inference deep-learning browser

ahkarami/Deep-Learning-in-Production: In this repository, I will share some useful notes and references about deploying deep learning-based models in production. (2020-05-06)
https://github.com/ahkarami/Deep-Learning-in-Production
tags: deep-learning production pytorch inference

TorchServe and TorchElastic for Kubernetes, new PyTorch libraries for serving and training models at scale (2020-04-24)
https://medium.com/pytorch/torchserve-and-torchelastic-for-kubernetes-new-pytorch-libraries-for-serving-and-training-models-2efd12e09adc
tags: torchserve production inference medium deep-learning nlp

Deploy machine learning models in production - Cortex (2020-04-18)
https://www.cortex.dev/
tags: deep-learning production inference fargate eks k8s

Jetson Nano Developer Kit | NVIDIA Developer (2019-03-19)
https://developer.nvidia.com/embedded/buy/jetson-nano-devkit?nvid=nv-int-mn-78462
nharbour*/]]>nvidia inference hardware gpu pytorch deep-learninghttps://pinboard.in/u:nharbour/b:a89f281be715/Amazon Elastic Inference – GPU-Powered Deep Learning Inference Acceleration | AWS News Blog2018-12-11T06:55:03+00:00
https://aws.amazon.com/blogs/aws/amazon-elastic-inference-gpu-powered-deep-learning-inference-acceleration/
tags: aws inference production elastic gpu deep-learning

Deploying a Seq2Seq Model with the Hybrid Frontend — PyTorch Tutorials 1.0.0.dev20181128 documentation (2018-12-07)
https://pytorch.org/tutorials/beginner/deploy_seq2seq_hybrid_frontend_tutorial.html
tags: lstm seq2seq pytorch jit trace deep-learning production inference script

Deploying on Zeit | fast.ai course v3 (2018-11-12)
https://course-v3.fast.ai/deployment_zeit.html
tags: deploy pytorch now.sh now.js deep-learning fast.ai deployment inference

Deploy any machine learning model serverless in AWS | Ritchie Vink (2018-10-04)
https://www.ritchievink.com/blog/2018/09/16/deploy-any-machine-learning-model-serverless-in-aws/
tags: lambda aws deep-learning deploy production inference deployment model serverless

Deploying models in Production with good performance - Deep Learning - Deep Learning Course Forums (2018-09-05)
http://forums.fast.ai/t/deploying-models-in-production-with-good-performance/21410
Hey guys, after reading the forum posts on deploying models to production, I came across some different frameworks which are really helpful in deploying machine learning models. You can deploy using Spark for real tim…
tags: inference deploy deployment production deep-learning framework fastai

Clipper :: Clipper (2018-08-31)
http://clipper.ai/
tags: clipper deep-learning docker framework inference production deploy deployment hosting serving

User Guide - GraphPipe -- Dead Simple ML Model Serving via a Standard Protocol (2018-08-15)
https://oracle.github.io/graphpipe/#/guide/user-guide/overview
tags: api inference serving pytorch oracle deep-learning

Exposing DL models as api's/microservices - Part 2 - Deep Learning Course Forums (2018-05-09)
http://forums.fast.ai/t/exposing-dl-models-as-apis-microservices/13477/9
nharbourHi All,
Recently I have seen some blogposts and talks describing putting DL/ML models in production by packaging them as api’s. I would like this thread to be a resource for getting started approaches, learning resource…]]>hosting pytorch production inference api service webserver microservice flask deep-learninghttps://pinboard.in/u:nharbour/b:85d3136435b7/ysh329/deep-learning-model-convertor: The convertor/conversion of deep learning models for different deep learning frameworks/softwares.2017-09-12T21:20:30+00:00
https://github.com/ysh329/deep-learning-model-convertor
tags: ios core-ml coreml convert model keras pytorch tensorflow inference serving

Implement Bayesian inference using PHP, Part 1 (2008-03-13)
http://www.ibm.com/developerworks/web/library/wa-bayes1/
tags: bayes bayesian inference php probability statistics market-research survey code coding