After a short break from our first triumph, the PyData Yerevan Second Monthly Meetup is approaching!
Vladimir Orshulevich, NLP research engineer at Unum, will walk us through the next chapter with his talk, “Fast inference for Language Models”.
This will be a remarkable opportunity to:
discover how to speed up inference for vanilla Hugging Face models using TensorRT, ONNX, and an NVIDIA NGC container optimized for GPU acceleration,
learn how to make NLP models lighter and faster,
explore how to choose batch size and max_length,
find out about distributed model inference on GPUs.
Mark your calendars and join the talk on the updated date: May 19 at 19:00, in room 314W PAB, at the American University of Armenia.