[Doc] Add `projects` section in README which is developed based on FasterTransformer by lvhan028 · Pull Request #731 · NVIDIA/FasterTransformer

lvhan028 · 2023-07-25T04:35:34Z

Several issues (#506, #729, #727) request that FasterTransformer support Llama and Llama-2. Our project LMDeploy, which is developed based on FasterTransformer, already supports them and their derived models, such as vicuna, alpaca, and baichuan.

Meanwhile, LMDeploy has developed a continuous-batching-like feature named persistent batch, which can also address #696. It models the inference of a conversational LLM as a persistently running batch whose lifetime spans the entire serving process. To put it simply:

- The persistent batch has N pre-configured batch slots.
- Requests join the batch when free slots are available. A batch slot is released and can be reused once generation of the requested tokens has finished.
- On cache hits, history tokens don't need to be decoded again in every round of a conversation; generation of response tokens starts immediately.
- The batch grows or shrinks automatically to minimize unnecessary computation.
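
The slot behaviour described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not LMDeploy's or TurboMind's actual implementation: the names `Request`, `PersistentBatch`, `submit`, and `step` are hypothetical, token counts stand in for real decoding, and the "cache" is just a set of session ids whose history is assumed to still be resident.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    session_id: str
    history_tokens: int   # tokens from earlier rounds of the conversation
    prompt_tokens: int    # new tokens in this round
    max_new_tokens: int   # tokens to generate before the slot is released
    generated: int = 0


class PersistentBatch:
    """Toy model of a long-lived batch with a fixed number of slots (hypothetical)."""

    def __init__(self, num_slots: int):
        self.slots = [None] * num_slots   # N pre-configured batch slots
        self.waiting = deque()            # requests waiting for a free slot
        self.cached = set()               # sessions whose history is assumed cached
        self.prefilled = []               # (session_id, tokens processed on admission)

    def submit(self, req: Request) -> None:
        self.waiting.append(req)

    def _admit(self) -> None:
        # Requests join the batch whenever a slot is free.
        for i, slot in enumerate(self.slots):
            if slot is None and self.waiting:
                req = self.waiting.popleft()
                hit = req.session_id in self.cached
                # On a cache hit only the new prompt tokens are processed;
                # on a miss the whole history has to be processed as well.
                todo = req.prompt_tokens if hit else req.history_tokens + req.prompt_tokens
                self.prefilled.append((req.session_id, todo))
                self.cached.add(req.session_id)
                self.slots[i] = req

    def step(self) -> int:
        """Run one decoding iteration; return how many sequences were computed."""
        self._admit()
        active = 0
        for i, req in enumerate(self.slots):
            if req is None:
                continue                  # empty slots cost nothing in this model
            active += 1
            req.generated += 1            # stand-in for generating one token
            if req.generated >= req.max_new_tokens:
                self.slots[i] = None      # release the slot for reuse
        return active
```

With two slots and three submitted requests, the third request waits until one of the first two finishes and then reuses the released slot on the next `step()`; the value returned by `step()` is the effective batch size for that iteration, which grows and shrinks with the number of occupied slots.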

We really appreciate the FasterTransformer team for developing such an efficient and high-throughput LLM inference engine.

AnyangAngus · 2023-07-25T06:39:37Z

@lvhan028
Cool!
I see TurboMind can support llama-2-70b with GQA now.
Are there any plans for LMDeploy to support Llama-2-7b and Llama-2-13b with GQA?
Thank you!

lvhan028 · 2023-07-25T07:48:24Z

@AnyangAngus
GQA in LMDeploy/TurboMind doesn't distinguish between 7B, 13B, or 70B models.

But as far as I know, llama-2-7b/13b don't have GQA blocks.
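
To illustrate why a grouped-query attention (GQA) code path need not special-case model size, here is a minimal sketch of the head mapping. It is not TurboMind code; `kv_head_for` is a hypothetical helper, and the head counts in the usage note are assumptions taken from the published Llama-2 configurations rather than from this thread.

```python
def kv_head_for(query_head: int, num_heads: int, num_kv_heads: int) -> int:
    """Return the index of the KV head that a given query head attends with."""
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be divisible by num_kv_heads")
    if not 0 <= query_head < num_heads:
        raise ValueError("query_head out of range")
    group_size = num_heads // num_kv_heads   # query heads sharing one KV head
    return query_head // group_size
```

Assuming 64 query heads and 8 KV heads (the 70B layout), query heads 0 to 7 all map to KV head 0. When the number of KV heads equals the number of query heads, as assumed for the 7B and 13B models, the group size is 1 and the mapping is the identity, which is ordinary multi-head attention handled by the same code.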

add projects

08fcd41

lvhan028 changed the title ~~[Doc] add projects section in README which is developed based on FasterTransformer~~ [Doc] Add projects section in README which is developed based on FasterTransformer Jul 25, 2023

lvhan028 wants to merge 1 commit into NVIDIA:main from lvhan028:add-project-in-readme
