
Update PerfXCloud Model List (#7212)

Co-authored-by: xhb <466010723@qq.com>
Hongbin, 11 months ago
Parent
Commit
d1a6702aa4
18 files changed, 593 insertions and 25 deletions
  1. + 61 - 0  api/core/model_runtime/model_providers/perfxcloud/llm/Llama3-Chinese_v2.yaml
  2. + 61 - 0  api/core/model_runtime/model_providers/perfxcloud/llm/Meta-Llama-3-70B-Instruct-GPTQ-Int4.yaml
  3. + 61 - 0  api/core/model_runtime/model_providers/perfxcloud/llm/Meta-Llama-3-8B-Instruct.yaml
  4. + 61 - 0  api/core/model_runtime/model_providers/perfxcloud/llm/Meta-Llama-3.1-405B-Instruct-AWQ-INT4.yaml
  5. + 61 - 0  api/core/model_runtime/model_providers/perfxcloud/llm/Meta-Llama-3.1-8B-Instruct.yaml
  6. + 4 - 3  api/core/model_runtime/model_providers/perfxcloud/llm/Qwen-14B-Chat-Int4.yaml
  7. + 4 - 3  api/core/model_runtime/model_providers/perfxcloud/llm/Qwen1.5-110B-Chat-GPTQ-Int4.yaml
  8. + 4 - 4  api/core/model_runtime/model_providers/perfxcloud/llm/Qwen1.5-72B-Chat-GPTQ-Int4.yaml
  9. + 4 - 4  api/core/model_runtime/model_providers/perfxcloud/llm/Qwen1.5-7B.yaml
  10. + 5 - 5  api/core/model_runtime/model_providers/perfxcloud/llm/Qwen2-72B-Instruct-GPTQ-Int4.yaml
  11. + 61 - 0  api/core/model_runtime/model_providers/perfxcloud/llm/Qwen2-72B-Instruct.yaml
  12. + 4 - 4  api/core/model_runtime/model_providers/perfxcloud/llm/Qwen2-7B.yaml
  13. + 11 - 2  api/core/model_runtime/model_providers/perfxcloud/llm/_position.yaml
  14. + 61 - 0  api/core/model_runtime/model_providers/perfxcloud/llm/chatglm3-6b.yaml
  15. + 61 - 0  api/core/model_runtime/model_providers/perfxcloud/llm/deepseek-v2-chat.yaml
  16. + 61 - 0  api/core/model_runtime/model_providers/perfxcloud/llm/deepseek-v2-lite-chat.yaml
  17. + 4 - 0  api/core/model_runtime/model_providers/perfxcloud/text_embedding/BAAI-bge-large-en-v1.5.yaml
  18. + 4 - 0  api/core/model_runtime/model_providers/perfxcloud/text_embedding/BAAI-bge-large-zh-v1.5.yaml

Diff too large to display
+ 61 - 0
api/core/model_runtime/model_providers/perfxcloud/llm/Llama3-Chinese_v2.yaml


Diff too large to display
+ 61 - 0
api/core/model_runtime/model_providers/perfxcloud/llm/Meta-Llama-3-70B-Instruct-GPTQ-Int4.yaml


Diff too large to display
+ 61 - 0
api/core/model_runtime/model_providers/perfxcloud/llm/Meta-Llama-3-8B-Instruct.yaml


Diff too large to display
+ 61 - 0
api/core/model_runtime/model_providers/perfxcloud/llm/Meta-Llama-3.1-405B-Instruct-AWQ-INT4.yaml


Diff too large to display
+ 61 - 0
api/core/model_runtime/model_providers/perfxcloud/llm/Meta-Llama-3.1-8B-Instruct.yaml


+ 4 - 3
api/core/model_runtime/model_providers/perfxcloud/llm/Qwen-14B-Chat-Int4.yaml

@@ -55,7 +55,8 @@ parameter_rules:
       zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
       en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
 pricing:
-  input: '0.000'
-  output: '0.000'
-  unit: '0.000'
+  input: "0.000"
+  output: "0.000"
+  unit: "0.000"
   currency: RMB
+deprecated: true

+ 4 - 3
api/core/model_runtime/model_providers/perfxcloud/llm/Qwen1.5-110B-Chat-GPTQ-Int4.yaml

@@ -55,7 +55,8 @@ parameter_rules:
       zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
       en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
 pricing:
-  input: '0.000'
-  output: '0.000'
-  unit: '0.000'
+  input: "0.000"
+  output: "0.000"
+  unit: "0.000"
   currency: RMB
+deprecated: true

+ 4 - 4
api/core/model_runtime/model_providers/perfxcloud/llm/Qwen1.5-72B-Chat-GPTQ-Int4.yaml

@@ -6,7 +6,7 @@ features:
   - agent-thought
 model_properties:
   mode: chat
-  context_size: 8192
+  context_size: 2048
 parameter_rules:
   - name: temperature
     use_template: temperature
@@ -55,7 +55,7 @@ parameter_rules:
       zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
       en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
 pricing:
-  input: '0.000'
-  output: '0.000'
-  unit: '0.000'
+  input: "0.000"
+  output: "0.000"
+  unit: "0.000"
   currency: RMB

+ 4 - 4
api/core/model_runtime/model_providers/perfxcloud/llm/Qwen1.5-7B.yaml

@@ -6,7 +6,7 @@ features:
   - agent-thought
 model_properties:
   mode: completion
-  context_size: 8192
+  context_size: 32768
 parameter_rules:
   - name: temperature
     use_template: temperature
@@ -55,7 +55,7 @@ parameter_rules:
       zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
       en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
 pricing:
-  input: '0.000'
-  output: '0.000'
-  unit: '0.000'
+  input: "0.000"
+  output: "0.000"
+  unit: "0.000"
   currency: RMB

+ 5 - 5
api/core/model_runtime/model_providers/perfxcloud/llm/Qwen2-72B-Instruct-GPTQ-Int4.yaml

@@ -8,12 +8,12 @@ features:
   - stream-tool-call
 model_properties:
   mode: chat
-  context_size: 8192
+  context_size: 2048
 parameter_rules:
   - name: temperature
     use_template: temperature
     type: float
-    default: 0.3
+    default: 0.7
     min: 0.0
     max: 2.0
     help:
@@ -57,7 +57,7 @@ parameter_rules:
       zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
       en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
 pricing:
-  input: '0.000'
-  output: '0.000'
-  unit: '0.000'
+  input: "0.000"
+  output: "0.000"
+  unit: "0.000"
   currency: RMB

Diff too large to display
+ 61 - 0
api/core/model_runtime/model_providers/perfxcloud/llm/Qwen2-72B-Instruct.yaml


+ 4 - 4
api/core/model_runtime/model_providers/perfxcloud/llm/Qwen2-7B.yaml

@@ -8,7 +8,7 @@ features:
   - stream-tool-call
 model_properties:
   mode: completion
-  context_size: 8192
+  context_size: 32768
 parameter_rules:
   - name: temperature
     use_template: temperature
@@ -57,7 +57,7 @@ parameter_rules:
       zh_Hans: 用于控制模型生成时的重复度。提高repetition_penalty时可以降低模型生成的重复度。1.0表示不做惩罚。
       en_US: Used to control the repeatability when generating models. Increasing repetition_penalty can reduce the duplication of model generation. 1.0 means no punishment.
 pricing:
-  input: '0.000'
-  output: '0.000'
-  unit: '0.000'
+  input: "0.000"
+  output: "0.000"
+  unit: "0.000"
   currency: RMB

+ 11 - 2
api/core/model_runtime/model_providers/perfxcloud/llm/_position.yaml

@@ -1,6 +1,15 @@
+- Meta-Llama-3.1-405B-Instruct-AWQ-INT4
+- Meta-Llama-3.1-8B-Instruct
+- Meta-Llama-3-70B-Instruct-GPTQ-Int4
+- Meta-Llama-3-8B-Instruct
 - Qwen2-72B-Instruct-GPTQ-Int4
+- Qwen2-72B-Instruct
 - Qwen2-7B
-- Qwen1.5-110B-Chat-GPTQ-Int4
+- Qwen-14B-Chat-Int4
 - Qwen1.5-72B-Chat-GPTQ-Int4
 - Qwen1.5-7B
-- Qwen-14B-Chat-Int4
+- Qwen1.5-110B-Chat-GPTQ-Int4
+- deepseek-v2-chat
+- deepseek-v2-lite-chat
+- Llama3-Chinese_v2
+- chatglm3-6b

Diff too large to display
+ 61 - 0
api/core/model_runtime/model_providers/perfxcloud/llm/chatglm3-6b.yaml


Diff too large to display
+ 61 - 0
api/core/model_runtime/model_providers/perfxcloud/llm/deepseek-v2-chat.yaml


Diff too large to display
+ 61 - 0
api/core/model_runtime/model_providers/perfxcloud/llm/deepseek-v2-lite-chat.yaml
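
The 61-line model cards elided above ("Diff too large to display") presumably follow the same schema visible in the modified files: `model_properties`, `parameter_rules`, and `pricing`. A minimal sketch of what one such file likely contains, assembled from the hunks shown in this commit; the specific values (context size, defaults) are assumptions, not the committed content:

```yaml
# Hypothetical reconstruction of an elided model card, e.g. deepseek-v2-lite-chat.yaml.
# Field layout mirrors the Qwen/Llama cards modified in this commit; values are assumed.
model: deepseek-v2-lite-chat
label:
  en_US: deepseek-v2-lite-chat
model_type: llm
features:
  - agent-thought
model_properties:
  mode: chat
  context_size: 32768        # assumed; the actual value is in the elided diff
parameter_rules:
  - name: temperature
    use_template: temperature
    type: float
    default: 0.7
    min: 0.0
    max: 2.0
pricing:
  input: "0.000"
  output: "0.000"
  unit: "0.000"
  currency: RMB
```

Note that the visible hunks also normalize the `pricing` scalars from single to double quotes and, for the retired Qwen-14B/Qwen1.5-110B cards, append `deprecated: true`; the new files presumably use the double-quoted style from the start.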


+ 4 - 0
api/core/model_runtime/model_providers/perfxcloud/text_embedding/BAAI-bge-large-en-v1.5.yaml

@@ -0,0 +1,4 @@
+model: BAAI/bge-large-en-v1.5
+model_type: text-embedding
+model_properties:
+  context_size: 32768

+ 4 - 0
api/core/model_runtime/model_providers/perfxcloud/text_embedding/BAAI-bge-large-zh-v1.5.yaml

@@ -0,0 +1,4 @@
+model: BAAI/bge-large-zh-v1.5
+model_type: text-embedding
+model_properties:
+  context_size: 32768