
fix: code block segmentation problem of markdown document (#6465)

灰灰 9 months ago
parent
commit
5e4ac11df3
1 changed file with 8 additions and 0 deletions

+ 8 - 0
api/core/rag/extractor/markdown_extractor.py

@@ -54,8 +54,16 @@ class MarkdownExtractor(BaseExtractor):
 
         current_header = None
         current_text = ""
+        code_block_flag = False
 
         for line in lines:
+            if line.startswith("```"):
+                code_block_flag = not code_block_flag
+                current_text += line + "\n"
+                continue
+            if code_block_flag:
+                current_text += line + "\n"
+                continue
             header_match = re.match(r"^#+\s", line)
             if header_match:
                 if current_header is not None:
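
The effect of the added flag can be seen in isolation with a small standalone sketch. This is not the actual `MarkdownExtractor` code: the function name `split_markdown_by_header`, the return shape, and the sample document are assumptions made for illustration. Only the fence toggle and the `^#+\s` header check are taken from the diff above.

```python
import re

# Built programmatically so this example contains no literal fence marker.
FENCE = "`" * 3


def split_markdown_by_header(text):
    """Hypothetical helper: split markdown into (header, body) tuples,
    keeping everything inside a fenced code block as body text."""
    sections = []
    current_header = None
    current_text = ""
    code_block_flag = False

    for line in text.split("\n"):
        # A line starting with the fence marker opens or closes a code block.
        if line.startswith(FENCE):
            code_block_flag = not code_block_flag
            current_text += line + "\n"
            continue
        # Inside a fence, a leading '#' is code (e.g. a comment), not a header.
        if code_block_flag:
            current_text += line + "\n"
            continue
        if re.match(r"^#+\s", line):
            # Flush the previous section before starting a new one.
            if current_header is not None or current_text:
                sections.append((current_header, current_text))
            current_header = line
            current_text = ""
        else:
            current_text += line + "\n"

    # Flush whatever remains after the last line.
    sections.append((current_header, current_text))
    return sections


doc = "\n".join([
    "# Title",
    "intro",
    FENCE + "python",
    "# not a header",
    "print(1)",
    FENCE,
    "## Next",
    "more",
])
sections = split_markdown_by_header(doc)
```

Without the flag, the `# not a header` comment would match the header pattern and cut the code block in two. Because the check is a plain `startswith`, this sketch does not recognise indented fences or `~~~` fences, and an unclosed fence leaves the rest of the document in a single section.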