v1.0: release Chinese-Mixtral and Chinese-Mixtral-Instruct (#1)

* doc: init doc
* Update README.md
* doc: update gguf model perf.
* doc: finish quant perf.
* doc: update intro
* doc: add longbench info
* doc: update iq2_xs, iq2_xxs perf.
* doc: add chinese-mixtral baidu links
* doc: add ppl v.s. ctx figure
* add merge_lora script
* Update merge_mixtral_with_chinese_lora_low_mem.py clean up old naming
* doc: update template
* doc: update numbers, mixtral arch
* llamacpp: add chat script
* Update chinese-mixtral-ppl.png
* doc: add perf.
* doc: update baidu link
* doc: update gpt-4 rating
* doc: finsh perf.
* script: change default temp.
* doc: add gpt-4 score
* doc: update context desc.
* doc: init en readme
* doc: minor fixes
* update based on Codacy

---------

Co-authored-by: ymcui <[email protected]>
Co-authored-by: iMountTai <[email protected]>
ymcui · Jan 29, 2024 · f7abd09
1 parent 776e7bd
commit f7abd09

Showing 9 changed files with 582 additions and 4 deletions.
diff --git a/README.md b/README.md
diff --git a/README_EN.md b/README_EN.md
@@ -0,0 +1,15 @@
+[**🇨🇳中文**](./README.md) | [**🌐English**](./README_EN.md) | [**📖文档/Docs**](https://github.com/ymcui/Chinese-Mixtral/wiki) | [**❓提问/Issues**](https://github.com/ymcui/Chinese-Mixtral/issues) | [**💬讨论/Discussions**](https://github.com/ymcui/Chinese-Mixtral/discussions) | [**⚔️竞技场/Arena**](http://llm-arena.ymcui.com/)
+
+<p align="center">
+    <br>
+    <img src="./pics/banner.png" width="800"/>
+    <br>
+</p>
+<p align="center">
+    <img alt="GitHub" src="https://img.shields.io/github/license/ymcui/Chinese-Mixtral.svg?color=blue&style=flat-square">
+    <img alt="GitHub release (latest by date)" src="https://img.shields.io/github/v/release/ymcui/Chinese-Mixtral">
+    <img alt="GitHub top language" src="https://img.shields.io/github/languages/top/ymcui/Chinese-Mixtral">
+    <a href="https://app.codacy.com/gh/ymcui/Chinese-Mixtral/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade"><img src="https://app.codacy.com/project/badge/Grade/142d688425494644b5b156068f55370d"/></a>
+</p>
+English readme is under construction.
+
diff --git a/examples/mixtral-vs-alpaca-gpt4rating.md b/examples/mixtral-vs-alpaca-gpt4rating.md
@@ -0,0 +1,86 @@
+# GPT-4 Rating Details
+
+Note: System 1 is Chinese-Mixtral-Instruct; System 2 is Chinese-Alpaca-2-13B.
+
+### Rating Output
+
+**No.1 Prompt:**
+
+- **System 1:** 9/10. This response is informative, well-structured, and covers multiple aspects of Emperor Li Yuan's reign and his significance in Chinese history. It details his reforms, cultural contributions, and diplomatic policies, providing a comprehensive overview.
+- **System 2:** 8/10. This response is also quite good, highlighting Li Yuan's political and military prowess and his contributions to the prosperity of the Tang Dynasty. However, it's slightly less detailed compared to System 1, particularly in terms of specific reforms and cultural achievements.
+
+**No.2 Prompt:**
+- **System 1:** 7/10. The explanation is accurate but somewhat repetitive and lacks clarity in conveying the stages of black hole formation. The description of the process leading to a black hole from a neutron star is correct but could be more concise.
+- **System 2:** 8/10. This response provides a clear and succinct explanation of black hole formation, touching upon key concepts like event horizons and singularities. It also nicely includes the observational aspects of black holes, making the explanation more comprehensive.
+
+**No.3 Prompt:**
+- **System 1:** 9/10. The translation "Drink water and think of the source" is a direct and accurate reflection of the Chinese idiom "饮水思源", capturing its essence.
+- **System 2:** 4/10. The translation "Water is the source of thought" deviates significantly from the original meaning of the idiom, which emphasizes gratitude and remembering one's origins, not a literal connection between water and thought.
+
+**No.4 Prompt:**
+- **System 1:** 8/10. The response provides a thoughtful interpretation of the relationship between Jia Baoyu and Lin Daiyu in "Dream of the Red Chamber", considering it as pure friendship and love while also reflecting on societal constraints. It could, however, delve deeper into the nuances of their relationship.
+- **System 2:** 7/10. This response acknowledges the complexity and multiple interpretations of the relationship, which is a valid approach. However, it lacks a definitive stance and detailed analysis, making it seem somewhat non-committal and less informative.
+
+**No.5 Prompt:**
+- **System 1:** 8/10. This response logically deduces that Xiao Ming's pet might not be a cat based on the given premise, while also wisely noting the limitation of this deduction. It's a cautious yet accurate conclusion.
+- **System 2:** 8/10. Similar to System 1, this response correctly deduces the possibility that Xiao Ming's pet is not a cat, based on the premise. It's clear and logical, though it doesn't acknowledge the potential exceptions as System 1 does.
+
+**No.6 Prompt:**
+- **System 1:** 8/10. This response accurately describes Fermat's Last Theorem and why it was difficult to prove. It correctly mentions the involvement of higher-dimensional integer solutions and the use of complex mathematical tools, highlighting Andrew Wiles' role in its proof. However, it could provide more detail on the specific mathematical concepts involved.
+- **System 2:** 6/10. The explanation of Fermat's Last Theorem is generally correct but contains an error in stating that the theorem involves "positive integer solutions (a, b, c) where a, b, and c are distinct prime numbers." This is not part of the theorem. The response correctly identifies the complexity and the involvement of advanced mathematical concepts in the proof but inaccurately links the theorem to the Riemann Hypothesis and other unsolved problems.
+
+**No.7 Prompt:**
+- **System 1:** 9/10. This response provides a comprehensive overview of cutting-edge AI technologies. It covers a wide range of areas such as deep learning, reinforcement learning, GANs, natural language generation, image generation, TTS, machine translation, autonomous driving, AI in healthcare, and quantum computing. Each technology is briefly but effectively described.
+- **System 2:** 8/10. This response also lists several key areas of advanced AI technologies, including deep learning, reinforcement learning, transfer learning, self-supervised learning, GANs, and quantum computing. However, it is slightly less detailed compared to System 1, especially in explaining the applications and implications of these technologies.
+
+**No.8 Prompt:**
+- **System 1:** 7/10. The response identifies several key differences in dining etiquette between Chinese and Western cultures, such as seating posture, utensil usage, drinking habits, and conversation styles. However, some of the points, like the specific way of sitting or the way of drinking water, are not universally representative of either culture.
+- **System 2:** 6/10. The response correctly identifies differences in utensil usage and mealtime etiquette between Chinese and Western cultures. However, the description of utensil order in Chinese dining and the ritualistic behaviors before eating in Western culture are not entirely accurate, leading to a lower score for accuracy.
+
+**No.9 Prompt:**
+- **System 1:** 8/10. This poetic description captures the essence of spring beautifully with vivid imagery and rhythmic structure. The use of repetition in the first and last stanzas adds a nice touch, although it might be seen as slightly redundant.
+- **System 2:** 9/10. The poem effectively describes the scenery of spring with concise and evocative language, creating a vivid image of the season. It uses a variety of imagery and maintains a consistent rhyme scheme, resulting in a more impactful and cohesive poem.
+
+**No.10 Prompt:**
+- **System 1:** 8/10. This response provides a clear and concise explanation of utilitarianism and its application in modern society. It appropriately discusses its influence on policy-making, business decisions, and moral judgments. However, the explanation could be enhanced with more examples or a deeper exploration of the challenges in applying utilitarian principles.
+- **System 2:** 7/10. The response accurately defines utilitarianism and discusses its applications in government decision-making, business, and personal behavior. However, it lacks depth in exploring the nuances and potential criticisms of utilitarianism, particularly in its application in complex real-world scenarios.
+
+**No.11 Prompt:**
+- **System 1:** 9/10. This explanation of the chi-square test for independence is clear, accurate, and well-structured. It correctly outlines the steps for performing the test, from data collection to result interpretation. The caution about the test's limitations with small sample sizes and its applicability only to categorical variables is a valuable addition.
+- **System 2:** 8/10. The response correctly describes the chi-square test, including determining categories, constructing expected frequency matrices, and calculating the chi-square statistic. However, it slightly misrepresents the need for normal distribution conformity, as the chi-square test applies to categorical data, not necessarily normal distributions. Also, the explanation is a bit less detailed than System 1 in terms of the test's application and interpretation steps.
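
The steps named in the two ratings above (expected frequencies, then the chi-square statistic) can be sketched in a few lines of Python. This is not either system's answer, which the file does not reproduce. The function name `chi_square_independence` and the 2x2 counts are hypothetical, and the sketch stops at the statistic and degrees of freedom, leaving the p-value to a table or a statistics library.

```python
def chi_square_independence(observed):
    """Return (chi2, dof) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Hypothetical counts: two groups (rows) by two outcomes (columns).
table = [[10, 20], [20, 10]]
chi2, dof = chi_square_independence(table)
print(chi2, dof)
```

Every expected count here is 15, so the statistic is 4 × 25/15 ≈ 6.67 with one degree of freedom. The inputs are categorical counts, which matches the rating's point that normality is not the relevant assumption.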
+
+**No.12 Prompt:**
+- **System 1:** 9/10. The provided Python function correctly implements the task of summing even numbers in a list. The example and output demonstrate the function's correct usage and results. The explanation is clear and the code is well-formatted, making it easy to understand and use.
+- **System 2:** 9/10. This response also correctly provides a Python function to sum even numbers in a list. The function is accurately implemented, and the included example demonstrates its correct application. The explanation is succinct and to the point, effectively communicating the function's purpose and usage.
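
Neither system's code appears in this file, so the following is only a minimal sketch of the kind of function the ratings describe. The name `sum_even_numbers` and the sample list are assumptions made for illustration.

```python
def sum_even_numbers(numbers):
    """Return the sum of the even integers in `numbers`."""
    return sum(n for n in numbers if n % 2 == 0)


print(sum_even_numbers([1, 2, 3, 4, 5, 6]))  # 2 + 4 + 6 = 12
```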
+
+**No.13 Prompt:**
+- **System 1:** 9/10. The response correctly applies Newton's Second Law of Motion to calculate the acceleration of an object given its mass and the force applied. The explanation is clear, and the step-by-step calculation is accurate, leading to the correct answer of 4 m/s² for the object's acceleration.
+- **System 2:** 9/10. This response also correctly calculates the acceleration of an object using Newton's Second Law, providing a straightforward and accurate explanation. The use of the formula and the calculation steps are clear and concise, leading to the correct result of 4 m/s² for acceleration.
+
+**No.14 Prompt:**
+- **System 1:** 9/10. The response accurately balances the chemical equation for the combustion of propane (C3H8). It correctly identifies the need to balance each element (carbon, hydrogen, and oxygen) and provides a clear, step-by-step process for doing so. The final balanced equation is correct and well-explained.
+- **System 2:** 5/10. This response provides an incorrect approach to balancing the chemical equation. The initial assessment of the atomic composition and the subsequent calculations are flawed, leading to an incorrect balanced equation. The response fails to accurately apply the principles of chemical balancing and stoichiometry.
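
The balanced equation itself is not quoted in the ratings. Assuming complete combustion to CO2 and H2O, the standard result is C3H8 + 5 O2 -> 3 CO2 + 4 H2O. The sketch below checks that candidate by counting atoms on each side. The helper `atom_totals` and the data layout are hypothetical.

```python
from collections import Counter


def atom_totals(side):
    """Sum atom counts over (coefficient, {element: count}) pairs."""
    totals = Counter()
    for coefficient, atoms in side:
        for element, count in atoms.items():
            totals[element] += coefficient * count
    return totals


# Candidate equation: C3H8 + 5 O2 -> 3 CO2 + 4 H2O
reactants = [(1, {"C": 3, "H": 8}), (5, {"O": 2})]
products = [(3, {"C": 1, "O": 2}), (4, {"H": 2, "O": 1})]

balanced = atom_totals(reactants) == atom_totals(products)
print(balanced)
```

Both sides carry 3 carbon, 8 hydrogen, and 10 oxygen atoms, so the check passes.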
+
+**No.15 Prompt:**
+- **System 1:** 6/10. This explanation of the Internal Rate of Return (IRR) has some inaccuracies. The method of calculating IRR using different discount rates and aiming for a net present value of zero is correct. However, the example provided is misleading. The example calculation seems to misunderstand the concept of discounting cash flows and incorrectly applies percentages to future cash flows. The IRR is typically found using financial calculators or software as it involves solving a polynomial equation.
+- **System 2:** 7/10. The description of IRR is generally correct, emphasizing its role in equating the net present value to zero. The basic concepts are well explained, including cash flow forecasts, discount rates, and NPV. However, the example provided is not entirely clear in demonstrating how to calculate the IRR. It simplifies the calculation process and does not accurately reflect the complexity involved in finding the IRR for a given set of cash flows.
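
Both ratings agree that the IRR is the discount rate at which the net present value is zero, and that in practice it is found numerically. A minimal sketch of that idea, using bisection, is shown below. The function names, the search interval, and the cash flows are assumptions for illustration, and the method only works when NPV changes sign exactly once on the interval.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[t] is received at the end of period t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=-0.99, high=1.0, tol=1e-9):
    """Find a rate in [low, high] where NPV is zero, by bisection."""
    f_low = npv(low, cash_flows)
    if f_low * npv(high, cash_flows) > 0:
        raise ValueError("NPV does not change sign on [low, high]")
    while high - low > tol:
        mid = (low + high) / 2
        f_mid = npv(mid, cash_flows)
        if f_low * f_mid <= 0:
            high = mid  # root lies in [low, mid]
        else:
            low, f_low = mid, f_mid  # root lies in [mid, high]
    return (low + high) / 2


# Hypothetical project: pay 1000 now, receive 1100 after one period.
rate = irr([-1000, 1100])
print(round(rate, 6))
```

For this one-period example the answer can be checked by hand: -1000 + 1100 / (1 + r) = 0 gives r = 0.10.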
+
+**No.16 Prompt:**
+- **System 1:** 8/10. The provided Caesar cipher text and its decryption method are correct. The decryption method described is accurate for a Caesar cipher with a shift of 3, and the explanation is clear and straightforward. However, the additional step of converting uppercase to lowercase letters is unnecessary for the decryption process itself and might lead to confusion, as the Caesar cipher traditionally does not distinguish between letter cases.
+- **System 2:** 8/10. This response also correctly explains the Caesar cipher and its decryption method. The given example demonstrates how to decrypt a text encrypted with a Caesar cipher by shifting each letter by the predetermined number of positions. The explanation is clear and directly applicable to the Caesar cipher. However, the explanation of mapping "khoor" to "e" then "h" is slightly confusing, as "k" directly maps to "h" with a backward shift of 3 positions.
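
The decryption both ratings refer to is a backward shift of 3, under which "k" maps directly to "h". A minimal sketch follows, with hypothetical function names. It preserves letter case rather than converting to lowercase, in line with the remark about System 1.

```python
def caesar_shift(text, shift):
    """Shift ASCII letters by `shift` positions, preserving case."""
    result = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            result.append(ch)  # leave digits, spaces, punctuation unchanged
    return "".join(result)


def caesar_decrypt(ciphertext, shift=3):
    """Undo a Caesar cipher that shifted letters forward by `shift`."""
    return caesar_shift(ciphertext, -shift)


print(caesar_decrypt("khoor"))  # hello
```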
+
+**No.17 Prompt:**
+- **System 1:** 7/10. The response provides an accurate explanation of the Fibonacci sequence and an appropriate algorithm to generate the first 10 numbers. The iterative approach used in the provided Python code is efficient and clear. However, the algorithm could be optimized to avoid recalculating the sequence for each number up to 10.
+- **System 2:** 5/10. The explanation of the Fibonacci sequence is correct, but the provided algorithm uses a recursive approach, which is highly inefficient for calculating the first 10 Fibonacci numbers. Recursion in this case leads to many redundant calculations, making it less suitable for larger numbers. The approach lacks optimization and is not practical for generating a simple sequence of Fibonacci numbers.
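
The contrast the ratings draw is between an iterative loop and naive recursion. A minimal iterative sketch is shown below. It is not the code System 1 produced, and starting the sequence at 0, 1 is an assumption, since some texts start at 1, 1.

```python
def first_fibonacci(n):
    """Return the first n Fibonacci numbers, starting from 0 and 1."""
    sequence = []
    a, b = 0, 1
    for _ in range(n):
        sequence.append(a)
        a, b = b, a + b  # each step reuses the previous two values
    return sequence


print(first_fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Each number is computed once, so the loop does n additions, whereas naive recursion recomputes the same subproblems repeatedly.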
+
+**No.18 Prompt:**
+- **System 1:** 9/10. This response correctly outlines the use of the Haversine formula to calculate the great-circle distance between two points given their latitude and longitude. The provided Python code accurately implements the formula and includes the necessary mathematical functions. The explanation is clear, and the step-by-step process is well presented. The only minor issue is the lack of an explicit mention that the Haversine formula assumes a spherical Earth, which can introduce small errors for precise distance measurements.
+- **System 2:** 3/10. The formula provided by this system for calculating the great-circle distance between two points based on latitude and longitude is incorrect. The formula `distance = 2 * asin(sqrt((x1 - x2)**2 + (y1 - y2)**2))` does not correctly apply to spherical geometry and does not account for the Earth's curvature. This response fails to accurately represent the method for calculating the great-circle distance.
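
For reference, a minimal sketch of the Haversine formula that the rating for System 1 mentions is given below. It is not System 1's code. The function name and the mean Earth radius of 6371 km are assumptions, and, as the rating notes, the formula treats the Earth as a sphere.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# A quarter of the equator: from (0, 0) to (0, 90).
print(haversine_km(0.0, 0.0, 0.0, 90.0))
```

Unlike the expression quoted for System 2, the coordinates are converted to radians and the longitude term is weighted by the cosines of both latitudes.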
+
+**No.19 Prompt:**
+- **System 1:** 9/10. This response accurately explains Ohm's Law, including the relationship between voltage (V), current (I), and resistance (R). The example provided is practical and effectively demonstrates how to apply Ohm's Law to calculate the current in a circuit. The explanation is clear and accurate, making it easy to understand how to use Ohm's Law in practical scenarios.
+- **System 2:** 9/10. Similar to System 1, this response also correctly explains Ohm's Law and its application in calculating current in a circuit. The given example is relevant and demonstrates the practical application of the law. The explanation is well-structured and clear, making the concept accessible for those unfamiliar with electrical principles.
+
+**No.20 Prompt:**
+- **System 1:** 8/10. This explanation of the t-test for comparing the means of two independent samples is thorough and accurate. It correctly outlines the steps involved in conducting a t-test, including hypothesis setup, calculation of the t-statistic, and interpretation of results. The explanation of when to use the t-test is also appropriate, highlighting scenarios where it is applicable. However, a more detailed explanation of the assumptions underlying the t-test, such as the assumption of normal distribution and equal variances, could enhance the response.
+- **System 2:** 8/10. This response also provides a correct and comprehensive explanation of the t-test, including the steps involved in performing the test and interpreting its results. The formula for the t-statistic is accurately presented, and the explanation of how to use the t-test is clear. Similar to System 1, it could benefit from a more detailed discussion on the assumptions of the t-test, such as normal distribution and homogeneity of variances, which are crucial for the test's validity.
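
Neither system's formula is quoted in this file. The sketch below assumes the classic two-sample t-test with a pooled variance, which is the variant that relies on the equal-variance assumption both ratings mention. The function name and the sample data are hypothetical, and the p-value lookup is left out.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, dof) for two independent samples, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)

    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, dof


# Hypothetical samples whose means differ by one.
t, dof = pooled_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, dof)
```

Here both sample variances are 2.5, the standard error is 1, and the statistic is -1 with 8 degrees of freedom. When the variances clearly differ, Welch's version of the test is the usual alternative.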