0Strelitzia2 committed
Commit 23cf7a0 · verified · 1 parent: c46c828

Upload folder using huggingface_hub

Files changed (49)
  1. README.md +202 -0
  2. adapter_config.json +34 -0
  3. adapter_model.safetensors +3 -0
  4. checkpoint-225/README.md +202 -0
  5. checkpoint-225/adapter_config.json +34 -0
  6. checkpoint-225/adapter_model.safetensors +3 -0
  7. checkpoint-225/optimizer.pt +3 -0
  8. checkpoint-225/rng_state.pth +3 -0
  9. checkpoint-225/scheduler.pt +3 -0
  10. checkpoint-225/trainer_state.json +420 -0
  11. checkpoint-225/training_args.bin +3 -0
  12. checkpoint-400/README.md +202 -0
  13. checkpoint-400/adapter_config.json +34 -0
  14. checkpoint-400/adapter_model.safetensors +3 -0
  15. checkpoint-400/optimizer.pt +3 -0
  16. checkpoint-400/rng_state.pth +3 -0
  17. checkpoint-400/scheduler.pt +3 -0
  18. checkpoint-400/trainer_state.json +721 -0
  19. checkpoint-400/training_args.bin +3 -0
  20. checkpoint-425/README.md +202 -0
  21. checkpoint-425/adapter_config.json +34 -0
  22. checkpoint-425/adapter_model.safetensors +3 -0
  23. checkpoint-425/optimizer.pt +3 -0
  24. checkpoint-425/rng_state.pth +3 -0
  25. checkpoint-425/scheduler.pt +3 -0
  26. checkpoint-425/trainer_state.json +764 -0
  27. checkpoint-425/training_args.bin +3 -0
  28. checkpoint-450/README.md +202 -0
  29. checkpoint-450/adapter_config.json +34 -0
  30. checkpoint-450/adapter_model.safetensors +3 -0
  31. checkpoint-450/optimizer.pt +3 -0
  32. checkpoint-450/rng_state.pth +3 -0
  33. checkpoint-450/scheduler.pt +3 -0
  34. checkpoint-450/trainer_state.json +807 -0
  35. checkpoint-450/training_args.bin +3 -0
  36. checkpoint-464/README.md +202 -0
  37. checkpoint-464/adapter_config.json +34 -0
  38. checkpoint-464/adapter_model.safetensors +3 -0
  39. checkpoint-464/optimizer.pt +3 -0
  40. checkpoint-464/rng_state.pth +3 -0
  41. checkpoint-464/scheduler.pt +3 -0
  42. checkpoint-464/trainer_state.json +821 -0
  43. checkpoint-464/training_args.bin +3 -0
  44. runs/Apr16_18-22-52_zhangshenyi2/events.out.tfevents.1744827774.zhangshenyi2.2126728.0 +3 -0
  45. runs/Apr17_01-30-13_zhangshenyi2/events.out.tfevents.1744853415.zhangshenyi2.62022.0 +3 -0
  46. runs/Apr17_01-35-14_zhangshenyi2/events.out.tfevents.1744853715.zhangshenyi2.62985.0 +3 -0
  47. runs/Apr17_01-44-24_zhangshenyi2/events.out.tfevents.1744854266.zhangshenyi2.64102.0 +3 -0
  48. runs/Apr17_01-46-00_zhangshenyi2/events.out.tfevents.1744854362.zhangshenyi2.66064.0 +3 -0
  49. runs/Apr17_01-51-30_zhangshenyi2/events.out.tfevents.1744854691.zhangshenyi2.66986.0 +3 -0
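
The commit message above ("Upload folder using huggingface_hub") indicates this whole tree was pushed with the `huggingface_hub` folder-upload API. A minimal, hedged sketch of how such a commit is typically produced; the local folder path and repo id below are hypothetical placeholders, not values taken from this repository:

```python
# Hedged sketch: producing an "Upload folder using huggingface_hub" commit.
# folder_path and repo_id are placeholders -- substitute your own values.
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="/path/to/lora_output_dir",       # local training output directory (placeholder)
    repo_id="0Strelitzia2/example-adapter-repo",  # hypothetical target repository
    commit_message="Upload folder using huggingface_hub",
)
```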
README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
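
Since the card itself is still a template stub, here is a minimal, hedged sketch of loading this LoRA adapter on top of the DeepSeek-R1-Distill-Llama-8B base model with `transformers` and `peft`. The adapter repo id is a placeholder, and the summarization prompt merely reflects the `news-summarizer` name that appears in the checkpoint paths below:

```python
# Hedged sketch: attach this LoRA adapter to the base model for inference.
# adapter_id is a placeholder -- substitute the actual adapter repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
adapter_id = "0Strelitzia2/example-news-summarizer-lora"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # apply the LoRA weights

prompt = "Summarize the following news article:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```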
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.15.1
adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "/mnt/data/MODEL/deepseek/DeepSeek-R1-Distill-Llama-8B",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": "gaussian",
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "v_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_rslora": false
+ }
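
For reference, the stored JSON corresponds to a PEFT `LoraConfig` along these lines; this is a sketch mirroring the values above, not the author's actual training script:

```python
# Hedged sketch: a LoraConfig matching the values in adapter_config.json.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # LoRA rank
    lora_alpha=16,                        # scaling factor (alpha / r = 2)
    lora_dropout=0.05,
    init_lora_weights="gaussian",
    target_modules=["q_proj", "v_proj"],  # attention query/value projections only
    bias="none",
    task_type="CAUSAL_LM",
)
# peft_model = get_peft_model(base_model, lora_config)  # wrap base model for training
```

Targeting only `q_proj` and `v_proj` at rank 8 keeps the adapter small, which is consistent with the ~13.6 MB `adapter_model.safetensors` recorded below.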
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:044a2455ec846f8369516d33e1577c0c4df34b174fd4bf70dd9b606f62335d55
+ size 13648432
checkpoint-225/README.md ADDED
@@ -0,0 +1,202 @@
+ (Same auto-generated model-card template as the root README.md above, except `base_model` is the local path `/mnt/data/MODEL/deepseek/DeepSeek-R1-Distill-Llama-8B`.)
checkpoint-225/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ (Identical to the root adapter_config.json above.)
checkpoint-225/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:044a2455ec846f8369516d33e1577c0c4df34b174fd4bf70dd9b606f62335d55
+ size 13648432
checkpoint-225/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c557b8c9a187d4fb4a828475594be5d4a4c2514265a2fc63ad1aaa28df036056
+ size 27370618
checkpoint-225/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0fa11fcbb8879930035bf3a359bee7c7e0ca3f75560dd599127db8adf8b4fc46
+ size 14244
checkpoint-225/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b0be1e22b702d2c7cac531853da4319297d984a7c4b0d731de88c8a95ed721ba
+ size 1064
checkpoint-225/trainer_state.json ADDED
@@ -0,0 +1,420 @@
+ {
+ "best_metric": 2.3685686588287354,
+ "best_model_checkpoint": "/mnt/data/computer_design/lora_checkpoints/DeepSeek-R1-Distill-Llama-8B__news-summarizer-noreason__ral_8_16_0.0003_8/checkpoint-225",
+ "epoch": 3.84,
+ "eval_steps": 25,
+ "global_step": 225,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.08533333333333333,
+ "grad_norm": 5.31721305847168,
+ "learning_rate": 0.00015,
+ "loss": 3.5245,
+ "step": 5
+ },
+ {
+ "epoch": 0.17066666666666666,
+ "grad_norm": 2.912184476852417,
+ "learning_rate": 0.0003,
+ "loss": 2.9075,
+ "step": 10
+ },
+ {
+ "epoch": 0.256,
+ "grad_norm": 1.5293394327163696,
+ "learning_rate": 0.00029669603524229074,
+ "loss": 2.6626,
+ "step": 15
+ },
+ {
+ "epoch": 0.3413333333333333,
+ "grad_norm": 1.3206167221069336,
+ "learning_rate": 0.0002933920704845815,
+ "loss": 2.5962,
+ "step": 20
+ },
+ {
+ "epoch": 0.4266666666666667,
+ "grad_norm": 1.2503535747528076,
+ "learning_rate": 0.0002900881057268722,
+ "loss": 2.5323,
+ "step": 25
+ },
+ {
+ "epoch": 0.4266666666666667,
+ "eval_loss": 2.5280094146728516,
+ "eval_runtime": 50.7358,
+ "eval_samples_per_second": 9.855,
+ "eval_steps_per_second": 1.242,
+ "step": 25
+ },
+ {
+ "epoch": 0.512,
+ "grad_norm": 1.1334757804870605,
+ "learning_rate": 0.000286784140969163,
+ "loss": 2.5021,
+ "step": 30
+ },
+ {
+ "epoch": 0.5973333333333334,
+ "grad_norm": 1.1564419269561768,
+ "learning_rate": 0.0002834801762114537,
+ "loss": 2.4867,
+ "step": 35
+ },
+ {
+ "epoch": 0.6826666666666666,
+ "grad_norm": 1.0658727884292603,
+ "learning_rate": 0.00028017621145374447,
+ "loss": 2.4791,
+ "step": 40
+ },
+ {
+ "epoch": 0.768,
+ "grad_norm": 1.071118950843811,
+ "learning_rate": 0.0002768722466960352,
+ "loss": 2.4294,
+ "step": 45
+ },
+ {
+ "epoch": 0.8533333333333334,
+ "grad_norm": 1.1074410676956177,
+ "learning_rate": 0.00027356828193832595,
+ "loss": 2.4526,
+ "step": 50
+ },
+ {
+ "epoch": 0.8533333333333334,
+ "eval_loss": 2.43389892578125,
+ "eval_runtime": 50.8571,
+ "eval_samples_per_second": 9.831,
+ "eval_steps_per_second": 1.239,
+ "step": 50
+ },
+ {
+ "epoch": 0.9386666666666666,
+ "grad_norm": 1.0138615369796753,
+ "learning_rate": 0.0002702643171806167,
+ "loss": 2.4296,
+ "step": 55
+ },
+ {
+ "epoch": 1.024,
+ "grad_norm": 0.9914734959602356,
+ "learning_rate": 0.0002669603524229075,
+ "loss": 2.3865,
+ "step": 60
+ },
+ {
+ "epoch": 1.1093333333333333,
+ "grad_norm": 0.9485092759132385,
+ "learning_rate": 0.0002636563876651982,
+ "loss": 2.3592,
+ "step": 65
+ },
+ {
+ "epoch": 1.1946666666666665,
+ "grad_norm": 1.032936692237854,
+ "learning_rate": 0.00026035242290748897,
+ "loss": 2.3762,
+ "step": 70
+ },
+ {
+ "epoch": 1.28,
+ "grad_norm": 0.978344738483429,
+ "learning_rate": 0.00025704845814977973,
+ "loss": 2.3715,
+ "step": 75
+ },
+ {
+ "epoch": 1.28,
+ "eval_loss": 2.4061355590820312,
+ "eval_runtime": 53.5326,
+ "eval_samples_per_second": 9.34,
+ "eval_steps_per_second": 1.177,
+ "step": 75
+ },
+ {
+ "epoch": 1.3653333333333333,
+ "grad_norm": 1.0429058074951172,
+ "learning_rate": 0.00025374449339207045,
+ "loss": 2.3519,
+ "step": 80
+ },
+ {
+ "epoch": 1.4506666666666668,
+ "grad_norm": 1.0109028816223145,
+ "learning_rate": 0.0002504405286343612,
+ "loss": 2.3698,
+ "step": 85
+ },
+ {
+ "epoch": 1.536,
+ "grad_norm": 1.0443379878997803,
+ "learning_rate": 0.00024713656387665193,
+ "loss": 2.3652,
+ "step": 90
+ },
+ {
+ "epoch": 1.6213333333333333,
+ "grad_norm": 0.9977161884307861,
+ "learning_rate": 0.0002438325991189427,
+ "loss": 2.3799,
+ "step": 95
+ },
+ {
+ "epoch": 1.7066666666666666,
+ "grad_norm": 0.9699886441230774,
+ "learning_rate": 0.00024052863436123346,
+ "loss": 2.3437,
+ "step": 100
+ },
+ {
+ "epoch": 1.7066666666666666,
+ "eval_loss": 2.392068386077881,
+ "eval_runtime": 50.7099,
+ "eval_samples_per_second": 9.86,
+ "eval_steps_per_second": 1.242,
+ "step": 100
+ },
+ {
+ "epoch": 1.792,
+ "grad_norm": 1.0068645477294922,
+ "learning_rate": 0.00023722466960352423,
+ "loss": 2.3396,
+ "step": 105
+ },
+ {
+ "epoch": 1.8773333333333333,
+ "grad_norm": 0.9659402966499329,
+ "learning_rate": 0.00023392070484581494,
+ "loss": 2.3443,
+ "step": 110
+ },
+ {
+ "epoch": 1.9626666666666668,
+ "grad_norm": 0.9416194558143616,
+ "learning_rate": 0.0002306167400881057,
+ "loss": 2.3458,
+ "step": 115
+ },
+ {
+ "epoch": 2.048,
+ "grad_norm": 0.9857434630393982,
+ "learning_rate": 0.00022731277533039645,
+ "loss": 2.2908,
+ "step": 120
+ },
+ {
+ "epoch": 2.1333333333333333,
+ "grad_norm": 0.9868885278701782,
+ "learning_rate": 0.00022400881057268722,
+ "loss": 2.2919,
+ "step": 125
+ },
+ {
+ "epoch": 2.1333333333333333,
+ "eval_loss": 2.3825137615203857,
+ "eval_runtime": 50.7136,
+ "eval_samples_per_second": 9.859,
+ "eval_steps_per_second": 1.242,
+ "step": 125
+ },
+ {
+ "epoch": 2.2186666666666666,
+ "grad_norm": 0.9239076972007751,
+ "learning_rate": 0.00022070484581497796,
+ "loss": 2.3055,
+ "step": 130
+ },
+ {
+ "epoch": 2.304,
+ "grad_norm": 0.9522895216941833,
+ "learning_rate": 0.0002174008810572687,
+ "loss": 2.319,
+ "step": 135
+ },
+ {
+ "epoch": 2.389333333333333,
+ "grad_norm": 0.989910900592804,
+ "learning_rate": 0.00021409691629955944,
+ "loss": 2.2679,
+ "step": 140
+ },
+ {
+ "epoch": 2.474666666666667,
+ "grad_norm": 1.0279978513717651,
+ "learning_rate": 0.0002107929515418502,
+ "loss": 2.309,
+ "step": 145
+ },
+ {
+ "epoch": 2.56,
+ "grad_norm": 0.9677265286445618,
+ "learning_rate": 0.00020748898678414097,
+ "loss": 2.2834,
+ "step": 150
+ },
+ {
+ "epoch": 2.56,
+ "eval_loss": 2.3774726390838623,
+ "eval_runtime": 50.6978,
+ "eval_samples_per_second": 9.862,
+ "eval_steps_per_second": 1.243,
+ "step": 150
+ },
+ {
+ "epoch": 2.6453333333333333,
+ "grad_norm": 0.9602519869804382,
+ "learning_rate": 0.0002041850220264317,
+ "loss": 2.3044,
+ "step": 155
+ },
+ {
+ "epoch": 2.7306666666666666,
+ "grad_norm": 0.9305415153503418,
+ "learning_rate": 0.00020088105726872246,
+ "loss": 2.2996,
+ "step": 160
+ },
+ {
+ "epoch": 2.816,
+ "grad_norm": 0.9666855931282043,
+ "learning_rate": 0.0001975770925110132,
+ "loss": 2.2807,
+ "step": 165
+ },
+ {
+ "epoch": 2.9013333333333335,
+ "grad_norm": 1.0196256637573242,
+ "learning_rate": 0.00019427312775330396,
+ "loss": 2.2948,
+ "step": 170
+ },
+ {
+ "epoch": 2.986666666666667,
+ "grad_norm": 0.9804545044898987,
+ "learning_rate": 0.00019096916299559468,
+ "loss": 2.3319,
+ "step": 175
+ },
+ {
+ "epoch": 2.986666666666667,
+ "eval_loss": 2.3707964420318604,
+ "eval_runtime": 50.7119,
+ "eval_samples_per_second": 9.86,
+ "eval_steps_per_second": 1.242,
+ "step": 175
+ },
+ {
+ "epoch": 3.072,
+ "grad_norm": 0.9667485356330872,
+ "learning_rate": 0.00018766519823788544,
+ "loss": 2.2698,
+ "step": 180
+ },
+ {
+ "epoch": 3.1573333333333333,
+ "grad_norm": 0.9869656562805176,
+ "learning_rate": 0.00018436123348017618,
+ "loss": 2.2443,
+ "step": 185
+ },
+ {
+ "epoch": 3.2426666666666666,
+ "grad_norm": 0.9679750204086304,
+ "learning_rate": 0.00018105726872246695,
+ "loss": 2.2636,
+ "step": 190
+ },
+ {
+ "epoch": 3.328,
+ "grad_norm": 0.9996704459190369,
+ "learning_rate": 0.00017775330396475772,
+ "loss": 2.2536,
+ "step": 195
+ },
+ {
+ "epoch": 3.413333333333333,
+ "grad_norm": 0.9564487338066101,
+ "learning_rate": 0.00017444933920704843,
+ "loss": 2.2404,
+ "step": 200
+ },
+ {
+ "epoch": 3.413333333333333,
+ "eval_loss": 2.3742570877075195,
+ "eval_runtime": 50.7116,
+ "eval_samples_per_second": 9.86,
+ "eval_steps_per_second": 1.242,
+ "step": 200
+ },
+ {
+ "epoch": 3.498666666666667,
+ "grad_norm": 1.0057621002197266,
+ "learning_rate": 0.0001711453744493392,
+ "loss": 2.2888,
+ "step": 205
+ },
+ {
+ "epoch": 3.584,
+ "grad_norm": 1.0336802005767822,
+ "learning_rate": 0.00016784140969162994,
+ "loss": 2.2946,
+ "step": 210
+ },
+ {
+ "epoch": 3.6693333333333333,
+ "grad_norm": 1.0010818243026733,
+ "learning_rate": 0.0001645374449339207,
+ "loss": 2.2531,
+ "step": 215
+ },
+ {
+ "epoch": 3.7546666666666666,
+ "grad_norm": 0.9891405701637268,
+ "learning_rate": 0.00016123348017621142,
+ "loss": 2.2336,
+ "step": 220
+ },
+ {
+ "epoch": 3.84,
+ "grad_norm": 0.9514368176460266,
+ "learning_rate": 0.0001579295154185022,
+ "loss": 2.238,
+ "step": 225
+ },
+ {
+ "epoch": 3.84,
+ "eval_loss": 2.3685686588287354,
+ "eval_runtime": 50.7197,
+ "eval_samples_per_second": 9.858,
+ "eval_steps_per_second": 1.242,
+ "step": 225
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 464,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 8,
+ "save_steps": 25,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 3.321446050824192e+17,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": null
+ }
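
The state above records a best eval loss of about 2.3686 at step 225 (epoch 3.84); the later checkpoints in this commit show evaluation loss plateauing slightly above that value while training loss keeps falling, which is why checkpoint-225 is flagged as `best_model_checkpoint`. A small, hedged sketch for inspecting such a file offline (assumes a local copy of the JSON):

```python
# Hedged sketch: read a Hugging Face Trainer state file and trace eval loss.
import json

with open("checkpoint-225/trainer_state.json") as f:
    state = json.load(f)

print("best eval loss: ", state["best_metric"])
print("best checkpoint:", state["best_model_checkpoint"])

# Print only the evaluation entries from the interleaved train/eval log.
for entry in state["log_history"]:
    if "eval_loss" in entry:
        print(f"step {entry['step']:>4}  eval_loss {entry['eval_loss']:.4f}")
```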
checkpoint-225/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e5e28485a7b3a1a3db706bc20a8e6c9dd73d2112d08db67065b84e6004e139f
+ size 5432
checkpoint-400/README.md ADDED
@@ -0,0 +1,202 @@
+ (Same model-card template as checkpoint-225/README.md above.)
checkpoint-400/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ (Identical to the root adapter_config.json above.)
checkpoint-400/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b7a5bbab0a581d13e2801ec7647eb08194a722978353977a84f2bd07fa75e80a
+ size 13648432
checkpoint-400/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:26224ed20060818d4896b6a211c4edd4bdb50861cc4af2e1bcd66c5a008b8e30
+ size 27370618
checkpoint-400/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:26a13de9068ea4a3014ef1f284200ad07dbd62251799cbcf83a15de395da3f90
+ size 14244
checkpoint-400/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2b1ede1762adf36f916a660cffac568622479412abbf183837388077f5bab8a8
+ size 1064
checkpoint-400/trainer_state.json ADDED
@@ -0,0 +1,721 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "best_metric": 2.3685686588287354,
3
+ "best_model_checkpoint": "/mnt/data/computer_design/lora_checkpoints/DeepSeek-R1-Distill-Llama-8B__news-summarizer-noreason__ral_8_16_0.0003_8/checkpoint-225",
4
+ "epoch": 6.826666666666666,
5
+ "eval_steps": 25,
6
+ "global_step": 400,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.08533333333333333,
13
+ "grad_norm": 5.31721305847168,
14
+ "learning_rate": 0.00015,
15
+ "loss": 3.5245,
16
+ "step": 5
17
+ },
18
+ {
19
+ "epoch": 0.17066666666666666,
20
+ "grad_norm": 2.912184476852417,
21
+ "learning_rate": 0.0003,
22
+ "loss": 2.9075,
23
+ "step": 10
24
+ },
25
+ {
26
+ "epoch": 0.256,
27
+ "grad_norm": 1.5293394327163696,
28
+ "learning_rate": 0.00029669603524229074,
29
+ "loss": 2.6626,
30
+ "step": 15
31
+ },
32
+ {
33
+ "epoch": 0.3413333333333333,
34
+ "grad_norm": 1.3206167221069336,
35
+ "learning_rate": 0.0002933920704845815,
36
+ "loss": 2.5962,
37
+ "step": 20
38
+ },
39
+ {
40
+ "epoch": 0.4266666666666667,
41
+ "grad_norm": 1.2503535747528076,
42
+ "learning_rate": 0.0002900881057268722,
43
+ "loss": 2.5323,
44
+ "step": 25
45
+ },
46
+ {
47
+ "epoch": 0.4266666666666667,
48
+ "eval_loss": 2.5280094146728516,
49
+ "eval_runtime": 50.7358,
50
+ "eval_samples_per_second": 9.855,
51
+ "eval_steps_per_second": 1.242,
52
+ "step": 25
53
+ },
54
+ {
55
+ "epoch": 0.512,
56
+ "grad_norm": 1.1334757804870605,
57
+ "learning_rate": 0.000286784140969163,
58
+ "loss": 2.5021,
59
+ "step": 30
60
+ },
61
+ {
62
+ "epoch": 0.5973333333333334,
63
+ "grad_norm": 1.1564419269561768,
64
+ "learning_rate": 0.0002834801762114537,
65
+ "loss": 2.4867,
66
+ "step": 35
67
+ },
68
+ {
69
+ "epoch": 0.6826666666666666,
70
+ "grad_norm": 1.0658727884292603,
71
+ "learning_rate": 0.00028017621145374447,
72
+ "loss": 2.4791,
73
+ "step": 40
74
+ },
75
+ {
76
+ "epoch": 0.768,
77
+ "grad_norm": 1.071118950843811,
78
+ "learning_rate": 0.0002768722466960352,
79
+ "loss": 2.4294,
80
+ "step": 45
81
+ },
82
+ {
83
+ "epoch": 0.8533333333333334,
84
+ "grad_norm": 1.1074410676956177,
85
+ "learning_rate": 0.00027356828193832595,
86
+ "loss": 2.4526,
87
+ "step": 50
88
+ },
89
+ {
90
+ "epoch": 0.8533333333333334,
91
+ "eval_loss": 2.43389892578125,
92
+ "eval_runtime": 50.8571,
93
+ "eval_samples_per_second": 9.831,
94
+ "eval_steps_per_second": 1.239,
95
+ "step": 50
96
+ },
97
+ {
98
+ "epoch": 0.9386666666666666,
99
+ "grad_norm": 1.0138615369796753,
100
+ "learning_rate": 0.0002702643171806167,
101
+ "loss": 2.4296,
102
+ "step": 55
103
+ },
104
+ {
105
+ "epoch": 1.024,
106
+ "grad_norm": 0.9914734959602356,
107
+ "learning_rate": 0.0002669603524229075,
108
+ "loss": 2.3865,
109
+ "step": 60
110
+ },
111
+ {
112
+ "epoch": 1.1093333333333333,
113
+ "grad_norm": 0.9485092759132385,
114
+ "learning_rate": 0.0002636563876651982,
115
+ "loss": 2.3592,
116
+ "step": 65
117
+ },
118
+ {
119
+ "epoch": 1.1946666666666665,
120
+ "grad_norm": 1.032936692237854,
121
+ "learning_rate": 0.00026035242290748897,
122
+ "loss": 2.3762,
123
+ "step": 70
124
+ },
125
+ {
126
+ "epoch": 1.28,
127
+ "grad_norm": 0.978344738483429,
128
+ "learning_rate": 0.00025704845814977973,
129
+ "loss": 2.3715,
130
+ "step": 75
131
+ },
132
+ {
133
+ "epoch": 1.28,
134
+ "eval_loss": 2.4061355590820312,
135
+ "eval_runtime": 53.5326,
136
+ "eval_samples_per_second": 9.34,
137
+ "eval_steps_per_second": 1.177,
138
+ "step": 75
139
+ },
140
+ {
141
+ "epoch": 1.3653333333333333,
142
+ "grad_norm": 1.0429058074951172,
143
+ "learning_rate": 0.00025374449339207045,
144
+ "loss": 2.3519,
145
+ "step": 80
146
+ },
147
+ {
148
+ "epoch": 1.4506666666666668,
149
+ "grad_norm": 1.0109028816223145,
150
+ "learning_rate": 0.0002504405286343612,
151
+ "loss": 2.3698,
152
+ "step": 85
153
+ },
154
+ {
155
+ "epoch": 1.536,
156
+ "grad_norm": 1.0443379878997803,
157
+ "learning_rate": 0.00024713656387665193,
158
+ "loss": 2.3652,
159
+ "step": 90
160
+ },
161
+ {
162
+ "epoch": 1.6213333333333333,
163
+ "grad_norm": 0.9977161884307861,
164
+ "learning_rate": 0.0002438325991189427,
165
+ "loss": 2.3799,
166
+ "step": 95
167
+ },
168
+ {
169
+ "epoch": 1.7066666666666666,
170
+ "grad_norm": 0.9699886441230774,
171
+ "learning_rate": 0.00024052863436123346,
172
+ "loss": 2.3437,
173
+ "step": 100
174
+ },
175
+ {
176
+ "epoch": 1.7066666666666666,
177
+ "eval_loss": 2.392068386077881,
178
+ "eval_runtime": 50.7099,
179
+ "eval_samples_per_second": 9.86,
180
+ "eval_steps_per_second": 1.242,
181
+ "step": 100
182
+ },
183
+ {
184
+ "epoch": 1.792,
185
+ "grad_norm": 1.0068645477294922,
186
+ "learning_rate": 0.00023722466960352423,
187
+ "loss": 2.3396,
188
+ "step": 105
189
+ },
190
+ {
191
+ "epoch": 1.8773333333333333,
192
+ "grad_norm": 0.9659402966499329,
193
+ "learning_rate": 0.00023392070484581494,
194
+ "loss": 2.3443,
195
+ "step": 110
196
+ },
197
+ {
198
+ "epoch": 1.9626666666666668,
199
+ "grad_norm": 0.9416194558143616,
200
+ "learning_rate": 0.0002306167400881057,
201
+ "loss": 2.3458,
202
+ "step": 115
203
+ },
204
+ {
205
+ "epoch": 2.048,
206
+ "grad_norm": 0.9857434630393982,
207
+ "learning_rate": 0.00022731277533039645,
208
+ "loss": 2.2908,
209
+ "step": 120
210
+ },
211
+ {
212
+ "epoch": 2.1333333333333333,
213
+ "grad_norm": 0.9868885278701782,
214
+ "learning_rate": 0.00022400881057268722,
215
+ "loss": 2.2919,
216
+ "step": 125
217
+ },
218
+ {
219
+ "epoch": 2.1333333333333333,
220
+ "eval_loss": 2.3825137615203857,
221
+ "eval_runtime": 50.7136,
222
+ "eval_samples_per_second": 9.859,
223
+ "eval_steps_per_second": 1.242,
224
+ "step": 125
225
+ },
226
+ {
227
+ "epoch": 2.2186666666666666,
228
+ "grad_norm": 0.9239076972007751,
229
+ "learning_rate": 0.00022070484581497796,
230
+ "loss": 2.3055,
231
+ "step": 130
232
+ },
233
+ {
234
+ "epoch": 2.304,
235
+ "grad_norm": 0.9522895216941833,
236
+ "learning_rate": 0.0002174008810572687,
237
+ "loss": 2.319,
238
+ "step": 135
239
+ },
240
+ {
241
+ "epoch": 2.389333333333333,
242
+ "grad_norm": 0.989910900592804,
243
+ "learning_rate": 0.00021409691629955944,
244
+ "loss": 2.2679,
245
+ "step": 140
246
+ },
247
+ {
248
+ "epoch": 2.474666666666667,
249
+ "grad_norm": 1.0279978513717651,
250
+ "learning_rate": 0.0002107929515418502,
251
+ "loss": 2.309,
252
+ "step": 145
253
+ },
254
+ {
255
+ "epoch": 2.56,
256
+ "grad_norm": 0.9677265286445618,
257
+ "learning_rate": 0.00020748898678414097,
258
+ "loss": 2.2834,
259
+ "step": 150
260
+ },
261
+ {
262
+ "epoch": 2.56,
263
+ "eval_loss": 2.3774726390838623,
264
+ "eval_runtime": 50.6978,
265
+ "eval_samples_per_second": 9.862,
266
+ "eval_steps_per_second": 1.243,
267
+ "step": 150
268
+ },
269
+ {
270
+ "epoch": 2.6453333333333333,
271
+ "grad_norm": 0.9602519869804382,
272
+ "learning_rate": 0.0002041850220264317,
273
+ "loss": 2.3044,
274
+ "step": 155
275
+ },
276
+ {
277
+ "epoch": 2.7306666666666666,
278
+ "grad_norm": 0.9305415153503418,
279
+ "learning_rate": 0.00020088105726872246,
280
+ "loss": 2.2996,
281
+ "step": 160
282
+ },
283
+ {
284
+ "epoch": 2.816,
285
+ "grad_norm": 0.9666855931282043,
286
+ "learning_rate": 0.0001975770925110132,
287
+ "loss": 2.2807,
288
+ "step": 165
289
+ },
290
+ {
291
+ "epoch": 2.9013333333333335,
292
+ "grad_norm": 1.0196256637573242,
293
+ "learning_rate": 0.00019427312775330396,
294
+ "loss": 2.2948,
295
+ "step": 170
296
+ },
297
+ {
298
+ "epoch": 2.986666666666667,
299
+ "grad_norm": 0.9804545044898987,
300
+ "learning_rate": 0.00019096916299559468,
301
+ "loss": 2.3319,
302
+ "step": 175
303
+ },
304
+ {
305
+ "epoch": 2.986666666666667,
306
+ "eval_loss": 2.3707964420318604,
307
+ "eval_runtime": 50.7119,
308
+ "eval_samples_per_second": 9.86,
309
+ "eval_steps_per_second": 1.242,
310
+ "step": 175
311
+ },
312
+ {
313
+ "epoch": 3.072,
314
+ "grad_norm": 0.9667485356330872,
315
+ "learning_rate": 0.00018766519823788544,
316
+ "loss": 2.2698,
317
+ "step": 180
318
+ },
319
+ {
320
+ "epoch": 3.1573333333333333,
321
+ "grad_norm": 0.9869656562805176,
322
+ "learning_rate": 0.00018436123348017618,
323
+ "loss": 2.2443,
324
+ "step": 185
325
+ },
326
+ {
327
+ "epoch": 3.2426666666666666,
328
+ "grad_norm": 0.9679750204086304,
329
+ "learning_rate": 0.00018105726872246695,
330
+ "loss": 2.2636,
331
+ "step": 190
332
+ },
333
+ {
334
+ "epoch": 3.328,
335
+ "grad_norm": 0.9996704459190369,
336
+ "learning_rate": 0.00017775330396475772,
337
+ "loss": 2.2536,
338
+ "step": 195
339
+ },
340
+ {
341
+ "epoch": 3.413333333333333,
342
+ "grad_norm": 0.9564487338066101,
343
+ "learning_rate": 0.00017444933920704843,
344
+ "loss": 2.2404,
345
+ "step": 200
346
+ },
347
+ {
348
+ "epoch": 3.413333333333333,
349
+ "eval_loss": 2.3742570877075195,
350
+ "eval_runtime": 50.7116,
351
+ "eval_samples_per_second": 9.86,
352
+ "eval_steps_per_second": 1.242,
353
+ "step": 200
354
+ },
355
+ {
356
+ "epoch": 3.498666666666667,
357
+ "grad_norm": 1.0057621002197266,
358
+ "learning_rate": 0.0001711453744493392,
359
+ "loss": 2.2888,
360
+ "step": 205
361
+ },
362
+ {
363
+ "epoch": 3.584,
364
+ "grad_norm": 1.0336802005767822,
365
+ "learning_rate": 0.00016784140969162994,
366
+ "loss": 2.2946,
367
+ "step": 210
368
+ },
369
+ {
370
+ "epoch": 3.6693333333333333,
371
+ "grad_norm": 1.0010818243026733,
372
+ "learning_rate": 0.0001645374449339207,
373
+ "loss": 2.2531,
374
+ "step": 215
375
+ },
376
+ {
377
+ "epoch": 3.7546666666666666,
378
+ "grad_norm": 0.9891405701637268,
379
+ "learning_rate": 0.00016123348017621142,
380
+ "loss": 2.2336,
381
+ "step": 220
382
+ },
383
+ {
384
+ "epoch": 3.84,
385
+ "grad_norm": 0.9514368176460266,
386
+ "learning_rate": 0.0001579295154185022,
387
+ "loss": 2.238,
388
+ "step": 225
389
+ },
390
+ {
391
+ "epoch": 3.84,
392
+ "eval_loss": 2.3685686588287354,
393
+ "eval_runtime": 50.7197,
394
+ "eval_samples_per_second": 9.858,
395
+ "eval_steps_per_second": 1.242,
396
+ "step": 225
397
+ },
398
+ {
399
+ "epoch": 3.9253333333333336,
400
+ "grad_norm": 1.0048863887786865,
401
+ "learning_rate": 0.00015462555066079293,
402
+ "loss": 2.2638,
403
+ "step": 230
404
+ },
405
+ {
406
+ "epoch": 4.010666666666666,
407
+ "grad_norm": 0.9555865526199341,
408
+ "learning_rate": 0.0001513215859030837,
409
+ "loss": 2.2523,
410
+ "step": 235
411
+ },
412
+ {
413
+ "epoch": 4.096,
414
+ "grad_norm": 1.01077401638031,
415
+ "learning_rate": 0.00014801762114537444,
416
+ "loss": 2.2247,
417
+ "step": 240
418
+ },
419
+ {
420
+ "epoch": 4.181333333333333,
421
+ "grad_norm": 0.9413540959358215,
422
+ "learning_rate": 0.00014471365638766518,
423
+ "loss": 2.216,
424
+ "step": 245
425
+ },
426
+ {
427
+ "epoch": 4.266666666666667,
428
+ "grad_norm": 1.0012569427490234,
429
+ "learning_rate": 0.00014140969162995594,
430
+ "loss": 2.223,
431
+ "step": 250
432
+ },
433
+ {
434
+ "epoch": 4.266666666666667,
435
+ "eval_loss": 2.373790740966797,
436
+ "eval_runtime": 50.706,
437
+ "eval_samples_per_second": 9.861,
438
+ "eval_steps_per_second": 1.242,
439
+ "step": 250
440
+ },
441
+ {
442
+ "epoch": 4.352,
443
+ "grad_norm": 0.9957796335220337,
444
+ "learning_rate": 0.00013810572687224668,
445
+ "loss": 2.2367,
446
+ "step": 255
447
+ },
448
+ {
449
+ "epoch": 4.437333333333333,
450
+ "grad_norm": 1.013082504272461,
451
+ "learning_rate": 0.00013480176211453743,
452
+ "loss": 2.2146,
453
+ "step": 260
454
+ },
455
+ {
456
+ "epoch": 4.522666666666667,
457
+ "grad_norm": 1.0343190431594849,
458
+ "learning_rate": 0.0001314977973568282,
459
+ "loss": 2.2362,
460
+ "step": 265
461
+ },
462
+ {
463
+ "epoch": 4.608,
464
+ "grad_norm": 1.0079319477081299,
465
+ "learning_rate": 0.00012819383259911893,
466
+ "loss": 2.2182,
467
+ "step": 270
468
+ },
469
+ {
470
+ "epoch": 4.693333333333333,
471
+ "grad_norm": 1.0466967821121216,
472
+ "learning_rate": 0.00012488986784140967,
473
+ "loss": 2.2197,
474
+ "step": 275
475
+ },
476
+ {
477
+ "epoch": 4.693333333333333,
478
+ "eval_loss": 2.3711977005004883,
479
+ "eval_runtime": 50.7236,
480
+ "eval_samples_per_second": 9.857,
481
+ "eval_steps_per_second": 1.242,
482
+ "step": 275
483
+ },
484
+ {
485
+ "epoch": 4.778666666666666,
486
+ "grad_norm": 1.0417555570602417,
487
+ "learning_rate": 0.00012158590308370043,
488
+ "loss": 2.2284,
489
+ "step": 280
490
+ },
491
+ {
492
+ "epoch": 4.864,
493
+ "grad_norm": 1.001129150390625,
494
+ "learning_rate": 0.00011828193832599118,
495
+ "loss": 2.2411,
496
+ "step": 285
497
+ },
498
+ {
499
+ "epoch": 4.949333333333334,
500
+ "grad_norm": 1.0128998756408691,
501
+ "learning_rate": 0.00011497797356828192,
502
+ "loss": 2.2368,
503
+ "step": 290
504
+ },
505
+ {
506
+ "epoch": 5.034666666666666,
507
+ "grad_norm": 0.9789999127388,
508
+ "learning_rate": 0.00011167400881057268,
509
+ "loss": 2.2259,
510
+ "step": 295
511
+ },
512
+ {
513
+ "epoch": 5.12,
514
+ "grad_norm": 1.0087758302688599,
515
+ "learning_rate": 0.00010837004405286342,
516
+ "loss": 2.1643,
517
+ "step": 300
518
+ },
519
+ {
520
+ "epoch": 5.12,
521
+ "eval_loss": 2.370661735534668,
522
+ "eval_runtime": 50.7317,
523
+ "eval_samples_per_second": 9.856,
524
+ "eval_steps_per_second": 1.242,
525
+ "step": 300
526
+ },
527
+ {
528
+ "epoch": 5.205333333333333,
529
+ "grad_norm": 1.0349854230880737,
530
+ "learning_rate": 0.00010506607929515418,
531
+ "loss": 2.1957,
532
+ "step": 305
533
+ },
534
+ {
535
+ "epoch": 5.290666666666667,
536
+ "grad_norm": 1.0541808605194092,
537
+ "learning_rate": 0.00010176211453744494,
538
+ "loss": 2.1873,
539
+ "step": 310
540
+ },
541
+ {
542
+ "epoch": 5.376,
543
+ "grad_norm": 1.0202800035476685,
544
+ "learning_rate": 9.845814977973568e-05,
545
+ "loss": 2.2108,
546
+ "step": 315
547
+ },
548
+ {
549
+ "epoch": 5.461333333333333,
550
+ "grad_norm": 1.036137342453003,
551
+ "learning_rate": 9.515418502202643e-05,
552
+ "loss": 2.1934,
553
+ "step": 320
554
+ },
555
+ {
556
+ "epoch": 5.546666666666667,
557
+ "grad_norm": 1.012592077255249,
558
+ "learning_rate": 9.185022026431717e-05,
559
+ "loss": 2.2055,
560
+ "step": 325
561
+ },
562
+ {
563
+ "epoch": 5.546666666666667,
564
+ "eval_loss": 2.372723340988159,
565
+ "eval_runtime": 50.7062,
566
+ "eval_samples_per_second": 9.861,
567
+ "eval_steps_per_second": 1.242,
568
+ "step": 325
569
+ },
570
+ {
571
+ "epoch": 5.632,
572
+ "grad_norm": 1.0501244068145752,
573
+ "learning_rate": 8.854625550660793e-05,
574
+ "loss": 2.2097,
575
+ "step": 330
576
+ },
577
+ {
578
+ "epoch": 5.717333333333333,
579
+ "grad_norm": 1.0283957719802856,
580
+ "learning_rate": 8.524229074889867e-05,
581
+ "loss": 2.1996,
582
+ "step": 335
583
+ },
584
+ {
585
+ "epoch": 5.802666666666667,
586
+ "grad_norm": 1.001703143119812,
587
+ "learning_rate": 8.193832599118942e-05,
588
+ "loss": 2.2157,
589
+ "step": 340
590
+ },
591
+ {
592
+ "epoch": 5.888,
593
+ "grad_norm": 1.0345960855484009,
594
+ "learning_rate": 7.863436123348016e-05,
595
+ "loss": 2.216,
596
+ "step": 345
597
+ },
598
+ {
599
+ "epoch": 5.973333333333334,
600
+ "grad_norm": 1.0450265407562256,
601
+ "learning_rate": 7.533039647577093e-05,
602
+ "loss": 2.2141,
603
+ "step": 350
604
+ },
605
+ {
606
+ "epoch": 5.973333333333334,
607
+ "eval_loss": 2.3703668117523193,
608
+ "eval_runtime": 50.7172,
609
+ "eval_samples_per_second": 9.859,
610
+ "eval_steps_per_second": 1.242,
611
+ "step": 350
612
+ },
613
+ {
614
+ "epoch": 6.058666666666666,
615
+ "grad_norm": 0.986889660358429,
616
+ "learning_rate": 7.202643171806167e-05,
617
+ "loss": 2.192,
618
+ "step": 355
619
+ },
620
+ {
621
+ "epoch": 6.144,
622
+ "grad_norm": 1.0109078884124756,
623
+ "learning_rate": 6.872246696035242e-05,
624
+ "loss": 2.1836,
625
+ "step": 360
626
+ },
627
+ {
628
+ "epoch": 6.229333333333333,
629
+ "grad_norm": 1.0342578887939453,
630
+ "learning_rate": 6.541850220264316e-05,
631
+ "loss": 2.1894,
632
+ "step": 365
633
+ },
634
+ {
635
+ "epoch": 6.314666666666667,
636
+ "grad_norm": 1.0402517318725586,
637
+ "learning_rate": 6.211453744493392e-05,
638
+ "loss": 2.1727,
639
+ "step": 370
640
+ },
641
+ {
642
+ "epoch": 6.4,
643
+ "grad_norm": 1.0148675441741943,
644
+ "learning_rate": 5.881057268722466e-05,
645
+ "loss": 2.1697,
646
+ "step": 375
647
+ },
648
+ {
649
+ "epoch": 6.4,
650
+ "eval_loss": 2.3737528324127197,
651
+ "eval_runtime": 50.7165,
652
+ "eval_samples_per_second": 9.859,
653
+ "eval_steps_per_second": 1.242,
654
+ "step": 375
655
+ },
656
+ {
657
+ "epoch": 6.485333333333333,
658
+ "grad_norm": 1.0261683464050293,
659
+ "learning_rate": 5.550660792951541e-05,
660
+ "loss": 2.1756,
661
+ "step": 380
662
+ },
663
+ {
664
+ "epoch": 6.570666666666667,
665
+ "grad_norm": 1.0536677837371826,
666
+ "learning_rate": 5.220264317180616e-05,
667
+ "loss": 2.1984,
668
+ "step": 385
669
+ },
670
+ {
671
+ "epoch": 6.656,
672
+ "grad_norm": 1.0320463180541992,
673
+ "learning_rate": 4.889867841409691e-05,
674
+ "loss": 2.1758,
675
+ "step": 390
676
+ },
677
+ {
678
+ "epoch": 6.741333333333333,
679
+ "grad_norm": 1.0172383785247803,
680
+ "learning_rate": 4.559471365638766e-05,
681
+ "loss": 2.1988,
682
+ "step": 395
683
+ },
684
+ {
685
+ "epoch": 6.826666666666666,
686
+ "grad_norm": 1.0310728549957275,
687
+ "learning_rate": 4.229074889867841e-05,
688
+ "loss": 2.1908,
689
+ "step": 400
690
+ },
691
+ {
692
+ "epoch": 6.826666666666666,
693
+ "eval_loss": 2.3720057010650635,
694
+ "eval_runtime": 50.708,
695
+ "eval_samples_per_second": 9.86,
696
+ "eval_steps_per_second": 1.242,
697
+ "step": 400
698
+ }
699
+ ],
700
+ "logging_steps": 5,
701
+ "max_steps": 464,
702
+ "num_input_tokens_seen": 0,
703
+ "num_train_epochs": 8,
704
+ "save_steps": 25,
705
+ "stateful_callbacks": {
706
+ "TrainerControl": {
707
+ "args": {
708
+ "should_epoch_stop": false,
709
+ "should_evaluate": false,
710
+ "should_log": false,
711
+ "should_save": true,
712
+ "should_training_stop": false
713
+ },
714
+ "attributes": {}
715
+ }
716
+ },
717
+ "total_flos": 5.904792979243008e+17,
718
+ "train_batch_size": 4,
719
+ "trial_name": null,
720
+ "trial_params": null
721
+ }
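Each checkpoint's trainer_state.json carries the Trainer's bookkeeping: the per-step log_history above, the 25-step eval cadence, and the best metric seen so far. Note that best_metric (eval_loss 2.3686) was reached at step 225 and every later evaluation is slightly worse. A minimal sketch for pulling the eval curve back out of one of these files; the path is an assumption, and any checkpoint-*/trainer_state.json from this commit works:

```python
import json

# Minimal sketch: extract the eval-loss curve from a trainer_state.json and
# confirm which step produced best_metric.
with open("checkpoint-400/trainer_state.json") as f:
    state = json.load(f)

evals = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]
for step, loss in evals:
    print(f"step {step:>3}: eval_loss {loss:.4f}")

best_step, best_loss = min(evals, key=lambda t: t[1])
print(f"best: step {best_step} at {best_loss:.4f} "
      f"(best_model_checkpoint: {state['best_model_checkpoint']})")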
checkpoint-400/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e5e28485a7b3a1a3db706bc20a8e6c9dd73d2112d08db67065b84e6004e139f
+ size 5432
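training_args.bin, like the other binary files in this commit, is stored as a Git LFS pointer: the three lines above are the pointer spec version, the SHA-256 of the real payload, and its size in bytes (5,432 here). A small sketch, assuming only the three-field stanza format shown above, of turning such a pointer into a dict:

```python
# Minimal sketch: split a Git LFS pointer stanza into its three fields.
def parse_lfs_pointer(text: str) -> dict:
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "sha256": fields["oid"].removeprefix("sha256:"),
        "size_bytes": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:9e5e28485a7b3a1a3db706bc20a8e6c9dd73d2112d08db67065b84e6004e139f
size 5432"""
print(parse_lfs_pointer(pointer))  # size_bytes == 5432
```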
checkpoint-425/README.md ADDED
@@ -0,0 +1,202 @@
checkpoint-425/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "/mnt/data/MODEL/deepseek/DeepSeek-R1-Distill-Llama-8B",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": "gaussian",
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "v_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_rslora": false
+ }
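The adapter config describes a rank-8 LoRA (lora_alpha 16, dropout 0.05, Gaussian init) applied only to the attention q_proj and v_proj projections of the base model. A minimal loading sketch, assuming the local base-model path from the config and installed transformers and peft; this is an illustration, not the author's script:

```python
# Minimal sketch, assuming the base-model path from adapter_config.json
# exists locally and transformers + peft are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "/mnt/data/MODEL/deepseek/DeepSeek-R1-Distill-Llama-8B"
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base)

# Attach the rank-8 q_proj/v_proj adapter saved in this checkpoint directory.
model = PeftModel.from_pretrained(model, "checkpoint-425")
model.eval()
```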
checkpoint-425/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:686d7ef254cef6fee72adf62bed218161c9d8e2170285a69a15dc42c47e7e080
+ size 13648432
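The 13,648,432-byte adapter file is consistent with that config. A back-of-envelope check, assuming this distill keeps the usual Llama-3-8B shapes (hidden size 4096, 32 layers, grouped-query KV width 1024) and that adapter weights are stored in fp32; these dimensions are assumptions, not read from the repo:

```python
# Rough size check for a rank-8 LoRA on q_proj and v_proj (assumed dims).
r, hidden, kv_width, layers = 8, 4096, 1024, 32

q_proj = r * hidden + hidden * r     # lora_A (r x in) + lora_B (out x r)
v_proj = r * hidden + kv_width * r   # v_proj output is the narrower KV width
params = layers * (q_proj + v_proj)

print(params, params * 4)  # 3407872 params, 13631488 bytes in fp32
# ~13.63 MB of weights plus a ~17 KB safetensors header matches 13,648,432 B.
```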
checkpoint-425/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d699628e969e70fd88f73c343cc31ea7243288ea0024a02f077b0009dcedd3c9
+ size 27370618
checkpoint-425/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ed06860cf8d788c4ddf1ec692fcbd6cf518e0b58a06267987195322f90cc891f
+ size 14244
checkpoint-425/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e31c3d5e1c3e26db9539569f6f1dc72de8abb5908a882194d238d2d89aac4f56
+ size 1064
checkpoint-425/trainer_state.json ADDED
@@ -0,0 +1,764 @@
+ {
+ "best_metric": 2.3685686588287354,
+ "best_model_checkpoint": "/mnt/data/computer_design/lora_checkpoints/DeepSeek-R1-Distill-Llama-8B__news-summarizer-noreason__ral_8_16_0.0003_8/checkpoint-225",
+ "epoch": 7.253333333333333,
+ "eval_steps": 25,
+ "global_step": 425,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {"epoch": 0.08533333333333333, "grad_norm": 5.31721305847168, "learning_rate": 0.00015, "loss": 3.5245, "step": 5},
+ {"epoch": 0.17066666666666666, "grad_norm": 2.912184476852417, "learning_rate": 0.0003, "loss": 2.9075, "step": 10},
+ {"epoch": 0.256, "grad_norm": 1.5293394327163696, "learning_rate": 0.00029669603524229074, "loss": 2.6626, "step": 15},
+ {"epoch": 0.3413333333333333, "grad_norm": 1.3206167221069336, "learning_rate": 0.0002933920704845815, "loss": 2.5962, "step": 20},
+ {"epoch": 0.4266666666666667, "grad_norm": 1.2503535747528076, "learning_rate": 0.0002900881057268722, "loss": 2.5323, "step": 25},
+ {"epoch": 0.4266666666666667, "eval_loss": 2.5280094146728516, "eval_runtime": 50.7358, "eval_samples_per_second": 9.855, "eval_steps_per_second": 1.242, "step": 25},
+ {"epoch": 0.512, "grad_norm": 1.1334757804870605, "learning_rate": 0.000286784140969163, "loss": 2.5021, "step": 30},
+ {"epoch": 0.5973333333333334, "grad_norm": 1.1564419269561768, "learning_rate": 0.0002834801762114537, "loss": 2.4867, "step": 35},
+ {"epoch": 0.6826666666666666, "grad_norm": 1.0658727884292603, "learning_rate": 0.00028017621145374447, "loss": 2.4791, "step": 40},
+ {"epoch": 0.768, "grad_norm": 1.071118950843811, "learning_rate": 0.0002768722466960352, "loss": 2.4294, "step": 45},
+ {"epoch": 0.8533333333333334, "grad_norm": 1.1074410676956177, "learning_rate": 0.00027356828193832595, "loss": 2.4526, "step": 50},
+ {"epoch": 0.8533333333333334, "eval_loss": 2.43389892578125, "eval_runtime": 50.8571, "eval_samples_per_second": 9.831, "eval_steps_per_second": 1.239, "step": 50},
+ {"epoch": 0.9386666666666666, "grad_norm": 1.0138615369796753, "learning_rate": 0.0002702643171806167, "loss": 2.4296, "step": 55},
+ {"epoch": 1.024, "grad_norm": 0.9914734959602356, "learning_rate": 0.0002669603524229075, "loss": 2.3865, "step": 60},
+ {"epoch": 1.1093333333333333, "grad_norm": 0.9485092759132385, "learning_rate": 0.0002636563876651982, "loss": 2.3592, "step": 65},
+ {"epoch": 1.1946666666666665, "grad_norm": 1.032936692237854, "learning_rate": 0.00026035242290748897, "loss": 2.3762, "step": 70},
+ {"epoch": 1.28, "grad_norm": 0.978344738483429, "learning_rate": 0.00025704845814977973, "loss": 2.3715, "step": 75},
+ {"epoch": 1.28, "eval_loss": 2.4061355590820312, "eval_runtime": 53.5326, "eval_samples_per_second": 9.34, "eval_steps_per_second": 1.177, "step": 75},
+ {"epoch": 1.3653333333333333, "grad_norm": 1.0429058074951172, "learning_rate": 0.00025374449339207045, "loss": 2.3519, "step": 80},
+ {"epoch": 1.4506666666666668, "grad_norm": 1.0109028816223145, "learning_rate": 0.0002504405286343612, "loss": 2.3698, "step": 85},
+ {"epoch": 1.536, "grad_norm": 1.0443379878997803, "learning_rate": 0.00024713656387665193, "loss": 2.3652, "step": 90},
+ {"epoch": 1.6213333333333333, "grad_norm": 0.9977161884307861, "learning_rate": 0.0002438325991189427, "loss": 2.3799, "step": 95},
+ {"epoch": 1.7066666666666666, "grad_norm": 0.9699886441230774, "learning_rate": 0.00024052863436123346, "loss": 2.3437, "step": 100},
+ {"epoch": 1.7066666666666666, "eval_loss": 2.392068386077881, "eval_runtime": 50.7099, "eval_samples_per_second": 9.86, "eval_steps_per_second": 1.242, "step": 100},
+ {"epoch": 1.792, "grad_norm": 1.0068645477294922, "learning_rate": 0.00023722466960352423, "loss": 2.3396, "step": 105},
+ {"epoch": 1.8773333333333333, "grad_norm": 0.9659402966499329, "learning_rate": 0.00023392070484581494, "loss": 2.3443, "step": 110},
+ {"epoch": 1.9626666666666668, "grad_norm": 0.9416194558143616, "learning_rate": 0.0002306167400881057, "loss": 2.3458, "step": 115},
+ {"epoch": 2.048, "grad_norm": 0.9857434630393982, "learning_rate": 0.00022731277533039645, "loss": 2.2908, "step": 120},
+ {"epoch": 2.1333333333333333, "grad_norm": 0.9868885278701782, "learning_rate": 0.00022400881057268722, "loss": 2.2919, "step": 125},
+ {"epoch": 2.1333333333333333, "eval_loss": 2.3825137615203857, "eval_runtime": 50.7136, "eval_samples_per_second": 9.859, "eval_steps_per_second": 1.242, "step": 125},
+ {"epoch": 2.2186666666666666, "grad_norm": 0.9239076972007751, "learning_rate": 0.00022070484581497796, "loss": 2.3055, "step": 130},
+ {"epoch": 2.304, "grad_norm": 0.9522895216941833, "learning_rate": 0.0002174008810572687, "loss": 2.319, "step": 135},
+ {"epoch": 2.389333333333333, "grad_norm": 0.989910900592804, "learning_rate": 0.00021409691629955944, "loss": 2.2679, "step": 140},
+ {"epoch": 2.474666666666667, "grad_norm": 1.0279978513717651, "learning_rate": 0.0002107929515418502, "loss": 2.309, "step": 145},
+ {"epoch": 2.56, "grad_norm": 0.9677265286445618, "learning_rate": 0.00020748898678414097, "loss": 2.2834, "step": 150},
+ {"epoch": 2.56, "eval_loss": 2.3774726390838623, "eval_runtime": 50.6978, "eval_samples_per_second": 9.862, "eval_steps_per_second": 1.243, "step": 150},
+ {"epoch": 2.6453333333333333, "grad_norm": 0.9602519869804382, "learning_rate": 0.0002041850220264317, "loss": 2.3044, "step": 155},
+ {"epoch": 2.7306666666666666, "grad_norm": 0.9305415153503418, "learning_rate": 0.00020088105726872246, "loss": 2.2996, "step": 160},
+ {"epoch": 2.816, "grad_norm": 0.9666855931282043, "learning_rate": 0.0001975770925110132, "loss": 2.2807, "step": 165},
+ {"epoch": 2.9013333333333335, "grad_norm": 1.0196256637573242, "learning_rate": 0.00019427312775330396, "loss": 2.2948, "step": 170},
+ {"epoch": 2.986666666666667, "grad_norm": 0.9804545044898987, "learning_rate": 0.00019096916299559468, "loss": 2.3319, "step": 175},
+ {"epoch": 2.986666666666667, "eval_loss": 2.3707964420318604, "eval_runtime": 50.7119, "eval_samples_per_second": 9.86, "eval_steps_per_second": 1.242, "step": 175},
+ {"epoch": 3.072, "grad_norm": 0.9667485356330872, "learning_rate": 0.00018766519823788544, "loss": 2.2698, "step": 180},
+ {"epoch": 3.1573333333333333, "grad_norm": 0.9869656562805176, "learning_rate": 0.00018436123348017618, "loss": 2.2443, "step": 185},
+ {"epoch": 3.2426666666666666, "grad_norm": 0.9679750204086304, "learning_rate": 0.00018105726872246695, "loss": 2.2636, "step": 190},
+ {"epoch": 3.328, "grad_norm": 0.9996704459190369, "learning_rate": 0.00017775330396475772, "loss": 2.2536, "step": 195},
+ {"epoch": 3.413333333333333, "grad_norm": 0.9564487338066101, "learning_rate": 0.00017444933920704843, "loss": 2.2404, "step": 200},
+ {"epoch": 3.413333333333333, "eval_loss": 2.3742570877075195, "eval_runtime": 50.7116, "eval_samples_per_second": 9.86, "eval_steps_per_second": 1.242, "step": 200},
+ {"epoch": 3.498666666666667, "grad_norm": 1.0057621002197266, "learning_rate": 0.0001711453744493392, "loss": 2.2888, "step": 205},
+ {"epoch": 3.584, "grad_norm": 1.0336802005767822, "learning_rate": 0.00016784140969162994, "loss": 2.2946, "step": 210},
+ {"epoch": 3.6693333333333333, "grad_norm": 1.0010818243026733, "learning_rate": 0.0001645374449339207, "loss": 2.2531, "step": 215},
+ {"epoch": 3.7546666666666666, "grad_norm": 0.9891405701637268, "learning_rate": 0.00016123348017621142, "loss": 2.2336, "step": 220},
+ {"epoch": 3.84, "grad_norm": 0.9514368176460266, "learning_rate": 0.0001579295154185022, "loss": 2.238, "step": 225},
+ {"epoch": 3.84, "eval_loss": 2.3685686588287354, "eval_runtime": 50.7197, "eval_samples_per_second": 9.858, "eval_steps_per_second": 1.242, "step": 225},
+ {"epoch": 3.9253333333333336, "grad_norm": 1.0048863887786865, "learning_rate": 0.00015462555066079293, "loss": 2.2638, "step": 230},
+ {"epoch": 4.010666666666666, "grad_norm": 0.9555865526199341, "learning_rate": 0.0001513215859030837, "loss": 2.2523, "step": 235},
+ {"epoch": 4.096, "grad_norm": 1.01077401638031, "learning_rate": 0.00014801762114537444, "loss": 2.2247, "step": 240},
+ {"epoch": 4.181333333333333, "grad_norm": 0.9413540959358215, "learning_rate": 0.00014471365638766518, "loss": 2.216, "step": 245},
+ {"epoch": 4.266666666666667, "grad_norm": 1.0012569427490234, "learning_rate": 0.00014140969162995594, "loss": 2.223, "step": 250},
+ {"epoch": 4.266666666666667, "eval_loss": 2.373790740966797, "eval_runtime": 50.706, "eval_samples_per_second": 9.861, "eval_steps_per_second": 1.242, "step": 250},
+ {"epoch": 4.352, "grad_norm": 0.9957796335220337, "learning_rate": 0.00013810572687224668, "loss": 2.2367, "step": 255},
+ {"epoch": 4.437333333333333, "grad_norm": 1.013082504272461, "learning_rate": 0.00013480176211453743, "loss": 2.2146, "step": 260},
+ {"epoch": 4.522666666666667, "grad_norm": 1.0343190431594849, "learning_rate": 0.0001314977973568282, "loss": 2.2362, "step": 265},
+ {"epoch": 4.608, "grad_norm": 1.0079319477081299, "learning_rate": 0.00012819383259911893, "loss": 2.2182, "step": 270},
+ {"epoch": 4.693333333333333, "grad_norm": 1.0466967821121216, "learning_rate": 0.00012488986784140967, "loss": 2.2197, "step": 275},
+ {"epoch": 4.693333333333333, "eval_loss": 2.3711977005004883, "eval_runtime": 50.7236, "eval_samples_per_second": 9.857, "eval_steps_per_second": 1.242, "step": 275},
+ {"epoch": 4.778666666666666, "grad_norm": 1.0417555570602417, "learning_rate": 0.00012158590308370043, "loss": 2.2284, "step": 280},
+ {"epoch": 4.864, "grad_norm": 1.001129150390625, "learning_rate": 0.00011828193832599118, "loss": 2.2411, "step": 285},
+ {"epoch": 4.949333333333334, "grad_norm": 1.0128998756408691, "learning_rate": 0.00011497797356828192, "loss": 2.2368, "step": 290},
+ {"epoch": 5.034666666666666, "grad_norm": 0.9789999127388, "learning_rate": 0.00011167400881057268, "loss": 2.2259, "step": 295},
+ {"epoch": 5.12, "grad_norm": 1.0087758302688599, "learning_rate": 0.00010837004405286342, "loss": 2.1643, "step": 300},
+ {"epoch": 5.12, "eval_loss": 2.370661735534668, "eval_runtime": 50.7317, "eval_samples_per_second": 9.856, "eval_steps_per_second": 1.242, "step": 300},
+ {"epoch": 5.205333333333333, "grad_norm": 1.0349854230880737, "learning_rate": 0.00010506607929515418, "loss": 2.1957, "step": 305},
+ {"epoch": 5.290666666666667, "grad_norm": 1.0541808605194092, "learning_rate": 0.00010176211453744494, "loss": 2.1873, "step": 310},
+ {"epoch": 5.376, "grad_norm": 1.0202800035476685, "learning_rate": 9.845814977973568e-05, "loss": 2.2108, "step": 315},
+ {"epoch": 5.461333333333333, "grad_norm": 1.036137342453003, "learning_rate": 9.515418502202643e-05, "loss": 2.1934, "step": 320},
+ {"epoch": 5.546666666666667, "grad_norm": 1.012592077255249, "learning_rate": 9.185022026431717e-05, "loss": 2.2055, "step": 325},
+ {"epoch": 5.546666666666667, "eval_loss": 2.372723340988159, "eval_runtime": 50.7062, "eval_samples_per_second": 9.861, "eval_steps_per_second": 1.242, "step": 325},
+ {"epoch": 5.632, "grad_norm": 1.0501244068145752, "learning_rate": 8.854625550660793e-05, "loss": 2.2097, "step": 330},
+ {"epoch": 5.717333333333333, "grad_norm": 1.0283957719802856, "learning_rate": 8.524229074889867e-05, "loss": 2.1996, "step": 335},
+ {"epoch": 5.802666666666667, "grad_norm": 1.001703143119812, "learning_rate": 8.193832599118942e-05, "loss": 2.2157, "step": 340},
+ {"epoch": 5.888, "grad_norm": 1.0345960855484009, "learning_rate": 7.863436123348016e-05, "loss": 2.216, "step": 345},
+ {"epoch": 5.973333333333334, "grad_norm": 1.0450265407562256, "learning_rate": 7.533039647577093e-05, "loss": 2.2141, "step": 350},
+ {"epoch": 5.973333333333334, "eval_loss": 2.3703668117523193, "eval_runtime": 50.7172, "eval_samples_per_second": 9.859, "eval_steps_per_second": 1.242, "step": 350},
+ {"epoch": 6.058666666666666, "grad_norm": 0.986889660358429, "learning_rate": 7.202643171806167e-05, "loss": 2.192, "step": 355},
+ {"epoch": 6.144, "grad_norm": 1.0109078884124756, "learning_rate": 6.872246696035242e-05, "loss": 2.1836, "step": 360},
+ {"epoch": 6.229333333333333, "grad_norm": 1.0342578887939453, "learning_rate": 6.541850220264316e-05, "loss": 2.1894, "step": 365},
+ {"epoch": 6.314666666666667, "grad_norm": 1.0402517318725586, "learning_rate": 6.211453744493392e-05, "loss": 2.1727, "step": 370},
+ {"epoch": 6.4, "grad_norm": 1.0148675441741943, "learning_rate": 5.881057268722466e-05, "loss": 2.1697, "step": 375},
+ {"epoch": 6.4, "eval_loss": 2.3737528324127197, "eval_runtime": 50.7165, "eval_samples_per_second": 9.859, "eval_steps_per_second": 1.242, "step": 375},
+ {"epoch": 6.485333333333333, "grad_norm": 1.0261683464050293, "learning_rate": 5.550660792951541e-05, "loss": 2.1756, "step": 380},
+ {"epoch": 6.570666666666667, "grad_norm": 1.0536677837371826, "learning_rate": 5.220264317180616e-05, "loss": 2.1984, "step": 385},
+ {"epoch": 6.656, "grad_norm": 1.0320463180541992, "learning_rate": 4.889867841409691e-05, "loss": 2.1758, "step": 390},
+ {"epoch": 6.741333333333333, "grad_norm": 1.0172383785247803, "learning_rate": 4.559471365638766e-05, "loss": 2.1988, "step": 395},
+ {"epoch": 6.826666666666666, "grad_norm": 1.0310728549957275, "learning_rate": 4.229074889867841e-05, "loss": 2.1908, "step": 400},
+ {"epoch": 6.826666666666666, "eval_loss": 2.3720057010650635, "eval_runtime": 50.708, "eval_samples_per_second": 9.86, "eval_steps_per_second": 1.242, "step": 400},
+ {"epoch": 6.912, "grad_norm": 1.0129923820495605, "learning_rate": 3.898678414096916e-05, "loss": 2.1975, "step": 405},
+ {"epoch": 6.997333333333334, "grad_norm": 0.9970852732658386, "learning_rate": 3.568281938325991e-05, "loss": 2.1562, "step": 410},
+ {"epoch": 7.082666666666666, "grad_norm": 1.0077489614486694, "learning_rate": 3.237885462555066e-05, "loss": 2.1576, "step": 415},
+ {"epoch": 7.168, "grad_norm": 1.037332534790039, "learning_rate": 2.9074889867841408e-05, "loss": 2.1547, "step": 420},
+ {"epoch": 7.253333333333333, "grad_norm": 1.0144597291946411, "learning_rate": 2.5770925110132158e-05, "loss": 2.1841, "step": 425},
+ {"epoch": 7.253333333333333, "eval_loss": 2.3749217987060547, "eval_runtime": 50.7304, "eval_samples_per_second": 9.856, "eval_steps_per_second": 1.242, "step": 425}
+ ],
+ "logging_steps": 5,
+ "max_steps": 464,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 8,
+ "save_steps": 25,
+ "stateful_callbacks": {"TrainerControl": {"args": {"should_epoch_stop": false, "should_evaluate": false, "should_log": false, "should_save": true, "should_training_stop": false}, "attributes": {}}},
+ "total_flos": 6.273842540445696e+17,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": null
+ }
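The learning_rate column in the log above traces the standard linear schedule from transformers: warmup to the 3e-4 peak over the first 10 steps, then linear decay to zero at max_steps = 464. A reconstruction inferred from the logged values; the warmup length is deduced from the step-5 and step-10 entries, not read from training_args.bin:

```python
# Sketch of the LR schedule implied by the logged values; parameters are
# inferred from the log, not taken from the author's training arguments.
PEAK_LR, WARMUP_STEPS, MAX_STEPS = 3e-4, 10, 464

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * ((MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS))

print(lr_at(15))   # ~0.00029669603524229074, the step-15 log entry
print(lr_at(425))  # ~2.5770925110132158e-05, the step-425 log entry
```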
checkpoint-425/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e5e28485a7b3a1a3db706bc20a8e6c9dd73d2112d08db67065b84e6004e139f
+ size 5432
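Relating the checkpoints' counters: total_flos grew from 5.904792979243008e+17 at checkpoint-400 to 6.273842540445696e+17 at checkpoint-425, which pins the cost of one optimizer step:

```python
# Difference of the Trainer's cumulative FLOP counter between two checkpoints,
# divided by the 25 steps separating them (values copied from the two files).
flos_400 = 5.904792979243008e+17
flos_425 = 6.273842540445696e+17
print(f"{(flos_425 - flos_400) / 25:.3e} FLOs per optimizer step")  # ~1.476e+15
```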
checkpoint-450/README.md ADDED
@@ -0,0 +1,202 @@
checkpoint-450/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "/mnt/data/MODEL/deepseek/DeepSeek-R1-Distill-Llama-8B",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": "gaussian",
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "v_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_rslora": false
+ }
checkpoint-450/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2d526227700d0d06b0c39ccca13525bcc281d5e0a6dad92dcf908523da51f5bb
+ size 13648432
checkpoint-450/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc3eaf1a0b4f9c19d9bbaf6f3f96682ee9f0412be36d6a51bf8c03d348630ea9
+ size 27370618
checkpoint-450/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2008f7603f58cdc2d9cf44ace6ff1baeabe09eb913c4251835bca5c11265abb0
+ size 14244
checkpoint-450/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2d7fbc14c6828f4ded5a25b6d01c74768b0b25904b86a4ed022794422640edf1
+ size 1064
checkpoint-450/trainer_state.json ADDED
@@ -0,0 +1,807 @@
+ {
+ "best_metric": 2.3685686588287354,
+ "best_model_checkpoint": "/mnt/data/computer_design/lora_checkpoints/DeepSeek-R1-Distill-Llama-8B__news-summarizer-noreason__ral_8_16_0.0003_8/checkpoint-225",
+ "epoch": 7.68,
+ "eval_steps": 25,
+ "global_step": 450,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {"epoch": 0.08533333333333333, "grad_norm": 5.31721305847168, "learning_rate": 0.00015, "loss": 3.5245, "step": 5},
+ {"epoch": 0.17066666666666666, "grad_norm": 2.912184476852417, "learning_rate": 0.0003, "loss": 2.9075, "step": 10},
+ {"epoch": 0.256, "grad_norm": 1.5293394327163696, "learning_rate": 0.00029669603524229074, "loss": 2.6626, "step": 15},
+ {"epoch": 0.3413333333333333, "grad_norm": 1.3206167221069336, "learning_rate": 0.0002933920704845815, "loss": 2.5962, "step": 20},
+ {"epoch": 0.4266666666666667, "grad_norm": 1.2503535747528076, "learning_rate": 0.0002900881057268722, "loss": 2.5323, "step": 25},
+ {"epoch": 0.4266666666666667, "eval_loss": 2.5280094146728516, "eval_runtime": 50.7358, "eval_samples_per_second": 9.855, "eval_steps_per_second": 1.242, "step": 25},
+ {"epoch": 0.512, "grad_norm": 1.1334757804870605, "learning_rate": 0.000286784140969163, "loss": 2.5021, "step": 30},
+ {"epoch": 0.5973333333333334, "grad_norm": 1.1564419269561768, "learning_rate": 0.0002834801762114537, "loss": 2.4867, "step": 35},
+ {"epoch": 0.6826666666666666, "grad_norm": 1.0658727884292603, "learning_rate": 0.00028017621145374447, "loss": 2.4791, "step": 40},
+ {"epoch": 0.768, "grad_norm": 1.071118950843811, "learning_rate": 0.0002768722466960352, "loss": 2.4294, "step": 45},
+ {"epoch": 0.8533333333333334, "grad_norm": 1.1074410676956177, "learning_rate": 0.00027356828193832595, "loss": 2.4526, "step": 50},
+ {"epoch": 0.8533333333333334, "eval_loss": 2.43389892578125, "eval_runtime": 50.8571, "eval_samples_per_second": 9.831, "eval_steps_per_second": 1.239, "step": 50},
+ {"epoch": 0.9386666666666666, "grad_norm": 1.0138615369796753, "learning_rate": 0.0002702643171806167, "loss": 2.4296, "step": 55},
+ {"epoch": 1.024, "grad_norm": 0.9914734959602356, "learning_rate": 0.0002669603524229075, "loss": 2.3865, "step": 60},
+ {"epoch": 1.1093333333333333, "grad_norm": 0.9485092759132385, "learning_rate": 0.0002636563876651982, "loss": 2.3592, "step": 65},
+ {"epoch": 1.1946666666666665, "grad_norm": 1.032936692237854, "learning_rate": 0.00026035242290748897, "loss": 2.3762, "step": 70},
+ {"epoch": 1.28, "grad_norm": 0.978344738483429, "learning_rate": 0.00025704845814977973, "loss": 2.3715, "step": 75},
+ {"epoch": 1.28, "eval_loss": 2.4061355590820312, "eval_runtime": 53.5326, "eval_samples_per_second": 9.34, "eval_steps_per_second": 1.177, "step": 75},
+ {"epoch": 1.3653333333333333, "grad_norm": 1.0429058074951172, "learning_rate": 0.00025374449339207045, "loss": 2.3519, "step": 80},
+ {"epoch": 1.4506666666666668, "grad_norm": 1.0109028816223145, "learning_rate": 0.0002504405286343612, "loss": 2.3698, "step": 85},
+ {"epoch": 1.536, "grad_norm": 1.0443379878997803, "learning_rate": 0.00024713656387665193, "loss": 2.3652, "step": 90},
+ {"epoch": 1.6213333333333333, "grad_norm": 0.9977161884307861, "learning_rate": 0.0002438325991189427, "loss": 2.3799, "step": 95},
+ {"epoch": 1.7066666666666666, "grad_norm": 0.9699886441230774, "learning_rate": 0.00024052863436123346, "loss": 2.3437, "step": 100},
+ {"epoch": 1.7066666666666666, "eval_loss": 2.392068386077881, "eval_runtime": 50.7099, "eval_samples_per_second": 9.86, "eval_steps_per_second": 1.242, "step": 100},
+ {"epoch": 1.792, "grad_norm": 1.0068645477294922, "learning_rate": 0.00023722466960352423, "loss": 2.3396, "step": 105},
+ {"epoch": 1.8773333333333333, "grad_norm": 0.9659402966499329, "learning_rate": 0.00023392070484581494, "loss": 2.3443, "step": 110},
+ {"epoch": 1.9626666666666668, "grad_norm": 0.9416194558143616, "learning_rate": 0.0002306167400881057, "loss": 2.3458, "step": 115},
+ {"epoch": 2.048, "grad_norm": 0.9857434630393982, "learning_rate": 0.00022731277533039645, "loss": 2.2908, "step": 120},
+ {"epoch": 2.1333333333333333, "grad_norm": 0.9868885278701782, "learning_rate": 0.00022400881057268722, "loss": 2.2919, "step": 125},
+ {"epoch": 2.1333333333333333, "eval_loss": 2.3825137615203857, "eval_runtime": 50.7136, "eval_samples_per_second": 9.859, "eval_steps_per_second": 1.242, "step": 125},
+ {"epoch": 2.2186666666666666, "grad_norm": 0.9239076972007751, "learning_rate": 0.00022070484581497796, "loss": 2.3055, "step": 130},
+ {"epoch": 2.304, "grad_norm": 0.9522895216941833, "learning_rate": 0.0002174008810572687, "loss": 2.319, "step": 135},
+ {"epoch": 2.389333333333333, "grad_norm": 0.989910900592804, "learning_rate": 0.00021409691629955944, "loss": 2.2679, "step": 140},
+ {"epoch": 2.474666666666667, "grad_norm": 1.0279978513717651, "learning_rate": 0.0002107929515418502, "loss": 2.309, "step": 145},
+ {"epoch": 2.56, "grad_norm": 0.9677265286445618, "learning_rate": 0.00020748898678414097, "loss": 2.2834, "step": 150},
+ {"epoch": 2.56, "eval_loss": 2.3774726390838623, "eval_runtime": 50.6978, "eval_samples_per_second": 9.862, "eval_steps_per_second": 1.243, "step": 150},
+ {"epoch": 2.6453333333333333, "grad_norm": 0.9602519869804382, "learning_rate": 0.0002041850220264317, "loss": 2.3044, "step": 155},
+ {"epoch": 2.7306666666666666, "grad_norm": 0.9305415153503418, "learning_rate": 0.00020088105726872246, "loss": 2.2996, "step": 160},
+ {"epoch": 2.816, "grad_norm": 0.9666855931282043, "learning_rate": 0.0001975770925110132, "loss": 2.2807, "step": 165},
+ {"epoch": 2.9013333333333335, "grad_norm": 1.0196256637573242, "learning_rate": 0.00019427312775330396, "loss": 2.2948, "step": 170},
+ {"epoch": 2.986666666666667, "grad_norm": 0.9804545044898987, "learning_rate": 0.00019096916299559468, "loss": 2.3319, "step": 175},
+ {"epoch": 2.986666666666667, "eval_loss": 2.3707964420318604, "eval_runtime": 50.7119, "eval_samples_per_second": 9.86, "eval_steps_per_second": 1.242, "step": 175},
+ {"epoch": 3.072, "grad_norm": 0.9667485356330872, "learning_rate": 0.00018766519823788544, "loss": 2.2698, "step": 180},
+ {"epoch": 3.1573333333333333, "grad_norm": 0.9869656562805176, "learning_rate": 0.00018436123348017618, "loss": 2.2443, "step": 185},
+ {"epoch": 3.2426666666666666, "grad_norm": 0.9679750204086304, "learning_rate": 0.00018105726872246695, "loss": 2.2636, "step": 190},
+ {"epoch": 3.328, "grad_norm": 0.9996704459190369, "learning_rate": 0.00017775330396475772, "loss": 2.2536, "step": 195},
+ {"epoch": 3.413333333333333, "grad_norm": 0.9564487338066101, "learning_rate": 0.00017444933920704843, "loss": 2.2404, "step": 200},
+ {"epoch": 3.413333333333333, "eval_loss": 2.3742570877075195, "eval_runtime": 50.7116, "eval_samples_per_second": 9.86, "eval_steps_per_second": 1.242, "step": 200},
+ {"epoch": 3.498666666666667, "grad_norm": 1.0057621002197266, "learning_rate": 0.0001711453744493392, "loss": 2.2888, "step": 205},
+ {"epoch": 3.584, "grad_norm": 1.0336802005767822, "learning_rate": 0.00016784140969162994, "loss": 2.2946, "step": 210},
+ {"epoch": 3.6693333333333333, "grad_norm": 1.0010818243026733, "learning_rate": 0.0001645374449339207, "loss": 2.2531, "step": 215},
+ {"epoch": 3.7546666666666666, "grad_norm": 0.9891405701637268, "learning_rate": 0.00016123348017621142, "loss": 2.2336, "step": 220},
+ {"epoch": 3.84, "grad_norm": 0.9514368176460266, "learning_rate": 0.0001579295154185022, "loss": 2.238, "step": 225},
+ {"epoch": 3.84, "eval_loss": 2.3685686588287354, "eval_runtime": 50.7197, "eval_samples_per_second": 9.858, "eval_steps_per_second": 1.242, "step": 225},
+ {"epoch": 3.9253333333333336, "grad_norm": 1.0048863887786865, "learning_rate": 0.00015462555066079293, "loss": 2.2638, "step": 230},
+ {"epoch": 4.010666666666666, "grad_norm": 0.9555865526199341, "learning_rate": 0.0001513215859030837, "loss": 2.2523, "step": 235},
+ {"epoch": 4.096, "grad_norm": 1.01077401638031, "learning_rate": 0.00014801762114537444, "loss": 2.2247, "step": 240},
+ {"epoch": 4.181333333333333, "grad_norm": 0.9413540959358215, "learning_rate": 0.00014471365638766518, "loss": 2.216, "step": 245},
+ {"epoch": 4.266666666666667, "grad_norm": 1.0012569427490234, "learning_rate": 0.00014140969162995594, "loss": 2.223, "step": 250},
+ {"epoch": 4.266666666666667, "eval_loss": 2.373790740966797, "eval_runtime": 50.706, "eval_samples_per_second": 9.861, "eval_steps_per_second": 1.242, "step": 250},
+ {"epoch": 4.352, "grad_norm": 0.9957796335220337, "learning_rate": 0.00013810572687224668, "loss": 2.2367, "step": 255},
+ {"epoch": 4.437333333333333, "grad_norm": 1.013082504272461, "learning_rate": 0.00013480176211453743, "loss": 2.2146, "step": 260},
+ {"epoch": 4.522666666666667, "grad_norm": 1.0343190431594849, "learning_rate": 0.0001314977973568282, "loss": 2.2362, "step": 265},
+ {"epoch": 4.608, "grad_norm": 1.0079319477081299, "learning_rate": 0.00012819383259911893, "loss": 2.2182, "step": 270},
+ {"epoch": 4.693333333333333, "grad_norm": 1.0466967821121216, "learning_rate": 0.00012488986784140967, "loss": 2.2197, "step": 275},
+ {"epoch": 4.693333333333333, "eval_loss": 2.3711977005004883, "eval_runtime": 50.7236, "eval_samples_per_second": 9.857, "eval_steps_per_second": 1.242, "step": 275},
+ {"epoch": 4.778666666666666, "grad_norm": 1.0417555570602417, "learning_rate": 0.00012158590308370043, "loss": 2.2284, "step": 280},
+ {"epoch": 4.864, "grad_norm": 1.001129150390625, "learning_rate": 0.00011828193832599118, "loss": 2.2411, "step": 285},
+ {"epoch": 4.949333333333334, "grad_norm": 1.0128998756408691, "learning_rate": 0.00011497797356828192, "loss": 2.2368, "step": 290},
+ {"epoch": 5.034666666666666, "grad_norm": 0.9789999127388, "learning_rate": 0.00011167400881057268, "loss": 2.2259, "step": 295},
+ {"epoch": 5.12, "grad_norm": 1.0087758302688599, "learning_rate": 0.00010837004405286342, "loss": 2.1643, "step": 300},
+ {"epoch": 5.12, "eval_loss": 2.370661735534668, "eval_runtime": 50.7317, "eval_samples_per_second": 9.856, "eval_steps_per_second": 1.242, "step": 300},
+ {"epoch": 5.205333333333333, "grad_norm": 1.0349854230880737, "learning_rate": 0.00010506607929515418, "loss": 2.1957, "step": 305},
+ {"epoch": 5.290666666666667, "grad_norm": 1.0541808605194092, "learning_rate": 0.00010176211453744494, "loss": 2.1873, "step": 310},
+ {"epoch": 5.376, "grad_norm": 1.0202800035476685, "learning_rate": 9.845814977973568e-05, "loss": 2.2108, "step": 315},
+ {"epoch": 5.461333333333333, "grad_norm": 1.036137342453003, "learning_rate": 9.515418502202643e-05, "loss": 2.1934, "step": 320},
+ {"epoch": 5.546666666666667, "grad_norm": 1.012592077255249, "learning_rate": 9.185022026431717e-05, "loss": 2.2055, "step": 325},
+ {"epoch": 5.546666666666667, "eval_loss": 2.372723340988159, "eval_runtime": 50.7062, "eval_samples_per_second": 9.861, "eval_steps_per_second": 1.242, "step": 325},
+ {"epoch": 5.632, "grad_norm": 1.0501244068145752, "learning_rate": 8.854625550660793e-05, "loss": 2.2097, "step": 330},
+ {"epoch": 5.717333333333333, "grad_norm": 1.0283957719802856, "learning_rate": 8.524229074889867e-05, "loss": 2.1996, "step": 335},
+ {"epoch": 5.802666666666667, "grad_norm": 1.001703143119812, "learning_rate": 8.193832599118942e-05, "loss": 2.2157, "step": 340},
+ {"epoch": 5.888, "grad_norm": 1.0345960855484009, "learning_rate": 7.863436123348016e-05, "loss": 2.216, "step": 345},
+ {"epoch": 5.973333333333334, "grad_norm": 1.0450265407562256, "learning_rate": 7.533039647577093e-05, "loss": 2.2141, "step": 350},
+ {"epoch": 5.973333333333334, "eval_loss": 2.3703668117523193, "eval_runtime": 50.7172, "eval_samples_per_second": 9.859, "eval_steps_per_second": 1.242, "step": 350},
+ {"epoch": 6.058666666666666, "grad_norm": 0.986889660358429, "learning_rate": 7.202643171806167e-05, "loss": 2.192, "step": 355},
+ {"epoch": 6.144, "grad_norm": 1.0109078884124756, "learning_rate": 6.872246696035242e-05, "loss": 2.1836, "step": 360},
+ {"epoch": 6.229333333333333, "grad_norm": 1.0342578887939453, "learning_rate": 6.541850220264316e-05, "loss": 2.1894, "step": 365},
+ {"epoch": 6.314666666666667, "grad_norm": 1.0402517318725586, "learning_rate": 6.211453744493392e-05, "loss": 2.1727, "step": 370},
+ {"epoch": 6.4, "grad_norm": 1.0148675441741943, "learning_rate": 5.881057268722466e-05, "loss": 2.1697, "step": 375},
+ {"epoch": 6.4, "eval_loss": 2.3737528324127197, "eval_runtime": 50.7165, "eval_samples_per_second": 9.859, "eval_steps_per_second": 1.242, "step": 375},
+ {"epoch": 6.485333333333333, "grad_norm": 1.0261683464050293, "learning_rate": 5.550660792951541e-05, "loss": 2.1756, "step": 380},
+ {"epoch": 6.570666666666667, "grad_norm": 1.0536677837371826, "learning_rate": 5.220264317180616e-05, "loss": 2.1984, "step": 385},
+ {"epoch": 6.656, "grad_norm": 1.0320463180541992, "learning_rate": 4.889867841409691e-05, "loss": 2.1758, "step": 390},
+ {"epoch": 6.741333333333333, "grad_norm": 1.0172383785247803, "learning_rate": 4.559471365638766e-05, "loss": 2.1988, "step": 395},
+ {"epoch": 6.826666666666666, "grad_norm": 1.0310728549957275, "learning_rate": 4.229074889867841e-05, "loss": 2.1908, "step": 400},
+ {"epoch": 6.826666666666666, "eval_loss": 2.3720057010650635, "eval_runtime": 50.708, "eval_samples_per_second": 9.86, "eval_steps_per_second": 1.242, "step": 400},
+ {"epoch": 6.912, "grad_norm": 1.0129923820495605, "learning_rate": 3.898678414096916e-05, "loss": 2.1975, "step": 405},
+ {"epoch": 6.997333333333334, "grad_norm": 0.9970852732658386, "learning_rate": 3.568281938325991e-05, "loss": 2.1562, "step": 410},
+ {"epoch": 7.082666666666666, "grad_norm": 1.0077489614486694, "learning_rate": 3.237885462555066e-05, "loss": 2.1576, "step": 415},
+ {"epoch": 7.168, "grad_norm": 1.037332534790039, "learning_rate": 2.9074889867841408e-05, "loss": 2.1547, "step": 420},
+ {"epoch": 7.253333333333333, "grad_norm": 1.0144597291946411, "learning_rate": 2.5770925110132158e-05, "loss": 2.1841, "step": 425},
+ {"epoch": 7.253333333333333, "eval_loss": 2.3749217987060547, "eval_runtime": 50.7304, "eval_samples_per_second": 9.856, "eval_steps_per_second": 1.242, "step": 425},
+ {"epoch": 7.338666666666667, "grad_norm": 1.0164906978607178, "learning_rate": 2.2466960352422905e-05, "loss": 2.1584, "step": 430},
+ {"epoch": 7.424, "grad_norm": 1.0603595972061157, "learning_rate": 1.9162995594713652e-05, "loss": 2.1727, "step": 435},
+ {"epoch": 7.509333333333333, "grad_norm": 1.0436800718307495, "learning_rate": 1.5859030837004403e-05, "loss": 2.1668, "step": 440},
+ {"epoch": 7.594666666666667, "grad_norm": 1.0156831741333008, "learning_rate": 1.2555066079295153e-05, "loss": 2.1508, "step": 445},
+ {"epoch": 7.68, "grad_norm": 1.0232970714569092,
773
+ "learning_rate": 9.251101321585902e-06,
774
+ "loss": 2.1795,
775
+ "step": 450
776
+ },
777
+ {
778
+ "epoch": 7.68,
779
+ "eval_loss": 2.374783754348755,
780
+ "eval_runtime": 50.7314,
781
+ "eval_samples_per_second": 9.856,
782
+ "eval_steps_per_second": 1.242,
783
+ "step": 450
784
+ }
785
+ ],
786
+ "logging_steps": 5,
787
+ "max_steps": 464,
788
+ "num_input_tokens_seen": 0,
789
+ "num_train_epochs": 8,
790
+ "save_steps": 25,
791
+ "stateful_callbacks": {
792
+ "TrainerControl": {
793
+ "args": {
794
+ "should_epoch_stop": false,
795
+ "should_evaluate": false,
796
+ "should_log": false,
797
+ "should_save": true,
798
+ "should_training_stop": false
799
+ },
800
+ "attributes": {}
801
+ }
802
+ },
803
+ "total_flos": 6.642892101648384e+17,
804
+ "train_batch_size": 4,
805
+ "trial_name": null,
806
+ "trial_params": null
807
+ }
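Since trainer_state.json is plain JSON, the log above can be inspected directly. A minimal sketch, assuming a local clone of this repo with the checkpoint directories present (the keys are the standard transformers Trainer schema):

```python
import json

# Read a checkpoint's training log (path assumed; adjust to your local clone).
with open("checkpoint-450/trainer_state.json") as f:
    state = json.load(f)

# One (step, eval_loss) pair per evaluation, every 25 steps in this run.
evals = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]
print(evals)
```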
checkpoint-450/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e5e28485a7b3a1a3db706bc20a8e6c9dd73d2112d08db67065b84e6004e139f
+ size 5432
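training_args.bin is the torch-pickled TrainingArguments that transformers.Trainer writes next to every checkpoint; note that the binaries in this commit are git-lfs pointers, so `git lfs pull` is needed before the real payloads are on disk. A sketch, assuming a transformers version compatible with the one that wrote the file:

```python
import torch

# Unpickle the saved TrainingArguments (needs transformers installed;
# weights_only=False because this is an arbitrary pickled object, not tensors).
args = torch.load("checkpoint-450/training_args.bin", weights_only=False)
print(args.learning_rate, args.per_device_train_batch_size, args.num_train_epochs)
```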
checkpoint-464/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: /mnt/data/MODEL/deepseek/DeepSeek-R1-Distill-Llama-8B
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.15.1
checkpoint-464/adapter_config.json ADDED
@@ -0,0 +1,34 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "/mnt/data/MODEL/deepseek/DeepSeek-R1-Distill-Llama-8B",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": false,
+ "inference_mode": true,
+ "init_lora_weights": "gaussian",
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "q_proj",
+ "v_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_rslora": false
+ }
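This config is everything `peft` needs to rebuild the adapter: rank-8, alpha-16 LoRA with 0.05 dropout on the q_proj/v_proj attention matrices, Gaussian-initialized. A minimal loading sketch; substituting the public Hub id for the local base-model path baked into the file is an assumption, and any checkpoint-* directory (or the repo root adapter) works as the adapter path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # stands in for the local path above
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")

model = PeftModel.from_pretrained(base, "checkpoint-464")
model = model.merge_and_unload()  # optional: fold the LoRA deltas into the base weights
```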
checkpoint-464/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8d0c337bad7b18d197d80fbf1a6dcb3b202189725cd081ffcb0970e762c9d2e5
+ size 13648432
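The ~13.6 MB pointer size is consistent with the adapter config above. A back-of-the-envelope check, assuming the standard Llama-3.1-8B shapes this distill shares (hidden 4096, GQA value output 1024, 32 layers) and fp32 storage, with the small remainder plausibly the safetensors header:

```python
# Rough size check for adapter_model.safetensors (shapes are assumptions).
layers, hidden, kv_out, r = 32, 4096, 1024, 8
q_lora = r * hidden + hidden * r     # lora_A + lora_B for q_proj (4096 -> 4096)
v_lora = r * hidden + kv_out * r     # lora_A + lora_B for v_proj (4096 -> 1024)
params = layers * (q_lora + v_lora)  # 3,407,872 trainable parameters
print(params * 4)                    # 13,631,488 bytes in fp32, vs. 13,648,432 above
```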
checkpoint-464/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1c167018c49927ba134909086d95413d83bfbfdf1b4fba0d26185f6f0d0bde5c
+ size 27370618
checkpoint-464/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2f2bc1cd2accbdba94b1c8c08aebb6396bb6c6c46cd35f9509b86f072023d276
+ size 14244
checkpoint-464/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c3e8f2e6bad23ce83fe0bfec5f39e0fa66d0db25a52744ee984f49ccd6ef488
+ size 1064
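optimizer.pt, rng_state.pth and scheduler.pt are what make a checkpoint resumable rather than merely loadable; the 27,370,618-byte optimizer file roughly matches two fp32 AdamW moments per trainable parameter (3,407,872 x 8 B, plus pickle overhead), a plausible reading rather than anything stated in the commit. A one-line sketch, assuming the original training script, data and output_dir are available:

```python
# Resume exactly where the last checkpoint left off: Trainer restores the
# optimizer moments, LR-schedule position and RNG state from these files.
trainer.train(resume_from_checkpoint=True)  # or an explicit "checkpoint-450" path
```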
checkpoint-464/trainer_state.json ADDED
@@ -0,0 +1,821 @@
+ {
+ "best_metric": 2.3685686588287354,
+ "best_model_checkpoint": "/mnt/data/computer_design/lora_checkpoints/DeepSeek-R1-Distill-Llama-8B__news-summarizer-noreason__ral_8_16_0.0003_8/checkpoint-225",
+ "epoch": 7.918933333333333,
+ "eval_steps": 25,
+ "global_step": 464,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.08533333333333333,
+ "grad_norm": 5.31721305847168,
+ "learning_rate": 0.00015,
+ "loss": 3.5245,
+ "step": 5
+ },
+ {
+ "epoch": 0.17066666666666666,
+ "grad_norm": 2.912184476852417,
+ "learning_rate": 0.0003,
+ "loss": 2.9075,
+ "step": 10
+ },
+ {
+ "epoch": 0.256,
+ "grad_norm": 1.5293394327163696,
+ "learning_rate": 0.00029669603524229074,
+ "loss": 2.6626,
+ "step": 15
+ },
+ {
+ "epoch": 0.3413333333333333,
+ "grad_norm": 1.3206167221069336,
+ "learning_rate": 0.0002933920704845815,
+ "loss": 2.5962,
+ "step": 20
+ },
+ {
+ "epoch": 0.4266666666666667,
+ "grad_norm": 1.2503535747528076,
+ "learning_rate": 0.0002900881057268722,
+ "loss": 2.5323,
+ "step": 25
+ },
+ {
+ "epoch": 0.4266666666666667,
+ "eval_loss": 2.5280094146728516,
+ "eval_runtime": 50.7358,
+ "eval_samples_per_second": 9.855,
+ "eval_steps_per_second": 1.242,
+ "step": 25
+ },
+ {
+ "epoch": 0.512,
+ "grad_norm": 1.1334757804870605,
+ "learning_rate": 0.000286784140969163,
+ "loss": 2.5021,
+ "step": 30
+ },
+ {
+ "epoch": 0.5973333333333334,
+ "grad_norm": 1.1564419269561768,
+ "learning_rate": 0.0002834801762114537,
+ "loss": 2.4867,
+ "step": 35
+ },
+ {
+ "epoch": 0.6826666666666666,
+ "grad_norm": 1.0658727884292603,
+ "learning_rate": 0.00028017621145374447,
+ "loss": 2.4791,
+ "step": 40
+ },
+ {
+ "epoch": 0.768,
+ "grad_norm": 1.071118950843811,
+ "learning_rate": 0.0002768722466960352,
+ "loss": 2.4294,
+ "step": 45
+ },
+ {
+ "epoch": 0.8533333333333334,
+ "grad_norm": 1.1074410676956177,
+ "learning_rate": 0.00027356828193832595,
+ "loss": 2.4526,
+ "step": 50
+ },
+ {
+ "epoch": 0.8533333333333334,
+ "eval_loss": 2.43389892578125,
+ "eval_runtime": 50.8571,
+ "eval_samples_per_second": 9.831,
+ "eval_steps_per_second": 1.239,
+ "step": 50
+ },
+ {
+ "epoch": 0.9386666666666666,
+ "grad_norm": 1.0138615369796753,
+ "learning_rate": 0.0002702643171806167,
+ "loss": 2.4296,
+ "step": 55
+ },
+ {
+ "epoch": 1.024,
+ "grad_norm": 0.9914734959602356,
+ "learning_rate": 0.0002669603524229075,
+ "loss": 2.3865,
+ "step": 60
+ },
+ {
+ "epoch": 1.1093333333333333,
+ "grad_norm": 0.9485092759132385,
+ "learning_rate": 0.0002636563876651982,
+ "loss": 2.3592,
+ "step": 65
+ },
+ {
+ "epoch": 1.1946666666666665,
+ "grad_norm": 1.032936692237854,
+ "learning_rate": 0.00026035242290748897,
+ "loss": 2.3762,
+ "step": 70
+ },
+ {
+ "epoch": 1.28,
+ "grad_norm": 0.978344738483429,
+ "learning_rate": 0.00025704845814977973,
+ "loss": 2.3715,
+ "step": 75
+ },
+ {
+ "epoch": 1.28,
+ "eval_loss": 2.4061355590820312,
+ "eval_runtime": 53.5326,
+ "eval_samples_per_second": 9.34,
+ "eval_steps_per_second": 1.177,
+ "step": 75
+ },
+ {
+ "epoch": 1.3653333333333333,
+ "grad_norm": 1.0429058074951172,
+ "learning_rate": 0.00025374449339207045,
+ "loss": 2.3519,
+ "step": 80
+ },
+ {
+ "epoch": 1.4506666666666668,
+ "grad_norm": 1.0109028816223145,
+ "learning_rate": 0.0002504405286343612,
+ "loss": 2.3698,
+ "step": 85
+ },
+ {
+ "epoch": 1.536,
+ "grad_norm": 1.0443379878997803,
+ "learning_rate": 0.00024713656387665193,
+ "loss": 2.3652,
+ "step": 90
+ },
+ {
+ "epoch": 1.6213333333333333,
+ "grad_norm": 0.9977161884307861,
+ "learning_rate": 0.0002438325991189427,
+ "loss": 2.3799,
+ "step": 95
+ },
+ {
+ "epoch": 1.7066666666666666,
+ "grad_norm": 0.9699886441230774,
+ "learning_rate": 0.00024052863436123346,
+ "loss": 2.3437,
+ "step": 100
+ },
+ {
+ "epoch": 1.7066666666666666,
+ "eval_loss": 2.392068386077881,
+ "eval_runtime": 50.7099,
+ "eval_samples_per_second": 9.86,
+ "eval_steps_per_second": 1.242,
+ "step": 100
+ },
+ {
+ "epoch": 1.792,
+ "grad_norm": 1.0068645477294922,
+ "learning_rate": 0.00023722466960352423,
+ "loss": 2.3396,
+ "step": 105
+ },
+ {
+ "epoch": 1.8773333333333333,
+ "grad_norm": 0.9659402966499329,
+ "learning_rate": 0.00023392070484581494,
+ "loss": 2.3443,
+ "step": 110
+ },
+ {
+ "epoch": 1.9626666666666668,
+ "grad_norm": 0.9416194558143616,
+ "learning_rate": 0.0002306167400881057,
+ "loss": 2.3458,
+ "step": 115
+ },
+ {
+ "epoch": 2.048,
+ "grad_norm": 0.9857434630393982,
+ "learning_rate": 0.00022731277533039645,
+ "loss": 2.2908,
+ "step": 120
+ },
+ {
+ "epoch": 2.1333333333333333,
+ "grad_norm": 0.9868885278701782,
+ "learning_rate": 0.00022400881057268722,
+ "loss": 2.2919,
+ "step": 125
+ },
+ {
+ "epoch": 2.1333333333333333,
+ "eval_loss": 2.3825137615203857,
+ "eval_runtime": 50.7136,
+ "eval_samples_per_second": 9.859,
+ "eval_steps_per_second": 1.242,
+ "step": 125
+ },
+ {
+ "epoch": 2.2186666666666666,
+ "grad_norm": 0.9239076972007751,
+ "learning_rate": 0.00022070484581497796,
+ "loss": 2.3055,
+ "step": 130
+ },
+ {
+ "epoch": 2.304,
+ "grad_norm": 0.9522895216941833,
+ "learning_rate": 0.0002174008810572687,
+ "loss": 2.319,
+ "step": 135
+ },
+ {
+ "epoch": 2.389333333333333,
+ "grad_norm": 0.989910900592804,
+ "learning_rate": 0.00021409691629955944,
+ "loss": 2.2679,
+ "step": 140
+ },
+ {
+ "epoch": 2.474666666666667,
+ "grad_norm": 1.0279978513717651,
+ "learning_rate": 0.0002107929515418502,
+ "loss": 2.309,
+ "step": 145
+ },
+ {
+ "epoch": 2.56,
+ "grad_norm": 0.9677265286445618,
+ "learning_rate": 0.00020748898678414097,
+ "loss": 2.2834,
+ "step": 150
+ },
+ {
+ "epoch": 2.56,
+ "eval_loss": 2.3774726390838623,
+ "eval_runtime": 50.6978,
+ "eval_samples_per_second": 9.862,
+ "eval_steps_per_second": 1.243,
+ "step": 150
+ },
+ {
+ "epoch": 2.6453333333333333,
+ "grad_norm": 0.9602519869804382,
+ "learning_rate": 0.0002041850220264317,
+ "loss": 2.3044,
+ "step": 155
+ },
+ {
+ "epoch": 2.7306666666666666,
+ "grad_norm": 0.9305415153503418,
+ "learning_rate": 0.00020088105726872246,
+ "loss": 2.2996,
+ "step": 160
+ },
+ {
+ "epoch": 2.816,
+ "grad_norm": 0.9666855931282043,
+ "learning_rate": 0.0001975770925110132,
+ "loss": 2.2807,
+ "step": 165
+ },
+ {
+ "epoch": 2.9013333333333335,
+ "grad_norm": 1.0196256637573242,
+ "learning_rate": 0.00019427312775330396,
+ "loss": 2.2948,
+ "step": 170
+ },
+ {
+ "epoch": 2.986666666666667,
+ "grad_norm": 0.9804545044898987,
+ "learning_rate": 0.00019096916299559468,
+ "loss": 2.3319,
+ "step": 175
+ },
+ {
+ "epoch": 2.986666666666667,
+ "eval_loss": 2.3707964420318604,
+ "eval_runtime": 50.7119,
+ "eval_samples_per_second": 9.86,
+ "eval_steps_per_second": 1.242,
+ "step": 175
+ },
+ {
+ "epoch": 3.072,
+ "grad_norm": 0.9667485356330872,
+ "learning_rate": 0.00018766519823788544,
+ "loss": 2.2698,
+ "step": 180
+ },
+ {
+ "epoch": 3.1573333333333333,
+ "grad_norm": 0.9869656562805176,
+ "learning_rate": 0.00018436123348017618,
+ "loss": 2.2443,
+ "step": 185
+ },
+ {
+ "epoch": 3.2426666666666666,
+ "grad_norm": 0.9679750204086304,
+ "learning_rate": 0.00018105726872246695,
+ "loss": 2.2636,
+ "step": 190
+ },
+ {
+ "epoch": 3.328,
+ "grad_norm": 0.9996704459190369,
+ "learning_rate": 0.00017775330396475772,
+ "loss": 2.2536,
+ "step": 195
+ },
+ {
+ "epoch": 3.413333333333333,
+ "grad_norm": 0.9564487338066101,
+ "learning_rate": 0.00017444933920704843,
+ "loss": 2.2404,
+ "step": 200
+ },
+ {
+ "epoch": 3.413333333333333,
+ "eval_loss": 2.3742570877075195,
+ "eval_runtime": 50.7116,
+ "eval_samples_per_second": 9.86,
+ "eval_steps_per_second": 1.242,
+ "step": 200
+ },
+ {
+ "epoch": 3.498666666666667,
+ "grad_norm": 1.0057621002197266,
+ "learning_rate": 0.0001711453744493392,
+ "loss": 2.2888,
+ "step": 205
+ },
+ {
+ "epoch": 3.584,
+ "grad_norm": 1.0336802005767822,
+ "learning_rate": 0.00016784140969162994,
+ "loss": 2.2946,
+ "step": 210
+ },
+ {
+ "epoch": 3.6693333333333333,
+ "grad_norm": 1.0010818243026733,
+ "learning_rate": 0.0001645374449339207,
+ "loss": 2.2531,
+ "step": 215
+ },
+ {
+ "epoch": 3.7546666666666666,
+ "grad_norm": 0.9891405701637268,
+ "learning_rate": 0.00016123348017621142,
+ "loss": 2.2336,
+ "step": 220
+ },
+ {
+ "epoch": 3.84,
+ "grad_norm": 0.9514368176460266,
+ "learning_rate": 0.0001579295154185022,
+ "loss": 2.238,
+ "step": 225
+ },
+ {
+ "epoch": 3.84,
+ "eval_loss": 2.3685686588287354,
+ "eval_runtime": 50.7197,
+ "eval_samples_per_second": 9.858,
+ "eval_steps_per_second": 1.242,
+ "step": 225
+ },
+ {
+ "epoch": 3.9253333333333336,
+ "grad_norm": 1.0048863887786865,
+ "learning_rate": 0.00015462555066079293,
+ "loss": 2.2638,
+ "step": 230
+ },
+ {
+ "epoch": 4.010666666666666,
+ "grad_norm": 0.9555865526199341,
+ "learning_rate": 0.0001513215859030837,
+ "loss": 2.2523,
+ "step": 235
+ },
+ {
+ "epoch": 4.096,
+ "grad_norm": 1.01077401638031,
+ "learning_rate": 0.00014801762114537444,
+ "loss": 2.2247,
+ "step": 240
+ },
+ {
+ "epoch": 4.181333333333333,
+ "grad_norm": 0.9413540959358215,
+ "learning_rate": 0.00014471365638766518,
+ "loss": 2.216,
+ "step": 245
+ },
+ {
+ "epoch": 4.266666666666667,
+ "grad_norm": 1.0012569427490234,
+ "learning_rate": 0.00014140969162995594,
+ "loss": 2.223,
+ "step": 250
+ },
+ {
+ "epoch": 4.266666666666667,
+ "eval_loss": 2.373790740966797,
+ "eval_runtime": 50.706,
+ "eval_samples_per_second": 9.861,
+ "eval_steps_per_second": 1.242,
+ "step": 250
+ },
+ {
+ "epoch": 4.352,
+ "grad_norm": 0.9957796335220337,
+ "learning_rate": 0.00013810572687224668,
+ "loss": 2.2367,
+ "step": 255
+ },
+ {
+ "epoch": 4.437333333333333,
+ "grad_norm": 1.013082504272461,
+ "learning_rate": 0.00013480176211453743,
+ "loss": 2.2146,
+ "step": 260
+ },
+ {
+ "epoch": 4.522666666666667,
+ "grad_norm": 1.0343190431594849,
+ "learning_rate": 0.0001314977973568282,
+ "loss": 2.2362,
+ "step": 265
+ },
+ {
+ "epoch": 4.608,
+ "grad_norm": 1.0079319477081299,
+ "learning_rate": 0.00012819383259911893,
+ "loss": 2.2182,
+ "step": 270
+ },
+ {
+ "epoch": 4.693333333333333,
+ "grad_norm": 1.0466967821121216,
+ "learning_rate": 0.00012488986784140967,
+ "loss": 2.2197,
+ "step": 275
+ },
+ {
+ "epoch": 4.693333333333333,
+ "eval_loss": 2.3711977005004883,
+ "eval_runtime": 50.7236,
+ "eval_samples_per_second": 9.857,
+ "eval_steps_per_second": 1.242,
+ "step": 275
+ },
+ {
+ "epoch": 4.778666666666666,
+ "grad_norm": 1.0417555570602417,
+ "learning_rate": 0.00012158590308370043,
+ "loss": 2.2284,
+ "step": 280
+ },
+ {
+ "epoch": 4.864,
+ "grad_norm": 1.001129150390625,
+ "learning_rate": 0.00011828193832599118,
+ "loss": 2.2411,
+ "step": 285
+ },
+ {
+ "epoch": 4.949333333333334,
+ "grad_norm": 1.0128998756408691,
+ "learning_rate": 0.00011497797356828192,
+ "loss": 2.2368,
+ "step": 290
+ },
+ {
+ "epoch": 5.034666666666666,
+ "grad_norm": 0.9789999127388,
+ "learning_rate": 0.00011167400881057268,
+ "loss": 2.2259,
+ "step": 295
+ },
+ {
+ "epoch": 5.12,
+ "grad_norm": 1.0087758302688599,
+ "learning_rate": 0.00010837004405286342,
+ "loss": 2.1643,
+ "step": 300
+ },
+ {
+ "epoch": 5.12,
+ "eval_loss": 2.370661735534668,
+ "eval_runtime": 50.7317,
+ "eval_samples_per_second": 9.856,
+ "eval_steps_per_second": 1.242,
+ "step": 300
+ },
+ {
+ "epoch": 5.205333333333333,
+ "grad_norm": 1.0349854230880737,
+ "learning_rate": 0.00010506607929515418,
+ "loss": 2.1957,
+ "step": 305
+ },
+ {
+ "epoch": 5.290666666666667,
+ "grad_norm": 1.0541808605194092,
+ "learning_rate": 0.00010176211453744494,
+ "loss": 2.1873,
+ "step": 310
+ },
+ {
+ "epoch": 5.376,
+ "grad_norm": 1.0202800035476685,
+ "learning_rate": 9.845814977973568e-05,
+ "loss": 2.2108,
+ "step": 315
+ },
+ {
+ "epoch": 5.461333333333333,
+ "grad_norm": 1.036137342453003,
+ "learning_rate": 9.515418502202643e-05,
+ "loss": 2.1934,
+ "step": 320
+ },
+ {
+ "epoch": 5.546666666666667,
+ "grad_norm": 1.012592077255249,
+ "learning_rate": 9.185022026431717e-05,
+ "loss": 2.2055,
+ "step": 325
+ },
+ {
+ "epoch": 5.546666666666667,
+ "eval_loss": 2.372723340988159,
+ "eval_runtime": 50.7062,
+ "eval_samples_per_second": 9.861,
+ "eval_steps_per_second": 1.242,
+ "step": 325
+ },
+ {
+ "epoch": 5.632,
+ "grad_norm": 1.0501244068145752,
+ "learning_rate": 8.854625550660793e-05,
+ "loss": 2.2097,
+ "step": 330
+ },
+ {
+ "epoch": 5.717333333333333,
+ "grad_norm": 1.0283957719802856,
+ "learning_rate": 8.524229074889867e-05,
+ "loss": 2.1996,
+ "step": 335
+ },
+ {
+ "epoch": 5.802666666666667,
+ "grad_norm": 1.001703143119812,
+ "learning_rate": 8.193832599118942e-05,
+ "loss": 2.2157,
+ "step": 340
+ },
+ {
+ "epoch": 5.888,
+ "grad_norm": 1.0345960855484009,
+ "learning_rate": 7.863436123348016e-05,
+ "loss": 2.216,
+ "step": 345
+ },
+ {
+ "epoch": 5.973333333333334,
+ "grad_norm": 1.0450265407562256,
+ "learning_rate": 7.533039647577093e-05,
+ "loss": 2.2141,
+ "step": 350
+ },
+ {
+ "epoch": 5.973333333333334,
+ "eval_loss": 2.3703668117523193,
+ "eval_runtime": 50.7172,
+ "eval_samples_per_second": 9.859,
+ "eval_steps_per_second": 1.242,
+ "step": 350
+ },
+ {
+ "epoch": 6.058666666666666,
+ "grad_norm": 0.986889660358429,
+ "learning_rate": 7.202643171806167e-05,
+ "loss": 2.192,
+ "step": 355
+ },
+ {
+ "epoch": 6.144,
+ "grad_norm": 1.0109078884124756,
+ "learning_rate": 6.872246696035242e-05,
+ "loss": 2.1836,
+ "step": 360
+ },
+ {
+ "epoch": 6.229333333333333,
+ "grad_norm": 1.0342578887939453,
+ "learning_rate": 6.541850220264316e-05,
+ "loss": 2.1894,
+ "step": 365
+ },
+ {
+ "epoch": 6.314666666666667,
+ "grad_norm": 1.0402517318725586,
+ "learning_rate": 6.211453744493392e-05,
+ "loss": 2.1727,
+ "step": 370
+ },
+ {
+ "epoch": 6.4,
+ "grad_norm": 1.0148675441741943,
+ "learning_rate": 5.881057268722466e-05,
+ "loss": 2.1697,
+ "step": 375
+ },
+ {
+ "epoch": 6.4,
+ "eval_loss": 2.3737528324127197,
+ "eval_runtime": 50.7165,
+ "eval_samples_per_second": 9.859,
+ "eval_steps_per_second": 1.242,
+ "step": 375
+ },
+ {
+ "epoch": 6.485333333333333,
+ "grad_norm": 1.0261683464050293,
+ "learning_rate": 5.550660792951541e-05,
+ "loss": 2.1756,
+ "step": 380
+ },
+ {
+ "epoch": 6.570666666666667,
+ "grad_norm": 1.0536677837371826,
+ "learning_rate": 5.220264317180616e-05,
+ "loss": 2.1984,
+ "step": 385
+ },
+ {
+ "epoch": 6.656,
+ "grad_norm": 1.0320463180541992,
+ "learning_rate": 4.889867841409691e-05,
+ "loss": 2.1758,
+ "step": 390
+ },
+ {
+ "epoch": 6.741333333333333,
+ "grad_norm": 1.0172383785247803,
+ "learning_rate": 4.559471365638766e-05,
+ "loss": 2.1988,
+ "step": 395
+ },
+ {
+ "epoch": 6.826666666666666,
+ "grad_norm": 1.0310728549957275,
+ "learning_rate": 4.229074889867841e-05,
+ "loss": 2.1908,
+ "step": 400
+ },
+ {
+ "epoch": 6.826666666666666,
+ "eval_loss": 2.3720057010650635,
+ "eval_runtime": 50.708,
+ "eval_samples_per_second": 9.86,
+ "eval_steps_per_second": 1.242,
+ "step": 400
+ },
+ {
+ "epoch": 6.912,
+ "grad_norm": 1.0129923820495605,
+ "learning_rate": 3.898678414096916e-05,
+ "loss": 2.1975,
+ "step": 405
+ },
+ {
+ "epoch": 6.997333333333334,
+ "grad_norm": 0.9970852732658386,
+ "learning_rate": 3.568281938325991e-05,
+ "loss": 2.1562,
+ "step": 410
+ },
+ {
+ "epoch": 7.082666666666666,
+ "grad_norm": 1.0077489614486694,
+ "learning_rate": 3.237885462555066e-05,
+ "loss": 2.1576,
+ "step": 415
+ },
+ {
+ "epoch": 7.168,
+ "grad_norm": 1.037332534790039,
+ "learning_rate": 2.9074889867841408e-05,
+ "loss": 2.1547,
+ "step": 420
+ },
+ {
+ "epoch": 7.253333333333333,
+ "grad_norm": 1.0144597291946411,
+ "learning_rate": 2.5770925110132158e-05,
+ "loss": 2.1841,
+ "step": 425
+ },
+ {
+ "epoch": 7.253333333333333,
+ "eval_loss": 2.3749217987060547,
+ "eval_runtime": 50.7304,
+ "eval_samples_per_second": 9.856,
+ "eval_steps_per_second": 1.242,
+ "step": 425
+ },
+ {
+ "epoch": 7.338666666666667,
+ "grad_norm": 1.0164906978607178,
+ "learning_rate": 2.2466960352422905e-05,
+ "loss": 2.1584,
+ "step": 430
+ },
+ {
+ "epoch": 7.424,
+ "grad_norm": 1.0603595972061157,
+ "learning_rate": 1.9162995594713652e-05,
+ "loss": 2.1727,
+ "step": 435
+ },
+ {
+ "epoch": 7.509333333333333,
+ "grad_norm": 1.0436800718307495,
+ "learning_rate": 1.5859030837004403e-05,
+ "loss": 2.1668,
+ "step": 440
+ },
+ {
+ "epoch": 7.594666666666667,
+ "grad_norm": 1.0156831741333008,
+ "learning_rate": 1.2555066079295153e-05,
+ "loss": 2.1508,
+ "step": 445
+ },
+ {
+ "epoch": 7.68,
+ "grad_norm": 1.0232970714569092,
+ "learning_rate": 9.251101321585902e-06,
+ "loss": 2.1795,
+ "step": 450
+ },
+ {
+ "epoch": 7.68,
+ "eval_loss": 2.374783754348755,
+ "eval_runtime": 50.7314,
+ "eval_samples_per_second": 9.856,
+ "eval_steps_per_second": 1.242,
+ "step": 450
+ },
+ {
+ "epoch": 7.765333333333333,
+ "grad_norm": 1.0554380416870117,
+ "learning_rate": 5.947136563876652e-06,
+ "loss": 2.1888,
+ "step": 455
+ },
+ {
+ "epoch": 7.850666666666667,
+ "grad_norm": 1.0208615064620972,
+ "learning_rate": 2.6431718061674008e-06,
+ "loss": 2.1802,
+ "step": 460
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 464,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 8,
+ "save_steps": 25,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 6.849559855921889e+17,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": null
+ }
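Two things are worth reading out of this final trainer_state: the eval loss bottoms out at 2.3686 at step 225 (epoch 3.84) and never improves again while the train loss keeps falling, which is why best_model_checkpoint points at checkpoint-225; and the logged learning rates fit a 10-step linear warmup to the 3e-4 peak followed by linear decay to zero at max_steps=464. The schedule below is inferred from the log, not read from training_args.bin:

```python
# Reconstructed LR schedule (an inference): 10-step warmup to 3e-4, then
# linear decay to 0 at step 464.
def lr(step, peak=3e-4, warmup=10, max_steps=464):
    return peak * (step / warmup if step <= warmup
                   else (max_steps - step) / (max_steps - warmup))

# Spot-check against values logged above.
for step, logged in [(5, 0.00015),
                     (15, 0.00029669603524229074),
                     (460, 2.6431718061674008e-06)]:
    assert abs(lr(step) - logged) < 1e-12
```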
checkpoint-464/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e5e28485a7b3a1a3db706bc20a8e6c9dd73d2112d08db67065b84e6004e139f
+ size 5432
runs/Apr16_18-22-52_zhangshenyi2/events.out.tfevents.1744827774.zhangshenyi2.2126728.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec4ad0475f1313f44cf58cf779cab03805e045e64441b92f6132e8a79c217156
+ size 5513
runs/Apr17_01-30-13_zhangshenyi2/events.out.tfevents.1744853415.zhangshenyi2.62022.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2262f236a7087252c093649b65bfc62c127e774ee937b6f91a38742ab90b015c
+ size 5476
runs/Apr17_01-35-14_zhangshenyi2/events.out.tfevents.1744853715.zhangshenyi2.62985.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e892976091436612d91a3d8b7df60e235bbebb5a8f9286935888ab08fce31a1
+ size 5476
runs/Apr17_01-44-24_zhangshenyi2/events.out.tfevents.1744854266.zhangshenyi2.64102.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:511cd39a4044653eb8be761a73265a0f24776c754e4fee45310ae2fb208c8a52
+ size 5476
runs/Apr17_01-46-00_zhangshenyi2/events.out.tfevents.1744854362.zhangshenyi2.66064.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:16ee93f5a514cfb2cf69b64666985afa4f99a45d878779c5541dd492134c53ba
+ size 5476
runs/Apr17_01-51-30_zhangshenyi2/events.out.tfevents.1744854691.zhangshenyi2.66986.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:05ffe984f8f8488b41c846d1cf7b5fd31c10691446dcae5bc7e649d0b3cc03b3
+ size 30029
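The commit ends with six TensorBoard event files from the same evening; judging by sizes and timestamps (an inference, not stated anywhere), the five ~5.5 KB files are short-lived restarts a few minutes apart and the 30 KB Apr17_01-51-30 file holds the full run. A sketch for reading it back after `git lfs pull` (the tag names are the usual transformers.Trainer ones, assumed):

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Load the scalars logged during the full run (directory path from this commit).
ea = EventAccumulator("runs/Apr17_01-51-30_zhangshenyi2")
ea.Reload()
print(ea.Tags()["scalars"])                               # e.g. "train/loss", "eval/loss"
print([(e.step, e.value) for e in ea.Scalars("eval/loss")])
```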