Good morning. I used to assume that training an AI on articles so it could produce new ones would take serious money. But I came around to the idea that a local PC and one of the freely available Llama models, tuned a little, might be enough.
In fact, with the PC you already have and one of the open-source AI models freely available out there, Llama 3 in particular, you can generate perfectly serviceable articles.
Local AI article generation is no longer a pipe dream
When you hear "article generation with AI," it may sound like something out of science fiction, or a privilege only big companies can afford. Not anymore. The arrival of powerful open-source language models, above all Meta's Llama 3, has overturned that assumption.
Llama 3's biggest draw is that, despite its strong performance, anyone can use it for free. Better still, the 8B model (8 billion parameters) runs fine on a reasonably capable PC; it doesn't take the latest gaming rig. That makes it realistic to build an AI article-generation environment on your own machine instead of paying for expensive cloud services.
Why is Llama 3 such a good match for your PC?
There are several reasons Llama 3 is well suited to article generation on a local PC.
- Completely free and open source: There are no usage fees, so you can experiment with AI, or adopt it in earnest, without worrying about the budget.
- A choice of model sizes: Llama 3 comes in several sizes, so you can pick one that matches your PC's specs. The 8B model in particular strikes an ideal balance for personal use.
- An active developer community: Developers around the world share new tools and efficient tuning methods for Llama 3 every day. When you get stuck, help is easy to find; it's a reassuring ally.
- Even lighter with "quantization": A technique called quantization shrinks a model dramatically, letting Llama 3 run in far less memory and opening the door to many more PCs (see the rough estimate just after this list).
The trick to turning your PC into an article-generation machine
Of course, getting AI to write at the level of a professional writer right away may be difficult. But with a few small tricks, generating articles at a perfectly passable level is well within reach.
- Fine-tune on a small dataset: You don't need mountains of article data. Gather anywhere from a handful to a few hundred good-quality articles matching the theme and style you want, fine-tune Llama 3 on them, and its ability to generate articles in that niche improves dramatically.
- Craft your prompts: How you phrase instructions to the AI matters enormously. Specific, clear prompts can yield surprisingly high-quality articles even when the tuning isn't perfect; it's like giving precise directions to a capable assistant (a short example follows this list).
- Use efficient training methods: An efficient fine-tuning technique such as LoRA (Low-Rank Adaptation) lets you specialize a model for a given task quickly and with little GPU memory.
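To make the prompting point concrete, here is a minimal sketch. Both prompts are invented examples, and generate_article is a hypothetical placeholder for whatever generation call you end up using:

# A vague prompt leaves everything up to the model:
vague_prompt = "Write an article about local AI."

# A specific prompt pins down length, audience, structure, and tone:
specific_prompt = (
    "Write an 800-word blog post for hobbyist programmers explaining how to "
    "fine-tune Llama 3 on a home PC. Use a friendly tone, open with why local "
    "fine-tuning has become affordable, and close with three practical tips."
)

# article = generate_article(specific_prompt)  # hypothetical generation helper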
Your creativity, accelerated by AI, starting now
AI article generation, once within reach only for a handful of specialists and companies, can now happen on your own PC. This is nothing short of the democratization of AI technology.
So, that being the case, let's give it a shot. How long the fine-tuning would take was an open question, too.
The fine-tuning Python code
The Python code below loads a Llama 3 model, fine-tunes it with LoRA on the text articles you provide, and saves the result. Save it as a Python file (e.g. `finetune_llama.py`) and run it.
# Requires: pip install torch transformers datasets peft bitsandbytes accelerate
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType
# Check whether a GPU is available.
print("Checking GPU availability...")
if not torch.cuda.is_available():
    print("No GPU found. A GPU is strongly recommended for fine-tuning.")
    # With no GPU you can either stop the script here or continue in CPU mode (very slow).
    # exit("No GPU; exiting.")
else:
    print(f"GPU available: {torch.cuda.get_device_name(0)}")
# --- 1. Load the model and tokenizer ---
# Set the Llama 3 model path: a Hugging Face model ID
# (e.g. "meta-llama/Meta-Llama-3-8B") or the path to a local download.
MODEL_NAME = "meta-llama/Meta-Llama-3-8B"  # replace with your model path

print(f"Loading model and tokenizer: {MODEL_NAME}")

# 4-bit quantization settings (cuts GPU memory use).
# bnb_4bit_compute_dtype is bfloat16, which is recommended on Ampere and newer NVIDIA GPUs.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",  # NF4 (NormalFloat4) quantization type
    bnb_4bit_compute_dtype=torch.bfloat16
)
# Load the tokenizer.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

# Llama 3 ships without a padding token, so reuse the EOS token for padding.
# padding_side="right" is the recommended setting for Llama models.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
# Load the model with the quantization config and map it onto the GPU automatically.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",  # distribute the model across available devices (GPUs)
    trust_remote_code=True  # allow custom code from the model repo
)
print("Model loaded.")

# Prepare the model for k-bit training (required by the PEFT library).
# Enabling gradient checkpointing reduces memory use further.
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
print("Model prepared for k-bit training.")
# --- 2. Prepare the dataset ---
# Point this at the directory holding your text article files.
# Example: 'your_article_data/' containing 'article1.txt', 'article2.txt', ...
DATA_DIR = "./your_article_data/"  # replace with your data directory

print(f"Loading dataset from: {DATA_DIR}")

# Load the data with the 'text' builder, which reads every .txt file in the
# directory. Note that by default each *line* of each file becomes one sample.
try:
    dataset = load_dataset('text', data_files={'train': os.path.join(DATA_DIR, '*.txt')})
    print(f"Number of training samples: {len(dataset['train'])}")
except Exception as e:
    print(f"Error while loading the dataset. Check the directory and file format: {e}")
    exit("Dataset loading failed.")
# Tokenization function: long articles are truncated to the model's maximum input length.
def tokenize_function(examples):
    # Llama 3 accepts up to 8192 tokens; adjust this to fit your GPU's VRAM.
    # 2048 is used here as a reasonable default.
    max_length = 2048
    # truncation=True drops any text beyond max_length.
    return tokenizer(examples["text"], truncation=True, max_length=max_length)

# Tokenize the dataset.
# num_proc parallelizes the work across CPU cores to speed it up.
tokenized_dataset = dataset.map(
    tokenize_function,
    batched=True,
    num_proc=os.cpu_count(),
    remove_columns=["text"]  # the raw text column is no longer needed for training
)
print("Dataset tokenization finished.")
# --- 3. Configure PEFT (LoRA) ---
# LoRA (Low-Rank Adaptation) freezes the original model weights and trains
# small adapter layers added on top, which makes fine-tuning efficient:
# GPU memory use stays low while performance remains strong.
lora_config = LoraConfig(
    r=16,  # LoRA rank; larger values add expressive power but consume more memory
    lora_alpha=32,  # LoRA scaling factor; roughly 2x the rank is a common choice
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],  # layers to adapt; these are the standard ones for Llama models
    bias="none",  # do not train bias terms
    lora_dropout=0.05,  # dropout rate, set to guard against overfitting
    task_type=TaskType.CAUSAL_LM,  # task type: causal language modeling
)

# Attach the LoRA adapters to the model.
model = get_peft_model(model, lora_config)
print("LoRA adapters applied to the model.")
model.print_trainable_parameters()  # show how many parameters are trainable
# --- 4. Run training ---
# Directory where the fine-tuned model will be saved.
OUTPUT_DIR = "./llama3_finetuned_model/"  # replace with your output directory

# Training settings.
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=3,  # adjust to your dataset size and target quality
    per_device_train_batch_size=1,  # batch size per GPU; keep at 1 if VRAM is tight
    gradient_accumulation_steps=4,  # accumulate gradients over this many steps; the effective batch size is per_device_train_batch_size * gradient_accumulation_steps
    optim="paged_adamw_8bit",  # 8-bit AdamW optimizer for better memory efficiency
    save_steps=500,  # save a checkpoint every 500 steps
    logging_steps=100,  # log every 100 steps
    learning_rate=2e-4,  # learning rate
    bf16=True,  # mixed-precision training matching the bfloat16 compute dtype above (use fp16=True instead on GPUs without bfloat16 support)
    max_steps=-1,  # train for num_train_epochs
    group_by_length=True,  # batch sequences of similar length to reduce padding
    lr_scheduler_type="cosine",  # learning-rate scheduler type
    warmup_ratio=0.03,  # warmup ratio
    report_to="none",  # no logging backend (when not using wandb etc.)
)
# Initialize the trainer.
# The data collator pads each batch and builds the labels: for causal language
# modeling, the inputs themselves serve as the labels.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
trainer = Trainer(
    model=model,
    train_dataset=tokenized_dataset["train"],
    args=training_args,
    data_collator=data_collator,
)
# Start training.
print("Starting fine-tuning...")
trainer.train()
print("Fine-tuning finished.")
# --- 5. Save the fine-tuned model ---
# Only the LoRA adapters are saved, which keeps the files small and easy to manage.
trainer.save_model(OUTPUT_DIR)
print(f"Trained LoRA adapters saved to '{OUTPUT_DIR}'.")
# Example of running inference with the saved adapters (commented out):
# a reference snippet for loading the model after fine-tuning.
# from peft import PeftModel
#
# # Load the base model (with the same quantization settings used for training).
# base_model = AutoModelForCausalLM.from_pretrained(
#     MODEL_NAME,
#     quantization_config=bnb_config,
#     device_map="auto",
#     trust_remote_code=True
# )
#
# # Attach the saved LoRA adapters to the base model.
# peft_model = PeftModel.from_pretrained(base_model, OUTPUT_DIR)
#
# # Switch to inference mode.
# peft_model.eval()
#
# # Text-generation example.
# prompt = "The advantages of fine-tuning Llama 3 on a local PC are"
# inputs = tokenizer(prompt, return_tensors="pt").to("cuda")  # move inputs to the GPU
#
# with torch.no_grad():  # disable gradient computation to save memory
#     outputs = peft_model.generate(
#         **inputs,
#         max_new_tokens=200,  # maximum number of new tokens to generate
#         do_sample=True,  # enable sampling-based generation
#         top_p=0.9,  # nucleus-sampling threshold
#         temperature=0.7,  # temperature controlling output diversity
#         eos_token_id=tokenizer.eos_token_id  # end-of-sequence token ID
#     )
# print("\n--- Generated text ---")
# print(tokenizer.decode(outputs[0], skip_special_tokens=True))
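One optional follow-up once you are happy with the adapters: the peft library can merge the LoRA weights back into the base model, so inference no longer needs the adapter wrapper. A minimal sketch, assuming the same MODEL_NAME and OUTPUT_DIR values as in the script; merging on top of a 4-bit quantized base is restricted, so the base model is loaded in bfloat16 here, and the output path is just an example:

# Merge the trained LoRA adapters into the base model and save a standalone copy.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(base, "./llama3_finetuned_model/").merge_and_unload()
merged.save_pretrained("./llama3_merged_model/")  # example output path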
To be continued tomorrow.