import numpy as np
# Set of states
states = ["Rainy", "Sunny"]
# Set of actions
actions = ["Stay", "Go_out"]
# Transition probability matrix:
# transition_probabilities[i][j] = P(next = states[j] | current = states[i]);
# each row sums to 1.
transition_probabilities = [
[0.7, 0.3],
[0.4, 0.6]
]
# Reward table indexed as rewards[state_index][action_index]
# (this is how get_reward reads it).
# NOTE(review): only ("Sunny", "Stay") yields a nonzero reward, yet
# get_optimal_action picks "Go_out" in "Sunny", so the simulated policy
# always accumulates 0 -- confirm the table/policy are as intended.
rewards = [
[0, 0],
[5, 0]
]
# 選擇最優動作
def get_optimal_action(state):
    """Return the fixed (deterministic) policy's action for *state*.

    "Stay" when it is raining, "Go_out" for every other state.
    """
    return "Stay" if state == "Rainy" else "Go_out"
# 計算機率
def get_transition_probability(current_state, next_state, action):
    """Return P(next_state | current_state) from the transition matrix.

    Args:
        current_state: name of the current state (must be in ``states``).
        next_state: name of the candidate next state (must be in ``states``).
        action: accepted for API symmetry but currently IGNORED -- the
            transition matrix does not depend on the action taken.

    Returns:
        The transition probability as a float.
    """
    # Fixed: the original also computed actions.index(action) into an
    # unused local, which was dead work and misleadingly suggested the
    # action influenced the result.
    row = states.index(current_state)
    col = states.index(next_state)
    return transition_probabilities[row][col]
# 計算獎勵
def get_reward(current_state, action):
    """Look up the immediate reward for taking *action* in *current_state*.

    Indexes the module-level ``rewards`` table by state row and action
    column, both resolved by name.
    """
    state_row = states.index(current_state)
    action_col = actions.index(action)
    return rewards[state_row][action_col]
# --- Simulation: roll the process forward from "Rainy" for 5 steps ---
current_state = "Rainy"  # initial state
total_reward = 0         # accumulated reward over the episode

for step in range(5):
    # Follow the fixed policy, collect the immediate reward, then sample
    # the next state from the current state's transition row.
    chosen_action = get_optimal_action(current_state)
    total_reward += get_reward(current_state, chosen_action)
    transition_row = transition_probabilities[states.index(current_state)]
    current_state = np.random.choice(states, p=transition_row)

print("Total reward:", total_reward)