A Python Example of a Markov Reward Process

2023-06-23 00:23:39

import numpy as np

# Set of states
states = ["Rainy", "Sunny"]

# Set of actions
actions = ["Stay", "Go_out"]

# Transition probability matrix: row = current state, column = next state.
# Each row sums to 1.
transition_probabilities = [
    [0.7, 0.3],
    [0.4, 0.6]
]

# Reward function: row = current state, column = action
rewards = [
    [0, 0],
    [5, 0]
]

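# Choose the optimal action. The transitions here do not depend on the action,
# so the best action in a state is the one with the highest immediate reward
# (ties go to the first action listed).
def get_optimal_action(state):
    state_rewards = rewards[states.index(state)]
    return actions[state_rewards.index(max(state_rewards))]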

# Look up a transition probability. The matrix is indexed by state only,
# so the action argument does not affect the result.
def get_transition_probability(current_state, next_state, action):
    current_state_index = states.index(current_state)
    next_state_index = states.index(next_state)
    return transition_probabilities[current_state_index][next_state_index]

# Look up the reward for taking an action in a state
def get_reward(current_state, action):
    current_state_index = states.index(current_state)
    action_index = actions.index(action)
    return rewards[current_state_index][action_index]

# Initial state
current_state = "Rainy"

# Running total of rewards
total_reward = 0

# Simulate 5 decisions
for i in range(5):
    action = get_optimal_action(current_state)
    reward = get_reward(current_state, action)
    total_reward += reward
    next_state = np.random.choice(states, p=transition_probabilities[states.index(current_state)])
    current_state = next_state

print("Total reward:", total_reward)
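
The simulation above sums rewards along a single random trajectory. For a chain this small, the expected discounted return from each state can also be computed exactly from the Bellman equation V = R + γPV. The sketch below is one way to do that, under two assumptions that are not in the original post: a discount factor `gamma = 0.9`, and a per-state reward vector `state_rewards = [0, 5]` (the largest immediate reward available in each state). The helper name `state_values` is also hypothetical.

```python
import numpy as np

# Transition matrix from the post: rows = current state, columns = next state.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Assumption: reward collected in each state (Rainy, Sunny), taken as the
# largest entry in that state's row of the post's reward table.
state_rewards = np.array([0.0, 5.0])

def state_values(P, R, gamma=0.9):
    """Solve the Bellman equation V = R + gamma * P @ V for V."""
    n = P.shape[0]
    # Rearranged: (I - gamma * P) V = R, a linear system in V.
    return np.linalg.solve(np.eye(n) - gamma * P, R)

V = state_values(P, state_rewards)
for name, value in zip(["Rainy", "Sunny"], V):
    print(f"{name}: {value:.2f}")
```

Sunny should come out with the higher value, since it is the only state that pays a reward here. Keeping `gamma` strictly below 1 guarantees the linear system has a unique solution.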

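Each run of the loop draws a different random trajectory, so a single total says little about the process as a whole. Averaging over many trajectories gives an estimate of the expected total reward. The following is a minimal sketch of that idea, assuming NumPy's `default_rng` generator with a fixed seed for reproducibility. The function name `simulate_total_reward`, the per-state reward vector `state_rewards`, the seed and the number of episodes are illustrative choices, not part of the original post.

```python
import numpy as np

states = ["Rainy", "Sunny"]
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
# Assumption: reward collected on each visit to (Rainy, Sunny).
state_rewards = [0, 5]

def simulate_total_reward(num_steps, rng, start=0):
    """Sum the rewards along one sampled trajectory of num_steps steps."""
    state = start  # index into states; 0 means "Rainy"
    total = 0
    for _ in range(num_steps):
        total += state_rewards[state]
        # Sample the next state index from the current state's row of P.
        state = rng.choice(len(states), p=P[state])
    return total

rng = np.random.default_rng(seed=0)  # fixed seed, chosen arbitrarily
totals = [simulate_total_reward(5, rng) for _ in range(10_000)]
print("Estimated expected total reward over 5 steps:", np.mean(totals))
```

Under these assumptions the exact expectation, obtained by propagating the state distribution through `P` for five steps starting from Rainy, is about 7.66, so the printed estimate should land close to that value.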