
Unity ML-Agents Toolkit v0.4 and Udacity Deep Reinforcement Learning Nanodegree

We are happy to announce the release of the latest version of the ML-Agents Toolkit: v0.4. It contains a number of features that we hope everyone will enjoy.


It includes the option to train your environments directly from the editor, rather than as built executables, making iteration time much quicker. In addition, we are introducing a set of new challenging environments, as well as algorithmic improvements to help the agents learn to solve tasks that might previously only be learned with great difficulty or in some cases not at all. You can try out the new release by going to our GitHub release page. More exciting news – we are partnering with Udacity to launch an online education program – Deep Reinforcement Learning Nanodegree. Read on below to learn more.


Environments

We include two new environments with our latest release: Walker and Pyramids. Walker is a physics-based humanoid ragdoll, and Pyramids is a complex sparse-reward environment.


Walker


The first new example environment we are including is called “Walker.” It contains agents which are humanoid ragdolls. They are completely physics-based, so the goal is for the agent to learn to control its limbs in a way that allows it to walk forward. It learns this… with somewhat humorous results. Since there are many degrees of freedom in the agent’s body, we think this can serve as a great benchmark for Reinforcement Learning algorithms that researchers might develop.


Pyramids


The second new environment is called “Pyramids.” It features the return of our favorite blue cube agent. Rather than collecting bananas or hopping over walls, this time around the agent has to get to a golden brick atop a pyramid of other bricks. The trick, however, is that this pyramid only appears once a randomly placed switch has been activated. The agent only gets a positive reward upon reaching the brick, making this a very sparse-rewarding environment.


Additional environment variations

Additionally, we are providing visual observation and imitation learning versions of many of our existing environments. The visual observation environments, in particular, are designed as a challenge for researchers interested in benchmarking neural network models which utilize convolutional neural networks (CNNs).
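
As a rough illustration of the kind of model these environments are meant to benchmark, here is a small hypothetical baseline in Python (this is not code shipped with the toolkit; the 84x84 observation size and layer widths are assumptions) that maps a camera observation to one value per action:

# A hypothetical CNN baseline for visual observations. The observation
# shape and layer sizes below are illustrative assumptions, not values
# taken from the toolkit.
import tensorflow as tf

def build_visual_baseline(num_actions, obs_shape=(84, 84, 3)):
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 8, strides=4, activation="elu", input_shape=obs_shape),
        tf.keras.layers.Conv2D(32, 4, strides=2, activation="elu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation="elu"),
        tf.keras.layers.Dense(num_actions),
    ])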


To learn more about our provided example environments, follow this link.


Improved learning with Curiosity

To help agents solve tasks in which the rewards are few and far between, we’ve added an optional augmentation to our PPO algorithm. That augmentation is an implementation of the Intrinsic Curiosity Module, as described in this research paper from last year. In essence, the addition allows the agent to reward itself using an intrinsic reward signal based on how surprised it is by the outcome of its actions. This will enable it to more easily and frequently solve very sparse-reward environments, such as the Pyramids environment described above.
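
To make the idea concrete, here is a minimal Python sketch of such an intrinsic reward (an illustration of the concept rather than the toolkit’s actual implementation; it assumes an observation encoder and a learned forward model already exist):

# A minimal sketch of a curiosity-style intrinsic reward: the agent is
# rewarded in proportion to how badly its learned forward model predicted
# the outcome of its own action. All names here are illustrative.
import numpy as np

def curiosity_reward(phi_next, predicted_phi_next, strength=0.01):
    """Intrinsic reward proportional to the forward model's prediction error.

    phi_next           -- encoding of the observation that actually followed the action
    predicted_phi_next -- the forward model's prediction of that encoding
    strength           -- scales the intrinsic signal relative to the extrinsic reward
    """
    # The worse the prediction (the more "surprised" the agent is), the larger
    # the reward, nudging the agent toward states it has not yet learned to predict.
    surprise = 0.5 * np.sum((predicted_phi_next - phi_next) ** 2)
    return strength * surprise

In this formulation, the reward the policy is trained on is simply the environment reward plus this intrinsic term, which is why it helps in environments where the extrinsic reward alone is rarely observed.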


In-Editor training

One feature which has been requested since the announcement of the ML-Agents toolkit is the ability to perform training from within the Unity Editor. We are happy to be taking the first step toward that goal in this release. It is now possible to simply launch the learn.py script and then press the “play” button from within the editor to perform training. This allows training to happen without having to build an executable, and allows for faster iteration. We think this will save our users a lot of time, as well as shorten the gap between traditional game development workflows and the ML-Agents training process. This is made possible by a revamping of our communication system. Our improvements to the developer workflow will not stop here, though. This is just the first step toward even closer integration with the Unity Editor, which will be rolling out throughout 2018.
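
As a usage sketch (the flags shown are those documented for v0.4 and are worth verifying against your own checkout, and editor-run is just an arbitrary run identifier), training against the currently open scene looks roughly like this, launched from the python directory of the repository:

python learn.py --run-id=editor-run --train

With no environment executable specified, the trainer waits for the Editor; pressing Play in the open scene then begins the training session.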


TensorFlowSharp upgrade

Lastly, we are happy to share that the TensorFlowSharp plugin has now been upgraded from 1.4 to 1.7.1. This means that developers and researchers can now use Unity ML-Agents Toolkit with models built using the near-latest version of TensorFlow and maintain compatibility between the models they train and the models they can embed into Unity projects. We have also improved our documentation around creating Android and iOS executables which take advantage of ML-Agents toolkit. You can check it out here.


Udacity Deep Reinforcement Learning Nanodegree

We are proud to announce that we are partnering with Udacity on a new Nanodegree to help students and our community of users who want a deeper understanding of reinforcement learning. This Udacity course uses the ML-Agents toolkit as a way to illustrate and teach the various concepts. If you’ve been using the ML-Agents toolkit or want to know the math, algorithms, and theories behind reinforcement learning, sign up.



Feedback

In addition to the features described above, we’ve also improved the performance of PPO, fixed a number of bugs, and improved the quality of tests provided with the ML-Agents codebase. As always, we welcome any feedback which you might have. Feel free to reach out to us on our GitHub issues page, or email us directly at [email protected].


Translated from: https://blogs.unity3d.com/2018/06/19/unity-ml-agents-toolkit-v0-4-and-udacity-deep-reinforcement-learning-nanodegree/