
FP Basics

MapReduce got me thinking about functional programming, which led me to programming paradigms. The link below compares them; it's worth a careful read if you're interested...

<a href="http://en.wikipedia.org/wiki/comparison_of_programming_paradigms">http://en.wikipedia.org/wiki/comparison_of_programming_paradigms</a>

Note that programming languages and paradigms are two different things. Python, for example, can be used to write OOP code, but also FP, structured, or event-driven code…

This holds for most languages: the language is just a tool, and how it is used is up to the user.
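As a quick illustration of this point (my example, not the post's), here is the same computation written in two styles within one language, Python:

```python
# The same task -- summing the squares of the even numbers -- written in
# two paradigms within one language (Python).

# Imperative / procedural style: explicit loop and a mutable accumulator.
def sum_even_squares_imperative(numbers):
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total

# Functional style: no mutation, just expression composition.
def sum_even_squares_functional(numbers):
    return sum(n * n for n in numbers if n % 2 == 0)

print(sum_even_squares_imperative([1, 2, 3, 4]))  # 4 + 16 = 20
print(sum_even_squares_functional([1, 2, 3, 4]))  # 20
```

Both produce the same answer; the difference is entirely in how the programmer thinks about the problem.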

Many people use C++ these days, so everyone says they are doing object-oriented programming. In reality many of them don't understand what object orientation is; using C++ does not by itself make your programming object-oriented.

Paradigm: in Kuhn's theory, a scientific paradigm is a set of assumptions about reality, one that explains the present world better than competing sets of assumptions.

So which paradigm you use ultimately depends on your assumptions about reality, your worldview, the angle from which you look at problems. No paradigm is absolutely good or bad; they are simply different ways of seeing the world.

If you think about problems procedurally, then the code you write will be procedural no matter which language you use. Switching paradigms therefore means changing not your language but the angle and manner in which you think about problems.

<a href="http://www.defmacro.org/ramblings/fp.html">http://www.defmacro.org/ramblings/fp.html</a>

That post explains FP in an accessible, bottom-up way; here are my notes on it.

FP is fairly hard to understand. The author tries to describe it simply, not entirely successfully; it remains hard going, heh.

From Plato's walk he draws the following conclusion:

Plato suggested that everything in our world is just an approximation of perfection. He also realized that we understand the concept of perfection even though we never encountered it. He came to the conclusion that perfect mathematical forms must live in another world and that we somehow know about them by having a connection to that "alternative" universe.

This describes the gap between reality and the ideal. The real world contains nothing truly perfect: no two identical leaves, no perfect circle... Yet people can grasp perfection and express it in mathematical equations, which is genuinely astonishing. How can a person understand something they have never seen? This is why mathematics is so powerful: it can describe a perfection that does not exist in reality.

In that sense mathematics is also philosophy, since it does not exist in reality but only as a formal or calculational system in our heads. And as with philosophy, the point is to pose questions rather than give answers, because a question that can be answered is no longer a philosophical problem but a practical one.

Much of the consensus revolves around the fact that mathematics is really a puzzle: we set up a set of basic non-conflicting principles and a set of rules on how to operate with these principles. We can then stack these rules together to come up with more complex rules. Mathematicians call this method a "formal system" or a "calculus".

Back to the American Great Depression. While most people went hungry, four brilliant men at Princeton University dined well every day: Alonzo Church, Alan Turing, John von Neumann, and Kurt Gödel.

Their shared interest was formal systems, not the physical world. They focused on computation and wanted to answer the questions below...

The four men were interested in formal systems. They didn't pay much heed to the physical world; they were interested in dealing with abstract mathematical puzzles instead.

Their puzzles had something in common: the men were working on answering questions about computation.

If we had machines that had infinite computational power, what problems would we be able to solve? Could we solve them automatically? Could some problems remain unsolved and why? Would various machines with different designs be equal in power?

Alonzo Church devised a formal system called the lambda calculus. The system was essentially a programming language for an imaginary machine (the Turing machine is the same kind of thing, only luckier: the machine Turing imagined was later realized by von Neumann and is now the mainstream). The functions in the system were denoted by the Greek letter lambda, hence the name; the lambda calculus is in essence a calculus of functions, with functions serving as both the arguments and the return values of other functions.

The lambda calculus is the theoretical foundation of FP and is attracting more and more attention today...

In cooperation with other men Alonzo Church developed a formal system called lambda calculus. The system was essentially a programming language for one of these imaginary machines. It was based on functions that took other functions as parameters and returned functions as results. The function was identified by a Greek letter lambda, hence the system's name. Using this formalism Alonzo was able to reason about many of the above questions and provide conclusive answers.

Independently of Alonzo Church, Alan Turing was performing similar work. He developed a different formalism (now referred to as the Turing machine), and used it to independently come to similar conclusions as Alonzo.

The story might have ended there... but World War II broke out and the military urgently needed large-scale computation, at the time for adjusting artillery ballistics. Against this background the computer advanced rapidly.

The first machine to solve ballistic tables was a Mark I built by IBM - it weighed five tons, had 750,000 parts and could do three operations per second.

In 1949 an Electronic Discrete Variable Automatic Computer (EDVAC) was unveiled and had tremendous success. It was a first example of von Neumann's architecture and was effectively a real world implementation of a Turing machine. For the time being Alonzo Church was out of luck.

Turing's formalism was the one adopted as the machine architecture, so Alonzo Church was out of luck and ended up outside the mainstream...

That lasted until John McCarthy, the father of artificial intelligence, rediscovered the value of the lambda calculus and implemented the Lisp language on top of it (an implementation of Alonzo's lambda calculus that worked on von Neumann computers).

In late 1950s an MIT professor John McCarthy (also a Princeton graduate) developed interest in Alonzo Church's work. In 1958 he unveiled a List Processing language (Lisp). Lisp was an implementation of Alonzo's lambda calculus that worked on von Neumann computers! Many computer scientists recognized the expressive power of Lisp. In 1973 a group of programmers at MIT's Artificial Intelligence Lab developed hardware they called a Lisp machine - effectively a native hardware implementation of Alonzo's lambda calculus!

Functional programming is one practical realization of the lambda calculus, but it cannot cover every aspect of it, because the lambda calculus is an idealized system while the physical world imposes many constraints.

Functional programming is a practical implementation of Alonzo Church's ideas. Not all lambda calculus ideas transform to practice because lambda calculus was not designed to work under physical limitations.

You're probably thinking that there's no way I can rationalize the monstrosity of a function above. When I was learning functional programming I was thinking that too. I was wrong. There are very good arguments for using this style. Some of them are subjective. For example, people claim that functional programs are easier to understand. I will leave out these arguments because every kid on the block knows that ease of understanding is in the eye of the beholder. Fortunately for me, there are plenty of objective arguments.

Since every symbol in FP is final, no function can ever cause side effects. You can never modify things in place, nor can one function modify a value outside of its scope for another function to use (like a class member or a global variable). That means that the only effect of evaluating a function is its return value and the only thing that affects the return value of a function is its arguments.

With no side effects, every function can be unit tested in isolation. In typical OO code, a function full of DB calls and I/O cannot be unit tested independently.
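A minimal Python sketch of the contrast (my example; the post's listings are not reproduced here):

```python
# Impure: the result depends on, and mutates, state outside the function,
# so a unit test must first set up that external state.
balance = 100

def withdraw(amount):
    global balance
    balance -= amount      # side effect: mutates a global
    return balance

# Pure: the return value depends only on the arguments, so the function
# can be tested in complete isolation, with no setup and no teardown.
def withdraw_pure(balance, amount):
    return balance - amount

assert withdraw_pure(100, 30) == 70   # always true, regardless of history
```

Calling `withdraw(30)` twice gives two different answers; calling `withdraw_pure(100, 30)` twice cannot.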

If a functional program doesn't behave the way you expect it to, debugging it is a breeze. You will always be able to reproduce your problem because a bug in a functional program doesn't depend on unrelated code paths that were executed before it. In an imperative program a bug resurfaces only some of the time. Because functions depend on external state produced by side effects from other functions, you may have to go through a series of steps in no way related to the bug. In a functional program this isn't the case - if a return value of a function is wrong, it is always wrong, regardless of what code you execute before running the function.

Once you reproduce the problem, getting to the bottom of it is trivial. It is almost pleasant. You break the execution of your program and examine the stack. Every argument in every function call in the stack is available for your inspection, just like in an imperative program.

Walking through the stack you look at arguments passed to functions and their return values. The minute a return value doesn't make sense you step into the offending function and walk through it. You repeat this recursively until the process leads you to the source of the bug!

This may be FP's greatest benefit: it can rescue programmers from today's testing and debugging nightmare. C and C++ programmers in particular know the feeling; debugging a large system can make you wish you were dead...

The key point is that FP execution is not disturbed by the outside world, so there is no nondeterminism: an issue that appears once appears every time.

The biggest problem when debugging today's mainstream languages is that issues often cannot be reproduced, because during execution too many external factors (a class member, a global variable) can influence the result, introducing great uncertainty.

FP quality is therefore easier to assure, unit tests are more effective, and debugging is easier.

A functional program is ready for concurrency without any further modifications. You never have to worry about deadlocks and race conditions because you don't need to use locks! No piece of data in a functional program is modified twice by the same thread, let alone by two different threads. That means you can easily add threads without ever giving conventional problems that plague concurrency applications a second thought!

The concurrency story doesn't stop here. If your application is inherently single threaded the compiler can still optimize functional programs to run on multiple CPUs. Take a look at the following code fragment:

In a functional language the compiler could analyze the code, classify the functions that create strings s1 and s2 as potentially time consuming operations, and run them concurrently.
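The fragment the quote refers to is missing from this excerpt; here is a rough Python sketch of the idea, with build_s1/build_s2 as my stand-ins for the functions that create s1 and s2 (Python threads won't actually speed this up; the point is only that pure computations may run in either order, or at once, without changing the answer):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-ins for the "potentially time consuming" string builders.
def build_s1():
    return "".join(str(i) for i in range(1000))

def build_s2():
    return "".join(str(i * i) for i in range(1000))

# Because the two calls share no mutable state, evaluating them
# concurrently gives exactly the same result as evaluating them
# sequentially -- the freedom a compiler for a pure language exploits.
with ThreadPoolExecutor() as pool:
    f1 = pool.submit(build_s1)
    f2 = pool.submit(build_s2)
    s1, s2 = f1.result(), f2.result()

assert s1 == build_s1() and s2 == build_s2()  # same answer either way
```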

This is another major advantage of FP: because execution does not depend on external state, concurrency needs no locks and suffers no race conditions.

Better still, the compiler can parallelize a functional program automatically. On today's multi-core processors that can substantially improve performance.

An ideal situation is updating relevant parts of the code without stopping any part of the system at all. In an imperative world this isn't possible. Consider unloading a Java class at runtime and reloading a new definition. If we were to do that every instance of a class would become unusable because the state it holds would be lost.

In a functional program all state is stored on the stack in the arguments passed to functions. This makes hot deployment significantly easier! In fact, all we'd really have to do is run a diff between the code in production and the new version, and deploy the new code. The rest could be done by language tools automatically! If you think this is science fiction, think again. Erlang engineers have been upgrading live systems without stopping them for years.

An interesting property of functional languages is that they can be reasoned about mathematically. Since a functional language is simply an implementation of a formal system, all mathematical operations that could be done on paper still apply to the programs written in that language.

Since it is itself a formal system, it lends itself directly to mathematical reasoning.

Functions that operate on other functions (accept them as arguments) are called higher order functions.

How, and when, do you use higher order functions?

When you see that a particular piece of code is repeated, you break it out into a function (fortunately they still teach this in schools). If you see that a piece of logic within your function needs to behave differently in different situations, you break it out into a higher order function.
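A quick Python sketch (mine, not the post's) of the rule of thumb above: the piece of logic that varies becomes a function argument.

```python
# Two near-duplicate functions that differ only in the condition tested...
def count_evens(numbers):
    return sum(1 for n in numbers if n % 2 == 0)

def count_negatives(numbers):
    return sum(1 for n in numbers if n < 0)

# ...so the varying logic is broken out into a parameter, making
# count_if a higher order function.
def count_if(numbers, predicate):
    return sum(1 for n in numbers if predicate(n))

data = [3, -1, 4, -1, 5, -9, 2, 6]
print(count_if(data, lambda n: n % 2 == 0))  # 3  (counts 4, 2, 6)
print(count_if(data, lambda n: n < 0))       # 3  (counts -1, -1, -9)
```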

For example, the post shows the same code written twice, once labeled "imperative programming" and once labeled "functional programming"; the listings themselves are not reproduced in this excerpt.

Currying: despite the intimidating name it is a very simple operation, wrapping a function so as to reduce the number of its arguments.

How is it used?

In development we usually strive for generality, hoping one piece of code will serve every scenario.

But starting from the fully general code on every call is cumbersome...

So in FP you first define a general function, f(x, y, z, w).

In some scenario x and y are fixed and you really only need to supply z and w, so passing all four arguments on every call feels tedious...

The trick is to add a layer of wrapping:

j(z, w) { f(x=1, y=2, z, w) }

Now the call is convenient, and this process is called currying…

You could also view this as a kind of adapter pattern.

Adapter pattern is best known when applied to the "default" abstraction unit in Java - a class. In functional languages the pattern is applied to functions. The pattern takes an interface and transforms it to another interface someone else expects. Here's an example of an adapter pattern:

In academic circles this trivial technique is called currying (after a logician Haskell Curry who performed mathematical acrobatics necessary to formalize it). Because in FP functions (as opposed to classes) are passed around as arguments, currying is used very often to adapt functions to an interface that someone else expects. Since the interface to functions is its arguments, currying is used to reduce the number of arguments (like in the example above).

Functional languages come with this technique built in.

As you can see, we've simply created a wrapper for the original function. In FP currying is just that - a shortcut to quickly and easily create wrappers.
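A small Python sketch of the j(z, w) wrapper described above; strictly speaking this is partial application, which is what everyday FP usage (and the post) calls currying:

```python
from functools import partial

# The fully general function f(x, y, z, w).
def f(x, y, z, w):
    return x + 10 * y + 100 * z + 1000 * w

# Hand-written wrapper: fix x=1, y=2, expose only z and w.
def j(z, w):
    return f(1, 2, z, w)

# The same adapter produced mechanically with functools.partial.
j2 = partial(f, 1, 2)

assert j(3, 4) == f(1, 2, 3, 4) == j2(3, 4)  # 1 + 20 + 300 + 4000 = 4321
```

The caller of `j` sees a two-argument interface and never needs to know about x and y.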

Lazy (or delayed) evaluation is an interesting technique that becomes possible once we adopt a functional philosophy.

We will discuss the advantages here.

Lazy evaluation provides a tremendous potential for optimizations. A lazy compiler thinks of functional code exactly as mathematicians think of an algebra expression - it can cancel things out and completely prevent execution, rearrange pieces of code for higher efficiency, even arrange code in a way that reduces errors, all guaranteeing optimizations won't break the code.

Lazy evaluation provides a higher order of abstraction that allows implementing things in a way that would otherwise be impossible.

Lazy languages allow for definition of infinite data structures, something that's much more complicated in a strict language. For example, consider a list with Fibonacci numbers.
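The post's lazy Fibonacci list is not shown in this excerpt; Python generators give a rough analogue of a lazy infinite list:

```python
from itertools import islice

# An "infinite list" of Fibonacci numbers: nothing is computed until
# a consumer demands the next element.
def fibs():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Demand only the first ten elements of the infinite structure.
print(list(islice(fibs(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The definition is infinite, but evaluation is driven entirely by demand, which is exactly the lazy-language trick.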

Of course there ain't no such thing as a free lunch(tm). Lazy evaluation comes with a number of disadvantages. Mainly that it is, well, lazy. Many real world problems require strict evaluation. For example consider the following:

In a lazy language you have no guarantee that the first line will be executed before the second!

Closures are a genuinely hard concept; from the original article alone you would have little chance of understanding what a closure is, so off to Google…

<a href="http://www.ibm.com/developerworks/cn/linux/l-cn-closure/index.html">http://www.ibm.com/developerworks/cn/linux/l-cn-closure/index.html</a>

A closure, short for lexical closure, is an entity formed by combining a function with the referencing environment it was created in.

If an object in the OO paradigm is "data with functions baked in", then a closure is "a function with data baked in". To borrow an excellent summary: an object is data with behavior attached, while a closure is behavior with data attached.

A language that can implement closures must have the following features:

Functions are first-class values: a function can be another function's return value or argument, and can be the value of a variable.

Functions can be nested: one function can be defined inside another.

That is the definition; now for an example.
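The example listing does not survive in this excerpt; judging from the names the commentary below uses (makepowerfn, powerfn, power, square), it was along these lines, sketched here in Python:

```python
# makepowerfn returns an inner function that "remembers" the power
# argument even after makepowerfn itself has returned.
def makepowerfn(power):
    def powerfn(base):
        return base ** power   # power is captured from the enclosing scope
    return powerfn

square = makepowerfn(2)  # powerfn together with captured power=2: a closure
cube = makepowerfn(3)

assert square(4) == 16
assert cube(2) == 8
```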

powerfn is itself a function, but it uses the outer variable power; the entity combining powerfn with power is a closure, a function together with its context.

The interesting part is that by the time you call square, the power that was passed to makepowerfn should long since have ceased to exist (once makepowerfn returned, power would have been popped off the stack), yet its value is still available.

That comes down to the closure implementation, which preserves the context powerfn had at the time.

How are closures implemented? The captured variables are no longer bound by simple scoping rules and lifetimes, so they cannot live on the stack; they must be allocated on the heap. A closure is therefore implemented as a function plus an additional reference to its surrounding variables.

Doesn't that resemble object orientation? The combination of data and operations, as noted above.

The first observation is that local variables are no longer limited to simple scope rules and have an undefined lifetime. The obvious conclusion is that they're no longer stored on the stack - they must be stored on the heap instead. A closure, then, is implemented just like a function we discussed earlier, except that it has an additional reference to the surrounding variables.

What are closures good for? Better abstraction, simpler code, stronger modularity, that sort of thing...

Closures vs. currying: don't they seem similar? The example above could achieve the same goal with simple currying: function square(int base) { return pow(base, 2) }. My view is that closures and currying are concepts on different levels. A closure defines a kind of entity, analogous to an object in OO, from the angle of combining data with behavior; currying defines an operation, one that can be implemented with closures but need not be, as this example shows.

Consider the functions we ordinarily know, as in the example below where example calls add.

When example's code reaches add and finds it is a function, control jumps to add's code.

When add finishes, its locals are popped off the stack, its return value is assigned to example's local variable i, and control returns to example's code, which continues.

The next call encountered is square, and the process above repeats.

With a so-called continuation, when add finishes it need not return to example's code; it jumps directly to square's code with add's return value as input, which should be more efficient.

How is that implemented?

In a traditional call, add has no way of knowing the code address of the next function. To implement continuations you therefore pass the rest of the call sequence into add as an extra argument; add then has no need to return to example and can jump directly.

When we learned about functions we only learned half truths based on a faulty assumption that functions must return their value to the original caller.

In this sense continuations are a generalization of functions. A function must not necessarily return to its caller and may return to any part of the program.

A "continuation" is a parameter we may choose to pass to our function that specifies where the function should return.

The description may be more complicated than it sounds. Take a look at the following code:
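The code the quote points at is not reproduced in this excerpt; here is my sketch in Python, reusing the example/add/square names from the walkthrough above:

```python
# Direct style: add returns to its caller, which then calls square.
def add(x, y):
    return x + y

def square(x):
    return x * x

def example(x, y):
    i = add(x, y)
    return square(i)

# Continuation-passing style: each function takes an extra argument k,
# the continuation, and hands its result forward instead of returning it
# to the caller.
def add_cps(x, y, k):
    return k(x + y)

def square_cps(x, k):
    return k(x * x)

def example_cps(x, y, k):
    # "what to do next" after add is: square the sum, then continue with k
    return add_cps(x, y, lambda i: square_cps(i, k))

assert example(2, 3) == 25
assert example_cps(2, 3, lambda r: r) == 25
```

Note that in the CPS version add_cps never returns a useful value to example_cps; the result flows forward through the chain of continuations.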

We can write any program in CPS, because an ordinary function is just a special case of a continuation, one that jumps back to its caller; in fact many compilers perform exactly this conversion as an optimization.

We can write entire programs in CPS (continuation passing style), where every function takes an extra continuation argument and passes the result to it. We can also convert any program to CPS simply by treating functions as special cases of continuations (functions that always return to their caller). This conversion is trivial to do automatically (in fact, many compilers do just that).

The CPS style needs no stack; we can simply keep the locals in any block of memory.

The stack exists mainly for nested calls: it preserves the execution context of every function along the call chain so that execution can resume after a callee returns.

But in CPS nothing returns; execution just keeps moving forward, so there is no earlier context worth preserving and thus genuinely no need for a stack.

The CPS version needs no stack! No function ever "returns" in the traditional sense, it just calls another function with the result instead.

We don't need to push function arguments on the stack with every call and then pop them back, we can simply store them in some block of memory and use a jump instruction instead.

We'll never need the original arguments - they'll never be used again since no function ever returns!

In what situations are continuations useful? Usually when you're trying to simulate state in an application of inherently stateless nature to ease your life. A great application of continuations are web applications.

The example below shows the idea: the branches of one function can be split into separate functions. In an example this small you cannot see the benefit, but once an if or switch grows to hundreds of branches you learn how terrifying such a huge conditional is to maintain, and this style lets you simply break the branches apart so they stay manageable.

Let's dive into pattern matching with an example. Here's a Fibonacci function in Java:

And here's an example of a Fibonacci function in our Java-derived language that supports pattern matching:

When is pattern matching useful? In a surprisingly large number of cases! Every time you have a complex structure of nested ifs, pattern matching can do a better job with less code on your part. Another benefit of pattern matching is that if you need to add or modify conditions, you don't have to go into one huge function. You simply add (or modify) appropriate definitions. This eliminates the need for a whole range of design patterns from the GoF book. The more complex your conditions are, the more pattern matching will help you. Once you're used to it, you start wondering how you ever got through your day without it.

On the claim that this devalues many GoF design patterns: the most typical case is the factory pattern, which wraps conditional branches in classes to split them apart; in FP the same effect is achieved with much simpler pattern matching.

This article is excerpted from Cnblogs (博客园); originally published 2012-05-11 15:40.