<a href="http://blog.csdn.net/comaple/article/details/7912529">http://blog.csdn.net/comaple/article/details/7912529</a>
We compute statistics over logs and extract useful information from those statistics in near real time with Storm. Logs are read from Kafka-like persistent message queues into spouts, then processed and emitted through the topologies to compute the desired results, which are then stored in distributed databases to be used elsewhere. The input log count varies from 2 million to 1.5 billion per day, with a size of up to 2 terabytes across the projects. The main challenge here is not only real-time processing of big data sets; storing and persisting the results is also a challenge and needs careful design and implementation.
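Taobao combines Storm with message queues to process between 2 million and 1.5 billion log entries per day, up to 2 TB of logs, in near real time.
I started learning Storm last week and have so far worked out two usage scenarios.
1, Configure a DRPC server and publish the Storm topology as a DRPC service. A client program can call the DRPC service to send data into the Storm cluster and receive the processing result in return. In this approach the DRPC server forwards the requests; underneath, the DRPC server is implemented with Thrift. It is mainly suited to real-time computation. It also scales well: you can increase the number of workers on each node to scale out dynamically.
2, The second scenario uses beanstalkd to feed messages in. After the topology has been submitted to the Storm cluster, you can write a beanstalkd client to send messages to the cluster. With this approach the client receives no result back. This scenario suits pure data analysis and processing workloads.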
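The producer side of the second scenario can be sketched as follows. This is a minimal sketch under assumptions not stated in the post: a beanstalkd server on its default port 11300, a spout in the topology that reserves jobs from the default tube, and hypothetical helper names `build_put` and `send_job`. It frames a message with the `put` command of the beanstalkd text protocol and reads back only the queue's status line, which matches the point above that the client gets no processing result.

```python
import socket


def build_put(body: bytes, priority: int = 1024, delay: int = 0, ttr: int = 60) -> bytes:
    # beanstalkd wire format: "put <pri> <delay> <ttr> <bytes>" CRLF <data> CRLF
    header = f"put {priority} {delay} {ttr} {len(body)}\r\n".encode("ascii")
    return header + body + b"\r\n"


def send_job(host: str, body: bytes, port: int = 11300, timeout: float = 5.0) -> str:
    # Open a TCP connection, send one job, and return the server's status line.
    # On success beanstalkd answers "INSERTED <id>"; this is only a queue
    # acknowledgement, not a result computed by the topology.
    with socket.create_connection((host, port), timeout=timeout) as conn, \
            conn.makefile("rb") as reader:
        conn.sendall(build_put(body))
        return reader.readline().decode("ascii").strip()
```

Calling `send_job(host, b"one log line")` with the address of your own beanstalkd server queues one message for the topology to pick up. The priority, delay, and time-to-run defaults here are illustrative values, not settings from the original setup.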
The DRPC port does not need to be configured; the default is 3772.
Configuration of the Nimbus node:
storm.zookeeper.servers:
- "10.10.249.195"
- "10.10.249.196"
#
# nimbus.host: "nimbus"
## Locations of the drpc servers
drpc.servers:
- "10.10.249.197"
# - "server2"
Configuration of the Supervisor nodes:
########### These MUST be filled in for a storm configuration
nimbus.host: "10.10.249.195"
supervisor.slots.ports:
- 6700
- 6701
- 6702
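Configuration of the DRPC server node
This node only needs the ZooKeeper addresses configured. The port it opens by default is 3772.
storm.zookeeper.servers:
- "10.10.249.195"
- "10.10.249.196"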
Start the DRPC service: ./storm drpc
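After starting the service, one quick way to confirm that something is listening on the DRPC port is to attempt a TCP connection. This is a minimal sketch, with `port_open` as a hypothetical helper name. It only shows that the port accepts connections, not that the DRPC service itself is healthy.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    # Return True if a TCP connection to host:port succeeds within the timeout.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

With the addresses used in this post, the check would be `port_open("10.10.249.197", 3772)`, where `10.10.249.197` is the entry listed under `drpc.servers` and 3772 is the default port.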