Apache Flink

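Flink is a computation engine, billed as the "4G of Big Data" (note1): fast, easy to use, open source, and high-performing, but it has no storage system of its own. It covers several styles of processing: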

  • Batch Processing
  • Interactive processing
  • Real-time stream processing
  • Graph Processing
  • Iterative Processing
  • In-memory processing

Flink is an alternative to MapReduce, and it is claimed to process data more than 100 times faster than MapReduce.

Flink is independent of Hadoop, but it can use HDFS to read, write, store, and process data. Flink does not provide its own data storage system; it takes data from distributed storage.

Flink ecosystem: (note2)


Storage: reading from and writing to other systems' data stores is generally not a problem

  • HDFS – Hadoop Distributed File System
  • Local-FS – Local File System
  • S3 – Simple Storage Service from Amazon
  • HBase – NoSQL Database in Hadoop ecosystem
  • MongoDB – NoSQL Database
  • RDBMS – Any relational database
  • Kafka – Distributed messaging Queue
  • RabbitMQ – Messaging Queue
  • Flume – Data Collection and Aggregation Tool


Deploy: Flink can allocate and deploy resources in several modes:

  • Local mode – On a single node, in a single JVM
  • Cluster – On a multi-node cluster, with one of the following resource managers
    • Standalone – The default resource manager, shipped with Flink
    • YARN – A very popular resource manager; part of Hadoop, introduced in Hadoop 2.x
    • Mesos – A generalized resource manager
  • Cloud – On Amazon or Google cloud

Runtime:

The Distributed Streaming Dataflow, also called the kernel of Apache Flink, is the core layer of Flink. It provides distributed processing, fault tolerance, reliability, native iterative processing, and so on.
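To make the idea of a streaming dataflow a little more concrete, here is a minimal sketch in plain Python. It is not Flink's API: the function names (`source`, `tokenize`, `running_count`, `run_pipeline`) are made up for illustration, and everything runs in one process rather than on a cluster. It only shows the shape of the model, where records flow one at a time through a chain of long-lived operators and one of the operators keeps state.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Tuple


def source(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical source: emit records one at a time, like a stream would."""
    for line in lines:
        yield line


def tokenize(records: Iterator[str]) -> Iterator[Tuple[str, int]]:
    """Stateless operator: split each record into (word, 1) pairs."""
    for record in records:
        for word in record.lower().split():
            yield (word, 1)


def running_count(pairs: Iterator[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Stateful operator: keep a count per word and emit every update."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
        yield (word, counts[word])


def run_pipeline(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Wire source -> tokenize -> running_count and collect the output."""
    return list(running_count(tokenize(source(lines))))


if __name__ == "__main__":
    for update in run_pipeline(["to be", "or not to be"]):
        print(update)
```

Each operator here pulls the next record only when it is ready for it, so results are emitted as records arrive instead of after the whole input has been read. A real engine adds the parts this sketch leaves out: parallel operators spread over many machines, and recovery of operator state after a failure.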





  • Streaming – Flink is a true stream processing engine.
  • High performance – Flink's data streaming runtime provides very high throughput
  • Low latency – Flink can process data with sub-second latency
  • Event Time and Out-of-Order Events – Flink supports stream processing and windowing where events arrive delayed or out of order
  • Lightning fast speed – Flink processes data at lightning-fast speed (hence the nickname "4G of Big Data")
  • Fault Tolerance – Failure of hardware, a node, software, or a process does not bring down the cluster
  • Memory management – Flink manages its own memory, which is meant to avoid out-of-memory exceptions
  • Broad integration – Flink can be integrated with various storage systems to process their data, and it can be deployed with various resource management tools. It can also be integrated with several BI tools for reporting
  • Stream processing – Flink is a true streaming engine that can process live streams at sub-second intervals
  • Program optimizer – Flink ships with an optimizer; a program is optimized before it is executed
  • Scalable – Flink is highly scalable. As requirements grow, the Flink cluster can be scaled out
  • Rich set of operators – Flink has many predefined operators for processing data. All the common operations can be done with these operators
  • Exactly-once Semantics – Flink can maintain custom state during computation and, through checkpointing, keep that state consistent with exactly-once guarantees across failures
  • Highly flexible Streaming Windows – In Flink, windows can be customized with flexible trigger conditions to capture the required streaming patterns. Windows can be time-based (for example, from time t1 to t5) or data-driven.
  • Continuous streaming model with backpressure – Data streaming applications are executed with continuous (long-lived) operators. Flink's streaming engine naturally handles backpressure.
  • One Runtime for Streaming and Batch Processing – Batch processing and data streaming share a common runtime in Flink
  • Easy and understandable Programmable APIs – Flink's APIs are designed to cover all the common operations, so programmers can use them efficiently.
  • Little tuning required – Requires little or no configuration of memory, network, or serializers
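The "Event Time and Out-of-Order Events" item above is easier to picture with a small example. The sketch below is plain Python, not Flink's API. The function `tumbling_event_time_counts` and its parameters are hypothetical, and it rests on two assumptions: the watermark is the largest event time seen so far minus a fixed out-of-orderness bound, and an event whose window has already fired is simply set aside as late.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str]  # (event_time, key), listed in arrival order


def tumbling_event_time_counts(
    events: Iterable[Event],
    window_size: int,
    max_out_of_orderness: int,
) -> Tuple[List[Tuple[int, Dict[str, int]]], List[Event]]:
    """Count events per key in fixed-size event-time windows.

    Returns (fired_windows, late_events). Each fired window is
    (window_start, {key: count}), listed in the order it fired.
    """
    open_windows: Dict[int, Dict[str, int]] = {}
    fired: List[Tuple[int, Dict[str, int]]] = []
    late: List[Event] = []
    max_seen = float("-inf")
    watermark = float("-inf")

    for event_time, key in events:
        window_start = (event_time // window_size) * window_size
        if window_start + window_size <= watermark:
            # The window this event belongs to has already fired.
            late.append((event_time, key))
        else:
            counts = open_windows.setdefault(window_start, {})
            counts[key] = counts.get(key, 0) + 1

        # Assumed watermark rule: largest timestamp seen minus a fixed bound.
        max_seen = max(max_seen, event_time)
        watermark = max_seen - max_out_of_orderness

        # Fire every window that ends at or before the watermark.
        for start in sorted(open_windows):
            if start + window_size <= watermark:
                fired.append((start, open_windows.pop(start)))

    # End of input: flush whatever is still open.
    for start in sorted(open_windows):
        fired.append((start, open_windows.pop(start)))
    return fired, late


if __name__ == "__main__":
    stream = [(1, "a"), (4, "b"), (12, "a"), (8, "a"), (15, "b"), (3, "a"), (27, "a")]
    windows, late_events = tumbling_event_time_counts(stream, 10, 3)
    print(windows)
    print(late_events)
```

In the sample stream, the event at time 8 arrives after the event at time 12 but is still counted in the window starting at 0, because the watermark has not yet passed that window's end. The event at time 3 arrives after that window has fired, so it ends up in the late list. A larger out-of-orderness bound catches more stragglers at the cost of holding results back longer.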

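At first glance, Apache Flink looks like something TV stations will need as they transform. Live broadcasts used to rely on SNG trucks and satellite uplinks; now they are moving to streaming technology, and that change alone saves who knows how much. Its uses are quite broad: it can also handle dirty data, power product recommendations, and make predictions.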


(note1: http://data-flair.training/blogs/apache-flink-production-fortune-500-companies-top-real-world-use-cases/)

(note2: data-flair.training/blogs/apache-flink-comprehensive-guide-tutorial-for-beginners/)







  • for IaaS

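OpenStack is open source cloud software jointly created by NASA and Rackspace. It is licensed under the Apache License and is a free and open source software project for building Infrastructure as a Service (IaaS). (note2)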

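  • 3 major modules:

A compute module, a networking module, and a storage module, plus a centrally managed dashboard module, combine into a set of OpenStack shared services. They deliver computing resources to the outside as virtual machines, which makes elastic scaling and scheduling easier. (note1)

So for network administration, making networking hardware programmable and virtualized is inevitable: hardware turns into software and gets virtualized.

For each module (package), see (note1).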


  • Networking module (package): Neutron



Similar to VPC in Amazon AWS.



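  • Nova – compute project[1]
  • Swift – object storage project[2]
  • Glance – virtual machine disk image (Virtual Machine Image) delivery service[3] [4]
  • Horizon – provides a simple web interface and management console[5]
  • Cinder – provides block storage access
  • Keystone – provides authentication
  • Neutron – provides network management
  • Trove – provides database management
  • Sahara – provides deployment of big data processing
  • Ceilometer – provides metering and monitoring
  • Heat – provides automatic scaling of virtual machines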


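  • Trove database service package (Database as a Service)

Trove mainly bridges and simplifies the use of actual databases, giving the other OpenStack services a scalable and reliable cloud Database-as-a-Service. The database service covers both traditional relational databases and newer non-relational databases.

  Uses Ubuntu as the OS.

This is good stuff and the right direction: it saves money and adds value. Adoption in Asia has been quite fast.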


(note1: https://kairen.gitbooks.io/openstack-liberty/content/conceptions/index.html)

(note2: https://zh.wikipedia.org/wiki/OpenStack)


(OpenStack resource roundup: https://kairen.gitbooks.io/openstack-liberty/content/openstack-resource.html)

Linux alternative




WordPress accuses Wix (note2)

Wix used Automattic’s GPL’d Rich Text Editor project in their app, then open sourced the modified and updated version (but not the complete app) on GitHub under the permissive MIT license.

  • The GNU General Public License (GNU GPL or GPL)

A widely used free software license that guarantees end users the freedom to run, study, share, and modify the software (note1)

The GPL is a copyleft license (a play on the word copyright), also called a viral license.

This means regardless of the amount of GPL’d components you use in your code, you have to release its source code, as well as the rights to modify and distribute the entire code.

Furthermore, according to the GPL family of licenses, you also need to release the source code under "the same GPL license" (hence the name viral license). (note2)

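So Wix used GPL code in its own product and published the modified version on GitHub, but under the MIT license.

Using open source code is fine, but after using it you must also open your code for others to use. That is the spirit of the GPL.

"because it includes GPL code and you distributed the app, the 'entire' thing needs to be GPL."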

  • The issue:

How much of your software do you need to release as open source under the GPL license?

The GPL v2 requires you to share the code of all ‘derivative works’.

  • The issue: the definition of ‘derivative work’


(note1: https://en.wikipedia.org/wiki/GNU_General_Public_License)
(note2: http://www.whitesourcesoftware.com/whitesource-blog/wordpress-wix-fiasco/?utm_source=linkedin&utm_medium=social&utm_term=blog-wordpress-wix-fiasco&utm_content=blog-wordpress-wix-fiasco&utm_campaign=pat-gr)



Asana: I already use it.


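OTRS: a powerful open source tool that can handle nearly any size of company and any number of field engineers or internal IT staff. Every ticket generated by the system retains a history and can automatically generate an alert email to the assigned technician. So OTRS is basically a work-reminder tool; people with a lot of free time would probably love to try it…

FreshBooks: an accounting solution, used for issuing invoices

Collabtive: an open source project management tool

The other tools cover accounting and project management; within a company, the CIO should evaluate them, judge whether they fit, and roll them out.

Large enterprises may have some doubts about open source tools, but they may suit entrepreneurs and small and medium-sized businesses well.

I'm bookmarking this article; some day I'll consider using these handy tools.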
