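Apache Flink is a computing engine, billed as the "4G of Big Data" (note1). It is fast, easy to use, open source and performs well, but it has no storage system of its own. It supports: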
- Batch processing
- Interactive processing
- Real-time stream processing
- Graph processing
- Iterative processing
- In-memory processing
Flink is an alternative to MapReduce; it is claimed to process data more than 100 times faster than MapReduce.
Flink is independent of Hadoop, but it can use HDFS to read, write, store and process data. Flink does not provide its own data storage system; it takes its data from distributed storage.
Flink ecosystem (note2), i.e. the storage and messaging systems Flink can work with:
- HDFS – Hadoop Distributed File System
- Local-FS – Local File System
- S3 – Simple Storage Service from Amazon
- HBase – NoSQL Database in Hadoop ecosystem
- MongoDB – NoSQL Database
- RDBMS – Any relational database
- Kafka – Distributed messaging queue
- RabbitMQ – Messaging Queue
- Flume – Data Collection and Aggregation Tool
Deploy: Flink can be deployed on the following kinds of resources:
- Local mode – On single node, in single JVM
- Cluster – On a multi-node cluster, with one of the following resource managers:
  - Standalone – The default resource manager, shipped with Flink
  - YARN – A very popular resource manager that is part of Hadoop, introduced in Hadoop 2.x
  - Mesos – A generalized resource manager
- Cloud – On Amazon or Google Cloud
At its core is the Distributed Streaming Dataflow, also called the kernel of Apache Flink. This core layer provides distributed processing, fault tolerance, reliability, native iterative processing capability, etc.
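The sketch below shows what a streaming dataflow with fault tolerance looks like. It runs in a single process and is not Flink's API. The names `source`, `split_words`, `KeyedCounter` and `run_job` are hypothetical. It models, in simplified form, the idea behind Flink's checkpointing: the source offset and the operator state are snapshotted together. After a failure, the job restores both and replays the input from the saved offset, so each record affects the state exactly once.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Conceptual sketch in plain Python, NOT Flink's API: a source -> split -> keyed-count
# dataflow whose state is checkpointed together with the source offset.


def source(lines: List[str], start: int) -> Iterator[Tuple[int, str]]:
    """Replayable source: yields (offset, line) starting at `start`."""
    for offset in range(start, len(lines)):
        yield offset, lines[offset]


def split_words(line: str) -> List[str]:
    """Stateless operator (like a flatMap): one line in, zero or more words out."""
    return line.lower().split()


class KeyedCounter:
    """Stateful operator keeping one running count per key (hypothetical)."""

    def __init__(self, restored: Dict[str, int]) -> None:
        self.state: Dict[str, int] = dict(restored)

    def process(self, word: str) -> None:
        self.state[word] = self.state.get(word, 0) + 1

    def snapshot(self) -> Dict[str, int]:
        return dict(self.state)


def run_job(lines: List[str], checkpoint_interval: int,
            fail_after: Optional[int] = None) -> Dict[str, int]:
    """Runs the dataflow, taking a checkpoint every `checkpoint_interval` lines.

    If `fail_after` is given, one crash is simulated right after that offset; the job
    then restores the latest checkpoint and replays the source from its saved offset.
    """
    checkpoint: Tuple[int, Dict[str, int]] = (0, {})  # (next offset, state snapshot)
    crashed = False
    while True:
        start, saved = checkpoint
        counter = KeyedCounter(saved)  # restore operator state from the checkpoint
        try:
            for offset, line in source(lines, start):
                for word in split_words(line):
                    counter.process(word)
                if not crashed and offset == fail_after:
                    crashed = True
                    raise RuntimeError("simulated worker failure")
                if (offset + 1) % checkpoint_interval == 0:
                    checkpoint = (offset + 1, counter.snapshot())
            return counter.state
        except RuntimeError:
            continue  # discard uncheckpointed work and restart from the checkpoint


lines = ["to be or not", "to be", "that is", "the question"]
print(run_job(lines, checkpoint_interval=2, fail_after=2))  # crash, then recovery
print(run_job(lines, checkpoint_interval=2))                # no failure
```

Both calls print the same counts. The work done after the last checkpoint is thrown away and replayed, so nothing is double-counted. Real Flink takes these snapshots asynchronously across many parallel operators using checkpoint barriers, which this sketch does not model.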
- Streaming – Flink is a true stream processing engine.
- High performance – Flink’s streaming runtime provides very high throughput
- Low latency – Flink can process data with sub-second latency
- Event Time and Out-of-Order Events – Flink supports event-time stream processing and windowing even when events arrive late or out of order
- Lightning fast speed – Flink processes data at lightning fast speed (hence it is also called the 4G of Big Data)
- Fault Tolerance – A failure of hardware, a node, software or a process does not bring down the cluster; Flink recovers the application from its latest checkpoint
- Memory management – Flink works in its own managed memory, which is designed to avoid out-of-memory exceptions
- Broad integration – Flink can be integrated with various storage systems to process their data, and it can be deployed with various resource management tools. It can also be integrated with several BI tools for reporting
- Stream processing – Flink is a true streaming engine that can process live streams at sub-second intervals
- Program optimizer – Flink ships with an optimizer that optimizes a program before it is executed
- Scalable – Flink is highly scalable; as requirements grow, the Flink cluster can be scaled out
- Rich set of operators – Flink has many predefined operators to process data, and all the common operations can be done with them
- Exactly-once Semantics – Flink can maintain custom state during computation, and its checkpointing ensures that each record affects that state exactly once, even after a failure
- Highly flexible Streaming Windows – Windows in Flink can be customized with flexible triggering conditions to get the required streaming patterns. Windows can be time-based (for example, from time t1 to t5) or data-driven.
- Continuous streaming model with backpressure – Data streaming applications are executed with continuous (long-lived) operators. Flink’s streaming engine naturally handles backpressure.
- One Runtime for Streaming and Batch Processing – Batch processing and data streaming share a common runtime in Flink
- Easy and understandable Programmable APIs – Flink’s APIs are designed to cover all the common operations, so programmers can use them efficiently.
- Little tuning required – Flink requires little or no manual configuration of memory, network or serializers
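The event-time and windowing items above can be illustrated with a simplified single-process sketch. It is not Flink's API, and `tumbling_window_counts` and its parameters are hypothetical. It uses a simplified watermark model that resembles Flink's bounded-out-of-orderness strategy:

- The watermark trails the largest timestamp seen so far by a fixed delay.
- A tumbling window [start, start + size) fires once the watermark reaches its end.
- Events that arrive after their window has fired are set aside as late. This matches Flink's default when no allowed lateness is configured.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Simplified sketch in plain Python, NOT Flink's API.
Event = Tuple[int, str]  # (event timestamp in seconds, key)
Window = Tuple[int, int, Dict[str, int]]  # (start, end, count per key)


def tumbling_window_counts(events: List[Event], size: int,
                           max_delay: int) -> Tuple[List[Window], List[Event]]:
    """Counts events per key in tumbling event-time windows [start, start + size).

    Events may arrive out of order. The watermark is the largest timestamp seen so
    far minus `max_delay`; a window is emitted once the watermark reaches its end.
    An event whose window end is already at or below the watermark is late.
    """
    open_windows: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    results: List[Window] = []
    late: List[Event] = []
    max_seen: Optional[int] = None
    watermark = float("-inf")

    def fire_ready() -> None:
        # Emit, in start order, every open window whose end the watermark has reached.
        for start in sorted(open_windows):
            if start + size <= watermark:
                results.append((start, start + size, dict(open_windows.pop(start))))

    for ts, key in events:
        start = ts - ts % size  # the tumbling window this event belongs to
        if start + size <= watermark:
            late.append((ts, key))  # its window has already fired
            continue
        open_windows[start][key] += 1
        max_seen = ts if max_seen is None else max(max_seen, ts)
        watermark = max_seen - max_delay  # advance the watermark, then fire windows
        fire_ready()

    watermark = float("inf")  # end of input: flush all remaining windows
    fire_ready()
    return results, late


events = [(1, "a"), (4, "b"), (12, "a"), (7, "a"), (15, "b"), (3, "a"), (26, "a"), (14, "b")]
windows, late_events = tumbling_window_counts(events, size=10, max_delay=5)
print(windows)
print(late_events)
```

With `size=10` and `max_delay=5`, the event at timestamp 7 arrives after the one at 12. It is still counted in the [0, 10) window. The events at 3 and 14 arrive after their windows have fired, so they are reported as late. A larger `max_delay` tolerates more disorder at the cost of higher latency; that is the trade-off a watermark delay controls.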
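My first impression of Apache Flink is that TV stations need it as they transform. Live broadcasts used to rely on SNG (satellite news gathering) trucks uplinking to satellites. Switching to streaming technology saves an enormous amount of cost on that alone. Its uses are broad: it can also process dirty data, power product recommendations and make predictions.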
- To install Apache Flink on Linux, follow this installation guide.
- To install Apache Flink on Windows, follow this installation guide.