Apache Flink

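Apache Flink is a compute engine, billed as the "4G of Big Data" (note1). It is fast, easy to use, open source, and performs well, but it has no storage system of its own. It covers several processing styles: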

  • Batch Processing
  • Interactive processing
  • Real-time stream processing
  • Graph Processing
  • Iterative Processing
  • In-memory processing

Flink is an alternative to MapReduce, and it is claimed to process data more than 100 times faster than MapReduce.

Flink is independent of Hadoop, but it can use HDFS to read, write, store, and process data. Flink does not provide its own data storage system; it takes data from distributed storage.

Flink ecosystem (note2):

[Figure: Apache Flink ecosystem components]

Storage: reading from and writing to other vendors' data stores is generally not a problem.

  • HDFS – Hadoop Distributed File System
  • Local-FS – Local File System
  • S3 – Simple Storage Service from Amazon
  • HBase – NoSQL Database in Hadoop ecosystem
  • MongoDB – NoSQL Database
  • RDBMS – Any relational database
  • Kafka – Distributed messaging Queue
  • RabbitMQ – Messaging Queue
  • Flume – Data Collection and Aggregation Tool

All of the above are supported.

Deploy: Flink can allocate resources and be deployed in the following modes:

  • Local mode – On single node, in single JVM
  • Cluster – On a multi-node cluster, with one of the following resource managers
    • Standalone – This is the default resource manager, which ships with Flink
    • YARN – This is a very popular resource manager. It is part of Hadoop, introduced in Hadoop 2.x
    • Mesos – This is a generalized resource manager
  • Cloud – on Amazon or Google cloud

Runtime:

The Distributed Streaming Dataflow, also called the kernel of Apache Flink. This is the core layer of Flink, which provides distributed processing, fault tolerance, reliability, native iterative processing capability, etc.
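
As a rough illustration of the dataflow idea, the sketch below chains a source, two operators, and a sink using plain Python generators. It is not Flink code and does not use Flink's API. The names `source`, `parse`, `keep_errors`, and `run_pipeline` are hypothetical, and distribution and fault tolerance are left out entirely.

```python
from typing import Iterable, Iterator, List, Tuple


def source(lines: Iterable[str]) -> Iterator[str]:
    """Emit records one at a time, as a stream source would."""
    for line in lines:
        yield line


def parse(records: Iterator[str]) -> Iterator[Tuple[str, str]]:
    """Map operator: split 'LEVEL message' into a (level, message) pair."""
    for record in records:
        level, _, message = record.partition(" ")
        yield level, message


def keep_errors(events: Iterator[Tuple[str, str]]) -> Iterator[str]:
    """Filter operator: pass through only the messages of ERROR events."""
    for level, message in events:
        if level == "ERROR":
            yield message


def run_pipeline(lines: Iterable[str]) -> List[str]:
    """Sink: pull records through the chained operators and collect them."""
    return list(keep_errors(parse(source(lines))))


if __name__ == "__main__":
    print(run_pipeline(["INFO started", "ERROR disk full", "ERROR timeout"]))
```

Each record flows through the whole chain before the next one is read, which is the record-at-a-time behaviour that separates a streaming dataflow from a batch job.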

Master-slave architecture:

[Figure: Flink master-slave architecture]

 

Features:

  • Streaming – Flink is a true stream processing engine.
  • High performance – Flink’s data streaming Runtime provides very high throughput
  • Low latency – Flink can process data with sub-second latency
  • Event Time and Out-of-Order Events – Flink supports stream processing and windowing where events arrive delayed or out of order
  • Lightning fast speed – Flink processes data at lightning fast speed (hence it is also called the 4G of Big Data)
  • Fault Tolerance – Failure of hardware, a node, software, or a process doesn’t bring down the cluster
  • Memory management – Flink works in managed memory, which is meant to avoid out-of-memory exceptions
  • Broad integration – Flink can be integrated with various storage system to process their data, it can be deployed with various resource management tools. It can also be integrated with several BI tools for reporting
  • Stream processing – Flink is a true streaming engine and can process live streams at sub-second intervals
  • Program optimizer – Flink ships with an optimizer that optimizes a program before it is executed
  • Scalable – Flink is highly scalable. As requirements grow, we can scale the Flink cluster
  • Rich set of operators – Flink has lots of pre-defined operators to process the data. All the common operations can be done using these operators
  • Exactly-once Semantics – Flink can maintain custom state during computation, with exactly-once guarantees for that state
  • Highly flexible Streaming Windows – In Flink we can customize windows with flexible trigger conditions to get the required streaming patterns. We can create time-based windows (for example, from time t1 to t5) as well as data-driven windows.
  • Continuous streaming model with backpressure – Data streaming applications are executed with continuous (long lived) operators. Flink’s streaming engine naturally handles backpressure.
  • One Runtime for Streaming and Batch Processing – Batch processing and data streaming share a common runtime in Flink
  • Easy and understandable Programmable APIs – Flink’s APIs are designed to cover all the common operations, so programmers can use them efficiently.
  • Little tuning required – Little memory, network, or serializer configuration is needed
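
To make the event-time and windowing points concrete, here is a minimal sketch in plain Python, not Flink's API. The function name `tumbling_windows` and the sample events are hypothetical. It assigns each event to a fixed-size window based on the timestamp carried by the event itself, so events that arrive late or out of order still land in the right window. A real engine such as Flink also has to decide when a window is complete (Flink uses watermarks for this); the sketch sidesteps that by seeing all events before producing a result.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tumbling_windows(
    events: Iterable[Tuple[int, str]], size: int
) -> Dict[int, List[str]]:
    """Group (event_time, value) pairs into fixed-size event-time windows.

    The window is chosen from the event's own timestamp, so arrival order
    does not change which window an event lands in.
    """
    windows: Dict[int, List[str]] = defaultdict(list)
    for event_time, value in events:
        window_start = (event_time // size) * size
        windows[window_start].append(value)
    # Sort the values inside each window so the result is order-independent.
    return {start: sorted(vals) for start, vals in sorted(windows.items())}


if __name__ == "__main__":
    # Events arrive out of order: timestamp 7 shows up before 1 and 3.
    arrivals = [(7, "c"), (1, "a"), (12, "d"), (3, "b")]
    print(tumbling_windows(arrivals, size=5))
```

Feeding the same events in any other arrival order gives the same windows, which is the property that event-time processing is meant to provide.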

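First impressions of Apache Flink: TV stations going through a transformation will need it. Live broadcasts used to rely on SNG trucks and satellite uplinks; now they are moving to streaming technology, and that alone saves a great deal of cost. It has a wide range of uses: it can also handle dirty data, power product recommendations, and make predictions.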

 

(note1: http://data-flair.training/blogs/apache-flink-production-fortune-500-companies-top-real-world-use-cases/)

(note2: data-flair.training/blogs/apache-flink-comprehensive-guide-tutorial-for-beginners/)

(Installation:

)

 

 

 

Docker

  1. Getting started:
  • Read this documentation: https://docs.docker.com/

This is a must-have for software engineering.

2. Typical Docker Platform Workflow, in 5 steps:

  1. Get your code and its dependencies into Docker containers.
  2. Configure networking and storage for your solution, if needed.
  3. Upload builds to a registry (ours, yours, or your cloud provider’s), to collaborate with your team.
  4. If you need to scale your solution across multiple hosts (VMs or physical machines), plan for how you’ll set up your Swarm cluster and scale it to meet demand.
  5. Finally, deploy to your preferred cloud provider (or, for redundancy, multiple cloud providers) with Docker Cloud. Or, use Docker Datacenter, and deploy to your own on-premise hardware.
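
Steps 1 and 3 of the workflow above map onto three standard docker CLI commands: `docker build`, `docker tag`, and `docker push`. The sketch below only assembles those commands and prints them; it does not run Docker. The function name `workflow_commands`, the image name `myapp`, and the registry host `registry.example.com` are made up for illustration.

```python
from typing import List


def workflow_commands(image: str, tag: str, registry: str) -> List[List[str]]:
    """Return the docker CLI calls for build, tag, and push (steps 1 and 3)."""
    local = f"{image}:{tag}"
    remote = f"{registry}/{image}:{tag}"
    return [
        ["docker", "build", "-t", local, "."],  # step 1: package code and deps
        ["docker", "tag", local, remote],       # name the image for the registry
        ["docker", "push", remote],             # step 3: upload to the registry
    ]


if __name__ == "__main__":
    # Hypothetical image and registry names; nothing is executed here.
    for command in workflow_commands("myapp", "1.0", "registry.example.com"):
        print(" ".join(command))
```

To actually run the commands, each list could be passed to `subprocess.run`, assuming Docker is installed and the current directory contains a Dockerfile.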
3. Docker for Windows: https://docs.docker.com/docker-for-windows/

Requirements:

  • Docker for Windows requires 64-bit Windows 10 Pro, Enterprise, or Education (1511 November update, Build 10586 or later) and Microsoft Hyper-V. Please see "What to know before you install" for a full list of prerequisites.
  • Docker for Windows requires Microsoft Hyper-V to run. After Hyper-V is enabled

 

A Chinese-language introduction to Docker: Docker OpenSource專案簡介 (Introduction to the Docker open source project)

 

 

 

 

 

 

OpenStack

  • for IaaS

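OpenStack is open source cloud software created jointly by NASA (the US National Aeronautics and Space Administration) and Rackspace. It is licensed under the Apache License and is a free and open source project for building Infrastructure as a Service (IaaS) (note2).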

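  • 3 major modules:

The compute, networking, and storage modules, plus a centrally managed dashboard module, combine into a set of OpenStack shared services. These services expose compute resources to users as virtual machines, which makes elastic scaling and scheduling easier (note1).

So this is a tool for network administrators. For networking hardware, becoming programmable and virtualized is inevitable: the hardware turns into software and gets virtualized.

For the individual modules (packages), see (note1).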


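  • Networking module (package): Neutron

The Neutron package provides Network-Connectivity-as-a-Service to the other OpenStack services. For OpenStack compute, for example, it gives tenants an API to define and use networks. Its plugin-based architecture supports many networking vendors and technologies. IT staff can allocate IP addresses, either static or dynamic, and can also use SDN technologies such as the OpenFlow protocol to build larger-scale or multi-tenant network environments.

In addition, it allows other network services to be deployed and managed, such as intrusion detection systems (IDS), load balancing, firewalls, and VPNs.

It is similar to VPC on Amazon AWS.

This is surely important for the transformation of networking hardware vendors.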
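
The idea of handing out static or dynamic IP addresses inside a tenant network can be sketched with Python's standard `ipaddress` module. This is a toy model only, not the Neutron API. The class `TenantNetwork`, its methods, and the port names are hypothetical, and a real service also has to deal with releases, persistence, and concurrency.

```python
import ipaddress
from typing import Dict, Iterator


class TenantNetwork:
    """A toy per-tenant subnet that hands out addresses on request."""

    def __init__(self, cidr: str) -> None:
        self.network = ipaddress.ip_network(cidr)
        self._free: Iterator = iter(self.network.hosts())
        self.leases: Dict[str, object] = {}  # port name -> address

    def allocate_dynamic(self, port: str):
        """Give the port the next host address that is not yet leased."""
        for address in self._free:
            if address not in self.leases.values():
                self.leases[port] = address
                return address
        raise RuntimeError("subnet exhausted")

    def allocate_static(self, port: str, ip: str):
        """Give the port a specific address, if it is valid and unused."""
        address = ipaddress.ip_address(ip)
        if address not in self.network:
            raise ValueError(f"{ip} is outside {self.network}")
        if address in self.leases.values():
            raise ValueError(f"{ip} is already in use")
        self.leases[port] = address
        return address


if __name__ == "__main__":
    net = TenantNetwork("10.0.0.0/29")
    print(net.allocate_static("web", "10.0.0.1"))
    print(net.allocate_dynamic("db"))  # skips the address taken by "web"
```

Keeping one network object per tenant is what allows address ranges to overlap between tenants, which is the same isolation idea behind a VPC.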

 

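  • Nova – compute project
  • Swift – object storage project
  • Glance – virtual machine disk image (Virtual Machine Image) delivery service
  • Horizon – provides a simple web interface and management console
  • Cinder – provides block storage access
  • Keystone – provides identity authentication
  • Neutron – provides network management
  • Trove – provides database management
  • Sahara – provides deployment of big data processing
  • Ceilometer – provides metering and monitoring
  • Heat – provides automatic scaling of virtual machines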

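For network administration.

  • Trove: database service package (Database as a Service)

Trove mainly bridges and simplifies the use of actual databases, giving the OpenStack services a scalable and reliable Cloud Database-as-a-Service. The database service covers both traditional relational databases and newer non-relational databases.

  Use Ubuntu as the OS.

This is good stuff and the right direction: it saves money and adds value. Adoption in Asia has been quite fast.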

References:

(note1: https://kairen.gitbooks.io/openstack-liberty/content/conceptions/index.html)

(note2: https://zh.wikipedia.org/wiki/OpenStack)

(note3:http://www.ithome.com.tw/newstream/109292)

(OpenStack resource roundup: https://kairen.gitbooks.io/openstack-liberty/content/openstack-resource.html)

IaaS

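For IaaS, looking at these providers is enough.

I used to deal with many IDCs: in the Philippines, Singapore, Indonesia, Japan, Thailand, Vietnam, Malaysia, and so on.

The IaaS providers in mainland China are worth getting to know.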

  1. AWS
  2. AT&T
  3. GoGrid // http://www.gogrid.com  (note1)

GoGrid is a cloud infrastructure service, hosting Linux and Windows virtual machines managed by a multi-server control panel and a RESTful API.

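4. Rackspace // www.rackspace.com (note2) (note6)

Rackspace (NYSE: RAX) is one of the world's three largest cloud computing centers. Founded in 1998, it is a leading global provider of managed hosting and cloud computing, headquartered in the United States with offices in the United Kingdom, Australia, Switzerland, the Netherlands, and Hong Kong.

^ http://www.rackspace.com/information/aboutus/
^ http://www.rackspace.com/whyrackspace/network/datacenters/
^ http://www.rackspace.com/whyrackspace/support/
^ http://tech.fortune.cnn.com/2011/12/02/rackspace/
^ http://en.wikipedia.org/wiki/OpenStack

5. 21Vianet (世紀互聯) // www.cloudex.cn (note3)

6. Teamsun (華勝天成) // IaaS management system (note4)

Beijing Teamsun Technology Co., Ltd. (北京华胜天成科技股份有限公司, hereafter Teamsun) is a Chinese provider of comprehensive IT services, a domestic IT service company whose service network covers the whole Greater China region and parts of Southeast Asia.

It has two listed companies:
Teamsun (Shanghai Stock Exchange: 600410),
Hong Kong ASL (Stock Exchange of Hong Kong: 00771).
In December 2015, group member company 兰德网络 was listed on the National Equities Exchange and Quotations (the "New Third Board": 834505).

Teamsun's business spans cloud computing, big data, mobile internet, the Internet of Things, and information security. Its business lines cover productized IT services, application software development, system integration, and value-added distribution, and it is a company that has put forward the idea of productized IT services.

7. Digital China (神州數碼) // operations management platform (note5)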

(note1:https://en.wikipedia.org/wiki/GoGrid)

(note2:https://zh.wikipedia.org/wiki/Rackspace)

(note3: http://www.ch.21vianet.com/)

(note4:http://baike.baidu.com/link?url=uM7fmy1OoGt8OZcUrT1rN2_VxuM-pFEVMxQo7Fln2txROHoGoCatLQhaxZUg0HHInI77Hqz2CCI9QfgX6Zbwyzf_ho-aY9ctkYUrwaxBhwU2k7QQ8kvTTLSfx6NnpM_GwB0F5x36K3THVs6MV7lZr1pyCwgQPMrUFE1jzFFi_VP7PcQSp5tNwTMy7lBAmOoB)

(note5:http://baike.baidu.com/link?url=HjUqz_71vINjs034hqLLQ7noWrPjryAfjCCpJU9O6t6dNE6rkwATrDIp8Xc6krg-iC70LfkZQCdlVFf1dhcf1hVdrFd2mozGkLwPxgQn0myK3URXI4oN-X6lnRTumlkeX2dfF0ad1Pi_6IqgJULpEvQhtem58KnEoVcvNmEUPI7zWUhaGmoRtmRp-MyXygoD)

(note6: Rackspace company profile)
