Open source applications for big data

1. Hadoop

  • I know this one; I need to take a closer look at the website and documentation.
  • OS: Windows, Linux, and OS X
  • website:

2. Hypertable

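  • Hypertable is very popular among internet companies. It is modeled on Google's Bigtable design and is used to improve database scalability.
  • Compatible with Hadoop; commercial support and training are available.
  • OS: Linux and OS X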
  • website:

3. Mesos

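  • Apache Mesos is a resource abstraction tool that lets an enterprise treat its entire data center as a single pool of resources. It is popular among companies running Hadoop, Spark, and similar applications.
  • OS: Linux and OS X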
  • website:

4. Presto 

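  • Presto, developed by Facebook, describes itself as "an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes."
  • OS: Linux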
  • website:

5. Solr

  • This "blazing-fast" enterprise search platform claims to be highly reliable, scalable, and fault tolerant.
  • OS: OS independent
  • website:

6. Spark

  • I have written about this one.
  • Apache Spark claims that it "runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk."
  • OS: Windows, Linux, and OS X
  • website:

7. Storm

  • Apache Storm is used to process real-time data.
  • OS: Linux
  • website:

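I have mentioned before how I look at these technologies. The ones that interest me are 1, 4, 5, and 6, but I have no personal use for them right now, and I have to weigh the opportunity cost of my time. Learning blindly does not solve problems. How many companies in Taiwan actually need big data to make decisions? I worry about how far the market has really adopted it. I just want to find applications that solve problems; frankly, how big the data is does not matter to me. Black cat or white cat, the one that catches mice is a good cat.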


Ansible : Playbooks, Inventory, Module index

Popular topics: (note1)


  • Module index (note2)


  • Playbooks (note3)

Playbooks are Ansible’s configuration, deployment, and orchestration language. They can describe a policy you want your remote systems to enforce, or a set of steps in a general IT process. // In effect, they are just like a screenwriter's script.

There are a great many; look things up as you need them. (note3)

  • Inventory (note4)
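
To make the three topics above a little more concrete, here is a minimal Python sketch of how they fit together: an inventory groups hosts, a playbook is an ordered list of plays, and each task in a play calls one module. Real playbooks are written in YAML; the group name, host names, and package name below are hypothetical placeholders, and the `plan` helper is only an illustration, not part of Ansible.

```python
import json

# Hypothetical inventory: one group with two placeholder hosts.
inventory = {"webservers": ["web1.example.com", "web2.example.com"]}

# Hypothetical playbook: a list of plays; each play targets a host group
# and lists tasks. Each task names one module and its arguments.
playbook = [
    {
        "name": "Configure web servers",
        "hosts": "webservers",
        "become": True,
        "tasks": [
            {
                "name": "Ensure nginx is installed",
                "ansible.builtin.package": {"name": "nginx", "state": "present"},
            },
            {
                "name": "Ensure nginx is running",
                "ansible.builtin.service": {"name": "nginx", "state": "started"},
            },
        ],
    }
]


def plan(plays, hosts_by_group):
    """Return (host, task name) pairs, one task at a time across all hosts."""
    steps = []
    for play in plays:
        for task in play["tasks"]:
            for host in hosts_by_group.get(play["hosts"], []):
                steps.append((host, task["name"]))
    return steps


if __name__ == "__main__":
    for host, task_name in plan(playbook, inventory):
        print(f"{host}: {task_name}")
    # The same structure serialized; useful for inspecting what a play contains.
    print(json.dumps(playbook, indent=2))
```

The task-by-task ordering in `plan` mirrors the "script" analogy: every host finishes one step before the next step begins.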








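Twitter tips

The key to any tool is people. Used well, tools make an organization as responsive as a digital nervous system. Whether employees are willing to share comes down to people; whether the organization is designed with care comes down to people. With the right people, the right intentions, and clear goals, the knowledge, the tools, and the opportunities are all right there: understand them, put them into practice, apply them, share them. Those who know how to use them will use them, and the company becomes more efficient and more productive. Good companies all work this way.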






  • Architecture


  • Playbooks


  • Variables


  • Ansible Tower

Ansible Tower



(Ansible documentation //


A first look at Ansible

Ansible (note1)

  • An open-source automation engine that automates software provisioning, configuration management, and application deployment (note2)
  • An IT automation engine that teams use to drive complexity out of their environments and accelerate DevOps initiatives.
  • Ansible, an open source community project sponsored by Red Hat, is the simplest way to automate IT.
  • Ansible is the only automation language that can be used across entire IT teams – from systems and network administrators to developers and managers.
  • Ansible by Red Hat provides enterprise-ready solutions to automate your entire application lifecycle – from servers to clouds to containers and everything in between.
  • Ansible Tower by Red Hat is a commercial offering that helps teams manage complex multi-tier deployments by adding control, knowledge, and delegation to Ansible-powered environments.

The goal of the automation process is to roll out updates without affecting operational capacity.

IT automation, Agile development, DevOps, deployment, application updates, testing

  • CD, CI (Continuous Delivery, Continuous Integration)

CI systems, such as Jenkins, are build systems that watch various source control repositories for changes, run any applicable tests, and automatically build (and ideally test) the latest version of the application from each source control change.

The key handoff for CD is that the build system can invoke Ansible upon a successful build.
Users who also run unit or integration tests on code as a result of the build will also be one step ahead of the game.

Jenkins can utilize Tower to deploy the built artifact into multiple environments. A QA/stage environment modeled after production ups the ante and substantially improves predictability along the lifecycle. The data provided back by Ansible can then be referenced, and directly correlated to a Tower job, in the build system's job.

Ansible’s unique multi-tier, multi-step orchestration capabilities, combined with its push-based architecture, allow for extremely rapid execution of these types of complex workflows.
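
The CD handoff described above (the build system invokes Ansible only after a successful build) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not how Jenkins or Tower implement it: the build command and the file names `deploy.yml` and `staging.ini` are hypothetical, and the only real CLI used is `ansible-playbook` with its `-i` inventory option.

```python
import subprocess


def deploy_on_success(build_cmd, playbook, inventory, run=subprocess.run):
    """Run the build; call ansible-playbook only if the build exits with 0.

    `run` is injectable so the handoff logic can be exercised without
    actually executing a build or touching any servers.
    """
    build = run(build_cmd)
    if build.returncode != 0:
        # Build (or its tests) failed: nothing is deployed.
        return False
    deploy = run(["ansible-playbook", "-i", inventory, playbook])
    return deploy.returncode == 0


if __name__ == "__main__":
    # Hypothetical build command and file names, for illustration only.
    ok = deploy_on_success(["make", "test"], "deploy.yml", "staging.ini")
    print("deployed" if ok else "not deployed")
```

Pointing the same function at a staging inventory first and a production inventory second is one simple way to get the QA/stage step mentioned above.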

  • Ansible feature (note3)


  • What is Ansible?


  • Ansible automated configuration tips





(Why Ansible :


(note3: What is Ansible? //

The next generation enterprise


I got this chart from an MIT class.

Keep this chart in mind and think about which area your company belongs to. How deeply does your company understand its customers, and what value can you create for them?

As an individual, you can also use this chart to consider the relationship between yourself and your company. To what extent do you understand your company’s needs? Let’s think about it.


Top 10 highest tech investment priorities


It makes sense.  I am interested in


  • A payment service provider
  • A US technology company, operating in over 25 countries, that allows both private individuals and businesses to accept payments over the Internet.
  • Stripe focuses on providing the technical, fraud prevention, and banking infrastructure required to operate online payment systems (note1.3)

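Stripe's payment service can accept 139 currencies worldwide, but the service is currently available in only 12 countries. Apart from the United States, Canada, the United Kingdom, and Ireland, the other eight, including Australia, Germany, France, the Netherlands, and Spain, are still in beta. Stripe already has several large customers, such as Salesforce, Lyft, Shopify, Squarespace, and TED. (note2)


(note2: Stripe: the payments company that even PayPal's founders invested in)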


Amazon Marketplace Web Service (Amazon MWS)

  • 亚马逊商城网络服务 (Amazon MWS)

Amazon MWS is an integrated web service API that helps Amazon sellers programmatically exchange data on listings, orders, payments, reports, and more. XML data integration with Amazon enables higher levels of selling automation, which helps sellers grow their business. By using Amazon MWS, sellers can increase selling efficiency, reduce labor requirements, and improve response time to customers.



Google, AI, Machine Learning

Google cloud platform (note5)

Target : the enterprise cloud market

  • Position: to be a developer-friendly platform
  • Weakness

1) Not strong in, and little recognized for, cloud services and the enterprise segment

2) No prior contribution to the open source community.


  • Strength: ASIC, GPU and TPU hardware in its cloud
  • Opportunity

1) Begin to work with open source projects (note3)

  • Cloud Native Computing Foundation, run by the Linux Foundation (note4) and built around Kubernetes, the open-source container management tool

Other partners in the new foundation include AT&T, Box, Cisco, Cloud Foundry Foundation, CoreOS, Cycle Computing, Docker, eBay, Goldman Sachs, Huawei, IBM, Intel, Joyent, Kismatic, Mesosphere, Red Hat, Switch SUPERNAP, Twitter, Univa, VMware and Weaveworks.

(Absent: Microsoft, Amazon, Pivotal, and Taiwanese tech companies)

  • Kubernetes, the popular open source container orchestration system

  • TensorFlow for machine learning

  • Spanner for launching massive distributed databases

  • Draco for 3D graphics compression

2) To be a developer-friendly platform, by:


1) letting customers run whatever open source stack they choose on Google’s infrastructure,

2) releasing and supporting open source projects and building the ecosystem,

3) making the partners who build tools and technologies on top of GCP first-class citizens on the platform,

4) treating them as part of the whole: the net effect is that you can bring the tech you want and use Google technology or any of the [partner] services.

Key success factors (KSF):

1) Being open, to win developers' mindshare

2) Being much more supportive of the open source community makes people feel better about Google, and makes developers feel better about working with its tools because they can avoid lock-in.

  • Threat: peers such as AWS (launched in 2006; the first public cloud, market leader, and first mover), Microsoft, and IBM
  • Strategy

Using Kubernetes, the popular open source container orchestration system, to offer robust open source tools, something that surprised some people in this market.

  • 4 ways Google will enable enterprises to adopt machine learning and AI (note2)

1. Machine learning computing in Google Cloud

A deep learning algorithm can have tens of millions of parameters, so training these machine learning models requires enormous computational resources.

Google offers the Cloud Machine Learning Engine. This capability is designed for companies with data scientists and machine learning experts who are able to build their own unique machine learning models with libraries such as TensorFlow.

Google positions its infrastructure as the solution to speed up training and improve the return on investment. Google has specialized ASIC, GPU and TPU hardware in its cloud to accelerate training and improve the ROI with on-demand cloud resource utilization. After the model is trained, it can be deployed on a range of platforms, from on-premises servers to mobile devices.
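
To give a feel for the "tens of millions of parameters" figure, here is a small Python sketch that counts the weights and biases of a plain fully connected network. The layer sizes are hypothetical, chosen only for illustration; they are not taken from any Google model.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    Each pair of adjacent layers contributes n_in * n_out weights
    plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


if __name__ == "__main__":
    # Hypothetical network: 784 inputs, two hidden layers of 4096, 1000 outputs.
    layer_sizes = [784, 4096, 4096, 1000]
    print(count_dense_parameters(layer_sizes))  # prints 24093672
```

Even this modest four-layer example lands at about 24 million parameters, each of which is updated on every training step, which is why on-demand accelerators are attractive.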

2. Algorithms and pretrained machine learning models

Building your own ML model requires the machine learning engine. Alternatively, companies can use Google’s pre-trained models (full list) through APIs to add machine learning capabilities to their applications, such as understanding natural language and images.

An API beta for understanding videos

Demo: a 3-minute video demonstrating the Cloud Video Intelligence beta

3. Google acquires Kaggle for data

Google acquired Kaggle for data sets and talent. Kaggle, founded in 2010, is a community of 850,000 data scientists from around the world that hosts competitions to create the most accurate predictive models and market models, as well as to acquire new public data sets in a variety of fields.

4. Expertise

Google offers the Advanced Solutions Lab for customers with ambitious goals who want to develop machine learning to solve complex problems.




(note4: The mission of this new foundation, run by the Linux Foundation, is to “help facilitate collaboration among developers and operators on common technologies for deploying cloud native applications and services.”)


(Reference: Introduction to Google Cloud Platform)


