A growing number of ARM-based datacenter hardware options are now on the market, and their performance keeps improving. As a result, more and more users and customers are considering these options for their business, and Big Data is one of the most important areas where they are doing so.
However, the open source Big Data ecosystem on ARM is far from mature. Most Big Data software was not designed with ARM in mind, and its developers have not officially tested their code on ARM, so many problems remain unsolved. To get this software running on ARM, users have to search through and read a large number of articles, apply many patches, and build a number of dependencies on their own. Whenever upstream changes or upgrades, new problems may appear, because upstream does not test on ARM. All of these challenges make users hesitant to adopt ARM for their business.
To change this situation and make the Big Data open source ecosystem friendlier to the ARM platform and its users, our team began by proposing ARM CI for these open source projects. With ARM CI in place, the projects are fully tested on ARM, and all future changes are tested on ARM as well. Along the way, we fixed many problems directly upstream, which benefits all users. We then started running performance comparisons between ARM and x86 to give users an overview of the current status. Many TODO items remain for the future.
In this session, you will learn about the current status of ARM CI for Big Data ecosystem projects such as Hadoop, Spark, HBase, Flink, Storm, Kudu, and Impala, and about our efforts to fix ARM-related problems. We will also introduce our future plans.