The more cores you can use, the better — especially with big data. But the easier a big data framework is to work with, the harder it is for the resulting pipelines, such as TensorFlow plus Apache Spark, to run in parallel as a single unit.
Researchers from MIT CSAIL, the home of envelope-pushing big data acceleration projects like Milk and Tapir, have teamed up with the Stanford InfoLab to create a possible solution. Written in the Rust language, Weld uses the LLVM compiler framework to generate code for an entire data analysis workflow, so that the whole workflow runs efficiently in parallel.
The group describes Weld as a “common runtime for data analytics” that takes the disjointed pieces of a modern data processing stack and optimizes them in concert. Each individual piece runs fast, but “data movement across the [different] functions can dominate the execution time.”
In other words, the pipeline spends more time moving data back and forth between pieces than it spends on the computation itself.
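To make the data-movement cost concrete, here is a minimal sketch in plain Python. It does not use Weld or its API; the function names (`scale`, `keep_above`, `unfused_total`, `fused_total`) are hypothetical and chosen only for illustration. The first version chains two independent "library" calls that each build a full intermediate list, while the second does the same work in a single pass, which is the kind of cross-function fusion a whole-pipeline optimizer could in principle produce.

```python
# Illustrative only: plain Python, not Weld's API. All names are hypothetical.

def scale(values, factor):
    # "Library A": returns a fully materialized intermediate list.
    return [v * factor for v in values]

def keep_above(values, threshold):
    # "Library B": walks that intermediate again and builds another list.
    return [v for v in values if v > threshold]

def unfused_total(values, factor, threshold):
    # Three passes over the data and two intermediate lists.
    return sum(keep_above(scale(values, factor), threshold))

def fused_total(values, factor, threshold):
    # One pass and no intermediate lists: scale, filter, and sum together.
    total = 0
    for v in values:
        scaled = v * factor
        if scaled > threshold:
            total += scaled
    return total

data = list(range(10))
# Both versions compute the same result; only the data movement differs.
assert unfused_total(data, 2, 5) == fused_total(data, 2, 5)
```

Each function in the unfused version is reasonable on its own, yet the combined pipeline reads and writes the data several times. The fused loop touches each element once, which is where the savings would come from on large inputs.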
To read more, see the full post: MIT-Stanford project uses LLVM to break big data bottlenecks - InfoWorld