Case Analysis of Deep Learning Technology Based on Spark and BigDL

This article presents the practical experience of Intel and JD in building a large-scale image feature extraction framework using deep learning techniques, leveraging Spark and BigDL. The framework is designed to handle massive image datasets efficiently and effectively.

Image feature extraction plays a crucial role in applications such as similar image retrieval, deduplication, and object recognition. Before adopting BigDL, we experimented with deploying feature extraction applications on multi-machine GPU clusters. However, these approaches came with several challenges:

GPU-based resource allocation is complex and often leads to issues like memory shortages, which can cause out-of-memory (OOM) errors and application crashes.

When single-machine GPU programs are scaled out into a cluster, developers must manually manage data partitioning, loading, and fault tolerance, which increases development complexity.

GPU-based applications also depend on many libraries, such as CUDA, which makes deployment and maintenance difficult. Differences in OS or GCC versions across machines often force the application to be recompiled and repackaged.

These technical hurdles made it challenging to scale GPU-based systems for large-scale image processing tasks.

In real-world scenarios, images often have complex backgrounds, and the subject occupies only a small portion of the image. To improve feature extraction accuracy, it’s essential to isolate the main object from the background. Therefore, the image feature extraction pipeline typically involves two steps: first, detecting the target using an object detection algorithm, and then extracting features from the detected region. In this case, we used SSD [1] for object detection and DeepBit [2] for feature extraction.
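The detect-then-crop front end can be sketched in plain Python. This is a minimal illustration, not the SSD or BigDL code used in this work. It assumes each detection is a tuple `(score, xmin, ymin, xmax, ymax)` with coordinates normalized to [0, 1]. The names `best_box`, `crop`, and `extract_region`, as well as the 0.5 score threshold, are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Assumed (hypothetical) detection layout: (score, xmin, ymin, xmax, ymax), coords in [0, 1].
Box = Tuple[float, float, float, float, float]
# Grayscale H x W image as nested lists, for simplicity.
Image = List[List[int]]


def best_box(detections: Sequence[Box], min_score: float = 0.5) -> Optional[Box]:
    """Return the highest-scoring detection with score >= min_score, or None."""
    candidates = [d for d in detections if d[0] >= min_score]
    return max(candidates, key=lambda d: d[0]) if candidates else None


def crop(image: Image, box: Box) -> Image:
    """Cut out the pixels covered by a normalized box."""
    h, w = len(image), len(image[0])
    _, xmin, ymin, xmax, ymax = box
    # Map normalized coordinates to pixel indices, clamped to the image bounds.
    x0, x1 = max(0, math.floor(xmin * w)), min(w, math.ceil(xmax * w))
    y0, y1 = max(0, math.floor(ymin * h)), min(h, math.ceil(ymax * h))
    return [row[x0:x1] for row in image[y0:y1]]


def extract_region(image: Image, detections: Sequence[Box]) -> Image:
    """Keep the main object if one was detected; otherwise use the whole image."""
    box = best_box(detections)
    return crop(image, box) if box is not None else image


if __name__ == "__main__":
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(extract_region(img, [(0.9, 0.5, 0.5, 1.0, 1.0), (0.6, 0.0, 0.0, 0.5, 0.5)]))
    # prints [[10, 11], [14, 15]]
```

Falling back to the whole image when no detection clears the threshold is only one possible design choice. A production pipeline might instead skip such images or keep several boxes per image.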

JD (Jingdong) has hundreds of millions of product images stored in distributed open-source databases. Efficiently retrieving and processing data at this scale is a critical challenge for the image feature extraction pipeline. Traditional GPU-based solutions face additional obstacles in this context:

Data transfer from the database to the GPU is time-consuming and not well optimized.

Preprocessing image data in a distributed environment is complex and lacks mature tools for resource management, distributed processing, and fault tolerance.

Scaling GPU-based systems to handle large-scale image datasets is technically challenging due to software and hardware limitations.

To address these challenges, we turned to BigDL, an open-source distributed deep learning framework developed by Intel. BigDL runs on Spark and offers comprehensive support for deep learning algorithms. It leverages Spark's distributed computing power, allowing seamless scaling to hundreds or even thousands of nodes. Additionally, BigDL uses Intel MKL and parallel computing techniques to deliver high performance on Intel Xeon servers, achieving performance comparable to mainstream GPUs.

In our use case, BigDL was customized to support multiple models, including detection and classification models. Pretrained models were ported from frameworks such as Caffe, Torch, and TensorFlow into BigDL, which allowed us to optimize the entire pipeline for speed and efficiency.

The feature extraction pipeline using BigDL in a Spark environment is illustrated in Figure 1. It involves reading images from a distributed database, preprocessing them, performing distributed object detection with SSD, cropping the target regions, and finally extracting features using the DeepBit model. The extracted features are then stored on HDFS.

Figure 1 Image Feature Extraction Pipeline Based on BigDL
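The stages in Figure 1 can be modeled as functions that each consume and produce a stream of records, so one record flows through every stage without materializing intermediate results. The sketch below is a hypothetical stand-in, not the BigDL implementation. `preprocess`, `detect_and_crop`, `extract_features`, `run_pipeline`, and `store` are illustrative names. The "detector" simply keeps the middle of the pixel list, the "feature extractor" only imitates the binary-code output format of DeepBit, and an in-memory buffer replaces HDFS.

```python
import io
from functools import reduce
from typing import Any, Callable, Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, Any]  # (image_id, payload)
Stage = Callable[[Iterator[Record]], Iterator[Record]]


def preprocess(records: Iterator[Record]) -> Iterator[Record]:
    # Stand-in for decoding and resizing: scale 0-255 pixel values to [0, 1].
    for image_id, pixels in records:
        yield image_id, [p / 255.0 for p in pixels]


def detect_and_crop(records: Iterator[Record]) -> Iterator[Record]:
    # Hypothetical stand-in for SSD plus cropping: treat the middle half of
    # the pixel list as the detected object region.
    for image_id, pixels in records:
        n = len(pixels)
        yield image_id, pixels[n // 4 : n - n // 4]


def extract_features(records: Iterator[Record]) -> Iterator[Record]:
    # Hypothetical stand-in for DeepBit: threshold each value at the region
    # mean to get a bit string. Only the binary output format is imitated.
    for image_id, region in records:
        if not region:
            continue  # nothing to describe; skip the image
        mean = sum(region) / len(region)
        yield image_id, "".join("1" if v > mean else "0" for v in region)


def run_pipeline(records: Iterable[Record], stages: List[Stage]) -> Iterator[Record]:
    # Chain the generator stages lazily: records flow through one at a time.
    return reduce(lambda stream, stage: stage(stream), stages, iter(records))


def store(results: Iterable[Record], sink: TextIO) -> int:
    # Stand-in for writing to HDFS: one tab-separated line per image.
    count = 0
    for image_id, code in results:
        sink.write(f"{image_id}\t{code}\n")
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    stages = [preprocess, detect_and_crop, extract_features]
    store(run_pipeline([("a", [0, 255, 0, 255, 255, 0, 255, 0])], stages), buf)
    print(repr(buf.getvalue()))  # prints 'a\t0110\n'
```

Because every stage has the same signature, stages can be reordered, replaced, or tested in isolation.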

BigDL allows the entire data pipeline, including data ingestion, partitioning, preprocessing, prediction, and result storage, to be implemented within Spark. Users can run deep learning applications without modifying the cluster configuration, which makes BigDL easy to adopt in existing big data environments such as Hadoop or Spark clusters.

Beyond distributed deep learning capabilities, BigDL provides user-friendly tools such as image preprocessing libraries and model loading utilities that support third-party frameworks. These features simplify the process of building and maintaining end-to-end pipelines.

BigDL's image preprocessing library is built on OpenCV [5], offering a wide range of common image transformation and enhancement functions. Developers can easily construct their own preprocessing pipelines using these tools, while also having the flexibility to customize operations through the library's API.
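The idea of assembling a preprocessing pipeline from reusable transformations can be illustrated without OpenCV. The sketch below is an assumption-laden stand-in, not BigDL's preprocessing API. Images are H x W x C nested lists, and `resize_nearest`, `channel_normalize`, `hflip`, and `Pipeline` are hypothetical names. The mean and standard deviation values in the example are arbitrary.

```python
from typing import Callable, List, Sequence

Image = List[List[List[float]]]  # H x W x C nested lists (hypothetical format)
Transform = Callable[[Image], Image]


def resize_nearest(height: int, width: int) -> Transform:
    # Nearest-neighbor resize: each output pixel copies the source pixel at
    # the proportionally scaled (floored) position.
    def apply(img: Image) -> Image:
        h, w = len(img), len(img[0])
        return [[list(img[r * h // height][c * w // width]) for c in range(width)]
                for r in range(height)]
    return apply


def channel_normalize(mean: Sequence[float], std: Sequence[float]) -> Transform:
    # Per-channel (value - mean) / std, a common step before CNN inference.
    def apply(img: Image) -> Image:
        return [[[(v - m) / s for v, m, s in zip(px, mean, std)] for px in row]
                for row in img]
    return apply


def hflip() -> Transform:
    # Horizontal flip, a typical augmentation ("enhancement") operation.
    def apply(img: Image) -> Image:
        return [row[::-1] for row in img]
    return apply


class Pipeline:
    """Apply transforms in order; a custom step is just another callable."""

    def __init__(self, *transforms: Transform) -> None:
        self.transforms = list(transforms)

    def __call__(self, img: Image) -> Image:
        for transform in self.transforms:
            img = transform(img)
        return img


if __name__ == "__main__":
    img = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
    pipe = Pipeline(resize_nearest(2, 2), channel_normalize([1.0], [2.0]), hflip())
    print(pipe(img))  # prints [[[0.5], [-0.5]], [[4.5], [3.5]]]
```

Customizing the pipeline then amounts to passing any function with the same signature, for example a lambda that crops or recolors the image.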

