Visive AI News

Xpeng's Vision-First Strategy: A Game-Changer in Autonomous Driving

Discover how Xpeng's shift to a vision-based approach could redefine the future of autonomous vehicles. Learn why this move might outpace lidar in the race f...

September 21, 2025
By Visive AI News Team

Key Takeaways

  • Xpeng's vision-based approach, similar to Tesla's, is gaining traction over lidar for its cost-effectiveness and scalability.
  • Xpeng's VLA (Vision, Language, Action) system uses customer driving data to train its AI, potentially leading to more robust real-world performance.
  • Despite the advantages, lidar remains a crucial component for robotaxi companies like Waymo and Zoox, offering better accuracy in complex environments.
  • Xpeng's CEO, Xiaopeng He, tested Tesla’s FSD and was impressed, signaling a potential collaboration or competitive push in the autonomous vehicle market.


The automotive industry is at a crossroads, with companies like Tesla and Xpeng leading the charge in developing autonomous vehicles (AVs). While Tesla has long championed a vision-based approach, Xpeng, a Chinese automaker, is now following suit, abandoning lidar in favor of cameras and AI. This strategic shift could have significant implications for the future of AV technology.

The Vision-First Approach: Cost and Scalability

Xpeng's decision to move away from lidar is rooted in practicality and cost. Candice Yuan, senior director and head of product at Xpeng’s Autonomous Driving Center, explained that lidar data is difficult to integrate into the company's AI system. “The lidar data can’t contribute to the AI system,” Yuan stated. Instead, Xpeng’s system, called Navigation Guided Pilot (XNGP), relies on short 10- to 30-second video clips from customer vehicles to train the AI model behind it.
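Xpeng has not published how it prepares these clips. As a purely illustrative sketch, assuming fleet recordings are cut into fixed-length segments and that only segments between 10 and 30 seconds are kept, the slicing step might look like this. `Clip` and `split_into_clips` are hypothetical names, and the 20-second default clip length is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    """A hypothetical training clip, described by its start and end time in seconds."""
    start_s: float
    end_s: float

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def split_into_clips(total_s: float, clip_s: float = 20.0,
                     min_s: float = 10.0, max_s: float = 30.0) -> List[Clip]:
    """Cut a recording of total_s seconds into consecutive clips of clip_s seconds.

    Only clips whose length falls within [min_s, max_s] are kept, so a short
    leftover segment at the end of the recording is dropped.
    """
    if not (min_s <= clip_s <= max_s):
        raise ValueError("clip_s must lie within [min_s, max_s]")
    clips: List[Clip] = []
    start = 0.0
    while start < total_s:
        # The final clip may be shorter than clip_s if the recording ends first.
        end = min(start + clip_s, total_s)
        if min_s <= end - start <= max_s:
            clips.append(Clip(start, end))
        start = end
    return clips
```

For a 65-second recording, this yields three 20-second clips; the leftover 5 seconds fall below the 10-second minimum and are dropped.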

Key advantages include:

  • Cost-Effective: Lidar sensors are expensive and require heavy data labeling and complex integration. Vision-based systems, on the other hand, use cheaper cameras and are more scalable across a global fleet.
  • Scalable: Xpeng’s VLA (Vision, Language, Action) system can be updated and improved continuously with real-world data, making it more adaptable to various environments.
  • Real-World Performance: By using customer data, Xpeng can train its AI to handle a wide range of scenarios, from urban traffic to rural roads.

The Role of AI in Training

Xpeng’s approach to AI training is data-driven. Its VLA system combines visual perception, language understanding, and action prediction in a single model, so that what the cameras see and how the scene is interpreted feed directly into the driving decision. This integrated design is meant to help the AI make better-informed decisions, potentially leading to safer and more reliable autonomous vehicles.

The Competition: Lidar vs. Vision

Despite the advantages of the vision-first approach, lidar remains a crucial component for many robotaxi companies. Waymo and Zoox, for instance, use lidar to improve the accuracy of their systems, especially in complex urban environments. Lidar measures distances directly and builds a detailed 3D point cloud of the surroundings, which is valuable in poor lighting, bad weather, and edge cases.
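Lidar measures distance by timing how long a laser pulse takes to return: the range is the speed of light times the round-trip time, divided by two. Each return, combined with the beam's horizontal and vertical angles, becomes one point in the 3D point cloud. The sketch below shows that geometry, assuming an idealized sensor with no noise, calibration offsets, or multiple returns per pulse; `tof_range_m` and `to_cartesian` are hypothetical names, not part of any vendor SDK.

```python
import math
from typing import Tuple

SPEED_OF_LIGHT_M_S = 299_792_458.0


def tof_range_m(round_trip_s: float) -> float:
    """Distance to a target from a pulse's round-trip time (idealized sensor)."""
    # The pulse travels to the target and back, so halve the total path length.
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


def to_cartesian(range_m: float, azimuth_deg: float,
                 elevation_deg: float) -> Tuple[float, float, float]:
    """Convert one lidar return (range plus beam angles) into an (x, y, z) point."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = range_m * math.cos(el) * math.cos(az)  # forward
    y = range_m * math.cos(el) * math.sin(az)  # left
    z = range_m * math.sin(el)                 # up
    return x, y, z
```

A pulse that returns after 200 nanoseconds corresponds to an object about 30 m away (299,792,458 × 200 × 10⁻⁹ / 2 ≈ 29.98 m). Cameras, by contrast, have to infer depth from image content rather than measure it directly.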

Why lidar still matters:

  1. Accuracy: Lidar provides high-resolution, 3D data that is essential for precise navigation.
  2. Edge Cases: In complex urban environments, lidar can help detect edge cases that vision-based systems might miss.
  3. Safety: The additional data from lidar can improve the overall safety of autonomous vehicles.

Xpeng and Tesla: A Competitive Push

Xpeng’s CEO, Xiaopeng He, visited Silicon Valley last year and tested Tesla’s Full Self-Driving (FSD) system. He was impressed by its performance and even cheekily asked to borrow a Tesla equipped with FSD, while inviting Elon Musk to China to try Xpeng’s XNGP system. This interaction signals a potential competitive push or collaboration between the two companies, both of which are betting big on vision-based autonomous driving.

Key points of comparison:

  1. Scope: Both XNGP and FSD are designed to operate anywhere, at least in theory.
  2. User Supervision: Like Tesla’s FSD, XNGP still requires constant driver supervision and readiness to take over.
  3. Market Impact: Xpeng’s vision-first approach could position it as a strong competitor in the global AV market, especially in regions where cost and scalability are key factors.

The Bottom Line

Xpeng’s shift to a vision-based approach is a strategic move that could redefine the future of autonomous driving. While lidar remains a valuable tool for robotaxi companies, the vision-first approach offers significant advantages in cost, scalability, and real-world performance. As Xpeng continues to refine its AI with customer data, the company is well-positioned to challenge established players like Tesla and potentially lead the way in the global AV market.

Frequently Asked Questions

What is Xpeng's VLA system?

Xpeng's VLA (Vision, Language, Action) system is a sophisticated AI model that uses visual data, language understanding, and action prediction to improve autonomous driving capabilities. It relies on short videos from customer vehicles to train the AI, making it more adaptable to real-world scenarios.

Why did Xpeng abandon lidar in favor of cameras?

Xpeng abandoned lidar because the data from lidar sensors is difficult to integrate into its AI system. Cameras, on the other hand, are cheaper, more scalable, and provide data that can be used more easily to train the AI, which Xpeng expects to improve real-world performance.

How does Xpeng's XNGP compare to Tesla's FSD?

Xpeng's XNGP and Tesla's FSD are both vision-based systems designed to operate anywhere, but they still require constant driver supervision. Xpeng's CEO, Xiaopeng He, tested Tesla’s FSD and was impressed, indicating a potential competitive push or collaboration between the two companies.

What are the advantages of using lidar in autonomous vehicles?

Lidar provides high-resolution, 3D data that is essential for precise navigation, especially in complex urban environments. It can detect and handle edge cases that vision-based systems might miss, improving overall safety and accuracy.

What is the impact of Xpeng's vision-first approach on the AV market?

Xpeng's vision-first approach is a strategic move that could redefine the future of autonomous driving. It offers significant advantages in cost, scalability, and real-world performance, positioning Xpeng as a strong competitor in the global AV market, especially in regions where cost and scalability are key factors.