Model Conversion to Core ML Format for Apple Devices
Core ML — Apple's native ML framework. Models in .mlpackage/.mlmodel format run on the Neural Engine (ANE), GPU, or CPU — the Core ML runtime picks the compute unit automatically. Result: near-optimal performance on Apple Silicon with minimal power consumption.
Conversion Tools
coremltools (primary tool):
import coremltools as ct
import torch
# Core ML converts TorchScript, so trace the PyTorch model first
traced = torch.jit.trace(pytorch_model.eval(), example_input)
model = ct.convert(traced, inputs=[ct.TensorType(shape=example_input.shape)])
model.save("model.mlpackage")
Supports PyTorch and TensorFlow/Keras as source frameworks; ONNX was supported only via the legacy onnx-coreml path and is dropped in recent coremltools releases.
Core ML Tools Optimization (ct.optimize.coreml):
- 8-bit weight quantization: linear_quantize_weights(model, config) with OpLinearQuantizerConfig(mode="linear_symmetric")
- 4-bit palettization (weight clustering) to shrink model size
- Weight pruning (sparsifying weights)
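Under the hood, linear symmetric quantization maps each weight tensor to 8-bit integers with a single per-tensor scale. A minimal numpy sketch of the idea (an illustration, not the coremltools implementation):

```python
import numpy as np

def quantize_linear_symmetric(w, n_bits=8):
    # Per-tensor symmetric quantization: map floats to [-qmax, qmax] integers.
    qmax = 2 ** (n_bits - 1) - 1          # 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax      # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_linear_symmetric(w)
err = np.max(np.abs(dequantize(q, scale) - w))  # bounded by ~scale / 2
```

Storage drops 4x (int8 vs. float32), and the reconstruction error per weight stays within half a quantization step.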
Conversion Nuances
Unsupported ops: not every PyTorch operation has a Core ML equivalent. Missing ops can be handled by registering a composite operator (decomposing the op into MIL primitives with coremltools' @register_torch_op decorator) or by rewriting the model to use supported operations.
Numerics: ML Program models compute in float16 by default (compute_precision=ct.precision.FLOAT16), not float32. Verify accuracy with numeric tolerance tests against the source model; pass compute_precision=ct.precision.FLOAT32 if float16 loses too much precision.
Shape flexibility: inputs can be static or flexible (ct.RangeDim, ct.EnumeratedShapes). The ANE works best with static shapes; flexible shapes may force execution onto the GPU or CPU.
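A sketch of declaring flexible inputs at conversion time (the shapes and bounds here are illustrative; an enumerated set of shapes is generally friendlier to the ANE than an open range):

```python
import coremltools as ct

# Hypothetical image model: fixed batch, spatial size flexible within a range.
flexible_input = ct.TensorType(
    shape=(1, 3, ct.RangeDim(224, 1024), ct.RangeDim(224, 1024))
)

# Or restrict to a few known shapes, which the runtime can pre-plan for:
enumerated_input = ct.TensorType(
    shape=ct.EnumeratedShapes(shapes=[(1, 3, 224, 224), (1, 3, 512, 512)])
)
```

Either object is then passed via the inputs=[...] argument of ct.convert.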
LLM Conversion
coremltools 8+ supports transformer conversion with dedicated optimizations (e.g. stateful models for the KV cache). mlx-lm, built on Apple's MLX framework, is often the more efficient path for running LLMs on Apple Silicon.
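As a sketch, mlx-lm can fetch a Hugging Face checkpoint and convert it to MLX format in one step (the model id is illustrative, and the exact CLI flags should be checked against the installed mlx-lm version):

```shell
pip install mlx-lm
# -q applies 4-bit quantization during conversion
python -m mlx_lm.convert --hf-path mistralai/Mistral-7B-Instruct-v0.2 -q
```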
Testing After Conversion
Compare PyTorch and Core ML outputs on a held-out test set. A common acceptance threshold is max absolute error < 1e-4 for float16 models; quantized models need looser tolerances plus task-level metrics.
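A minimal sketch of such a check, assuming you can run both models on the same inputs (the prediction calls are elided; plain numpy arrays stand in for the two models' outputs):

```python
import numpy as np

def assert_outputs_close(reference, converted, atol=1e-4):
    # Max absolute error across all elements; raise if outside tolerance.
    err = float(np.max(np.abs(np.asarray(reference) - np.asarray(converted))))
    if err >= atol:
        raise AssertionError(f"max abs error {err:.3e} exceeds {atol:.0e}")
    return err

# Hypothetical outputs from the PyTorch model and the converted Core ML model:
ref = np.array([0.1234, -0.5678, 0.9999], dtype=np.float32)
out = ref + np.float32(3e-5)  # simulated float16 conversion drift
err = assert_outputs_close(ref, out)
```

Run the check over the whole test set, not a single batch, since conversion error can vary with input distribution.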