Contract to Import AI Files 2025 Update

To allow the import of Adobe Illustrator files into Inkscape, we are developing a Python extension. This post outlines the progress made in the second month of the project, January 2025. The work is carried out inside the git repository extension-ai.

Below, you can find a list of what Manpreet has worked on in January 2025.

Text

The development of text-related functionalities is ongoing in the dev branch. Key changes include:

Styles

Duplicates

Handling duplicates has been a primary focus during the month, with significant progress made for both older (CS series) and newer (2020) Illustrator file formats. We previously assumed that all objects were delimited by art dictionaries, so that all but one of the objects between two art dictionaries could be removed as duplicates. This assumption proved to be incorrect, so a different approach was needed.

Understanding the Problem

Illustrator converts all objects into paths, similar to the Path > Object to Path option in Inkscape. These baked, path-based representations are then saved in the file, likely to ensure compatibility with PostScript interpreters. Illustrator also needs to retain the original object's properties, such as a rectangle's width and height or a circle's radius, along with the original path before any filters are applied to the object. We refer to these retained objects as control objects. Because this extra data is stored in the file alongside the baked paths, the same object appears more than once.

Identifying duplicates

The control objects are not meant to be processed by the PostScript interpreter. In PostScript, the % character denotes the start of a comment, which is ignored during interpretation. Illustrator makes use of pseudo-comments, lines starting with %_, which are ignored by the PostScript interpreter like any other comment but are recognized by Adobe Illustrator and other applications that parse Illustrator files. Illustrator encloses control objects within these pseudo-comments.
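To illustrate the distinction described above, here is a minimal Python sketch that separates pseudo-comment lines from everything else. The function name split_pseudo_comments and the sample lines are made up for this post and are not taken from the extension-ai code or from a real Illustrator file.

```python
def split_pseudo_comments(lines):
    """Split lines into (control, other).

    control: payload of lines starting with "%_" (prefix removed).
    other:   every remaining line, unchanged apart from stripping whitespace.
    Hypothetical helper for illustration only.
    """
    control, other = [], []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("%_"):
            # A PostScript interpreter sees a comment here;
            # an Illustrator-aware parser reads the text after "%_".
            control.append(stripped[2:])
        else:
            other.append(stripped)
    return control, other


# Made-up sample lines, not a real Illustrator file.
sample = [
    "%_0 0 m",
    "%_10 0 L",
    "0 0 m",
    "10 0 L",
    "% ordinary comment",
]

control, other = split_pseudo_comments(sample)
print(control)
print(other)
```

Note that an ordinary % comment ends up in the second list here: only the %_ prefix marks data that belongs to a control object.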

Deduplication

The control objects include dictionaries containing shape properties (sometimes stored twice) and the original outline, that is, the path before any filters are applied. In Illustrator, this outline is visible as the blue highlight when an object is selected. The structure of the baked objects in the file depends on the control objects and on the known styles that are applied. To identify the last control object and remove the other duplicates, the deduplicate_object and deduplicate_live_shape functions have been implemented. If the known styles are not present, the baked object is kept and the control objects are ignored. Since this decision can only be made once everything belonging to an object has been read, the deduplication step is deferred until the next object is detected.
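The deferred decision described above can be sketched roughly as follows. This is a simplified illustration and not the code of deduplicate_object or deduplicate_live_shape: the event tuples, the Candidate class and the function names are all hypothetical, and the rule "keep the last control object when known styles are present" is our reading of the paragraph above, stated here as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    kind: str      # "control" or "baked" (hypothetical labels)
    payload: str


def resolve(group, has_known_styles):
    """Pick the single representation to keep for one object."""
    controls = [c for c in group if c.kind == "control"]
    baked = [c for c in group if c.kind == "baked"]
    if has_known_styles and controls:
        # Assumption: the last control object is the one to keep.
        return controls[-1]
    # Without known styles, control objects are ignored.
    return baked[-1] if baked else None


def deduplicate_stream(events):
    """events: iterable of (kind, payload) tuples, where kind is one of
    "new_object", "style", "control" or "baked" (all hypothetical)."""
    kept, pending, styled = [], [], False

    def flush():
        chosen = resolve(pending, styled)
        if chosen is not None:
            kept.append(chosen.payload)

    for kind, payload in events:
        if kind == "new_object":
            # Deferred step: decide about the previous object only now.
            if pending:
                flush()
            pending, styled = [], False
        elif kind == "style":
            styled = True
        else:
            pending.append(Candidate(kind, payload))
    if pending:
        flush()  # the last object has no successor to trigger the flush
    return kept


events = [
    ("new_object", ""), ("control", "rect v1"), ("control", "rect v2"),
    ("baked", "path A"), ("style", "fill"),
    ("new_object", ""), ("control", "circle"), ("baked", "path B"),
]
print(deduplicate_stream(events))
```

Collecting candidates in a pending list and resolving them on the next "new_object" event means the choice is made with full knowledge of the styles that followed the object, at the cost of needing one extra flush at the end of the stream.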

Other Fixes and Improvements