Customizing Pimcore Classification Store: Removing Grid View Duplicates and Simplifying DataHub Imports
Introduction
Classification Store is a Pimcore data type used to manage product data when different attributes apply to different product types. It works through two components: groups and key definitions.
Groups act as containers for sets of data used on specific product categories, while key definitions are attributes you define once and then select in the groups where they should appear. This is useful when you manage multiple product categories, each with a large set of category-specific data, and do not want irrelevant attributes cluttering the interface.
Take cutting tools, for example. Drilling tools need completely different specifications than grooving tools, though some attributes overlap between them. That is where Classification Store is especially useful.
The issue starts when the same attributes appear across different groups.
Going back to the drilling and grooving example, both categories share attributes like Coolant and Cutting Edge Length. You then have two separate Classification Store groups, each with its own specific attributes, plus these shared ones. If many groups needed the same attributes, you would usually place them directly on the product class instead of in Classification Store.
This is a simplified example. In reality, you are usually dealing with at least 20 different Classification Store groups where a handful of attributes are shared across 10 or more groups. For users, this attribute duplication creates friction in two places: the grid view and DataHub imports.
Who This Is For
This article is intended for Pimcore teams working with complex product catalogs, especially developers, solution architects, data managers, and PIM teams responsible for imports, exports, and large-scale product data maintenance.
The Business Impact of Duplicate Attributes
For teams managing large product catalogs, this is not just a technical inconvenience. Duplicate Classification Store attributes slow down bulk updates, complicate imports and exports, increase the risk of inconsistent product data, and make everyday data maintenance harder than it needs to be.
The larger the catalog, the more this technical limitation turns into a business problem: more manual work, slower updates, and higher maintenance costs.
Grid View Impact
When you are working in grid view, whether you are doing bulk updates or preparing an export, you run into immediate problems. Let us say you need to update "Cutting Edge Length" across your product catalog. That single attribute appears in a dozen different Classification Store groups. Even though it is conceptually the same field, you have to perform the same update multiple times, once for each group where it appears. What should be a five-minute task turns into repetitive, error-prone work that eats up hours.
The same confusion shows up when building grid view exports. Instead of one clean "Cutting Edge Length" column, users see multiple columns, one for each Classification Store group. Do you export all of them? Just some? Which one has the data you actually need? Users waste time deciphering which column belongs to which group, mistakes become much more likely, and what should be straightforward data work becomes a puzzle.
DataHub Import Complexity
DataHub imports make the same issue even more visible. You need to map the same attribute separately for every single Classification Store group it appears in.
For example, say you are importing product data and "Coolant" exists across 20 groups. You cannot just map it once. You have to map it 20 times. One source attribute becomes 20 separate fields in Classification Store. The mapper treats each instance as completely different, even though they are identical.
And this is not just an initial setup problem. Every time you tweak import mappings or debug data problems, you are working through duplicated configuration again.
The mapping UI only lets you pick one data key at a time. For each group that key lives in, you create another mapping. That is just one attribute. Now scale that across 50+ attributes spread through different groups, and the configuration becomes very difficult to maintain.
This is where the scale becomes clear:
ATTR_LC appears in 162 groups. That means 162 mappings for what everyone thinks of as a single field. When you have dozens of shared attributes like this, initial import setup eats up days, not hours. And if your data structure changes, you need to revisit all those mappings again.
Data teams end up spending far more time dealing with technical overhead than actually working with the data.
How Often This Happens
This is not an edge case. We see this scenario on roughly 50% of projects where Classification Store is the primary way to handle product attributes. Any catalog with diverse product types and shared specifications can run into it. The more product categories you manage and the more attributes they share, the worse it gets.
Out-of-the-Box Behavior and Limitations
How Standard Classification Store Works
Out of the box, Classification Store treats each group as its own isolated context. Add the same key definition to multiple groups, and it creates separate instances, one per group.
Behind the scenes, it stores data as group + key pairs. Define "Cutting Edge Length" once, add it to Group A and Group B, and you effectively have two distinct fields: GroupA-CuttingEdgeLength and GroupB-CuttingEdgeLength.
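As a rough illustration of that group + key pairing, the stored values can be pictured as a nested array indexed by group ID, key ID, and language. The IDs and values below are made up, and listGroupKeyPairs() is a hypothetical helper for this example, not part of Pimcore.

```php
<?php
// Hypothetical IDs: groups 10 (Drilling) and 20 (Grooving) both contain
// key 5 ("Cutting Edge Length").
$items = [
    10 => [5 => ['default' => '12.5']],
    20 => [5 => ['default' => '12.5']],
];

/**
 * List every group + key pair that appears in the items array.
 *
 * @param array<int, array<int, array<string, mixed>>> $items [groupId][keyId][language] => value
 * @return string[] e.g. "10-5" for group 10, key 5
 */
function listGroupKeyPairs(array $items): array
{
    $pairs = [];
    foreach ($items as $groupId => $keys) {
        foreach (array_keys($keys) as $keyId) {
            $pairs[] = $groupId . '-' . $keyId;
        }
    }
    return $pairs;
}

// One key definition in two groups: two distinct fields from the system's point of view.
print_r(listGroupKeyPairs($items));
```

The same key ID shows up under both group IDs, which is exactly the pairing that later turns into separate grid columns and separate import mappings.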
This keeps groups independent, which is useful for flexibility. But when attributes need to work across groups, that independence becomes the source of the problem.
Why Duplication Happens
This is not a bug. It is how the system was designed. Classification Store assumes each group is its own context, and attributes belong to that specific classification.
Real catalogs do not always work that way. "Coolant" means the same thing on drilling tools and grooving tools, but Classification Store does not know that. To the system, they are separate attributes that happen to share a name. Users think "update Coolant everywhere", while the system asks: "Which Coolant? Drilling? Grooving? Milling?"
Grid View Limitations
Grid view creates a column for every group-key combination. If "Cutting Edge Length" exists in 12 groups and a product has 8 of those groups, you get 8 columns. The system cannot merge them because, technically, they are different data sources.
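The column count described above can be sketched as a simple intersection between the groups that contain a key and the groups active on a product. The function name and the group IDs are assumptions made for this example.

```php
<?php
/**
 * Return the groups for which a separate column would be needed:
 * those that contain the key and are also active on the product.
 *
 * @param int[] $groupsContainingKey
 * @param int[] $activeGroupsOnProduct
 * @return int[]
 */
function columnsForKey(array $groupsContainingKey, array $activeGroupsOnProduct): array
{
    return array_values(array_intersect($groupsContainingKey, $activeGroupsOnProduct));
}

$groupsWithCuttingEdgeLength = range(1, 12);    // the key exists in 12 groups
$productGroups = [1, 2, 3, 4, 5, 6, 7, 8, 40];  // the product has 8 of them, plus an unrelated group

echo count(columnsForKey($groupsWithCuttingEdgeLength, $productGroups)), " columns\n";
```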
DataHub Mapping Limitations
DataHub needs mappings for specific group-key pairs. Your source file has one "Coolant" column, but if Pimcore has "Coolant" in 20 groups, you need 20 separate mappings. No shortcuts, no wildcards: each one has to be configured individually.
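To make the multiplication concrete, here is a small sketch that expands one source column into one mapping entry per group. The array shape is illustrative only and is not the DataHub configuration format; the key ID and group IDs are invented.

```php
<?php
/**
 * Expand one source column into one mapping entry per group + key pair.
 *
 * @param int[] $groupIds
 * @return array<int, array{source: string, groupId: int, keyId: int}>
 */
function expandMappings(string $sourceColumn, int $keyId, array $groupIds): array
{
    $mappings = [];
    foreach ($groupIds as $groupId) {
        $mappings[] = ['source' => $sourceColumn, 'groupId' => $groupId, 'keyId' => $keyId];
    }
    return $mappings;
}

// Hypothetical key ID 7 and 20 hypothetical group IDs.
$mappings = expandMappings('Coolant', 7, range(101, 120));
echo count($mappings), " mappings for one source column\n";
```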
Our Solution: One Logical Attribute Across Groups
Our solution kept the existing many-group Classification Store structure, but changed the operational view so users and integrations work with one visible representation of each shared attribute. The key definitions stay the source of truth. What changes is how the group context is resolved when the same key exists in multiple groups.
Instead of letting the same key appear as separate columns because it belongs to multiple groups, we collapse the grid/DataHub surface to one logical attribute. From the user perspective, any shared attribute appears once. From the backend perspective, Pimcore can still have many Classification Store groups. The customization decides whether the group should be ignored, resolved from the active product groups, or later replaced by one canonical group.
This matters because the problem is not the key definition itself. The problem is the repeated group-key pairing. We keep the useful part of Classification Store, flexible key definitions, and remove the duplicated working surface that makes grids and DataHub mappings hard to maintain.
What Changes in the Background
On every product save, a high-priority PRE_ADD/PRE_UPDATE event listener inspects the in-memory Classification Store object, merges any values from non-canonical groups into the canonical group, and then replaces the active group list with only the canonical group ID. This happens entirely in memory before Doctrine writes anything, so the product is normalized before it reaches persistence.
On every read that needs the active group, AbstractProduct::getInovaraClassificationStoreGroup() resolves the canonical group by name rather than returning whichever group happens to be first in the active list. This keeps downstream code deterministic and avoids depending on active group ordering.
/**
 * Merge values from every non-canonical group into the canonical group,
 * then leave only the canonical group active.
 */
public function normalizeStore(Classificationstore $store): void
{
    $groupId = $this->getGroupId();
    $items = $store->getItems();

    foreach ($items as $sourceGroupId => $keys) {
        if ((int) $sourceGroupId === $groupId) {
            continue;
        }
        foreach ($keys as $keyId => $languageValues) {
            foreach ($languageValues as $language => $value) {
                // Keep the canonical value when one is already set.
                if (($items[$groupId][$keyId][$language] ?? null) !== null) {
                    continue;
                }
                $items[$groupId][$keyId][$language] = $value;
            }
        }
    }

    $store->setItems($items);
    $store->setActiveGroups([$groupId => true]);
}
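To see the merge rule in isolation, the same loop can be run on a plain array without Pimcore. This is a sketch under that assumption: mergeIntoCanonicalGroup() is a hypothetical stand-alone function, and the group and key IDs are invented.

```php
<?php
/**
 * Plain-array version of the merge: copy values from every non-canonical group
 * into the canonical group unless the canonical group already has a non-null value.
 *
 * @param array<int, array<int, array<string, mixed>>> $items [groupId][keyId][language] => value
 * @return array<int, array<int, array<string, mixed>>>
 */
function mergeIntoCanonicalGroup(array $items, int $canonicalGroupId): array
{
    foreach ($items as $sourceGroupId => $keys) {
        if ((int) $sourceGroupId === $canonicalGroupId) {
            continue;
        }
        foreach ($keys as $keyId => $languageValues) {
            foreach ($languageValues as $language => $value) {
                if (($items[$canonicalGroupId][$keyId][$language] ?? null) !== null) {
                    continue; // the canonical value wins
                }
                $items[$canonicalGroupId][$keyId][$language] = $value;
            }
        }
    }
    return $items;
}

$items = [
    // Canonical group (hypothetical ID 1): key 5 is set, key 6 is empty.
    1 => [5 => ['default' => '12.5'], 6 => ['default' => null]],
    // Another group (hypothetical ID 2) with values for keys 5, 6, and 7.
    2 => [5 => ['default' => '99'], 6 => ['default' => 'internal'], 7 => ['default' => 'yes']],
];
$merged = mergeIntoCanonicalGroup($items, 1);
print_r($merged[1]);
```

In this run the canonical group keeps its own value for key 5, takes the value for key 6 because its own was null, and gains key 7. As in the method above, the source group's entries stay in the array; only the active group list is narrowed afterwards. When several non-canonical groups hold a value for the same empty key, the first one encountered is the one that gets copied.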
Key Architectural Changes
- Logical single-attribute strategy: TechnicalData attributes are exposed once in operational screens even when they exist in multiple Classification Store groups.
- Neutral group handling: groupId 0 is used as a signal that the grid/filter layer should work across groups for the same key instead of creating one column per group-key pair.
- Runtime value resolution: when Pimcore needs a concrete value, custom Classification Store/grid data logic can resolve the neutral group context to the relevant active group.
- Configuration metadata propagation: gridConfigDialog and gridTabAbstract preserve extra field metadata so custom Classification Store column behavior survives saved grid configuration round-trips.
- Optional physical consolidation: when the business wants to simplify the data model itself, existing values can be moved and verified with a console command before old groups are removed.
Step-by-Step Implementation
Prerequisites
The only requirement before writing any code is that the canonical group exists in the Classification Store.
Open Pimcore admin > Settings > Classification Store > Groups and create a group named "Technical Data" inside the correct store if it does not already exist.
The resolver looks it up by name and throws a RuntimeException on first use if it cannot find it, so a missing group will be caught immediately during save/import handling. In this project, the resolver constants are STORE_ID = 1, FIELDNAME = TehnicalData, and GROUP_NAME = Technical Data.
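A minimal sketch of that lookup behavior, assuming the group ID comes from some query keyed by store ID and group name. The class name and the injected lookup callable are hypothetical stand-ins so the example runs without Pimcore; only the constants, the RuntimeException, and the per-instance caching follow the description in this article.

```php
<?php
final class CanonicalGroupResolverSketch
{
    public const STORE_ID = 1;
    public const GROUP_NAME = 'Technical Data';

    private ?int $groupId = null;

    /** @var callable(int, string): ?int Returns the group ID, or null when the group is missing. */
    private $lookup;

    public function __construct(callable $lookup)
    {
        $this->lookup = $lookup;
    }

    public function getGroupId(): int
    {
        if ($this->groupId !== null) {
            return $this->groupId; // cached for the lifetime of this instance
        }
        $id = ($this->lookup)(self::STORE_ID, self::GROUP_NAME);
        if ($id === null) {
            throw new RuntimeException(sprintf(
                'Classification Store group "%s" not found in store %d.',
                self::GROUP_NAME,
                self::STORE_ID
            ));
        }
        return $this->groupId = $id;
    }
}

$calls = 0;
$resolver = new CanonicalGroupResolverSketch(
    function (int $storeId, string $name) use (&$calls): ?int {
        $calls++;
        return $name === 'Technical Data' ? 42 : null; // hypothetical group ID
    }
);
$resolver->getGroupId();
$resolver->getGroupId(); // served from the cache, the lookup is not called again
```

Failing loudly on the first call means a missing group surfaces on the first save or import row instead of silently producing products without a canonical group.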
Before enabling the normalization logic, verify that shared keys have compatible values across groups. If the same product/language/key has different populated values in different groups, the merge rule needs a business decision before the listener can safely choose one value.
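One way to run that check is to look for key/language combinations where different groups hold different populated values. The sketch below works on a single product's items array in the [groupId][keyId][language] shape; findConflictingValues() is a hypothetical helper, and a real audit would loop over products or be written as SQL against the project's tables.

```php
<?php
/**
 * Find key/language combinations where two or more groups hold different populated values.
 *
 * @param array<int, array<int, array<string, mixed>>> $items [groupId][keyId][language] => value
 * @return array<string, array<int, mixed>> "keyId:language" => [groupId => value]
 */
function findConflictingValues(array $items): array
{
    $seen = [];
    foreach ($items as $groupId => $keys) {
        foreach ($keys as $keyId => $languageValues) {
            foreach ($languageValues as $language => $value) {
                if ($value !== null && $value !== '') {
                    $seen[$keyId . ':' . $language][$groupId] = $value;
                }
            }
        }
    }
    // Keep only entries whose populated values are not all identical.
    return array_filter($seen, static function (array $valuesByGroup): bool {
        return count(array_unique(array_map('serialize', $valuesByGroup))) > 1;
    });
}

$items = [
    10 => [5 => ['default' => '12.5'], 6 => ['default' => 'internal']],
    20 => [5 => ['default' => '14.0'], 6 => ['default' => 'internal']],
];
print_r(findConflictingValues($items));
```

Here key 5 is reported because the two groups disagree, while key 6 is not because both groups hold the same value. Anything this check reports needs a business decision before the listener is switched on.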
Backend Changes
Start with the original many-group behavior. Patch the grid helper so Classification Store filters can work with a neutral group ID. In practice, when the feature join has groupId 0, the SQL join does not add a groupId condition; it filters by fieldname, keyId, and language. This is the part that allows one logical grid column to search values across multiple Classification Store groups.
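The neutral-group rule can be sketched as a small condition builder. This is not the patched Pimcore grid helper; buildFeatureJoinCondition(), the table alias, and the named placeholders are assumptions for illustration. Only the rule itself comes from the description above: with groupId 0 the condition covers fieldname, keyId, and language, and nothing else.

```php
<?php
/**
 * Build the join condition and parameters for a Classification Store filter.
 * With groupId 0 (the neutral group) no groupId condition is added.
 *
 * @return array{0: string, 1: array<string, int|string>}
 */
function buildFeatureJoinCondition(
    string $alias,
    string $fieldname,
    int $groupId,
    int $keyId,
    string $language
): array {
    $conditions = [
        "$alias.fieldname = :fieldname",
        "$alias.keyId = :keyId",
        "$alias.language = :language",
    ];
    $params = ['fieldname' => $fieldname, 'keyId' => $keyId, 'language' => $language];

    if ($groupId !== 0) {
        // A concrete group was requested: restrict the join to it.
        $conditions[] = "$alias.groupId = :groupId";
        $params['groupId'] = $groupId;
    }

    return [implode(' AND ', $conditions), $params];
}

// Neutral group: the filter matches the key in any group.
[$sql, $params] = buildFeatureJoinCondition('cs', 'TehnicalData', 0, 5, 'default');
echo $sql, "\n";
```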
Patch grid value loading next. Pimcore normally expects a concrete group-key pair. The custom grid data logic checks for groupId 0 and, when the product has exactly one active Classification Store group, resolves the neutral group to that active group before calling getLocalizedKeyValue(). The Classificationstore model patch follows the same idea: if groupId 0 is requested and the store contains one item group, it uses that group automatically.
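The resolution step can be sketched as a pure function over the active group list, which uses the same groupId => true shape that setActiveGroups() receives in normalizeStore(). resolveGroupId() is a hypothetical name, and returning null when there is not exactly one active group is an assumption of this sketch; the actual patch may handle that case differently.

```php
<?php
/**
 * Resolve a requested group ID against a product's active groups.
 * Returns the concrete group ID, or null when the neutral group (0)
 * cannot be resolved unambiguously.
 *
 * @param array<int, bool> $activeGroups groupId => active flag
 */
function resolveGroupId(int $requestedGroupId, array $activeGroups): ?int
{
    if ($requestedGroupId !== 0) {
        return $requestedGroupId; // concrete group requested, nothing to resolve
    }
    $activeIds = array_keys(array_filter($activeGroups));

    return count($activeIds) === 1 ? (int) $activeIds[0] : null;
}

var_dump(resolveGroupId(0, [42 => true]));             // one active group
var_dump(resolveGroupId(0, [42 => true, 43 => true])); // ambiguous
var_dump(resolveGroupId(7, [42 => true]));             // concrete request
```

Once every product has been normalized to a single canonical group, the "exactly one active group" condition holds for all of them, which is what makes the neutral group safe to resolve.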
Preserve column metadata through the admin grid configuration lifecycle. The gridTabAbstract and gridConfigDialog patches pass the extra metadata field through selected columns and saved configs. This is important because the custom Classification Store column needs more context than a standard Pimcore field.
Add the save listener after the grid behavior is stable. NormalizeTechnicalDataGroupListener listens on PRE_ADD and PRE_UPDATE with high priority, checks only Product objects, reads getTehnicalData(), and delegates the actual merge to TechnicalDataMainGroupResolver::normalizeStore(). Keeping the merge logic in the resolver makes it reusable and easier to test.
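The listener's guard-and-delegate shape might look roughly like the sketch below. It uses stand-in classes so it runs without Pimcore: StoreStub, ProductStub, NormalizeListenerSketch, and onPreSave() are all hypothetical names, and the null check on the field is an assumption. In the real listener the object would come from the Pimcore event, and the callable would be TechnicalDataMainGroupResolver::normalizeStore().

```php
<?php
// Stand-ins for the generated Product class and the Classification Store model.
final class StoreStub
{
    public bool $normalized = false;
}

final class ProductStub
{
    private ?StoreStub $store;

    public function __construct(?StoreStub $store)
    {
        $this->store = $store;
    }

    public function getTehnicalData(): ?StoreStub
    {
        return $this->store;
    }
}

final class NormalizeListenerSketch
{
    /** @var callable(StoreStub): void */
    private $normalize;

    public function __construct(callable $normalize)
    {
        $this->normalize = $normalize;
    }

    /** Intended to be called for both the pre-add and pre-update events. */
    public function onPreSave(object $object): void
    {
        if (!$object instanceof ProductStub) {
            return; // other object types are left untouched
        }
        $store = $object->getTehnicalData();
        if ($store === null) {
            return; // nothing to normalize
        }
        ($this->normalize)($store);
    }
}

$listener = new NormalizeListenerSketch(static function (StoreStub $store): void {
    $store->normalized = true; // stands in for the resolver's merge
});

$store = new StoreStub();
$listener->onPreSave(new ProductStub($store)); // normalized
$listener->onPreSave(new stdClass());          // ignored
```

Keeping the listener this thin means the merge rule can be unit tested through the resolver alone, without dispatching events.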
Configuration
The Output Data Config Toolkit configuration should make Classification Store selection key-oriented instead of group-oriented. The relevant setup is:
output_data_config_toolkit.classification_store.display_mode: all
output_data_config_toolkit.classification_store.grouped: false
With display_mode set to all, users can select from the available keys without depending on the currently assigned groups of one object. With grouped set to false, the config dialog does not load and render a huge group hierarchy. This is both faster and easier to understand when the business goal is to select the attribute once.
The ClassController customization should also limit loaded keys to the relevant Classification Store. Without that filter, an ungrouped key selector can become too broad on installations with multiple stores.
Finally, keep field and group naming centralized. In this project the field name is TehnicalData, and the later consolidation target group is Technical Data. Centralizing those values reduces the risk of grid configs, DataHub logic, and migration code drifting apart.
Testing and Validation
Grid view
Open the column configurator on the Product listing. Each Technical Data key should now appear exactly once instead of repeating once per group. Select one, add it as a column, and confirm that values display correctly across products that belong to different families.
Admin UI save
Edit a product's Technical Data values directly in the admin, save, and reload. Confirm that values persist and that the canonical Technical Data group remains the active group for the field.
Verify via SQL that no product has more than one active group after the save:
SELECT id, COUNT(*) AS group_count
FROM object_classificationstore_groups_Product
WHERE fieldname = 'TehnicalData'
GROUP BY id
HAVING COUNT(*) > 1;
This query should return no rows.
Results and Benefits
Cleaner Grid Views
After implementing the solution, the grid view configuration changes dramatically. Users no longer see the same attribute repeated for every group it appears in. They see it once.
Before the change, "Cutting Edge Length" appears 12 times in the column selector, one for each Classification Store group. Updating values means editing 12 separate columns, trying to remember which ones have already been touched. After the change, it appears once. Select it, and you are done. All groups are handled automatically behind the scenes.
This simplifies bulk editing too. Make a change once, and it applies across all relevant groups. No more repetitive updates, no more wondering if you missed one.
Simpler DataHub Mapping
DataHub setup becomes straightforward, especially for non-technical users who need to manage imports.
Previously, mapping "Cutting Edge Length" had to be done separately for each of the 162 groups it appears in, requiring a checklist to make sure none were missed and that they were all configured consistently. Now it is mapped once, and the system handles distribution to all groups automatically. Total mapping time drops from hours to minutes.
Users can focus on the actual data logic, which source fields map to which attributes, instead of wrestling with technical overhead. Import configurations are cleaner, easier to troubleshoot, and much faster to set up or modify.
Performance Considerations
normalizeStore() works on the in-flight Classification Store object. It does not need to load every group/key relation during each save; it only works with the data already present on the object. getGroupId() performs one lookup and caches the result inside the resolver instance, so repeated saves/import rows in the same request do not repeatedly query the canonical group ID.
The grid-side improvement is also practical: users and the grid configuration UI no longer need to render or manage repeated columns for the same shared key. On large stores, keeping the selector ungrouped and scoped to the relevant store avoids loading a deep group hierarchy just to choose one attribute.
Maintainability
STORE_ID, FIELDNAME, and GROUP_NAME are constants in TechnicalDataMainGroupResolver, so there is one place to update if anything changes. Both the listener and the guard apply early-return checks scoped to Product and TehnicalData, so neither affects any other object type or field.
The Pimcore API surface to monitor on upgrades is Classificationstore::setItems(), setActiveGroups(), setLocalizedKeyValue(), and getLocalizedKeyValue(), plus the patched admin grid classes. These are the places where behavior is closest to Pimcore internals, so they should be part of the upgrade checklist.
When to Apply This Solution
Project Size and Complexity Indicators
This solution makes sense at scale. If you have hundreds of product attributes scattered across dozens of Classification Store groups, and a significant number of those attributes are shared, this is where the customization pays off.
On the other hand, if you only have a handful of duplicated attributes, stick with the out-of-the-box behavior. Modifying core Classification Store functionality for three or four shared attributes is not worth the development effort or the maintenance overhead down the road.
Frequent Exports
If your team regularly exports product data for catalogs, price lists, channel feeds, or analytics, and they are constantly cleaning up duplicate columns in Excel or CSV files afterward, that is a clear signal.
Post-export cleanup should not be part of the workflow. Merging columns, deduplicating data, and making sure you have one clean "Cutting Edge Length" column instead of twelve scattered versions is wasted time that adds up fast across weekly or daily exports.
Regular Imports
Frequent DataHub imports are another indicator. If you are constantly updating product data from suppliers, ERPs, or external systems, and your import configurations are hard to maintain because of duplicated mappings, the solution eliminates that pain.
Instead of preparing source files with separate columns for each group variation, or managing mapping templates with hundreds of redundant entries, you map once and move on.
Trade-offs to Consider
This is not a zero-cost change. You are modifying how Classification Store handles data at a fundamental level. That means:
- Custom code that needs to be maintained through Pimcore upgrades
- Potential compatibility issues with future Classification Store features
- Your team needs to understand the customization when troubleshooting issues
The ROI comes from time savings on repetitive tasks. If your team spends hours per week dealing with duplicate attributes in grids and imports, the customization pays for itself quickly. If duplication is a minor annoyance, the trade-off might not be worth it.
Conclusion
Classification Store's out-of-the-box behavior works well until products start sharing attributes across multiple groups. When that happens, duplication creates real friction: messy grid views, bloated import mappings, and hours of repetitive work.
Our customization addresses this by changing how Classification Store handles shared attributes at the backend level. Users see each attribute once, map it once, and update it once. The system handles the rest.
This is not a universal fix. It makes sense for projects with hundreds of attributes and frequent data operations. For smaller setups, the standard approach is usually enough.
If you are spending significant time managing duplicate attributes in your PIM, this solution can turn hours of repetitive work into minutes of streamlined data management.
If you have questions about implementing this on your project, reach out. We are happy to discuss whether this approach fits your specific setup.