Two Guys and a Toolkit - Week 4: Grouping



Welcome back for part four of our series dedicated to building a simple pipeline using Toolkit.

Last week, we talked a lot about publishing and how various types of data flow down the pipe in our simple pipeline. That laid the foundation for future discussions about features that do not come out of the box with Toolkit. We will revisit publishing in future posts when we talk about larger potential features. We also plan to devote some time to the more theoretical and philosophical aspects of publishing and digital asset management. As we move into that realm, it will be important for us to have an open dialog with all of you. Those topics will be less about how Josh and I did something and more about what could be done, and about which approaches to asset management and publishing work best in which situations. It would be great if everyone could start thinking about this now, and if you have thoughts on how we should approach it or what points we should be sure to hit, please let us know!

This week, I’ve been working on a grouping mechanism for published files and have a working proof of concept implementation that I will share. We’ll discuss the bits and pieces, but also the potential uses for such a feature.

Below are a few of the pages that we found useful for this week’s post:

All About Fields
Query Fields
Publishing and Sharing your Work
Load Published Files
App and Engine Config Reference: Hooks

Grouping Published Files

Simply put, a group of published files is a collection of multiple PublishedFile entities represented by a single item in Shotgun. From a user’s perspective, referencing or importing that group with the loader app results in ALL of the group’s contents being referenced or imported.

The grouping of published files is not a concept that is native to Shotgun, but it is something that can be added. It requires a number of small changes, and a truly flexible implementation would take a good bit of thought and additional development beyond what I’ve put into the proof of concept we will be discussing.

Why Group Publishes?

We should talk about why we would want to do this. What are the advantages of being able to group published files together?

The problems we can solve with grouping depend on which area of the pipeline we use it in. Below are a few example use cases, but I’m sure all of you can come up with many more. If you’re using something similar in your studio, or if you have an idea for a use that I haven’t covered, let us know!

Rendered Elements:

Publishing rendered elements from lighting to be used by a compositor can often produce a large number of published files. Grouping these published files before they flow down the pipe can allow the tools to logically structure these elements in a way that informs other code, like the routine that imports a published image sequence into Nuke, on how these elements fit together. Add to that the ability to store some sort of metadata file (or even a pre-built Nuke script?) as part of the group and it’s easy to see how quite a bit of information about the collection of elements can be gathered and sent along the way.

Another advantage of this sort of organization is that there is a point in time when we know that a set of files is intended to be used together. We can group those compatible published elements so that when the compositor loads them into their session, they know that they are getting everything and that each element should be compatible with all of the others.

Another possibility would be to group in the camera(s) used to render the elements, along with any Alembic caches. This would help the compositor get everything they need into their Nuke script required for 3D compositing, and guarantee that they are using the same camera and geometry caches that the lighter or FX artist used to produce the rendered elements.

Look Development:

An Asset is made up of a number of components by the time it is ready for use in a shot: typically a model, texture maps, shaders, and a rig. Each of these components comes together to make the logical whole of the Asset, and a change to one component often requires updates to one or more of the others. For example, if a model change also alters the UV layout, then the texture maps produced for the previous version of the model might need to be tweaked before they can be used on the new one. Similarly, topology changes might require the rigging team to adapt their work to the new model.

Given that at some point along the way we know what texture maps and rig pair properly with a specific version of the model, we could group those published files together. We would have a nice package of files that we know are meant to be used together.

Geometry Caches:

We discussed last week publishing multiple Alembic caches out of a scene file rather than a single cache containing the entire scene. One situation where that can be used to our advantage is when multiple resolutions of a mesh are exported and published from an Asset’s rig. It’s fairly common to build more than one resolution of geometry into a rig, as this allows background characters (or, even more importantly, crowds) to be made up of lower-density geometry than foreground, hero characters. It is also typical for these different geometry resolutions to be incorporated into a single rig, as it allows a rigging team to develop and maintain one rig rather than duplicating effort across multiple rigs for the same character.

This setup lends itself well to exporting an Alembic cache per resolution of the character. In this way, in Maya we would end up with a cache reference per resolution of Asset, and we could provide a tool that unloads/reloads those references when the user requests a specific resolution of the Asset to be used.

As for how grouping comes into play, the idea is to bundle up all of the cache resolutions published for an Asset into a single group that links to each of those caches as children. When a user loads the group into their scene, the group flattens out into a list of its component published files, and each one is referenced into the scene accordingly.
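As a rough illustration of that flattening step, here is a minimal sketch. The field name `sg_children` and the `reference_publish` callback are assumptions of mine; in practice the loader app’s action hook would supply the real referencing logic for the DCC.

```python
# Sketch of how a loader-side hook might flatten a group publish into its
# component children before referencing each one. The "sg_children" field
# name and the reference_publish() callback are assumptions.

def flatten_group(publish):
    """Return the publish itself, or its children if it is a group."""
    children = publish.get("sg_children") or []
    return children if children else [publish]

def load_publish(publish, reference_publish):
    # Reference every component of a group, or the single file otherwise.
    for child in flatten_group(publish):
        reference_publish(child)
```

The nice property here is that a single file and a group go through the exact same code path: a non-group simply flattens to a list of one.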


The Implementation:

The basic implementation of groups as I have built them is simple. I added a children field to the PublishedFile entity in Shotgun. This field is configured to take a list of other PublishedFile entities, which are then considered to be its children. It’s simple and flexible, and creating new fields is a piece of cake.
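For anyone who prefers to do this programmatically rather than through the web UI, a field like this can also be created with the shotgun_api3 Python API. This is a sketch only; the display name "Children" (which would yield a field code like `sg_children`) is my assumption, not the field name from the post.

```python
# A minimal sketch of creating the "children" field with shotgun_api3
# instead of the web UI. The display name "Children" is an assumption.

def create_children_field(sg):
    # A multi-entity field restricted to PublishedFile gives us a
    # parent -> children relationship between publishes.
    return sg.schema_field_create(
        "PublishedFile",
        "multi_entity",
        "Children",
        {"valid_types": ["PublishedFile"]},
    )
```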

This led to some difficulty, however. I wanted to add a couple more fields related to child entities and figured I could make that happen with query fields. I was mostly right, and was able to get one of the two working after some frustration. The fields I wanted to create were is_group and child_count. The former comes in handy when you want a quick yes/no answer on whether something is a group, and the latter is most useful in the Shotgun web interface, as a visual indicator of how many children a group has. I wanted is_group to end up as a boolean field, but I was not able to figure out how to make that work as a query field. I’m not saying it isn’t possible, only that I got frustrated and moved on before figuring it out. As for child_count, I did get it to work, as you can see:

You can also see that it took me 36 publishes to get one that was completely correct.

I got it to work, but I honestly don’t know why or how. Below is what I did to make it work, but I couldn’t for the life of me describe why that gives me the correct behavior. I just tried stuff until it did what I wanted it to do, but the words in that query field configuration dialog make little or no sense to me. I’m sure there’s a logical structure there, but to me it is nearly inscrutable.

I don't know what this means.

Initially, I purposely did not speak with an expert about the hows and whys related to query fields. I figured I would take a crack at it the way that I normally would have when working for a studio and see how it went. Randomly flailing about until it works is a time-honored tradition of mine, but it’s obviously not the ideal way to have to learn something.

Since then, I’ve had the opportunity to take a second look at this, and also got some feedback from some of the team. The general consensus seems to be that it would be best to not use query fields for this sort of thing at all. Instead, it would be better to make them normal fields and have the publish routine populate them at the time the group is created. This is simple, and it also bypasses the limitations of query fields: you can’t filter or sort on them, and they’re not accessible via the Python API.
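That normal-fields approach could be as small as a helper that the publish routine calls when it creates the group. The field codes below are assumptions on my part:

```python
# A sketch of what the publish routine could do instead of query fields:
# compute is_group and child_count up front and store them as ordinary
# fields on the group's PublishedFile record. Field codes are assumptions.

def group_fields(children):
    """Build the field values for a group publish from its child list."""
    return {
        "sg_children": children,
        "sg_is_group": bool(children),
        "sg_child_count": len(children),
    }
```

Because these are plain fields, they can be filtered and sorted on in the web UI and queried through the Python API like anything else.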

Code and Configuration:

From a code and configuration standpoint, there were a few hoops to jump through. My first thought was to use the publish app’s post_publish hook to build the groups. It looked great, because that hook is already given a list of secondary publish tasks, which is exactly the list of things I want to group. There were problems, though: the secondary tasks did not come with the accompanying published-file records from Shotgun, and I didn’t want to go back to the database to look that data up again when I knew it was already available in the secondary_publish hook. What I ended up doing was taking the data returned from the publish routine and stashing it in the secondary publish item. Since that item is a reference to the same dictionary that is passed to the post_publish hook, I was able to save myself a trip to the database. You can see that here, and how I extracted and used it here.
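Stripped of all the app machinery, the hand-off works roughly like this. The key name `sg_publish_data` and the `register_publish` callback are stand-ins I’ve invented for illustration; the real hooks receive quite a bit more context:

```python
# Sketch of the dictionary hand-off described above. Because the item
# dicts given to the secondary publish hook are the same objects later
# handed to post_publish, stashing the Shotgun record on the item avoids
# a second database lookup. Key names here are assumptions.

def secondary_publish(item, register_publish):
    # register_publish stands in for the call that creates the
    # PublishedFile record in Shotgun and returns it.
    item["sg_publish_data"] = register_publish(item)

def post_publish(items):
    # The records stashed above are now available without another query.
    return [item["sg_publish_data"] for item in items]
```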

There’s another problem, as well, which is that in the official release of the publish app, the post_publish hook does not receive the same arguments as the other publish hooks.

This is most unfortunate.

Since I was planning to publish something from this hook, I needed more than I was given, which exposed a bit of an unfortunate circumstance. Hooks are intended to allow for customization without the need to take ownership of an entire app. That is largely successful, but here I ran into something that required taking control of the app itself, because I needed to change what a hook receives.

What I’ve done is fork tk-multi-publish, which can be found here. The changes are minor, and all of it is simply to provide more bits of data to the post-publish hook. The specific commit for these changes can be found here.

A more flexible solution might be to use the parent application object to store a dictionary of data that can be shared between hooks. That would allow one hook to store something away that another executed later could make use of. We will be discussing this as a team very soon, as it’s a situation that comes up often and it would be good to have a general-purpose solution for it.
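A bare-bones sketch of that idea: hooks can reach their parent app, so one hook stashes data in a dictionary on the app for a later hook to read. The attribute name `hook_cache` is my invention, and a stub class stands in for the real application object here:

```python
# Sketch of shared state between hooks via the parent app. The attribute
# name "hook_cache" is an assumption; FakeApp stands in for the real
# Toolkit application object that hooks can reach from self.parent.

class FakeApp(object):
    def __init__(self):
        # A scratch space shared by every hook this app executes.
        self.hook_cache = {}

def first_hook(app, publish_data):
    # An early hook (e.g. secondary publish) stores its results.
    app.hook_cache["publishes"] = publish_data

def later_hook(app):
    # A later hook (e.g. post publish) reads them back.
    return app.hook_cache.get("publishes", [])
```

The appeal of this pattern is that it would have let the post-publish hook get what it needed without any change to the app’s hook signatures.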

As for configuration changes, I had to update several small things. You’ll notice in this commit in my forked publish app that there are two new keys added to the app’s info.yml file: one specifies whether to group a secondary output type’s publishes, and the other specifies the name of the group should one be created. The rest of the configuration changes can be found here and are very straightforward.
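To make the shape of that concrete, here is a hypothetical sketch of what such a secondary-output entry could look like. The key names `group` and `group_name` are illustrative only; the real names live in the commit linked above.

```yaml
# Hypothetical sketch only -- the real key names are in the linked commit.
# Each secondary output gains two keys: a flag to enable grouping, and a
# name for the group should one be created.
secondary_outputs:
  - name: alembic_cache
    group: true            # bundle this output's publishes into a group
    group_name: "caches"   # name given to the created group publish
```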

I know that I could have avoided altering the app itself by performing the grouping operation in the secondary publish hook, because it would have had access to all of the data that I needed already. Had I been doing this as a TD at a studio that’s exactly what I would have done, but in this case it seemed like a good example of the limitations of how publish hooks work.

Manifests and Metadata:

You might have noticed that the screenshot earlier in this post showing my group in the Shotgun web interface lists the path as a JSON file.

If you did, then you probably also noticed the template that was added to templates.yml. In my hacked-together implementation, all I’ve done is shove the children of the group into a JSON file and use that file as the path for my group.
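Writing such a manifest is only a few lines. The structure below is my own sketch and not any official Toolkit format; it just records enough about each child to identify it later:

```python
# Sketch of the hacked-together manifest: dump the group's children into
# a JSON file and use that file as the group's publish path. The manifest
# structure shown here is an assumption, not an official format.
import json

def write_manifest(path, children):
    manifest = {
        "children": [
            {"type": c["type"], "id": c["id"], "name": c.get("name")}
            for c in children
        ]
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=4)
    return manifest
```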

This file could just as easily contain nothing, or anything else deemed useful. Josh also had the idea that, for groups of rendered elements, the path for the group could be a Nuke script. Imagine an FX artist pumping out complex elements from Houdini: they could build a slap comp of everything the way it should be put together, and then publish a group of elements with the group itself containing the node network that properly pieces the elements together. This would allow for custom comp setups curated by the artist handing off the elements. The possibilities are vast, so use your imagination and then tell us about it!


That’s it for this week. I hope that we’ve given everyone some things to consider and talk about. As always, we would love to hear from anyone that has comments, whether they be public or private. In fact, a portion of next week’s post will be directly related to comments we’ve received, as Josh will be going into publishing rendered elements from Maya for use in Nuke. In addition, he will be outlining how we’ve been managing our custom code via the tk-framework-simple framework. We will also update everyone about some small tweaks to Toolkit that we will have made (or will be making) that have come out of this project. Our hope is to continue to find things that we can immediately put into use that will make life easier for everyone using Toolkit.

Next week marks the halfway point for this blog series. Our intention is to dive into larger, more discussion-heavy topics as we progress. Some of those potential topics I mentioned at the end of Week 3, but we are always open to ideas, so feel free to let us know things you would like to see in the future!

About Jeff & Josh

Jeff was a Pipeline TD, Lead Pipeline TD, and Pipeline Supervisor at Rhythm and Hues Studios over a span of 9 years. After that, he spent 2+ years at Blur Studio as Pipeline Supervisor. Between R&H and Blur, he has experienced studios large and small, and a wide variety of both proprietary and third-party software. He also really enjoys writing about himself in the third person.

Josh followed the same career path as Jeff in the Pipeline department at R&H, going back to 2003. In 2010 he migrated to the Software group at R&H and helped develop the studio’s proprietary toolset. In 2014 he took a job as Senior Pipeline Engineer in the Digital Production Arts MFA program at Clemson University where he worked with students to develop an open source production pipeline framework.

Jeff & Josh joined the Toolkit team in August of 2015.



At October 9, 2015 at 3:00 PM , Blogger Raphael Matto said...

Hey guys, great discussion!

I like the grouping idea. Thinking forward, let me see if I understand:

In your lighting shot example, imagine a comp artist imports group 001_SH001.v001 and gets a camera, geo cache, & a few render layers. Then if the lighting TD forgot a render layer & spits it out / publishes it, on publish the lighter is asked if he wants to:

1. Add it to a new group
2. Duplicate the previous group and add it to that

In our scenario here he'd choose #2 (but he'd choose door #1 if anm had changed) ...

... either way you end up with group 001_SH001.v002. The comp artist could then either start from scratch -- if there's new anm & we're going with door #1 -- or update his current comp to get the new render layer (door #2). Or go rogue and dig around on the file system (door #3, there's always a door 3). Do I have that sort of right? One question that occurs to me is: how does the comp artist know that group v2 is a minor update to group v1 and not new anm? Also, many times comp peeps have an in-house tool that simply takes the most recent version of everything. There might be some issues controlling behavior there if you want to preserve the groups. Same with shaders.

Shotgun Pipeline Team


At October 15, 2015 at 12:18 PM , Blogger Jeff Beeland said...

Hey Raphael,

I would imagine scenario #2 would be the way to go. The group-creation/publish process would need to be set up to "backfill" the group with the latest-available publish for each element/file that it finds that's of interest. Meaning the "v002" group would contain that newly-created element, plus the latest-available versions of all other items relevant to that group (camera, geo cache, and other render layers).

That leads to another question, which is that if groups are backfilled, what happens when some published file (or file sequence) SHOULDN'T be carried forward to new groups? The answer to that is actually something we'll be covering in a future post, which is published-file deprecation. Our intern, Jesse, has implemented this already, and has made use of statuses on PublishedFile entities to do so. If a user deprecates a specific PublishedFile entity, it should/would be filtered out during the grouping process. The behavior from there would be to take the "latest version that is not deprecated." If all versions of the published file have been deprecated, then it's no longer included in future versions of the group.
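The backfill rule described above could be sketched like this. The status values are assumptions (Shotgun status codes are site-configurable), but the logic is the one described: take the newest non-deprecated version, and drop the element entirely if every version is deprecated.

```python
# Sketch of the backfill rule: for each element, take the highest version
# whose status is not "deprecated", and drop the element entirely if all
# of its versions are deprecated. The "dep" status code is an assumption.

def latest_active(versions):
    """Pick the newest non-deprecated publish, or None if all are deprecated."""
    active = [v for v in versions if v.get("sg_status_list") != "dep"]
    if not active:
        return None
    return max(active, key=lambda v: v["version_number"])
```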

The solution to the desire to always be on the latest version of some element could probably come from a lot of different directions with some time to brainstorm. The way that I've seen this handled in the past revolves around subscription-based workflows, where a record is kept in the database of what each workarea is "currently using." You could then have a custom read node in your chosen compositing DCC app that references a specific subscribed publish file, but not hardcoded to a specific version. I've seen this done by using a spec of some sort that provides information that logic under the hood uses to find the correct subscription and then flattens that out to a path on disk. What that means is that there's no need to "import" a new version of a publish when it becomes available; instead, the user tells the system they're now subscribed to the new version, and the read node then evaluates to the new path. You get the safety of not always reading the latest available, but enough ease of use that it's not disruptive to the workflow to move to a new/different version of some published file/element.
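In its simplest form, the subscription idea boils down to a lookup table between a spec and a version, with the path derived under the hood. Every name and the path scheme below are illustrative assumptions, not an existing system:

```python
# Sketch of the subscription idea: a workarea records which version of
# each element it is "currently using", and a spec is flattened to a
# path through that record. Names and path scheme are assumptions.

def resolve(subscriptions, spec):
    """Flatten a subscription spec to the path of the subscribed version."""
    version = subscriptions[spec]
    return "/publishes/%s/v%03d" % (spec, version)

def subscribe(subscriptions, spec, version):
    # Moving to a new version is a record update, not a re-import; any
    # read node built on resolve() then evaluates to the new path.
    subscriptions[spec] = version
```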

I hope that helps explain some of the possibilities! Thanks a ton for the comment...feel free to keep going. :)


At October 15, 2015 at 12:23 PM , Blogger Jeff Beeland said...

One more thing that I missed!

You mention wanting to have some idea WHAT has changed in the new version of the group. I think that would be possible through whatever app is written to manage subscriptions, were you working with a system similar to what I described in the previous response. If you know what the user is currently subscribed to, it means that you can:

1) Know if something newer is available, and can present that to the user (maybe on file open?).
2) Do a quick version-number diff of all subscriptions compared to what's available.
3) Present a list of "New stuff is available" publishes, and for groups include a list of specific published files contained within that will be new.

Does that make sense and sound reasonable?


