21 Replies Latest reply on Oct 11, 2012 9:30 AM by Ben Frantzdale

    Ideas from Watching Linus Torvalds on the Software Revision System git

    Jason Nicholson

      I finished watching a video on git by Linus Torvalds located here.  He spoke about the differences among source code management (SCM) systems.  SCM systems are similar to PDM systems.  git was designed to be distributed and to make branching and merging easy.  In SolidWorks terms, branching would mean taking a SolidWorks file and modifying it, and merging would mean bringing that change back into the SolidWorks file database.

       

      This question/discussion is probably more for the developers.  What can we learn from git to better develop PDM systems?

       

      I'll post more later about git, but anything I am going to say you can get from Linus in the video above.

        • Re: Ideas from Watching Linus Torvalds on the Software Revision System git
          Jim Sculley

          Jason Nicholson wrote:

           

          I finished watching a video on git by Linus Torvalds located here.  He spoke about the differences among source code management (SCM) systems.  SCM systems are similar to PDM systems.  git was designed to be distributed and to make branching and merging easy.  In SolidWorks terms, branching would mean taking a SolidWorks file and modifying it, and merging would mean bringing that change back into the SolidWorks file database.

           

          This question/discussion is probably more for the developers.  What can we learn from git to better develop PDM systems?

           

          I'll post more later about git but you can get anything I am going to say from Linus from the video above.

          Branching and merging are easy for source code because it consists of text files.  Binary data such as a SolidWorks file is much more difficult to merge.

           

          Jim S.

            • Re: Ideas from Watching Linus Torvalds on the Software Revision System git
              Jason Nicholson

              Jim,

               

              You are correct with respect to individual files.  However, you must think of the whole PDM database as what's being updated.  While individual file conflicts can occur, the whole database will not have conflicts that are unsolvable.  So think of branching and merging the whole database.

              • Re: Ideas from Watching Linus Torvalds on the Software Revision System git
                Paul Kellner

                Binary vs text files is a non-issue. A computer compares both in a binary world.

                 

                The real issues with SW stem from the fact that from the beginning the developers NEVER considered making SW PDM capable. This means that all PDM systems today (including Enterprise) are basically workarounds that have to deal with whatever SW has put out. After all, can anybody list some API calls that are specific to PDM and revision tracking? I don't think so.

                 

                That being said, there are two aspects fundamental to Feature Based Solid Modeling (FBSM): the final solid representation of an object, and the path taken to get to that object. For example, a unit cube can be an extrude of a sketch of a unit square, a sweep of a sketch of a unit square, a loft of two unit squares, etc. On the ERP and manufacturing side of things all you care about is the final solid representation (and of course the drawing). On the engineering side of things you care about the way in which the solid is built. If you don't care about this, then go back to AutoCAD. Thilo Trotwein has shown us that it is possible to completely remove the solid information from a SolidWorks file and get back the same solid. This is because a SW file contains both the geometry and the feature tree.

                 

                Thinking of SW in terms of git, the final solid could be considered the compiled, finished program, and the feature tree could be considered the program source that made the solid. If you focus on the feature tree part of the information in the SW file, then, yes, you can do branching, merging and revisioning just as you would with a program, but for one thing: SW never put in the hooks to do this.

                 

                Putting in these hooks is not trivial. Here is why. If you have ever recorded a macro, a couple of things will be apparent from the beginning. First, not all macros record everything that was done, so when played back they will not recreate what the user thought was being recorded. Second, macros record operations on geometry from the user's viewpoint. By that I mean that for the macro to recreate what was recorded, it needs to be played back at the same zoom and view orientation at which it was created, so that screen picks still work. It is not guaranteed that playing the macro back will always work. So a macro that creates a feature is not truly parametric unless something is done to remove the need for user intervention to pick things.

                 

                On the other hand, SW might already have what is necessary to do what git needs. Long ago I learned a trick for fixing otherwise intractable problems: delete the entire feature tree and then restore it. Obviously SW has to destroy all geometry to do this. And obviously there is some record internal to SW that allows everything deleted to be restored. So if what was deleted and later restored in the feature tree could be "captured" as a true representation of the part, that would be enough to provide the "program" on which to do revision control (git). In fact the whole rollback/delete/restore system is a revision control system of sorts. It doesn't matter one whit whether that information is in binary or human-readable format.

                  • Re: Ideas from Watching Linus Torvalds on the Software Revision System git
                    Jason Nicholson

                    SolidWorks Files = Individual Units (smallest unit)

                    SolidWorks PDM database = The Sum of All SolidWorks Files

                     

                    Git way of thinking:

                    1.  It must be distributed

                    2.  What you put in must be what you get out, i.e., file corruption is flagged as serious.  Unchanged files should be the same down to the byte.

                    3.  Branching and merging must be easy and fast.  With the approach I am describing, you branch the whole database in theory and merge the database by resolving the revisions made since the last change.  This DOES leave room for individual file conflicts.

                    4.  IT MUST BE FAST.  This is important not because 5 seconds versus 5 minutes matters in itself, but because speed affects how often someone commits (updates) the database.  If it takes 5 minutes to update an assembly in the database, then an engineer will wait longer before updating the database with their changes.

                     

                    It's best to just watch the video.

                     

                    Please do not get hung up on individual files.  Think in terms of the whole database.  The individual files are the smallest units.  This means conflicts in individual files that have been changed by two different people will have to be resolved separately.  When you think about it, this is not that big of a deal.  This is what would have to happen anyway.
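To make the "whole database, file as smallest unit" idea concrete, here is a minimal sketch in Python. It assumes each snapshot of the vault can be represented as a mapping from file path to a content hash; the function name `merge_manifests` and the sample file names are made up for illustration and are not part of git or any PDM product.

```python
def merge_manifests(base, ours, theirs):
    """Three-way merge of {path: content_hash} snapshots.

    Files are treated as indivisible units: a file changed on only one
    side is taken from that side; a file changed differently on both
    sides is reported as a conflict and left out of the merged result
    until someone resolves it by hand.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o          # both sides agree (including both deleted)
        elif o == b:
            result = t          # only "theirs" touched the file
        elif t == b:
            result = o          # only "ours" touched the file
        else:
            conflicts.append(path)
            continue
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"bracket.sldprt": "a1", "plate.sldprt": "b1", "top.sldasm": "c1"}
ours = {"bracket.sldprt": "a2", "plate.sldprt": "b1", "top.sldasm": "c1"}
theirs = {"bracket.sldprt": "a1", "plate.sldprt": "b2", "top.sldasm": "c1"}
merged, conflicts = merge_manifests(base, ours, theirs)
# merged takes bracket.sldprt from "ours" and plate.sldprt from "theirs";
# conflicts is empty because no file was changed on both sides.
```

Only when two people change the same file does the merge stop and ask for a human decision, which is the case that would have to be resolved by hand anyway.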

                      • Re: Ideas from Watching Linus Torvalds on the Software Revision System git
                        Paul Kellner

                        Actually, many PDM systems are git-like. They pull files from the "repository" to the local machine (local repository), make changes, and push them back. SW files already have an internal ID that would take the place of part of the SHA1 hash Linus was talking about. What PDM systems don't do well is branching and merging. That is one part of gitting that I would like to see.

                         

                        Pulling the whole PDM file tree to a local machine is a bit of a daunting task compared to pulling Linux source code.

                          • Re: Ideas from Watching Linus Torvalds on the Software Revision System git
                            Jason Nicholson

                            Paul, you're getting what I am saying.

                             

                            With respect to :

                            Paul Kellner wrote:

                             

                            SW files already have an internal ID that would take the place of part of the SHA1 hash Linus was talking about.

                             

                            I doubt an internal ID can detect corruption, which does happen.  I think a hash like SHA-1 would be better.
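As a rough illustration of why a content hash catches corruption that an internal ID cannot, here is a small Python sketch using the standard `hashlib` module. The helper names (`sha1_of_file`, `build_manifest`, `find_corrupted`) are invented for this example. Note that it hashes the raw file bytes, which is not the same as git's object ID, since git hashes a short header together with the content.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (as a relative POSIX path) to its SHA-1."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha1_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_corrupted(root, manifest):
    """List files that are missing or whose bytes no longer match the manifest."""
    root = Path(root)
    return sorted(
        rel
        for rel, expected in manifest.items()
        if not (root / rel).is_file() or sha1_of_file(root / rel) != expected
    )
```

A manifest would be recorded at check-in and compared again at check-out; any file that differs by even one byte shows up in `find_corrupted`.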

                             

                             

                            Paul Kellner wrote:

                             

                            What PDM systems don't do well is branching and merging. That is one part of gitting that I would like to see.

                             


                             

                            I agree.

                             

                             

                            Paul Kellner wrote:

                             

                             

                            Pulling the whole PDM file tree to a local machine is a bit of a daunting task compared to pulling Linux source code.

                            Yes, this could be a problem, especially in very large systems.  Here at my company, I have 30,000 files which consume 60 GB.  I use a tool called FreeFileSync to copy them off the network into a folder on my hard drive.  I do this so that I can use SolidWorks Explorer's search functionality, which really is "Windows Desktop Search."  64-bit versions of Windows cannot index network drives or UNC paths, so I maintain a local copy to work around this and search all of our SolidWorks files.

                             

                            60 GB is not a problem with today's hard drives, so for most users this isn't an issue.  However, a problem does occur at the large corporate level where you have 1 TB or more of files.  In that situation, distribution would need to be partial rather than full.  Distribution would improve SolidWorks file-opening performance and thus assembly performance.  It would also decrease network activity by getting rid of redundant reads and writes over the network.  For instance, when I run FreeFileSync once every couple of days, the total file input/output is ~300 MB.  Saving files locally is faster than saving over the network and thus increases SolidWorks performance for assemblies.
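The reason a periodic sync moves only a few hundred megabytes is that unchanged files are skipped. A minimal Python sketch of that decision is below. It is an assumption-laden illustration, not a description of how FreeFileSync works internally: the function name `files_to_copy` is made up, and it only compares file size and modification time.

```python
from pathlib import Path


def files_to_copy(source_root, mirror_root):
    """Return relative paths that are missing from, or stale in, the local mirror.

    A file counts as stale when its size differs from the mirror copy or
    the source was modified more recently than the mirror copy.
    """
    source_root, mirror_root = Path(source_root), Path(mirror_root)
    stale = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = mirror_root / rel
        if not dst.is_file():
            stale.append(rel.as_posix())   # never copied before
            continue
        s, d = src.stat(), dst.stat()
        if s.st_size != d.st_size or s.st_mtime > d.st_mtime:
            stale.append(rel.as_posix())   # changed on the network since last sync
    return stale
```

Everything not returned by `files_to_copy` can be opened straight from the local disk without touching the network.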

                    • Re: Ideas from Watching Linus Torvalds on the Software Revision System git
                      Matt Feider

                      SolidWorks chooses to store geometry and model information in a binary format.  I agree that this makes it hard to do the type of versioning and branching that is done with source code.

                       

                      If you think about it, any parametric modeling system is simply a graphical method to create a macro (a series of steps or functions) with parameters/variables (parametrics) that are processed logically from beginning to end to form geometry.  Thinking about it this way, it sure sounds like programming to me.  If there were any way to save a SolidWorks file with just the macro information (feature tree and all the parameters) in a text file, why couldn't we store information like this?

                       

                      Pro/E used to be able to do something similar called trail files.  You could rerun these trail files at any time to fully recreate entire parts and models.  And if you think about it, that information should be all you ever need as long as the rebuild always produces the same result.  That is of course assuming you can export all the parameters and graphical choices you make as you use the tool.  I think anyone who has used configurations knows that SolidWorks doesn't allow some choices to differ between configurations.  Those items are often such because there is no easy way to get to the underlying data in the above fashion.

                       

                      --Matt

                        • Re: Ideas from Watching Linus Torvalds on the Software Revision System git
                          Jim Sculley

                          Matt Feider wrote:

                           

                          SolidWorks chooses to store geometry and model information in a binary format.  I agree that this makes it hard to do the type of versioning and branching that is done with source code.

                           

                          If you think about it, any parametric modeling system is simply a graphical method to create a macro (a series of steps or functions) with parameters/variables (parametrics) that are processed logically from beginning to end to form geometry.  Thinking about it this way, it sure sounds like programming to me.  If there were any way to save a SolidWorks file with just the macro information (feature tree and all the parameters) in a text file, why couldn't we store information like this?

                           

                          It has been tried before.

                           

                          http://drgoulu.com/2006/09/30/cadml/

                           

                          It is slow.  If you've ever downloaded a model from an OEM that uses macros to distribute their part models, you will see that 'rebuilding' a model from a macro is much slower than a normal rebuild.

                           

                          Jim S.

                            • Re: Ideas from Watching Linus Torvalds on the Software Revision System git
                              Matt Feider

                              Agreed, as the binary store of the model geometry is of benefit to the user.  Storing only the macro wouldn't work at the assembly level until the geometry had been created.  But I do think the point is still valid: what if the PDM system could tell you the differences in the macro that built two versions?  That would be like the user checking out both and running SolidWorks Utilities --> Compare Features on them.  I agree the system would have a hard time branching or controlling the geometry.  But my point wasn't to separate the macro from the geometry, just to point out that the PDM tools could read the macro information and have an extra layer of information on the files they control.  At this level they could definitely do things similar to git.
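If the feature tree could be exported as one line of text per feature, showing the differences between two versions becomes an ordinary text diff. The sketch below uses Python's standard `difflib`. The one-line-per-feature format and the feature names are purely hypothetical, since SolidWorks does not provide such an export as far as this thread knows.

```python
import difflib


def diff_feature_trees(old_features, new_features):
    """Return unified-diff lines between two feature lists (one string per feature)."""
    return list(
        difflib.unified_diff(
            old_features,
            new_features,
            fromfile="rev A",
            tofile="rev B",
            lineterm="",
        )
    )


# Hypothetical text export of the same part at two revisions.
rev_a = ["Sketch1 plane=Front", "Boss-Extrude1 depth=10mm", "Fillet1 radius=2mm"]
rev_b = ["Sketch1 plane=Front", "Boss-Extrude1 depth=12mm", "Fillet1 radius=2mm", "Hole1 dia=5mm"]

for line in diff_feature_trees(rev_a, rev_b):
    print(line)
```

The output marks the changed extrude depth and the added hole with `-` and `+` prefixes, which is the kind of extra layer of information a PDM tool could show without touching the geometry.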

                               

                              Just think what we could do if we could play at that macro level and merge two macros together in a controlled way.  Of course a rebuild would be necessary to generate the new geometry.  You can kind of do this with copy/paste now, but not with everything.  And of course there are always fixes to be done as you change child/parent relationships and geometry.

                               

                              --Matt

                                • Re: Ideas from Watching Linus Torvalds on the Software Revision System git
                                  Jason Nicholson

                                  This is not about branching and merging individual files.  This is about branching and merging the whole database while thinking of the individual files as the smallest units.  Watch the video.

                                    • Re: Ideas from Watching Linus Torvalds on the Software Revision System git
                                      Matt Feider

                                      Understood, but you have to walk before you run: such a tool would have to deal with the building blocks first before it can deal with the larger structure.  If you are just looking at the database (i.e., an entire assembly or project), most PDM systems already have BOM comparisons to show what the assembly structure consists of at different revisions/versions.  So if your branching/merging simply consists of looking at the structure and replacing pieces, that is achievable with many tools today.

                                       

                                      PDM systems' strong point is their ability to manage the relationships between the CAD files.  This is in fact one of the areas that most version control systems lack.  Subversion, CVS and many others simply control a directory of files.  They aren't 'smart' enough to read the relationships between code files (includes or other) to know what files are really required.  They simply work on an entire directory or folder structure in the operating system.  I haven't personally used git, but I don't think it has the ability to understand references.

                                        • Re: Ideas from Watching Linus Torvalds on the Software Revision System git
                                          Paul Kellner
                                          So if your branching/merging simply consists of looking at the structure and replacing pieces, that is achievable with many tools today.

                                          That is not how git handles branching and merging of files.

                                          PDM systems' strong point is their ability to manage the relationships between the CAD files.  This is in fact one of the areas that most version control systems lack.  Subversion, CVS and many others simply control a directory of files.

                                          If you had listened to Torvalds' video you wouldn't have made this comparison. git does not control a directory of files.

                                           

                                          But you are right, git does not have the ability to control or recognize the relationships among files. Nobody ever said it did, just that SW version/revision control should work like git. As I mentioned before, SW has never put PDM hooks into their software. SW is not open source and does not follow a published standard for file handling.

                                           

                                          As I am reading up on how git works, however, it may be possible for it to handle changes in SW files in a robust way, without knowing much about them and without violating the cardinal rule of SW file handling, "two files may not have the same filename". But it still won't be a PDM system, as it won't be able to track changes to parts within assemblies or configuration parts, nor will it handle updating custom properties or drawing notes.

                                           

                                          But, since it is open source, some enterprising API programmer might be able to add this in.

                                • Re: Ideas from Watching Linus Torvalds on the Software Revision System git
                                  Paul Kellner

                                  Instead of watching, or perhaps before watching the "Wrath of Torvalds" check out this link which has a succinct summary of how git works. LINK

                                  • Re: Ideas from Watching Linus Torvalds on the Software Revision System git
                                    Ben Frantzdale

                                    Jason, I really like the direction you are taking this. I have a software and mechanical background: before I touched SolidWorks, I had used RCS, CVS, SVN, and Perforce. When I first saw my PDM options, I was shocked and horrified that the mechanical-engineering tools are stuck in the dark ages. My group started using PDMW, but with the limitations I saw out of the gate, I started using Git (Tortoise Git) on top of it and so far it has worked great. On a regular basis, I update from PDM, overwriting my local files, and then check that change into Git. I have maybe a hundred commits of our ~900MB project (that is, the moral equivalent of hundreds of copies of my SolidWorks project) and it only takes up 5.1GB. Whereas sometimes our PDM administrator has to turn off access to PDM for hours to "create a snapshot", I just do "git tag prototype_build_1", which is instant, and keep right on working.

                                     

                                    Merging is a pain for two reasons, but better than not being able to branch. First, individual files aren't mergeable. That's annoying but, as you point out, not too bad if you aren't stepping on anyone's toes, since Git happily sees my branch as changing a file while the other branch didn't touch it and resolves things using my version. The bigger annoyance is that assemblies automatically rebuild, so, e.g., drilling a hole in one little part will change everything up to the top-level assembly file even if none of the mates changed. As a result, when I merge a change I try to keep my changes to one or two parts and try to keep the main copy of the destination assemblies. Then I just rebuild the main assembly and check in the merge. But Git has a lot of settings that would allow this sort of thing to be addressed...

                                     

                                    Echoing your original post, I think distributed version control has a lot of potential for SolidWorks. With SolidWorks files being so big, being distributed should be great for multi-site organizations: Each site could have a repository that pulls from the other sites so no engineers ever have to wait for hundreds of megabytes to download from the "one master repository" on the other side of the country.

                                     

                                    The lowest-hanging fruit I see would be a script to check if the feature tree of an assembly has changed or just the "compiled" solid model. With that, many merges would go very smoothly. Any idea how to do that?
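One possible shape for such a script, offered only as a sketch: assume some extractor (not shown, and not something known to exist in the SolidWorks API) can return the feature tree as a list of dictionaries. The comparison itself is then simple. The names `feature_fingerprint` and `classify_change` and the record layout are all hypothetical.

```python
import hashlib
import json


def feature_fingerprint(features):
    """Hash a feature tree given as an ordered list of dicts.

    Dictionary keys are sorted so that only real differences in names,
    parameters, or feature order change the fingerprint.
    """
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def classify_change(old_bytes, new_bytes, old_features, new_features):
    """Classify how a file differs between two revisions."""
    if old_bytes == new_bytes:
        return "unchanged"
    if feature_fingerprint(old_features) == feature_fingerprint(new_features):
        return "rebuild-only"    # bytes differ, but the design intent is the same
    return "design-change"       # the feature tree itself was edited


# Hypothetical extracted feature tree for an assembly.
tree = [{"name": "Mate1", "type": "Coincident"}, {"name": "Mate2", "type": "Concentric"}]
print(classify_change(b"old file bytes", b"new file bytes", tree, tree))
```

Files classified as "rebuild-only" could safely take either side in a merge and simply be rebuilt afterward, which would remove most of the spurious assembly conflicts described above.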