Thousands of Objects in a Graph
sap630
Contributor
Joined: 22.Sep.2009 Points: 10
Topic: Thousands of Objects in a Graph
Posted: 22.Sep.2009 at 08:39
Supposing we have 5000 objects called "object_type_1" linked to 10000 objects called "object_type_2" in one graph (picture one very large ERD where the object_type_1 instances are entities and the object_type_2 instances are attributes). What options are available to view all this information section by section (i.e. in small subsets)?
Is it possible to link relationships across separate graph instances? If I now create a new object_type_1 object in the graph (5001 object_type_1 instances now exist) and need to connect it to specific object_type_2 objects that already exist, is there a way to filter the 10000 objects based on their name property?
stevek
MetaCase
Joined: 11.Mar.2008 Points: 643
Posted: 22.Sep.2009 at 11:50
Technically, you could have one graph (conceptual information, abstract syntax) with several diagrams (representational information, concrete syntax). The conceptual graph has 15000 objects, but each diagram shows only, say, 50 of them. In each diagram you would make visible the objects you want in that diagram, plus any objects from other diagrams whose relationships to them you want to see. You can use the right-hand column of the Graph Browser in the main MetaEdit+ window to filter the objects by their name property and type, and copy and paste the desired object_type_2 instances from there.
A better approach would be to decompose your graph of 15000 objects into sensible units. Studies of human cognition show that we simply can't work well with such a large number all in one graph, even if we filter or use views. But if we break it down into subgraphs/modules, each of which can be considered at a higher level as its own unit, we can cope fine. You'll probably end up with 3 levels of graphs: 1 top-level graph with 20 "module" objects, each of which decomposes to its own graph with 20 "module" objects, each of which decomposes to a normal graph with 10-15 object_type_1 and 20-30 object_type_2 instances. An alternative would be 4 levels, with 7-8 module objects per graph rather than 20 - or a combination, 3 levels deep in some places and 4 in others.
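A quick sanity check of the arithmetic in that 3-level layout (a Python sketch; the figures are the illustrative numbers from the post, not tool limits):

```python
# Rough capacity check for the 3-level decomposition described above.
top_modules = 20          # "module" objects in the top-level graph
mid_modules = 20          # "module" objects inside each top-level module
leaf_type_1 = (10, 15)    # object_type_1 instances per leaf graph (min, max)
leaf_type_2 = (20, 30)    # object_type_2 instances per leaf graph (min, max)

leaf_graphs = top_modules * mid_modules
min_total = leaf_graphs * (leaf_type_1[0] + leaf_type_2[0])
max_total = leaf_graphs * (leaf_type_1[1] + leaf_type_2[1])

print(f"{leaf_graphs} leaf graphs hold {min_total}-{max_total} objects")
# -> 400 leaf graphs hold 12000-18000 objects
```

So the 3-level structure comfortably covers the 15000 objects in question.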
Aim for modules with high cohesion (the objects in them make sense together and interact with each other) and low coupling (a minimum number of relationships across subgraph boundaries). To make a relationship from an object A in one subgraph to an object B in a different subgraph, you can simply reuse object B in the first subgraph. Alternatively, you could have a new object type object_type_2_ref, with a single property that points to an object_type_2. In either case, you can open the Info... dialog of the object_type_2 to find out in which graph it is defined.
With 15000 objects you probably also need to think about integrating the work of multiple users. The multi-user repository of MetaEdit+ makes this easy and transparent, so all your users can work together - much simpler than trying to merge and reconcile multiple independent edits with an old-fashioned textual version control system.
sap630
Posted: 23.Sep.2009 at 11:56
Thanks Steve.
Any chance you have an MXT file describing the CWM (Common Warehouse Metamodel) specification? Also, is there any XSLT to convert CWM models into MXM files?
stevek
Posted: 23.Sep.2009 at 13:18
Sorry, we don't have an MXT file for CWM. The CWM XSchema is over 1MB, 74 packages, 470 classes. As with all XSchemas, it's massively underspecified for use as a metamodel, so there's no way to automatically make a good MetaEdit+ metamodel for it. Instead, you need to understand what they intended, and make your own decisions about what makes a good modeling language for human use, as opposed to just being able to store the data.
Experience shows that a MetaEdit+ metamodel containing the same information as an OMG XSchema is much smaller and easier to understand - much of the bloat in OMG schemas is due to the unsuitability of MOF for describing metamodels and of XMI for storing them.
If you don't need full CWM compatibility, but just to import an existing data set, I'd suggest making your own metamodel based on the needs of your domain. You can then build a naive text-to-model transformation that is able to read just what is in your existing data set, and build the MXM file you want. That's a couple of orders of magnitude faster than trying to make a full, bulletproof XSLT and MXT for CWM. And remember that even if you had the full versions, the chances of being able to import correctly from all other tools that claim CWM support are slim indeed (cf. XMI for UML).
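As a sketch of what such a naive text-to-model transformation might look like (Python; the `<model>`/`<object>` element names are hypothetical placeholders, not the real MXM schema, and the CSV input shape is invented for illustration):

```python
import csv
import io
from xml.sax.saxutils import escape

def tables_to_model_xml(csv_text):
    """Naively turn a 'table,column' CSV dump into model XML.
    The element names below are placeholders, NOT the real MXM schema."""
    tables = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        tables.setdefault(row["table"], []).append(row["column"])
    lines = ["<model>"]
    for table, cols in tables.items():
        lines.append(f'  <object type="Table" name="{escape(table)}">')
        for col in cols:
            lines.append(f'    <object type="Column" name="{escape(col)}"/>')
        lines.append("  </object>")
    lines.append("</model>")
    return "\n".join(lines)

data = "table,column\nEmployee,Name\nEmployee,Salary\n"
print(tables_to_model_xml(data))
```

The point is that a reader for one known data set is a few dozen lines, whereas a bulletproof CWM importer is a project in itself.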
sap630
Posted: 15.Oct.2009 at 23:55
Is it possible for one meta-metamodel to enforce the design of a metamodel, which in turn enforces the design of a model?
e.g. the ERD in Examples.
stevek
Posted: 16.Oct.2009 at 00:13
I'm not sure I understand your question, but I'll try and answer.
The meta-metamodel in MetaEdit+ is GOPPRR, which is fixed. You can use the metamodeling tools to define your own metamodel, as we did when building the ER metamodel. Having done that, you can use the modeling tools to build your own models, as we did when building the example ER diagram "Orders and Products". The model conforms to the metamodel, and the metamodel conforms to the meta-metamodel.
If you are envisaging multiple layers of people who can "enforce the design" of the next level down, that can work well too. You don't need extra meta-levels, though. For instance, you can make a base metamodel and give that to a few other metamodelers, each of whom can make extensions to it (in accordance with your instructions of what they are allowed to add, change, subtype etc.). Each extended metamodel can be given to a group of modelers, who can make models that will conform to that extended metamodel - and also to your base metamodel (insofar as your instructions require).
We also have customers who partly automate the process of extending a metamodel, to make sure that the middle level of metamodelers only follow the top level's instructions, or simply to make it easier for the middle level.
As we did in the graphical GOPRR modeling language, you can also build a modeling language whose domain is "modeling languages", and which generates the MetaEdit+ metamodel XML import format, MXT files. By drawing a model and pressing the Generate button, you can thus create a metamodel. Your metamodeling language could be similar to GOPRR or completely different: the only requirement is that it generates valid MXT files.
sap630
Posted: 31.Oct.2009 at 15:02
Fascinating; dunno how I missed that manual. The Family Tree example is all about metamodeling the concept of a family tree, but using the individual tools (Graph Tool, Object Tool, etc.). I had no idea that the GOPRR project, along with the link you gave me, would make meta-metamodeling easier! It didn't click that Figure 1-3 from the Evaluation tutorial:
can actually be used in the GOPRR project, from which you can then Export and Build. Here is a broad description of our current process in a typical data warehouse environment:
Database design in steps 2 and 3 is done visually (MDA development) with automatic code generation. Extract-Transform-Load (ETL) is also done visually with automatic code generation using another tool. The current tools are powerful at what they do best. Our biggest problem, however, is that all our metadata is scattered everywhere, so we are investigating the use of a central repository such as Apache Jackrabbit (or the commercial version, CRX). In addition to having a central metadata repository, we would need:
As such, I am trying to determine if MetaEdit+ would be an ideal tool for:
Also, I'm wondering if there are any plans to open up the MetaEdit+ repository into a more JCR-like (Jackrabbit/CRX) repository?
stevek
Posted: 02.Nov.2009 at 13:38
So what you need is a modeling language for describing database schema and ETL transformations. It will have concepts like Table, Column, and various ETL operations, e.g. Split to split a string value based on a separator character.
You can then build models of your databases and transformations: e.g. in the first RDBMS there is a Table "Employee" that has a Column "Name"; in the second RDBMS there is a Table "Personnel" that has Columns "First Name" and "Last Name"; and between them is an ETL transformation that uses a Split from "Name", with the first part going to "First Name" and the second to "Last Name":
                              /---first----> "First Name"
"Name" ---- "Split on: space"
                              \---second---> "Last Name"
Hopefully it is obvious this is a simplification! The main thing I wanted to make clear is how the various meta-levels would work. That will stay the same whether you have one column or 50 000, and whether you have a simple modeling language or a complex one. You can extend the modeling language as you go, as you mention in point 3 above.
By modeling all this in MetaEdit+, you can solve the problems you currently face by having several tools. If you change "Last Name" to "Surname", you don't need to find the places in both your ER tool and your ETL tool where that is referenced: you just change it once in the model in MetaEdit+, and that change is visible in both the schema models and the ETL models. I'd probably have separate Graph types for schema modeling and transformation modeling, with the Column type used by both. The schemas define the Column objects; the transformations use them. You can have multiple users working in the same MetaEdit+ repository - some building schemas, some building transformations, maybe someone extending the modeling language. Versioning and locking happen at the level of objects, so you can work together without the tool getting in the way.
You can write generators to check the things you need to ensure, e.g. that every column in the target database is mentioned on the RHS of some ETL rule, and that the ETL rules only reference columns that are actually in the respective tables. You can make the warnings from those checks show up when you want, e.g. only when doing a build, or instantly in the diagram if a modeler tries to connect an illegal column (e.g. makes the mapping backwards).
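A minimal sketch of such a coverage check, assuming a toy in-memory model shape (the real check would be a MetaEdit+ generator walking the schema and transformation models):

```python
def check_etl_coverage(target_columns, etl_rules):
    """Return (columns never produced by any rule, columns written by
    rules but absent from the schema). etl_rules maps a rule name to
    the set of target columns it writes - an invented model shape."""
    written = set().union(*etl_rules.values()) if etl_rules else set()
    missing = set(target_columns) - written
    unknown = written - set(target_columns)
    return missing, unknown

schema = {"First Name", "Last Name", "Hire Date"}
rules = {"split_name": {"First Name", "Last Name"},
         "bad_rule": {"Surname"}}
missing, unknown = check_etl_coverage(schema, rules)
print("never filled:", missing)   # {'Hire Date'}
print("not in schema:", unknown)  # {'Surname'}
```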
You can also write generators to produce the SQL that creates the schemas in your first and second databases, and the ETL script (whatever format your ETL engine needs). If you prefer, you can export models so your existing ER and ETL tools can open them (exporting to XML and transforming with XSLT, or writing a generator to create a text format readable by the tools, or using the MetaEdit+ API to access the model data directly).
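For example, the schema-generation side could boil down to something like this toy DDL emitter (a Python sketch over an invented model shape; a real generator would read the Table and Column objects from the repository):

```python
def generate_ddl(table, columns):
    """Emit a CREATE TABLE statement from a toy model:
    columns maps column name -> SQL type string."""
    cols = ",\n".join(f'  "{name}" {sqltype}'
                      for name, sqltype in columns.items())
    return f'CREATE TABLE "{table}" (\n{cols}\n);'

ddl = generate_ddl("Personnel", {"First Name": "VARCHAR(40)",
                                 "Last Name": "VARCHAR(40)"})
print(ddl)
```

The ETL-script generator would walk the same model from the transformation side, so a rename like "Last Name" to "Surname" flows into both outputs from the single change in the model.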
As for Jackrabbit: to be honest, I don't think just having all the models (i.e. your "metadata") in one content repository is enough of a solution. That just puts it in one place; you can do that by putting your current ER and ETL files on the same hard disk! The big question isn't where it is, but are there tools to access it and know what it means (i.e. that a Column is defined in the schema and used in the ETL). With MetaEdit+, you get the repository and the tooling.
Luc
Member
Joined: 05.Nov.2009 Location: Australia Points: 3
Posted: 05.Nov.2009 at 03:40
Steve - our application has the equivalent of thousands of database tables, tens of thousands of columns, two thousand screens/web pages, tens of thousands of fields, etc. Starting from any one of these objects we generate source code for our application. For example, starting from a database table definition we generate DDL and database access methods. From a screen definition we generate validation rules for each updatable field on the screen.
These objects are linked to each other; thus a screen field is linked, via an intermediate abstract object we call an "element", to database columns. (Maybe we could have modelled our metadata differently - however, that is what we have now.) The object type "element" contains the specification of each data item, whether a database column, a screen field, or an attribute in an XML message/document. These specifications contain the usual data type information, plus lists of domain values and their representations on different media - for example, for an element called "maritalStatus" a domain value would be "single", and its representation might be "SIN" for storage on the database, "Single" for display on a web page or screen, and its xmlName might be "mst".
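The "element" concept described above could be captured in a structure like the following (a Python sketch; the placement of xmlName at the element level and the "married"/"MAR" entry are guesses for illustration, not Luc's actual metadata):

```python
# Toy in-memory shape for an 'element': one definition carrying the
# data type, the XML name, and per-medium representations of each
# domain value.
elements = {
    "maritalStatus": {
        "type": "string",
        "xmlName": "mst",   # name used in XML messages/documents
        "domain": {
            "single":  {"db": "SIN", "display": "Single"},
            "married": {"db": "MAR", "display": "Married"},  # invented
        },
    },
}

def represent(element, value, medium):
    """Look up how a domain value is rendered on a given medium."""
    return elements[element]["domain"][value][medium]

print(represent("maritalStatus", "single", "db"))       # SIN
print(represent("maritalStatus", "single", "display"))  # Single
```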
Now, suppose we have a separate graph for each database table, and a separate one for each screen or web page. Since these graphs all connect via the element objects, how do we avoid replicating the information from the elements into the two types of graphs?
Further, in another view we might want to draw an ER diagram of the tables. It is true that it is not useful to draw an ER diagram containing thousands of tables, so it does make sense to group the tables somehow. However, this does not remove the need for a table in one group to be linked to another table in a different group. So, how would we deal with the situation where we need to draw a model of that particular link and the neighbouring tables?
Thus, we have hundreds of thousands of metadata objects interlinked in various ways. We were wondering if it would be possible for a user to select an object, say a particular database table, and for MetaEdit+ to draw a particular type of graph relating to the table, its columns, and the elements? And, in another use case, for MetaEdit+ to draw a different graph, e.g. an ER diagram centred on that table but containing adjacent tables, perhaps one, two, or three steps away?
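The "adjacent tables, one or two or three steps away" part is essentially a bounded breadth-first search over table-to-table links; a sketch, assuming an invented adjacency structure:

```python
from collections import deque

def tables_within(links, start, max_steps):
    """Tables reachable from start in at most max_steps link hops.
    links maps a table name to the set of tables it links to
    (directed here for simplicity)."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        table, dist = frontier.popleft()
        if dist == max_steps:
            continue
        for nxt in links.get(table, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen

links = {"Employee": {"Department"}, "Department": {"Company"},
         "Company": {"Country"}}
print(tables_within(links, "Employee", 2))
# Employee, Department and Company; Country is 3 steps away
```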
Thanks
stevek
Posted: 05.Nov.2009 at 13:10
Luc - having the "maritalStatus" element in many graphs, even of different types, is no problem for MetaEdit+. In fact, it's what MetaEdit+ does best! The same object can be in many graphs: it's not a copy or duplicate, it's the same identical object. Graphs point to their objects, rather than strongly containing them: many graphs can point to the same object.
Similarly, objects can point to other objects. This allows you to create references, so you don't need to directly include a 'foreign' object in a graph, but can have a different type of object directly in the graph, and have that object point to the 'foreign' object. In your ER diagram example, that's one way of modularizing your database: try and keep most links between tables internal to the module containing those tables, but allow links to tables in other modules via these reference objects. Of course nothing stops you from directly including the elements from outside the group if you want; it's just often easier for modelers to understand if you make explicit which links are considered internal and which external.
As you say, the exact choice of how to model and link screen fields with database tables is an open question. At the small/new/simple end of the scale, people make the screen fields primary: they just want to model the UI and have the database automatically generated. At the large/legacy/complex end of the scale, people make the database primary: the schema exists and is largely fixed, and when you create a UI field you need to link it to some existing database column. Somewhere in that scale is the best solution for your needs, and we'd be happy to help you find that.
As to generating graphs on the fly, that's certainly possible. The MetaEdit+ generators can produce new graphs in MetaEdit+'s Model XML format, and import those for the modeler to see. Reading a new graph, however, is hard on the human brain - a bit like a map of an unfamiliar town, or, even worse, of a familiar area where all the towns have been rearranged into different positions. As far as possible I'd thus aim to make the existing models naturally answer the questions the modelers are likely to want to ask - that's largely a question of creating the right metamodel, e.g. one with extra concepts to better cope with questions of large scale (compared to ER diagrams).
Tooling helps here too: e.g. you can select any object in a graph and ask for its Info, which will show you all the other graphs where it is used and allow you to jump directly to that object in those graphs. Another nice feature of MetaEdit+ is the generation of reports that are linked to objects: e.g. you could create reports that would show the information that the modeler would want, and he can then double-click the text of the desired object in the report output and jump straight to that object in the model. These features are really useful when you want to explore a large model.