I am at a new job site now, and I was thinking of extending
JdbcSql to help learn a schema faster, using some techniques discovered at Harbinger/Peregrine and nuBridges.
So essentially you would run it in survey mode and it would return something like the following:
Summary Statistics:
- # catalogs
- # tables
- # views
- # sequences
- # indexes
- # columns
- # schemas
- # data types
- # native types
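Most of these summary counts come straight out of java.sql.DatabaseMetaData. A minimal sketch of what survey mode might do, assuming an open Connection (the map keys and render layout here are my own; sequence and per-table index counts vary by driver and are only noted in a comment):

```java
import java.sql.*;
import java.util.*;

public class SchemaSurvey {
    /** Count the rows in a metadata ResultSet, closing it afterwards. */
    static int countRows(ResultSet rs) throws SQLException {
        int n = 0;
        try { while (rs.next()) n++; } finally { rs.close(); }
        return n;
    }

    /** Gather the summary counts available through DatabaseMetaData. */
    static Map<String, Integer> summarize(Connection conn) throws SQLException {
        DatabaseMetaData md = conn.getMetaData();
        Map<String, Integer> summary = new LinkedHashMap<>();
        summary.put("catalogs", countRows(md.getCatalogs()));
        summary.put("schemas", countRows(md.getSchemas()));
        summary.put("tables", countRows(md.getTables(null, null, "%", new String[] {"TABLE"})));
        summary.put("views", countRows(md.getTables(null, null, "%", new String[] {"VIEW"})));
        summary.put("columns", countRows(md.getColumns(null, null, "%", "%")));
        summary.put("native types", countRows(md.getTypeInfo()));
        // Sequences and indexes are driver-dependent: getTables(..., {"SEQUENCE"})
        // where supported, and getIndexInfo(catalog, schema, table, false, true)
        // per table. Distinct JDBC data types can be tallied from the DATA_TYPE
        // column of getColumns().
        return summary;
    }

    /** Render the summary in the plain-text report layout. */
    static String render(Map<String, Integer> summary) {
        StringBuilder sb = new StringBuilder("Summary Statistics:\n");
        for (Map.Entry<String, Integer> e : summary.entrySet())
            sb.append("- ").append(e.getValue()).append(" ").append(e.getKey()).append("\n");
        return sb.toString();
    }
}
```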
Table Row Count Statistics:
- # empty tables
- # tables with 1-9 rows
- # tables with 10-99 rows
- # tables with 100-999 rows
- # tables with 1000-9999 rows
- # tables with 10000-99999 rows (and so on until the largest table is found)
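The row-count bands are just decimal orders of magnitude, so the bucket for any count can be computed rather than hard-coded. A sketch (the method names are hypothetical):

```java
import java.util.*;

public class RowCountBuckets {
    /** Label for the decimal band a row count falls into: empty, 1-9, 10-99, ... */
    static String bucketLabel(long rows) {
        if (rows == 0) return "empty";
        long lo = 1;
        while (rows >= lo * 10) lo *= 10;     // lo becomes the largest power of 10 <= rows
        return lo + "-" + (lo * 10 - 1) + " rows";
    }

    /** Histogram of band label -> number of tables, in ascending band order. */
    static Map<String, Integer> histogram(Collection<Long> rowCounts) {
        List<Long> sorted = new ArrayList<>(rowCounts);
        Collections.sort(sorted);             // ascending order keeps the bands in order
        Map<String, Integer> hist = new LinkedHashMap<>();
        for (long r : sorted) hist.merge(bucketLabel(r), 1, Integer::sum);
        return hist;
    }
}
```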
Table Column Count Statistics:
- # tables with 1 column
- # tables with 2 columns
- # tables with 3-4 columns
- # tables with 5-7 columns
- # tables with 8-15 columns
- # tables with 16-31 columns
- # tables with 32-63 columns
- # tables with 64-127 columns (and so on until the table with the most columns is found)
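These bands roughly double, so from 8 columns upward the bucket is simply the power-of-two band containing the count. A sketch matching the list above:

```java
public class ColumnCountBuckets {
    /** Band label for a column count: 1, 2, 3-4, 5-7, then power-of-two
     *  bands 8-15, 16-31, 32-63, ... matching the report's buckets. */
    static String bucketLabel(int cols) {
        if (cols == 1) return "1 column";
        if (cols == 2) return "2 columns";
        if (cols <= 4) return "3-4 columns";
        if (cols <= 7) return "5-7 columns";
        int lo = Integer.highestOneBit(cols);   // largest power of 2 <= cols
        return lo + "-" + (2 * lo - 1) + " columns";
    }
}
```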
Column Name Distribution Frequency
- # column names that appear in 1 table only
- # column names that appear in 2 tables
- # column names that appear in 3-4 tables
- # column names that appear in 5-7 tables
- # column names that appear in 8-15 tables (and so on until the column name that appears most is found)
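Counting how many tables each column name appears in is a single pass over the metadata. A sketch assuming the table-to-columns map has already been loaded (e.g. from DatabaseMetaData.getColumns); names are folded to lower case on the assumption that case is not significant:

```java
import java.util.*;

public class ColumnNameFrequency {
    /** For each column name, the number of tables it appears in.
     *  Input: table name -> that table's column names. */
    static Map<String, Integer> tableCounts(Map<String, List<String>> tables) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> cols : tables.values())
            for (String c : new HashSet<>(cols))       // a name counts once per table
                counts.merge(c.toLowerCase(), 1, Integer::sum);
        return counts;
    }

    /** The most widely shared column name and its table count. */
    static Map.Entry<String, Integer> mostShared(Map<String, Integer> counts) {
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue());
    }
}
```

The resulting counts can be dropped into the same doubling bands as the column-count statistics.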
List of all tables by name - with # rows, # columns, and # columns shared with other tables
List of all tables by row count - biggest to smallest
List of all tables by column count - biggest to smallest
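Once the per-table counts are collected, the sorted listings fall out of a comparator. A sketch with a hypothetical TableInfo record:

```java
import java.util.*;

public class TableListing {
    /** Minimal per-table record for the listings (hypothetical shape). */
    static final class TableInfo {
        final String name; final long rows; final int cols;
        TableInfo(String name, long rows, int cols) {
            this.name = name; this.rows = rows; this.cols = cols;
        }
    }

    /** Tables ordered by row count, biggest first; name breaks ties. */
    static List<TableInfo> byRowCount(List<TableInfo> tables) {
        List<TableInfo> sorted = new ArrayList<>(tables);
        sorted.sort(Comparator.comparingLong((TableInfo t) -> t.rows).reversed()
                              .thenComparing(t -> t.name));
        return sorted;
    }
}
```

The by-column-count listing is the same idea with `t.cols` in the comparator.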
Then perhaps some table-specific analysis:
each table with its columns, and each column with its count(distinct foo)
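The per-column analysis boils down to issuing one count(distinct ...) probe per column. A sketch (identifier quoting is glossed over here; real code should quote with DatabaseMetaData.getIdentifierQuoteString, and these probes can be slow on big tables):

```java
import java.sql.*;

public class ColumnCardinality {
    /** Probe query for one column; identifiers are left unquoted. */
    static String distinctCountSql(String table, String column) {
        return "SELECT COUNT(DISTINCT " + column + ") FROM " + table;
    }

    /** Run the probe over an open connection and return the cardinality. */
    static long distinctCount(Connection conn, String table, String column)
            throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(distinctCountSql(table, column))) {
            rs.next();
            return rs.getLong(1);
        }
    }
}
```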
The format of the output could be:
- pure text
- self-referential HTML
- XML
- database create-schema DDL
Building a full monitoring schema would be even better, as would a set of screens that let you drill down on a particular table of your choice. Or perhaps a Swing front end that lets you drill down while a back end performs the column analysis - that way you wouldn't have to wait.
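The drill-down-while-analyzing idea maps naturally onto SwingWorker: the slow JDBC work runs on a background thread in doInBackground(), and done() updates the UI on the event-dispatch thread. A sketch with the analysis stubbed out (the JLabel target and the analyze body are placeholders for the real column probes):

```java
import java.util.concurrent.ExecutionException;
import javax.swing.JLabel;
import javax.swing.SwingWorker;

public class ColumnAnalysisWorker extends SwingWorker<String, Void> {
    private final String table;
    private final JLabel status;              // hypothetical UI target to update

    public ColumnAnalysisWorker(String table, JLabel status) {
        this.table = table;
        this.status = status;
    }

    /** The slow part: in the real tool this would run the count(distinct) probes. */
    static String analyze(String table) {
        return "analysis of " + table;        // stand-in for the JDBC work
    }

    @Override protected String doInBackground() {
        return analyze(table);                // runs off the event-dispatch thread
    }

    @Override protected void done() {         // runs back on the event-dispatch thread
        try { status.setText(get()); }
        catch (InterruptedException | ExecutionException e) { status.setText("failed"); }
    }
}
```

The UI stays responsive for browsing while `new ColumnAnalysisWorker(table, label).execute()` chews through a table in the background.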
It wouldn't seem to be very difficult, and it would give me a project for Swing experimentation.