mitchell_j_friedman

Name:
Location: Decatur, Georgia, United States

Thursday, December 29, 2005

Got the writing bug again

It seems like once I got that next job - contract Java development for Intercontinental Hotels Group through Matrix Resources - I was done writing.

And I probably wouldn't be writing here now - but I lost the key to UltraEdit, and that's pretty much how I update my FAQ.

So rather than updating that, I'll try to keep this a little more consistent - using my buddy Kipp's blog for inspiration.

Thursday, March 10, 2005

Learning a database schema fast(er)

I am at a new job-site now. And I was thinking of extending JdbcSql to help learn a schema faster using some techniques discovered at Harbinger/Peregrine and nuBridges.

So essentially you would run survey mode and it would return something like the following:

Summary Statistics:
  • # catalogs
  • # tables
  • # views
  • # sequences
  • # indexes
  • # columns
  • # schemas
  • # data types
  • # native types
Table Row Count Statistics:
  • # empty tables
  • # tables with 1-9 rows
  • # tables with 10-99 rows
  • # tables with 100-999 rows
  • # tables with 1000-9999 rows
  • # tables with 10000-99999 rows (and so on until the largest table is found)
Table Column Count Statistics:
  • # tables with 1 column
  • # tables with 2 columns
  • # tables with 3-4 columns
  • # tables with 5-7 columns
  • # tables with 8-15 columns
  • # tables with 16-31 columns
  • # tables with 32-63 columns
  • # tables with 64-127 columns (and so on until the table with the most columns is found)
Column Name Distribution Frequency
  • # column names that appear in 1 table only
  • # column names that appear in 2 tables
  • # column names that appear in 3-4 tables
  • # column names that appear in 5-7 tables
  • # column names that appear in 8-15 tables (and so on until the column name that appears most is found)
List of all tables by name - with # rows, # columns, # columns that also appear in other tables

List of all tables by row count - biggest to smallest

List of all tables by column count - biggest to smallest

Then perhaps some table specific analysis

Tables with their columns, and each column with its count(distinct foo)
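
The row-count buckets above grow by powers of ten, so survey mode could place each table in its bucket with a small helper. This is only a rough sketch: the class and method names are made up for illustration and are not part of JdbcSql, and it assumes the row counts have already been collected (for example with a select count(*) per table).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of JdbcSql: groups tables into power-of-ten row-count buckets.
class RowCountBuckets {

    // Lower bound of the bucket: 0 for an empty table, 1 for 1-9 rows, 10 for 10-99 rows, ...
    static long bucketFloor(long rowCount) {
        if (rowCount <= 0) {
            return 0;
        }
        long floor = 1;
        while (rowCount / 10 >= floor) {
            floor *= 10;
        }
        return floor;
    }

    // Human-readable label for the bucket a row count falls into, e.g. "10-99".
    static String bucketLabel(long rowCount) {
        long floor = bucketFloor(rowCount);
        if (floor == 0) {
            return "empty";
        }
        // Guard against overflow for absurdly large counts.
        long upper = (floor > Long.MAX_VALUE / 10) ? Long.MAX_VALUE : floor * 10 - 1;
        return floor + "-" + upper;
    }

    // Number of tables per bucket, ordered from the smallest bucket to the largest.
    static Map<String, Integer> histogram(Map<String, Long> rowCountByTable) {
        Map<Long, Integer> byFloor = new TreeMap<>();
        for (long rows : rowCountByTable.values()) {
            byFloor.merge(bucketFloor(rows), 1, Integer::sum);
        }
        Map<String, Integer> byLabel = new LinkedHashMap<>();
        for (Map.Entry<Long, Integer> e : byFloor.entrySet()) {
            byLabel.put(bucketLabel(e.getKey()), e.getValue());
        }
        return byLabel;
    }

    public static void main(String[] args) {
        // Made-up sample data, only to show the shape of the output.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("audit_log", 0L);
        counts.put("country", 7L);
        counts.put("customer", 150L);
        for (Map.Entry<String, Integer> e : histogram(counts).entrySet()) {
            System.out.println("# tables with " + e.getKey() + " rows: " + e.getValue());
        }
    }
}
```

Only buckets that actually contain a table are reported, which matches the "and so on until the largest table is found" idea.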



The format of the output could be:
  • pure text
  • self-referential html
  • xml
  • database create schema
Building a full monitoring schema would be even better, along with a set of screens that let you drill down on a particular table of your choice. Or perhaps a front end in Swing that lets you drill down while a back end performs the column analysis - that way you wouldn't have to wait.

It wouldn't seem to be very difficult, and it would give me a project for Swing experimentation.
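
The column name distribution from the survey list could be computed along these lines. Again this is just a sketch with made-up names, not JdbcSql code. It assumes the table-to-columns map has already been gathered somewhere (JDBC's DatabaseMetaData.getColumns would be one way), and it treats column names as case-insensitive, which is my assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: counts in how many tables each column name appears.
class ColumnNameFrequency {

    static Map<String, Integer> tablesPerColumnName(Map<String, List<String>> columnsByTable) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> columns : columnsByTable.values()) {
            // Count each name at most once per table.
            Set<String> seen = new HashSet<>();
            for (String column : columns) {
                String key = column.toLowerCase(Locale.ROOT);
                if (seen.add(key)) {
                    counts.merge(key, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Column names that appear in exactly the given number of tables.
    static List<String> namesAppearingIn(Map<String, Integer> counts, int tableCount) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == tableCount) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample schema.
        Map<String, List<String>> schema = new HashMap<>();
        schema.put("orders", Arrays.asList("id", "customer_id", "total"));
        schema.put("customers", Arrays.asList("ID", "name"));
        Map<String, Integer> counts = tablesPerColumnName(schema);
        System.out.println("# column names that appear in 1 table only: "
                + namesAppearingIn(counts, 1).size());
        System.out.println("# column names that appear in 2 tables: "
                + namesAppearingIn(counts, 2).size());
    }
}
```

Names that show up in many tables are usually the keys, so this list is a cheap first guess at how the tables join.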

Sunday, February 27, 2005

Loading Data from Public Internet into Local Data Files

In support of the Free World Address Database, I had started a project to simplify loading publicly available data into a local database.

Some assumptions:
  1. The code would be written in Java and made available as open source, probably under the LGPL
  2. The data would not be duplicated, it would be retrieved directly from the Internet
  3. Loading the data would be tested on my local copy of MySQL
The first step of this project is essentially complete. As documented on my open source Java page, a person could download my Java code, compile and run it, and load local tables. Well, almost. I still need to:
  1. Add a main to LoadDataFile
  2. Build a jar file
  3. Give instructions and documentation on running LoadDataFile from the jar file
In any case, multiple files are now available using delimiters as well as positional fields:
  1. country sub entities
  2. census counties
  3. census zip codes
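
To illustrate the difference between the two file styles, here is a minimal sketch of splitting one line either by delimiter or by position. It is not the actual LoadDataFile code: the class name, method names, and the sample line layout are all invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical line parsers, not the real LoadDataFile implementation.
class LineFields {

    // Delimited file: split on a literal delimiter, keeping empty trailing fields.
    static List<String> parseDelimited(String line, String delimiter) {
        List<String> fields = new ArrayList<>();
        for (String field : line.split(Pattern.quote(delimiter), -1)) {
            fields.add(field.trim());
        }
        return fields;
    }

    // Positional file: each field is a 0-based start offset plus a length.
    // Lines shorter than the layout yield empty strings for the missing fields.
    static List<String> parsePositional(String line, int[] starts, int[] lengths) {
        if (starts.length != lengths.length) {
            throw new IllegalArgumentException("starts and lengths must have the same size");
        }
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < starts.length; i++) {
            int from = Math.min(starts[i], line.length());
            int to = Math.min(starts[i] + lengths[i], line.length());
            fields.add(line.substring(from, to).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample lines; the real files have their own layouts.
        System.out.println(parseDelimited("US|GA|Georgia", "|"));
        System.out.println(parsePositional("GA089DeKalb    ",
                new int[] {0, 2, 5}, new int[] {2, 3, 10}));
    }
}
```

Either way the result is a list of trimmed field values, so the insert step does not need to care which kind of file a row came from.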

Next steps:
  1. finish census by including census tracts, places, county subdivisions
  2. implement SAX processor to load country info from the CIA - http://www.cia.gov/cia/publications/factbook/appendix/appendix-d.html
So far this has been a fun little project.

Thursday, February 24, 2005

Support Census files

I have been interested in working with geographical data for a long time, which is why I put together the Free World Address Database page.

I have also been looking to do some open source work in both Java and MySQL, so it seems like making the Census files more available would be an interesting piece of work.

1. Create a utility (in Java) which would take an input file and a set of field locations and insert a comma at those points. Something like:
java FixedToDelim <InputFile> "3,5,14,22" ","
2. Create a set of database tables for all census bureau files:
  • Census Tracts
  • Places
  • Counties
  • County Subdivisions
  • Zip Code Tabulation Areas
3. Document the procedure for creating the tables, downloading the census tables, running FixedToDelim, and then loading the database tables (into MySQL).

4. Perhaps even create a further Java program which would take db access parameters, connect to the database, create the tables if necessary, get the files from the web, convert them, and update the database.
java LoadCensusTables "<dburl>"
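
A first cut of the FixedToDelim utility from step 1 might look something like the sketch below. A few things here are my assumptions rather than settled design: the positions are treated as 0-based character offsets where the delimiter gets inserted, lines shorter than a position still get the delimiter so every row has the same number of fields, and the input is read as ISO-8859-1.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Sketch of the proposed utility: java FixedToDelim <InputFile> "3,5,14,22" ","
class FixedToDelim {

    // Turns "3,5,14,22" into a sorted array of offsets.
    static int[] parsePositions(String csv) {
        String[] parts = csv.split(",");
        int[] positions = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            positions[i] = Integer.parseInt(parts[i].trim());
        }
        Arrays.sort(positions);
        return positions;
    }

    // Inserts the delimiter before the character at each (0-based, ascending) position.
    static String insertDelimiters(String line, int[] positions, String delimiter) {
        StringBuilder out = new StringBuilder();
        int prev = 0;
        for (int pos : positions) {
            // Clamp so short lines still receive one delimiter per position.
            int cut = Math.max(prev, Math.min(pos, line.length()));
            out.append(line, prev, cut).append(delimiter);
            prev = cut;
        }
        out.append(line.substring(prev));
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java FixedToDelim <InputFile> \"3,5,14,22\" \",\"");
            return;
        }
        int[] positions = parsePositions(args[1]);
        // ISO-8859-1 is an assumption; it accepts any byte sequence without failing.
        try (BufferedReader in = Files.newBufferedReader(
                Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(insertDelimiters(line, positions, args[2]));
            }
        }
    }
}
```

For example, insertDelimiters("ABCDEFGH", new int[] {3, 5}, ",") gives "ABC,DE,FGH". The sketch does not trim padding or quote fields that already contain the delimiter, so that would still need handling before loading into MySQL.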

Sunday, January 30, 2005

Starting this blog up

I haven't been much of a blogger to date, either as a reader or a writer. But I have been a ranter. I rant in person, sure, but also in email, on a corporate twiki, and even on personal web pages. So I figure I'd start this blog off by transferring my various rants from those places to here - one rant at a time. So here's what's coming:

    From my home page
  • Osmosis or Diffusion? Thesis or Hypothesis?

  • The Pizza Algorithm?

  • Free World Address Database

  • Resume FAQ

  • Science Fiction Recommendations



    From Elsewhere:
  • Six Degrees

  • Yahoo Egroups

  • Wikis

  • Skills Inventory



    Corporate wiki:
  • Easy Fudge Recipe

  • Other Recipes

  • Quotes Page

  • Amusing Stuff



So if I were to do one blog a day, and devote that to each of the above,
that would keep me busy blogging for almost two weeks - pretty unlikely,
but possible.

mjf

Thursday, January 06, 2005

Is this thing on?

Well I've meant to do this and Kippster inspired me, so we'll see where this goes...