Sunday, November 05, 2017

visualizing data

this post is not about a completed project or a problem figured out; this is a work in progress, a musing over a problem that I'm trying to solve...

so I've got this large data set composed of thousands of text files; each file contains about 1000 records of data.

each record in a file contains the following information:

  1. date
  2. uptime
  3. ps output, a few thousand entries with process info
    1. pid
    2. ppid
    3. owner
    4. cmd line
    5. time
    6. tty
    7. etc
  4. top output
    1. pid
    2. ppid
    3. owner
    4. cmd line
    5. cpu time
    6. mem used
    7. shared mem used
    8. etc
  5. lsof output
    1. pid
    2. owner
    3. cmd line
    4. files
      1. f1
      2. f2
      3. f3
      4. .
      5. .
      6. .
      7. fn
  6. gpfs status
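
To make the layout above concrete, here is a minimal sketch of one record as Python dataclasses. The class and field names (`ProcInfo`, `Record`, `mem_used`, and so on) are assumptions made for illustration, the sample values are invented, and the real files may well carry more fields (the "etc" entries) than are modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcInfo:
    """One process as seen by ps or top; fields a tool does not report keep defaults."""
    pid: int
    owner: str
    cmdline: str
    ppid: int = 0
    tty: str = ""
    cpu_time: str = ""
    mem_used: int = 0


@dataclass
class Record:
    """One snapshot: everything collected at a single moment."""
    date: str      # raw timestamp string; the actual format is not known here
    uptime: str
    ps: List[ProcInfo] = field(default_factory=list)
    top: List[ProcInfo] = field(default_factory=list)
    lsof: Dict[int, List[str]] = field(default_factory=dict)  # pid -> open files
    gpfs_status: str = ""


# Invented sample values, only to show how a record would be filled in.
rec = Record(date="2017-11-05 12:00:00", uptime="3 days")
rec.ps.append(ProcInfo(pid=1234, ppid=1, owner="alice", cmdline="python sim.py"))
rec.lsof[1234] = ["/tmp/f1", "/tmp/f2"]
print(len(rec.ps), rec.lsof[1234])
```

keeping ps, top and lsof as separate fields means nothing is merged or thrown away at parse time; joining them by pid can happen later.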

The questions that I want to answer are:

  1. what processes are running at any given moment in time
  2. how many processes are running at any given moment in time
  3. resource utilization of any process at any given moment
  4. resource utilization of any process over a time period
  5. which files are open at any time
  6. the opening and closing of files over time by any given process or all processes
  7. any other question, not limited to 1..6 above

so should I store it all in a database or in memory? memory should be ok, there's plenty of RAM, at least for now.
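
As a rough sketch of how questions 5 and 6 might be answered with everything held in memory, the function below diffs the set of open files between consecutive snapshots. The name `file_events` and the input shape (a dict of unix time -> pid -> set of paths) are assumptions for illustration, not something derived from the real files.

```python
def file_events(snapshots):
    """Return (time, pid, 'open' | 'close', path) tuples in time order.

    snapshots: dict mapping unix time -> {pid: set of open file paths}
    """
    events = []
    prev = {}
    for t in sorted(snapshots):
        cur = snapshots[t]
        # Look at every pid seen in either snapshot, so exits count as closes.
        for pid in sorted(set(prev) | set(cur)):
            before = prev.get(pid, set())
            after = cur.get(pid, set())
            for path in sorted(after - before):
                events.append((t, pid, "open", path))
            for path in sorted(before - after):
                events.append((t, pid, "close", path))
        prev = cur
    return events


# Invented sample data: one process, three snapshots a minute apart.
snapshots = {
    100: {1234: {"/tmp/f1"}},
    160: {1234: {"/tmp/f1", "/tmp/f2"}},
    220: {1234: {"/tmp/f2"}},
}
for event in file_events(snapshots):
    print(event)
```

two caveats with this approach: every file in the first snapshot shows up as an "open" event, and a file that is opened and closed between two snapshots is never seen at all.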

the process info should be organized by (time, pid), and also by (pid, time).
timestamps should be converted to unix time.
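
One way to read the two orderings is as two in-memory indexes over the same data. The sketch below assumes that reading, and also assumes the timestamps look like `2017-11-05 12:00:00` and are in UTC; both the format and the timezone are guesses, and the helper names (`to_unix`, `add`, `by_time`, `by_pid`) are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_unix(stamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Convert a timestamp string to unix time, treating it as UTC (an assumption)."""
    parsed = datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


by_time = defaultdict(dict)  # unix time -> pid -> process info
by_pid = defaultdict(dict)   # pid -> unix time -> process info


def add(stamp, pid, info):
    """Store one process entry under both indexes and return its unix time."""
    t = to_unix(stamp)
    by_time[t][pid] = info
    by_pid[pid][t] = info
    return t


# Invented sample entries.
t0 = add("2017-11-05 12:00:00", 1234, {"owner": "alice", "cmd": "sim"})
add("2017-11-05 12:00:00", 1, {"owner": "root", "cmd": "init"})
add("2017-11-05 12:01:00", 1234, {"owner": "alice", "cmd": "sim"})

print(sorted(by_time[t0]))    # which pids were running at t0
print(len(by_time[t0]))       # how many processes were running at t0
print(sorted(by_pid[1234]))   # every time at which pid 1234 was seen
```

both indexes point at the same info objects, so the second one costs only the extra dict entries, not a second copy of the data.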



