DTIC "Topic Space" Data Visualization, 1882 to 2011
"Topic Space" refers to a 3D space used to differentiate one article from another based on the extent to which it falls into one or another topic.
Legend
The hyperglyph legend for these visualizations.
Geospatiotemporal Visualizations
The images below represent all articles from 1882 through 2011 whose abstracts contain the term "neuro".
Examples
Search Term - 'medicine'
Search Terms - 'laser,optic'
Search Term - 'cyber'
DTIC Author Topic Space Visualization, 1882 to 2011
The method begins with an initial search on seven "core" keywords. In this case I chose 1) physics, 2) chemistry, 3) biology, 4) geology, 5) engineering, 6) military, and 7) social. My choice of these seven was based on a manual review of article field categories. An entirely different set of seven, or any other number of "core" keywords, is well worth exploring. The next step consists of collecting all the keywords in all the articles that contained a "core" keyword. In other words, I first searched on "physics", collected all of the keywords in all the articles with abstracts containing the word "physics", and did a word frequency count of those keywords. I did the same thing with the other six "core" keywords. This results in a set of "core keyword vectors" stored as a table with 7 columns (one per "core" keyword) and several thousand rows (one per collected keyword). This file can be downloaded here. Note that the 7 core word vectors have many words in common. It will be an interesting process to refine the columns, choose different "core" keywords, and so on.
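The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for these visualizations: the `articles` records, their `abstract` and `keywords` fields, and the helper names `build_core_vectors` and `as_table` are all hypothetical, and a simple case-insensitive substring test stands in for whatever database search was actually run.

```python
from collections import Counter

CORE_KEYWORDS = ["physics", "chemistry", "biology", "geology",
                 "engineering", "military", "social"]

# Hypothetical toy corpus; the real records and field names are not known.
articles = [
    {"abstract": "Laser physics of optical cavities",
     "keywords": ["laser", "optics", "cavity"]},
    {"abstract": "Physics and chemistry of thin films",
     "keywords": ["films", "laser", "deposition"]},
    {"abstract": "Organic chemistry of polymers",
     "keywords": ["polymers", "films"]},
]

def build_core_vectors(articles, core_keywords):
    """Map each core keyword to a frequency count of the keywords found in
    every article whose abstract contains that core keyword."""
    vectors = {}
    for core in core_keywords:
        counts = Counter()
        for article in articles:
            # Assumption: case-insensitive substring match on the abstract.
            if core in article["abstract"].lower():
                counts.update(k.lower() for k in article["keywords"])
        vectors[core] = counts
    return vectors

def as_table(vectors, core_keywords):
    """Lay the vectors out as rows (one per keyword) by columns (one per core keyword)."""
    vocab = sorted(set().union(*(vectors[c].keys() for c in core_keywords)))
    rows = [[vectors[c][word] for c in core_keywords] for word in vocab]
    return vocab, rows

core_vectors = build_core_vectors(articles, CORE_KEYWORDS)
vocab, table = as_table(core_vectors, CORE_KEYWORDS)
```

In this toy corpus the keyword "laser" is counted under "physics" only, while "films" is counted under both "physics" and "chemistry", which mirrors the overlap between core word vectors noted above.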
I have created a form (send a request to access it) that lets the user enter from 1 to 3 search terms. The form creates a database query returning all articles that contain every one of the terms (an AND search rather than an OR search, to keep the results small). I then "dot" (take the dot product of) the abstract of each returned article with the seven "core word vectors" to measure the extent to which the article falls into the 7 "core" keyword categories. I then render the resulting objects in 3D space, where the xyz axes represent one of two possible "topic spaces". For the first topic space, the x axis represents the "physics-like" properties of the article, the y axis measures the "chemistry-like" nature of the article, and the z axis measures the "biology-like" nature of the article. For the second topic space, the x axis represents the "engineering-like" properties of the article, the y axis measures the "military-like" nature of the article, and the z axis measures the "social-like" nature of the article. I could have used any 3 of the 7 "core" keywords for the xyz axes. I plan to explore these combinations and add that choice as an option on the form.
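As a rough sketch of the search and scoring steps, the Python below filters a toy corpus with an AND search and then dots each abstract with three core word vectors to get xyz coordinates. Everything here is an assumption made for illustration: the function names (`search_all_terms`, `score`, `topic_coordinates`), the two topic space labels, the tiny hand-made `core_vectors`, the tokenization into lowercase words, and the use of raw, unnormalized counts. The actual form and scoring code may differ.

```python
import re
from collections import Counter

# Hypothetical labels for the two topic spaces described in the text.
TOPIC_SPACES = {
    "first": ("physics", "chemistry", "biology"),
    "second": ("engineering", "military", "social"),
}

def search_all_terms(articles, terms):
    """AND search: keep articles whose abstract contains every search term."""
    return [a for a in articles
            if all(t.lower() in a["abstract"].lower() for t in terms)]

def score(abstract, core_vector):
    """Dot product of the abstract's word counts with one core word vector."""
    words = Counter(re.findall(r"[a-z]+", abstract.lower()))
    return sum(n * core_vector.get(word, 0) for word, n in words.items())

def topic_coordinates(abstract, core_vectors, axes):
    """Return (x, y, z): one score per core keyword chosen as an axis."""
    return tuple(score(abstract, core_vectors[axis]) for axis in axes)

# Toy core word vectors (made-up frequencies, not the downloadable file).
core_vectors = {
    "physics": {"laser": 5, "quantum": 3},
    "chemistry": {"polymer": 4, "laser": 1},
    "biology": {"cell": 6},
}

articles = [
    {"abstract": "Laser ablation of polymer films with a pulsed laser"},
    {"abstract": "Cell growth under optic stimulation"},
]

hits = search_all_terms(articles, ["laser", "polymer"])
points = [topic_coordinates(a["abstract"], core_vectors, TOPIC_SPACES["first"])
          for a in hits]
print(points)  # [(10, 6, 0)]
```

Choosing a different triple of core keywords for `axes` is all it takes to project the same articles into another topic space, which is why exposing the axis choice on the form is a natural extension.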
Word and Character Count Visualizations