Commit 4abba67

fix: move project over to veracity repo
1 parent 02b6956 commit 4abba67

12 files changed, +1446 -1 lines changed

.gitignore

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+# However, in case of collaboration, if having platform-specific dependencies or dependencies
+# having no cross-platform support, pipenv may install dependencies that don't work, or not
+# install all needed dependencies.
+#Pipfile.lock
+
+# PEP 582; used by e.g. github.com/David-OConnor/pyflow
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+/.vscode

README.md

Lines changed: 57 additions & 1 deletion
@@ -1 +1,57 @@
-# KnowledgeGraphGenerator-
+# Knowledge Graph Generator
+
+Knowledge Graph Generator is a way to create a node graph that visualizes connections between data. The program is written in Python 3.9.2 and mainly uses pandas for data management. It has a simple UI (currently with some bugs) that lets you define data, nodes, and edges and generate the output files. The program itself does not visualize the data; it generates the files needed by a visualization tool like [Gephi](https://gephi.org/).
+
+## Running the program
+
+You have two options for running the program: run the executable from the releases, or run it with Python.
+
+### Running with Python
+
+First, install the required libraries by running `python -m pip install -r requirements.txt`. After that, you can run the program with `python main.py`.
+
+## How it works
+
+The program is divided into 5 sections:
+* [Data](#data)
+* [Data settings](#data-settings)
+* [Node settings](#node-settings)
+* [Edge settings](#edge-settings)
+* [Generate Graph](#generate-graph)
+
+### Data
+
+This section lets you choose which files to import. You can import multiple files, but the current version only uses the first one. In the current version, removing a file from the data section does not remove the nodes and edges connected to that file.
+
+So, as a rule of thumb: only use 1 file (as of the beta version).
+
+### Data settings
+
+This section lets you define the column types. To set the column types, click the file in the data pane on the left. The settings menu is a little buggy, so you might need to click the data file a few times before the column settings behave properly. Only a few options are available right now: *integer*, *float*, *string*, and *boolean*. All columns are read as strings when the program launches, so change the column types before you pull data from them if you want to do more data processing later.
+
+### Node settings
+
+This section lets you define which nodes you want. Select the data file you want in the pane on the left, and the nodes you can add show up at the bottom. The select node pane is a list of your columns that you can set as nodes. When you check off a column as a node, it appears in the node pane on the left.
+
+### Edge settings
+
+This section lets you define edges between the nodes you have added. This menu is also a little buggy, so you might need to click a node a few times before the menu behaves properly. The menu sits below the node pane (left) and the edge pane (right): your selected node is shown at the bottom, with a box for selecting other nodes on the right. Below that, you can choose whether the edge should be directional.
+
+### Generate Graph
+
+In this section you define the output files for the program by setting the paths for the nodeFile and edgeFile. Be warned that setting the node or edge file resets that file to 0 bytes, so be sure not to overwrite any files you want to keep. Once both paths are set, hit the *Generate Graph* button to generate the files. This may take some time depending on the size of the data and the number of nodes and edges, but it usually finishes in under 15 seconds.
+
+## Known Bugs
+
+These are the known bugs:
+* Settings menus do not act correctly until they have been clicked a few times.
+* Edges are only removed visually, but are not actually removed.
+* Nodes are only removed visually, but are not actually removed.
+
+## Plan moving forward
+
+The plan is to continue fixing and developing the frontend, hopefully switch over to [eel](https://github.com/python-eel/Eel) (a Python library for using HTML and JS as the GUI for apps), and add more options for metadata in the nodes and edges.
+
+## How to contribute
+
+If you find something you want to change, please feel free to create a pull request with your changes. If you do not have the time to implement the changes yourself, you can open an issue so that we can add it to the development plan.
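
Note on the README's "Data settings" section: it states that every column is read as a string and must be converted to *integer*, *float*, *string*, or *boolean* before further processing. The commit's conversion code is not among the files shown here, so the following is only a minimal sketch of how such a conversion might look in pandas. The `convert_column` helper and the `DTYPES` mapping are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical mapping from the README's type names to pandas nullable dtypes.
DTYPES = {
    "integer": "Int64",
    "float": "Float64",
    "string": "string",
    "boolean": "boolean",
}


def convert_column(df: pd.DataFrame, column: str, type_name: str) -> pd.DataFrame:
    """Return a copy of df with one string column converted to type_name."""
    out = df.copy()
    series = out[column]
    if type_name in ("integer", "float"):
        # Parse numeric text first; text that cannot be parsed becomes missing.
        series = pd.to_numeric(series, errors="coerce")
    elif type_name == "boolean":
        # Accept a few common spellings (an assumption, not the program's rule).
        lookup = {"true": True, "false": False, "1": True, "0": False}
        series = series.str.strip().str.lower().map(lookup)
    out[column] = series.astype(DTYPES[type_name])
    return out


# Sample data loaded the way the program loads CSV files: everything is a string.
raw = pd.DataFrame(
    {"age": ["31", "27"], "score": ["1.5", "2.0"], "active": ["true", "False"]},
    dtype="string",
)
typed = convert_column(raw, "age", "integer")
typed = convert_column(typed, "score", "float")
typed = convert_column(typed, "active", "boolean")
print(typed.dtypes)
```

Returning a copy rather than mutating in place keeps the originally loaded strings available if a conversion needs to be redone with a different type.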

data/data.py

Lines changed: 73 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,73 @@
+import pandas as pd
+
+class Data:
+    __path: str
+    __name: str
+    __type: str
+    __df: pd.DataFrame
+    __loaded: bool
+    __error: bool
+
+    def __init__(self, path: str) -> None:
+        self.__path = path
+        self.__name = path.split('/')[-1].split(".")[0]
+        self.__type = path.split('/')[-1].split(".")[1]
+        self.__loaded = False
+        self.__error = False
+        pass
+
+    @property
+    def path(self) -> str:
+        return self.__path
+
+    @property
+    def name(self) -> str:
+        return self.__name
+
+    @property
+    def type(self) -> str:
+        return self.__type
+
+    @property
+    def df(self) -> pd.DataFrame:
+        return self.__df
+
+    @property
+    def loaded(self) -> bool:
+        return self.__loaded
+
+    @property
+    def error(self) -> bool:
+        return self.__error
+
+    def __str__(self) -> str:
+        return f"path: {self.__path}, name: {self.__name}, type: {self.__type}"
+
+    def __repr__(self) -> str:
+        return self.__str__()
+
+    def __eq__(self, __o: object) -> bool:
+        if not isinstance(__o, Data): return False
+        if (self.__path != __o.path): return False
+        return True
+
+    def __hash__(self) -> int:
+        return hash(self.__path)
+
+    def loadData(self) -> None:
+        self.__error = False
+        if self.__type == "csv":
+            try:
+                self.__df = pd.read_csv(self.__path, delimiter=';', decimal=',', dtype='string')
+                if (self.__df.shape[1] == 1):
+                    self.__df = pd.read_csv(self.__path, delimiter=',', decimal='.', dtype='string')
+            except Exception as e:
+                print(e)
+                print("could not load")
+                self.__error = True
+        elif self.__type == "xlsx":
+            self.__df = pd.read_excel(self.__path)
+            pass
+        elif self.__type == "json":
+            self.__df = pd.read_json(self.__path)
+        return
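
Note on `Data.__init__`: the name and type are derived by splitting the path on `/` and then on `.`, so a file called `my.report.csv` gets the type `report`, and a file name without a dot raises `IndexError`. One possible alternative, shown only as a sketch, uses the standard library's `pathlib`. `split_name_and_type` is a hypothetical helper, and lower-casing the extension is an added assumption rather than behavior from this commit.

```python
from pathlib import PurePath
from typing import Tuple


def split_name_and_type(path: str) -> Tuple[str, str]:
    """Return (name, type), where type is the lower-cased final extension."""
    p = PurePath(path)
    # stem drops only the last suffix, so "my.report.csv" keeps "my.report".
    # suffix is "" when there is no extension, so nothing is indexed out of range.
    return p.stem, p.suffix.lstrip(".").lower()


print(split_name_and_type("exports/my.report.csv"))  # ('my.report', 'csv')
print(split_name_and_type("exports/Table.XLSX"))     # ('Table', 'xlsx')
print(split_name_and_type("exports/notes"))          # ('notes', '')
```

An empty type string would then fall through every branch of `loadData`, which makes the "unsupported file" case explicit instead of crashing in the constructor.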

data/dataManager.py

Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,88 @@
+from typing import Set, Union
+from data.data import Data
+from data.edge import Edge
+from data.node import Node
+from data.edgeDef import EdgeDef
+from data.nodeDef import NodeDef
+
+import pandas as pd
+
+class DataManager:
+    __nodeDefs: Set[NodeDef]
+    __edgeDefs: Set[EdgeDef]
+    __nodes: Set[Node]
+    __edges: Set[Edge]
+    __data: Set[Data]
+
+    def __init__(self) -> None:
+        self.__nodeDefs = set()
+        self.__edgeDefs = set()
+        self.__nodes = set()
+        self.__edges = set()
+        self.__data = set()
+        return
+
+    @property
+    def data(self) -> Set[Data]:
+        return self.__data
+
+    @property
+    def nodeDefs(self) -> Set[NodeDef]:
+        return self.__nodeDefs
+
+    @property
+    def edgeDefs(self) -> Set[EdgeDef]:
+        return self.__edgeDefs
+
+    def addData(self, data: Data) -> None:
+        self.__data.add(data)
+        return
+
+    def findData(self, path: str, name: str, type: str) -> Union[Data, None]:
+        for d in self.__data:
+            if d.name == name and d.path == path and d.type == type:
+                return d
+
+    def removeData(self, data: Data) -> None:
+        self.__data.remove(data)
+        return
+
+    def addNodeDef(self, d: NodeDef) -> None:
+        self.__nodeDefs.add(d)
+        return
+
+    def removeNodeDef(self, d: NodeDef) -> None:
+        self.__nodeDefs.remove(d)
+        return
+
+    def findNodeDef(self, field: str) -> Union[NodeDef, None]:
+        for n in self.__nodeDefs:
+            print(f"field: {n.field}, inField: {field}")
+            if n.field == field:
+                return n
+        return None
+
+    def addEdgeDef(self, d: EdgeDef) -> None:
+        self.__edgeDefs.add(d)
+        return
+
+    def removeEdgeDef(self, d: EdgeDef) -> None:
+        self.__edgeDefs.remove(d)
+        return
+
+    def generateData(self) -> None:
+        [n.createNodes(list(self.__data)[0]) for n in self.__nodeDefs]
+        [e.createEdges(list(self.__data)[0]) for e in self.__edgeDefs]
+        return
+
+    def generateNodeFile(self, path: str) -> None:
+        [self.__nodes.update(d.nodes) for d in self.__nodeDefs]
+        df = pd.DataFrame.from_records([n.as_dict for n in self.__nodes])
+        df.to_csv(path, index=False, sep=';', decimal='.')
+        return
+
+    def generateEdgeFile(self, path: str) -> None:
+        [self.__edges.update(d.edges) for d in self.__edgeDefs]
+        df = pd.DataFrame.from_records([e.as_dict for e in self.__edges])
+        df.to_csv(path, index=False, sep=';', decimal='.')
+        return
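
Note on `generateNodeFile` and `generateEdgeFile`: both write semicolon-separated CSV files, but `Node.as_dict`, `Edge.as_dict`, `createNodes`, and `createEdges` are not among the files shown here, so the actual column layout is unknown. The rough sketch below makes its own assumptions: one node per distinct value in a selected column, one edge per distinct row pairing, and the `Id`/`Label` and `Source`/`Target`/`Type` column names that Gephi's spreadsheet import recognizes. `build_nodes`, `build_edges`, the `Field` column, and the sample data are hypothetical, and the real classes may work differently.

```python
import io

import pandas as pd


def build_nodes(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """One node per distinct non-missing value in each selected column."""
    records = []
    for col in columns:
        for value in df[col].dropna().unique():
            # Prefix the id with the column name so that equal values
            # in different columns remain distinct nodes.
            records.append({"Id": f"{col}:{value}", "Label": str(value), "Field": col})
    return pd.DataFrame.from_records(records)


def build_edges(df: pd.DataFrame, source_col: str, target_col: str,
                directed: bool = False) -> pd.DataFrame:
    """One edge per distinct (source, target) pairing found in a row."""
    pairs = df[[source_col, target_col]].dropna().drop_duplicates()
    return pd.DataFrame({
        "Source": source_col + ":" + pairs[source_col].astype(str),
        "Target": target_col + ":" + pairs[target_col].astype(str),
        "Type": "Directed" if directed else "Undirected",
    })


# Hypothetical sample data standing in for an imported file.
sample = pd.DataFrame({
    "author": ["Ann", "Bob", "Ann"],
    "topic": ["graphs", "graphs", "graphs"],
})
nodes = build_nodes(sample, ["author", "topic"])
edges = build_edges(sample, "author", "topic")

# Write to an in-memory buffer with the same separator the commit uses.
buffer = io.StringIO()
nodes.to_csv(buffer, index=False, sep=";")
node_csv = buffer.getvalue()
print(node_csv)
print(edges)
```

Building the tables as DataFrames before writing keeps deduplication (`unique`, `drop_duplicates`) in one place, which plays a similar role to the sets of `Node` and `Edge` objects in `DataManager`.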
