-
-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move hardware information into directory structure and add scripting to autopopulate #271
base: master
Are you sure you want to change the base?
Conversation
I am going to hold off on tests and docs until this is reviewed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Darin755 thanks for working on this.
I thought we discussed that something useful would be to build a way to parse the OpenWrt source code (a specific branch), extract definitions and merge those definition which we already stored here.
But I don't see here, I see the code has been moved around, but I don't understand the point of this work.
Thanks for your response. The point is to make it easier to build automated generation scripts in the future. I can write a script to get add devices with a particular chipset instead of trying to handle every device at once. I evaluated my options for data storage and I found that json is the best option. I looked into doing raw python but that is hard to generate. I also have concerns with mixing code and data as there is a lot to go wrong. With JSON there is never a risk of bad code as it is pure data. if a JSON is corrupt it is skipped and with the separate files the impact of a bad file is minimized. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your response. The point is to make it easier to build automated generation scripts in the future. I can write a script to get add devices with a particular chipset instead of trying to handle every device at once.
I evaluated my options for data storage and I found that json is the best option. I looked into doing raw python but that is hard to generate. I also have concerns with mixing code and data as there is a lot to go wrong. With JSON there is never a risk of bad code as it is pure data. if a JSON is corrupt it is skipped and with the separate files the impact of a bad file is minimized.
Thanks for the explanation. I understand this could be a preparation for what comes next.
I need to insist against storing JSON in this repository.
We have automated QA check tools like flake8, isort and black in place to check syntax and formatting of python files, read more about this in Python code conventions.
Moreover, it would be inconsistent with other Django modules we maintain, and consistency in maintaining a project like OpenWISP is important.
Converting JSON to a python data structure is pretty easy, just print the representation of the data structure to a file and let the black
formatting tool reformat it.
Syntax or code errors are a non issue here, we have a lot of automated checks that spot those and fail before any of those are pushed to the main branches.
Storing the data in Python files also avoids the unnecessary computation to parse the JSON files which would happen every time the application starts, either with the application HTTP server, the websocket server or when a celery autoscaling worker is spawned, which in real world applications happens often due to the dynamic autoscaling settings, which increase the amount of workers when the application is under pressure.
Please check the build failures, solving QA check failures is explained in the section I linked above, please read it carefully. In short, running openwisp-qa-format
should fix most issues automatically.
I understand. However, trying to store data in python is simply not a great option. This is due to both to the reliability and security issues. My concern is that python scripts have the ability to run arbitrary code. This means someone could put a backdoor that is very hard to detect. I am sure you are verifying merges but from my perspective it is risky. The hardware.py before was mostly fine as it was a relatively small file written by humans. The supported devices were just a fancy constant in the program. However, I am looking to automate device addition. When dealing with code it is better to hand review what it is doing to make sure no threats are introduced. However, it is hard to hand review potentially hundreds (if not thousands) of lines of just data. I think there would be a significant attack surface with many points to add a backdoor. If it is a matter of updating the testing that is fine but I think it is time to look into a better way to store constant data. In all reality this change didn't take me very long to do as I automated a lot of it so I am fine accepting that it will not be accepted. With that being said if there is a secure way to import data from a python file that I missed let me know. Everything I know about secure code tells me that I need to separate data and code. I am sorry for being a pain but I would rather build secure systems if I can. It is entirely possible that I am wrong and if so feel free to correct me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your response. The point is to make it easier to build automated generation scripts in the future. I can write a script to get add devices with a particular chipset instead of trying to handle every device at once.
I evaluated my options for data storage and I found that json is the best option. I looked into doing raw python but that is hard to generate. I also have concerns with mixing code and data as there is a lot to go wrong. With JSON there is never a risk of bad code as it is pure data. if a JSON is corrupt it is skipped and with the separate files the impact of a bad file is minimized.Thanks for the explanation. I understand this could be a preparation for what comes next.
I need to insist against storing JSON in this repository.
We have automated QA check tools like flake8, isort and black in place to check syntax and formatting of python files, read more about this in Python code conventions.
Moreover, it would be inconsistent with other Django modules we maintain, and consistency in maintaining a project like OpenWISP is important.
Converting JSON to a python data structure is pretty easy, just print the representation of the data structure to a file and let theblack
formatting tool reformat it. Syntax or code errors are a non issue here, we have a lot of automated checks that spot those and fail before any of those are pushed to the main branches.
Storing the data in Python files also avoids the unnecessary computation to parse the JSON files which would happen every time the application starts, either with the application HTTP server, the websocket server or when a celery autoscaling worker is spawned, which in real world applications happens often due to the dynamic autoscaling settings, which increase the amount of workers when the application is under pressure.
Please check the build failures, solving QA check failures is explained in the section I linked above, please read it carefully. In short, runningopenwisp-qa-format
should fix most issues automatically.I understand. However, trying to store data in python is simply not a great option. This is due to both to the reliability and security issues. My concern is that python scripts have the ability to run arbitrary code. This means someone could put a backdoor that is very hard to detect. I am sure you are verifying merges but from my perspective it is risky. The hardware.py before was mostly fine as it was a relatively small file written by humans. The supported devices were just a fancy constant in the program. However, I am looking to automate device addition. When dealing with code it is better to hand review what it is doing to make sure no threats are introduced. However, it is hard to hand review potentially hundreds (if not thousands) of lines of just data. I think there would be a significant attack surface with many points to add a backdoor. If it is a matter of updating the testing that is fine but I think it is time to look into a better way to store constant data. In all reality this change didn't take me very long to do as I automated a lot of it so I am fine accepting that it will not be accepted.
With that being said if there is a secure way to import data from a python file that I missed let me know. Everything I know about secure code tells me that I need to separate data and code. I am sorry for being a pain but I would rather build secure systems if I can. It is entirely possible that I am wrong and if so feel free to correct me.
I appreciate your concerns. We have security scans in place and we never merge code without checking and testing it.
Moreover, if we are worried that could happen, it would be pretty easy to write a script that checks that these files contain only data and nothing else, we can than add this check to the CI build.
I'll look into it and get back to you |
So I can not find a way to do this without using Would it be possible to treat JSON files as python files for formatting and testing? JSON should be able to pass the formatting check with the proper indent set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Darin755 try this in the ipython shell:
data = {} # data here
with open('test_data.py', 'w') as file_:
file_.write(f'data = {str(data)}')
from test_data import data
isinstance(data, dict)
print(data)
I got it working with python files. It should be ready for review |
Could I get a maintainer to take a look at this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience @Darin755, we're currently busy preparing the release so our response may be slow.
I recommend simplifying the directory structure like:
hardware/openwrt/<target>.py
The hardware
dir would replace hardware.py
. The subdir in hardware
indicates the system, now we work mostly with openwrt but it's possible to have other OSes and at some point we'll be headed there.
We can then have just 1 file for target, to avoid having to constantly change this because OpenWrt does change and rename targets often, but we need to have something as stable as possible, we can't be renaming files and moving them around too much.
There's been additions of a few more models on master, so please double check that.
@@ -0,0 +1,2 @@ | |||
def returnData(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The camelCase
style is not a thing in python projects, not important though here, because I think we should get rid of the function and just leave something like data = {....}
.
The data structure should be indented (should be doable using openwisp-qa-format
, make sure to use the latest version available in the openwisp-utils repo).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I will look into it
Checklist
Reference to Existing Issue
Closes #272
Related to #264
Description of Changes
The original ordered dictionary was moved into a file called populatehardware.py. populatehardware.py simply creates the proper directories under targets from the first part of the file name (platform-subplatform) and then it writes all devices matching the first part of the name to devices.json.
After that was completed hardware.py was updated to iterate though the directory structure to pull all the device.json files. It is important to note that hardware.py now imports listdir from os and json. If it fails to load a json for whatever reason it prints a error and moves on.