## The Setup:

I have MRI data from a group of patients with brain damage (due to stroke, head injury, etc.). For each patient, I have a binary MRI image (a 3D array), where a value of 1 or 0 at a voxel (3D pixel) indicates the presence or absence of damage at that location. Each image is ~870k voxels.
## The Goal:

Return a list of patients with damage at a given voxel.

Here's the stupid/inelegant way to do this:
```
voxel_id = 12345  # Flat index of the voxel I'm interested in
patient_matches = []
for patient in patient_list:
    patient_img_file = '/path/to/images/{}.MRI'.format(patient)  # Path to this patient's image.
    img = load_image(patient_img_file)  # Load the image as a NumPy ndarray.
    img_flat = img.ravel()  # Flatten to 1D so the flat voxel index applies.
    vox_val = img_flat[voxel_id]  # Index the flattened array, not the 3D one.
    if vox_val == 1:
        patient_matches.append(patient)
```
## The Original Plan

1. Load and flatten each image, so I have a 1x870k vector for each patient.
2. Read these vectors into a DB (`table='damage'`), with 1 row per patient and 1 column per voxel.
3. Query `select patient_id from damage where vox12345 = 1`.
## Other Possibilities

I quickly realized that it's not reasonable/possible to have almost 1 million columns in a table, so my next thought was to have a separate table for each voxel (which would also reduce the amount of data read into memory with each query).

Another solution would be to read all the images into a single huge 2D table (n_patients x n_voxels) and query it using something like Pandas/xarray/PyTables/HDF.
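As a rough illustration of the table-per-voxel idea (sqlite3 is just an assumed stand-in for whatever DB would actually be used; the patient IDs and table naming scheme here are hypothetical):

```
import sqlite3

conn = sqlite3.connect(':memory:')  # stand-in for the real database
cur = conn.cursor()

voxel_id = 12345
table = 'vox{}'.format(voxel_id)  # one table per voxel

# Each table holds one row per patient with that voxel's 0/1 value,
# so a query touches only one voxel's worth of data.
cur.execute('CREATE TABLE {} (patient_id TEXT, damaged INTEGER)'.format(table))
cur.executemany('INSERT INTO {} VALUES (?, ?)'.format(table),
                [('p01', 1), ('p02', 0), ('p03', 1)])

cur.execute('SELECT patient_id FROM {} WHERE damaged = 1'.format(table))
matches = [row[0] for row in cur.fetchall()]
print(matches)  # ['p01', 'p03']
```

Note the obvious catch: this means creating ~870k tables, and only the damage lookup is cheap; any per-patient query would have to touch every table.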
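The single-2D-table option can be sketched with NumPy/pandas. A minimal toy example, assuming the images have already been flattened into an (n_patients x n_voxels) array (the patient IDs and the tiny 10-voxel array are placeholders for the real ~870k-voxel data):

```
import numpy as np
import pandas as pd

# Toy stand-in: 4 patients x 10 voxels of flattened binary damage maps.
patients = ['p01', 'p02', 'p03', 'p04']
damage = np.array([
    [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 0, 1],
], dtype=np.uint8)

# One row per patient, one column per voxel.
df = pd.DataFrame(damage, index=patients)

# The query: which patients have damage at this voxel?
voxel_id = 1
matches = df.index[df[voxel_id] == 1].tolist()
print(matches)  # -> ['p01', 'p03', 'p04']
```

For the real data this DataFrame would be ~870k columns wide, so it would likely live on disk (HDF5 via PyTables, or a NumPy memmap) and only the one voxel column would be read per query.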