From e02942e42e0ddfc57f0347f418191ee7805ed98f Mon Sep 17 00:00:00 2001
From: Michael Clerx
Date: Wed, 19 Nov 2025 15:19:57 +0000
Subject: [PATCH 1/9] Working through run-herg-qc

---
 README.md                                 |  83 +-
 pcpostprocess/leak_correct.py             |   8 +-
 pcpostprocess/scripts/run_herg_qc.py      | 757 +++++++++---------
 .../scripts/summarise_herg_export.py      |   3 -
 4 files changed, 428 insertions(+), 423 deletions(-)

diff --git a/README.md b/README.md
index 1106afd..62f4d7d 100644
--- a/README.md
+++ b/README.md
@@ -46,42 +46,71 @@ python3 -m unittest
 
 ## Usage
 
-### Running QC and post-processing
+### Classic case
 
-Quality control (QC) may be run using the criteria outlined in [Rapid Characterization of hERG Channel Kinetics I](https://doi.org/10.1016/j.bpj.2019.07.029) and [Evaluating the predictive accuracy of ion channel models using data from multiple experimental designs](https://doi.org/10.1101/2024.08.16.608289). These criteria assume the use of the `staircase` protocol for quality control, which should be the first and last protocol performed. We also assume the presence of repeats after the addition of an IKr blocker (such as dofetilide).
+- Syncropatch 384 ("DataControl 384" output)
+- A full repeat of all protocols with an IKr blocker, e.g. E-4031
+- A staircase protocol, run 4 times:
+  - Before and after all other protocols, before drug block
+  - Before and after all other protocols, after drug block
+- Additional protocols run in between the staircases
+- Leak subtraction by...
 
-Prior to performing QC and exporting, an `export_config.py` file should be added to the root of the data directory. This file (see `example_config.py`) contains a Python `dict` (`Q2S_DC`) specifying the filenames of the protocols used for QC, and names they should be outputted with, as well as a Python `dict` (`D2S`) listing the other protocols and names to be used for their output. Additionally, the `saveID` field specifies the name of the expeirment which appears in the output file names.
+Quality control (QC) may be run using the criteria outlined in [Rapid Characterization of hERG Channel Kinetics I](https://doi.org/10.1016/j.bpj.2019.07.029) and [Evaluating the predictive accuracy of ion channel models using data from multiple experimental designs](https://doi.org/10.1101/2024.08.16.608289).
+These criteria assume the use of the `staircase` protocol for quality control, which should be the first and last protocol performed.
+We also assume the presence of repeats after the addition of an IKr blocker (such as dofetilide).
+
+Prior to performing QC and exporting, an `export_config.py` file should be added to the root of the data directory.
+This file (see `example_config.py`) contains a Python `dict` (`D2S_QC`) specifying the filenames of the protocols used for QC and the names they should be output with, as well as a Python `dict` (`D2S`) listing the other protocols and the names to be used for their output.
+Additionally, the `saveID` field specifies the name of the experiment, which appears in the output file names.
+ +An example input directory might contain the following subdirectories +- `staircaseramp (2)_2kHz_15.01.07`, staircase 1 before E-4031 +- `StaircaseInStaircaseramp (2)_2kHz_15.01.51`, non-QC protocol +- `staircaseramp (2)_2kHz_15.06.53`, staircase 2 before E-4031 +- `staircaseramp (2)_2kHz_15.11.33`, staircase 1 after E-4031 +- `StaircaseInStaircaseramp (2)_2kHz_15.12.17`, non-QC protocol +- `staircaseramp (2)_2kHz_15.17.19`, staircase 2 after E-4031 + +`run_herg_qc` will use the time parts of the staircase names to work out which is which. +This assumes there are either 2 staircase runs (once before drug, once after), or 4 (as above) + +Example: +``` +# Configuration for run_herg_qc + +# Save name for this set of data +saveID = 'EXPERIMENT_NAME' + +# Name of protocols to use in QC, mapped onto names used in output +D2S_QC = { + 'staircaseramp (2)_2kHz': 'staircaseramp' +} + +# Name of additional protocols to export, but not use in QC (again as a dict) +D2S = { + 'StaircaseInStaircaseramp (2)_2kHz': 'sis', +} +``` + +Next, we call `run_herg_qc`: ```sh -$ pcpostprocess run_herg_qc --help +pcpostprocess run_herg_qc test_data/13112023_MW2_FF -w A01 A02 A03 -o output --output_traces -usage: pcpostprocess run_herg_qc [-h] [-c NO_CPUS] - [--output_dir OUTPUT_DIR] [-w WELLS [WELLS ...]] - [--protocols PROTOCOLS [PROTOCOLS ...]] - [--reversal_spread_threshold REVERSAL_SPREAD_THRESHOLD] [--export_failed] - [--selection_file SELECTION_FILE] [--subtracted_only] - [--figsize FIGSIZE FIGSIZE] - [--debug] [--log_level LOG_LEVEL] [--Erev EREV] - data_directory +``` -positional arguments: - data_directory -options: - -h, --help show this help message and exit - -c NO_CPUS, --no_cpus NO_CPUS Number of workers to spawn in the multiprocessing pool (default: 1) - --output_dir OUTPUT_DIR path where the output will be saved - -w WELLS [WELLS ...], --wells WELLS [WELLS ...] wells to include (default: all) - --protocols PROTOCOLS [PROTOCOLS ...] protocols to include (default: all) - --reversal_spread_threshold REVERSAL_SPREAD_THRESHOLD The maximum spread in reversal potential (across sweeps) allowed for QC - --export_failed Flag specifying whether to produce full output for those wells failing QC (default: false) - --selection_file SELECTION_FILE File listing wells to be included - --figsize FIGSIZE FIGSIZE - --debug - --log_level LOG_LEVEL - --Erev EREV The reversal potential during the experiment + +```sh +$ pcpostprocess run_herg_qc --help ``` + + + + + ### Exporting Summary The `summarise_herg_export` command produces additionally output after `run_herg_qc` has been run. 
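The README section above explains that `run_herg_qc` splits each directory name into a protocol name and a `HH.MM.SS` time part, and then pairs the sorted staircase times into before/after-drug repeats. The sketch below (not from the patch itself) illustrates that pairing using the example directory names above; the simplified regular expression and the directory list are illustrative assumptions, and the authoritative pattern lives in `run_herg_qc.py`.

```python
import re

# Simplified version of the pattern used in run_herg_qc.py:
# a protocol name, an underscore, and an HH.MM.SS time part.
name_pattern = re.compile(
    r'^([a-zA-Z_0-9 \-()]+)_([0-9]{2}\.[0-9]{2}\.[0-9]{2})$')

# Hypothetical directory names, taken from the example above
dirnames = [
    'staircaseramp (2)_2kHz_15.01.07',
    'StaircaseInStaircaseramp (2)_2kHz_15.01.51',
    'staircaseramp (2)_2kHz_15.06.53',
    'staircaseramp (2)_2kHz_15.11.33',
    'StaircaseInStaircaseramp (2)_2kHz_15.12.17',
    'staircaseramp (2)_2kHz_15.17.19',
]

# Collect the recording times found for each protocol
times_per_protocol = {}
for dirname in dirnames:
    match = name_pattern.match(dirname)
    if match is not None:
        protocol, time = match.group(1), match.group(2)
        times_per_protocol.setdefault(protocol, []).append(time)

# For the QC protocol, the sorted times are interpreted as either
# [before, after] (2 runs) or [before 1, before 2, after 1, after 2] (4 runs)
times = sorted(times_per_protocol['staircaseramp (2)_2kHz'])
if len(times) == 2:
    repeats = [times]
elif len(times) == 4:
    repeats = [[times[0], times[2]], [times[1], times[3]]]
else:
    raise ValueError('Expecting 2 or 4 repeats of the QC protocol')

print(repeats)
# [['15.01.07', '15.11.33'], ['15.06.53', '15.17.19']]
```

In `run_herg_qc.py` the second repeat additionally gets `_2` appended to its save name (e.g. `staircaseramp_2`), as shown in the script changes below.
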
diff --git a/pcpostprocess/leak_correct.py b/pcpostprocess/leak_correct.py index c423883..7871cce 100644 --- a/pcpostprocess/leak_correct.py +++ b/pcpostprocess/leak_correct.py @@ -104,26 +104,26 @@ def fit_linear_leak(current, voltage, times, ramp_start_index, ramp_end_index, time_range = (0, times.max() / 5) #  Current vs time - ax1.set_title(r'\textbf{a}', loc='left') + ax1.set_title('a', loc='left') ax1.set_xlabel(r'$t$ (ms)') ax1.set_ylabel(r'$I_\mathrm{obs}$ (pA)') ax1.set_xticklabels([]) ax1.set_xlim(*time_range) # Voltage vs time - ax2.set_title(r'\textbf{b}', loc='left') + ax2.set_title('b', loc='left') ax2.set_xlabel(r'$t$ (ms)') ax2.set_ylabel(r'$V_\mathrm{cmd}$ (mV)') ax2.set_xlim(*time_range) # Current vs voltage - ax3.set_title(r'\textbf{c}', loc='left') + ax3.set_title('c', loc='left') ax3.set_xlabel(r'$V_\mathrm{cmd}$ (mV)') ax3.set_ylabel(r'$I_\mathrm{obs}$ (pA)') ax4.set_xlabel(r'$t$ (ms)') ax4.set_ylabel(r'current (pA)') - ax4.set_title(r'\textbf{d}', loc='left') + ax4.set_title('d', loc='left') start_t = times[ramp_start_index] end_t = times[ramp_end_index] diff --git a/pcpostprocess/scripts/run_herg_qc.py b/pcpostprocess/scripts/run_herg_qc.py index 18c87ae..0d4df65 100644 --- a/pcpostprocess/scripts/run_herg_qc.py +++ b/pcpostprocess/scripts/run_herg_qc.py @@ -33,11 +33,10 @@ def starmap(n, func, iterable): Like ``multiprocessing.Pool.starmap``, but does not use subprocesses when n=1. """ - if n == 1: - return [func(*args) for args in iterable] - else: # pragma: no cover + if n > 1: # pragma: no cover with multiprocessing.Pool(n, maxtasksperchild=1) as pool: return pool.starmap(func, iterable) + return [func(*args) for args in iterable] def run_from_command_line(): # pragma: no cover @@ -113,7 +112,19 @@ def run(data_path, output_path, qc_map, wells=None, reversal_potential=-90, reversal_spread_threshold=10, max_processes=1, figure_size=None, save_id=None): """ - Imports traces and runs QC. + Imports traces and runs QC+. + + Makes the following assumptions: + + This proceeds with the following steps: + + 1. All wells (or those selected with ``wells``) are run through "plain" QC, + for every protocol in ``qc_map``. + + + + + @param data_path The path to read data from @param output_path The path to write output to @@ -134,7 +145,7 @@ def run(data_path, output_path, qc_map, wells=None, @param figure_size An optional tuple specifying the size of figures to create - @param save_id TODO + @param save_id Used in some outputs, e.g. as part of CSV names """ # TODO reversal_spread_threshold should be specified the same way as all @@ -146,9 +157,6 @@ def run(data_path, output_path, qc_map, wells=None, # TODO Remove protocol selection here: this is done via the export file! # Only protocols listed there are accepted - # TODO: Find some way around setting this? - matplotlib.use('Agg') - # Select wells to use all_wells = [row + str(i).zfill(2) for row in string.ascii_uppercase[:16] for i in range(1, 25)] @@ -178,11 +186,14 @@ def run(data_path, output_path, qc_map, wells=None, r'^([a-z|A-Z|_|0-9| |\-|\(|\)]+)_([0-9][0-9]\.[0-9][0-9]\.[0-9][0-9])$' protocols_regex = re.compile(protocols_regex) - # Gather protocol directories to use in a dictionary k:v, where v is a list - # of times so that k_v[i] is a directory name - # TODO: Replace this by looping over qc_map and write_map + # Gather protocol directories to use in a dictionary + # { protocol_name: [time1, time2, time3, ...] 
} + # such that protocol_name_time is a directory + + # TODO: Replace this by looping over qc_map and write_map? res_dict = {} for dirname in os.listdir(data_path): + print(dirname, os.path.basename(dirname)) dirname = os.path.basename(dirname) match = protocols_regex.match(dirname) @@ -192,6 +203,7 @@ def run(data_path, output_path, qc_map, wells=None, protocol_name = match.group(1) if not (protocol_name in qc_map or protocol_name in write_map): + print(f'Skipping {protocol_name}') continue time = match.group(2) @@ -201,16 +213,29 @@ def run(data_path, output_path, qc_map, wells=None, res_dict[protocol_name].append(time) - readnames, savenames, times_list = [], [], [] + def pront(*args): + print('********') + for arg in args: + print(arg) + + pront(res_dict) + # At this point, despite its name, res_dict is not a dictionary of results, + # but a map of QC protocol names onto lists of times (see comment above) + + # + # Prepare arguments to call `run_qc_for_protocol` + # combined_dict = qc_map | write_map # Select QC protocols and times + readnames, savenames, times_list = [], [], [] for protocol in res_dict: if protocol not in qc_map: continue times = sorted(res_dict[protocol]) + pront(times) savename = qc_map[protocol] @@ -231,6 +256,18 @@ def run(data_path, output_path, qc_map, wells=None, times_list.append([times[1], times[3]]) readnames.append(protocol) + else: + raise ValueError('Expecting 2 or 4 repeats of the QC protocol') + + # For two repeats, we now have + # savenames: short user name, one per QC protocol + # times_list: the time part of a dirname, [before, after] + # For four repeats + # savenames: short user name, repeat has _2 added on + # times_list: [before1, after1], [before2, after2] + + pront(readnames, savenames, times_list) + if not readnames: raise ValueError('No compatible protocols specified.') @@ -238,108 +275,128 @@ def run(data_path, output_path, qc_map, wells=None, n = min(max_processes, m) args = zip(readnames, savenames, times_list, [output_path] * m, [data_path] * m, [wells] * m, [write_traces] * m, [save_id] * m) - well_selections, qc_dfs = list(zip(*starmap(n, run_qc_for_protocol, args))) + well_selections, qc_dfs = zip(*starmap(n, run_qc_for_protocol, args)) + + pront(well_selections) + pront(qc_dfs) + + # + # Assuming a single QC protocol. At this point, we have + # + # well_selections: a tuple ``(s1, s2)``, where ``s1`` lists the cells + # passing QC on staircase 1, and ``s2`` lists those + # passing QC on staircase 2. + # qc_dfs: a tuple ``(d1, d2)`` where ``d1`` is a dataframe of staircase 1 + # results. It doesn't need to be a dataframe, really. Just a 2d + # matrix of ``n_cells`` rows, and ``n_crit + 2`` QC criteria. + # in addition to QC criteria, each row starts with the well code + # and ends with the shortened protocol name. The QC crit cells + # contain True of False, reasons for failing are not included. + # qc_df = pd.concat(qc_dfs, ignore_index=True) + pront(qc_df) - # Do QC which requires both repeats - # qc3.bookend check very first and very last staircases are similar - protocol, savename = list(qc_map.items())[0] - times = sorted(res_dict[protocol]) - if len(times) == 4: - qc3_bookend_dict = qc3_bookend( - protocol, savename, times, wells, output_path, data_path, - figure_size, save_id) - else: - qc3_bookend_dict = {well: True for well in qc_df.well.unique()} + # + # At this point, ``qc_df`` is a single dataframe containing the information + # from both qc_dfs. 
The protocol is indicated with the shortened name, + # where for the second run it has _2 appended + # - qc_df['qc3.bookend'] = [qc3_bookend_dict[well] for well in qc_df.well] + # Combine QC protocls into overall_selection + selection = [set(x) for x in well_selections] + selection = selection[0].intersection(*selection[1:]) - #  qc_df will be updated and saved again, but it's useful to save them here for debugging - # Write qc_df to file - qc_df.to_csv(os.path.join(output_path, 'QC-%s.csv' % save_id)) + # Store "plain QC" selections in "selected" files + fname = os.path.join(output_path, f'selected-{save_id}.txt') + with open(fname, 'w') as f: + f.write('\n'.join(selection)) + for partial, protocol in zip(well_selections, list(savenames)): + fname = os.path.join(output_path, f'selected-{save_id}-{protocol}.txt') + with open(fname, 'w') as f: + f.write('\n'.join(partial)) - # Write data to JSON file - qc_df.to_json(os.path.join(output_path, 'QC-%s.json' % save_id), - orient='records') + # + # Now go over _all_ protocols, including the QC protocols (AGAIN!), and + # call extract_protocol() on them + # + if write_traces: - # Overwrite old files - for protocol in list(qc_map.values()): - fname = os.path.join(output_path, 'selected-%s-%s.txt' % (save_id, protocol)) - with open(fname, 'w') as fout: - pass - - overall_selection = [] - for well in qc_df.well.unique(): - failed = False - for well_selection, protocol in zip(well_selections, list(savenames)): - - logging.debug(f"{well_selection} selected from protocol {protocol}") - fname = os.path.join(output_path, 'selected-%s-%s.txt' % - (save_id, protocol)) - if well not in well_selection: - failed = True - else: - with open(fname, 'a') as fout: - fout.write(well) - fout.write('\n') - - # well in every selection - if not failed: - overall_selection.append(well) - - selectedfile = os.path.join(output_path, 'selected-%s.txt' % save_id) - with open(selectedfile, 'w') as fout: - for well in overall_selection: - fout.write(well) - fout.write('\n') + pront(savenames, readnames, times_list) - # Export all protocols - savenames, readnames, times_list = [], [], [] - for protocol in res_dict: + # Export all protocols + savenames, readnames, times_list = [], [], [] + for protocol in res_dict: - # Sort into chronological order - times = sorted(res_dict[protocol]) - savename = combined_dict[protocol] + # Sort into chronological order + times = sorted(res_dict[protocol]) + savename = combined_dict[protocol] - readnames.append(protocol) + readnames.append(protocol) - if len(times) == 2: - savenames.append(savename) - times_list.append(times) + if len(times) == 2: + savenames.append(savename) + times_list.append(times) - elif len(times) == 4: - savenames.append(savename) - times_list.append(times[::2]) + elif len(times) == 4: + savenames.append(savename) + times_list.append(times[::2]) - # Make seperate savename for protocol repeat - savename = combined_dict[protocol] + '_2' - assert savename not in combined_dict.values() - savenames.append(savename) - times_list.append(times[1::2]) - readnames.append(protocol) + # Make seperate savename for protocol repeat + savename = combined_dict[protocol] + '_2' + assert savename not in combined_dict.values() + savenames.append(savename) + times_list.append(times[1::2]) + readnames.append(protocol) - # TODO Else raise error? + # TODO Else raise error? 
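+
+            # Illustration (hypothetical, using the staircase times from the
+            # README example): with four runs of 'staircaseramp (2)_2kHz'
+            # mapped to 'staircaseramp', the sorted times
+            #     ['15.01.07', '15.06.53', '15.11.33', '15.17.19']
+            # would add
+            #     savenames  -> [..., 'staircaseramp', 'staircaseramp_2']
+            #     times_list -> [..., ['15.01.07', '15.11.33'],
+            #                         ['15.06.53', '15.17.19']]
+            # i.e. each repeat pairs a before-drug time with its after-drug
+            # counterpart.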
- wells_to_export = wells if write_failed_traces else overall_selection - logging.info(f"exporting wells {wells}") + pront(savenames, readnames, times_list) - m = len(readnames) - n = min(max_processes, m) - args = zip(readnames, savenames, times_list, [wells_to_export] * m, - [output_path] * m, [data_path] * m, [write_traces] * m, - [figure_size] * m, [reversal_potential] * m, [save_id] * m) - dfs = starmap(n, extract_protocol, args) - - if dfs: - extract_df = pd.concat(dfs, ignore_index=True) - extract_df['selected'] = extract_df['well'].isin(overall_selection) + wells_to_export = wells if write_failed_traces else selection + logging.info(f'exporting wells {wells_to_export}') + m = len(readnames) + n = min(max_processes, m) + args = zip(readnames, savenames, times_list, [wells_to_export] * m, + [output_path] * m, [data_path] * m, [figure_size] * m, + [reversal_potential] * m, [save_id] * m) + dfs = starmap(n, extract_protocol, args) + + pront(len(dfs)) + for df in dfs: + pront(df) + sys.exit(1) + + if dfs: + extract_df = pd.concat(dfs, ignore_index=True) + extract_df['selected'] = extract_df['well'].isin(selection) + else: + logging.error("Didn't export any data") + return + + logging.info(f"extract_df: {extract_df}") + + # + # Do QC3 on first staircase, first sweep VS second staircase, second sweep + # qc3.bookend check very first and very last staircases are similar + # + + protocol, savename = list(qc_map.items())[0] + times = sorted(res_dict[protocol]) + if len(times) == 4: + qc3_bookend_dict = qc3_bookend( + protocol, savename, times, wells, output_path, data_path, + figure_size, save_id) else: - logging.error("Didn't export any data") - return + #TODO: Better indicate that it wasn't run? + qc3_bookend_dict = {well: True for well in qc_df.well.unique()} + qc_df['qc3.bookend'] = [qc3_bookend_dict[well] for well in qc_df.well] + pront(qc_df) - logging.info(f"extract_df: {extract_df}") + # + # + # qc_erev_spread = {} erev_spreads = {} @@ -437,10 +494,10 @@ def run(data_path, output_path, qc_map, wells=None, qc_styled_df.to_latex(os.path.join(output_path, 'qc_table.tex')) # Save in csv format - qc_df.to_csv(os.path.join(output_path, 'QC-%s.csv' % save_id)) + qc_df.to_csv(os.path.join(output_path, f'QC-{save_id}.csv')) # Write data to JSON file - qc_df.to_json(os.path.join(output_path, 'QC-%s.json' % save_id), + qc_df.to_json(os.path.join(output_path, f'QC-{save_id}.json'), orient='records') #  Load only QC vals. TODO use a new variabile name to avoid confusion @@ -528,60 +585,58 @@ def agg_func(x): def extract_protocol(readname, savename, time_strs, selected_wells, savedir, - data_path, write_traces, figure_size, reversal_potential, - save_id): + data_path, figure_size, reversal_potential, save_id): # TODO: Tidy up argument order """ ??? 
""" + print(f'EXTRACT PROTOCOL {savename}') + #logging.info(f"Exporting {readname} as {savename}") - logging.info(f"extracting {savename}") - + savedir = os.path.join(savedir, '2-extract-protocol') traces_dir = os.path.join(savedir, 'traces') os.makedirs(traces_dir, exist_ok=True) - - row_dict = {} - subtraction_plots_dir = os.path.join(savedir, 'subtraction_plots') os.makedirs(subtraction_plots_dir, exist_ok=True) - logging.info(f"Exporting {readname} as {savename}") + row_dict = {} - filepath_before = os.path.join(data_path, f'{readname}_{time_strs[0]}') - filepath_after = os.path.join(data_path, f'{readname}_{time_strs[1]}') - json_file_before = f"{readname}_{time_strs[0]}" - json_file_after = f"{readname}_{time_strs[1]}" - before_trace = Trace(filepath_before, json_file_before) - after_trace = Trace(filepath_after, json_file_after) + before_trace = Trace( + os.path.join(data_path, f'{readname}_{time_strs[0]}'), + f'{readname}_{time_strs[0]}.json') + after_trace = Trace( + os.path.join(data_path, f'{readname}_{time_strs[1]}'), + f'{readname}_{time_strs[1]}') - voltage_protocol = before_trace.get_voltage_protocol() + # Get times and voltages times = before_trace.get_times() voltages = before_trace.get_voltage() + assert len(times) == len(after_trace.get_times()) + assert np.all(np.abs(times - after_trace.get_times()) < 1.8) + assert np.all(np.abs(voltages - after_trace.get_voltage()) < 1.8) #  Find start of leak section + voltage_protocol = before_trace.get_voltage_protocol() desc = voltage_protocol.get_all_sections() ramp_bounds = detect_ramp_bounds(times, desc) - nsweeps_before = before_trace.NofSweeps = 2 - nsweeps_after = after_trace.NofSweeps = 2 - - assert nsweeps_before == nsweeps_after - - # Time points - times_before = before_trace.get_times() - times_after = after_trace.get_times() + nsweeps = before_trace.NofSweeps + assert nsweeps == after_trace.NofSweeps - try: - assert all(np.abs(times_before - times_after) < 1e-8) - except Exception as exc: - logging.warning(f"Exception thrown when handling {savename}: ", str(exc)) - return - - header = "\"current\"" + # Store voltages and times, using 2 different libraries... 
+ voltage_df = pd.DataFrame( + np.vstack((times.flatten(), voltages.flatten())).T, + columns=['time', 'voltage']) + voltage_df.to_csv(os.path.join( + traces_dir, f"{save_id}-{savename}-voltages.csv")) + #np.savetxt(os.path.join(traces_dir, f"{save_id}-{savename}-times.csv"), + # times) qc_before = before_trace.get_onboard_QC_values() qc_after = after_trace.get_onboard_QC_values() - qc_vals_all = before_trace.get_onboard_QC_values() + + before_data = before_trace.get_trace_sweeps() + after_data = after_trace.get_trace_sweeps() for i_well, well in enumerate(selected_wells): # Go through all wells if i_well % 24 == 0: @@ -590,45 +645,16 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, if None in qc_before[well] or None in qc_after[well]: continue - if write_traces: - # Save 'before drug' trace as .csv - for sweep in range(nsweeps_before): - out = before_trace.get_trace_sweeps([sweep])[well][0] - save_fname = os.path.join(traces_dir, f"{save_id}-{savename}-" - f"{well}-before-sweep{sweep}.csv") - - np.savetxt(save_fname, out, delimiter=',', - header=header) - - if write_traces: - # Save 'after drug' trace as .csv - for sweep in range(nsweeps_after): - save_fname = os.path.join(traces_dir, f"{save_id}-{savename}-" - f"{well}-after-sweep{sweep}.csv") - out = after_trace.get_trace_sweeps([sweep])[well][0] - if len(out) > 0: - np.savetxt(save_fname, out, - delimiter=',', comments='', header=header) - - voltage_before = before_trace.get_voltage() - voltage_after = after_trace.get_voltage() - - assert len(voltage_before) == len(voltage_after) - assert len(voltage_before) == len(times_before) - assert len(voltage_after) == len(times_after) - voltage = voltage_before - - voltage_df = pd.DataFrame(np.vstack((times_before.flatten(), - voltage.flatten())).T, - columns=['time', 'voltage']) - - if not os.path.exists(os.path.join(traces_dir, - f"{save_id}-{savename}-voltages.csv")): - voltage_df.to_csv(os.path.join(traces_dir, - f"{save_id}-{savename}-voltages.csv")) - - np.savetxt(os.path.join(traces_dir, f"{save_id}-{savename}-times.csv"), - times_before) + # Save before and after drug traces as .csv + for sweep in range(nsweeps): + save_fname = os.path.join( + traces_dir, f'{save_id}-{savename}-{well}-before-sweep{sweep}.csv') + np.savetxt(save_fname, before_data[well][sweep], delimiter=',', + header='"current"') + save_fname = os.path.join( + traces_dir, f'{save_id}-{savename}-{well}-after-sweep{sweep}.csv') + np.savetxt(save_fname, after_data[well][sweep], delimiter=',', + header='"current"') # plot subtraction fig = plt.figure(figsize=figure_size, layout='constrained') @@ -648,21 +674,21 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, os.makedirs(out1, exist_ok=True) os.makedirs(out2, exist_ok=True) + + hergqc = hERGQC(voltages, before_trace.sampling_rate) for well in selected_wells: - before_current = before_trace.get_trace_sweeps()[well] - after_current = after_trace.get_trace_sweeps()[well] before_leak_currents = [] after_leak_currents = [] - for sweep in range(before_current.shape[0]): + for sweep in range(nsweeps): row_dict = { 'well': well, 'sweep': sweep, 'protocol': savename } - qc_vals = qc_vals_all[well][sweep] + qc_vals = qc_before[well][sweep] if qc_vals is None: continue if len(qc_vals) == 0: @@ -673,7 +699,7 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, row_dict['Rseries'] = qc_vals[2] before_params, before_leak = fit_linear_leak( - before_current[sweep], voltages, times, *ramp_bounds, + 
before_data[well][sweep], voltages, times, *ramp_bounds, save_fname=os.path.join(out1, f'{well}_sweep{sweep}.png')) before_leak_currents.append(before_leak) @@ -682,7 +708,7 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, row_dict['E_leak_before'] = -before_params[0] / before_params[1] after_params, after_leak = fit_linear_leak( - after_current[sweep, :], voltages, times, *ramp_bounds, + after_data[well][sweep], voltages, times, *ramp_bounds, save_fname=os.path.join(out2, f'{well}_sweep{sweep}.png')) after_leak_currents.append(after_leak) @@ -691,10 +717,10 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, row_dict['gleak_after'] = after_params[1] row_dict['E_leak_after'] = -after_params[0] / after_params[1] - subtracted_trace = before_current[sweep, :] - before_leak\ - - (after_current[sweep, :] - after_leak) - after_corrected = after_current[sweep, :] - after_leak - before_corrected = before_current[sweep, :] - before_leak + subtracted_trace = before_data[well][sweep] - before_leak\ + - (after_data[well][sweep] - after_leak) + after_corrected = after_data[well][sweep] - after_leak + before_corrected = before_data[well][sweep] - before_leak E_rev_before = infer_reversal_potential( before_corrected, times, desc, voltages, @@ -728,7 +754,7 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, # Check QC6 for each protocol (not just the staircase) - hergqc = hERGQC(voltage, before_trace.sampling_rate) + times = before_trace.get_times() voltage = before_trace.get_voltage() @@ -758,11 +784,10 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, row_dict['QC4'] = all([x for x, _ in qc4]) - if write_traces: - out_fname = os.path.join(traces_dir, - f"{save_id}-{savename}-{well}-sweep{sweep}-subtracted.csv") + out_fname = os.path.join(traces_dir, + f"{save_id}-{savename}-{well}-sweep{sweep}-subtracted.csv") + np.savetxt(out_fname, subtracted_trace.flatten()) - np.savetxt(out_fname, subtracted_trace.flatten()) rows.append(row_dict) param, leak = fit_linear_leak(current, voltage, times, @@ -794,12 +819,9 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, times = before_trace.get_times() voltages = before_trace.get_voltage() - before_current_all = before_trace.get_trace_sweeps() - after_current_all = after_trace.get_trace_sweeps() - for well in selected_wells: - before_current = before_current_all[well] - after_current = after_current_all[well] + before_current = before_data[well] + after_current = after_data[well] before_leak_currents = before_leak_current_dict[well] after_leak_currents = after_leak_current_dict[well] @@ -821,181 +843,140 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, plt.close(fig) - protocol_dir = os.path.join(traces_dir, 'protocols') - os.makedirs(protocol_dir, exist_ok=True) - - # extract protocol - protocol = before_trace.get_voltage_protocol() - protocol.export_txt(os.path.join(protocol_dir, - f'{save_id}-{savename}.txt')) - - json_protocol = before_trace.get_voltage_protocol_json() - - with open(os.path.join(protocol_dir, f'{save_id}-{savename}.json'), 'w') as fout: - json.dump(json_protocol, fout) - return extract_df def run_qc_for_protocol(readname, savename, time_strs, output_path, data_path, wells, write_traces, save_id): """ - ??? + Runs a QC procedure for a single protocol, on the selected wells. 
+ + Assumes: + - time_strs has length 2, corresponding to a before- and after-drug trace + - each recording has 2 sweeps + + The following procedure is followed: + - Traces are leak corrected with `fit_linear_leak` + - Traces are drug subtracted + - No capacitative spike filtering + + Also: + - Creates a directory ``output_path/1-qc`` + - Writes leak subtracted traces at ``1-qc/save_id-...csv`` + - Makes leak correction plots at ``output_path/1-qc/leak_correction``. + These are the plots made by `fit_linear_leak()`. + + @param readname The protocol name, without the time part + @param savename The shorter name for the protocol + @param time_strs The time part of the protocol names, must have length 2 + @param output_path The root directory to chuck everything in + @param data_path The path to read data from + @param wells The wells to process + @param write_traces True if traces should be written too + @param save_id + + Returns a tuple `(selected_wells, detailed_qc)`` where ``selected_wells`` + is a list containing all selected well names, and ``detailed_qc`` is a + pandas dataframe containing the pass/fail status of all individual QC + criteria, for all wells. """ - # TODO Tidy up argument order - - df_rows = [] + print(f'RUN HERG QC FOR {readname}, {time_strs}, {savename}') + # TODO Tidy up argument order assert len(time_strs) == 2 - filepath_before = os.path.join(data_path, f'{readname}_{time_strs[0]}') - json_file_before = f"{readname}_{time_strs[0]}" - - filepath_after = os.path.join(data_path, f'{readname}_{time_strs[1]}') - json_file_after = f"{readname}_{time_strs[1]}" - - logging.debug(f"loading {json_file_after} and {json_file_before}") + before_trace = Trace( + os.path.join(data_path, f'{readname}_{time_strs[0]}'), + f'{readname}_{time_strs[0]}.json') + after_trace = Trace( + os.path.join(data_path, f'{readname}_{time_strs[1]}'), + f'{readname}_{time_strs[1]}.json') - before_trace = Trace(filepath_before, json_file_before) - after_trace = Trace(filepath_after, json_file_after) - assert before_trace.sampling_rate == after_trace.sampling_rate - - # Convert to s + # Assert that protocols are exactly the same sampling_rate = before_trace.sampling_rate - - savedir = os.path.join(output_path, 'QC') - plot_dir = os.path.join(savedir, 'leak_correction') + assert sampling_rate == after_trace.sampling_rate + voltage = before_trace.get_voltage() + assert np.all(voltage == after_trace.get_voltage()) + hergqc = hERGQC(voltage, sampling_rate) + + # Store stuff to "QC/leak_correction" + save_dir = os.path.join(output_path, '1-qc') + plot_dir = os.path.join(save_dir, 'leak_correction') os.makedirs(plot_dir, exist_ok=True) - before_voltage = before_trace.get_voltage() - after_voltage = after_trace.get_voltage() - - # Assert that protocols are exactly the same - assert np.all(before_voltage == after_voltage) - - voltage = before_voltage - + # Assume two sweeps! 
sweeps = [0, 1] - raw_before_all = before_trace.get_trace_sweeps(sweeps) - raw_after_all = after_trace.get_trace_sweeps(sweeps) - - selected_wells = [] - - hergqc = hERGQC(before_voltage, sampling_rate) + nsweeps = len(sweeps) + raw_before = before_trace.get_trace_sweeps(sweeps) + raw_after = after_trace.get_trace_sweeps(sweeps) qc_before = before_trace.get_onboard_QC_values() qc_after = after_trace.get_onboard_QC_values() + # Get ramp bounds + times = before_trace.get_times() + nsamples = len(times) + v_protocol = VoltageProtocol.from_voltage_trace(voltage, times) + v_sections = v_protocol.get_all_sections() + ramp_bounds = detect_ramp_bounds(times, v_sections) + + df_rows = [] + selected_wells = [] for well in wells: # Check if any cell first! if (None in qc_before[well][0]) or (None in qc_after[well][0]): - no_cell = True continue - else: - no_cell = False - - nsweeps = before_trace.NofSweeps - assert after_trace.NofSweeps == nsweeps - before_currents_corrected = np.empty((nsweeps, before_trace.NofSamples)) - after_currents_corrected = np.empty((nsweeps, after_trace.NofSamples)) - - before_currents = np.empty((nsweeps, before_trace.NofSamples)) - after_currents = np.empty((nsweeps, after_trace.NofSamples)) - - # Get ramp times from protocol description - voltage_protocol = VoltageProtocol.from_voltage_trace( - voltage, before_trace.get_times()) - - #  Find start of leak section - desc = voltage_protocol.get_all_sections() - ramp_locs = np.argwhere(desc[:, 2] != desc[:, 3]).flatten() - tstart = desc[ramp_locs[0], 0] - tend = voltage_protocol.get_ramps()[0][1] - - times = before_trace.get_times() - - ramp_bounds = [np.argmax(times > tstart), np.argmax(times > tend)] - - assert after_trace.NofSamples == before_trace.NofSamples - - for sweep in range(nsweeps): - before_raw = np.array(raw_before_all[well])[sweep, :] - after_raw = np.array(raw_after_all[well])[sweep, :] - - before_params1, before_leak = fit_linear_leak( - before_raw, voltage, times, *ramp_bounds, + corrected_before = np.empty((nsweeps, nsamples)) + corrected_after = np.empty((nsweeps, nsamples)) + for isweep in range(nsweeps): + # Get corrected traces, and make plot + _, before_leak = fit_linear_leak( + raw_before[well][isweep], voltage, times, *ramp_bounds, save_fname=os.path.join( - plot_dir, f'{well}-sweep{sweep}-before.png')) - - after_params1, after_leak = fit_linear_leak( - after_raw, voltage, times, *ramp_bounds, + plot_dir, f'{savename}-{well}-{isweep}-before.png')) + _, after_leak = fit_linear_leak( + raw_after[well][isweep], voltage, times, *ramp_bounds, save_fname=os.path.join( - plot_dir, f'{well}-sweep{sweep}-after.png')) - - before_currents_corrected[sweep, :] = before_raw - before_leak - after_currents_corrected[sweep, :] = after_raw - after_leak + plot_dir, f'{savename}-{well}-{isweep}-after.png')) - before_currents[sweep, :] = before_raw - after_currents[sweep, :] = after_raw - - logging.info(f"{well} {savename}\n----------") - logging.info(f"sampling_rate is {sampling_rate}") - - voltage_steps = [tend - for tstart, tend, vstart, vend in - voltage_protocol.get_all_sections() if vend == vstart] + corrected_before[isweep] = raw_before[well][isweep] - before_leak + corrected_after[isweep] = raw_after[well][isweep] - after_leak # Run QC with raw currents QC = hergqc.run_qc( - voltage_steps, - times, - before_currents_corrected, - after_currents_corrected, - np.array(qc_before[well])[0, :], - np.array(qc_after[well])[0, :], - nsweeps, - ) + [v0 for t0, t1, v0, v1 in v_sections if v1 == v0], + times, 
corrected_before, corrected_after, + qc_before[well][0], qc_after[well][0], nsweeps) df_rows.append([well] + QC.passed_list()) - selected = QC.all_passed() and not no_cell - if selected: + if QC.all_passed(): selected_wells.append(well) # Save subtracted current in csv file - header = "\"current\"" - - for i in range(nsweeps): - - savepath = os.path.join(savedir, - f'{save_id}-{savename}-{well}-sweep{i}.csv') - subtracted_current = before_currents_corrected[i, :] - after_currents_corrected[i, :] - - if write_traces: - os.makedirs(savedir, exist_ok=True) - - np.savetxt(savepath, subtracted_current, delimiter=',', - comments='', header=header) - + if write_traces: + header = '"current"' + for i in range(nsweeps): + savepath = os.path.join( + save_dir, f'{save_id}-{savename}-{well}-sweep{i}.csv') + subtracted = corrected_before[i] - corrected_after[i] + np.savetxt(savepath, subtracted, delimiter=',', header=header) + + # TODO: Depends on QC used column_labels = ['well', 'qc1.rseal', 'qc1.cm', 'qc1.rseries', 'qc2.raw', 'qc2.subtracted', 'qc3.raw', 'qc3.E4031', 'qc3.subtracted', 'qc4.rseal', 'qc4.cm', 'qc4.rseries', 'qc5.staircase', 'qc5.1.staircase', 'qc6.subtracted', 'qc6.1.subtracted', 'qc6.2.subtracted'] - df = pd.DataFrame(np.array(df_rows), columns=column_labels) - - missing_wells_dfs = [] - # Add onboard qc to dataframe - for well in wells: - if well not in df['well'].values: - onboard_qc_df = pd.DataFrame([[well] + [False for col in - list(df)[1:]]], - columns=list(df)) - missing_wells_dfs.append(onboard_qc_df) - df = pd.concat([df] + missing_wells_dfs, ignore_index=True) + # Add skipped wells + ncriteria = len(column_labels) - 1 + for well in set(wells) - set([row[0] for row in df_rows]): + df_rows.append([well] + [False] * ncriteria) + # Create data frame and return + df = pd.DataFrame(np.array(df_rows), columns=column_labels) df['protocol'] = savename return selected_wells, df @@ -1003,6 +984,36 @@ def run_qc_for_protocol(readname, savename, time_strs, output_path, def qc3_bookend(readname, savename, time_strs, wells, output_path, data_path, figure_size, save_id): + """ + Joey's "bookend" test, comparing a staircase before other protocols with a + staircase after. + + This method assumes 4 time strings are given, for 4 files with 2 sweeps + each. It loads data, performs leak correction, drug subtraction, and + capacitance filtering, before comparing the first sweep of the first + staircase with the second sweep of the second staircase. + + Also: + - Creates a directory ``output_path/3-qc3-bookend`` + - Creates plots comparing staircase 1, sweep 1, with staircase 2, sweep 2. + + TODO: This method repeats lots of steps, unneccesarily: + - loading all data + - leak correction + - drug subtraction + + @param readname The protocol name, without the time part + @param savename The shorter name for the protocol + @param time_strs The time part of the protocol names, must have length 4 + @param wells The wells to process + @param output_path The root directory to chuck everything in + @param data_path The path to read data from + @param figure_size + + Returns a dictionary mapping well names to booleans. 
+ """ + print('RUN QC3 bookend') + assert len(time_strs) == 4 filepath_first_before = os.path.join(data_path, f'{readname}_{time_strs[0]}') filepath_last_before = os.path.join(data_path, f'{readname}_{time_strs[1]}') @@ -1010,17 +1021,16 @@ def qc3_bookend(readname, savename, time_strs, wells, output_path, json_file_last_before = f"{readname}_{time_strs[1]}" #  Each Trace object contains two sweeps - first_before_trace = Trace(filepath_first_before, - json_file_first_before) - last_before_trace = Trace(filepath_last_before, - json_file_last_before) + first_before_trace = Trace(filepath_first_before, json_file_first_before) + last_before_trace = Trace(filepath_last_before, json_file_last_before) times = first_before_trace.get_times() voltage = first_before_trace.get_voltage() voltage_protocol = first_before_trace.get_voltage_protocol() - ramp_bounds = detect_ramp_bounds(times, - voltage_protocol.get_all_sections()) + ramp_bounds = detect_ramp_bounds( + times, voltage_protocol.get_all_sections()) + filepath_first_after = os.path.join(data_path, f'{readname}_{time_strs[2]}') filepath_last_after = os.path.join(data_path, f'{readname}_{time_strs[3]}') json_file_first_after = f"{readname}_{time_strs[2]}" @@ -1043,81 +1053,50 @@ def qc3_bookend(readname, savename, time_strs, wells, output_path, last_before_current_dict = last_before_trace.get_trace_sweeps() last_after_current_dict = last_after_trace.get_trace_sweeps() - #  Do leak subtraction and store traces for each well - #  TODO Refactor this code into a single loop. There's no need to store each individual trace. - before_traces_first = {} - before_traces_last = {} - after_traces_first = {} - after_traces_last = {} - first_processed = {} - last_processed = {} - - #  Iterate over all wells - for well in np.array(wells).flatten(): - first_before_current = first_before_current_dict[well][0, :] - first_after_current = first_after_current_dict[well][0, :] - last_before_current = last_before_current_dict[well][-1, :] - last_after_current = last_after_current_dict[well][-1, :] - - save_fname = f"{well}_{savename}_before0.pdf" - - before_traces_first[well] = get_leak_corrected(first_before_current, - voltage, times, - *ramp_bounds) - - before_traces_last[well] = get_leak_corrected(last_before_current, - voltage, times, - *ramp_bounds) - - after_traces_first[well] = get_leak_corrected(first_after_current, - voltage, times, - *ramp_bounds) - after_traces_last[well] = get_leak_corrected(last_after_current, - voltage, times, - *ramp_bounds) - - # Store subtracted traces - first_processed[well] = before_traces_first[well] - after_traces_first[well] - last_processed[well] = before_traces_last[well] - after_traces_last[well] - - voltage_protocol = VoltageProtocol.from_voltage_trace(voltage, times) + # Get steps for capacitance filtering + voltage_steps = [tstart + for tstart, tend, vstart, vend in + voltage_protocol.get_all_sections() if vend == vstart] - plot_dir = os.path.join(output_path, output_path, - f'{save_id}-{savename}-qc3-bookend') - os.makedirs(plot_dir, exist_ok=True) + # Create QC object. 
Only uses qc3 which doesn't plot, so no plot_dir needed hergqc = hERGQC(sampling_rate=first_before_trace.sampling_rate, - plot_dir=plot_dir, voltage=voltage) - assert first_before_trace.NofSweeps == last_before_trace.NofSweeps - - voltage_steps = [tstart - for tstart, tend, vstart, vend in - voltage_protocol.get_all_sections() if vend == vstart] - res_dict = {} + # Update output path + output_path = os.path.join(output_path, '3-qc3-bookend') + os.makedirs(output_path, exist_ok=True) + # Create figure - will be reused fig = plt.figure(figsize=figure_size) ax = fig.subplots() - for well in wells: - trace1 = hergqc.filter_capacitive_spikes( - first_processed[well], times, voltage_steps - ).flatten() - - trace2 = hergqc.filter_capacitive_spikes( - last_processed[well], times, voltage_steps - ).flatten() - - passed = hergqc.qc3(trace1, trace2)[0] - - res_dict[well] = passed - - save_fname = os.path.join(output_path, 'qc3_bookend') + #  Iterate over all wells, perform qc3-bookend, plot and store + res_dict = {} + for well in np.array(wells).flatten(): + # First staircase, before drug and after drug, first sweep + before_trace_first = get_leak_corrected( + first_before_current_dict[well][0], voltage, times, *ramp_bounds) + after_trace_first = get_leak_corrected( + first_after_current_dict[well][0], voltage, times, *ramp_bounds) + first = before_trace_first - after_trace_first + + # Second staircase, before drug and after drug, second sweep + before_trace_last = get_leak_corrected( + last_before_current_dict[well][-1], voltage, times, *ramp_bounds) + after_trace_last = get_leak_corrected( + last_after_current_dict[well][-1], voltage, times, *ramp_bounds) + last = before_trace_last - after_trace_last + + trace1 = hergqc.filter_capacitive_spikes(first, times, voltage_steps) + trace2 = hergqc.filter_capacitive_spikes(last, times, voltage_steps) + + res_dict[well] = hergqc.qc3(trace1, trace2)[0] + + save_fname = os.path.join(output_path, f'qc3_bookend-{well}.png') + ax.cla() ax.plot(times, trace1) ax.plot(times, trace2) - fig.savefig(save_fname) - ax.cla() plt.close(fig) return res_dict diff --git a/pcpostprocess/scripts/summarise_herg_export.py b/pcpostprocess/scripts/summarise_herg_export.py index 9cbdc0c..d8969d9 100644 --- a/pcpostprocess/scripts/summarise_herg_export.py +++ b/pcpostprocess/scripts/summarise_herg_export.py @@ -49,9 +49,6 @@ def run(data_path, output_path, experiment_name, reversal_potential=None, @param reversal_potential The calculated reversal potential, or ``None`` @param figsize The matplotlib figure size, or ``None``. 
""" - # TODO: Find some way around setting this - matplotlib.use('Agg') - output_path = setup_output_directory(output_path) leak_parameters_df = pd.read_csv(os.path.join(data_path, 'subtraction_qc.csv')) From b8b5604dd5c8dbc1fbb39d8bc8f5d83d4dbaf66c Mon Sep 17 00:00:00 2001 From: Michael Clerx Date: Wed, 19 Nov 2025 18:49:20 +0000 Subject: [PATCH 2/9] Working through run-herg-qc --- pcpostprocess/scripts/run_herg_qc.py | 93 ++++++++++++++-------------- 1 file changed, 47 insertions(+), 46 deletions(-) diff --git a/pcpostprocess/scripts/run_herg_qc.py b/pcpostprocess/scripts/run_herg_qc.py index 0d4df65..fe343e7 100644 --- a/pcpostprocess/scripts/run_herg_qc.py +++ b/pcpostprocess/scripts/run_herg_qc.py @@ -320,62 +320,63 @@ def pront(*args): # Now go over _all_ protocols, including the QC protocols (AGAIN!), and # call extract_protocol() on them # - if write_traces: + pront(savenames, readnames, times_list) - pront(savenames, readnames, times_list) - - # Export all protocols - savenames, readnames, times_list = [], [], [] - for protocol in res_dict: - - # Sort into chronological order - times = sorted(res_dict[protocol]) - savename = combined_dict[protocol] - - readnames.append(protocol) + # Export all protocols + savenames, readnames, times_list = [], [], [] + for protocol in res_dict: - if len(times) == 2: - savenames.append(savename) - times_list.append(times) + # Sort into chronological order + times = sorted(res_dict[protocol]) + savename = combined_dict[protocol] - elif len(times) == 4: - savenames.append(savename) - times_list.append(times[::2]) + readnames.append(protocol) - # Make seperate savename for protocol repeat - savename = combined_dict[protocol] + '_2' - assert savename not in combined_dict.values() - savenames.append(savename) - times_list.append(times[1::2]) - readnames.append(protocol) + if len(times) == 2: + savenames.append(savename) + times_list.append(times) - # TODO Else raise error? + elif len(times) == 4: + savenames.append(savename) + times_list.append(times[::2]) + # Make seperate savename for protocol repeat + savename = combined_dict[protocol] + '_2' + assert savename not in combined_dict.values() + savenames.append(savename) + times_list.append(times[1::2]) + readnames.append(protocol) - pront(savenames, readnames, times_list) + # TODO Else raise error? 
- wells_to_export = wells if write_failed_traces else selection - logging.info(f'exporting wells {wells_to_export}') - m = len(readnames) - n = min(max_processes, m) - args = zip(readnames, savenames, times_list, [wells_to_export] * m, - [output_path] * m, [data_path] * m, [figure_size] * m, - [reversal_potential] * m, [save_id] * m) - dfs = starmap(n, extract_protocol, args) - pront(len(dfs)) - for df in dfs: - pront(df) - sys.exit(1) + pront(savenames, readnames, times_list) - if dfs: - extract_df = pd.concat(dfs, ignore_index=True) - extract_df['selected'] = extract_df['well'].isin(selection) - else: - logging.error("Didn't export any data") - return + wells_to_export = wells if write_failed_traces else selection + logging.info(f'exporting wells {wells_to_export}') + m = len(readnames) + n = min(max_processes, m) + args = zip(readnames, savenames, times_list, [wells_to_export] * m, + [output_path] * m, [data_path] * m, [figure_size] * m, + [reversal_potential] * m, [save_id] * m) + dfs = starmap(n, extract_protocol, args) + + pd.set_option('display.max_columns', None) + pd.set_option('display.max_rows', None) + + pront(len(dfs)) + for df in dfs: + pront(df) + sys.exit(1) + + if dfs: + extract_df = pd.concat(dfs, ignore_index=True) + extract_df['selected'] = extract_df['well'].isin(selection) + else: + logging.error("Didn't export any data") + return - logging.info(f"extract_df: {extract_df}") + logging.info(f"extract_df: {extract_df}") # # Do QC3 on first staircase, first sweep VS second staircase, second sweep @@ -638,7 +639,7 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, before_data = before_trace.get_trace_sweeps() after_data = after_trace.get_trace_sweeps() - for i_well, well in enumerate(selected_wells): # Go through all wells + for i_well, well in enumerate(selected_wells): if i_well % 24 == 0: logging.info('row ' + well[0]) From 80858fb3a8f39d6398373fd73274ae300936a701 Mon Sep 17 00:00:00 2001 From: Michael Clerx Date: Wed, 19 Nov 2025 19:07:38 +0000 Subject: [PATCH 3/9] Working through run-herg-qc --- pcpostprocess/scripts/run_herg_qc.py | 41 ++++------------------------ 1 file changed, 5 insertions(+), 36 deletions(-) diff --git a/pcpostprocess/scripts/run_herg_qc.py b/pcpostprocess/scripts/run_herg_qc.py index fe343e7..b9e6c50 100644 --- a/pcpostprocess/scripts/run_herg_qc.py +++ b/pcpostprocess/scripts/run_herg_qc.py @@ -273,8 +273,7 @@ def pront(*args): m = len(readnames) n = min(max_processes, m) - args = zip(readnames, savenames, times_list, [output_path] * m, - [data_path] * m, [wells] * m, [write_traces] * m, [save_id] * m) + args = zip(readnames, savenames, times_list, [data_path] * m, [wells] * m) well_selections, qc_dfs = zip(*starmap(n, run_qc_for_protocol, args)) pront(well_selections) @@ -594,7 +593,6 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, print(f'EXTRACT PROTOCOL {savename}') #logging.info(f"Exporting {readname} as {savename}") - savedir = os.path.join(savedir, '2-extract-protocol') traces_dir = os.path.join(savedir, 'traces') os.makedirs(traces_dir, exist_ok=True) subtraction_plots_dir = os.path.join(savedir, 'subtraction_plots') @@ -847,8 +845,7 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, return extract_df -def run_qc_for_protocol(readname, savename, time_strs, output_path, - data_path, wells, write_traces, save_id): +def run_qc_for_protocol(readname, savename, time_strs, data_path, wells): """ Runs a QC procedure for a single protocol, on the selected 
wells. @@ -861,20 +858,12 @@ def run_qc_for_protocol(readname, savename, time_strs, output_path, - Traces are drug subtracted - No capacitative spike filtering - Also: - - Creates a directory ``output_path/1-qc`` - - Writes leak subtracted traces at ``1-qc/save_id-...csv`` - - Makes leak correction plots at ``output_path/1-qc/leak_correction``. - These are the plots made by `fit_linear_leak()`. @param readname The protocol name, without the time part @param savename The shorter name for the protocol @param time_strs The time part of the protocol names, must have length 2 - @param output_path The root directory to chuck everything in @param data_path The path to read data from @param wells The wells to process - @param write_traces True if traces should be written too - @param save_id Returns a tuple `(selected_wells, detailed_qc)`` where ``selected_wells`` is a list containing all selected well names, and ``detailed_qc`` is a @@ -900,11 +889,6 @@ def run_qc_for_protocol(readname, savename, time_strs, output_path, assert np.all(voltage == after_trace.get_voltage()) hergqc = hERGQC(voltage, sampling_rate) - # Store stuff to "QC/leak_correction" - save_dir = os.path.join(output_path, '1-qc') - plot_dir = os.path.join(save_dir, 'leak_correction') - os.makedirs(plot_dir, exist_ok=True) - # Assume two sweeps! sweeps = [0, 1] nsweeps = len(sweeps) @@ -931,16 +915,10 @@ def run_qc_for_protocol(readname, savename, time_strs, output_path, corrected_before = np.empty((nsweeps, nsamples)) corrected_after = np.empty((nsweeps, nsamples)) for isweep in range(nsweeps): - # Get corrected traces, and make plot _, before_leak = fit_linear_leak( - raw_before[well][isweep], voltage, times, *ramp_bounds, - save_fname=os.path.join( - plot_dir, f'{savename}-{well}-{isweep}-before.png')) + raw_before[well][isweep], voltage, times, *ramp_bounds) _, after_leak = fit_linear_leak( - raw_after[well][isweep], voltage, times, *ramp_bounds, - save_fname=os.path.join( - plot_dir, f'{savename}-{well}-{isweep}-after.png')) - + raw_after[well][isweep], voltage, times, *ramp_bounds) corrected_before[isweep] = raw_before[well][isweep] - before_leak corrected_after[isweep] = raw_after[well][isweep] - after_leak @@ -955,15 +933,6 @@ def run_qc_for_protocol(readname, savename, time_strs, output_path, if QC.all_passed(): selected_wells.append(well) - # Save subtracted current in csv file - if write_traces: - header = '"current"' - for i in range(nsweeps): - savepath = os.path.join( - save_dir, f'{save_id}-{savename}-{well}-sweep{i}.csv') - subtracted = corrected_before[i] - corrected_after[i] - np.savetxt(savepath, subtracted, delimiter=',', header=header) - # TODO: Depends on QC used column_labels = ['well', 'qc1.rseal', 'qc1.cm', 'qc1.rseries', 'qc2.raw', 'qc2.subtracted', 'qc3.raw', 'qc3.E4031', 'qc3.subtracted', @@ -1154,7 +1123,7 @@ def fit_func(x, args=None): ] # TESTING ONLY - # np.random.seed(1) + np.random.seed(1) #  Repeat optimisation with different starting guesses x0s = [[np.random.uniform(lower_b, upper_b) for lower_b, upper_b in bounds] for i in range(100)] From ed789c8d8b39281453898c0f948daf946130f729 Mon Sep 17 00:00:00 2001 From: Michael Clerx Date: Wed, 19 Nov 2025 19:39:56 +0000 Subject: [PATCH 4/9] Working through run-herg-qc --- pcpostprocess/scripts/run_herg_qc.py | 50 ++++++++++++++++++---------- 1 file changed, 33 insertions(+), 17 deletions(-) diff --git a/pcpostprocess/scripts/run_herg_qc.py b/pcpostprocess/scripts/run_herg_qc.py index b9e6c50..f90ed04 100644 --- a/pcpostprocess/scripts/run_herg_qc.py +++ 
b/pcpostprocess/scripts/run_herg_qc.py @@ -28,6 +28,23 @@ from pcpostprocess.subtraction_plots import do_subtraction_plot +def write_csv(data, path, *args, header=None): + """ + Writes a numpy array ``data`` to ``path``. + + Optional extra arguments are used for longer names:: + + write_csv(x, 'path', 'to', 'file.csv') + + writes ``x`` to ``path/to/file.csv``. + """ + if header is not None: + if not header.startswith('"'): + header = f'"{header}"' + path = os.path.join(path, *args) + np.savetxt(path, data, delimiter=',', comments='', header=header or '') + + def starmap(n, func, iterable): """ Like ``multiprocessing.Pool.starmap``, but does not use subprocesses when @@ -628,8 +645,8 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, columns=['time', 'voltage']) voltage_df.to_csv(os.path.join( traces_dir, f"{save_id}-{savename}-voltages.csv")) - #np.savetxt(os.path.join(traces_dir, f"{save_id}-{savename}-times.csv"), - # times) + #write_csv(times, traces_dir, f'{save_id}-{savename}-times.csv') + qc_before = before_trace.get_onboard_QC_values() qc_after = after_trace.get_onboard_QC_values() @@ -646,14 +663,14 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, # Save before and after drug traces as .csv for sweep in range(nsweeps): - save_fname = os.path.join( - traces_dir, f'{save_id}-{savename}-{well}-before-sweep{sweep}.csv') - np.savetxt(save_fname, before_data[well][sweep], delimiter=',', - header='"current"') - save_fname = os.path.join( - traces_dir, f'{save_id}-{savename}-{well}-after-sweep{sweep}.csv') - np.savetxt(save_fname, after_data[well][sweep], delimiter=',', - header='"current"') + write_csv( + before_data[well][sweep], traces_dir, + f'{save_id}-{savename}-{well}-before-sweep{sweep}.csv', + header='current') + write_csv( + after_data[well][sweep], traces_dir, + f'{save_id}-{savename}-{well}-after-sweep{sweep}.csv', + header='current') # plot subtraction fig = plt.figure(figsize=figure_size, layout='constrained') @@ -783,14 +800,12 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, row_dict['QC4'] = all([x for x, _ in qc4]) - out_fname = os.path.join(traces_dir, - f"{save_id}-{savename}-{well}-sweep{sweep}-subtracted.csv") - np.savetxt(out_fname, subtracted_trace.flatten()) + write_csv( + subtracted_trace, traces_dir, + f'{save_id}-{savename}-{well}-sweep{sweep}-subtracted.csv') - rows.append(row_dict) - - param, leak = fit_linear_leak(current, voltage, times, - *ramp_bounds) + param, leak = fit_linear_leak( + current, voltage, times, *ramp_bounds) subtracted_trace = current - leak @@ -808,6 +823,7 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, row_dict['-120mV decay time constant 2'] = res[0][1] row_dict['-120mV decay time constant 3'] = res[1] row_dict['-120mV peak current'] = res[2] + rows.append(row_dict) before_leak_current_dict[well] = np.vstack(before_leak_currents) after_leak_current_dict[well] = np.vstack(after_leak_currents) From 2e9b85ff1553b43520fdc90be3e66638829d5377 Mon Sep 17 00:00:00 2001 From: Michael Clerx Date: Thu, 20 Nov 2025 16:44:01 +0000 Subject: [PATCH 5/9] Working through run-herg-qc --- pcpostprocess/scripts/run_herg_qc.py | 471 ++++++++++++--------------- 1 file changed, 208 insertions(+), 263 deletions(-) diff --git a/pcpostprocess/scripts/run_herg_qc.py b/pcpostprocess/scripts/run_herg_qc.py index f90ed04..3f5f302 100644 --- a/pcpostprocess/scripts/run_herg_qc.py +++ b/pcpostprocess/scripts/run_herg_qc.py @@ -210,7 +210,6 @@ def 
run(data_path, output_path, qc_map, wells=None, # TODO: Replace this by looping over qc_map and write_map? res_dict = {} for dirname in os.listdir(data_path): - print(dirname, os.path.basename(dirname)) dirname = os.path.basename(dirname) match = protocols_regex.match(dirname) @@ -235,7 +234,6 @@ def pront(*args): for arg in args: print(arg) - pront(res_dict) # At this point, despite its name, res_dict is not a dictionary of results, # but a map of QC protocol names onto lists of times (see comment above) @@ -252,7 +250,6 @@ def pront(*args): continue times = sorted(res_dict[protocol]) - pront(times) savename = qc_map[protocol] @@ -283,8 +280,6 @@ def pront(*args): # savenames: short user name, repeat has _2 added on # times_list: [before1, after1], [before2, after2] - pront(readnames, savenames, times_list) - if not readnames: raise ValueError('No compatible protocols specified.') @@ -293,9 +288,6 @@ def pront(*args): args = zip(readnames, savenames, times_list, [data_path] * m, [wells] * m) well_selections, qc_dfs = zip(*starmap(n, run_qc_for_protocol, args)) - pront(well_selections) - pront(qc_dfs) - # # Assuming a single QC protocol. At this point, we have # @@ -311,7 +303,6 @@ def pront(*args): # qc_df = pd.concat(qc_dfs, ignore_index=True) - pront(qc_df) # # At this point, ``qc_df`` is a single dataframe containing the information @@ -336,7 +327,6 @@ def pront(*args): # Now go over _all_ protocols, including the QC protocols (AGAIN!), and # call extract_protocol() on them # - pront(savenames, readnames, times_list) # Export all protocols savenames, readnames, times_list = [], [], [] @@ -365,9 +355,6 @@ def pront(*args): # TODO Else raise error? - - pront(savenames, readnames, times_list) - wells_to_export = wells if write_failed_traces else selection logging.info(f'exporting wells {wells_to_export}') m = len(readnames) @@ -376,24 +363,17 @@ def pront(*args): [output_path] * m, [data_path] * m, [figure_size] * m, [reversal_potential] * m, [save_id] * m) dfs = starmap(n, extract_protocol, args) + if not dfs: + raise Exception('No data exported') pd.set_option('display.max_columns', None) pd.set_option('display.max_rows', None) - pront(len(dfs)) - for df in dfs: - pront(df) + extract_df = pd.concat(dfs, ignore_index=True) + extract_df['selected'] = extract_df['well'].isin(selection) + pront(extract_df) sys.exit(1) - if dfs: - extract_df = pd.concat(dfs, ignore_index=True) - extract_df['selected'] = extract_df['well'].isin(selection) - else: - logging.error("Didn't export any data") - return - - logging.info(f"extract_df: {extract_df}") - # # Do QC3 on first staircase, first sweep VS second staircase, second sweep # qc3.bookend check very first and very last staircases are similar @@ -605,18 +585,62 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, data_path, figure_size, reversal_potential, save_id): # TODO: Tidy up argument order """ - ??? - """ - print(f'EXTRACT PROTOCOL {savename}') - #logging.info(f"Exporting {readname} as {savename}") - - traces_dir = os.path.join(savedir, 'traces') - os.makedirs(traces_dir, exist_ok=True) - subtraction_plots_dir = os.path.join(savedir, 'subtraction_plots') - os.makedirs(subtraction_plots_dir, exist_ok=True) + Extracts a (non-QC) protocol, performs more QC(!), and + + Loads the data (again!) 
and + - leak-subtracts it using the leak step + - stores before, after, and subtracted traces in ``traces`` (for all + sweeps), along with time and voltage traces + + + - creates and returns a data frame, with entries detailed below + + The dataframe has entries + - ``well`` + - ``sweep`` + - ``protocol`` (savename) + - ``Rseal`` The "before" value + - ``Cm `` The "before" value + - ``Rseries`` The "before" value + - ``gleak_before`` Leak current parameters + - ``E_leak_before`` + - ``gleak_after `` + - ``E_leak_after`` + - ``R_leftover `` + - ``QC.R_leftover `` + - ``E_rev `` + - ``E_rev_before `` + - ``E_rev_after `` + - ``QC.Erev `` + - ``QC6 `` + - ``QC1 `` + - ``QC4 `` + - ``total before-drug flux`` + - ``-120mV decay time constant 1`` + - ``-120mV decay time constant 2`` + - ``-120mV decay time constant 3`` + - ``-120mV peak current`` - row_dict = {} + Also: + - Creates leak subtraction plots, in ``savedir/subtraction_plots`` + - Creates reversal ramp plots, in ``savedir/reversal_plots/`` + - Creates leak correction plots, before and after drug, in + ``savedir/leak_correction/saveid-savename-leak_fit-before/after``. + """ + logging.info(f"Exporting {readname} as {savename}") + + trace_dir = os.path.join(savedir, 'traces') + subtraction_dir = os.path.join(savedir, 'subtraction_plots') + reversal_dir = os.path.join(savedir, 'reversal_plots') + leak_dir1 = os.path.join(savedir, 'leak_correction', + f'{save_id}-{savename}-leak_fit-before') + leak_dir2 = os.path.join(savedir, 'leak_correction', + f'{save_id}-{savename}-leak_fit-after') + for d in (subtraction_dir, reversal_dir, trace_dir, leak_dir1, leak_dir2): + os.makedirs(d, exist_ok=True) + + # Load data before_trace = Trace( os.path.join(data_path, f'{readname}_{time_strs[0]}'), f'{readname}_{time_strs[0]}.json') @@ -624,6 +648,10 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, os.path.join(data_path, f'{readname}_{time_strs[1]}'), f'{readname}_{time_strs[1]}') + # Count sweeps + nsweeps = before_trace.NofSweeps + assert nsweeps == after_trace.NofSweeps + # Get times and voltages times = before_trace.get_times() voltages = before_trace.get_voltage() @@ -635,18 +663,16 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, voltage_protocol = before_trace.get_voltage_protocol() desc = voltage_protocol.get_all_sections() ramp_bounds = detect_ramp_bounds(times, desc) - - nsweeps = before_trace.NofSweeps - assert nsweeps == after_trace.NofSweeps + voltage_steps = [tstart for tstart, tend, vstart, vend in + voltage_protocol.get_all_sections() if vend == vstart] # Store voltages and times, using 2 different libraries... 
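
For reference, the new `write_csv` helper introduced at the top of this file joins its path arguments, wraps the header in quotes, and delegates to `numpy.savetxt`. A minimal usage sketch (the import path follows this repository's layout; the directory and file names are placeholders and the target directory is assumed to exist already):

```python
import numpy as np

# Assumes the module is importable as laid out in this repository
from pcpostprocess.scripts.run_herg_qc import write_csv

current = np.array([0.0, 1.5, 3.2])  # made-up current samples

# Writes 'output/traces/EXPT-staircase-A01-before-sweep0.csv' with a quoted
# '"current"' header line, comma-delimited, and no '#' comment prefix
write_csv(current, 'output/traces',
          'EXPT-staircase-A01-before-sweep0.csv', header='current')
```
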
- voltage_df = pd.DataFrame( - np.vstack((times.flatten(), voltages.flatten())).T, - columns=['time', 'voltage']) + # TODO: REPLACE WITH SINGLE TIMES, SINGLE VOLTAGE + voltage_df = pd.DataFrame(np.vstack((times, voltages)).T, + columns=['time', 'voltage']) voltage_df.to_csv(os.path.join( - traces_dir, f"{save_id}-{savename}-voltages.csv")) - #write_csv(times, traces_dir, f'{save_id}-{savename}-times.csv') - + trace_dir, f'{save_id}-{savename}-voltages.csv')) + #write_csv(times, trace_dir, f'{save_id}-{savename}-times.csv') qc_before = before_trace.get_onboard_QC_values() qc_after = after_trace.get_onboard_QC_values() @@ -654,170 +680,130 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, before_data = before_trace.get_trace_sweeps() after_data = after_trace.get_trace_sweeps() - for i_well, well in enumerate(selected_wells): - if i_well % 24 == 0: - logging.info('row ' + well[0]) - - if None in qc_before[well] or None in qc_after[well]: - continue - - # Save before and after drug traces as .csv + # Save before and after drug traces as .csv + for well in selected_wells: for sweep in range(nsweeps): write_csv( - before_data[well][sweep], traces_dir, + before_data[well][sweep], trace_dir, f'{save_id}-{savename}-{well}-before-sweep{sweep}.csv', header='current') write_csv( - after_data[well][sweep], traces_dir, + after_data[well][sweep], trace_dir, f'{save_id}-{savename}-{well}-after-sweep{sweep}.csv', header='current') - # plot subtraction + # Reusable figure to plot subtraction fig = plt.figure(figsize=figure_size, layout='constrained') - plot_dir = os.path.join(savedir, 'reversal_plots') - os.makedirs(plot_dir, exist_ok=True) + # Reusable hergqc object + hergqc = hERGQC(voltages, before_trace.sampling_rate) + # Process wells, create rows for data frame rows = [] - - before_leak_current_dict = {} - after_leak_current_dict = {} - - out1 = os.path.join(savedir, 'leak_correction', - f'{save_id}-{savename}-leak_fit-before') - out2 = os.path.join(savedir, 'leak_correction', - f'{save_id}-{savename}-leak_fit-after') - os.makedirs(out1, exist_ok=True) - os.makedirs(out2, exist_ok=True) - - - hergqc = hERGQC(voltages, before_trace.sampling_rate) for well in selected_wells: - - before_leak_currents = [] - after_leak_currents = [] - for sweep in range(nsweeps): - row_dict = { - 'well': well, - 'sweep': sweep, - 'protocol': savename - } + row_dict = {'well': well, 'sweep': sweep, 'protocol': savename} qc_vals = qc_before[well][sweep] - if qc_vals is None: - continue - if len(qc_vals) == 0: - continue - row_dict['Rseal'] = qc_vals[0] row_dict['Cm'] = qc_vals[1] row_dict['Rseries'] = qc_vals[2] + # Fit leak using step (again!) 
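
Before the leak fit below, it may help to spell out the arithmetic the surrounding code applies to its results: `fit_linear_leak` returns linear-fit parameters and a leak current trace, the parameters are converted to a leak conductance and reversal potential, and the corrected trace is the raw sweep minus the leak. A self-contained sketch with synthetic numbers (the least-squares fit here is only a stand-in for the package's `fit_linear_leak`, which also uses the detected ramp bounds and can save a figure):

```python
import numpy as np

# Synthetic leak-ramp data: a purely linear leak plus noise (illustrative only)
rng = np.random.default_rng(0)
voltage = np.linspace(-120.0, -80.0, 200)        # mV
g_true, E_true = 2.0, -10.0                      # made-up leak parameters
current = g_true * (voltage - E_true) + rng.normal(0.0, 5.0, voltage.shape)

# Ordinary least-squares fit of I = b0 + b1 * V over the ramp
A = np.vstack((np.ones_like(voltage), voltage)).T
b0, b1 = np.linalg.lstsq(A, current, rcond=None)[0]

g_leak = b1                  # 'gleak_before' in the data frame
E_leak = -b0 / b1            # 'E_leak_before' in the data frame
leak = b0 + b1 * voltage
corrected = current - leak   # as in 'before_corrected' / 'after_corrected'

# R_leftover then compares after-drug and before-drug corrected traces:
#   np.sqrt(np.sum(after_corrected**2) / np.sum(before_corrected**2))
print(g_leak, E_leak)
```
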
before_params, before_leak = fit_linear_leak( before_data[well][sweep], voltages, times, *ramp_bounds, - save_fname=os.path.join(out1, f'{well}_sweep{sweep}.png')) - before_leak_currents.append(before_leak) - - # Convert linear regression parameters into conductance and reversal - row_dict['gleak_before'] = before_params[1] - row_dict['E_leak_before'] = -before_params[0] / before_params[1] - + save_fname=os.path.join(leak_dir1, f'{well}_sweep{sweep}.png')) after_params, after_leak = fit_linear_leak( after_data[well][sweep], voltages, times, *ramp_bounds, - save_fname=os.path.join(out2, f'{well}_sweep{sweep}.png')) + save_fname=os.path.join(leak_dir2, f'{well}_sweep{sweep}.png')) + before_corrected = before_data[well][sweep] - before_leak + after_corrected = after_data[well][sweep] - after_leak - after_leak_currents.append(after_leak) + # Store drug-subtracted trace + subtracted = before_corrected - after_corrected + write_csv( + subtracted, trace_dir, + f'{save_id}-{savename}-{well}-sweep{sweep}-subtracted.csv') - # Convert linear regression parameters into conductance and reversal + # Create subtraction plot + fig = plt.figure(figsize=figure_size, layout='constrained') + do_subtraction_plot( + fig, times, range(nsweeps), before_data[well], after_data[well], + voltages, ramp_bounds, well=well, protocol=savename) + fig.savefig(os.path.join( + subtraction_dir, + f'{save_id}-{savename}-{well}-sweep{sweep}-subtraction')) + plt.close(fig) + + # Store conductance and reversal + row_dict['gleak_before'] = before_params[1] + row_dict['E_leak_before'] = -before_params[0] / before_params[1] row_dict['gleak_after'] = after_params[1] row_dict['E_leak_after'] = -after_params[0] / after_params[1] - subtracted_trace = before_data[well][sweep] - before_leak\ - - (after_data[well][sweep] - after_leak) - after_corrected = after_data[well][sweep] - after_leak - before_corrected = before_data[well][sweep] - before_leak + # Add "R_leftover" QC + QC_R_leftover = np.sqrt(np.sum(after_corrected**2) / + np.sum(before_corrected**2)) + row_dict['R_leftover'] = QC_R_leftover + row_dict['QC.R_leftover'] = np.all(row_dict['R_leftover'] < 0.5) + # Infer reversal potential and store, creating plots in the process E_rev_before = infer_reversal_potential( before_corrected, times, desc, voltages, output_path=os.path.join( - plot_dir, f'{well}_{savename}_sweep{sweep}_before'), + reversal_dir, f'{well}_{savename}_sweep{sweep}_before.png'), known_Erev=reversal_potential) - E_rev_after = infer_reversal_potential( after_corrected, times, desc, voltages, output_path=os.path.join( - plot_dir, f'{well}_{savename}_sweep{sweep}_after'), + reversal_dir, f'{well}_{savename}_sweep{sweep}_after.png'), known_Erev=reversal_potential) - E_rev = infer_reversal_potential( - subtracted_trace, times, desc, voltages, + subtracted, times, desc, voltages, output_path=os.path.join( - plot_dir, f'{well}_{savename}_sweep{sweep}_subtracted'), + reversal_dir, f'{well}_{savename}_sweep{sweep}_subtracted.png'), known_Erev=reversal_potential) - row_dict['R_leftover'] =\ - np.sqrt(np.sum((after_corrected)**2)/(np.sum(before_corrected**2))) - - QC_R_leftover = np.all(row_dict['R_leftover'] < 0.5) - row_dict['QC.R_leftover'] = QC_R_leftover - + # Store Erevs parameters row_dict['E_rev'] = E_rev row_dict['E_rev_before'] = E_rev_before row_dict['E_rev_after'] = E_rev_after + # Add Erev QC row_dict['QC.Erev'] = E_rev < -50 and E_rev > -120 # Check QC6 for each protocol (not just the staircase) + # For this, use a capacitance-filtered trace + filtered = 
hergqc.filter_capacitive_spikes( + subtracted, times, voltage_steps) + row_dict['QC6'] = hergqc.qc6(filtered, win=hergqc.qc6_win)[0] - - - times = before_trace.get_times() - voltage = before_trace.get_voltage() - voltage_protocol = before_trace.get_voltage_protocol() - - voltage_steps = [tstart - for tstart, tend, vstart, vend in - voltage_protocol.get_all_sections() if vend == vstart] - - current = hergqc.filter_capacitive_spikes(before_corrected - after_corrected, - times, voltage_steps) - - row_dict['QC6'] = hergqc.qc6(current, win=hergqc.qc6_win)[0] - - #  Assume there is only one sweep for all non-QC protocols + # Check QC1 and QC4, but between before and after traces (assume + # there is only one sweep for non-QC protocols) rseal_before, cm_before, rseries_before = qc_before[well][0] rseal_after, cm_after, rseries_after = qc_after[well][0] - qc1_1 = hergqc.qc1(rseal_before, cm_before, rseries_before) qc1_2 = hergqc.qc1(rseal_after, cm_after, rseries_after) - row_dict['QC1'] = all([x for x, _ in qc1_1 + qc1_2]) - qc4 = hergqc.qc4([rseal_before, rseal_after], [cm_before, cm_after], [rseries_before, rseries_after]) - row_dict['QC4'] = all([x for x, _ in qc4]) - write_csv( - subtracted_trace, traces_dir, - f'{save_id}-{savename}-{well}-sweep{sweep}-subtracted.csv') - - param, leak = fit_linear_leak( - current, voltage, times, *ramp_bounds) - - subtracted_trace = current - leak - - t_step = times[1] - times[0] - row_dict['total before-drug flux'] = np.sum(current) * (1.0 / t_step) + # Add a "total flux" value to the data frame, based on the drug + # subtracted and capacitance filtered current + dt = times[1] - times[0] + row_dict['total before-drug flux'] = np.sum(filtered) / dt + # TODO: This measure can be dropped + + # Estimate time constants, based on a trace with yet another round + # of leak-step correction + _, leak = fit_linear_leak(filtered, voltages, times, *ramp_bounds) + fname = os.path.join( + savedir, '-120mV time constant', + f'{savename}-{well}-sweep{sweep}-time-constant-fit.png') res = get_time_constant_of_first_decay( - subtracted_trace, times, desc, - os.path.join( - savedir, 'debug', '-120mV time constant', - f'{savename}-{well}-sweep{sweep}-time-constant-fit.png'), - figure_size - ) + filtered - leak, times, desc, fname, figure_size) row_dict['-120mV decay time constant 1'] = res[0][0] row_dict['-120mV decay time constant 2'] = res[0][1] @@ -825,39 +811,9 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, row_dict['-120mV peak current'] = res[2] rows.append(row_dict) - before_leak_current_dict[well] = np.vstack(before_leak_currents) - after_leak_current_dict[well] = np.vstack(after_leak_currents) - extract_df = pd.DataFrame.from_dict(rows) logging.debug(extract_df) - times = before_trace.get_times() - voltages = before_trace.get_voltage() - - for well in selected_wells: - before_current = before_data[well] - after_current = after_data[well] - - before_leak_currents = before_leak_current_dict[well] - after_leak_currents = after_leak_current_dict[well] - - sub_df = extract_df[extract_df.well == well] - - if len(sub_df.index) == 0: - continue - - sweeps = sorted(list(sub_df.sweep.unique())) - - do_subtraction_plot(fig, times, sweeps, before_current, after_current, - voltages, ramp_bounds, well=well, - protocol=savename) - - fig.savefig(os.path.join(subtraction_plots_dir, - f'{save_id}-{savename}-{well}-sweep{sweep}-subtraction')) - fig.clf() - - plt.close(fig) - return extract_df @@ -886,7 +842,6 @@ def run_qc_for_protocol(readname, savename, 
time_strs, data_path, wells): pandas dataframe containing the pass/fail status of all individual QC criteria, for all wells. """ - print(f'RUN HERG QC FOR {readname}, {time_strs}, {savename}') # TODO Tidy up argument order assert len(time_strs) == 2 @@ -998,7 +953,6 @@ def qc3_bookend(readname, savename, time_strs, wells, output_path, Returns a dictionary mapping well names to booleans. """ - print('RUN QC3 bookend') assert len(time_strs) == 4 filepath_first_before = os.path.join(data_path, f'{readname}_{time_strs[0]}') @@ -1091,6 +1045,8 @@ def qc3_bookend(readname, savename, time_strs, wells, output_path, def get_time_constant_of_first_decay( trace, times, protocol_desc, output_path, figure_size): """ + Fits a XXX exponential to the -120mV step following the 40mV step at the + start of most protocols. ??? """ os.makedirs(os.path.dirname(output_path), exist_ok=True) @@ -1111,45 +1067,40 @@ def get_time_constant_of_first_decay( indices = np.argwhere((times >= peak_time) & (times <= tend - 50)) - def fit_func(x, args=None): - # Pass 'args=single' when we want to use a single exponential. - # Otherwise use 2 exponentials - if args: - single = args == 'single' - else: - single = False - - if not single: - a, b, c, d = x - if d < b: - b, d = d, b - prediction = c * np.exp((-1.0/d) * (times[indices] - peak_time)) + \ - a * np.exp((-1.0/b) * (times[indices] - peak_time)) - else: - a, b = x - prediction = a * np.exp((-1.0/b) * (times[indices] - peak_time)) + def fit1(x): + """ Error function to fit a one exponential. """ + a, b = x + # prediction = a * np.exp((peak_time - times[indices]) / b) + # No major difference, but keeps it compatible with original code + prediction = a * np.exp((-1.0/b) * (times[indices] - peak_time)) + return np.sum((prediction - trace[indices])**2) + def fit2(x): + """ Error function to fit two exponentials. 
""" + a, b, c, d = x + if d < b: + b, d = d, b + # prediction = (c * np.exp((peak_time - times[indices]) / d) + + # a * np.exp((peak_time - times[indices]) / b)) + prediction = (c * np.exp((-1.0/d) * (times[indices] - peak_time)) + + a * np.exp((-1.0/b) * (times[indices] - peak_time))) return np.sum((prediction - trace[indices])**2) - bounds = [ - (-np.abs(trace).max()*2, 0), - (1e-12, 5e3), - (-np.abs(trace).max()*2, 0), - (1e-12, 5e3), - ] + a_bounds = (-np.abs(trace).max()*2, 0) + b_bounds = (1e-12, 5e3) # TESTING ONLY np.random.seed(1) #  Repeat optimisation with different starting guesses - x0s = [[np.random.uniform(lower_b, upper_b) for lower_b, upper_b in bounds] for i in range(100)] - + bounds = (a_bounds, b_bounds, a_bounds, b_bounds) + x0s = [[np.random.uniform(*b) for b in bounds] for i in range(100)] + # Not required, but keeps it compatible with original code x0s = [[a, b, c, d] if d < b else [a, d, c, b] for (a, b, c, d) in x0s] best_res = None for x0 in x0s: - res = scipy.optimize.minimize(fit_func, x0=x0, - bounds=bounds) + res = scipy.optimize.minimize(fit2, x0=x0, bounds=bounds) if best_res is None: best_res = res elif res.fun < best_res.fun and res.success and res.fun != 0: @@ -1157,18 +1108,12 @@ def fit_func(x, args=None): res1 = best_res #  Re-run with single exponential - bounds = [ - (-np.abs(trace).max()*2, 0), - (1e-12, 5e3), - ] - - #  Repeat optimisation with different starting guesses - x0s = [[np.random.uniform(lower_b, upper_b) for lower_b, upper_b in bounds] for i in range(100)] + bounds = (a_bounds, b_bounds) + x0s = [[np.random.uniform(*b) for b in bounds] for i in range(100)] best_res = None for x0 in x0s: - res = scipy.optimize.minimize(fit_func, x0=x0, - bounds=bounds, args=('single',)) + res = scipy.optimize.minimize(fit1, x0=x0, bounds=bounds) if best_res is None: best_res = res elif res.fun < best_res.fun and res.success and res.fun != 0: @@ -1176,75 +1121,75 @@ def fit_func(x, args=None): res2 = best_res if not res2: - logging.warning('finding 120mv decay timeconstant failed:' + str(res)) + logging.warning(f'finding 120mv decay timeconstant failed: {res}') - if output_path and res: - fig = plt.figure(figsize=figure_size, constrained_layout=True) - axs = fig.subplots(2) + # Create a plot + fig = plt.figure(figsize=figure_size, constrained_layout=True) + axs = fig.subplots(2) - for ax in axs: - ax.spines[['top', 'right']].set_visible(False) - ax.set_ylabel(r'$I_\mathrm{obs}$ (pA)') + for ax in axs: + ax.spines[['top', 'right']].set_visible(False) + ax.set_ylabel(r'$I_\mathrm{obs}$ (pA)') - axs[-1].set_xlabel(r'$t$ (ms)') + axs[-1].set_xlabel(r'$t$ (ms)') - protocol_ax, fit_ax = axs - protocol_ax.set_title('a', fontweight='bold', loc='left') - fit_ax.set_title('b', fontweight='bold', loc='left') - fit_ax.plot(peak_time, peak_current, marker='x', color='red') + protocol_ax, fit_ax = axs + protocol_ax.set_title('a', fontweight='bold', loc='left') + fit_ax.set_title('b', fontweight='bold', loc='left') + fit_ax.plot(peak_time, peak_current, marker='x', color='red') - a, b, c, d = res1.x + a, b, c, d = res1.x - if d < b: - b, d = d, b + if d < b: + b, d = d, b - e, f = res2.x + e, f = res2.x - fit_ax.plot(times[indices], trace[indices], color='grey', - alpha=.5) - fit_ax.plot(times[indices], c * np.exp((-1.0/d) * (times[indices] - peak_time)) - + a * np.exp(-(1.0/b) * (times[indices] - peak_time)), - color='red', linestyle='--') + fit_ax.plot(times[indices], trace[indices], color='grey', + alpha=.5) + fit_ax.plot(times[indices], c * np.exp((-1.0/d) * 
(times[indices] - peak_time)) + + a * np.exp(-(1.0/b) * (times[indices] - peak_time)), + color='red', linestyle='--') - res_string = r'$\tau_{1} = ' f"{d:.1f}" r'\mathrm{ms}'\ - r'\; \tau_{2} = ' f"{b:.1f}" r'\mathrm{ms}$' + res_string = r'$\tau_{1} = ' f"{d:.1f}" r'\mathrm{ms}'\ + r'\; \tau_{2} = ' f"{b:.1f}" r'\mathrm{ms}$' - fit_ax.annotate(res_string, xy=(0.5, 0.05), xycoords='axes fraction') + fit_ax.annotate(res_string, xy=(0.5, 0.05), xycoords='axes fraction') - protocol_ax.plot(times, trace) - protocol_ax.axvspan(peak_time, tend - 50, alpha=.5, color='grey') + protocol_ax.plot(times, trace) + protocol_ax.axvspan(peak_time, tend - 50, alpha=.5, color='grey') - fig.savefig(output_path) - fit_ax.set_yscale('symlog') + fig.savefig(output_path) + fit_ax.set_yscale('symlog') - dirname, filename = os.path.split(output_path) - filename = 'log10_' + filename - fig.savefig(os.path.join(dirname, filename)) + dirname, filename = os.path.split(output_path) + filename = 'log10_' + filename + fig.savefig(os.path.join(dirname, filename)) - fit_ax.cla() + fit_ax.cla() - dirname, filename = os.path.split(output_path) - filename = 'single_exp_' + filename - output_path = os.path.join(dirname, filename) + dirname, filename = os.path.split(output_path) + filename = 'single_exp_' + filename + output_path = os.path.join(dirname, filename) - fit_ax.plot(times[indices], trace[indices], color='grey', - alpha=.5) - fit_ax.plot(times[indices], e * np.exp((-1.0/f) * (times[indices] - peak_time)), - color='red', linestyle='--') + fit_ax.plot(times[indices], trace[indices], color='grey', + alpha=.5) + fit_ax.plot(times[indices], e * np.exp((-1.0/f) * (times[indices] - peak_time)), + color='red', linestyle='--') - res_string = r'$\tau = ' f"{f:.1f}" r'\mathrm{ms}$' + res_string = r'$\tau = ' f"{f:.1f}" r'\mathrm{ms}$' - fit_ax.annotate(res_string, xy=(0.5, 0.05), xycoords='axes fraction') - fig.savefig(output_path) + fit_ax.annotate(res_string, xy=(0.5, 0.05), xycoords='axes fraction') + fig.savefig(output_path) - dirname, filename = os.path.split(output_path) - filename = 'log10_' + filename - fit_ax.set_yscale('symlog') - fig.savefig(os.path.join(dirname, filename)) + dirname, filename = os.path.split(output_path) + filename = 'log10_' + filename + fit_ax.set_yscale('symlog') + fig.savefig(os.path.join(dirname, filename)) - plt.close(fig) + plt.close(fig) - return (d, b), f, peak_current if res else (np.nan, np.nan), np.nan, peak_current + return (d, b), f, peak_current if res1 else (np.nan, np.nan), np.nan, peak_current if __name__ == '__main__': # pragma: no cover From 546ff4866e3a2d7afbd2fa4cfa068924e235a46b Mon Sep 17 00:00:00 2001 From: Michael Clerx Date: Wed, 7 Jan 2026 16:09:35 +0000 Subject: [PATCH 6/9] Tidying run_herg_qc script --- pcpostprocess/scripts/run_herg_qc.py | 328 +++++++++++++++------------ 1 file changed, 187 insertions(+), 141 deletions(-) diff --git a/pcpostprocess/scripts/run_herg_qc.py b/pcpostprocess/scripts/run_herg_qc.py index 3f5f302..eaadd8b 100644 --- a/pcpostprocess/scripts/run_herg_qc.py +++ b/pcpostprocess/scripts/run_herg_qc.py @@ -135,9 +135,14 @@ def run(data_path, output_path, qc_map, wells=None, This proceeds with the following steps: - 1. All wells (or those selected with ``wells``) are run through "plain" QC, - for every protocol in ``qc_map``. - + 1. All wells (or those selected with ``wells``) are run through staircase + QC, for every protocol in ``qc_map``. 
These protocols are expected to be + run either 2 times (before all other protocols, with and without IKr + blocker) or 4 times (before _and_ after all other protocols). This test + is performed by :meth:`run_staircase_qc`. + 2. The list of wells that pass staircase QC are then stored in a file + ``selected-{save_id}-txt``. + 3. For all protocols in either ``qc_map`` or @@ -145,19 +150,22 @@ def run(data_path, output_path, qc_map, wells=None, @param data_path The path to read data from @param output_path The path to write output to - @param qc_map A dictionary mapping input protocol names to output names. - All protocols in this dictionary will be used for quality control. + @param qc_map A dictionary mapping staircase protocol names to output + names. All protocols in this dictionary will be used for staircase + quality control and export. @param wells A list of strings indicating the wells to use, or ``None`` for all wells. @param write_traces Set to ``True`` to write (raw and processed) traces to the output directory in CSV format. @param write_failed_traces Set to ``True`` to write traces for wells failing quality control. Ignored if ``write_traces=False`. - @param write_map A dictionary like ``qc_map``, but specifying protocols to - write traces for without using them in quality control. Ignored if + @param write_map A dictionary like ``qc_map``, but specifying non-staircase + protocols to run generic QC on and export. Ignored if ``write_traces=False`. @param reversal_potential The calculated reversal potential, in mV. - @param reversal_spread_threshold The maximum reversal + @param reversal_spread_threshold The maximum range of reversal potential + values in all tested protocols (``qc_map`` and ``write_map`` if + ``write_traces=True``). @param max_processes The maximum number of processes to run simultaneously @param figure_size An optional tuple specifying the size of figures to create @@ -229,29 +237,35 @@ def run(data_path, output_path, qc_map, wells=None, res_dict[protocol_name].append(time) - def pront(*args): - print('********') + #TEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMP + + def pront(*args, t=None): + if t is None: + print('********') + else: + print(f'******** {t} ********') for arg in args: print(arg) + pd.set_option('display.max_columns', None) + pd.set_option('display.max_rows', None) + + #TEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMP + # At this point, despite its name, res_dict is not a dictionary of results, # but a map of QC protocol names onto lists of times (see comment above) # - # Prepare arguments to call `run_qc_for_protocol` + # Prepare arguments to call `run_staircase_qc` # - combined_dict = qc_map | write_map - # Select QC protocols and times readnames, savenames, times_list = [], [], [] for protocol in res_dict: if protocol not in qc_map: continue - - times = sorted(res_dict[protocol]) - savename = qc_map[protocol] + times = sorted(res_dict[protocol]) if len(times) == 2: savenames.append(savename) @@ -264,14 +278,14 @@ def pront(*args): times_list.append([times[0], times[2]]) # Make seperate savename for protocol repeat - savename = combined_dict[protocol] + '_2' + savename = qc_map[protocol] + '_2' assert savename not in write_map.values() savenames.append(savename) times_list.append([times[1], times[3]]) readnames.append(protocol) else: - raise 
ValueError('Expecting 2 or 4 repeats of the QC protocol') + raise Exception('Expecting QC protocol run 2 or 4 times') # For two repeats, we now have # savenames: short user name, one per QC protocol @@ -286,7 +300,7 @@ def pront(*args): m = len(readnames) n = min(max_processes, m) args = zip(readnames, savenames, times_list, [data_path] * m, [wells] * m) - well_selections, qc_dfs = zip(*starmap(n, run_qc_for_protocol, args)) + well_selections, qc_dfs = zip(*starmap(n, run_staircase_qc, args)) # # Assuming a single QC protocol. At this point, we have @@ -310,11 +324,11 @@ def pront(*args): # where for the second run it has _2 appended # - # Combine QC protocls into overall_selection - selection = [set(x) for x in well_selections] - selection = selection[0].intersection(*selection[1:]) + # Get set of wells passing both s1 and s2 QC + selection = set(well_selections[0]).intersection(set(well_selections[1])) - # Store "plain QC" selections in "selected" files + # Store wells selected by "plain QC" in files starting "selected-" + # TODO: name something more specific, e.g. passed-staircase-{save_id}.txt fname = os.path.join(output_path, f'selected-{save_id}.txt') with open(fname, 'w') as f: f.write('\n'.join(selection)) @@ -324,19 +338,19 @@ def pront(*args): f.write('\n'.join(partial)) # - # Now go over _all_ protocols, including the QC protocols (AGAIN!), and - # call extract_protocol() on them + # Now go over _all_ protocols, including the QC protocols (again!), and + # call run_generic_qc() on them # + combined_dict = qc_map | write_map # Export all protocols savenames, readnames, times_list = [], [], [] for protocol in res_dict: # Sort into chronological order - times = sorted(res_dict[protocol]) savename = combined_dict[protocol] - readnames.append(protocol) + times = sorted(res_dict[protocol]) if len(times) == 2: savenames.append(savename) @@ -350,10 +364,11 @@ def pront(*args): savename = combined_dict[protocol] + '_2' assert savename not in combined_dict.values() savenames.append(savename) - times_list.append(times[1::2]) readnames.append(protocol) + times_list.append(times[1::2]) - # TODO Else raise error? + else: + raise Exception('Expecting QC protocol run 2 or 4 times') wells_to_export = wells if write_failed_traces else selection logging.info(f'exporting wells {wells_to_export}') @@ -362,64 +377,89 @@ def pront(*args): args = zip(readnames, savenames, times_list, [wells_to_export] * m, [output_path] * m, [data_path] * m, [figure_size] * m, [reversal_potential] * m, [save_id] * m) - dfs = starmap(n, extract_protocol, args) + dfs = starmap(n, run_generic_qc, args) if not dfs: raise Exception('No data exported') - pd.set_option('display.max_columns', None) - pd.set_option('display.max_rows', None) + # + # At this point, dfs is a list containing one dataframe _per protocol_. If + # write_failed_traces is set, each dataframe contains information on all + # wells, if not, each contains information only on wells passing + # staircase QC. + # extract_df = pd.concat(dfs, ignore_index=True) extract_df['selected'] = extract_df['well'].isin(selection) - pront(extract_df) - sys.exit(1) # - # Do QC3 on first staircase, first sweep VS second staircase, second sweep - # qc3.bookend check very first and very last staircases are similar + # At this point, extract_df is a big dataframe with information about all + # protocols for all wells (write_failed_traces=True) or just the wells + # passing staircase QC (write_failed_traces=False). 
+ # It inludes new calculated quantities such as ``Erev`` and ``R_leftover``. # + # + # QC3-bookend: Run QC3 on first sweep of first staircase VS second sweep of + # second staircase: this checks if the very first and very last staircases + # are similar (after drug subtraction) + # + # This is run on all wells, regardless of staircase QC. + # protocol, savename = list(qc_map.items())[0] times = sorted(res_dict[protocol]) if len(times) == 4: + # If we have staircase-other-staircase before & after drug qc3_bookend_dict = qc3_bookend( protocol, savename, times, wells, output_path, data_path, figure_size, save_id) - else: - #TODO: Better indicate that it wasn't run? + elif len(times) == 2: + # TODO: Better to indicate that it wasn't run? qc3_bookend_dict = {well: True for well in qc_df.well.unique()} + else: + raise Exception('Expecting QC protocol run 2 or 4 times') + + # Store the results by adding an extra column to qc_df qc_df['qc3.bookend'] = [qc3_bookend_dict[well] for well in qc_df.well] - pront(qc_df) + del(qc3_bookend_dict) # # # - qc_erev_spread = {} erev_spreads = {} + qc_erev_spread = {} passed_qc_dict = {} for well in extract_df.well.unique(): - logging.info(f"Checking QC for well {well}") - # Select only this well + logging.info(f'Checking final QC for well {well}') + + # Get extract_df and qc_df for only this well sub_df = extract_df[extract_df.well == well] sub_qc_df = qc_df[qc_df.well == well] - passed_qc3_bookend = np.all(sub_qc_df['qc3.bookend'].values) - logging.info(f"passed_QC3_bookend_all {passed_qc3_bookend}") + # Check if generic QC passed for all protocols for this well passed_QC_Erev_all = np.all(sub_df['QC.Erev'].values) passed_QC1_all = np.all(sub_df.QC1.values) - logging.info(f"passed_QC1_all {passed_QC1_all}") - passed_QC4_all = np.all(sub_df.QC4.values) - logging.info(f"passed_QC4_all {passed_QC4_all}") passed_QC6_all = np.all(sub_df.QC6.values) - logging.info(f"passed_QC6_all {passed_QC1_all}") + passed_qc3_bookend = np.all(sub_qc_df['qc3.bookend'].values) + passed_QC_Erev_all = np.all(sub_df['QC.Erev'].values) + logging.info(f'passed_QC1_all {passed_QC1_all}') + logging.info(f'passed_QC4_all {passed_QC4_all}') + logging.info(f'passed_QC6_all {passed_QC6_all}') + logging.info(f'passed_QC3_bookend_all {passed_qc3_bookend}') + logging.info(f'passed_QC_Erev_all {passed_QC_Erev_all}') + + # Perform "QC_Erev_spread": this passes if the range of E_rev values in + # all protocols is below a threshold + # QC Erev spread: check spread in reversal potential isn't too large E_revs = sub_df['E_rev'].values.flatten().astype(np.float64) E_rev_spread = E_revs.max() - E_revs.min() - # QC Erev spread: check spread in reversal potential isn't too large passed_QC_Erev_spread = E_rev_spread <= reversal_spread_threshold - logging.info(f"passed_QC_Erev_spread {passed_QC_Erev_spread}") + logging.info(f'passed_QC_Erev_spread {passed_QC_Erev_spread}') + del(E_revs) + erev_spreads[well] = E_rev_spread + qc_erev_spread[well] = passed_QC_Erev_spread # R_leftover only considered for protocols used for QC (i.e. 
staircase protocols) passed_QC_R_leftover = np.all(sub_df[sub_df.protocol.isin(qc_map.values())] @@ -427,29 +467,31 @@ def pront(*args): logging.info(f"passed_QC_R_leftover {passed_QC_R_leftover}") - passed_QC_Erev_spread = E_rev_spread <= reversal_spread_threshold - - qc_erev_spread[well] = passed_QC_Erev_spread - erev_spreads[well] = E_rev_spread - passed_QC_Erev_all = np.all(sub_df['QC.Erev'].values) - logging.info(f"passed_QC_Erev_all {passed_QC_Erev_all}") + passed_staircase_QC = np.all(sub_df['selected'].values) - was_selected = np.all(sub_df['selected'].values) - - passed_qc = passed_qc3_bookend and was_selected\ - and passed_QC_Erev_all and passed_QC6_all\ - and passed_QC_Erev_spread and passed_QC1_all\ + passed_qc = ( + passed_staircase_QC + and passed_QC1_all and passed_QC4_all + and passed_QC6_all + and passed_qc3_bookend + and passed_QC_Erev_all + and passed_QC_Erev_spread + ) passed_qc_dict[well] = passed_qc + pront(extract_df, t='extract_df, pre') extract_df['passed QC'] = [passed_qc_dict[well] for well in extract_df.well] extract_df['QC.Erev.spread'] = [qc_erev_spread[well] for well in extract_df.well] extract_df['Erev_spread'] = [erev_spreads[well] for well in extract_df.well] + pront(extract_df, t='extract_df, post') + sys.exit(1) chrono_dict = {times[0]: prot for prot, times in zip(savenames, times_list)} + pront('CHRONO', os.path.join(output_path, 'chrono.txt')) with open(os.path.join(output_path, 'chrono.txt'), 'w') as fout: for key in sorted(chrono_dict): val = chrono_dict[key] @@ -581,18 +623,18 @@ def agg_func(x): return ret_df -def extract_protocol(readname, savename, time_strs, selected_wells, savedir, - data_path, figure_size, reversal_potential, save_id): - # TODO: Tidy up argument order +def run_generic_qc(readname, savename, time_strs, selected_wells, savedir, + data_path, figure_size, reversal_potential, save_id): """ - Extracts a (non-QC) protocol, performs more QC(!), and + Performs QC on a protocol (staircase or other), and extracts and stores the + data. Loads the data (again!) and - leak-subtracts it using the leak step - stores before, after, and subtracted traces in ``traces`` (for all sweeps), along with time and voltage traces - - + - performs QC (1, 4, 6, Erev, R_leftover) using the meta data, leak step, + and up-down step after the leak step - creates and returns a data frame, with entries detailed below The dataframe has entries @@ -628,7 +670,7 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, ``savedir/leak_correction/saveid-savename-leak_fit-before/after``. """ - logging.info(f"Exporting {readname} as {savename}") + logging.info(f'Exporting {readname} as {savename}') trace_dir = os.path.join(savedir, 'traces') subtraction_dir = os.path.join(savedir, 'subtraction_plots') @@ -702,12 +744,12 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, rows = [] for well in selected_wells: for sweep in range(nsweeps): - row_dict = {'well': well, 'sweep': sweep, 'protocol': savename} + row = {'well': well, 'sweep': sweep, 'protocol': savename} qc_vals = qc_before[well][sweep] - row_dict['Rseal'] = qc_vals[0] - row_dict['Cm'] = qc_vals[1] - row_dict['Rseries'] = qc_vals[2] + row['Rseal'] = qc_vals[0] + row['Cm'] = qc_vals[1] + row['Rseries'] = qc_vals[2] # Fit leak using step (again!) 
before_params, before_leak = fit_linear_leak( @@ -736,16 +778,16 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, plt.close(fig) # Store conductance and reversal - row_dict['gleak_before'] = before_params[1] - row_dict['E_leak_before'] = -before_params[0] / before_params[1] - row_dict['gleak_after'] = after_params[1] - row_dict['E_leak_after'] = -after_params[0] / after_params[1] + row['gleak_before'] = before_params[1] + row['E_leak_before'] = -before_params[0] / before_params[1] + row['gleak_after'] = after_params[1] + row['E_leak_after'] = -after_params[0] / after_params[1] # Add "R_leftover" QC QC_R_leftover = np.sqrt(np.sum(after_corrected**2) / np.sum(before_corrected**2)) - row_dict['R_leftover'] = QC_R_leftover - row_dict['QC.R_leftover'] = np.all(row_dict['R_leftover'] < 0.5) + row['R_leftover'] = QC_R_leftover + row['QC.R_leftover'] = np.all(row['R_leftover'] < 0.5) # Infer reversal potential and store, creating plots in the process E_rev_before = infer_reversal_potential( @@ -765,18 +807,18 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, known_Erev=reversal_potential) # Store Erevs parameters - row_dict['E_rev'] = E_rev - row_dict['E_rev_before'] = E_rev_before - row_dict['E_rev_after'] = E_rev_after + row['E_rev'] = E_rev + row['E_rev_before'] = E_rev_before + row['E_rev_after'] = E_rev_after # Add Erev QC - row_dict['QC.Erev'] = E_rev < -50 and E_rev > -120 + row['QC.Erev'] = E_rev < -50 and E_rev > -120 # Check QC6 for each protocol (not just the staircase) # For this, use a capacitance-filtered trace filtered = hergqc.filter_capacitive_spikes( subtracted, times, voltage_steps) - row_dict['QC6'] = hergqc.qc6(filtered, win=hergqc.qc6_win)[0] + row['QC6'] = hergqc.qc6(filtered, win=hergqc.qc6_win)[0] # Check QC1 and QC4, but between before and after traces (assume # there is only one sweep for non-QC protocols) @@ -784,16 +826,16 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, rseal_after, cm_after, rseries_after = qc_after[well][0] qc1_1 = hergqc.qc1(rseal_before, cm_before, rseries_before) qc1_2 = hergqc.qc1(rseal_after, cm_after, rseries_after) - row_dict['QC1'] = all([x for x, _ in qc1_1 + qc1_2]) + row['QC1'] = all([x for x, _ in qc1_1 + qc1_2]) qc4 = hergqc.qc4([rseal_before, rseal_after], [cm_before, cm_after], [rseries_before, rseries_after]) - row_dict['QC4'] = all([x for x, _ in qc4]) + row['QC4'] = all([x for x, _ in qc4]) # Add a "total flux" value to the data frame, based on the drug # subtracted and capacitance filtered current dt = times[1] - times[0] - row_dict['total before-drug flux'] = np.sum(filtered) / dt + row['total before-drug flux'] = np.sum(filtered) / dt # TODO: This measure can be dropped # Estimate time constants, based on a trace with yet another round @@ -805,11 +847,11 @@ def extract_protocol(readname, savename, time_strs, selected_wells, savedir, res = get_time_constant_of_first_decay( filtered - leak, times, desc, fname, figure_size) - row_dict['-120mV decay time constant 1'] = res[0][0] - row_dict['-120mV decay time constant 2'] = res[0][1] - row_dict['-120mV decay time constant 3'] = res[1] - row_dict['-120mV peak current'] = res[2] - rows.append(row_dict) + row['-120mV decay time constant 1'] = res[0][0] + row['-120mV decay time constant 2'] = res[0][1] + row['-120mV decay time constant 3'] = res[1] + row['-120mV peak current'] = res[2] + rows.append(row) extract_df = pd.DataFrame.from_dict(rows) logging.debug(extract_df) @@ -817,9 +859,9 @@ def 
extract_protocol(readname, savename, time_strs, selected_wells, savedir, return extract_df -def run_qc_for_protocol(readname, savename, time_strs, data_path, wells): +def run_staircase_qc(readname, savename, time_strs, data_path, wells): """ - Runs a QC procedure for a single protocol, on the selected wells. + Runs a QC procedure for a single staircase run, on the selected wells. Assumes: - time_strs has length 2, corresponding to a before- and after-drug trace @@ -830,7 +872,6 @@ def run_qc_for_protocol(readname, savename, time_strs, data_path, wells): - Traces are drug subtracted - No capacitative spike filtering - @param readname The protocol name, without the time part @param savename The shorter name for the protocol @param time_strs The time part of the protocol names, must have length 2 @@ -935,7 +976,7 @@ def qc3_bookend(readname, savename, time_strs, wells, output_path, staircase with the second sweep of the second staircase. Also: - - Creates a directory ``output_path/3-qc3-bookend`` + - Creates a directory ``output_path/qc3-bookend`` - Creates plots comparing staircase 1, sweep 1, with staircase 2, sweep 2. TODO: This method repeats lots of steps, unneccesarily: @@ -957,8 +998,8 @@ def qc3_bookend(readname, savename, time_strs, wells, output_path, filepath_first_before = os.path.join(data_path, f'{readname}_{time_strs[0]}') filepath_last_before = os.path.join(data_path, f'{readname}_{time_strs[1]}') - json_file_first_before = f"{readname}_{time_strs[0]}" - json_file_last_before = f"{readname}_{time_strs[1]}" + json_file_first_before = f'{readname}_{time_strs[0]}' + json_file_last_before = f'{readname}_{time_strs[1]}' #  Each Trace object contains two sweeps first_before_trace = Trace(filepath_first_before, json_file_first_before) @@ -973,8 +1014,8 @@ def qc3_bookend(readname, savename, time_strs, wells, output_path, filepath_first_after = os.path.join(data_path, f'{readname}_{time_strs[2]}') filepath_last_after = os.path.join(data_path, f'{readname}_{time_strs[3]}') - json_file_first_after = f"{readname}_{time_strs[2]}" - json_file_last_after = f"{readname}_{time_strs[3]}" + json_file_first_after = f'{readname}_{time_strs[2]}' + json_file_last_after = f'{readname}_{time_strs[3]}' first_after_trace = Trace(filepath_first_after, json_file_first_after) last_after_trace = Trace(filepath_last_after, json_file_last_after) @@ -1003,7 +1044,7 @@ def qc3_bookend(readname, savename, time_strs, wells, output_path, voltage=voltage) # Update output path - output_path = os.path.join(output_path, '3-qc3-bookend') + output_path = os.path.join(output_path, 'qc3-bookend') os.makedirs(output_path, exist_ok=True) # Create figure - will be reused @@ -1045,12 +1086,24 @@ def qc3_bookend(readname, savename, time_strs, wells, output_path, def get_time_constant_of_first_decay( trace, times, protocol_desc, output_path, figure_size): """ - Fits a XXX exponential to the -120mV step following the 40mV step at the - start of most protocols. - ??? + Fits a biexponential and single exponential curve to the -120mV step + following the 40mV step at the start of a protocol, returning time + constants estimated both ways. + + @param trace The current + @param times The associated times + @param protocol_desc A list of protocol steps and ramps, as returned by + :meth:`voltage_protocol.get_all_sections()` + @param output_path A filename for a figure. 
Will store 3 others with + similar names + @param figure_size A matplotlib figsize argument + + Returns a tuple ``(bi_taus), tau, peak_current`` where ``bi_taus`` contains + two time constants determined from a biexponential fit, where ``tau`` is a + single time constant from the single exponential fit, and where + ``peak_current`` is the peak (most negative) current. If any time constants + could not be determined, ``np.nan`` is returned. """ - os.makedirs(os.path.dirname(output_path), exist_ok=True) - first_120mV_step_index = [ i for i, line in enumerate(protocol_desc) if line[2] == 40][0] + 1 @@ -1123,73 +1176,66 @@ def fit2(x): if not res2: logging.warning(f'finding 120mv decay timeconstant failed: {res}') + # # Create a plot + # fig = plt.figure(figsize=figure_size, constrained_layout=True) axs = fig.subplots(2) - for ax in axs: ax.spines[['top', 'right']].set_visible(False) ax.set_ylabel(r'$I_\mathrm{obs}$ (pA)') - - axs[-1].set_xlabel(r'$t$ (ms)') - protocol_ax, fit_ax = axs + protocol_ax.set_title('a', fontweight='bold', loc='left') + protocol_ax.plot(times, trace) + protocol_ax.axvspan(peak_time, tend - 50, alpha=.5, color='grey') + fit_ax.set_title('b', fontweight='bold', loc='left') + fit_ax.set_xlabel(r'$t$ (ms)') fit_ax.plot(peak_time, peak_current, marker='x', color='red') a, b, c, d = res1.x - if d < b: b, d = d, b - e, f = res2.x - fit_ax.plot(times[indices], trace[indices], color='grey', alpha=.5) fit_ax.plot(times[indices], c * np.exp((-1.0/d) * (times[indices] - peak_time)) + a * np.exp(-(1.0/b) * (times[indices] - peak_time)), color='red', linestyle='--') - res_string = r'$\tau_{1} = ' f"{d:.1f}" r'\mathrm{ms}'\ - r'\; \tau_{2} = ' f"{b:.1f}" r'\mathrm{ms}$' - - fit_ax.annotate(res_string, xy=(0.5, 0.05), xycoords='axes fraction') - - protocol_ax.plot(times, trace) - protocol_ax.axvspan(peak_time, tend - 50, alpha=.5, color='grey') + fit_ax.annotate( + rf'$\tau_1$ = {d:.1f} ms; $\tau_2$ = {b:.1f} ms', + xy=(0.5, 0.05), xycoords='axes fraction') + # Store + dirname, filename = os.path.split(output_path) + os.makedirs(dirname, exist_ok=True) fig.savefig(output_path) - fit_ax.set_yscale('symlog') - dirname, filename = os.path.split(output_path) - filename = 'log10_' + filename - fig.savefig(os.path.join(dirname, filename)) + # Store again, now with symmetric log scaling on the y-axis + fit_ax.set_yscale('symlog') + fig.savefig(os.path.join(dirname, 'log10_' + filename)) + # Remove biexponential and store again fit_ax.cla() + fit_ax.plot(times[indices], trace[indices], color='grey', alpha=.5) + fit_ax.plot( + times[indices], e * np.exp((-1.0/f) * (times[indices] - peak_time)), + color='red', linestyle='--') + fit_ax.annotate( + rf'$\tau$ = {f:.1f} ms', xy=(0.5, 0.05), xycoords='axes fraction') - dirname, filename = os.path.split(output_path) - filename = 'single_exp_' + filename - output_path = os.path.join(dirname, filename) - - fit_ax.plot(times[indices], trace[indices], color='grey', - alpha=.5) - fit_ax.plot(times[indices], e * np.exp((-1.0/f) * (times[indices] - peak_time)), - color='red', linestyle='--') + fig.savefig(os.path.join(dirname, 'single_exp_' + filename)) - res_string = r'$\tau = ' f"{f:.1f}" r'\mathrm{ms}$' - - fit_ax.annotate(res_string, xy=(0.5, 0.05), xycoords='axes fraction') - fig.savefig(output_path) - - dirname, filename = os.path.split(output_path) - filename = 'log10_' + filename + # And once again with symmetric log scaling fit_ax.set_yscale('symlog') - fig.savefig(os.path.join(dirname, filename)) - + fig.savefig(os.path.join(dirname, 
'log10_single_exp_' + filename)) plt.close(fig) - return (d, b), f, peak_current if res1 else (np.nan, np.nan), np.nan, peak_current + params1 = (d, b) if res1 else (np.nan, np.nan) + param2 = f if res2 else np.nan + return params1, param2, peak_current if __name__ == '__main__': # pragma: no cover From 0c99933fc58e4bdd732585fb8924c20d7f233fd9 Mon Sep 17 00:00:00 2001 From: Michael Clerx Date: Wed, 7 Jan 2026 22:17:19 +0000 Subject: [PATCH 7/9] Tidying run_herg_qc script --- pcpostprocess/scripts/run_herg_qc.py | 270 ++++++++++++++------------- 1 file changed, 142 insertions(+), 128 deletions(-) diff --git a/pcpostprocess/scripts/run_herg_qc.py b/pcpostprocess/scripts/run_herg_qc.py index eaadd8b..0584954 100644 --- a/pcpostprocess/scripts/run_herg_qc.py +++ b/pcpostprocess/scripts/run_herg_qc.py @@ -111,38 +111,52 @@ def run_from_command_line(): # pragma: no cover run( args.data_directory, args.output_dir, - export_config.D2S_QC, + export_config.saveID, + staircase_protocols=export_config.D2S_QC, + additional_protocols=export_config.D2S, wells=args.wells, write_traces=args.output_traces, - write_failed_traces=args.export_failed, - write_map=export_config.D2S, + include_failed_traces=args.export_failed, reversal_potential=args.Erev, reversal_spread_threshold=args.reversal_spread_threshold, max_processes=args.no_cpus, figure_size=args.figsize, - save_id=export_config.saveID, ) -def run(data_path, output_path, qc_map, wells=None, - write_traces=False, write_failed_traces=False, write_map={}, +def run(data_path, output_path, save_id, staircase_protocols, + additional_protocols={}, wells=None, + write_traces=False, include_failed_traces=False, reversal_potential=-90, reversal_spread_threshold=10, - max_processes=1, figure_size=None, save_id=None): + max_processes=1, figure_size=None): """ - Imports traces and runs QC+. - - Makes the following assumptions: + Imports traces and runs several layers of quality control (staircase QC, + followed by secondary QC). This proceeds with the following steps: 1. All wells (or those selected with ``wells``) are run through staircase - QC, for every protocol in ``qc_map``. These protocols are expected to be - run either 2 times (before all other protocols, with and without IKr - blocker) or 4 times (before _and_ after all other protocols). This test - is performed by :meth:`run_staircase_qc`. + QC, for every protocol in ``staircase_protocols``. These protocols are + expected to be run either 2 times (before all other protocols, with and + without IKr blocker) or 4 times (before _and_ after all other + protocols). Timestamps will be used to determine the order of the + sweeps. 2. The list of wells that pass staircase QC are then stored in a file ``selected-{save_id}-txt``. - 3. For all protocols in either ``qc_map`` or + 3. For all protocols in ``staircase_protocols`` and + ``additional_protocols``, a secondary QC layer is run. This is similar + to a stripped-down version of the staircase QC. If ``write_traces=True`` + this step includes storing before, after, and subtracted traces as CSV. + By default, this step is run only for wells that passed staircase QC, + but it can be run for all wells by setting + ``include_failed_traces=True``. + 4. Another staircase QC measure, "QC3 bookend" is run (on all wells, not + just those that passed previous QC). **This should probably be included + in step 1, and affect the selected wells**. + 5. A file "chrono.txt" is created with the order that protocols were run in + 6. 
Multi-protocol QC is performed on all wells that passed staircase QC (or + all wells, if ``include_failed_traces=True``. + 7. @@ -150,27 +164,27 @@ def run(data_path, output_path, qc_map, wells=None, @param data_path The path to read data from @param output_path The path to write output to - @param qc_map A dictionary mapping staircase protocol names to output - names. All protocols in this dictionary will be used for staircase - quality control and export. + @param save_id An "id" string used in e.g. filenames of created CSVs. + @param staircase_protocols The protocols to run staircase QC on, specified + as a dictionary mapping MapDataControl384 names to shorter names to + use in the output. All protocols in this dictionary will be used for + staircase QC, secondary QC and export of traces. + @param additional_protocols The non-staircase protocols to run secondary QC + on and export traces for, specified as a dictionary mapping + MapDataControl384 names to shorter names to use in the output. @param wells A list of strings indicating the wells to use, or ``None`` for all wells. @param write_traces Set to ``True`` to write (raw and processed) traces to the output directory in CSV format. - @param write_failed_traces Set to ``True`` to write traces for wells - failing quality control. Ignored if ``write_traces=False`. - @param write_map A dictionary like ``qc_map``, but specifying non-staircase - protocols to run generic QC on and export. Ignored if - ``write_traces=False`. + @param include_failed_traces Set to ``True`` to perform QC and export even + on wells failing staircase quality control. @param reversal_potential The calculated reversal potential, in mV. @param reversal_spread_threshold The maximum range of reversal potential - values in all tested protocols (``qc_map`` and ``write_map`` if - ``write_traces=True``). + values allowed in secondary QC, in mV. @param max_processes The maximum number of processes to run simultaneously @param figure_size An optional tuple specifying the size of figures to create - @param save_id Used in some outputs, e.g. as part of CSV names """ # TODO reversal_spread_threshold should be specified the same way as all @@ -179,7 +193,7 @@ def run(data_path, output_path, qc_map, wells=None, # Create output path if necessary, and write info file output_path = setup_output_directory(output_path) - # TODO Remove protocol selection here: this is done via the export file! + # TODO Remove protocol selection here: this is done via the export file # Only protocols listed there are accepted # Select wells to use @@ -215,27 +229,21 @@ def run(data_path, output_path, qc_map, wells=None, # { protocol_name: [time1, time2, time3, ...] } # such that protocol_name_time is a directory - # TODO: Replace this by looping over qc_map and write_map? + # TODO: Replace this by looping over all_protocols? 
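
For intuition, the directory scan below turns each recording directory name into a protocol name plus a time stamp and collects the times per protocol. A rough, self-contained sketch; the regular expression and directory names here are illustrative stand-ins, not the script's actual `protocols_regex`:

```python
import re

# Stand-in pattern: '<protocol name>_<HH.MM.SS>' directory names
protocols_regex = re.compile(r'^(.+)_(\d{2}\.\d{2}\.\d{2})$')

dirnames = [
    'exampleStaircase_15.01.07',
    'exampleOther_15.01.51',
    'exampleStaircase_15.06.53',
]

res_dict = {}
for dirname in dirnames:
    match = protocols_regex.match(dirname)
    if match is None:
        continue
    protocol_name, time = match.group(1), match.group(2)
    res_dict.setdefault(protocol_name, []).append(time)

# res_dict == {'exampleStaircase': ['15.01.07', '15.06.53'],
#              'exampleOther': ['15.01.51']}
```
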
+ all_protocols = staircase_protocols | additional_protocols res_dict = {} for dirname in os.listdir(data_path): dirname = os.path.basename(dirname) match = protocols_regex.match(dirname) - if match is None: continue - protocol_name = match.group(1) - - if not (protocol_name in qc_map or protocol_name in write_map): - print(f'Skipping {protocol_name}') + if protocol_name not in all_protocols: + print(f'Skipping unknown protocol {protocol_name}') continue - - time = match.group(2) - if protocol_name not in res_dict: res_dict[protocol_name] = [] - - res_dict[protocol_name].append(time) + res_dict[protocol_name].append(match.group(2)) #TEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMP @@ -262,9 +270,9 @@ def pront(*args, t=None): # Select QC protocols and times readnames, savenames, times_list = [], [], [] for protocol in res_dict: - if protocol not in qc_map: + if protocol not in staircase_protocols: continue - savename = qc_map[protocol] + savename = staircase_protocols[protocol] times = sorted(res_dict[protocol]) if len(times) == 2: @@ -278,8 +286,8 @@ def pront(*args, t=None): times_list.append([times[0], times[2]]) # Make seperate savename for protocol repeat - savename = qc_map[protocol] + '_2' - assert savename not in write_map.values() + savename = staircase_protocols[protocol] + '_2' + assert savename not in additional_protocols.values() savenames.append(savename) times_list.append([times[1], times[3]]) readnames.append(protocol) @@ -339,16 +347,15 @@ def pront(*args, t=None): # # Now go over _all_ protocols, including the QC protocols (again!), and - # call run_generic_qc() on them + # call run_secondary_qc() on them # - combined_dict = qc_map | write_map # Export all protocols savenames, readnames, times_list = [], [], [] for protocol in res_dict: # Sort into chronological order - savename = combined_dict[protocol] + savename = all_protocols[protocol] readnames.append(protocol) times = sorted(res_dict[protocol]) @@ -361,8 +368,8 @@ def pront(*args, t=None): times_list.append(times[::2]) # Make seperate savename for protocol repeat - savename = combined_dict[protocol] + '_2' - assert savename not in combined_dict.values() + savename = all_protocols[protocol] + '_2' + assert savename not in all_protocols.values() savenames.append(savename) readnames.append(protocol) times_list.append(times[1::2]) @@ -370,20 +377,20 @@ def pront(*args, t=None): else: raise Exception('Expecting QC protocol run 2 or 4 times') - wells_to_export = wells if write_failed_traces else selection + wells_to_export = wells if include_failed_traces else selection logging.info(f'exporting wells {wells_to_export}') m = len(readnames) n = min(max_processes, m) args = zip(readnames, savenames, times_list, [wells_to_export] * m, [output_path] * m, [data_path] * m, [figure_size] * m, - [reversal_potential] * m, [save_id] * m) - dfs = starmap(n, run_generic_qc, args) + [reversal_potential] * m, [save_id] * m, [write_traces] * m) + dfs = starmap(n, run_secondary_qc, args) if not dfs: raise Exception('No data exported') # # At this point, dfs is a list containing one dataframe _per protocol_. If - # write_failed_traces is set, each dataframe contains information on all + # include_failed_traces is set, each dataframe contains information on all # wells, if not, each contains information only on wells passing # staircase QC. 
# @@ -393,8 +400,8 @@ def pront(*args, t=None): # # At this point, extract_df is a big dataframe with information about all - # protocols for all wells (write_failed_traces=True) or just the wells - # passing staircase QC (write_failed_traces=False). + # protocols for all wells (include_failed_traces=True) or just the wells + # passing staircase QC (include_failed_traces=False). # It inludes new calculated quantities such as ``Erev`` and ``R_leftover``. # @@ -405,7 +412,11 @@ def pront(*args, t=None): # # This is run on all wells, regardless of staircase QC. # - protocol, savename = list(qc_map.items())[0] + # TODO: This doesn't require the export and is staircase only, so should + # probably happen earlier in the sequence and affect the well selection for + # secondary QC. + # + protocol, savename = list(staircase_protocols.items())[0] times = sorted(res_dict[protocol]) if len(times) == 4: # If we have staircase-other-staircase before & after drug @@ -423,9 +434,21 @@ def pront(*args, t=None): del(qc3_bookend_dict) # + # Write a file chrono.txt containing the order that protocols were run in # - # + chrono = {times[0]: prot for prot, times in zip(savenames, times_list)} + with open(os.path.join(output_path, 'chrono.txt'), 'w') as fout: + fout.write('\n'.join(chrono[key] for key in sorted(chrono)) + '\n') + # + # Perform multi-protocol QC per well. + # - Check that QC1, QC4, QC6 and QC_Erev pass for all protocols. + # - Check that QC3 bookend passes for "all" staircase protocols (typically + # this is just a single check). + # - Check that the range of Erev values is below a threshold + # - Check that all staircase QCs passed + # - Check that QC_R_leftover passed for all staircase protocols + # erev_spreads = {} qc_erev_spread = {} passed_qc_dict = {} @@ -437,40 +460,38 @@ def pront(*args, t=None): sub_qc_df = qc_df[qc_df.well == well] # Check if generic QC passed for all protocols for this well - passed_QC_Erev_all = np.all(sub_df['QC.Erev'].values) passed_QC1_all = np.all(sub_df.QC1.values) passed_QC4_all = np.all(sub_df.QC4.values) passed_QC6_all = np.all(sub_df.QC6.values) - passed_qc3_bookend = np.all(sub_qc_df['qc3.bookend'].values) passed_QC_Erev_all = np.all(sub_df['QC.Erev'].values) + passed_qc3_bookend = np.all(sub_qc_df['qc3.bookend'].values) logging.info(f'passed_QC1_all {passed_QC1_all}') logging.info(f'passed_QC4_all {passed_QC4_all}') logging.info(f'passed_QC6_all {passed_QC6_all}') - logging.info(f'passed_QC3_bookend_all {passed_qc3_bookend}') logging.info(f'passed_QC_Erev_all {passed_QC_Erev_all}') - + logging.info(f'passed_QC3_bookend_all {passed_qc3_bookend}') # Perform "QC_Erev_spread": this passes if the range of E_rev values in # all protocols is below a threshold # QC Erev spread: check spread in reversal potential isn't too large E_revs = sub_df['E_rev'].values.flatten().astype(np.float64) E_rev_spread = E_revs.max() - E_revs.min() + del(E_revs) passed_QC_Erev_spread = E_rev_spread <= reversal_spread_threshold logging.info(f'passed_QC_Erev_spread {passed_QC_Erev_spread}') - del(E_revs) erev_spreads[well] = E_rev_spread qc_erev_spread[well] = passed_QC_Erev_spread - # R_leftover only considered for protocols used for QC (i.e. 
staircase protocols) - passed_QC_R_leftover = np.all(sub_df[sub_df.protocol.isin(qc_map.values())] - ['QC.R_leftover'].values) - - logging.info(f"passed_QC_R_leftover {passed_QC_R_leftover}") - + # R_leftover only considered for protocols used for QC (staircases) + passed_QC_R_leftover = np.all( + sub_df[sub_df.protocol.isin(staircase_protocols.values())]['QC.R_leftover'].values) + logging.info(f'passed_QC_R_leftover {passed_QC_R_leftover}') + # All staircase protocol QCs passed_staircase_QC = np.all(sub_df['selected'].values) - passed_qc = ( + # Combine all into a final verdict + passed_qc_dict[well] = ( passed_staircase_QC and passed_QC1_all and passed_QC4_all @@ -480,53 +501,43 @@ def pront(*args, t=None): and passed_QC_Erev_spread ) - passed_qc_dict[well] = passed_qc - - pront(extract_df, t='extract_df, pre') extract_df['passed QC'] = [passed_qc_dict[well] for well in extract_df.well] extract_df['QC.Erev.spread'] = [qc_erev_spread[well] for well in extract_df.well] extract_df['Erev_spread'] = [erev_spreads[well] for well in extract_df.well] - pront(extract_df, t='extract_df, post') - sys.exit(1) - chrono_dict = {times[0]: prot for prot, times in zip(savenames, times_list)} + # + # At this point, extract_df has 3 new columns. + # Confusingly, it has the old column "selected" indicating whether or not + # staircase QC passed, and a new column "passed QC" indicating final + # selection. + # - pront('CHRONO', os.path.join(output_path, 'chrono.txt')) - with open(os.path.join(output_path, 'chrono.txt'), 'w') as fout: - for key in sorted(chrono_dict): - val = chrono_dict[key] - #  Output order of protocols - fout.write(val) - fout.write('\n') - - #  Update qc_df - update_cols = [] + # + # Copy true/false assessments from extract_df into qc_df + # + new_cols = [] for index, vals in qc_df.iterrows(): - append_dict = {} + sub_df = extract_df[(extract_df.well == vals['well'])] + row = {} + row['QC.Erev.all_protocols'] = np.all(sub_df['QC.Erev']) + row['QC.Erev.spread'] = np.all(sub_df['QC.Erev.spread']) + row['QC1.all_protocols'] = np.all(sub_df['QC1']) + row['QC4.all_protocols'] = np.all(sub_df['QC4']) + row['QC6.all_protocols'] = np.all(sub_df['QC6']) + new_cols.append(row) + for key in new_cols[0]: + qc_df[key] = [row[key] for row in new_cols] - well = vals['well'] - - sub_df = extract_df[(extract_df.well == well)] - - append_dict['QC.Erev.all_protocols'] =\ - np.all(sub_df['QC.Erev']) - - append_dict['QC.Erev.spread'] =\ - np.all(sub_df['QC.Erev.spread']) + # + # At this point, extract_df contains mostly numerical values, and a few + # True/False results, while qc_df contains pass/fails. + # - append_dict['QC1.all_protocols'] =\ - np.all(sub_df['QC1']) + sys.exit(1) - append_dict['QC4.all_protocols'] =\ - np.all(sub_df['QC4']) - append_dict['QC6.all_protocols'] =\ - np.all(sub_df['QC6']) - update_cols.append(append_dict) - for key in append_dict: - qc_df[key] = [row[key] for row in update_cols] qc_styled_df = create_qc_table(qc_df) logging.info(qc_styled_df) @@ -623,16 +634,16 @@ def agg_func(x): return ret_df -def run_generic_qc(readname, savename, time_strs, selected_wells, savedir, - data_path, figure_size, reversal_potential, save_id): +def run_secondary_qc(readname, savename, time_strs, selected_wells, savedir, + data_path, figure_size, reversal_potential, save_id, + write_traces): """ - Performs QC on a protocol (staircase or other), and extracts and stores the - data. + Performs QC on a protocol (staircase or other), and exports the traces. Loads the data (again!) 
and - leak-subtracts it using the leak step - - stores before, after, and subtracted traces in ``traces`` (for all - sweeps), along with time and voltage traces + - if ``write_traces`` is True``, stores before, after, and subtracted + traces in ``traces`` (for all sweeps), along with time and voltage traces - performs QC (1, 4, 6, Erev, R_leftover) using the meta data, leak step, and up-down step after the leak step - creates and returns a data frame, with entries detailed below @@ -710,11 +721,12 @@ def run_generic_qc(readname, savename, time_strs, selected_wells, savedir, # Store voltages and times, using 2 different libraries... # TODO: REPLACE WITH SINGLE TIMES, SINGLE VOLTAGE - voltage_df = pd.DataFrame(np.vstack((times, voltages)).T, - columns=['time', 'voltage']) - voltage_df.to_csv(os.path.join( - trace_dir, f'{save_id}-{savename}-voltages.csv')) - #write_csv(times, trace_dir, f'{save_id}-{savename}-times.csv') + if write_traces: + voltage_df = pd.DataFrame( + np.vstack((times, voltages)).T, columns=['time', 'voltage']) + voltage_df.to_csv(os.path.join( + trace_dir, f'{save_id}-{savename}-voltages.csv')) + #write_csv(times, trace_dir, f'{save_id}-{savename}-times.csv') qc_before = before_trace.get_onboard_QC_values() qc_after = after_trace.get_onboard_QC_values() @@ -723,16 +735,17 @@ def run_generic_qc(readname, savename, time_strs, selected_wells, savedir, after_data = after_trace.get_trace_sweeps() # Save before and after drug traces as .csv - for well in selected_wells: - for sweep in range(nsweeps): - write_csv( - before_data[well][sweep], trace_dir, - f'{save_id}-{savename}-{well}-before-sweep{sweep}.csv', - header='current') - write_csv( - after_data[well][sweep], trace_dir, - f'{save_id}-{savename}-{well}-after-sweep{sweep}.csv', - header='current') + if write_traces: + for well in selected_wells: + for sweep in range(nsweeps): + write_csv( + before_data[well][sweep], trace_dir, + f'{save_id}-{savename}-{well}-before-sweep{sweep}.csv', + header='current') + write_csv( + after_data[well][sweep], trace_dir, + f'{save_id}-{savename}-{well}-after-sweep{sweep}.csv', + header='current') # Reusable figure to plot subtraction fig = plt.figure(figsize=figure_size, layout='constrained') @@ -761,11 +774,12 @@ def run_generic_qc(readname, savename, time_strs, selected_wells, savedir, before_corrected = before_data[well][sweep] - before_leak after_corrected = after_data[well][sweep] - after_leak - # Store drug-subtracted trace + # Create and store drug-subtracted trace subtracted = before_corrected - after_corrected - write_csv( - subtracted, trace_dir, - f'{save_id}-{savename}-{well}-sweep{sweep}-subtracted.csv') + if write_traces: + write_csv( + subtracted, trace_dir, + f'{save_id}-{savename}-{well}-sweep{sweep}-subtracted.csv') # Create subtraction plot fig = plt.figure(figsize=figure_size, layout='constrained') @@ -1053,7 +1067,7 @@ def qc3_bookend(readname, savename, time_strs, wells, output_path, #  Iterate over all wells, perform qc3-bookend, plot and store res_dict = {} - for well in np.array(wells).flatten(): + for well in wells: # First staircase, before drug and after drug, first sweep before_trace_first = get_leak_corrected( first_before_current_dict[well][0], voltage, times, *ramp_bounds) From d560541bd9d3ac7f2e2dc7d1dd7e20c0fddf6272 Mon Sep 17 00:00:00 2001 From: Michael Clerx Date: Thu, 8 Jan 2026 08:07:04 +0000 Subject: [PATCH 8/9] Tidied run_herg_qc script, for a decent bit --- pcpostprocess/scripts/run_herg_qc.py | 57 +++++++++------------------- 1 file changed, 18 
insertions(+), 39 deletions(-) diff --git a/pcpostprocess/scripts/run_herg_qc.py b/pcpostprocess/scripts/run_herg_qc.py index 0584954..b03d65e 100644 --- a/pcpostprocess/scripts/run_herg_qc.py +++ b/pcpostprocess/scripts/run_herg_qc.py @@ -156,11 +156,8 @@ def run(data_path, output_path, save_id, staircase_protocols, 5. A file "chrono.txt" is created with the order that protocols were run in 6. Multi-protocol QC is performed on all wells that passed staircase QC (or all wells, if ``include_failed_traces=True``. - 7. - - - - + 7. QC results are stored in CSV, JSON, and tex, plus a file + ``passed_wells.txt`` listing the wells that passed final QC. @param data_path The path to read data from @param output_path The path to write output to @@ -185,7 +182,6 @@ def run(data_path, output_path, save_id, staircase_protocols, @param figure_size An optional tuple specifying the size of figures to create - """ # TODO reversal_spread_threshold should be specified the same way as all # other QC thresholds & parameters. @@ -245,23 +241,10 @@ def run(data_path, output_path, save_id, staircase_protocols, res_dict[protocol_name] = [] res_dict[protocol_name].append(match.group(2)) - #TEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMP - - def pront(*args, t=None): - if t is None: - print('********') - else: - print(f'******** {t} ********') - for arg in args: - print(arg) - - pd.set_option('display.max_columns', None) - pd.set_option('display.max_rows', None) - - #TEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMPTEMP - + # # At this point, despite its name, res_dict is not a dictionary of results, # but a map of QC protocol names onto lists of times (see comment above) + # # # Prepare arguments to call `run_staircase_qc` @@ -533,30 +516,22 @@ def pront(*args, t=None): # True/False results, while qc_df contains pass/fails. # - sys.exit(1) - - - - - - qc_styled_df = create_qc_table(qc_df) - logging.info(qc_styled_df) - qc_styled_df.to_latex(os.path.join(output_path, 'qc_table.tex')) - - # Save in csv format + # Store decisions in tex, csv, and JSON + create_qc_table(qc_df).to_latex(os.path.join(output_path, 'qc_table.tex')) qc_df.to_csv(os.path.join(output_path, f'QC-{save_id}.csv')) - - # Write data to JSON file qc_df.to_json(os.path.join(output_path, f'QC-{save_id}.json'), orient='records') - #  Load only QC vals. TODO use a new variabile name to avoid confusion - qc_vals_df = extract_df[['well', 'sweep', 'protocol', 'Rseal', 'Cm', 'Rseries']].copy() + # Store subtraction / extract / secondary QC numbers as CSV + extract_df.to_csv(os.path.join(output_path, 'subtraction_qc.csv')) + + #  Store QC numbers as CSV + fields = ['well', 'sweep', 'protocol', 'Rseal', 'Cm', 'Rseries'] + qc_vals_df = extract_df[fields].copy() qc_vals_df['drug'] = 'before' qc_vals_df.to_csv(os.path.join(output_path, 'qc_vals_df.csv')) - extract_df.to_csv(os.path.join(output_path, 'subtraction_qc.csv')) - + # Store passed wells with open(os.path.join(output_path, 'passed_wells.txt'), 'w') as fout: for well, passed in passed_qc_dict.items(): if passed: @@ -566,7 +541,11 @@ def pront(*args, t=None): def create_qc_table(qc_df): """ - ??? + Create a tex table based on a dataframe containing true/false for various + decision criteria. + + The table is returned as another dataframe, which can be made into tex + using ``.to_latex(path)``. 
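+
+    A minimal usage sketch (this simply mirrors the call made in ``run``
+    above, using the output path passed to ``run``)::
+
+        create_qc_table(qc_df).to_latex(os.path.join(output_path, 'qc_table.tex'))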
""" if len(qc_df.index) == 0: From e6c7b4407b89604a3164026903c73f3155c6121d Mon Sep 17 00:00:00 2001 From: Michael Clerx Date: Thu, 8 Jan 2026 13:38:27 +0000 Subject: [PATCH 9/9] Updated README --- README.md | 146 ++++++++++++++++++++++++++++-------------------------- 1 file changed, 75 insertions(+), 71 deletions(-) diff --git a/README.md b/README.md index 62f4d7d..0937ad2 100644 --- a/README.md +++ b/README.md @@ -46,102 +46,106 @@ python3 -m unittest ## Usage -### Classic case +We are working to make pcpostprocess a reusable tool for different types of data. +At the moment, it is geared towards the specific use case detailed below. -- Syncropatch 384 ("DataControl 384" output) -- A full repeat of all protocols with an IKr blocker, e.g. E-4031 -- A staircase protocol, run 4 times: +### Experimental data + +- Syncropatch 384 ("DataControl 384") data +- The full set of protocols is repeated after addition of a strong IKr blocker, e.g. E-4031 +- Each set of protocols starts and ends with a "staircase" protocol, leading to a total of 4 staircases: - Before and after all other protocols, before drug block - Before and after all other protocols, after drug block -- Additional protocols ran in between the staircases -- Leak subtraction by... +So a full set of experiments would go "staircase, other1, other2, ... staircase, drug addition & wash in, staircase, other1, other2, ..., staircase". -Quality control (QC) may be run using the criteria outlined in [Rapid Characterization of hERG Channel Kinetics I](https://doi.org/10.1016/j.bpj.2019.07.029) and [Evaluating the predictive accuracy of ion channel models using data from multiple experimental designs](https://doi.org/10.1101/2024.08.16.608289). -These criteria assume the use of the `staircase` protocol for quality control, which should be the first and last protocol performed. -We also assume the presence of repeats after the addition of an IKr blocker (such as dofetilide). +Each protocol +- Starts with a leak estimation ramp +- Followed by a step to 40mV and then to -120mV +- Ends with a reversal potential step +- Provides an estimate of Rseal, Rseries, and Cm -Prior to performing QC and exporting, an `export_config.py` file should be added to the root of the data directory. -This file (see `example_config.py`) contains a Python `dict` (`Q2S_DC`) specifying the filenames of the protocols used for QC, and names they should be outputted with, as well as a Python dict (`D2S`) listing the other protocols and names to be used for their output. -Additionally, the `saveID` field specifies the name of the expeirment which appears in the output file names. +### Leak correction -An example input directory might contain the following subdirectories -- `staircaseramp (2)_2kHz_15.01.07`, staircase 1 before E-4031 -- `StaircaseInStaircaseramp (2)_2kHz_15.01.51`, non-QC protocol -- `staircaseramp (2)_2kHz_15.06.53`, staircase 2 before E-4031 -- `staircaseramp (2)_2kHz_15.11.33`, staircase 1 after E-4031 -- `StaircaseInStaircaseramp (2)_2kHz_15.12.17`, non-QC protocol -- `staircaseramp (2)_2kHz_15.17.19`, staircase 2 after E-4031 +- Leak correction is applied by estimating linear leak from the leak step. +- Before and after drug traces are both corrected +- The final signal is leak-corrected-before minus leak-corrected-after -`run_herg_qc` will use the time parts of the staircase names to work out which is which. 
-This assumes there are either 2 staircase runs (once before drug, once after), or 4 (as above)
+### Quality control
 
-Example:
-```
-# Configuration for run_herg_qc
+Quality control is based on [Lei et al. (2019) Rapid Characterization of hERG Channel Kinetics I](https://doi.org/10.1016/j.bpj.2019.07.029) and [Shuttleworth et al. (2025) Evaluating the predictive accuracy of ion channel models using data from multiple experimental designs](https://doi.org/10.1098/rsta.2024.0211).
 
-# Save name for this set of data
-saveID = 'EXPERIMENT_NAME'
+First, all four staircase protocols are checked in all wells using the Lei et al. criteria, along with new measures described in Shuttleworth et al.
+Any wells not rejected are then tested using a smaller set of (less protocol-specific) criteria detailed in Shuttleworth et al.
+Only wells passing all tests on all protocols are retained.
 
-# Name of protocols to use in QC, mapped onto names used in output
-D2S_QC = {
-    'staircaseramp (2)_2kHz': 'staircaseramp'
-}
+### Writing an export_config.py
 
-# Name of additional protocols to export, but not use in QC (again as a dict)
-D2S = {
-    'StaircaseInStaircaseramp (2)_2kHz': 'sis',
-}
-```
+Prior to performing QC and exporting, an `export_config.py` file should be added to the root of the data directory.
+An example file is available [here](./example_config.py).
+This should contain three entries:
 
-Next, we call `run_herg_qc`:
-```sh
-pcpostprocess run_herg_qc test_data/13112023_MW2_FF -w A01 A02 A03 -o output --output_traces
+1. A `saveID` variable, providing a name to use in exported data.
+2. A dictionary variable `Q2S_DC` indicating the name of the staircase protocol, and mapping it onto a more user-friendly name used in export.
+3. A dictionary variable `D2S` indicating names of other protocols to export, again mapping onto names for export.
 
-```
+For example, in the test data (see "Getting Started" above), we have six subdirectories:
+- `staircaseramp (2)_2kHz_15.01.07`, staircase run as first protocol before E-4031
+- `StaircaseInStaircaseramp (2)_2kHz_15.01.51`, non-QC protocol
+- `staircaseramp (2)_2kHz_15.06.53`, staircase run as last protocol before E-4031
+- `staircaseramp (2)_2kHz_15.11.33`, staircase run as first protocol after E-4031 addition
+- `StaircaseInStaircaseramp (2)_2kHz_15.12.17`, non-QC protocol
+- `staircaseramp (2)_2kHz_15.17.19`, staircase run as last protocol after E-4031
+Here each directory name is a protocol name followed by a timestamp.
+Corresponding dictionaries could be `Q2S_DC = {'staircaseramp (2)_2kHz': 'staircase'}` (indicating the staircase protocol) and `D2S = {'StaircaseInStaircaseramp (2)_2kHz': 'sis'}`.
+The `saveID` could be any string, but in our example we use `saveID = '13112023_MW2'`.
 
+### Running
 
+To run, we call `run_herg_qc` with:
 ```sh
-$ pcpostprocess run_herg_qc --help
+pcpostprocess run_herg_qc test_data/13112023_MW2_FF -o output --output_traces -w A01 A02 A03
 ```
+Here:
+- `test_data/13112023_MW2_FF` is the path to the test data,
+- `-o output` tells the script to store all results in the folder `output`,
+- `--output_traces` tells the script to export the data (instead of running QC only).
 
+The last part, `-w A01 A02 A03`, tells the script to only check the first three wells.
+This speeds things up for testing, but would be omitted in normal use.
 
-
-
-
-### Exporting Summary
-
-The `summarise_herg_export` command produces additionally output after `run_herg_qc` has been run.
+To see the full set of options, use ```sh -$ pcpostprocess summarise_herg_export --help - -usage: pcpostprocess summarise_herg_export [-h] [--cpus CPUS] - [--wells WELLS [WELLS ...]] [--output OUTPUT] - [--protocols PROTOCOLS [PROTOCOLS ...]] [-r REVERSAL] - [--experiment_name EXPERIMENT_NAME] - [--figsize FIGSIZE FIGSIZE] [--output_all] - [--log_level LOG_LEVEL] - data_dir qc_estimates_file - -positional arguments: - data_dir path to the directory containing the run_herg_qc results - -options: - -h, --help show this help message and exit - --cpus CPUS, -c CPUS - --wells WELLS [WELLS ...], -w WELLS [WELLS ...] wells to include in the output (default: all) - --output OUTPUT, -o OUTPUT path where the output will be saved - --protocols PROTOCOLS [PROTOCOLS ...] protocols to include (default: all) - -r REVERSAL, --reversal REVERSAL the reversal potential during the experiment - --experiment_name EXPERIMENT_NAME - --figsize FIGSIZE FIGSIZE - --output_all Flag specifying whether to output all plots (default: false) - --log_level LOG_LEVEL +$ pcpostprocess run_herg_qc --help ``` +### Interpreting the output + +After running the command above on the test data provided, the directory `output` will contain the subdirectories: + +- `-120mV time constant` Plots illustrating the fits performed when estimating time constants. These are output but not used in QC. +- `leak_correction` Plots illustrating the _linear_ leak correction process in all wells passing staircase QC. +- `qc3-bookend` Plots illustrating the "QC3 bookend" criterion used in Shuttleworth et al. +- `reversal_plots` Plots illustrating the reversal potential estimation +- `subtraction_plots` Plots illustrating the drug subtracted trace leak removal. +- `traces` The exported traces (times, voltages, currents) + +and the files: + +- `chrono.txt` Lists the order in which protocols were run. For the staircase, the repeat at the end of the sequence is indicated by an added `_2`. +- `passed_wells.txt` Lists the wells that **passed all QC**. +- `pcpostprocess_info.txt` Information on the run that generated this output (date, time, command used etc.) +- `QC-13112023_MW2.csv` A CSV representation of the pass/fail results for each individual criterion +- `QC-13112023_MW2.json` A JSON representation of the same +- `qc_table.tex` A table in Tex format with the above. +- `qc_vals_df.csv` Numerical values used in the final QC. +- `selected-13112023_MW2.txt` **Preliminary QC selection** based on staircase protocol 1 (before other protocols) and 2 (after other protocols) +- `selected-13112023_MW2-staircaseramp.txt` Preliminary selection based on staircase 1 +- `selected-13112023_MW2-staircaseramp_2.txt` Preliminary selection based on staircase 2 +- `subtraction_qc.csv` Numerical values used in the final QC and leak subtraction. + ## Contributing Although `pcpostprocess` is still early in its development, we have set up guidelines for user contributions in [CONTRIBUTING.md](./CONTRIBUTING.md).