How to filter data by condition in MATLAB and compute the daily mean

d20130101 Registered member
2023-02-28 10:22

You can use built-in MATLAB functions such as isnan and find to filter the data by condition, then use mean to compute the daily mean. The steps are as follows:

Step 1: Use isnan to flag all NaN values and store the result in a logical variable.

Step 2: Filter the data according to that logical variable.

Method 1:
1. Use the built-in functions find and isnan to locate the NaN values and count how many occur in each day's data.
2. Use the built-in function mean to compute the daily mean while ignoring NaN values.
3. Check whether NaN values make up more than 40% of a day's samples. If so, treat that day's mean as invalid.
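
The three steps of Method 1 can be sketched as follows. This is only a minimal illustration: the variable names and the synthetic rawdata vector are made up for the example, and it assumes the data is a single column with exactly 48 half-hourly samples per day. The logical mask from isnan is used directly, so find is not strictly needed here.

```matlab
% Hypothetical input: 3 days of half-hourly samples (48 per day)
rawdata = (1:144)';
rawdata(1:24) = NaN;      % day 1: 24 of 48 missing (50%)
rawdata(49:58) = NaN;     % day 2: 10 of 48 missing (about 21%)

samplesPerDay = 48;
dayMatrix = reshape(rawdata, samplesPerDay, []);   % one column per day

nanMask = isnan(dayMatrix);                  % logical mask of missing values
nanCount = sum(nanMask, 1);                  % number of NaN values per day
dailyMean = mean(dayMatrix, 1, 'omitnan');   % daily mean ignoring NaN

% Invalidate days in which more than 40% of the samples are NaN
dailyMean(nanCount / samplesPerDay > 0.4) = NaN;
disp(dailyMean)
```

With this input the first day is marked invalid, while the other two days keep the mean of their remaining values.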

Method 2:
1. Use find to get the indices of the NaN values, that is, locate every NaN and store the indices in a matrix.
2. Use reshape to arrange the data into a matrix in which each day's 48 half-hourly samples sit together. This separates the data by day.
3. The original data can also be screened first according to the requirement, so that only the values that meet it are kept in one variable:

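% Extract the values that meet the condition (non-NaN)
data_valid = rawdata(~isnan(rawdata));

% Split by day. Reshape rawdata rather than data_valid: dropping the NaN
% values changes the element count, so data_valid no longer fills 48 x num_day.
num_data = numel(rawdata);   % total number of samples
num_day = num_data/48;       % number of days (assumes a multiple of 48)
samples = reshape(rawdata, 48, num_day);   % one column per day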
MATLAB's find command can help locate the NaN values. The daily mean can also be computed with nanmean (Statistics and Machine Learning Toolbox) or with mean(x, 'omitnan'), both of which skip NaN values automatically, so no further filtering is needed.

For each day's data, first count the NaN values and compare that count with the total number of samples for the day. The reshape function together with nanmean (or mean with 'omitnan') is enough to do this.

For example, use reshape to rearrange the original data so that each day holds its 48 half-hourly samples. For 88 days of data this gives an 88x48 matrix. Once the NaN positions have been recorded, the NaN elements can be set to 0 so that they do not affect the sums. Then use a for loop over the days and check whether the proportion of NaN values in each day exceeds 40%. If it does not, sum the day's valid values and divide by the number of valid values to get that day's mean; otherwise treat the day as invalid.
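
The loop described above might look like the sketch below. It uses a hypothetical 4-day rawdata vector instead of 88 days to keep the example short, and all variable names are illustrative. Note that reshape fills column by column, so a days-by-48 layout is obtained by reshaping to 48 rows and then transposing.

```matlab
% Hypothetical input: 4 days of half-hourly samples
rawdata = ones(4*48, 1);
rawdata(1:20) = NaN;        % day 1: 20 of 48 missing (about 42%)
rawdata(49:60) = NaN;       % day 2: 12 of 48 missing (25%)
rawdata(61:96) = 3;         % day 2: the remaining valid values are 3

% reshape fills column by column, so build 48-by-nDays and transpose
dayRows = reshape(rawdata, 48, []).';   % nDays-by-48, one row per day
nDays = size(dayRows, 1);

nanMask = isnan(dayRows);   % record NaN positions before overwriting them
dayRows(nanMask) = 0;       % zeros do not contribute to the sums

dailyMean = NaN(nDays, 1);  % days left as NaN are invalid
for d = 1:nDays
    nMissing = sum(nanMask(d, :));
    if nMissing / 48 <= 0.4
        dailyMean(d) = sum(dayRows(d, :)) / (48 - nMissing);
    end
end
disp(dailyMean)
```

Recording nanMask before the replacement matters: once NaN has been overwritten with 0, a missing value can no longer be told apart from a genuine zero measurement.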
sundahai_3 Registered member
2023-02-28 10:22

table link to a

qklwdd1 Registered member
2023-02-28 10:22

This answer quotes ChatGPT

Let's assume that the data has been stored in a matrix called "data", where each row is one half-hour average, the first column holds its date and time, and each remaining column is a different measurement. To calculate the average for each day, you can use the following code:

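% Read the data (assumes column 1 already holds MATLAB serial date numbers
% and the remaining columns hold the measurements)
data = readmatrix('data.csv');

% Map every row to a day index 1..nDays
[dayList, ~, dayIdx] = unique(floor(data(:,1)));
vals = data(:,2:end);

% Record which entries are valid, then replace NaN with 0
isValid = ~isnan(vals);
vals(~isValid) = 0;

% Per-day sum and valid count, one column at a time
% (accumarray needs vector values)
nDays = numel(dayList);
nCols = size(vals,2);
daySum = zeros(nDays,nCols);
dayValidCount = zeros(nDays,nCols);
for c = 1:nCols
    daySum(:,c) = accumarray(dayIdx, vals(:,c), [nDays 1]);
    dayValidCount(:,c) = accumarray(dayIdx, double(isValid(:,c)), [nDays 1]);
end
dayTotalCount = accumarray(dayIdx, 1, [nDays 1]);   % samples per day

% Daily mean; set to NaN when more than 40% of the day's values are invalid
dayMean = daySum./dayValidCount;
dayMean(dayValidCount./dayTotalCount < 0.6) = NaN;

% Save the result to a CSV file
writematrix(dayMean, 'day_mean.csv');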


duanduan630 Registered member
2023-02-28 10:22

Drawing on GPT and my own ideas: you can use MATLAB built-in functions to filter the data by condition and compute the daily mean.

First, you can use the readtable function to read the data into MATLAB and convert the timestamp into MATLAB's datetime format. Assuming that the data file is named data.csv, the timestamp column is named Time, and the average value column is named Avg, you can use the following code:

data = readtable('data.csv');
data.Time = datetime(data.Time, 'InputFormat', 'yyyy-MM-dd HH:mm:ss');
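
You can then use dateshift to truncate each timestamp to the start of its day and group the data by date. (The day function alone returns only the day of the month, so the same day number in different months would end up in one group.)

dayStart = dateshift(data.Time, 'start', 'day');
[groupIdx, dates] = findgroups(dayStart);
groupedData = splitapply(@(rows) {data(rows,:)}, (1:height(data))', groupIdx);

groupedData is now a cell array in which each element is a table holding the data for one date. Next, you can loop through the data for each date, filtering the data conditionally and averaging it: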

dailyAverages = NaN(size(groupedData));
for i = 1:length(groupedData)
    dailyData = groupedData{i};
    nanRatio = sum(isnan(dailyData.Avg)) / height(dailyData);
    if nanRatio > 0.4
        dailyAverages(i) = NaN;
    else
        dailyAverages(i) = mean(dailyData.Avg, 'omitnan');
    end
end

In the above code, the isnan function is used to determine if the data is NaN, the sum function is used to count the number of NaN values, the height function is used to get the number of rows in the table, the mean function is used to calculate the mean value, and the 'omitnan' option is used to ignore NaN values.

Finally, dailyAverages will contain the average for each date, where invalid values are replaced with NaN.
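
For anyone who wants to try this without a data.csv file, here is a small self-contained sketch on a made-up two-day table. It computes the same NaN ratio and daily mean as the loop above, but applies splitapply to the Avg column directly instead of looping over a cell array. The table contents and the names groupIdx, dates and result are assumptions for the example only.

```matlab
% Hypothetical table: two days of half-hourly data
Time = (datetime(2023,1,1) + minutes(0:30:(2*48-1)*30))';
Avg = ones(96, 1);
Avg(1:30) = NaN;     % day 1: 30 of 48 missing (62.5%)
Avg(49:53) = NaN;    % day 2: 5 of 48 missing (about 10%)
data = table(Time, Avg);

% One group per calendar day
[groupIdx, dates] = findgroups(dateshift(data.Time, 'start', 'day'));

% Fraction of NaN values and NaN-ignoring mean for each day
nanRatio = splitapply(@(x) mean(isnan(x)), data.Avg, groupIdx);
dailyAverages = splitapply(@(x) mean(x, 'omitnan'), data.Avg, groupIdx);

% Invalidate days with more than 40% NaN values
dailyAverages(nanRatio > 0.4) = NaN;

result = table(dates, dailyAverages);
disp(result)
```

Keeping the dates next to the averages in one table makes it easy to see which days were invalidated.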

dapsjj Registered member
2023-02-28 10:22

In MATLAB, the following steps compute the daily mean of half-hourly average data containing NaN values and decide which days are invalid:
1. Divide the data into subarrays by date, so that each subarray contains one day's data. You can use the datetime function to convert the time strings into MATLAB's date format and the unique function to get all the dates in the data.
2. For each subarray, calculate the mean of the non-NaN values. You can use nanmean (Statistics and Machine Learning Toolbox) or mean with the 'omitnan' option, both of which ignore NaN values.
3. For each subarray, calculate the proportion of NaN values. You can use the isnan function to determine which elements are NaN, and then use the sum function to count them.
4. For each subarray, determine whether the proportion of NaN values is greater than 40%. If so, the mean of that subarray is invalid.
Here is the code implementation:

% Half-hourly average data containing NaN values
data = [1 2 NaN NaN 5 NaN NaN 8 9 NaN 10 11];

% Timestamps, assuming consecutive samples are half an hour apart
timestamps = datetime('now') - hours(length(data)/2:-0.5:0.5);

% Split the data by date
dates = dateshift(timestamps, 'start', 'day');
unique_dates = unique(dates);
daily_data = cell(length(unique_dates), 1);
for i = 1:length(unique_dates)
    daily_data{i} = data(dates == unique_dates(i));
end

% Compute the daily means, ignoring NaN values
daily_means = nan(length(unique_dates), 1);
for i = 1:length(unique_dates)
    daily_means(i) = mean(daily_data{i}, 'omitnan');
end

% Mark days with more than 40% NaN values as invalid
threshold = 0.4;
is_invalid = cellfun(@(x) sum(isnan(x))/length(x) > threshold, daily_data);
daily_means(is_invalid) = NaN;

Publish Time
2023-02-28 10:22
Update Time
2023-02-28 10:22